Abstract
Analyzing the lifetime value of a property insurance company's customers not only helps the company allocate customer relationship management resources reasonably and reduce management costs, but also helps it identify risk in a timely and effective way so that risk control and management can be implemented. In this paper, a claim index that evaluates customer risk is added to the RFM model in order to evaluate the lifetime value of property insurance customers quantitatively. First, in view of the many uncertainties in practical decision-making, hesitant fuzzy theory is adopted and the attributes are weighted by hesitant fuzzy entropy. Second, similarity measure theory based on hesitant fuzzy sets is used for cluster analysis, and four homogeneous customer groups are obtained. Finally, the lifetime value score of each of these four groups is calculated and their characteristics are analyzed from a quantitative perspective.
Introduction
With economic development and growing insurance awareness, property insurance products are becoming more and more important among people's insurance purchases [1]. Property insurance companies face an increasing amount of customer information and increasingly fierce market competition [2]. Therefore, good customer relationship management (CRM) and customer lifetime value (CLV) assessment can help these companies identify the more valuable customers, implement classified management strategies, and develop different marketing strategies to save costs and increase revenue [3].
O.I. Turkel and A. Dixit suggested that CLV can assess the value of a customer over his or her lifetime with a company [4]. J.R. Segarra-Moliner argued that companies can adopt different customer management strategies and invest unequal resources according to customer value [5]. A. Hiziroglu and S. Sengul suggested that an appropriate and reasonable calculation of CLV can help companies quickly and effectively identify different customer groups and then adopt different marketing strategies [6]. Because CLV calculation for homogeneous customer groups is simpler than for individual customers, as M.S. Kahreh suggested, a company can first classify customers into groups [7]. Most existing classification algorithms are hard classification techniques; that is, each customer belongs to exactly one customer group. In reality, however, customer classification often involves ambiguity or uncertainty (a 20-year-old, for example, cannot be clearly assigned to either the teens or the youth), and fuzzy sets can describe these uncertainties by membership degree [8–12]. Yet the determination of the membership degree is itself uncertain. To solve this problem, V. Torra and S. Narukawa proposed the concept of hesitant fuzzy sets, an extension of fuzzy sets that allows the membership of an element to take several different values [13, 14].
Y.H. Hu and T.W. Yeh pointed out that the RFM model is often used to assess CLV [15]. The RFM model includes three factors: Recency, Frequency and Monetary. Many Chinese and foreign scholars have studied the scoring of these three indicators and pointed out that they should be given different weights in different fields, and that new indicators can be added to RFM. S.M. Rezaeinia et al. assigned the RFM weights by the Analytic Hierarchy Process (AHP) and clustered bank customers by K-Means [16]. M. Saeedpoor et al. used the fuzzy TOPSIS decision-making method to rank the value of life insurance companies in Iran; in their study, the Fuzzy Analytic Hierarchy Process (FAHP) was used to weight the evaluation indicators [17]. F. Safari et al. used the Fuzzy C-Means algorithm to classify customers and calculated the CLV of each customer group based on FAHP and the RFM model [18]. Chinese scholars Y. Sun and B.L. Ma proposed a customer value identification method that considers customer loyalty, based on the RFM model and clustering technology [19]. M. Zhao and J.Y. Qi extended the RFM model to the RFMP model and used the K-Means algorithm to analyze customer value [20]. K.N. Fang et al. introduced nonparametric random forest regression to model the profit contribution of insurance customers based on the RFM model [21]. X.J. Shi et al. extended the RFM model to the RFMA model to study the effectiveness of online customers [22]. In the insurance industry, Chinese scholars such as P.R. Li used the RFM model to identify the core customers of insurance companies [23], while S. Singh added two indicators to the RFM model to measure customer risk and proposed the FARFM model, a more comprehensive model for analyzing the characteristics of insurance customers [24].
Studies on RFM-based CLV evaluation, customer classification and hesitant fuzzy theory have received increasing attention. However, few of the studies above combine CLV evaluation or customer classification with hesitant fuzzy theory, even though there are many uncertainties in classifying customers and in weighting CLV evaluation indicators. Considering the characteristics of the property insurance industry and the uncertainty of information in practice, this paper therefore proposes the RFMC model, which adds a customer risk measurement index (C, Claim) to the RFM model. The weights of RFMC are determined by hesitant fuzzy entropy, and four homogeneous customer groups are obtained by network clustering based on hesitant fuzzy sets. Finally, the CLV score of each customer group is calculated, the characteristics of the groups are analyzed, and management strategies are given.
Research methodology
Customer lifetime value and RFM model
In market competition and customer management, a company should adopt a customer-centric approach to achieve effective customer relationship management. The CLV model is a customer-oriented approach that can help a company improve its CRM. Customer lifetime value (CLV) refers to the total benefit that a customer brings to the company over the whole relationship, from the moment of joining to the moment of leaving. It includes historical value, current value and potential value.
In the CLV evaluation process, the company must decide which indicators to take into account. Some scholars have pointed out that, because CLV is a CRM concept, indicators used in marketing evaluation can also be used for customer value evaluation. RFM is a behavior-based model: it analyzes the behaviors a customer has engaged in and makes predictions based on them.
The RFM model contains three indicators: Recency (R), Frequency (F) and Monetary (M). R refers to a customer's most recent consumption behavior, including the time, location, product and other related information of the last purchase. F refers to the customer's total number of purchases; the more frequent the consumption, the more loyal the customer. M is the total amount of money a customer spends on a product or service over a certain period of time; the higher the amount, the greater the benefit the customer may bring to the company.
The RFM model can thus dynamically and comprehensively reflect the characteristics of a customer's consumption behavior. Moreover, if long-term consumption data are available, the customer's value, or even lifetime value, can be measured more accurately. Therefore, the RFM model can be applied to calculating the CLV of property insurance customers, and it can be extended with indicators that measure risk so that it better fits the characteristics of the property insurance industry.
Hesitant fuzzy set and hesitant fuzzy entropy
Fuzzy set theory is one of the methods for dealing with uncertainty. Its main idea is to express uncertainty as a membership degree given by a membership function. Fuzzy sets are widely used in decision-making, but the determination of membership is itself uncertain: experts rely on their experience when making decisions, and the score they give is not a single definite value. To describe this hesitation in decision-making, Torra proposed the concept of hesitant fuzzy sets, given as Definition 1.
Let $l(h_H(x))$ represent the number of values in the hesitant fuzzy element $h_H(x)$. Arrange these values in ascending order and let
In the CLV evaluation of property insurance customers, the indexes R, F, M and C differ in importance and need different weights. The entropy weight method is an objective weighting method: it uses information entropy to calculate the entropy weight of each index according to its degree of variation, and then uses the entropy weight to correct the weight of each index, yielding an objective weight for each index. Hesitant fuzzy entropy, which combines fuzzy entropy with hesitant fuzzy set theory, was proposed in [25].
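The entropy weight idea can be illustrated with the classical (non-hesitant) entropy weight method. The following is a minimal sketch under stated assumptions: the decision matrix is crisp and non-negative, every column has a positive sum, and at least one column varies across rows. The function name `entropy_weights` and the sample matrix are hypothetical, and the hesitant fuzzy entropy of [25] is not reproduced here.

```python
import math

def entropy_weights(matrix):
    """Classical entropy weight method for a crisp decision matrix.

    matrix: list of rows (alternatives), each a list of non-negative
    indicator values (columns). Returns one weight per column.
    Assumes at least two rows and at least one non-constant column.
    """
    n = len(matrix)            # number of alternatives
    m = len(matrix[0])         # number of indicators
    k = 1.0 / math.log(n)      # scales the entropy into [0, 1]
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        # Share of each alternative in indicator j
        shares = [v / total for v in column]
        # Shannon entropy; 0 * log(0) is treated as 0
        entropy = -k * sum(p * math.log(p) for p in shares if p > 0)
        # Low entropy means the indicator discriminates more
        divergences.append(1.0 - entropy)
    total_div = sum(divergences)
    return [d / total_div for d in divergences]

# Hypothetical example: the second indicator varies, the first does not
weights = entropy_weights([[1.0, 1.0], [1.0, 3.0]])
```

An indicator whose values are identical across alternatives carries no information and receives a weight of (approximately) zero, which is the sense in which the weights are objective.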
Let $\alpha = \{\alpha_{\sigma(1)}, \alpha_{\sigma(2)}, \ldots, \alpha_{\sigma(l_\alpha)}\}$ be a hesitant fuzzy element and $l_\alpha$ be the number of membership degrees in it. Then the index entropy of $\alpha$ is defined as:
where $\alpha_{\sigma(i)}$ is the $i$th membership degree after arranging the values from smallest to largest.
Steps for determining hesitant fuzzy multi-attribute weights based on entropy weight:
Where
Customer classification is an effective way to manage different types of customers with different preferences. Heterogeneous customers are divided into cohorts based on common characteristics and attributes. Marketing and customer management strategies can then be tailored to the common attributes of each group, which saves the company costs.
Because CLV evaluation is simpler for homogeneous customer groups than for single customers, we first divide the customers into several homogeneous groups and then evaluate the customer lifetime value of each group based on hesitant fuzzy sets. We further analyze the attribute characteristics of the customers within each group to help companies better predict the category of a new customer. A clustering method based on hesitant fuzzy symmetric entropy is adopted.
The concept of relative entropy is given by Definition 2 [26].
Constructing symmetric interaction entropy:
Let $X = \{x_1, x_2, \ldots, x_m\}$, and let $h_A(x)$ and $h_B(x)$ be two different hesitant fuzzy sets defined on $X$. Let $R(h_A(X), h_B(X))$ and $R(h_B(X), h_A(X))$ denote the relative entropies of $h_A(X)$ and $h_B(X)$, respectively; then the symmetric interaction entropy of $h_A(X)$ and $h_B(X)$ is given by:
Based on the symmetric interaction entropy, the similarity formula (5) in the hesitant fuzzy environment is given as:
Where
In practical problems, each $x_i \in X$ should be given a different weight because of its different importance. Let $w_i$ be the weight of $x_i$ ($i = 1, 2, \ldots, m$), and
Where,
General steps of clustering in hesitant fuzzy environment:
The RFM model with its three indicators can effectively measure the profitability of property insurance customers, but it lacks a direct indicator of the risk that customers bring. In the insurance industry in particular, it is therefore difficult to measure a customer's value fully with the RFM model alone. For instance, if a customer with high Monetary, high Frequency and recent Recency also has a high claim rate, he or she brings higher risk to the company, and the customer's value should be correspondingly lower. This paper therefore combines the characteristics of the property insurance business with the RFM model by adding one indicator that measures a customer's claim risk. This index is denoted C (Payment of Claims) and represents the total amount of claims incurred by a customer over a period of time. We call the new model the RFMC model.
Once the CLV evaluation criteria (R, F, M, C) are defined, the CLV valuation steps for property insurance customers are as follows:
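To make the four indicators concrete, the sketch below aggregates R, F, M and C per customer from raw records. It is only an illustration: the record layouts, the function name `rfmc_per_customer` and the sample values are hypothetical, and R is taken as the time elapsed since the last purchase.

```python
from collections import defaultdict

def rfmc_per_customer(purchases, claims, now):
    """Aggregate R, F, M, C per customer.

    purchases: iterable of (customer_id, time, premium)
    claims:    iterable of (customer_id, claim_amount)
    now:       reference time, in the same unit as the purchase times
    """
    stats = defaultdict(lambda: {"last": None, "F": 0, "M": 0.0, "C": 0.0})
    for cid, t, premium in purchases:
        s = stats[cid]
        s["last"] = t if s["last"] is None else max(s["last"], t)
        s["F"] += 1            # Frequency: number of purchases
        s["M"] += premium      # Monetary: total premium paid
    for cid, amount in claims:
        if cid in stats:       # ignore claims without a matching purchase
            stats[cid]["C"] += amount   # Claim: total claims paid out
    return {
        cid: {"R": now - s["last"], "F": s["F"], "M": s["M"], "C": s["C"]}
        for cid, s in stats.items()
    }

# Hypothetical records: two customers, one claim
purchases = [("a", 1.0, 100.0), ("a", 2.5, 50.0), ("b", 0.5, 20.0)]
claims = [("a", 30.0)]
table = rfmc_per_customer(purchases, claims, now=3.0)
```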
Let min A and max A be the minimum and maximum values of an attribute A, respectively, and map the original values into the interval [new min, new max] using the standard formula:
where NR ci , NF ci , NM ci and NP ci represent the normalized Recency, Frequency, Monetary and Payment of Claims of customer group ci, respectively.
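The min-max mapping and the group score can be sketched as follows. This is a hedged illustration, not the paper's exact formulas: it uses the usual min-max mapping, assumes the CLV score is a weighted sum of the normalized indicators, and reverses the direction for R and C, since higher values of those lower a customer's value. The names `min_max_scale` and `clv_score`, the ranges and the equal weights are all hypothetical.

```python
def min_max_scale(value, lo, hi, new_min=-1.0, new_max=1.0, inverse=False):
    """Map value from [lo, hi] to [new_min, new_max].

    With inverse=True the direction is flipped, so the largest raw
    value maps to new_min (for indicators that lower the score).
    """
    if hi == lo:
        raise ValueError("attribute has zero range")
    ratio = (value - lo) / (hi - lo)
    if inverse:
        ratio = 1.0 - ratio
    return new_min + ratio * (new_max - new_min)

def clv_score(group, ranges, weights):
    """Weighted sum of normalized R, F, M, C (assumed form of the score)."""
    score = 0.0
    for key in ("R", "F", "M", "C"):
        lo, hi = ranges[key]
        inverse = key in ("R", "C")   # higher R or C lowers the score
        score += weights[key] * min_max_scale(group[key], lo, hi, inverse=inverse)
    return score

# Hypothetical group center, attribute ranges and equal weights
ranges = {k: (0.0, 10.0) for k in "RFMC"}
weights = {k: 0.25 for k in "RFMC"}
best = clv_score({"R": 0.0, "F": 10.0, "M": 10.0, "C": 0.0}, ranges, weights)
```

With a target interval of (–1, 1), a group that is best on every indicator scores 1 and a group that is worst on every indicator scores –1, so the sign of the score separates above-average from below-average groups.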
Data collection and preparation
Part of the property insurance customer data set is given in the literature [27]. The total sample size is 3000, including 42 customers who have made claims. The mean number of claims is 2, with a maximum of 5 and a minimum of 1. The product categories that customers bought are personal account loss-of-funds insurance, express postal package insurance, family property insurance, the home treasure comprehensive protection scheme, the home comprehensive protection scheme, and earthquake and property insurance. The number of customers buying each type is calculated from the purchase ratios of these types, which are 0.49, 0.07, 0.11, 0.12, 0.13 and 0.08, respectively. The insurance period is one year, renewable on expiry.
The customer purchase numbers and the payment for each purchase, the claim times and the amount of each claim, the latest purchase records and other related data are generated as follows:
(1) The payment of each purchase. The purchase payment depends on the type of insurance bought. Only the express postal package insurance fee is described here; the other premiums are the minimum premiums for the corresponding insurance types published by China Ping An Insurance Company in 2016. For express postal package insurance, the premium is charged according to the identified value of the goods: 1 yuan for goods worth RMB 0∼500 yuan, 2 yuan for 501∼1000 yuan, 5 yuan for 1001∼5000 yuan, 8 yuan for 5001∼10000 yuan and 12 yuan for 10001∼20000 yuan. The insurance period runs from the shipment of the goods until they reach the customer.
(2) Claim times and the amount of each claim. The claim times are randomly generated according to the characteristics of the existing data and the proportion of claims. The claim amount is determined by the insurance type, since different types have different claim requirements. For the types of property insurance studied in this article, the claim amount does not exceed the maximum value of the insured subject matter, so the claim data are randomly generated below this maximum for each insurance type.
(3) Last time of purchase. No relevant data were found in the obtained data set. In this paper, we analyze customer retention within 3 years and randomly generate a value in the interval (0, 3) to represent the distance of the last purchase from the present. One decimal place is kept to represent the month. Different insurance companies have different provisions, as do other types of insurance.
To facilitate the numerical experiment and analysis, this paper uses randomly generated data; in practice, the corresponding real data are easy for an insurance company to obtain. The generated data are used only to test the applicability of the proposed model.
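The premium schedule for express postal package insurance quoted above is a simple step function, sketched below. The function and constant names are hypothetical, and treating each bracket's upper bound as inclusive is an assumption about values that fall between the listed integer ranges.

```python
# (upper bound of goods value in yuan, premium in yuan), as listed in the text
PREMIUM_BRACKETS = [(500, 1), (1000, 2), (5000, 5), (10000, 8), (20000, 12)]

def express_package_premium(goods_value):
    """Return the premium in yuan for goods of the given value."""
    if goods_value < 0:
        raise ValueError("goods value must be non-negative")
    for upper, premium in PREMIUM_BRACKETS:
        if goods_value <= upper:
            return premium
    # The text lists no bracket above 20000 yuan
    raise ValueError("goods value above the highest listed bracket")
```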
Hesitant fuzzy data sets
There are many ways to determine fuzzy membership, most of them based on expert experience. Each method has its own advantages and disadvantages, and no single method is the most effective. Hesitant fuzzy sets allow multiple membership degrees to coexist, so they can integrate several membership functions and process uncertain information more flexibly and effectively.
In this paper, two fuzzy membership methods are adopted, so each hesitant fuzzy element contains two values; that is, each indicator has two membership degrees.
Max-min method of membership calculation
For a certain index value $X_i$, if $X_i$ is positively correlated with the research object, the membership is calculated according to formula (10); if it is negatively correlated, the membership is calculated according to formula (11).
In our example, the F and M indexes are positively related to the customers' lifetime value, so their membership degrees are given by formula (10); R and C are negatively correlated, so theirs are given by formula (11).
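As an illustration, a max-min membership function can be sketched as below. Formulas (10) and (11) are not reproduced in the text, so this assumes the common form: (x − min)/(max − min) for positively correlated indexes and (max − x)/(max − min) for negatively correlated ones. The function name `max_min_membership` is hypothetical.

```python
def max_min_membership(values, positive=True):
    """Max-min membership degrees in [0, 1] for one indicator column.

    positive=True  : larger raw values give larger membership (F, M)
    positive=False : larger raw values give smaller membership (R, C)
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; membership is undefined")
    if positive:
        return [(v - lo) / (hi - lo) for v in values]
    return [(hi - v) / (hi - lo) for v in values]

# Hypothetical frequency and recency columns
f_membership = max_min_membership([0, 5, 10])                   # positive index
r_membership = max_min_membership([0, 5, 10], positive=False)   # negative index
```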
Intuitive method
Research on the RFM model is relatively mature, and scholars' evaluation of its three indicators is well developed.
According to the consumption of insurance customers and past expert experience, a recency R of less than 60 days is regarded as low and one of more than 2 years as high; a consumption frequency F of no more than 1 to 2 times a year is regarded as low and one of more than 2 times as high; a consumption amount M below 5,000 yuan is regarded as low and one above 5,000 yuan as high.
Because an expert's score within a given range is affected by subjective factors and therefore carries some randomness, a random number drawn from the numerical range assigned to each value interval is used as the membership degree.
R: Since the values in this example are expressed as decimals, values from 0 to 0.3 are assigned random memberships in [0.8, 1), values from 0.3 to 1 in (0.5, 0.8), values from 1 to 2 in [0.2, 0.5], and values from 2 to 3 in (0, 0.2].
F: A value of 0 has membership 0; values from 0 to 2 are assigned random memberships in (0, 0.5], and values above 2 in [0.5, 1).
M: The mean of M is 698, so values below 700 are assigned random memberships in (0, 0.5] and values above 700 in [0.5, 1).
C: Using the median 16776 as the cut-off point, values below it are assigned random memberships in (0.5, 1) and values above it in (0, 0.5].
Part of the hesitant fuzzy data set obtained according to the above rules is shown in Table 1.
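The interval rules for the intuitive method can be sketched as a lookup followed by a uniform draw. This is an illustration under assumptions: the function name `intuitive_membership` is hypothetical, values that fall exactly on a boundary are assigned to the lower bracket, and the open or closed endpoints of the membership ranges are not distinguished.

```python
import random

def intuitive_membership(indicator, value, rng):
    """Draw a membership degree from the range the rules assign to value."""
    if indicator == "R":
        if value <= 0.3:
            lo, hi = 0.8, 1.0
        elif value <= 1:
            lo, hi = 0.5, 0.8
        elif value <= 2:
            lo, hi = 0.2, 0.5
        else:
            lo, hi = 0.0, 0.2
    elif indicator == "F":
        if value == 0:
            return 0.0                      # no purchases: membership 0
        lo, hi = (0.0, 0.5) if value <= 2 else (0.5, 1.0)
    elif indicator == "M":
        lo, hi = (0.0, 0.5) if value < 700 else (0.5, 1.0)
    elif indicator == "C":
        # Lower claims mean lower risk, hence higher membership
        lo, hi = (0.5, 1.0) if value < 16776 else (0.0, 0.5)
    else:
        raise ValueError("unknown indicator: %r" % indicator)
    return rng.uniform(lo, hi)

rng = random.Random(0)                      # seeded for reproducibility
sample = intuitive_membership("R", 0.1, rng)
```

Pairing this value with the max-min membership of the same raw value gives a hesitant fuzzy element of length 2 for each indicator.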
Hesitant fuzzy data sets
The ID in Table 1 is the customer number. We use the matrix $D = (h_{ij})_{3884 \times 4}$ to represent the data in Table 1 and call $h_{ij}$ a hesitant fuzzy element. For example, $h_{21} = \{0.23, 0.10\}$ means that the degree to which customer 2 satisfies indicator R takes the two values 0.23 and 0.10.
Hesitant fuzzy clustering can handle hesitant fuzzy information well, but it is not suitable for large data volumes. K-Means clustering, by contrast, converges quickly, is easy to implement, and gives good clustering results. K-Means clustering is therefore used to preprocess the data without considering the fuzzy information, compressing the data to a volume that the hesitant fuzzy clustering can handle.
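The preprocessing step can be illustrated with a plain implementation of Lloyd's K-Means algorithm. This is a minimal standard-library sketch rather than the implementation used in the paper: the function name `kmeans`, the random initialization from the data points and the toy data are assumptions.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (centers, labels)."""
    rng = random.Random(seed)
    # Initialize the centers with k distinct data points
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break        # assignments are stable: converged
        labels = new_labels
    return centers, labels

# Hypothetical toy data: two well-separated groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
```

The resulting cluster centers, rather than the individual customers, are then the objects passed to the hesitant fuzzy clustering.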
K-Means clustering results
K-Means clustering analysis was carried out separately for the data set of customers with claims and the data set of customers without claims. Each data set was divided into four categories, giving the eight categories of customers shown in Table 2.
Clustering centers
The corresponding hesitant fuzzy membership matrix is shown in Table 3.
Hesitant fuzzy membership matrix of clustering centers
The $Y_i$, $i = 1, \ldots, 8$, represent the eight cluster centers in Tables 2 and 3, which are called the scheme set in what follows. The hesitant fuzzy membership degrees are obtained by averaging the samples near each cluster center.
Clustering analysis under hesitant fuzzy information
According to formula (6) for the hesitant fuzzy symmetric cross-entropy introduced in Section 3, the similarity degrees $S_w(Y_i, Y_l)$ between the hesitant fuzzy sets $Y_i$, $i = 1, 2, \ldots, 8$, are calculated.
The similarity coefficient matrix P of the 8 kinds of customers is then obtained.
Clustering results
Both in theory and in practice, it is reasonable for a company to work with four categories. From Table 4, the four resulting classes are {Y1}, {Y2, Y3, Y4, Y5, Y8}, {Y6} and {Y7}.
Comparison of clustering effects
To compare the result with K-Means clustering, the average accuracy rates of K-Means and hesitant fuzzy clustering are calculated separately [28].
To explain the meaning of the average accuracy rate, the relevant definitions are given as follows.
Suppose the sample set is $X = \{x_1, x_2, \ldots, x_n\}$ and each element $x_i$ in $X$ is assigned to a certain class $C_i$.
Let $C = \{C_1, C_2, \ldots, C_m\}$ represent the set of classes obtained. C is called the clustering structure, that is, the clusters produced by a clustering algorithm. There are two clustering structures in this paper: one from K-Means clustering and the other from hesitant fuzzy clustering.
Let $P = \{P_1, P_2, \ldots, P_s\}$ represent the manually determined class structure, which in this paper is obtained by random forest classification.
For two different data objects $(x_i, x_j)$ in X, each of which belongs to one cluster in the C structure and one in the P structure, the following definitions apply:
Considering any two different data objects $(x_i, x_j)$, according to whether they belong to the same cluster, there are the following cases:
The two objects belong to the same cluster in both structures; let the number of such pairs be a. The two objects belong to the same cluster in the C structure but not in the P structure; let the number be b. The two objects belong to different clusters in the C structure but to the same cluster in the P structure; let the number be c. The two objects belong to different clusters in both structures; let the number be d.
Define the index accuracy rate.
Positive Accuracy:
Negative Accuracy:
Average Accuracy:
The higher the average accuracy is, the better the clustering effect is.
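The pair counts a, b, c and d can be tallied with a short routine. The sketch below assumes the usual pair-counting reading: a counts pairs grouped together in both structures, b pairs together in C only, c pairs together in P only, and d pairs separated in both. The function name `pair_counts` and the example labels are hypothetical; the accuracy formulas themselves are not reproduced in the text, so the code stops at the counts.

```python
from itertools import combinations

def pair_counts(labels_c, labels_p):
    """Count object pairs by agreement between two labelings.

    labels_c: cluster label of each object in the clustering structure C
    labels_p: class label of each object in the reference structure P
    Returns (a, b, c, d).
    """
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_c)), 2):
        same_c = labels_c[i] == labels_c[j]
        same_p = labels_p[i] == labels_p[j]
        if same_c and same_p:
            a += 1          # together in both structures
        elif same_c:
            b += 1          # together in C, apart in P
        elif same_p:
            c += 1          # apart in C, together in P
        else:
            d += 1          # apart in both structures
    return a, b, c, d

# Hypothetical labels for four objects
counts = pair_counts([0, 0, 1, 1], [0, 0, 0, 1])
```

For n objects the four counts always sum to n(n − 1)/2, so eight cluster centers yield 28 pairs.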
The P structure, that is, the four-class prediction results for the initial clustering centers, is shown in Table 5. The clustering results given by K-Means are shown in Table 6, and the structure of the hesitant fuzzy clustering results is shown in Table 7. In the tables, the category high-low indicates high profit and low risk, high-high indicates high profit and high risk, low-low indicates low profit and low risk, and low-high indicates low profit and high risk.
P structure
C1 structure
C2 structure
According to the above definitions, the counts a, b, c and d are tallied for the C1 structure and the C2 structure, and the average accuracies AA1 and AA2 are then calculated.
The counts are a1 = 2, b1 = 12, c1 = 5, d1 = 9 and a2 = 3, b2 = 7, c2 = 4, d2 = 13.
Accuracy of C1 structure:
Accuracy of C2 structure
For both positive and negative accuracy, hesitant fuzzy clustering shows better results than K-Means clustering. Indeed, the clustering results in Tables 6 and 7 also show directly that the hesitant fuzzy clustering results are closer to the random forest classification results.
RFMC weights determination based on hesitant fuzzy entropy
CLV scores of all customers categories
The values of R, F, M and C at the centers of the four customer categories are obtained from the results in Table 4 in Section 4.3 and are shown in Table 8.
Classification centers
In order to eliminate the dimensional effect, the F and M attributes, which are positively related to the CLV score, are normalized according to formula (10), and the R and C (Payment of Claims) indexes, which are inversely related to it, are normalized according to formula (11). The compression interval is (new min, new max), taken here as (–1, 1). The CLV results calculated according to formula (9) are shown in Table 9.
CLV scores
Attribute analysis of various customers and classification management strategy recommendations
Customers in {Y1}: The CLV score is 0.0732, with higher Frequency, higher Monetary and higher Claim. For such customers, revenue and risk coexist. The company should carry out risk control measures while avoiding customer churn, adopt a risk mitigation system, and require clients to provide more contractual proof of personal loss in the event of a claim.
Customers in {Y2, Y3, Y4, Y5, Y8}: The CLV score is –0.5692. Such customers have higher Recency, lower Frequency, lower Monetary and lower Claim; they are low-yield, low-risk customers. For these customers, the company should adopt strategies to prevent churn, enhance its service awareness, strengthen communication with customers, and create conditions that encourage them to generate more value.
Customers in {Y6}: The CLV score is –0.8819. Compared with the previous category, such customers have higher Recency, lower Monetary and higher Claim; they are low-yield, high-risk customers. For such customers, appropriate risk aversion strategies should be taken, including refusing insurance contracts. The company should also improve the customer information rating system and strictly regulate the audit and reporting system for such customers.
Customers in {Y7}: Such customers have the highest CLV score, 0.3452, with lower Recency, higher Frequency, higher Monetary and lower Claim; they are high-yield, low-risk customers. These customers are the main target group of corporate management: they can bring higher profits to the enterprise and readily become the company's regular customers. For these customers, the company can set up VIP accounts and personal business managers, recommend new products, and collect customer feedback in a timely manner.
In the classification of property insurance customers and the evaluation of their lifetime value, there is a large amount of uncertain information. In this paper, hesitant fuzzy set theory is introduced to carry out the classification and lifetime value evaluation of these customers under uncertain information. Determining the fuzzy membership degree is the key step in converting the original data set into a hesitant fuzzy set. Two membership determination methods are used here to produce hesitant fuzzy elements of length 2, which avoids the difficulty of expert scoring to a certain degree. In future research, more experienced and timely expert scores can be obtained through hesitation questionnaires or expert interviews, and the length of the hesitant fuzzy elements should be expanded appropriately. Finally, by calculating the CLV at the cluster centers of the homogeneous customer groups, the lifetime value of each customer group is evaluated more clearly and intuitively. Combined with qualitative analysis, the attribute characteristics of each customer group are clarified, and the rationality of the classification results is verified against the actual situation.
Acknowledgments
This work was financially supported by the National Natural Science Foundation of China (Nos. 61502280, 61472228) and the Natural Science Foundation of Shandong Province (No. ZR2014FM009).
