Abstract
Property insurance companies in China have accumulated certain customer resources, and these resources generate greater competitive challenges. In view of this, it is highly significant for the development of these companies to analyze the individual demands of existing customers in depth and to develop a broader cross-selling business with the help of data mining tools. In this paper, the fuzzy c-means algorithm is introduced into association rules mining, and an improved Apriori algorithm, the Fuzzy Association Rules Mending Apriori Algorithm (FARMA) based on fuzzy c-means, is presented. The time complexity and space complexity of the proposed algorithm are reduced, and its application scope is extended to uncertain environments. Furthermore, an example is given to illustrate the use of the proposed method. With the help of data mining tools, six main valuable fuzzy association rules are mined, and one cross-selling model is built based on property insurance customers’ data sets.
Introduction
According to Pareto’s law, 80 percent of a company’s profits come from 20 percent of its customers. Meanwhile, the cost of developing new customers is far higher than that of maintaining existing customers. For property insurance companies, cross-selling is an effective method for tapping existing customers’ potential consumption capacity.
Cross-selling has been studied extensively, and the results of these studies have long been applied in financial institutions. W. Karakul optimized the allocation of cross-selling resources by analyzing existing customers’ demands and characteristics [16]. O’Donnell and Anthony offered advice on cross-selling to insurance companies [9]. Furthermore, D. Lee proposed a utility-based association rules mining method that evaluates association rules by measuring the specific business benefits they bring to firms [3]. T. Ted proposed three steps and four ways to help generate effective cross-selling results [14, 13]. L. Dave analyzed the methods used by investors in IBC (the flagship bank of International Bancshares) to distinguish different customers’ demands and recommend different product portfolios to them [7]. Cross-selling can effectively prevent customer loss as well as lessen companies’ operating costs.
However, some problems exist in the implementation process. T. Halah suggested that Wells Fargo Bank’s cross-selling strategy results in insufficient customer growth [12]. T. Zhao and K. Matthews analyzed the British Bank’s cross-selling performance as well as the conversion costs of cross-selling from 1993 to 2008 [15]. K. Berry suggested that the cross-selling strategy of Wells Fargo has caused a shortage of customers [6].
Cross-selling has been developed and applied extensively in the insurance industry in China. Fei introduced three modes of cross-selling insurance in China and their development status, while also pointing out some issues and providing suggestions for addressing them [19]. Additionally, many scholars have researched cross-selling with data mining tools. Wang and Hu used clustering technology to carry out cross-selling and analyzed the cross-selling of life insurance in practice [18]. Association rules can be a good way to tap the potential interests of customers; thus, they can be used in cross-selling. The most commonly used algorithm for determining association rules is the Apriori algorithm, but the form of the data it requires has hindered its development.
In the twentieth century, the emergence of fuzzy set theory provided a new method for improving the Apriori algorithm. Additionally, fuzzy theories have often been used in other process improvement tools in insurance companies. G. Nilay established some alternatives for investors who want to purchase insurance companies according to established criteria with the VIKOR (VlseKriterijumska Optimizacija I Kompromisno Resenje) method, working entirely in a fuzzy environment with fuzzy sets [4]. C. Kahraman proposed an integrated methodology composed of the fuzzy analytic hierarchy process (AHP) and the fuzzy technique for order preference by similarity to ideal solution (TOPSIS) to select the best health insurance option [2]. S.H. Chen considered the fuzzy correlations between the indicators and the weighted distance of fuzzy modified TOPSIS to evaluate the alternatives of insurance companies [11]. J. Vaziri used triangular fuzzy logic throughout several quality management tools, including service quality (SERVQUAL) and others, to address data uncertainty and increase model flexibility. The proposed model was implemented in a case study of life-insurance services [5]. M. Saeedpoor utilized the fuzzy AHP to determine the importance weight of each criterion of the SERVQUAL model [8]. Additionally, the “strengths, weaknesses, opportunities and threats” (SWOT) analysis is one of the most significant analytic tools for defining corporate strategies; however, it has certain drawbacks. As a result, M. Saeedpoor utilized the intuitionistic fuzzy sets (IFS) theory to overcome the drawbacks of SWOT [1].
Former studies of cross-selling and association rules have used various data mining methods and have addressed numerous fields, such as banking, insurance, and telecommunications. In this paper, the Apriori algorithm is combined with fuzzy set theory to overcome its limitations. Fuzzy association rules are applied to the cross-selling of property insurance products, which was seldom considered in former studies.
Concepts of association rules and the Apriori algorithm
Association rules were originally proposed for the analysis of shopping baskets. Mathematically, a rule is expressed as X ⇒ Y, where X is the antecedent of the rule and Y is the consequent. The relationship between X and Y is measured by support and confidence, which are defined in Equations (1) and (2).
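For reference, the two measures are usually written as below. This is a sketch of the standard textbook definitions and is assumed, not confirmed, to match the form of Equations (1) and (2); the symbols D (the transaction database) and T (a single transaction) are introduced here for illustration.

```latex
\begin{align}
\mathrm{support}(X \Rightarrow Y) &= \frac{|\{T \in D : X \cup Y \subseteq T\}|}{|D|} \\
\mathrm{confidence}(X \Rightarrow Y) &= \frac{\mathrm{support}(X \cup Y)}{\mathrm{support}(X)}
\end{align}
```

Support is thus the share of all transactions that contain both sides of the rule, and confidence is the share of transactions containing X that also contain Y.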
The Apriori algorithm is the most widely used method for mining association rules. Its principle is simple and easy to implement. First, the database is scanned to determine all frequent 1-item sets L1, whose support ≥ min_sup (the minimum support given by the user); the frequent 2-item sets L2 are then generated from L1. In the k-th scan, the frequent (k−1)-item sets are taken as seed sets and joined to produce the candidate set C_k. The database is then scanned again, the supports of all item sets in C_k are calculated, and all k-item sets whose support ≥ min_sup are identified.
It can be seen that the Apriori algorithm needs to scan all of the records in the database to calculate the support of each item set, which greatly increases the I/O overhead of the computer system. With the advent of big data, the application of the Apriori algorithm is limited by this high I/O overhead [10, 17].
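The level-wise search described above can be sketched as follows. This is a minimal illustration of the classical Apriori procedure, not the implementation used in this paper; the function name `apriori` and the toy baskets are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {frozenset(items): support} for all frequent item sets."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # One full pass over the database per call: the I/O cost noted above.
        return sum(1 for t in transactions if itemset <= t) / n

    # L1: frequent 1-item sets.
    items = sorted({item for t in transactions for item in t})
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_sup:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: unite pairs of frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_sup:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy baskets, not the paper's data.
baskets = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]
frequent_sets = apriori(baskets, min_sup=0.6)
```

Every call to `support` walks the whole transaction list, which makes the repeated database scans, and hence the I/O cost, explicit.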
When dealing with numerical data, the Apriori algorithm usually converts them into Boolean data by partitioning their range into intervals. However, this is a hard classification. It raises the question of whether the partition is reasonable and where the interval boundaries should lie, so there is a great deal of uncertainty in the attribution of samples. Fuzzy theory provides a way of addressing these problems. In this paper, an improvement of the Apriori algorithm, FARMA, is proposed based on fuzzy c-means clustering (FCM). Furthermore, to offset the shortcomings of the FCM algorithm, principal component analysis is used to eliminate the noise variables that degrade the clustering.
The original data set D is scanned, and the numerical data are placed into the data set FD. Then, the FCM algorithm is used to transform each numerical attribute into fuzzy records, which contain the fuzzy attribute of the data and the corresponding membership degree u (u ∈ [0, 1]). The intermediate data set D1, the fuzzy version of the original data, is thus generated. Finally, Apriori is used to identify association rules in the fuzzy data set.
The FCM algorithm, which is based on fuzzy set theory, is an extension of the K-Means algorithm. The main idea of FCM is to minimize the sum of squared distances between the data points and the cluster centers and then to determine the degree to which each data point belongs to each cluster according to its membership degree. Finally, the fuzzy partition matrix is generated. The objective function of the FCM algorithm is the sum of squared errors (SSE), that is, the sum of the squared distances. The distance between data point x_i and cluster center c_j is shown in Equation (5).
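In its standard form, the FCM objective and the alternating updates that minimize it are usually written as below. This is a sketch of the textbook formulation and is assumed, not confirmed, to match the paper's numbered equations; n (number of samples), c (number of clusters), m > 1 (fuzzy weight), u_{ij} (membership degree of x_i in cluster j), and the Euclidean distance d_{ij} are the symbols used here.

```latex
\begin{align}
J_m &= \sum_{i=1}^{n} \sum_{j=1}^{c} u_{ij}^{m}\, d_{ij}^{2},
  \qquad \sum_{j=1}^{c} u_{ij} = 1 \\
d_{ij} &= \lVert x_i - c_j \rVert \\
u_{ij} &= \left[ \sum_{k=1}^{c} \left( \frac{d_{ij}}{d_{ik}} \right)^{\frac{2}{m-1}} \right]^{-1} \\
c_j &= \frac{\sum_{i=1}^{n} u_{ij}^{m}\, x_i}{\sum_{i=1}^{n} u_{ij}^{m}}
\end{align}
```

The last two lines are iterated in turn until the centers stop moving, which yields the fuzzy partition matrix (u_{ij}).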
The disadvantage of this algorithm is that it easily falls into a local optimal solution, so the number of clusters c and the fuzzy weight m must be chosen carefully. At the same time, the algorithm does not take into account the correlation between indicators. When a large number of correlated indicators and noise data exist, the classification accuracy of the algorithm is greatly reduced. To remedy these defects of FCM, the dimension reduction idea of principal component analysis is introduced to remove the noise data.
To analyze and demonstrate the process of FARMA, R was first used to carry out principal component analysis and select the variables that are effective for FCM. Second, FCM clustering was implemented. Finally, association rules were mined from the fuzzy database with the Clementine software, and the corresponding results were obtained by analyzing the interesting rules. We set the number of clusters to four (i.e. c = 4) and the fuzzy weight m to 2 because many scholars have shown that an m value between 1.5 and 2.5 is more reasonable.
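The clustering step can be illustrated with a small pure-Python sketch of standard FCM on a single numerical attribute. It is not the R code used in this study: the function `fcm_1d`, the evenly spaced starting centers, and the sample ages are assumptions made for illustration, and c = 3 is chosen here only so that the clusters can be read as three age groups.

```python
def fcm_1d(values, c, m=2.0, max_iter=100, tol=1e-6):
    """Fuzzy c-means on 1-D data; returns (centers, memberships)."""
    lo, hi = min(values), max(values)
    # Assumption: deterministic start with centers spread evenly over the range.
    centers = [lo + (hi - lo) * (j + 0.5) / c for j in range(c)]
    power = 2.0 / (m - 1.0)
    u = []
    for _ in range(max_iter):
        # Membership update: closer centers receive a higher degree.
        u = []
        for x in values:
            d = [abs(x - cj) for cj in centers]
            if 0.0 in d:
                # A point sitting on a center belongs fully to it.
                hit = d.index(0.0)
                row = [1.0 if j == hit else 0.0 for j in range(c)]
            else:
                row = [1.0 / sum((d[j] / d[k]) ** power for k in range(c))
                       for j in range(c)]
            u.append(row)
        # Center update: mean of the data weighted by u^m.
        new_centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(len(values))]
            new_centers.append(sum(wi * x for wi, x in zip(w, values)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


# Hypothetical ages, not the paper's data.
ages = [22, 25, 27, 45, 48, 52, 70, 74, 78]
centers, u = fcm_1d(ages, c=3, m=2.0)
```

Each row of `u` is the membership vector of one sample and sums to 1. Because FCM can stop in a local optimum, a different starting point may give different centers on less well-separated data.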
Association rules mining based on FARMA
In the given data set, there are a variety of indicators: customer number, age, gender, family address (area), marital status, birth month, educational level, employment status (whether retired), housing property, income, debt, type of work, vehicle with number, vehicle type, vehicle value, credit card types, types of property insurance, the insured amount, and so on. In total, there are more than 100 indicators. To reduce the workload of the FCM algorithm, principal component analysis was first performed to reduce the dimension of the data. Through this analysis, we identified the 11 main factors (original indices) that influence the principal components. In Table 1, only the first four indicators are shown.
The 11 variables represent the customers’ characteristics of age, gender, region where they live, marital status, education level, retirement or not, housing property rights, income, types of insurance products, insurance claims, and insured amount. Every variable was assigned to a numerical indicator.
In the fuzzification process, the variable age was clustered into three types: youth, middle age, and old age. Likewise, the variable income was clustered into three types: low-income, middle-income, and high-income. The insured amount was clustered by FCM into two types: low amount and high amount. Each indicator of a sample corresponds to a vector of membership degrees, and the component with the maximum value determines the class to which the sample is assigned. For example, for the sample numbered 001, the membership vector of the age index is (0.1, 0.7, 0.2), and the maximum value is 0.7, so this sample (52 years old) was categorized as middle age.
In Table 2, variables X1 ∼ X11 represent the age, gender, region where they live, marital status, education level, retirement or not, housing property rights, income, types of insurance products, insurance claims, and insured amount, respectively. 001∼005 represent five transactions.
Indices X1, X8, and X11 are numerical data. These variables were stored in data set FD, and the FCM algorithm was used to fuzzify them. Some results concerning their membership degrees are shown in Table 3.
All of the attributes were renumbered: the three stages of age were labeled as three variables A1 ∼ A3, gender as A4, and so on. Each was numbered in turn, and 29 variables were obtained. Thus, the original data set was transformed into a data set that contained only Boolean data. Due to space limitations, only the values of the first ten attributes are shown in Table 4.
The analysis of sample 001 is as follows: the age is 52 and the membership vector is (0.1, 0.7, 0.2); thus, this sample was classified into the middle age class. That is, variable A2 takes the value 1, whereas the other variables representing age take the value 0. The relationship between the other attributes and variables can be interpreted similarly.
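The conversion of a membership vector into Boolean attributes can be sketched as below. The helper `to_boolean_attributes` is hypothetical and only mirrors the maximum-membership rule stated in the text, using the A1 ∼ A3 numbering for the age classes.

```python
def to_boolean_attributes(memberships, prefix="A", start=1):
    """One-hot encode a membership vector as Boolean attributes."""
    # The class with the largest membership degree gets 1, all others get 0.
    best = max(range(len(memberships)), key=lambda j: memberships[j])
    return {f"{prefix}{start + j}": int(j == best)
            for j in range(len(memberships))}


# Sample 001: the age membership vector quoted in the text maps to A1..A3.
age_flags = to_boolean_attributes((0.1, 0.7, 0.2))
```

For sample 001 this sets A2 to 1 and A1 and A3 to 0, which is the middle age assignment described above.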
In the above three tables, ID represents the transaction number, X_i represents the variables in the raw data, and A_i represents the renumbered variables in the fuzzy data. One X_i may correspond to several A_i.
Once the fuzzy data sets have been obtained, the Apriori algorithm can be used to mine association rules. The minimum support was set to 15%, and the minimum confidence was set to 90%. The results are shown in Table 5. The rows represent the six association rules, and the columns represent the consequent, the antecedent, and the values of support and confidence, respectively.
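The thresholding step can be sketched as follows. The function `rule_metrics`, the attribute names, and the five records are hypothetical and serve only to show how the 15% support and 90% confidence thresholds filter candidate rules; they do not reproduce Table 5.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent."""
    a, c = set(antecedent), set(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    n_both = sum(1 for t in transactions if (a | c) <= t)
    support = n_both / len(transactions)
    confidence = n_both / n_a if n_a else 0.0
    return support, confidence


MIN_SUP, MIN_CONF = 0.15, 0.90  # thresholds stated in the text

# Hypothetical Boolean records: each set lists the attributes whose value is 1.
records = [
    {"unmarried", "high_income", "account_funds_insurance"},
    {"unmarried", "high_income", "account_funds_insurance"},
    {"married", "high_income", "home_protection_plan"},
    {"married", "low_income"},
    {"unmarried", "low_income"},
]

candidates = [
    (("unmarried", "high_income"), ("account_funds_insurance",)),
    (("unmarried",), ("account_funds_insurance",)),
]
kept = [
    (a, c) for a, c in candidates
    if rule_metrics(records, a, c)[0] >= MIN_SUP
    and rule_metrics(records, a, c)[1] >= MIN_CONF
]
```

On this toy data only the first candidate survives: both rules clear the support threshold, but the second holds for just two of the three unmarried customers and therefore fails the confidence threshold.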
The analysis of association rule 1 is as follows: customers who are unmarried (M status = 0) and have a high income are more likely to purchase personal account funds insurance. Association rule 2 indicates that customers who are in age group 1, married, and female are prone to buying express parcel insurance. This may be related to the fact that women are more inclined to shop online. From association rule 12, customers who have an education degree of 3 (high school education), live in region 1, and are married mostly buy earthquake insurance. The other association rules can be analyzed similarly.
Cross-selling analysis and policy suggestions
The purpose of analyzing cross-selling is to identify a method that simultaneously expands sales, actively encourages certain categories of customers to buy certain company products, boosts the revenue of the company, and offers feasible insurance varieties that meet customers’ demands.
In the example used in this paper, there were six types of insurance products: personal account funds loss insurance, express parcel post insurance, property insurance for home, family wealth treasure comprehensive plan, integrated home protection plan, and Ping An earthquake insurance.
From the above six association rules, it can be seen that customers who buy Ping An earthquake insurance are more likely to come from region 1. This may be because of the geographical position of the region. It can also be seen that the groups with higher education and higher income levels have a strong awareness of insurance. Therefore, Ping An earthquake insurance should be recommended to customers with high income and high education levels who live in region 1.
Customers who buy the integrated home protection plan have the following characteristics: they are married, own housing property, and have a higher income. At the same time, it can be seen that these people tend to buy the family wealth treasure comprehensive plan. Therefore, cross-selling can be achieved between these two kinds of products.
Females tend to purchase express parcel post insurance. This insurance type is a smaller category that can be offered in cooperation with other companies’ businesses and sold in a more convenient form.
High-income, unmarried customers tend to purchase personal account funds insurance. By fully analyzing the customers’ requirements, we can identify their parents and their family status and achieve cross-selling by offering family property insurance and home comprehensive guarantee insurance to such customers.
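The way such rules could drive recommendations can be sketched as below. The rule list is an illustrative paraphrase of the findings discussed above, not the content of Table 5, and the function `recommend` and the attribute names are hypothetical.

```python
# Illustrative rules paraphrased from the discussion; not the paper's Table 5.
RULES = [
    ({"unmarried", "high_income"}, "personal account funds insurance"),
    ({"married", "homeowner", "high_income"}, "integrated home protection plan"),
    ({"integrated home protection plan"}, "family wealth treasure comprehensive plan"),
]


def recommend(profile, owned):
    """Suggest products whose rule antecedent the customer satisfies."""
    facts = set(profile) | set(owned)
    # Skip products the customer already holds.
    return [product for antecedent, product in RULES
            if antecedent <= facts and product not in owned]


suggestions = recommend(
    profile={"married", "homeowner", "high_income"},
    owned={"integrated home protection plan"},
)
```

Treating the products a customer already holds as facts lets a purchase trigger a follow-up offer, which is the cross-selling link between the two home-related plans noted above.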
Theoretically, the above examples demonstrate good results. However, in actual business, there are some problems and obstacles in the implementation of cross-selling in insurance companies.
First, the positioning of cross-selling is not clear enough. Group companies tend to focus on brand building, and they may pay attention to the development of cross-selling; however, a subsidiary may focus on the sales performance of its own main business and may be indifferent to cross-selling.
Second, China’s double taxation increases business costs. Business tax must be paid on the income from policy sales, and cross-selling between the life insurance subsidiary and insurance subsidiaries requires agents to pay two types of tax. This double taxation system causes agent fees to shrink dramatically; thus, agents may not be willing to carry out cross-selling business. To reduce the loss incurred by agents, some companies increase the amount of subsidies, but doing so directly increases business costs and makes business benefits difficult to guarantee.
Finally, the companies’ internal mechanisms are not perfect, and communication is often lacking between departments. Thus, it may be necessary to improve business ability. Because of the lack of incentives, business personnel are less willing to carry out cross-selling. To solve these problems, companies can provide training to improve the business level of staff. Employees should be familiar with the various products and with the interpretation of insurance responsibility.
Conclusions
The FARMA proposed in this paper offers a more effective remedy for the shortcomings of the Apriori algorithm and expands its application scope while reducing time complexity and space complexity. Research on the company’s internal products that uses association rules to identify customer interests was shown to be a more effective method. Furthermore, when compared against practical experience, the results of our analysis have certain practical value.
The cross-selling analysis in this paper studies the correlations between the property insurance companies’ internal products. For the insurance group, there are many inter-related products between subsidiaries; subsidiaries and headquarters; property insurance and life insurance; and between automobile insurance. All of these products may achieve cross-selling. Thus, future research will focus on broadening the application scope of the FARMA algorithm and improving the selection criteria for the related parameters of FCM.
Acknowledgments
This work was financially supported by the Project of National Natural Science Foundation of China (Nos. 61502280, 61472228), the Project of Qingdao Applied Basic Research of Qingdao (special youth project, No. 14-2-4-55-jch), Natural Science Foundation of Shandong province (Nos. ZR2014FM009, ZR2015FM013), and the Graduate Education Innovation Program Project of Shandong University of Science and Technology (No. KDYC14016).
