Abstract
In the current competitive environment, organizations have realized that to gain profit, in addition to attracting customers, they should maintain good relationships with them. Understanding customers’ needs and providing services for them are important factors in the success or failure of any organization. Therefore, a standard measure is needed to assess the value of customers and, as a result, to establish profitable, long-term relationships with them. Customer lifetime value is a standard measure used to predict the future value of customers and to segment them. In this paper, we collected all English articles in this field from 2001 to 2019, most of which examined only the recency, frequency, and monetary (RFM) features. In this research, by contrast, we explored new features of customers and their accounts that identify profitable customers, and we then clustered customers using this more accurate information. Two clustering methods (K-means and CPSOII) were used to examine customers. The advantage of CPSOII over K-means is that CPSOII can determine the number of clusters automatically. Using the algorithm assessment criteria SSE, VRC, and DBI, we found that CPSOII, with DBI = 0.44, is the most suitable clustering algorithm. Using the CPSOII results, we calculated customer longevity and found that customers with the highest values of the RFM indexes have the longest lifetime, and the bank should plan for their retention.
Keywords
Introduction
Customer relationship management helps organizations understand customers’ needs and provide services appropriate to those needs, which ultimately increases customers’ lifetime value [1, 2].
As part of customer relationship management, companies focus on long-term relationships with customers in order to increase profitability. Companies therefore make various decisions about the prospects of acquiring customers and about how much to spend on acquiring and retaining them, with the aim of promoting the company [3, 4].
Today, customer behavior is often uncertain and changes over time. Organizations can use a number of methods to predict these changes, and companies need to focus on mining changes in their databases [5, 6, 7, 8].
Many studies have employed data mining (DM) to analyze customer data, while some have attempted to develop new DM techniques [9, 10, 11]. The key issue is how DM techniques can provide better insight and knowledge to improve customer relationship management (CRM) and increase Customer Lifetime Value (CLV), thereby creating more profit for both customers and the company [12].
Previous studies offer several models for estimating customer lifetime value, but in a retail banking environment a model should assess homogeneous segments of customers rather than individual customers, so that it is easy to implement. It should also be easy to understand and intuitive, so that it can be applied in many commercial areas. The RFM (Recency, Frequency, Monetary) model is one of the most basic and best-known behavioral models for measuring customer lifetime value, i.e., the predicted value a customer will generate over their entire lifetime [13, 14, 15]. It has been used in many studies to determine customer loyalty [6, 14, 16, 17, 18].
Reviews of recent research on bank customer analysis show that researchers have considered RFM the most appropriate method for classifying customers and have introduced the K-means algorithm as the most appropriate algorithm for partitioning customers into high-quality clusters. For example, in the latest research in this area, conducted in 2017, the researcher used the RFM model to find customer segments within eight 3-month periods and considered the K-means algorithm appropriate for clustering customers. K-means selects K data points as the initial cluster centers and places each remaining element in the nearest cluster, based on the distance between the element and the cluster center. It is a distance-based clustering algorithm that divides data with numerical attributes into a given number of clusters [12, 19].
The purpose of the current research is to introduce appropriate customer characteristics for clustering, and to introduce an algorithm better suited than K-means for producing high-quality clusters. Studies conducted so far in this area show that bank strategies have been formed solely by investigating the characteristics of individuals’ account transactions. In our view, these characteristics are not sufficient. Identifying and investigating additional customer characteristics can help bank managers identify their valuable customers more carefully, offer them more appropriate services, and ultimately satisfy them, leading to profitability for both the bank and the customer.
In addition to the characteristics of individuals’ financial transactions, this research investigates their personal characteristics and the types of accounts they hold, and uses all of these for clustering in order to obtain high-quality clusters with more precise characteristics. Moreover, we propose the Combined Particle Swarm Optimization algorithm (CPSOII) for this problem and compare it with the K-means algorithm. Most recent studies in this area have introduced K-means as an appropriate algorithm for clustering customers (see, for example, [20]); for this reason, the proposed CPSOII algorithm is compared with K-means in what follows.
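The assign-to-nearest-center loop described above can be sketched in a few lines of Python. This is a generic Lloyd-style K-means for illustration only, not the exact implementation used in [12, 19] or in this study; the function names, the random initialisation from data points, and the convergence test are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Tuple[float, ...]], k: int,
           max_iter: int = 100, seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd-style K-means: returns (labels, centres)."""
    rng = random.Random(seed)
    # Initialisation: k distinct data points become the initial centres.
    centres = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centres[c]))
                      for p in points]
        if new_labels == labels:
            break  # no assignment changed, so the algorithm has converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres
```

For example, `kmeans(points, k=3)` returns one label per customer plus the final centres. Because K has to be fixed before the run, K-means is typically executed for several values of K and the results compared with a validity index.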
The rest of the article is organized as follows: Section 2 reviews the literature. Section 3 introduces the proposed method. Section 4 describes the implementation and assessment of the proposed method. Finally, Section 5 presents the results and suggestions for further research.
Literature review
Many studies have employed DM to analyze customer data, while some have attempted to develop new DM techniques. The key issue is how DM techniques can provide better insight and knowledge to improve CRM and increase CLV, thereby creating more profit for both customers and the company [1, 21, 22, 23].
In theory, a customer lifetime value model should calculate a customer’s value over the customer’s entire lifetime, whereas in practice most researchers use a limited interval of 3–4 years. Because decision makers rely on the customer lifetime value criterion computed from the database, the prediction accuracy of customer lifetime value is of paramount importance. These predictions are also often used as a guide for investment in customer segments [24, 25, 26, 27].
This section reviews the data mining methods that have so far been used to analyze the behavior of bank customers. These methods fall into four groups: supervised, unsupervised, evolutionary learning, and other methods. Some of the methods used in this area are briefly described below.
Supervised: In [10], the CHAID and C5.0 decision tree techniques are used to classify customers and support decision-making in the banking industry; they yield a set of rules that can be applied to a new dataset to predict which records will have a given outcome. In [28], the authors used the C5.0 model to generate rules that predict the level of loyalty from demographic variables, applied to the clusters obtained from the k-means and two-step algorithms. Neural networks have been used to build predictive models such as customer lifetime value models; they are broadly applicable and can be used for both supervised and unsupervised DM, as well as for estimation problems [29, 30, 31]. In [32], the K-Nearest Neighbor (KNN) technique is used to classify and identify the goods that customers favor most.
Unsupervised: In [33], a model combining first-order Markov chain modeling and CART was used. This model analyzes homogeneous groups rather than individual customers; Markov chain models have been used in marketing, including for customer lifetime valuation. In [34], the FCM algorithm is used to cluster data into nine optimal clusters based on the three values of recency, frequency, and monetary. In [35], data from 120 customers were collected and normalized, and fuzzy clustering was applied based on four variables: length of the relationship, recency of trade, frequency of trade, and monetary value. In [36], the K-means algorithm was applied for customer segmentation in order to assess the CLV of each segment. In [37, 38], K-means was used to cluster a bank’s customers; it was executed eight times and the Dunn index was calculated for each run, because K-means requires the number of clusters to be chosen in advance.
Evolutionary learning: This technique can be used in any classification-based prediction scenario; banks, for example, have used it for credit prediction. The genetic algorithm (GA) is a meta-heuristic algorithm used for data clustering; in [35], it was used for customer clustering.
Other: In addition to the techniques above, some studies have proposed alternative techniques, which we categorize as “Other”.
RFM can be considered the most powerful behavior-based model for implementing CRM [36]. In [39], the RFM scoring model is used to transform customer behavioral variables, and customers are then segmented into target markets according to customer value. In [38], RFM variables were extracted from the Export Development Bank’s database and normalized; the variable weights were then calculated using fuzzy AHP (FAHP), and finally a value was estimated for each customer. In [12], the RFM model is used to detect customer segments over eight three-month periods. In [40], RFM values are used to classify customers into three profitable groups. The authors of [41] presented a model that calculates customer lifetime value (CLV) based on the LRFM customer relationship model, which consists of four dimensions: relationship length (L), recency (R), frequency (F), and monetary value (M).
Some studies have added a count item to the RFM model to obtain an RFMC model; the results showed that the count item adds little and that the RFM model outperformed the RFMC model [36]. Accordingly, CLV was calculated for each segment using a weighted RFM method.
In [41], the researchers proposed combining RFM with AHP and the K-means algorithm, and applied this approach to segment the customers of one of the country’s large national banks.
This section introduces the proposed research framework. The framework is based on data mining techniques and on analyzing customers’ past behavior to predict their future behavior, and it uses the RFM model together with clustering algorithms.
Various methods have been proposed for implementing data mining projects, and one of the most powerful is CRISP-DM (Cross-Industry Standard Process for Data Mining). The proposed method is based on CRISP-DM, with some small changes. This general framework was derived, with minor modifications, from prior studies (Fig. 1) and contains 11 steps: 1. business recognition, 2. data collection, 3. data pre-processing, 4. normalization of indicators, 5. weighting of indicators, 6. determining the value of indicators, 7. determining the average value of indicators for each customer, 8. customer clustering, 9. determining the average value of indicators in each cluster, 10. calculating the customer lifetime value for each cluster, and 11. cluster analysis.
Fig. 1. The overall framework of the article.
First step, Business recognition: This step focuses on understanding the project’s objectives and requirements from the organization’s perspective. After the business objectives are identified, it is important to assess the current situation in order to identify existing opportunities. To attain its goals, the organization should use data mining methods to extract appropriate patterns from the existing data.
Second step, Data collection: This step covers collecting and examining the dataset. The data are then entered into Excel as a table of rows and columns.
Third step, Data pre-processing: In this step, incomplete and misleading data, including records that have missing values or are inconsistent with other data, are first removed so that the latent knowledge in the data can be discovered. The data are then converted into a format suitable for the RFM model.
Fourth step, Normalization of indicators: Because the indicators are measured in different units, their values are normalized to a common scale between 0 and 1 using Eq. (1).
Fifth step, Weighting of indicators: The relative weights of the indicators were obtained using a pairwise-comparison questionnaire based on the analytic hierarchy process (AHP).
Sixth step, Determining the value of indicators for each customer: The value of each RFM indicator for a customer is determined by multiplying its normalized value by the indicator’s weight. These values are shown as
Seventh step, Determining the average value of indicators: The average value of each indicator is determined by dividing the sum of that indicator’s values over all customers by the total number of customers.
Eighth step, Customer clustering: In this stage, the data obtained from the bank database are clustered using the chosen algorithm. Because the aim of this research is to classify customers by their lifetime value, each customer’s value is measured on the basis of the cluster to which the customer belongs, so the choice of an appropriate algorithm is of considerable importance.
Ninth step, Determining the average value of indicators in each cluster: In this stage, the average value of each indicator in a cluster is determined by dividing the sum of that indicator’s values in the cluster by the number of customers in the cluster.
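Eq. (1) itself is not reproduced here. Assuming it is the usual min-max scaling, x' = (x − min) / (max − min), a sketch looks like this. The `invert` flag, which maps the largest raw value to 0 (sometimes done for recency so that more recent customers score higher), is a hypothetical option: the text does not say whether recency is inverted.

```python
from typing import List


def min_max_normalize(values: List[float], invert: bool = False) -> List[float]:
    """Scale values to [0, 1] with min-max normalisation (assumed form of Eq. (1))."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: there is no spread to scale, so map everything to 0.
        return [0.0 for _ in values]
    if invert:
        # Hypothetical option: largest raw value -> 0, smallest -> 1.
        return [(hi - v) / (hi - lo) for v in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For example, `min_max_normalize([10.0, 20.0, 30.0])` gives `[0.0, 0.5, 1.0]`.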
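The weights can be derived from the completed pairwise-comparison matrix. The sketch below uses the principal-eigenvector method (via power iteration) together with Saaty’s consistency ratio. The study does not say which AHP prioritisation method it applied, so this choice is an assumption, and the example judgements in the usage note are invented for illustration.

```python
from typing import List, Tuple

# Saaty's random consistency index for matrices of order 1..5.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}


def ahp_weights(matrix: List[List[float]], iterations: int = 100) -> Tuple[List[float], float]:
    """Priority vector (principal eigenvector, summing to 1) and a lambda_max estimate."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        # Power iteration: multiply by the matrix, then renormalise to sum 1.
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    return w, lambda_max


def consistency_ratio(lambda_max: float, n: int) -> float:
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1); CR < 0.1 is the usual threshold."""
    if n <= 2:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    return ((lambda_max - n) / (n - 1)) / RANDOM_INDEX[n]
```

For instance, with the hypothetical matrix `[[1, 1/3, 1/5], [3, 1, 1/2], [5, 2, 1]]` for (R, F, M), the weights come out in the order M > F > R and the consistency ratio is well below 0.1.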
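Steps 6 and 7 amount to an element-wise product followed by a column mean. A minimal sketch, assuming each indicator is stored as a list of normalised per-customer values keyed by a name such as "R", "F", or "M" (this dictionary layout and the function names are hypothetical):

```python
from typing import Dict, List


def weighted_indicators(normalized: Dict[str, List[float]],
                        weights: Dict[str, float]) -> Dict[str, List[float]]:
    """Step 6: each customer's indicator value = normalised value x indicator weight."""
    return {name: [weights[name] * v for v in values]
            for name, values in normalized.items()}


def overall_average(weighted: Dict[str, List[float]]) -> Dict[str, float]:
    """Step 7: mean of each weighted indicator over all customers."""
    return {name: sum(values) / len(values) for name, values in weighted.items()}
```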
Fig. 2. The proposed characteristics vector.
Tenth step, Calculating the customer lifetime value of each cluster: Finally, the customer lifetime value of each cluster is calculated as the sum of the average values of the RFM indicators in that cluster.
Eleventh step, Cluster analysis: For this analysis, the average value of each indicator in each cluster is compared with the average value of that indicator over the whole dataset. If the cluster average is greater than the overall average, the situation is desirable; if it is less, the situation is undesirable.
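Steps 9 and 10 can be expressed as a group-by mean followed by a sum. The sketch below assumes the cluster labels come from the clustering step (K-means or CPSOII) as one integer per customer and that the indicators are already weighted; the helper names are hypothetical.

```python
from typing import Dict, List


def cluster_averages(weighted: Dict[str, List[float]],
                     labels: List[int]) -> Dict[int, Dict[str, float]]:
    """Step 9: per-cluster mean of each weighted indicator."""
    result: Dict[int, Dict[str, float]] = {}
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        result[c] = {name: sum(values[i] for i in members) / len(members)
                     for name, values in weighted.items()}
    return result


def cluster_clv(averages: Dict[int, Dict[str, float]]) -> Dict[int, float]:
    """Step 10: cluster CLV = sum of the cluster's average indicator values."""
    return {c: sum(indicators.values()) for c, indicators in averages.items()}


def rank_clusters(clv: Dict[int, float]) -> List[int]:
    """Cluster ids ordered from highest to lowest CLV."""
    return sorted(clv, key=clv.get, reverse=True)
```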
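The comparison in the eleventh step reduces to an element-wise test against the overall averages. This sketch uses the (M) and (L) markers adopted for Table 5; the text does not say how an exact tie is labelled, so the "=" marker is an assumption.

```python
from typing import Dict


def compare_to_overall(cluster_avgs: Dict[int, Dict[str, float]],
                       overall: Dict[str, float]) -> Dict[int, Dict[str, str]]:
    """Mark each cluster indicator M (above overall mean), L (below) or = (tie, assumed)."""
    status: Dict[int, Dict[str, str]] = {}
    for c, indicators in cluster_avgs.items():
        status[c] = {}
        for name, value in indicators.items():
            if value > overall[name]:
                status[c][name] = "M"  # desirable
            elif value < overall[name]:
                status[c][name] = "L"  # undesirable
            else:
                status[c][name] = "="  # tie: not covered by the text
    return status
```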
In the present study, we investigated the problem of ranking bank customers, differentiating between customers on the basis of their characteristics, and targeting marketing through allocated funding [4]. To this end, the required data were collected and examined. Only customers who joined the bank before the beginning of the study period are included, and the dataset contains records for 27,829 customers. The characteristics under investigation were arranged into the characteristics vector shown in Fig. 2. According to this vector, 13 characteristics were proposed as useful customer data. As mentioned above, these characteristics fall into three groups: personal characteristics of customers, characteristics of customers’ financial transactions, and characteristics of customers’ accounts. To the best of our knowledge, this is the first time these features have been investigated and used in customer surveying and clustering, leading to more accurate customer information.
Before these characteristics can be used, they must be converted into a consistent form. Our proposed classification is shown in Table 1.
Table 1. Classification of customer information
In the present study, a person’s account refers collectively to all of that person’s accounts. Accordingly, the transaction characteristics are redefined as follows:
Recency (R): the most recent date on which the person used any of their accounts.
Frequency (F): instead of counting the person’s transactions on each account separately, the total number of the person’s transactions across all accounts.
Monetary (M): the rial equivalent of the total balance remaining in all of the person’s accounts.
After the data were prepared, they were converted into the format required by the RFM model and the indicators were normalized. The mean value of each indicator over all customers under investigation is shown in Table 2; these results are used below to analyze customer value.
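The person-level redefinition of R, F, and M can be sketched as an aggregation over all of a person’s accounts. The record layouts, `(person_id, account_id, date)` for transactions and `(person_id, account_id, balance_in_rials)` for balances, are hypothetical. Recency is returned as the most recent transaction date, as defined above, and would still need converting (for example to days before a reference date) before normalisation.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Optional, Tuple


def person_rfm(transactions: Iterable[Tuple[str, str, date]],
               balances: Iterable[Tuple[str, str, float]]
               ) -> Dict[str, Tuple[Optional[date], int, float]]:
    """Return {person: (last_date, transaction_count, total_balance)} across all accounts."""
    last_date: Dict[str, date] = {}
    frequency: Dict[str, int] = defaultdict(int)
    for person, _account, day in transactions:
        frequency[person] += 1  # F: transactions counted over every account
        if person not in last_date or day > last_date[person]:
            last_date[person] = day  # R: latest use of any account
    monetary: Dict[str, float] = defaultdict(float)
    for person, _account, balance in balances:
        monetary[person] += balance  # M: balances summed over every account
    people = set(last_date) | set(monetary)
    # People with balances but no transactions get last_date None and F = 0.
    return {p: (last_date.get(p), frequency[p], monetary[p]) for p in people}
```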
Table 2. Determining the average value of indicators
As mentioned above, the aim of the present study is to compare the CPSOII algorithm with the K-means algorithm in order to identify the algorithm that produces higher-quality clusters and is more appropriate for determining bank customer lifetime value. The advantage of CPSOII is that it can detect the number of clusters automatically. We also compared the convergence rate of the two algorithms toward the optimal solution. Both algorithms were first run 30 times. The K-means results were constant, but because the CPSOII results varied between runs, we ran it 250 times until we finally obtained constant results from 220
In this study, the sum of squared errors (SSE), the variance ratio criterion (VRC), and the Davies–Bouldin index (DBI) were used to assess the quality of the algorithms. As Table 3 shows, these assessment criteria indicate that the CPSOII algorithm provides a more precise solution than K-means, the algorithm most commonly used in this area. The results also show that the proposed CPSOII algorithm achieves better values in the Best column.
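CPSOII itself is not specified in this section, so the sketch below shows only the general idea of particle swarm clustering: a plain global-best PSO in which each particle encodes k cluster centres and the fitness is the SSE. It is not CPSOII; in particular it keeps k fixed rather than detecting the number of clusters automatically, and the inertia and acceleration values (w = 0.72, c1 = c2 = 1.49) are common defaults from the PSO literature, assumed here.

```python
import random
from typing import List, Sequence, Tuple


def sse_for_centres(points: Sequence[Sequence[float]], centres: List[List[float]]) -> float:
    """Fitness: each point is charged its squared distance to the nearest centre."""
    return sum(min(sum((x - c) ** 2 for x, c in zip(p, centre)) for centre in centres)
               for p in points)


def pso_cluster(points, k: int, n_particles: int = 20, iterations: int = 100,
                w: float = 0.72, c1: float = 1.49, c2: float = 1.49,
                seed: int = 0) -> Tuple[List[List[float]], float]:
    """Global-best PSO over k centres (fixed k); returns (best centres, best SSE)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Each particle's position is a full set of k centres, seeded from data points.
    pos = [[list(p) for p in rng.sample(points, k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    p_best = [[c[:] for c in particle] for particle in pos]
    p_cost = [sse_for_centres(points, particle) for particle in pos]
    g = min(range(n_particles), key=p_cost.__getitem__)
    g_best, g_cost = [c[:] for c in p_best[g]], p_cost[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Inertia + pull towards personal best + pull towards global best.
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (p_best[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (g_best[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            cost = sse_for_centres(points, pos[i])
            if cost < p_cost[i]:
                p_cost[i], p_best[i] = cost, [c[:] for c in pos[i]]
                if cost < g_cost:
                    g_cost, g_best = cost, [c[:] for c in pos[i]]
    return g_best, g_cost
```

Because each run depends on the random seed, repeated runs can return different centres; this run-to-run variation is consistent with the repeated CPSOII runs described above.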
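For reference, the three criteria can be computed from a labelled dataset as below. The formulas are the standard textbook definitions (SSE as within-cluster squared error, VRC as the Calinski–Harabasz ratio, DBI as the Davies–Bouldin index with Euclidean distances); whether the study used exactly these variants is an assumption. Lower SSE and DBI, and higher VRC, indicate better clusters.

```python
import math
from typing import Dict, List, Sequence


def _groups(points, labels):
    """Group points by label and compute each group's centroid."""
    groups: Dict[int, List[Sequence[float]]] = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    centroids = {lab: tuple(sum(col) / len(members) for col in zip(*members))
                 for lab, members in groups.items()}
    return groups, centroids


def _sq(a, b) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points, labels) -> float:
    """Sum of squared distances of points to their own cluster centroid (lower is better)."""
    _, cents = _groups(points, labels)
    return sum(_sq(p, cents[lab]) for p, lab in zip(points, labels))


def vrc(points, labels) -> float:
    """Calinski-Harabasz variance ratio (higher is better); needs 2 <= k < n."""
    groups, cents = _groups(points, labels)
    n, k = len(points), len(cents)
    overall = tuple(sum(col) / n for col in zip(*points))
    between = sum(len(groups[lab]) * _sq(c, overall) for lab, c in cents.items())
    return (between / (k - 1)) / (sse(points, labels) / (n - k))


def dbi(points, labels) -> float:
    """Davies-Bouldin index (lower is better); needs at least two distinct centroids."""
    groups, cents = _groups(points, labels)
    scatter = {lab: sum(math.dist(p, cents[lab]) for p in members) / len(members)
               for lab, members in groups.items()}
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in cents if j != i) for i in cents]
    return sum(worst) / len(worst)
```

Note that SSE tends to fall as the number of clusters grows, so when two algorithms settle on different numbers of clusters, VRC and DBI give a fairer comparison than SSE alone.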
Table 3. Investigation of the assessment criteria of the algorithms
Figs. 3 and 4 show, respectively, the differences between the two algorithms in the SSE, VRC, and DBI values and in their speed.
Fig. 3. The difference between the values of the evaluation criteria in the CPSOII and K-means algorithms.
Table 4. Calculation of the average value of indicators in each cluster
Table 5. Comparison of the status of each indicator in each cluster with its status in the whole data
Fig. 4. The difference between the speeds of the algorithms for each criterion.
Since CPSOII was identified as the more appropriate clustering algorithm for this problem, its results are used in the following steps.
The average value of each of the recency, frequency, and monetary indicators in each cluster was determined by dividing the sum of the indicator’s values in the cluster by the number of customers in that cluster; the results are shown in Table 4.
Next, the customer lifetime value of each cluster is calculated. Figure 5 shows the cluster numbers, the number of customers in each cluster, and, on the right, the customer lifetime of each cluster. Clusters at the top have longer lifetimes, and lifetimes decrease toward the bottom.
Fig. 5. Bank customer lifetime value in the dataset under investigation.
In order to analyze the status of clusters, we compared the average value of RFM indicators in each cluster with average value of indicators in the whole data. The results of this comparison are shown in Table 5.
This comparison shows the status of the average value of each RFM indicator in each cluster relative to the average value of that indicator over the whole dataset. If the cluster average is greater than the overall average, the situation is desirable and is marked (M); if it is less, the situation is undesirable and is marked (L).
Examining the results of the last two steps shows that all clusters are in a desirable status in terms of the RFM indicators except the 5
More precisely, Cluster 2 contains the customers with the highest turnover, and because of its large number of members, this cluster yields more reliable results. Cluster 4 contains the customers with the largest monetary balance across all of their accounts. Cluster 1 contains the customers who transacted most recently.
In the current study, we sought to identify an appropriate clustering algorithm for this important problem using a valid dataset. In [6, 12, 28, 37], the K-means algorithm was presented as the best-known and most appropriate algorithm for clustering customers, so we compared the proposed CPSOII algorithm with K-means. The results of the two algorithms and of the assessment criteria showed that CPSOII achieves lower DBI values than K-means and, on this measure, is 51% better than K-means. CPSOII therefore provides a correct and appropriate clustering for calculating customer lifetime.
After obtaining appropriate clusters, we calculated customer lifetimes and drew conclusions from them. Customers in Clusters 1, 2, 3, and 4 are in a desirable status in terms of the RFM indicators, which is why they are considered valuable to banks.
In general, we suggest that banks offer special services to customers with high M values, so that these customers’ M values, and hence their profitability for the bank, can be increased by increasing how often they use their accounts and their loyalty. Banks can increase customer satisfaction by providing dedicated services to each customer. To this end, a bank should build a complete profile for each customer containing all the data these services require. This in turn requires banks to earn their customers’ trust so that customers provide correct information, a matter that requires extensive research.
For future research, appropriate banking services could be identified and, with the help of bank managers, suitable strategies for improving banking services could be developed and presented.
