A study on the stratification of long-tail customers in civil aviation based on a cluster ensemble

Abstract

Stratifying long-tail customers and identifying high-quality customers with high growth potential are crucial for civil aviation companies seeking new sources of profit growth. Previous studies have paid insufficient attention to long-tail customers and have relied on single clustering algorithms with low accuracy and no accuracy testing. To address these problems, this paper proposes a long-tail customer stratification model based on a clustering ensemble. First, the Bayesian information criterion is used to determine the optimal number of clusters. Then, an ensemble framework integrating the Gaussian mixture model, spectral clustering, Twostep clustering, and the K-means algorithm is constructed, and the stacking and bagging ensemble methods are used for the cluster ensemble. Finally, three different indicators are used to evaluate the algorithm performance. Experimental results indicate that, compared with single clustering algorithms, the Stacking algorithm improves the silhouette coefficient by 14.77% to 27.11%, the Calinski-Harabasz index by 38.83% to 122.18%, and the Davies-Bouldin index by 19.38% to 98.04%. This indicates that each cluster has high cohesion and separation: samples within a category are more closely related, and categories have clear boundaries. The Stacking algorithm therefore assigns long-tail customers with similar consumption behaviors to different categories more accurately, achieving customer stratification.

Keywords

Customer stratification; long tail theory; ensemble learning; stacking algorithm; bagging algorithm

1 Introduction

With the development of society, China’s aviation passenger transport business has developed rapidly, and market demand continues to increase, making customers the most important asset of each airline company [23]. In fierce market competition, small and medium-sized customers in the tail market have become the new growth point for enterprise profits [20]. However, the existing studies related to customer stratification in the civil aviation field focus on the identification and maintenance of high-value customers of enterprises [30] and pay insufficient attention to low-value long-tail customers, suggesting that enterprises should not pay too much attention to them [10]. As a supplement and improvement of Pareto’s law in the Internet economy [21], the long-tail theory emphasizes that although the base of long-tail customers is large, there are also high-quality customers with great potential value, who are either new customers or old customers who have become loyal to the company after multiple comparisons. Therefore, it is necessary to stratify long-tail customers and then carry out differentiated management to realize customer value enhancement [6]. How to effectively differentiate and stratify long-tail customers with highly similar consumption behaviors is an urgent problem for companies to solve.

Machine learning is one of the most important technologies in the 21st century [1]. As an unsupervised machine learning technique, clustering divides data into several clusters based on similarity so that objects in the same cluster are similar to each other, while objects in different clusters differ from each other [24]. Chen et al. (2016) used hierarchical clustering and K-means algorithms to segment the environmental behaviors of Taiwan civil aviation customers [7]. Chiang (2017) used the Ward method to divide airline customers into four categories and created association rules using the supervised Apriori algorithm [30]. Dehghani Zadeh et al. (2018) improved the K-means algorithm using the imperialist competitive algorithm to achieve civil aviation customer stratification [23]. These studies either combined algorithms to obtain more comprehensive customer behavior characteristics or improved algorithms to obtain better results, but each relied on a single clustering algorithm. This ignores the inherent ambiguity of the algorithm and how well it matches the data set, so the intrinsic structure of the data set cannot be fully revealed, the clustering results are less accurate, and robustness is low [29]. For a large amount of highly similar data, a single method cannot balance clustering accuracy and efficiency [16].

Ensemble learning is a research hotspot in the field of machine learning, in which the results of several machine learning algorithms are integrated into one framework so that the complementary information of each algorithm can be used effectively to improve the performance of the overall model [32]. In the unsupervised field, it is called the clustering ensemble, which uses clustering algorithms to generate a series of clustering partitions and combines these partitions to obtain a consensus solution [32]. Because it fully utilizes the information provided by the clustering members, a clustering ensemble can significantly improve on the accuracy, robustness, and stability of a single clustering algorithm [32]. Several studies have used cluster ensemble techniques to achieve customer stratification. Farvaresh & Sepehri (2011) first used a self-organizing map neural network and K-means algorithms to cluster customer data of a telecommunications company and add category labels, and then integrated decision trees, neural networks, and support vector machines as individual classifiers using stacking and other strategies [12]. Yang et al. (2020) used a semi-supervised spectral clustering ensemble (SSSCE) algorithm to stratify automotive after-sales service customers and demonstrated that the customer stratification results of the SSSCE algorithm outperformed those of a single spectral clustering algorithm [33]. Huang et al. (2020) integrated the Clara algorithm and K-means to build a customer stratification model and achieved precision marketing for tobacco retailing [11]. Although cluster ensembles are widely used in many fields, no study has been conducted to stratify civil aviation long-tail customers using cluster ensemble algorithms.

In summary, in previous studies on the stratification of civil aviation customers, there has been insufficient attention paid to long-tail customers and insufficient depth of value mining, as well as the problem of using a single clustering algorithm with low accuracy and a lack of accuracy testing. From the perspective of the long tail theory, this paper uses clustering ensemble algorithms to stratify civil aviation long tail customers, discover high-quality customers with growth potential among long tail customers, and manage customer relationships well, in order to maximize the retention of valuable existing customer resources for civil aviation enterprises.

The main contents are as follows. First, based on Lawrence’s classic customer stratification framework and the RFM model, and considering the impact of service cost on customer value, we construct the RFMC model and use a self-organizing map neural network to identify long-tail customers. Second, we construct the LTPNDG model, which reflects the growth potential of long-tail customers in civil aviation, and use clustering ensemble algorithms based on Stacking and Bagging to stratify long-tail customers. Third, three different clustering performance evaluation metrics are used to assess the effectiveness of the clustering ensembles, and the optimal long-tail customer stratification result is selected accordingly. Finally, according to the consumption behavior characteristics and growth potential of customers at different levels, targeted marketing management measures are proposed.

2 Relevant introduction

The performance of ensemble learning largely depends on the diversity between individual classifiers [26]. To achieve better clustering results, different types of classifiers can be combined. Classic clustering algorithms include partition-based, hierarchical, density-based, and model-based methods [16].

2.1 Single clustering algorithm

2.1.1 Gaussian mixture model

The Gaussian mixture model (GMM) is a probabilistic model that uses Gaussian probability density functions to describe the clusters, assigns similarity to data samples by calculating their probability under each Gaussian distribution, and then corrects the probabilities to divide the clusters [17]. The basic function is as follows: $p (x) = \sum_{k = 1}^{K} π_{k} N (x | μ_{k}, Σ_{k})$ (1) where μ_k and Σ_k represent the mean and covariance matrix of the k-th Gaussian distribution, π_k represents the weight of the k-th Gaussian distribution, and N(x|μ_k, Σ_k) represents the probability density of x under the k-th Gaussian distribution.
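As an illustration of Equation (1), the following minimal sketch evaluates a univariate mixture density and the posterior probability of each component for a single sample. It is restricted to the one-dimensional case for brevity, and the function names and the two-component parameters are hypothetical values chosen for the example, not quantities from this study.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian N(x | mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_density(x, weights, means, variances):
    """p(x) = sum_k pi_k * N(x | mu_k, sigma_k^2): the 1-D case of Eq. (1)."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

def responsibilities(x, weights, means, variances):
    """Posterior probability that x was generated by each component."""
    joint = [w * gaussian_pdf(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical two-component mixture (illustrative parameters only).
weights, means, variances = [0.3, 0.7], [0.0, 5.0], [1.0, 1.0]
resp = responsibilities(0.5, weights, means, variances)
# The sample is assigned to the component with the largest posterior.
cluster = max(range(len(resp)), key=resp.__getitem__)
```

In a full GMM the parameters π_k, μ_k, and Σ_k would be estimated from the data (typically by expectation-maximization) rather than fixed by hand as here.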

2.1.2 Spectral clustering

The core idea of spectral clustering, as a partition-based clustering algorithm, is to transform the unsupervised clustering problem into the partitioning of an undirected graph, which can achieve clustering in sample spaces of arbitrary shapes and converge to the global optimum [19].

For a data set X = (x₁, x₂, . . . , x_n) ∈ R^{n×m}, each sample has m attributes, and a sample x_i can be represented as x_i = (x_i1, x_i2, . . . , x_im), i = 1, 2, . . . , n. The basic principle of classic spectral clustering is as follows:

First, construct the similarity matrix A ∈ R^{n×n}: $A_{ij} = \begin{cases} \exp (- d^{2} (x_{i}, x_{j}) / (2 δ^{2})), & i \neq j, \\ 0, & i = j, \end{cases}$ (2) where d(x_i, x_j) is the distance between samples x_i and x_j and δ is the scale parameter. Second, construct the normalized Laplacian matrix $L = D^{- \frac{1}{2}} A D^{- \frac{1}{2}}$ of A, where D is the diagonal degree matrix with $D_{ii} = \sum_{j = 1}^{n} A_{ij}$.

Then, select the K largest eigenvalues of L and form an n×K matrix Y with their corresponding eigenvectors. Normalize each row of Y to obtain matrix V, where $V_{ij} = Y_{ij} / \sqrt{\sum_{l = 1}^{K} Y_{il}^{2}}$ , i = 1, 2, . . . , n, j = 1, 2, . . . , K. Each row in V is treated as a sample, and the K-means algorithm is used to cluster matrix V.
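The steps above, up to the construction of V, can be sketched with NumPy as follows. This is a minimal illustration rather than the implementation used in this study: the Euclidean distance is assumed for d, the function name `spectral_embedding` and the default value of `delta` are hypothetical, and the final K-means step on the rows of V is omitted.

```python
import numpy as np

def spectral_embedding(X, n_clusters, delta=1.0):
    """Return the row-normalized eigenvector matrix V described in the text."""
    X = np.asarray(X, dtype=float)
    # Similarity matrix A, Eq. (2): Gaussian kernel on squared Euclidean
    # distances (assumed metric), with zeros on the diagonal.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dists / (2.0 * delta ** 2))
    np.fill_diagonal(A, 0.0)
    # Normalized matrix L = D^{-1/2} A D^{-1/2}, where D holds the row sums of A.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order, so the eigenvectors of the
    # K largest eigenvalues are the last K columns.
    _, eigvecs = np.linalg.eigh(L)
    Y = eigvecs[:, -n_clusters:]
    # Normalize each row of Y to unit length to obtain V.
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)

# Two well-separated toy groups (illustrative data only).
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
V = spectral_embedding(points, n_clusters=2)
```

For well-separated groups, rows of V belonging to the same group nearly coincide and rows from different groups are nearly orthogonal, which is why K-means on V separates them easily.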

2.1.3 Twostep clustering

Twostep is an improved clustering algorithm that uses hierarchical clustering ideas. It can handle mixed data of categorical and continuous variables and automatically determines the optimal number of clusters in the clustering process based on the Bayesian Information Criterion (BIC) [28], solving the problem of having to determine the number of clusters beforehand when facing unknown data set distributions.

Let j and s be two existing classes, and let < j, s > denote the cluster obtained by merging them. The distance between j and s is defined as the difference between the log-likelihood estimate $\hat{l}$ before the merger and the log-likelihood estimate $\hat{l}_{new}$ after the merger, which is called the log-likelihood distance: $d (j, s) = \hat{l} - \hat{l}_{new} = \hat{l}_{j} + \hat{l}_{s} - \hat{l}_{< j, s >} = ξ_{j} + ξ_{s} - ξ_{< j, s >}$ (3) where ξ is the specific form of the log-likelihood function, defined as: $ξ_{v} = - N_{v} \left[ \sum_{k = 1}^{K^{A}} \frac{1}{2} \log (\hat{σ}_{k}^{2} + \hat{σ}_{vk}^{2}) + \sum_{k = 1}^{K^{B}} \hat{E}_{vk} \right]$ (4) $\hat{E}_{vk} = - \sum_{l = 1}^{L_{k}} \frac{N_{vkl}}{N_{v}} \log \left( \frac{N_{vkl}}{N_{v}} \right)$ (5) where K^A and K^B denote the numbers of numeric and categorical variables, and $\hat{σ}_{k}^{2}$ and $\hat{σ}_{vk}^{2}$ represent the total variance of the k-th numeric variable and its variance within category v, respectively. N_v is the sample size of category v, and N_vkl is the number of samples in category v for which the k-th categorical variable takes its l-th value. The k-th categorical variable has L_k values. The purpose of introducing $\hat{σ}_{k}^{2}$ in Equation (4) is to solve the problem that the logarithm cannot be calculated when the variance within category v is 0.

2.1.4 K-means algorithm

K-means is a classical partition-based clustering method that uses the Euclidean distance as the similarity measure and iteratively searches for cluster centers [17]. It has the advantages of high efficiency, low computational complexity, easy implementation, and applicability to large-scale datasets [25] and is widely used in clustering ensemble research [1]. The basic function is as follows:

Assuming that the data set is X = {x₁, x₂, . . . x_i, . . . , x_n}, x_i ∈ R^d, n represents the number of samples, d represents the feature dimension of the sample, x_i = (x_i1, x_i2, . . . , x_id). C ={ C₁ , C₂, . . . , C_j, . . . , C_k } denotes the division of X into k clusters, then: $E = \sum_{j = 1}^{k} \sum_{x_{i} \in C_{j}} d (x_{i}, C_{j})$ (6) where $d (x_{i}, C_{j}) = \sqrt{\sum_{r = 1}^{d} {(x_{ir} - C_{jr})}^{2}}$ represents the Euclidean distance between sample x_i and the belonging cluster center C_j, and E represents their distance sum.
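Equation (6) can be evaluated directly for a given assignment of samples to clusters. The sketch below follows Equation (6) as written, summing Euclidean distances rather than the squared distances used in many K-means formulations; the function name and the toy inputs are hypothetical.

```python
import numpy as np

def kmeans_objective(X, labels, centers):
    """E in Eq. (6): sum of Euclidean distances from each sample to its center."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    labels = np.asarray(labels)
    # centers[labels] picks, for every sample, the center of its own cluster.
    diffs = X - centers[labels]
    return float(np.sqrt((diffs ** 2).sum(axis=1)).sum())

# Two samples assigned to one cluster centered at the origin:
# distances are 0 and 5, so E = 5.
E = kmeans_objective([[0, 0], [3, 4]], labels=[0, 0], centers=[[0, 0]])
```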

These four clustering algorithms are heterogeneous and perform well in a wide range of applications. They differ significantly in learning rate, in their ability to handle outliers, discrete values, and noisy data, and in their efficiency on large samples, and therefore meet the diversity requirements that ensemble learning places on individual learners.

2.2 Ensemble learning algorithm

2.2.1 Stacking ensemble learning

Stacking [5] and bagging [22] are two widely used mainstream ensemble learning methods. Stacking is popular among hybrid models: it fuses multiple classification models through a meta-learner, which allows the strengths of the individual models to complement one another and improves prediction efficiency [29]. It also has the advantages of a simple structure, high performance, and the ability to fuse multiple heterogeneous learners [14]. As shown in Fig. 1, it usually contains two layers. In the first layer, several base learners are trained on the complete raw data to obtain their prediction results; in the second layer, the meta-learner is fitted to these individual outputs and produces the final results [14].

Fig. 1

Schematic diagram of the stacking ensemble learning structure.

Stacking ensemble learning requires that the base learners in the first layer have high diversity and accuracy, and the meta-learner in the second layer is concise, efficient, and low in complexity [14]. Therefore, in this paper, we comprehensively consider using GMM, Spectral, and Twostep as base learners and using K-means as the meta-learner.

2.2.2 Bagging ensemble learning

Bagging is the most famous representative of parallel ensemble learning methods, where there is no dependency between the base learners and strong learner models can be efficiently constructed by parallel training [13]. As shown in Fig. 2, during the bagging ensemble learning process, the base learners are trained in parallel and obtain prediction results. Then, the majority voting method is often used to integrate the prediction results of the base learners and obtain the final class label [26].

Fig. 2

Schematic diagram of the bagging ensemble learning structure.

3 A clustering-based ensemble model for stratifying long-tail customers in civil aviation

Algorithm 1: Stacking clustering ensemble process.
Input:
S = {s₁,s₂,s₃,. . . . . .,s_n } // Customer behavior characteristics data, where s_i represents different customers (sample points) and n represents the number of samples.
Output:
C = {C₁,C₂,. . . . . .,C_K} // K clusters, where C_i represents the different clusters.
1: S’ = { r₁,r₂,r₃,. . . . . .r_n } // Define a new data set of the same size as S, initially empty.
2: BIC←S // Determine the optimal number of clusters k according to BIC.
3: return k
//Three different base learners are used to cluster the data set.
4: H₁ = GMM(S,k) // The base learner GMM clusters S and obtains the clustering result H₁.
5: H₂ = Spectral(S,k) // The base learner Spectral clusters S and obtains the clustering result H₂.
6: H₃ = Twostep(S,k) // The base learner Twostep clusters S and obtains the clustering result H₃.
// Use a for loop to obtain the category res of each sample in H₁, H₂, and H₃, and add each sample point s_i and its corresponding category res to the new data set S’.
7: for i = 1 to len(S) do
8: res = getBaseClusterClasses(s_i, H₁,H₂,H₃)
9: S’.append((s_i, res))
10: end for
//Use the meta-learner K-means to cluster the new data set S’ and obtain the clustering result H₄.
11: H₄ = K-means(S’,k)
//Use a for loop to obtain the final class of each sample point in H₄, and then, based on the obtained class, group all data belonging to the same class into a cluster C_j.
12: for i = 1 to len(S’) do
13: class = getFinalClusterClasses(r_i, H₄)
14: j = getFinalClass(class)
15: C_j = C_j ∪ {r_i}
16: end for
17: return C // Return the clustering results of the Stacking algorithm.
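Steps 7 to 17 of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation used in this study: the base partitions H₁, H₂, and H₃ are taken as given label vectors, the base labels are one-hot encoded before being appended to each sample (Algorithm 1 does not specify an encoding), and the meta-learner is a small K-means with a deterministic farthest-point initialization. All function names are hypothetical.

```python
import numpy as np

def one_hot(labels, k):
    """Encode integer cluster labels as indicator columns (an assumption)."""
    out = np.zeros((len(labels), k))
    out[np.arange(len(labels)), labels] = 1.0
    return out

def build_meta_features(S, base_partitions, k):
    """Steps 7-10: S' joins each sample with its base-cluster labels."""
    blocks = [np.asarray(S, dtype=float)]
    blocks += [one_hot(np.asarray(h), k) for h in base_partitions]
    return np.hstack(blocks)

def init_centers(X, k):
    """Farthest-point initialization: deterministic, for reproducibility."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    return np.array(centers)

def kmeans(X, k, n_iter=100):
    """Step 11: a plain Lloyd-style K-means used as the meta-learner."""
    centers = init_centers(X, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Toy data and hypothetical base partitions; note that the base learners
# need not agree on label names, and H3 disagrees on one sample.
S = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
H1 = [0, 0, 0, 1, 1, 1]
H2 = [1, 1, 1, 0, 0, 0]
H3 = [0, 0, 1, 1, 1, 1]
S_prime = build_meta_features(S, [H1, H2, H3], k=2)
final_labels = kmeans(S_prime, k=2)  # steps 12-17: final cluster of each r_i
```

Because the meta-learner clusters the base labels as features instead of comparing label names, stacking does not require the labels of the base partitions to be aligned with one another.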

The long tail theory, first proposed by Chris Anderson in 2004, suggests that as long as storage and distribution channels are large enough, poorly selling products will occupy a market share similar to that of a few popular products [4]. In the Internet era, the success of online retailers is largely attributed to the profitable and previously untapped “tail market” [27]. The implication for customer relationship management is that the large base of long-tail customers, taken as a group, is a potentially lucrative market that companies should value and tap into [31]. Moreover, in the era of big data, enterprises can make use of big data for customer maintenance and market expansion and tap long-tail customers to further enhance their competitiveness and operational efficiency [34]. Therefore, in this paper, from the perspective of long-tail theory, an ensemble learning method is used to establish a long-tail customer stratification model to more accurately distinguish long-tail customers with similar consumption behaviors.

First, civil aviation long-tail customers are identified, stratification indicators are constructed based on the consumption behavior characteristics of long-tail customers, and the optimal number of clusters is determined based on the Bayesian Information Criterion (BIC). Next, each base learner is run with the optimal number of clusters to obtain the primary clustering results, and two ensemble methods are used for ensemble clustering. Finally, the clustering performance evaluation indicators are used to evaluate the clustering effect and obtain the optimal stratification result. The process of long-tail customer stratification is shown in Fig. 3. The pseudo code for the Stacking clustering ensemble process is shown in Algorithm 1, and the pseudo code for the Bagging clustering ensemble process is shown in Algorithm 2.

Fig. 3

Long-tail customer clustering ensemble hierarchy flow chart.

Algorithm 2: Bagging clustering ensemble process.
Input: S = {s₁, s₂, s₃, …, s_n} // Customer behavior characteristics data, where s_i represents different customers (sample points) and n represents the number of samples.
Output: C = {C₁, C₂, …, C_K} // K clusters, where C_i represents the different clusters.
1: S’ = {r₁, r₂, r₃, …, r_n} // Define a new data set of the same size as S, initially empty.
2: BIC←S // Determine the optimal number of clusters k according to BIC.
3: return k
//Three different base learners are used to cluster the data set.
4: H₁ = GMM(S,k) // The base learner GMM clusters S and obtains the clustering result H₁.
5: H₂ = Spectral(S,k) // The base learner Spectral clusters S and obtains the clustering result H₂.
6: H₃ = Twostep(S,k) // The base learner Twostep clusters S and obtains the clustering result H₃.
//Use a for loop to assign each data point to the cluster category with the highest occurrence count, resulting in the clustering results of Bagging ensemble learning.
7: for i = 1 to len(S) do
8: res = getBaseClusterClasses(s_i, H₁, H₂, H₃) // For each data point s_i in the data set, use the getBaseClusterClasses function to obtain its category res in the three clustering results H₁, H₂, and H₃.
9: m = getMaxOccurrenceClass(res) // Use the getMaxOccurrenceClass function to obtain the category m with the highest occurrence for data point s_i in the three clustering results.
10: C_m = C_m ∪ {s_i} // Assign s_i to the cluster category with the highest occurrence count.
11: end for
12: return C // Return the clustering results of the Bagging algorithm.
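The voting loop in steps 7 to 11 of Algorithm 2 can be sketched as follows. The sketch assumes that the cluster labels of the three base partitions have already been aligned, so that label m refers to the same group in H₁, H₂, and H₃; Algorithm 2 does not describe this alignment, and without it a majority vote over arbitrary label names is not meaningful. The function name and the example partitions are hypothetical.

```python
from collections import Counter

def majority_vote(base_partitions):
    """Assign each sample to the label that occurs most often across partitions.

    Assumes the label names are already aligned across the base partitions.
    Ties are resolved in favor of the label that appears first in the list
    of base learners.
    """
    n = len(base_partitions[0])
    final = []
    for i in range(n):
        res = [h[i] for h in base_partitions]      # step 8: labels of s_i
        m = Counter(res).most_common(1)[0][0]      # step 9: most frequent label
        final.append(m)                            # step 10: s_i joins C_m
    return final

# Hypothetical, already aligned base partitions of four samples.
H1 = [0, 0, 1, 1]
H2 = [0, 1, 1, 1]
H3 = [0, 0, 1, 0]
labels = majority_vote([H1, H2, H3])
```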

3.1 Long-tail customer identification

To accurately identify long-tail customers, this paper improves the classic RFM model based on the classic customer stratification framework proposed by Lawrence and other scholars, as well as the consumption behavior characteristics of civil aviation customers. The RFMC model is constructed, and then the Self-Organizing Map (SOM) neural network clustering is adopted.

3.1.1 Data preprocessing

The original data consisted of 62,988 records with 44 attribute features. Among them, 689 records contain null values and 1,036 records contain abnormal values, for a total of 1,725 records, which is a relatively small share (about 2.7%) of the data. After cleaning, 61,263 valid records remain.

3.1.2 RFMC model construction

Transaction time probability denotes the likelihood that a customer will spend money again, and is denoted by the letter R. It is assumed that the consumption behavior of the same customer follows the standard normal distribution, so that the probability of repurchase reaches the peak of the standard normal distribution curve when the time elapsed since the latest consumption is equal to the average consumption interval.

Let x be the time elapsed since the last transaction, and let t be the average flight interval of the customer. Because the distribution is standard normal, the mean μ = 0 and the standard deviation σ = 1. The formula is as follows: $f (x) = \frac{1}{σ \sqrt{2 π}} \times e^{\frac{- {(3 \frac{x σ}{t} - 3 σ - μ)}^{2}}{2 σ^{2}}}$ (7) With μ = 0 and σ = 1, the term in parentheses reduces to 3x/t − 3, which is zero when x = t. Hence, when the actual flight interval is equal to the average flight interval, the possibility of customers making another purchase is the highest.
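Equation (7) can be written as a short function to make the behavior of the R indicator concrete. The function name and the example interval of 30 days are hypothetical; the defaults μ = 0 and σ = 1 follow the text.

```python
import math

def repurchase_score(x, t, mu=0.0, sigma=1.0):
    """R indicator, Eq. (7): normal density evaluated at 3*x*sigma/t - 3*sigma - mu.

    x is the time since the last transaction and t is the customer's
    average flight interval, in the same time unit.
    """
    z = 3.0 * x * sigma / t - 3.0 * sigma - mu
    return math.exp(-z ** 2 / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical customer with an average flight interval of 30 days.
peak = repurchase_score(30, 30)       # x = t: the maximum of the curve
early = repurchase_score(15, 30)      # halfway to the average interval
overdue = repurchase_score(60, 30)    # twice the average interval
```

The score is symmetric around x = t: a customer who flew very recently (x close to 0) and one who is long overdue (x close to 2t) both receive scores near the tails of the curve.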

Transaction frequency indicates the number of customer transactions and is denoted by the letter F. It reflects the level of customer activity. The formula is as follows: $F_{i}^{'} = \frac{F_{i} - \bar{F}}{S}$ (8) where F_i denotes the total number of flights for the i-th customer, $\bar{F}$ denotes the mean of the total number of flights for all customers, S denotes the standard deviation of the total number of flights for all customers, and F_i′ is the normalized transaction frequency score.

The average kilometre price represents the average unit fare per kilometre flown and is denoted by the letter M. The calculation is as follows: $M = \frac{\text{SUM\_YR\_1} + \text{SUM\_YR\_2}}{\text{SEG\_KM\_SUM}}$ (9) where SUM_YR_1 and SUM_YR_2 denote the total fares in the first and second years, respectively, and SEG_KM_SUM denotes the total number of flight kilometers within the observation window.

Cost-to-service indicates the cost paid by the company in the whole process of selling products and providing services to customers and is denoted by the letter C. Airlines provide various value-added services to retain customers: for example, customers can redeem their points for a proportion of goods or for services such as upgrades in the company’s official website mall, or enjoy different discounts on air tickets according to their membership level. All of these are service costs paid by the company.

According to the research of Lawrence and other scholars, indicators of the service cost of enterprises can be selected with regard to four aspects: the feasibility, measurability, reliability, and commonality of the data indicators [8]. Combining these with the actual situation of airlines and the measurability of the value-added services provided, the service cost of a customer is calculated from the total accumulated points cost and the average discounted fare: $C = a X_{1} + b X_{2}$ (10) where X₁ represents the total cumulative point cost, X₂ represents the average discount ticket price, and a and b respectively represent the weights of indicators X₁ and X₂. The redemption policy of a certain airline is 40 : 1, which means that a product is purchased with points worth 40 times its price, so: $X_{1} = \frac{\text{Points\_Sum}}{40}$ (11) $X_{2} = \text{avg\_discount} \times (\text{SUM\_YR\_1} + \text{SUM\_YR\_2})$ (12) where Points_Sum represents the total accumulated points, avg_discount represents the average discount rate, and SUM_YR_1 and SUM_YR_2 respectively represent the total ticket price for the first and second year.
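Equations (10) to (12) combine into a single calculation, sketched below. The function name, the parameter name `points_per_yuan`, the equal weights, and the example record are hypothetical; in this paper the weights a and b are obtained with the entropy weight method rather than set by hand.

```python
def service_cost(points_sum, avg_discount, sum_yr_1, sum_yr_2,
                 a, b, points_per_yuan=40.0):
    """Service cost C = a*X1 + b*X2, following Eqs. (10)-(12)."""
    x1 = points_sum / points_per_yuan          # Eq. (11): cost of accumulated points
    x2 = avg_discount * (sum_yr_1 + sum_yr_2)  # Eq. (12): average discounted fare
    return a * x1 + b * x2                     # Eq. (10)

# Hypothetical customer: 4,000 points, average discount rate 0.8,
# fares of 1,000 and 1,500 in the two years, illustrative weights 0.5 / 0.5.
C = service_cost(4000, 0.8, 1000, 1500, a=0.5, b=0.5)
```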

Indicator weights a and b were determined using a high confidence objective indicator weight determination method, the entropy weight method [36], in the following steps:

First, the data are standardized to eliminate the effects of magnitudes: $x_{ij} = \frac{v_{ij} - min (v_{j})}{max (v_{j}) - min (v_{j})}$ (13) where v_ij and x_ij denote the values of the i-th evaluation object under the j-th indicator before and after standardization, respectively (assuming there are m evaluation objects and n evaluation indicators).

Second, the information entropy e_j of each indicator is: $e_{j} = - \frac{1}{ln (m)} \sum_{i = 1}^{m} p_{ij} ln p_{ij}$ (14) $p_{ij} = \frac{x_{ij}}{\sum_{i = 1}^{m} x_{ij}} (i = 1, 2, 3, . . . , m)$ (15) where p_ij denotes the weight of the i-th record under the j-th indicator, and if p_ij = 0, the corresponding term is taken as its limit, $lim_{p_{ij} \to 0} p_{ij} ln p_{ij} = 0$ .

Finally, the weight w_j of each indicator is determined: $w_{j} = \frac{1 - e_{j}}{\sum_{j = 1}^{n} (1 - e_{j})}$ (16)
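The three steps of the entropy weight method can be sketched with NumPy as follows. The function name and the small input matrix are hypothetical, and the sketch assumes that no indicator is constant across records (otherwise the denominator in Equation (13) is zero).

```python
import numpy as np

def entropy_weights(V):
    """Indicator weights from an (m records x n indicators) matrix, Eqs. (13)-(16)."""
    V = np.asarray(V, dtype=float)
    m = V.shape[0]
    # Eq. (13): min-max standardization of each indicator (column).
    X = (V - V.min(axis=0)) / (V.max(axis=0) - V.min(axis=0))
    # Eq. (15): share of each record within its indicator.
    P = X / X.sum(axis=0)
    # p * ln(p) is taken as 0 when p = 0, as stated after Eq. (15).
    plogp = np.where(P > 0, P * np.log(np.where(P > 0, P, 1.0)), 0.0)
    # Eq. (14): information entropy of each indicator.
    e = -plogp.sum(axis=0) / np.log(m)
    # Eq. (16): weights proportional to 1 - e_j.
    d = 1.0 - e
    return d / d.sum()

# Hypothetical data: 3 records, 2 indicators.
w = entropy_weights([[1, 1], [2, 1], [3, 5]])
```

In this toy example the second indicator concentrates all of its standardized mass in one record, so its entropy is lower and it receives the larger weight.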

3.1.3 Standardization

The four indicators after feature extraction are standardized by Z-score to eliminate the effect of different magnitudes: $z_{i} = \frac{i - \bar{i}}{s}$ (17) where i denotes the raw value of a feature, $\bar{i}$ denotes the mean value of that feature, s denotes the standard deviation of that feature, and z_i denotes the value of the feature after standardization.
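A minimal sketch of the Z-score standardization applied column by column is given below. The function name is hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not state whether the sample or population form is used.

```python
import numpy as np

def z_score(features):
    """Standardize each column: subtract its mean, divide by its standard deviation."""
    F = np.asarray(features, dtype=float)
    # ddof=1 gives the sample standard deviation (an assumption).
    return (F - F.mean(axis=0)) / F.std(axis=0, ddof=1)

# Hypothetical feature matrix: 3 customers, 2 indicators on different scales.
Z = z_score([[1, 10], [2, 20], [3, 30]])
```

After standardization both columns take the same values, which illustrates how the transformation removes the effect of different magnitudes.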

3.1.4 SOM neural network clustering

The standardized RFMC indicators are clustered using the Self-Organizing Map (SOM) neural network. As shown in Fig. 4, four customer groups with significantly different behavioral characteristics are obtained.

Based on Lawrence’s classic customer stratification framework [9] and the groups’ mean scores on the four RFMC indicators, they are stratified as core, opportunistic, service drain, and marginal customers. The results are shown in Table 1.

Fig. 4

Radar chart of customer characteristics.

Table 1

SOM neural network stratification results

Category	R-index mean	F-index mean	M-index mean	C-index mean	Number	Percent	Type of customer
0	0.3938	3.6559	0.8177	4.0217	1799	2.9%	Core Customers
1	–0.4289	0.4145	1.1247	0.3266	9215	15.0%	Opportunistic Customers
2	–0.3116	–0.3895	–0.3245	–0.3425	39722	64.8%	Marginal Customers
3	1.4841	0.4821	0.1001	0.3194	10527	17.1%	Service drain customers
Total					61263	100%

As seen from Table 1, the average value of marginal customers on each indicator is very low, indicating that they have contributed little value to the enterprise in the past. According to the long tail theory, although marginal customers individually brought little value to the enterprise in the past, they are large in number (64.8% of all customers), and the total value they contribute to the enterprise is almost half of the total value contributed by all customers. Therefore, this large base of marginal customers constitutes the enterprise’s long-tail customers, and the other three categories are called the enterprise’s high-value customers. In this paper, we dig deeper into the growth potential of long-tail customers according to the long-tail theory.

3.2 Long-tail customer evaluation index construction

According to the characteristics of long-tail customers’ consumption behavior and long-tail theory, the long-tail customers’ stratification evaluation index model is constructed.

Membership length indicates the length of time from when a customer joins the company’s membership to the statistical deadline, and is denoted by the letter L. The longer the time, the higher the customer’s dependence on the company, and the higher the potential contribution to the company’s revenue [35]. The formula is as follows: $L = \text{LOAD\_TIME} - \text{FFP\_DATE}$ (18) where LOAD_TIME denotes the statistical cut-off time and FFP_DATE denotes the membership entry time.

The membership level indicates the level of the member’s customer [18], and is denoted by the letter T. The higher the level, the more customers understand the enterprise-related membership hierarchy and marketing strategies and are more loyal to the enterprise, with certain growth potential.

The ticket price difference represents the total ticket price of the member customer in the second time period minus the total ticket price in the first time period [3], and is denoted by the letter P. The larger the difference, the more it indicates that the customer has growth potential and is likely to bring more value to the business in the future. The formula is as follows: $P = \text{SUM\_YR\_2} - \text{SUM\_YR\_1}$ (19) where SUM_YR_2 denotes the total fare spent on air travel in the second year of the two-year statistical period and SUM_YR_1 denotes the total fare in the first year.

The difference in the number of flights represents the number of flights taken by the member customer in the latter period minus the number of flights taken in the former period, for two identical time periods [3], and is denoted by the letter N. The greater the difference, the more it indicates that the customer’s frequency of flying with the airline is increasing and that the customer is more likely to generate more consumption in the future, bringing more profits to the company. The formula is as follows: $N = \text{L1Y\_Flight\_Count} - \text{P1Y\_Flight\_Count}$ (20) where L1Y_Flight_Count denotes the number of flights taken in the second year of the two-year statistical period and P1Y_Flight_Count denotes the number of flights taken in the first year.

The difference in mileage points represents the total points obtained by a member customer during the second of two identical time periods minus the total points obtained during the first time period [3], and is denoted by the letter D. The greater the difference, the more it indicates that the member customer has certain growth potential. The formula is as follows: $D = \text{L1Y\_BP\_SUM} - \text{P1Y\_BP\_SUM}$ (21) where L1Y_BP_SUM indicates the mileage credit for the second year of the two-year statistical period and P1Y_BP_SUM indicates the mileage credit for the first year.

Cross-purchase represents the number of other points earned by a member customer through cross-purchase in the latter period minus the number of other points earned in the former period, for two identical time periods, and is denoted by the letter G. The greater the difference in other points earned through cross-purchase, the greater the growth potential of the member customer, who buys other items from partners and accumulates points in addition to airline tickets. The point accumulation policy of a loyalty alliance is “accumulate 1 point for every 1 yuan spent”, so the formula for calculating cross-purchase G is as follows: $G = \text{ADD\_POINTS\_SUM\_YR\_2} - \text{ADD\_POINTS\_SUM\_YR\_1}$ (22) where ADD_POINTS_SUM_YR_2 denotes other points earned in the second year through cross-purchase during the two-year statistical period and ADD_POINTS_SUM_YR_1 denotes other points earned in the first year.
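The six LTPNDG indicators can be computed from a single customer record as sketched below. The field names follow Equations (18) to (22), except for `FFP_TIER`, which is a hypothetical name for the membership level because the text does not name that field. Measuring L in days is also an assumption, and the example record is invented for illustration.

```python
from datetime import date

def ltpndg_indicators(rec):
    """Compute the L, T, P, N, D, G indicators for one customer record."""
    return {
        # Eq. (18): membership length, expressed here in days (assumed unit).
        "L": (rec["LOAD_TIME"] - rec["FFP_DATE"]).days,
        # Membership level; FFP_TIER is a hypothetical field name.
        "T": rec["FFP_TIER"],
        # Eq. (19): ticket price difference between the two years.
        "P": rec["SUM_YR_2"] - rec["SUM_YR_1"],
        # Eq. (20): difference in the number of flights.
        "N": rec["L1Y_Flight_Count"] - rec["P1Y_Flight_Count"],
        # Eq. (21): difference in mileage points.
        "D": rec["L1Y_BP_SUM"] - rec["P1Y_BP_SUM"],
        # Eq. (22): difference in other points earned through cross-purchase.
        "G": rec["ADD_POINTS_SUM_YR_2"] - rec["ADD_POINTS_SUM_YR_1"],
    }

# Invented example record.
record = {
    "LOAD_TIME": date(2014, 3, 31), "FFP_DATE": date(2012, 3, 31),
    "FFP_TIER": 4,
    "SUM_YR_1": 1200.0, "SUM_YR_2": 1800.0,
    "P1Y_Flight_Count": 2, "L1Y_Flight_Count": 5,
    "P1Y_BP_SUM": 900, "L1Y_BP_SUM": 2100,
    "ADD_POINTS_SUM_YR_1": 0, "ADD_POINTS_SUM_YR_2": 300,
}
indicators = ltpndg_indicators(record)
```

Positive values of P, N, D, and G all indicate growth from the first year to the second, which is the signal the LTPNDG model uses to identify long-tail customers with potential.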

After extracting the indicators, standardize them to eliminate the dimensional influence according to formula (17), and then use the entropy weight method shown in formulas (13)-(16) to determine the weights of each indicator.

3.3 Determination of the optimal number of clusters

The Bayesian Information Criterion (BIC) is one of the most popular model selection criteria [15]. It is designed based on the ideas of Bayes’ theorem and information entropy, and can be used to estimate the number of clusters from the data distribution with the following equation: $BIC = k\ln(n) - 2\ln(\hat{L})$ (23) where n is the sample size, k is the number of parameters of the model, and $\hat{L}$ is the maximized value of the likelihood function. A smaller BIC value indicates that the model keeps its complexity low while maintaining goodness of fit, so a smaller BIC value corresponds to a better model.
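To illustrate how Eq. (23) is used for model selection, the sketch below computes the BIC for several candidate cluster counts and picks the smallest value. The function names, the parameter counts, and the log-likelihood values are hypothetical; in practice they would come from the fitted models.

```python
import math


def bic(num_params, num_samples, max_log_likelihood):
    """Eq. (23): BIC = k ln(n) - 2 ln(L_hat); ln(L_hat) is passed in directly."""
    return num_params * math.log(num_samples) - 2.0 * max_log_likelihood


def best_cluster_count(candidates, num_samples):
    """candidates maps a cluster count to (num_params, max_log_likelihood)."""
    scores = {c: bic(k, num_samples, ll) for c, (k, ll) in candidates.items()}
    return min(scores, key=scores.get), scores


# Hypothetical fits on 1000 samples: more clusters raise the likelihood
# but also add parameters, which the k ln(n) term penalises.
candidates = {2: (5, -2100.0), 3: (8, -2050.0), 4: (11, -2045.0)}
best, scores = best_cluster_count(candidates, num_samples=1000)
print(best, scores)
```

With these invented numbers, moving from three to four clusters gains too little likelihood to offset the added parameters, so the three-cluster model has the lowest BIC.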

In this paper, the number of clusters is determined according to the BIC in the Two step clustering process with the following steps:

First, preclustering is performed: This part adopts the idea of clustering feature (CF) tree growth from the Balanced Iterative Reducing and Clustering Using Hierarchies (BIRCH) algorithm. It scans the data records of the data set in sequence and, based on the distance standard, judges whether the current record should be merged with a previously constructed dense area or form a singleton. During the construction of the CF tree, the data points in the dense areas are preclustered to form many small subclusters. In this process, a rough estimate of the number of clusters in the data is calculated based on the BIC [28].

Then, formal clustering is performed: Using the subclusters obtained in the preclustering stage as objects, the agglomerative hierarchical clustering method is used to merge the subclusters one by one. Each time a merge is performed, the number of clusters is adjusted based on the distance standard, so that the number estimated in the first stage is reduced to the true number of clusters, which is the optimal number of clusters [28].

3.4 Clustering integration model

3.4.1 Stacking-based clustering ensemble

The ensemble process is shown in Fig. 5 and is as follows:

Fig. 5

Stacking-based clustering ensemble process.

Step 1: The input data are the LTPNDG model indicator data, represented by S, where S = {s₁,s₂, . . . ,s_n}. There are a total of 39,722 long-tail customer data points, so n = 39722. Each base learner is represented by H_j(j = 1, 2, 3), and the meta-learner K-means is represented by H₄. The clustering results of each base learner are represented by R_j ={ α_j⁽ⁱ⁾, β_j⁽ⁱ⁾, γ_j⁽ⁱ⁾ }, where i = 1, 2, 3, j = 1,2,3. Here, i represents different clusters, and j represents different base learners. For example, the output result of H₁ can be represented as R₁ ={ α₁⁽¹⁾, β₁⁽²⁾, γ₁⁽³⁾ }.

Different algorithms have different output results. For example, cluster α₁⁽¹⁾ in the output results of H₁ and cluster α₂⁽¹⁾ in the output results of H₂ differ greatly in sample size and similarity, making it impossible to integrate them directly. Therefore, the relabeling method is needed so that similar clusters in different clustering results obtain the same label.

Step 2: Use the relabeling method to calibrate the labels of similar clusters. The result of one base learner is randomly selected as the reference, for example the clustering result of H₁. The clusters in each result that are similar to α₁⁽¹⁾ are relabeled as α₁⁽¹⁾’, α₂⁽¹⁾’ and α₃⁽¹⁾’, so that α_j⁽¹⁾’ (j = 1, 2, 3) denotes the 1st class in the different clustering results. Correspondingly, in Fig. 5 the clusters similar to α₁⁽¹⁾, β₁⁽²⁾ and γ₁⁽³⁾ are shown in the same color and denoted by α_j⁽¹⁾’, β_j⁽²⁾’ and γ_j⁽³⁾’ (j = 1, 2, 3), respectively.
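The relabeling step can be sketched as follows. The paper does not specify how cluster similarity is measured, so this sketch assumes, purely for illustration, that two clusters are "similar" when they share the most samples: it tries every renaming of a base learner's labels and keeps the one that agrees most often with the reference. The function name and the toy labels are hypothetical.

```python
from itertools import permutations


def relabel(reference, labels, n_clusters=3):
    """Rename clusters in `labels` to maximise agreement with `reference`.

    Assumption: similarity between clusters is measured by sample overlap.
    Labels are integers in range(n_clusters).
    """
    best_map, best_agree = None, -1
    for perm in permutations(range(n_clusters)):
        # perm[old_label] is the candidate new label
        agree = sum(1 for r, l in zip(reference, labels) if perm[l] == r)
        if agree > best_agree:
            best_map, best_agree = perm, agree
    return [best_map[l] for l in labels]


# Toy example: the second learner found nearly the same groups as the
# reference but numbered them differently.
reference = [0, 0, 1, 1, 2, 2]
other = [2, 2, 0, 0, 1, 0]
aligned = relabel(reference, other)
print(aligned)
```

Exhaustive search over permutations is only practical for a small number of clusters such as the three used here; for many clusters an assignment algorithm would be the usual substitute.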

Step 3: Weight the output results of each base learner using silhouette coefficients and then merge them into data set S. First, calculate the silhouette coefficient λ_j (j = 1, 2, 3) of each base clustering algorithm and multiply each element in R_j by the corresponding weight coefficient w = 0.3 × λ_j. Then, merge the output results of the base learners to obtain the result set R = R₁ ⋃ R₂ ⋃ R₃, and finally combine it with the original data set S to obtain the new data set S′ = S ⋃ R.

The silhouette coefficient is multiplied by 0.3 on the basis of the extreme values of the elements in data set S, so that the difference in scale between the newly inserted data and the original data does not affect the clustering results.

Step 4: The new data set is used as the input of meta-learner H₄ for the cluster ensemble to obtain the final clustering results.
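Steps 3 and 4 can be sketched as a feature-augmentation routine. The paper does not state how the elements of R_j are encoded numerically, so this sketch assumes that each sample's relabeled cluster label is an integer, scales it by w = 0.3 × λ_j, and appends it as an extra column of S′. The function name, the label encoding, and all values are hypothetical.

```python
def augment_with_base_results(samples, base_labels, silhouettes, scale=0.3):
    """Build S' by appending each base learner's weighted label to every sample.

    samples     : list of feature vectors (data set S)
    base_labels : one list of relabeled integer labels per base learner (R_j)
    silhouettes : silhouette coefficient lambda_j of each base learner
    Assumption: an element of R_j is the sample's integer cluster label.
    """
    augmented = []
    for i, features in enumerate(samples):
        extra = [scale * lam * labels[i]
                 for labels, lam in zip(base_labels, silhouettes)]
        augmented.append(list(features) + extra)
    return augmented


# Two toy samples, three base learners with invented silhouette coefficients.
samples = [[0.1, 0.2], [0.4, 0.3]]
base_labels = [[0, 1], [0, 1], [1, 1]]
silhouettes = [0.5, 0.4, 0.6]
s_prime = augment_with_base_results(samples, base_labels, silhouettes)
print(s_prime)
```

The rows of `s_prime` would then be passed to the meta-learner (K-means in this paper) to produce the final clustering; the meta-learner itself is omitted from the sketch.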

3.4.2 Bagging-based clustering ensemble

The ensemble process is shown in Fig. 6 and is as follows:

Fig. 6

Ensemble process of clustering based on bagging.

Step 1: Input data set S. After clustering by each base learner H_j(j = 1,2,3), the results are R_j = { α_j⁽ⁱ⁾, β_j⁽ⁱ⁾, γ_j⁽ⁱ⁾ } , i = 1, 2, 3, j = 1, 2, 3.

Step 2: Use the relabeling method to calibrate the similar cluster labels as above.

Step 3: Using the voting method for the clustering ensemble: For the same sample point, if the clustering results of three base learners all consider it to belong to category 1, then the voting ensemble result is 1. If two base learners consider it to belong to category 3 and one considers it to belong to category 2, then the minority follows the majority, and this customer belongs to category 3. If the output results of three base learners are all different, then a category is randomly selected as the final category of this sample. From this, we obtain the final clustering result for each sample.
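The voting rule described above can be sketched in a few lines: a strict majority wins, and a three-way disagreement is resolved by a random draw. The function names and the toy labels are hypothetical, and the seed is fixed only to make the sketch repeatable.

```python
import random
from collections import Counter


def vote(labels, rng=random):
    """Majority vote over one sample's base-learner labels.

    If several labels tie for the most votes (all learners disagree),
    one of them is picked at random.
    """
    counts = Counter(labels).most_common()
    top = counts[0][1]
    winners = [label for label, c in counts if c == top]
    return winners[0] if len(winners) == 1 else rng.choice(winners)


def bagging_ensemble(base_labels, seed=0):
    """Combine relabeled results of all base learners, sample by sample."""
    rng = random.Random(seed)
    return [vote(sample_votes, rng) for sample_votes in zip(*base_labels)]


# Three base learners, three samples: unanimous, 2-vs-1, and a full tie.
base_labels = [[1, 3, 1],
               [1, 3, 2],
               [1, 2, 3]]
final = bagging_ensemble(base_labels)
print(final)
```

Note that every learner's vote counts equally here, regardless of how well that learner clustered the data.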

3.5 Clustering performance evaluation metrics

Cluster evaluation is a crucial step in assessing the performance of clustering methods in identifying relevant groups, which helps to analyse whether one method is superior to another. The following indicators are often used to evaluate the performance of clustering algorithms.

3.5.1 Silhouette coefficient

The silhouette coefficient (SC) consists of two components: the degree of cohesion and the degree of separation. The degree of cohesion reflects the closeness of a sample point to the other elements of its own class, and the degree of separation reflects the closeness of a sample point to the elements of other classes [2]. Therefore, the silhouette coefficient evaluates the clustering effect comprehensively by calculating the dissimilarity within and between clusters. The formula is as follows: $SC(k) = \frac{b(k) - a(k)}{\max\{a(k), b(k)\}}$ (24) where the cohesion a(k) denotes the average distance between sample point X_k and the other sample points in the same cluster, and the separation b(k) denotes the average distance between X_k and all sample points in the nearest other cluster. The silhouette coefficient SC of the data set is the average of SC(k) over all sample points in the data set. Therefore, SC ∈ [-1, 1], and the larger the value is, the better the clustering effect.
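A minimal pure-Python sketch of Eq. (24) is given below, assuming Euclidean distance and the common convention that a sample in a singleton cluster scores 0. The function name and the four toy points are hypothetical.

```python
import math


def silhouette(points, labels):
    """Mean of SC(k) over all samples, with Euclidean distance."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a(k): mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b(k): smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != l)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Two well-separated pairs of points (invented data).
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
sc_good = silhouette(points, [0, 0, 1, 1])
sc_bad = silhouette(points, [0, 1, 0, 1])
print(sc_good, sc_bad)
```

Grouping each nearby pair together gives a value close to 1, while splitting the pairs across clusters gives a negative value, which matches the interpretation that larger is better.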

3.5.2 Calinski-Harabasz coefficient

The Calinski-Harabasz (CH) coefficient judges the degree of compactness within each cluster and the degree of separation between clusters by the ratio of between-cluster dispersion to within-cluster dispersion [2]. The formula is as follows: $CH(k) = \frac{tr(B_{k})}{tr(W_{k})} \times \frac{n - k}{k - 1}$ (25) where tr(·) represents the trace of a matrix, B_k represents the between-cluster covariance matrix, W_k represents the within-cluster covariance matrix, n represents the number of samples, and k represents the number of clusters. The CH coefficient is non-negative and has no fixed upper bound; the larger it is, the better the clustering quality.
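Eq. (25) can be sketched directly from its definition, using the fact that tr(B_k) is the size-weighted sum of squared distances from each cluster center to the overall center and tr(W_k) is the sum of squared distances from each sample to its own cluster center. The function name and toy points are hypothetical.

```python
def calinski_harabasz(points, labels):
    """CH = tr(B) / tr(W) * (n - k) / (k - 1), following Eq. (25)."""
    n, dim = len(points), len(points[0])
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    between = within = 0.0
    for members in clusters.values():
        centre = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        # contribution to tr(B): cluster size times squared centre offset
        between += len(members) * sum((c - o) ** 2 for c, o in zip(centre, overall))
        # contribution to tr(W): squared distances to the cluster centre
        within += sum(sum((x - c) ** 2 for x, c in zip(p, centre)) for p in members)
    k = len(clusters)
    return (between / within) * (n - k) / (k - 1)


# Same invented layout: two tight pairs far apart.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
ch = calinski_harabasz(points, [0, 0, 1, 1])
print(ch)
```

For these four points, tr(B) = 100 and tr(W) = 1, so the coefficient is 100 × (4 − 2)/(2 − 1) = 200. The sketch assumes at least two clusters and a non-zero within-cluster dispersion.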

3.5.3 Davies-Bouldin index

The Davies-Bouldin Index (DBI) reflects the tightness of samples within the same cluster and the separation of samples between different clusters, and the smaller the value is, the better the clustering effect [24]. The formula is as follows: $DBI(k) = \frac{1}{k} \sum_{i = 1}^{k} \max_{j \neq i} \left\{ \frac{S_{i} + S_{j}}{d_{ij}} \right\}$ (26) where $S_{i} = \frac{1}{n_{i}} \sum_{x \in C_{i}} \| x - z_{i} \|$ represents the tightness of samples within cluster C_i, $d_{ij} = \| z_{i} - z_{j} \|$ represents the distance between the centers of clusters C_i and C_j, n_i represents the number of samples contained in cluster C_i, z_i is the mean of the i-th cluster C_i, z_j is the mean of the j-th cluster C_j, and ∥·∥ denotes the Euclidean distance.
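A minimal sketch of Eq. (26) follows, with S_i taken as the mean Euclidean distance of a cluster's samples to its center and d_ij as the distance between centers. The function name and toy points are hypothetical, and the sketch assumes at least two clusters with distinct centers.

```python
import math


def davies_bouldin(points, labels):
    """DBI = mean over clusters of max_j (S_i + S_j) / d_ij, Eq. (26)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    centres, scatter = {}, {}
    for l, members in clusters.items():
        dim = len(members[0])
        z = tuple(sum(p[d] for p in members) / len(members) for d in range(dim))
        centres[l] = z
        # S_i: average distance of the cluster's samples to its centre
        scatter[l] = sum(math.dist(p, z) for p in members) / len(members)
    total = 0.0
    for i in clusters:
        total += max((scatter[i] + scatter[j]) / math.dist(centres[i], centres[j])
                     for j in clusters if j != i)
    return total / len(clusters)


# Same invented layout: two tight pairs far apart.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
dbi = davies_bouldin(points, [0, 0, 1, 1])
print(dbi)
```

Here each cluster has S_i = 0.5 and the centers are 10 apart, so the index is (0.5 + 0.5)/10 = 0.1; tighter or more distant clusters would push it closer to 0.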

4 Empirical study

4.1 Experimental environment and data

The experiments were implemented in Python 3.8 on a machine with an AMD Ryzen 7 4800U with Radeon Graphics 1.80 GHz processor. The data are sourced from the “Teddy Cup” National College Student Data Mining Competition website (http://www.tipdm.org/ts/661.jhtml). The data contain the flight records of all customers within two years, with a total of 62,988 records and 44 attributes. Part of the data is shown in Table 2.

Table 2
Raw data

MEMBER_NO	FLIGHT_COUNT	SUM_YR_1	AVG_INTERVAL	avg_discount	Points_Sum	. . .
54993	210	239560	3.483253589	0.961639043	619760	. . .
28065	140	171483	5.194244604	1.25231444	415768	. . .
21189	23	116350	27.86363636	1.090869565	372204	. . .
. . .	. . .	. . .	. . .	. . .	. . .	. . .

4.2 Experimental results analysis

After constructing the LTPNDG indicators by the above steps, the optimal number of clusters determined by the BIC was 3. Each clustering algorithm was run with this number of clusters, and the three clustering performance evaluation indexes were then used to evaluate the clustering effect. The comparison results are shown in Fig. 7.

Fig. 7

Algorithm performance comparison chart.

From Fig. 7, it can be seen that, compared with single clustering algorithms, the Stacking algorithm improves the silhouette coefficient by 14.77% to 27.11%, the CH coefficient by 38.83% to 122.18%, and the DBI by 19.38% to 98.04%. The experimental results show that each cluster generated by Stacking ensemble clustering has higher cohesion and separation. Within the same category, data samples are more similar and more compactly distributed, so customers belonging to the same category have similar consumption behavior. Between different categories, data samples are more heterogeneous and more widely separated, indicating that customers’ consumption preferences and purchasing habits vary greatly between categories and that companies need to manage them differently. Therefore, the Stacking clustering ensemble algorithm more accurately divides long-tail customers into different categories and achieves customer stratification.

In the Stacking ensemble learning process, the first layer uses three heterogeneous base learners for primary clustering to explore the distribution structure of the data from different perspectives, providing richer clustering information for the second-layer clustering. The meta-learner integrates this information through weighted integration to fully utilize the advantages of the base learners. Compared with a single clustering algorithm, ensemble learning can stratify clusters more accurately based on the distribution of the data and is less affected by outliers, improving clustering accuracy.

Compared with the clustering ensemble algorithm based on Bagging, the Stacking algorithm improves the silhouette coefficient by 29.45%, the CH coefficient by 50.64%, and the DBI by 21%, a significant improvement in clustering performance. This indicates that the integration method of the Stacking algorithm is clearly superior to that of the Bagging algorithm. The Stacking algorithm uses a more complex hierarchical model for integration. After the first-layer base learners complete clustering, the silhouette coefficient of each base clustering result is calculated and used to weight that result: the greater the silhouette coefficient, the better the clustering effect of the base learner and the greater its weight. In the second layer, the weighted clustering results of the base learners are combined with the original data and input into the meta-learner for clustering. Because the weights reflect the clustering quality of each base learner, the Stacking integration method highlights better-performing base learners and weakens poorly performing ones, thereby improving the clustering effectiveness of the ensemble model. In contrast, the Bagging algorithm adopts a majority voting strategy after each base learner completes clustering; it considers only the result with the most votes and ignores differences in clustering performance among base learners. Therefore, it cannot fully leverage the strengths of each base learner, resulting in poorer ensemble effects.

Therefore, the stacking clustering ensemble result is the final stratification result for long-tail customers, and the clustering centers are shown in Table 3.

Table 3

Stacking clustering ensemble results

Category	L-index mean	T-index mean	P-index mean	N-index mean	D-index mean	G-index mean	Number	Percent
0	0.068368644	0	–0.000007301	–0.000040570	–0.000002036	–0.000000130	16359	41.18%
1	–0.049026810	0	–0.000021878	0.000006067	–0.000006348	0.000000078	22901	57.65%
2	0.009351380	1.297008441	0.001343031	0.001135778	0.000386791	0.000000749	462	1.16%
Total							39722	100%
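As a quick arithmetic check of the Number and Percent columns in Table 3, the sketch below recomputes each category's share from the cluster sizes reported in the table. Only the variable names are introduced here; the counts are those of Table 3.

```python
# Cluster sizes as reported in Table 3.
counts = {0: 16359, 1: 22901, 2: 462}

total = sum(counts.values())
# Share of each category in percent, rounded to two decimals as in the table.
shares = {category: round(100 * size / total, 2) for category, size in counts.items()}
print(total, shares)
```

The recomputed total and shares agree with the Total row and the Percent column of Table 3.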

From Table 3, it can be seen that the 39,722 long-tail customers are divided into three customer groups with significant behavioral differences. Among them, Category 0 customers have the longest membership time L, indicating that they are old customers of the company; however, their membership level T is low. This suggests that, although they became members very early, they do not understand marketing policies such as membership upgrades and points. Their number of flights with the company N, ticket price P, accumulated mileage points D, and cross-purchase volume G at alliance partners are all decreasing, indicating that they have no willingness to deepen their relationship with the company. They are old customers with no growth potential.

Category 1 customers have the shortest membership time L, indicating that they are new customers of the company; their membership level T is also low, indicating that they do not yet understand the various point promotions and membership upgrade systems. However, their number of flights with the company N is increasing, while the differences in ticket price P and mileage points D are decreasing, indicating that they are travelling more frequently over short distances. Their cross-purchase volume G at alliance partners is also increasing. Overall, although they are new customers, they have a certain degree of recognition of and satisfaction with the company’s products and services and may have a strong willingness to keep learning about the company and to deepen their relationship with it. They have great growth potential and should be given priority by the company.

Category 2 customers have a relatively long membership time L, indicating that they are old customers of the company; their membership level T is the highest, indicating that they are quite familiar with the company’s membership upgrade and related systems. Their number of flights N is increasing significantly, and their ticket price P and mileage points D are also increasing significantly, indicating that they have high consumption ability and demand for air travel. They are of great value to the company and can bring it substantial profits. At the same time, they often purchase other types of goods from the company’s partners, with a large cross-purchase volume G. Overall, this group of customers is familiar with the company, actively participates in its various systems, and has high recognition of and loyalty to the company. They are old customers with high growth potential who can bring substantial profits to the company in the future.

4.3 Marketing response

4.3.1 Old customers with high growth potential

Through the above analysis, it can be seen that the customers in category 2 are long-tail customers with high growth potential. Owing to their existing stickiness, recognition, loyalty and consumption habits, these customers have been increasing their consumption with the company and have the potential to continue to grow. Companies should strengthen communication with them, actively manage customer relationships, and attach great importance to establishing, maintaining and developing long-term, deep relationships with these customers. The number of such customers is relatively small, accounting for only 1.16%. Enterprises can therefore adopt one-to-one and other refined marketing management to explore their deeper or new consumption and service needs, and develop differentiated marketing strategies that stimulate them to generate new consumption with the enterprise, with partners in the same business alliance, or with partners in other business alliances, while maintaining and increasing their original consumption, so that they continue to bring profits to the enterprise.

4.3.2 New customers with high growth potential

Through the above analysis, it can be seen that the customers in category 1 are new long-tail customers with high growth potential. Their understanding of the company is not yet deep, and the relationship they have established with the company is still relatively shallow. However, taken as a whole, they are willing to deepen their understanding of the company and to develop a long-term relationship with it, and they have great growth potential, which can continuously bring profits to the company in the future. Companies should actively communicate with these customers and maintain and develop long-term, positive relationships with them to maximize their value. For example, enterprises can publicize their marketing policies in a timely manner and regularly remind and help customers to redeem points and upgrade their membership, which avoids wasting points and at the same time increases customers’ activity and loyalty. Enterprises can also track these customers’ consumption dynamics and later recommend different product or service combinations according to each customer’s consumption level and ability, gaining more profits through cross-selling.

4.3.3 Old customers without growth potential

The above analysis shows that customers in category 0 are long-tail customers with no growth potential. These customers not only brought little value to the company in the past but also have no potential for growth, which means they are unlikely to bring profits to the company in the future. From a comprehensive perspective, they have essentially no exploitable value, and companies do not need to spend more to manage them.

5 Conclusion

5.1 Research conclusion

Faced with the problems of insufficient attention to long-tail customers in previous research on civil aviation, shallow value mining, and the low accuracy and lack of accuracy verification of single clustering algorithms, this paper constructs a civil aviation long-tail customer stratification model based on a clustering ensemble from the perspective of long-tail theory. First, the model determines the optimal number of clusters through the BIC, which solves the problem that different base learners require the same and reasonable number of clusters and improves the clustering accuracy. Second, two mainstream ensemble learning methods are used for the clustering ensemble, solving the problems of low accuracy and robustness of single clustering algorithms. Then, with the help of three clustering performance evaluation indicators, the clustering effect is evaluated to solve the problem of accuracy verification. The experimental results show that the clustering ensemble algorithm based on Stacking achieves better clustering results than traditional algorithms. This indicates that, compared with single algorithms, Stacking can emphasize base learners with better performance and weaken those with poor performance during the ensemble process, thereby improving clustering accuracy and enabling more accurate stratification of long-tail customers with similar consumption behavior.

The limitation of this research is that it only focuses on one data set from the civil aviation industry, lacking considerations of diversity. Future research will consider the validation of machine learning models using different datasets to improve the accuracy and generalization of the models. In addition, algorithm improvement will also be considered to improve the accuracy of clustering, and to build a long-tail customer stratification management system that better fits the actual needs of the enterprise, so that the enterprise can better adapt to the development of the digital era.

5.2 Theoretical and practical implications

5.2.1 Theoretical implications

Firstly, this paper combines the long-tail theory with customer stratification research in civil aviation, which enriches the application research of long-tail theory in civil aviation customer stratification. Secondly, it considers the impact of service cost on civil aviation customer value, enriches the indexes of civil aviation customer stratification, and constructs two models that together measure the value of civil aviation customers more comprehensively: the RFMC model, which is in line with the characteristics of civil aviation customers’ consumption behaviour, and the LTPNDG model, which reflects the growth potential of civil aviation long-tail customers. Finally, the ensemble learning method in machine learning is applied to solve the problem of civil aviation long-tail customer stratification, which promotes cross-fertilization between disciplines.

5.2.2 Practical implications

Customers are an important channel for companies to gain market share and profits. The cost of keeping an old customer is much lower than the cost of getting a new customer. Moreover, for a service-oriented industry like civil aviation enterprises, it is more important to retain more valuable old customers. Therefore, from the perspective of long-tail theory, this paper explores the value of long-tail customers in depth, which can help civil aviation enterprises retain more valuable customer resources. Taking different marketing management measures for the value and consumption behaviour characteristics of different types of customers can also help civil aviation enterprises optimize customer relationship management and resource allocation. This is of great practical significance for them to cultivate competitive advantages in the fierce market competition. In the era of big data, the use of machine learning methods for the optimization of enterprise marketing management strategies can break through the limitations of traditional marketing planning, help enterprises achieve digital transformation, and better adapt to the development of the times.

6 Funding

The work in this paper was supported by the Research on customer stratification management based on cost to service measurement in Internet Era (Philosophy and social science project (TJGL18-036), Tianjin, China), the Stratification research based on civil aviation customer value: from the perspective of long tail theory (2022 Tianjin postgraduate scientific research innovation project (2022SKYZ311), Tianjin, China) and the Impact of utilitarian versus hedonistic goal conflict on continuous consumption decisions: the chain mediation role of purchase decision involvement and consumer participation (2022 Tianjin postgraduate scientific research innovation project (2022SKY336), Tianjin, China).

References

1. Banerjee, Pujari A.K., Panigrahi C.R. et al., A new method for weighted ensemble clustering and coupled ensemble selection, Connection Science 33(3) (2021), 623–644. doi: 10.1080/09540091.2020.1866496.

2. Yang C.M., Liu, Wang Y.T. et al., A Novel Adaptive Kernel Picture Fuzzy C-Means Clustering Algorithm Based on Grey Wolf Optimizer Algorithm, Symmetry 14(7) (2022), 1442. doi: 10.3390/sym14071442.

3. Yanjie, Research on the value of airline member customers based on data mining technology, Fuzhou University (2015).

4. Anderson, The long tail, Wired Magazine 12(10) (2004), 170–177.

5. Wolpert D.H., Stacked generalization, Neural Networks 5(2) (1992), 241–259. doi: 10.1016/s0893-6080(05)80023-1.

6. Weichen, Long tail theory perspective of commercial banks’ personal customer development and services, Fujian Finance 445(05) (2022), 73–76.

7. Chen F.Y., Tu S.L. and Wang H.E., Green Market Segmentation: A Case of Airline Customers in Taiwan, Journal of Sustainable Development 9(1) (2016), 99. doi: 10.5539/jsd.v9n1p99.

8. Lawrence F.B., Gunasekaran, Krishnadevarajan, Customer stratification: Best practices for boosting profitability, NAW Institute for Distribution Excellence (2011).

9. Lawrence F.B., Pa, Sales and Marketing Optimization, HVACR Distribution Business (2010), 30–34.

10. Tirenni, Kaiser and Herrmann, Applying decision trees for value-based customer relations management: Predicting airline customers’ future values, Journal of Database Marketing & Customer Strategy Management 14(2) (2007), 130–142. doi: 10.1057/palgrave.dbm.3250044.

11. Feijie, Xuming et al., Application of clustering integration algorithm in customer segmentation model, Journal of Southwest University of Science and Technology 35(01) (2020), 75–80.

12. Farvaresh and Sepehri M.M., A data mining framework for detecting subscription fraud in telecommunication, Engineering Applications of Artificial Intelligence 24(1) (2011), 182–194. doi: 10.1016/j.engappai.2010.05.009.

13. Fei, Zhengyang et al., Optimal Bagging integrated ultra-short-term multivariate load forecasting considering minimum mean envelope entropy load decomposition, Proceedings of the CSEE (2023), 1–17, https://doi.org/10.13334/j.0258-8013.pcsee.223470.

14. Tang J.J., Liang, Han C.Y., Li Z.B. and Huang H.L., Crash injury severity analysis using a two-layer Stacking framework, Accident Analysis & Prevention 122 (2019), 226–238. doi: 10.1016/j.aap.2018.10.016.

15. Zhao J.H., Jin L.B. and Shi, Mixture model selection via hierarchical BIC, Computational Statistics & Data Analysis 88 (2015), 139–153. doi: 10.1016/j.csda.2015.01.019.

16. Mao, Xin, Short-term prediction of PV power based on integrated clustering and improved Markov chain model, Southern Power System Technology (2023), 1–10, https://kns-cnki-net.web.bisu.edu.cn/kcms/detail/44.1643.TK.20230228.1127.006.html.

17. Shuai, Jincai et al., Research on Stacking Ensemble Clustering Algorithm Based on Differential Privacy Preservation, Computer Engineering & Science 44(08) (2022), 1402–1408.

18. Xiao and Xiaoli, Research on airline customer value classification based on k-means and neighborhood rough set, Operations Research and Management Science 30(03) (2021), 104–111.

19. Wang L.J., Ding S.F. and Jia H.J., An improvement of Spectral Clustering via Message Passing and Density Sensitive Similarity, IEEE Access 7 (2019), 54–62. doi: 10.1109/ACCESS.2019.2929948.

20. Hongbin, Customer Relationship Management of ZG Bank in the Internet Era, Beijing University of Technology (2019). doi: 10.26935/d.cnki.gbjgu.2019.000503.

21. Dempsey, Libraries and the Long Tail, D-Lib Magazine 12(4) (2006). doi: 10.1045/april2006-dempsey.

22. Breiman, Bagging predictors, Machine Learning 24(2) (1996), 123–140. doi: 10.1007/bf00058655.

23. Dehghani Zadeh, Fathian and Gholamian, LDcFR: A new model to determine value of airline passengers, Tourism and Hospitality Research 18(3) (2018), 357–366. doi: 10.1177/1467358416663821.

24. Wang, Shi, Yang X.B. and Mi J.S., Three-way k-means: integrating k-means and three-way decision, International Journal of Machine Learning and Cybernetics 10(10) (2019), 2767–2777. doi: 10.1007/s13042-018-0901-y.

25. Lin, Menghan, Zhanao, Feature Selection by Combining Artificial Bee Colony and K-means Clustering, Journal of Frontiers of Computer Science and Technology (2023), 1–18, https://kns-cnki-net.web.bisu.edu.cn/kcms/detail/11.5602.TP.20230407.1412.004.html.

26. Agarwal, Chowdary C.R., A-Stacking and A-Bagging: Adaptive versions of ensemble learning algorithms for spoof fingerprint detection, Expert Systems with Applications (2019), 113160. doi: 10.1016/j.eswa.2019.113160.

27. Goel, Broder, Gabrilovich, Pang, Anatomy of the long tail: Ordinary people with extraordinary tastes, Proceedings of the Third ACM International Conference on Web Search and Data Mining-WSDM’10 (2010). doi: 10.1145/1718487.1718513.

28. Chiu, Fang D.P., Chen, Wang, Jeris, A robust and scalable clustering algorithm for mixed type attributes in large database environment, Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’01 (2001). doi: 10.1145/502512.502549.

29. Xinzhang, Zeyu et al., Dual integrated PV power prediction based on heterogeneous clustering and Stacking, Power System Technology 47(01) (2023), 275–284. doi: 10.13335/j.1000-3673.pst.2022.0408.

30. Chiang W.Y., Discovering customer value for marketing systems: an empirical case study, International Journal of Production Research 55(17) (2017), 5157–5167. doi: 10.1080/00207543.2016.1231429.

31. Hailin, The inspiration of “Long Tail Theory” to modern enterprise customer relationship management, Business Culture 184(07) (2011), 232.

32. Dong X.B., Yu Z.W., Cao W.M. et al., A survey on ensemble learning, Frontiers of Computer Science 14(2) (2019), 241–258. doi: 10.1007/s11704-019-8208-z.

33. Jingya, Fulin and Qishi, Aftermarket customer segmentation based on semi-supervised spectral clustering integration, Computer Engineering and Applications 56(02) (2020), 266–271.

34. Sujuan, Analysis of the reasons for the prevalence of Internet finance and the strategies of commercial banks from a long-tail perspective, Economic Forum 550(05) (2016), 65–67.

35. Guowei, Airline customer value assessment and customer churn prediction model based on data mining analysis, Northwestern University (2022). doi: 10.27405/d.cnki.gxbdu.2022.000880.

36. Liu, Xie, Dai et al., Research on comprehensive evaluation method of distribution network based on AHP-entropy weighting method, Frontiers in Energy Research 10 (2022), 975462. doi: 10.3389/fenrg.2022.975462.