Abstract
Data objects with both numeric and categorical attributes are prevalent in many real-world applications. However, most partitional clustering algorithms for such data can become trapped in local optima. To improve clustering performance, a novel clustering algorithm, called ABC-K-Prototypes (Artificial Bee Colony clustering based on K-Prototypes), is presented on the basis of the k-prototypes algorithm, the search strategy of the artificial bee colony, and chaos theory. In the presented approach, a one-step k-prototypes procedure is first given, and this procedure is then combined with the search strategy of the artificial bee colony to handle mixed numeric and categorical data. In the search process of scout bees, a chaotic map is utilized to generate chaotic sequences that substitute for random numbers. To accelerate the convergence of the ABC-K-Prototypes algorithm, a multi-source search is adopted in the search process of scout bees. Finally, the performance of the ABC-K-Prototypes algorithm is demonstrated by a series of experiments on mixed numeric and categorical data in comparison with other popular algorithms.
Introduction
Clustering analysis is one of the most important techniques in data mining [11, 17]. Many fields, including information retrieval [10, 27], social media analysis [26], privacy preservation [31], image analysis [9], text analysis [6], and bioinformatics [7, 28], benefit from clustering algorithms. The goal of clustering is to allocate a set of data objects into clusters such that the data objects in the same cluster are more similar to each other than to those in other clusters [21]. Clustering algorithms can generally be categorized into two types: hierarchical and partitional. In hierarchical clustering algorithms, data objects are organized into a dendrogram of nested partitions by a divisive or agglomerative strategy [14]. In partitional clustering algorithms, data objects are divided into a given number of clusters by optimizing an objective cost function.
The k-means algorithm is an extensively utilized center-based partitional clustering algorithm owing to its simplicity and high efficiency [18]. Taking into account the uncertainty of data objects, Bezdek et al. proposed the fuzzy k-means algorithm [4]. The k-means algorithm and its fuzzy version are designed for datasets with numeric attributes. In many real-world applications, however, the collected data are described by both numeric and categorical attributes. The k-prototypes algorithm proposed by Huang is one of the well-known algorithms for clustering this type of data [15]. Considering the fuzzy nature of data objects, the fuzzy k-prototypes algorithm was given by Bezdek et al. [5]. Several extensions of the k-prototypes algorithm have been proposed by taking the significance of attributes and the representation of cluster centers into account [1, 12, 19, 21]. However, one issue associated with these algorithms is that they are prone to becoming trapped in local optima.
Over the last decade, several approaches that imitate the foraging behavior of social animals, including birds and ants, have been introduced for optimization problems [20, 25]. In swarm intelligence, investigating the collective behavior of honeybees, such as foraging, learning, memorizing, and information sharing, has become an active research topic [34]. Lucic and Teodorović presented a bee colony optimization metaheuristic, which is inspired by the foraging behavior of a bee swarm in the real world [29]. This metaheuristic has been utilized to solve various engineering and management problems. Karaboga and Basturk devised the artificial bee colony (ABC) algorithm [22] to address numerical optimization problems. Alatas introduced chaotic bee colony algorithms for numeric optimization [2]. Karaboga and Ozturk presented an artificial bee colony clustering approach on the basis of the ABC optimization strategy [23]. Zhang, Ouyang and Ning proposed an artificial bee colony clustering approach that adopts Deb's rules to guide the search for candidate food sources [34]. However, most of these heuristic approaches are devised for data with numeric attributes, and they may be unsuitable for data with both numeric and categorical attributes. Since this type of data is prevalent in real-world applications, it is necessary to develop an ABC-based clustering algorithm for it.
Chaos describes nonlinear systems that have deterministic dynamic behavior [13]. This behavior is ergodic and stochastic, and it is sensitive to the initial conditions. Many chaotic maps exhibit certainty, ergodicity, and the stochastic property. Chaotic sequences have been adopted to replace random sequences and have been successfully applied in many applications, including secure transmission, DNA computing, and image processing [3]. Many optimization results show that chaotic sequences work better than random sequences [32].
In this paper, a novel artificial bee colony clustering approach for mixed numeric and categorical data is presented. In the proposed approach, the one-step k-prototypes procedure is given first, and this procedure is then integrated with the artificial bee colony heuristic to cluster mixed data. In the search process of scouts, chaotic sequences substitute for random numbers wherever a random-based choice is required. Then, the time complexity, space complexity, and convergence of the proposed approach are analyzed. Finally, the proposed approach is applied to cluster mixed data.
The rest of this paper is organized as follows: some related work is briefly reviewed in Section 2. Then, the proposed approach is described in Section 3, and the experimental results are reported in Section 4. The conclusion and future direction of this paper are given in Section 5.
Related work
The k-prototypes algorithm
This algorithm was first introduced by Huang [15] for clustering data with both numeric and categorical attributes. Let $X = \{X_1, X_2, \ldots, X_n\}$ denote a dataset of n data objects, and let $X_i$ ($1 \le i \le n$) be a data object with m attributes. Each attribute $A_j$ ($1 \le j \le m$) has a domain of values denoted by $\mathrm{Dom}(A_j)$. The domains of the attributes in mixed data are of two types: numeric and categorical. The numeric domain consists of continuous values, and the categorical domain consists of a finite set of values without any natural ordering (such as color or gender). The k-prototypes algorithm partitions X into k clusters by minimizing a cost function, which is commonly denoted as
$$E(U, Q) = \sum_{l=1}^{k} \sum_{i=1}^{n} u_{il}\, d(x_i, Q_l), \qquad (1)$$
where $Q_l$ is the prototype of the cluster l; $u_{il}$ ($0 \le u_{il} \le 1$) is an element of the partition matrix $U_{n \times k}$; and $d(x_i, Q_l)$ is the dissimilarity measure, which, following Huang [15], is formulated as:
$$d(x_i, Q_l) = \sum_{j=1}^{p} (x_{ij} - q_{lj})^2 + \mu_l \sum_{j=p+1}^{m} \delta(x_{ij}, q_{lj}). \qquad (2)$$
In the above, the attributes $A_1, \ldots, A_p$ are assumed, without loss of generality, to be numeric and $A_{p+1}, \ldots, A_m$ to be categorical; $\delta(a, b) = 0$ if the values a and b are the same, while $\delta(a, b) = 1$ if they are different; and $\mu_l$ is the weight for categorical attributes in the cluster l. When $x_{ij}$ is a numeric value, $q_{lj}$ is the mean of the jth numeric attribute in the cluster l; when $x_{ij}$ is a categorical value, $q_{lj}$ is the mode of the jth categorical attribute in the cluster l. The procedure of the k-prototypes algorithm is given by:
Step 1. Randomly select k data objects from the dataset X as the initial prototypes of clusters.
Step 2. Allocate each data object in the dataset X to the cluster that has the nearest prototype according to Eq. (2). Update the prototype of the cluster after each assignment.
Step 3. Once all data objects have been allocated, reevaluate the dissimilarity of the data objects against the current prototypes. If the nearest prototype of a data object belongs to a cluster other than its current one, reallocate this data object to that cluster and update the prototypes of both clusters.
Step 4. If no data object has changed clusters after a full cycle test of X, terminate the algorithm; otherwise, go to Step 3.
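The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in this paper: it uses a batch variant (prototypes are recomputed after all objects have been assigned, rather than after each single assignment), a single weight `mu` shared by all clusters instead of a per-cluster weight, and hypothetical names such as `num_idx` and `cat_idx` for the positions of the numeric and categorical attributes.

```python
import random


def dissimilarity(x, q, num_idx, cat_idx, mu):
    """Squared Euclidean distance on numeric attributes plus
    mu times the number of categorical mismatches."""
    numeric = sum((x[j] - q[j]) ** 2 for j in num_idx)
    categorical = sum(1 for j in cat_idx if x[j] != q[j])
    return numeric + mu * categorical


def update_prototype(members, num_idx, cat_idx):
    """Mean for each numeric attribute, mode for each categorical one."""
    proto = list(members[0])
    for j in num_idx:
        proto[j] = sum(x[j] for x in members) / len(members)
    for j in cat_idx:
        values = [x[j] for x in members]
        # sorted() makes tie-breaking between equally frequent values deterministic
        proto[j] = max(sorted(set(values)), key=values.count)
    return proto


def k_prototypes(data, k, num_idx, cat_idx, mu=1.0, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: pick k data objects as the initial prototypes
    prototypes = [list(x) for x in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        # Steps 2-3 (batch form): assign every object to its nearest prototype
        new_labels = [
            min(range(k),
                key=lambda l: dissimilarity(x, prototypes[l], num_idx, cat_idx, mu))
            for x in data
        ]
        # Step 4: stop when no object has changed clusters
        if new_labels == labels:
            break
        labels = new_labels
        for l in range(k):
            members = [x for x, lab in zip(data, labels) if lab == l]
            if members:  # an empty cluster keeps its previous prototype
                prototypes[l] = update_prototype(members, num_idx, cat_idx)
    return labels, prototypes
```

For example, `k_prototypes(data, 2, num_idx=[0], cat_idx=[1])` clusters objects of the form `(number, category)` into two groups.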
The artificial bee colony algorithm
The artificial bee colony (ABC) algorithm was introduced by Karaboga and Basturk to solve numerical optimization problems [22]. This algorithm is well-known for its simplicity and robustness [23]. In the ABC algorithm, the artificial bees are divided into three types: employed bees, onlookers, and scouts. An employed bee exploits a particular food source and shares the information about this food source with the onlookers; a scout seeks a new food source in the search space; an onlooker waits in the nest and chooses a food source via the shared information. The artificial bee colony is divided into two halves: the first half consists of the employed bees, and the other half consists of the onlookers. The model of forage selection has three essential components (food sources, employed foragers, and unemployed foragers) and two modes of behavior (recruitment to a food source and abandonment of a food source). The value of a food source depends on many factors, including its proximity to the nest, its nectar amount, and the ease of extracting its nectar. The unemployed foragers are divided into two types: scouts and onlookers. Each food source is exploited by exactly one employed bee, so the number of employed bees equals the number of food sources. Onlookers fly to a food source according to a probability-based selection strategy. An employed bee becomes a scout once the nectar of its food source is exhausted. The exploitation and exploration processes are carried out together in the ABC algorithm: the exploitation process is executed by the employed bees and onlookers, and the exploration process is carried out by the scouts. The bee colony exploits and explores the food sources so as to maximize the nectar stored in the nest.
In an optimization problem, a food source represents a possible solution, the nectar amount of a food source indicates the quality of the corresponding solution, and the aim is to achieve the optimal value of the objective function. The main steps of the ABC algorithm are described as follows:
Step 1. Initialize the population of food sources.
Step 2. Dispatch the employed bees onto the food sources and assess the nectar amounts of these food sources.
Step 3. Calculate the probabilities with which the food sources are selected by the onlooker bees.
Step 4. Dispatch the onlookers onto the food sources: each onlooker selects a food source according to the probabilities obtained in Step 3, exploits this food source, and assesses the nectar amount of the obtained food source.
Step 5. If a food source is exhausted, the corresponding employed bee ceases its exploitation process and becomes a scout bee.
Step 6. Dispatch the scouts into the search space to forage for new food sources randomly.
Step 7. Memorize the best food source obtained so far.
Step 8. If the termination requirements are satisfied, terminate the algorithm and output the best food source; otherwise, go to Step 2.
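To make the eight steps concrete, the following Python sketch applies them to a generic numeric minimization problem. It is a simplified illustration and not the reference ABC implementation of [22]: the neighbor move, the nectar amount `1 / (1 + objective)` (which assumes a non-negative objective), and the parameter names `n_sources`, `limit`, and `max_cycles` are assumptions chosen for brevity.

```python
import random


def abc_minimize(objective, dim, lower, upper,
                 n_sources=10, limit=20, max_cycles=200, seed=0):
    rng = random.Random(seed)

    def new_source():
        return [rng.uniform(lower, upper) for _ in range(dim)]

    def nectar(x):
        # Assumption: objective(x) >= 0, so a lower cost gives more nectar
        return 1.0 / (1.0 + objective(x))

    # Step 1: initialize the population of food sources
    sources = [new_source() for _ in range(n_sources)]
    trials = [0] * n_sources
    best = min(sources, key=objective)

    def exploit(i):
        # Move one coordinate relative to a randomly chosen other source
        k = rng.choice([s for s in range(n_sources) if s != i])
        j = rng.randrange(dim)
        cand = list(sources[i])
        cand[j] += rng.uniform(-1.0, 1.0) * (cand[j] - sources[k][j])
        cand[j] = min(max(cand[j], lower), upper)
        if nectar(cand) > nectar(sources[i]):   # greedy selection
            sources[i], trials[i] = cand, 0
        else:
            trials[i] += 1

    for _ in range(max_cycles):
        for i in range(n_sources):              # Step 2: employed bees
            exploit(i)
        weights = [nectar(s) for s in sources]  # Step 3: selection weights
        for _ in range(n_sources):              # Step 4: onlookers
            i = rng.choices(range(n_sources), weights=weights)[0]
            exploit(i)
        for i in range(n_sources):              # Steps 5-6: scouts
            if trials[i] >= limit:
                sources[i], trials[i] = new_source(), 0
        best = min(sources + [best], key=objective)  # Step 7: memorize best
    return best                                 # Step 8: output best source
```

Calling `abc_minimize(lambda x: sum(v * v for v in x), dim=2, lower=-5.0, upper=5.0)` returns a point close to the origin.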
Chaos theory
Chaos theory, pioneered by Edward Lorenz, focuses on the behavior of nonlinear dynamical systems. Chaotic behavior is a deterministic, random-like process that is highly sensitive to its initial conditions [3, 13]. The important characteristics of chaos include ergodicity, pseudo-randomness, irregularity, and a strange attractor with a self-similar fractal pattern [32]. Due to its ergodicity, chaos provides great diversity. Chaotic sequences have been utilized to replace random sequences and have achieved good results in many applications, including secure transmission, DNA computing, and image processing [2, 3]. Many chaotic maps have been proposed to generate chaotic sequences. As a discrete-time dynamical system, the general form of a chaotic map is given by
$$y_{t+1} = f(y_t),$$
where $0 < y_t < 1$, $t = 0, 1, 2, \ldots$. The obtained chaotic sequence is denoted by:
$$\{ y_t : t = 0, 1, 2, \ldots \}.$$
It is unnecessary to store long chaotic sequences, since they are easy and fast to regenerate: given the chaotic map and the initial condition, the same sequence can be reproduced at any time. The simplest systems that are able to generate chaotic motion are the one-dimensional chaotic maps [32]. One of the simplest one-dimensional chaotic maps is the Kent map [2, 32], which is defined by:
$$y_{t+1} = \begin{cases} y_t / \beta, & 0 < y_t \le \beta, \\ (1 - y_t) / (1 - \beta), & \beta < y_t < 1. \end{cases}$$
Here, the control parameter β lies within the interval (0,1). For simplicity, the parameter β is taken as 0.7 in our work.
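As a small illustration, the Kent map with β = 0.7 can be iterated in a few lines of Python. The function names `kent_map` and `kent_sequence` are chosen here for illustration only, and the sketch assumes the standard piecewise-linear form of the Kent map.

```python
def kent_map(y, beta=0.7):
    """One iteration of the Kent map for a value y in (0, 1)."""
    if y <= beta:
        return y / beta
    return (1.0 - y) / (1.0 - beta)


def kent_sequence(y0, length, beta=0.7):
    """Generate `length` chaotic numbers starting from the seed y0."""
    sequence = []
    y = y0
    for _ in range(length):
        y = kent_map(y, beta)
        sequence.append(y)
    return sequence
```

Because the map is deterministic, calling `kent_sequence` twice with the same seed reproduces the same sequence, which is why the sequence itself does not need to be stored.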
In this section, the proposed ABC-K-Prototypes clustering approach is first described, and then its complexity and convergence are analyzed.
The proposed approach
In this subsection, the novel clustering algorithm, which is based on the k-prototypes approach, the search strategy of an artificial bee colony, and chaos theory, is introduced. As mentioned above, the swarm of artificial bees contains three types of bees: employed bees, onlookers, and scouts. In an optimization problem, a food source is a possible solution, and the nectar amount of this food source reflects the quality of the corresponding solution. In clustering, the clustering results are determined by the positions of the cluster centers. The clustering problem can therefore be regarded as the optimization of the cluster centers, and a group of cluster centers is a possible solution. For clustering data with both numeric and categorical attributes, let $f_i = \{C_1, C_2, \ldots, C_k\}$ denote a food source, where $C_l$ is the prototype of the cluster l.
In the proposed algorithm, the artificial bees are divided into two parts: the first half of the artificial bees are the employed bees, and the remaining ones are the onlookers. There is only one employed bee on each food source, and therefore the number of employed bees equals the number of solutions in the population. Let $P_{fs} = \{f_1, f_2, \ldots, f_T\}$ denote the population of food sources, where T is the number of food sources and $f_i$ is the ith food source. Following the standard ABC algorithm [22], the probability of the ith food source being selected by an onlooker is formulated as:
$$prob_i = \frac{NA(f_i)}{\sum_{j=1}^{T} NA(f_j)}, \qquad (8)$$
where $NA(f_i)$ is the nectar amount of the food source $f_i$.
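The probability-based selection of an onlooker can be sketched as a roulette wheel. The snippet below assumes the proportional rule of the standard ABC algorithm [22], in which the probability of a food source is its nectar amount divided by the total nectar amount; the function names are illustrative.

```python
import random


def selection_probabilities(nectar_amounts):
    """Probability of each food source, proportional to its nectar amount."""
    total = sum(nectar_amounts)
    return [na / total for na in nectar_amounts]


def choose_food_source(probabilities, rng):
    """Roulette-wheel selection: return the index of the chosen food source."""
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probabilities):
        cumulative += p
        if r < cumulative:
            return i
    return len(probabilities) - 1  # guard against floating-point rounding
```

With nectar amounts `[1.0, 3.0]`, the second food source is selected about three times as often as the first.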
To obtain a candidate food source from the current one, the One-step K-Prototypes procedure, abbreviated as OKP, is presented first. Essentially, the OKP procedure is one iteration step in the search process of the k-prototypes algorithm. The OKP procedure is utilized to look for a neighbor of the current food source in the exploitation process, which is executed by the employed bees and onlookers. Let $f_i$ be the current food source. The OKP procedure then contains two steps:
1) For each data object in the dataset X, allocate it to the cluster with the nearest prototype, and thereby generate a partition matrix U; concretely, if the ith data object is a member of the lth cluster, then $u_{il} = 1$; otherwise $u_{il} = 0$, where $u_{il}$ is an element of U;
2) Obtain the prototypes according to the partition matrix U, and thus generate a candidate food source $f_i'$.
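The two steps of the OKP procedure can be sketched as a single function. This is an illustrative version under assumptions: it uses one weight `mu` for all clusters, represents a food source as a list of prototype lists, and keeps the old prototype when a cluster receives no data objects, a case the description above does not cover.

```python
def okp(data, food_source, num_idx, cat_idx, mu=1.0):
    """One assignment step followed by one prototype update."""
    k = len(food_source)

    def dist(x, q):
        numeric = sum((x[j] - q[j]) ** 2 for j in num_idx)
        categorical = sum(1 for j in cat_idx if x[j] != q[j])
        return numeric + mu * categorical

    # Step 1: hard partition, labels[i] = l is equivalent to u_il = 1
    labels = [min(range(k), key=lambda l: dist(x, food_source[l])) for x in data]

    # Step 2: recompute the prototypes from the partition
    candidate = []
    for l in range(k):
        members = [x for x, lab in zip(data, labels) if lab == l]
        proto = list(food_source[l])
        if members:  # assumption: an empty cluster keeps its old prototype
            for j in num_idx:
                proto[j] = sum(x[j] for x in members) / len(members)
            for j in cat_idx:
                values = [x[j] for x in members]
                proto[j] = max(sorted(set(values)), key=values.count)
        candidate.append(proto)
    return labels, candidate
```

The returned `candidate` plays the role of the neighbor food source that is then compared with the current one.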
In the OKP procedure, we adopt Witten and Frank's normalization scheme (WF normalization scheme) [30] to bring the different numeric attributes onto the same scale. The WF normalization scheme is given by:
$$x_{ij}' = \frac{x_{ij} - v_{j,\min}}{v_{j,\max} - v_{j,\min}},$$
where $v_{j,\min}$ ($v_{j,\max}$) is the minimum (maximum) value of the jth attribute, and $x_{ij}'$ is the normalized value of $x_{ij}$.
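A minimal sketch of this normalization, assuming the usual min-max form in which each numeric attribute is mapped onto [0, 1], is given below. The handling of a constant attribute (mapped to 0.0) is an assumption added to avoid division by zero.

```python
def min_max_normalize(data, num_idx):
    """Rescale every numeric attribute listed in num_idx to the range [0, 1]."""
    lows = {j: min(x[j] for x in data) for j in num_idx}
    highs = {j: max(x[j] for x in data) for j in num_idx}
    normalized = []
    for x in data:
        row = list(x)
        for j in num_idx:
            span = highs[j] - lows[j]
            # Assumption: a constant attribute is mapped to 0.0
            row[j] = (x[j] - lows[j]) / span if span > 0 else 0.0
        normalized.append(row)
    return normalized
```

Categorical attributes are left untouched, so the output can be passed directly to a mixed-data dissimilarity measure.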
Once a food source is exhausted, the corresponding employed bee becomes a scout. In our algorithm, the parameter NT, which is the given number of trials, is introduced to control the abandonment of a food source. More precisely, if a food source cannot be improved further through NT trials, this food source is abandoned, and the relevant employed bee becomes a scout. The scout then searches for a new food source in the search space. The Kent map, which is one of the chaotic maps, is introduced in the search process of a scout due to its ergodicity, irregularity, and stochastic property. The Kent map [2, 32] with the parameter β = 0.7 is given as follows:
$$CN_{l+1} = \begin{cases} CN_l / 0.7, & 0 < CN_l \le 0.7, \\ (1 - CN_l) / 0.3, & 0.7 < CN_l < 1, \end{cases} \qquad (10)$$
where $CN_l$ is the lth chaotic number. The chaotic numbers generated by the Kent map are in the range (0,1). As mentioned above, a food source in the clustering scenario is a group of cluster centers, and each cluster center is a set of attribute values. For mixed numeric and categorical attributes, the search process of a scout differs between the two types of attributes. For a categorical attribute j, the search operation of a scout is performed as follows: the scout selects a categorical value in a chaotic way from the collection of attribute values by the following equation:
where $catVal_j$ is the categorical value, $C_j$ is the collection of categorical values for the jth categorical attribute in the dataset, and the index is given by:
Here, len denotes the number of categorical values in the collection $C_j$, and floor(s) means the greatest integer that is less than or equal to s. For a numeric attribute j, the value is determined by:
where $numVal_j$ denotes the jth attribute value;
where $i \in \{1, 2, \ldots, T\}$, and scoutSearch(X) is the search operation of a scout. The pseudo-code of the scoutSearch(X) process is given by:
1) Randomly initialize the chaotic variables; let $C_r$ denote the rth cluster center, and initialize r = 1;
2) For the rth cluster center $C_r$, let $A_j$ denote the jth attribute, and set j = 1;
3) For the jth attribute:
a) If the jth attribute is a numeric attribute, update the chaotic variable for this attribute according to Eq. (10), and get the jth attribute value for the cluster center $C_r$ by using Eq. (13);
b) If the jth attribute is a categorical attribute, update the chaotic variable for this attribute according to Eq. (10), and get the jth attribute value for the cluster center $C_r$ by using Eq. (11) and Eq. (12);
4) Set j = j + 1; if j ≤ m, go to 3);
5) Set r = r + 1; if r ≤ k, go to 2); otherwise, output the cluster centers $C_1, \ldots, C_k$ as the new food source.
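The scout search can be sketched in Python as follows. Several details are assumptions made for illustration, since they are not fixed by the description above: a single chaotic variable is shared by all attributes, a numeric value is obtained by scaling the chaotic number into the observed range of the attribute, and a categorical value is selected with the zero-based index `floor(chaos * len)`. The names `num_ranges` and `cat_values` are hypothetical.

```python
import math
import random


def kent(y, beta=0.7):
    """One iteration of the Kent map."""
    return y / beta if y <= beta else (1.0 - y) / (1.0 - beta)


def scout_search(k, num_ranges, cat_values, rng, beta=0.7):
    """Build k cluster centers attribute by attribute in a chaotic way.

    num_ranges: {attribute index: (min, max)} for numeric attributes
    cat_values: {attribute index: list of values} for categorical attributes
    """
    m = len(num_ranges) + len(cat_values)
    chaos = rng.uniform(0.01, 0.99)  # random initial chaotic variable
    centers = []
    for _ in range(k):
        center = [None] * m
        for j, (low, high) in num_ranges.items():
            chaos = kent(chaos, beta)
            # Assumption: scale the chaotic number into [low, high]
            center[j] = low + chaos * (high - low)
        for j, values in cat_values.items():
            chaos = kent(chaos, beta)
            # Assumption: zero-based index, clamped to the last position
            index = min(math.floor(chaos * len(values)), len(values) - 1)
            center[j] = values[index]
        centers.append(center)
    return centers
```

For instance, `scout_search(3, {0: (0.0, 10.0)}, {1: ["red", "green", "blue"]}, random.Random(1))` returns three candidate centers of the form `[number, color]`.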
Having presented the calculation formulas for all relevant variables, the pseudo-code of the proposed ABC-K-Prototypes algorithm for the mixed data is described as follows:
1) Initialize the group of food sources $G_{fs} = \{f_1, f_2, \ldots, f_T\}$ in a random way; concretely, for each food source $f_i$ ($1 \le i \le T$), pick k data objects randomly from the dataset X as the prototypes of the clusters; set the exploitation numbers $En_i = 0$ ($1 \le i \le T$) for these food sources;
2) Calculate the nectar amounts $NA(f_1), NA(f_2), \ldots, NA(f_T)$ of these food sources according to Eq. (7);
3) Set the cycle number CN = 1;
4) For each employed bee:
a) Adopt the OKP procedure to obtain a new food source $f_i'$ from the current food source $f_i$, and set $En_i = En_i + 1$;
b) Calculate the nectar amount of the obtained food source, that is, $NA(f_i')$, according to Eq. (7);
c) If $NA(f_i) < NA(f_i')$, the current food source $f_i$ is replaced by the new food source $f_i'$; otherwise, the current food source $f_i$ is unchanged;
5) Assess the probability $prob_i$ of each food source $f_i$ according to Eq. (8);
6) For each onlooker bee:
a) Choose a food source $f_i$ as the current food source according to the probability $prob_i$;
b) Adopt the OKP procedure to obtain a new food source $f_i'$ from the current food source $f_i$, and set $En_i = En_i + 1$;
c) Calculate the nectar amount $NA(f_i')$ of the food source $f_i'$;
d) If $NA(f_i) < NA(f_i')$, the current food source $f_i$ is replaced by the new food source $f_i'$; otherwise, the current food source $f_i$ is retained;
e) Update the probability $prob_i$ ($1 \le i \le T$) of all food sources according to Eq. (8);
7) For each food source $f_i$, if the exploitation number $En_i$ is equal to or larger than the number of trials NT, this food source is abandoned, and the corresponding employed bee becomes a scout;
8) If there exists an abandoned food source $f_i$:
a) Dispatch the scout into the search space to forage for H candidate food sources;
b) Calculate the nectar amounts of these H candidate food sources;
c) Pick the candidate with the highest nectar amount as the new food source $f_i'$, and initialize its exploitation number $En_i = 0$;
d) If $NA(f_i) < NA(f_i')$, the current food source $f_i$ is replaced by the new food source $f_i'$; otherwise, the current food source $f_i$ is retained;
9) Set CN = CN + 1;
10) If CN ≤ MCN, go to 4); otherwise, terminate the algorithm and output the best food source found.
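Steps 7) and 8), that is, the abandonment test and the multi-source search of the scout, can be sketched as a small helper. The function and parameter names are hypothetical, and `nectar` and `scout_search` are assumed to be supplied by the caller; the sketch only illustrates the control flow.

```python
def replace_exhausted_sources(sources, trials, nectar, scout_search,
                              n_trials, n_candidates):
    """For every exhausted food source, scout n_candidates new sources and
    keep the best one if it improves on the abandoned source."""
    for i in range(len(sources)):
        if trials[i] < n_trials:
            continue                      # step 7: source is not exhausted
        # step 8 a): the scout forages for H = n_candidates food sources
        candidates = [scout_search() for _ in range(n_candidates)]
        # step 8 b)-c): evaluate them and pick the richest candidate
        best = max(candidates, key=nectar)
        trials[i] = 0
        # step 8 d): replace the current source only if the candidate is better
        if nectar(sources[i]) < nectar(best):
            sources[i] = best
    return sources, trials
```

Evaluating several candidates per scout, instead of one, is what the multi-source search refers to: the scout returns the most promising of H locations rather than an arbitrary one.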
Complexity analysis
In this section, the time and space complexities of the proposed ABC-K-Prototypes approach are analyzed. The time complexity of the proposed method mainly consists of five parts: the initialization, the search of the employed bees, the calculation of the probabilities of the food sources, the search of the scouts, and the search of the onlookers. The computational costs of these five parts are O (Tk), O (T (nkm + nkp + nkC (m - p))), O (T), O (Hkm), and O (T (nkm + kpn + (m - p) Cn)), respectively. Here, n is the number of data objects in the dataset X; m is the number of attributes; k is the number of clusters; p is the number of numeric attributes; T is the number of employed bees or food sources; C is the maximum number of category values over all categorical attributes; and H is the number of candidate food sources for the scout bee. Therefore, the overall time complexity of the proposed approach is O (Tk + s (Hkm + T (nkm + nkp + nkC (m - p)))), where s is the number of iterations. For the space complexity, O (mn) is required to store the dataset X, O ((T + H) km) to store the food sources, and O (nk) to store the partition matrix. Therefore, the overall space complexity of the proposed method is O (mn + (T + H) km + nk).
Convergence analysis
In our approach, the search process includes the exploration and exploitation processes, both of which are performed by the ABC search strategy. The current solution is replaced by a new solution only if the new solution is better than the current one in the exploitation or exploration process. Therefore, each possible solution appears in the current solution list at most once. If the value of MCN (maximum cycle number) is high enough, the global optimal solution is very likely to be found; otherwise, the algorithm may converge to a local optimum. In other words, the higher the value of MCN, the greater the probability that ABC-K-Prototypes converges to the global optimum. This probability approaches 100% as MCN tends to infinity. Therefore, the convergence of our algorithm to a globally optimal solution is guaranteed in this limiting sense, provided that MCN is high enough.
Experimental results and discussion
To evaluate its performance, the proposed ABC-K-Prototypes clustering algorithm is executed on three mixed datasets: Zoo, Heart Disease, and Credit Approval, all of which are obtained from the well-known UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets.html). In this work, Yang's accuracy measure [33], which is one of the most commonly used criteria, is utilized to evaluate the obtained clustering results. In Yang's method, the accuracy (AC) is defined as:
$$AC = \frac{1}{n} \sum_{i=1}^{k} a_i,$$
where $a_i$ is the number of data objects that are correctly assigned to the class $C_i$, k is the number of clusters, and n is the number of data objects in the dataset. According to this definition, AC has the same meaning as the clustering accuracy r given in [16].
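The accuracy measure can be computed as in the following sketch. How clusters are matched to classes is an assumption here: each cluster is associated with the class that occurs most often among its members, and the members of that class are counted as correctly assigned.

```python
from collections import Counter


def clustering_accuracy(cluster_labels, class_labels):
    """Fraction of objects that belong to the majority class of their cluster."""
    n = len(class_labels)
    correct = 0
    for cluster in set(cluster_labels):
        members = [c for lab, c in zip(cluster_labels, class_labels)
                   if lab == cluster]
        # a_i: size of the most frequent class inside this cluster
        correct += Counter(members).most_common(1)[0][1]
    return correct / n
```

For the cluster labels `[0, 0, 0, 1, 1, 1]` and the classes `["a", "a", "b", "b", "b", "a"]`, four of the six objects agree with the majority class of their cluster, so the accuracy is 4/6.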
According to this measure, a higher value of AC implies a better clustering result. Four well-known algorithms, i.e., the K-Prototypes algorithm [15], the EKP algorithm [35], the SBAC algorithm [24], and the KL-FCM-GM algorithm [12], are selected for comparison with the proposed algorithm. For the performance analysis, the proposed ABC-K-Prototypes algorithm and these four algorithms are executed on three different datasets, and each algorithm is run for twenty trials on each dataset. The clustering results of the proposed ABC-K-Prototypes algorithm are then compared with those of the other four algorithms according to the average AC. All algorithms are implemented in Java and run on a computer with an Intel(R) Core(TM) i7 3.4GHz CPU and 16GB of RAM. The parameters of the proposed ABC-K-Prototypes algorithm are set as follows in all experiments: T=20, which is the typical value adopted in the original ABC algorithm [23]; MCN=100, NT=5, and H=5, which are set by rule of thumb. In all five algorithms, the number of clusters k is set to the number of classes given by the class information of the dataset. It is worth noting that, apart from the number of classes, no class information is utilized in the clustering process. The other parameters of the K-Prototypes, EKP, SBAC, and KL-FCM-GM algorithms are set to the values given in their original papers.
The experiments begin with the Zoo dataset. This dataset has 101 data objects, each of which is described by one numeric attribute and 16 categorical attributes. The last categorical attribute is the class attribute and has seven values. Therefore, each data object in the Zoo dataset belongs to one of seven classes. Table 1 summarizes the clustering results of the ABC-K-Prototypes, K-Prototypes, EKP, SBAC, and KL-FCM-GM algorithms on the Zoo dataset according to AC. The K-Prototypes, EKP, SBAC, and KL-FCM-GM algorithms give AC values of 0.806, 0.566, 0.426, and 0.864, respectively. In contrast, the proposed ABC-K-Prototypes algorithm gives a higher AC value of 0.886.
Table 1. The AC of the five algorithms on the Zoo dataset
The Heart Disease dataset comprises 303 patient instances, each of which has six numeric attributes and nine categorical attributes. The last two attributes are class attributes. When the 15th attribute is selected as the class attribute, the data objects in this dataset belong to one of five classes (s1, s2, s3, s4, and H), and each of them is described by 14 attributes; when the 14th attribute is chosen as the class attribute, the data objects belong to one of two classes (buff, sick), and each of them is described by 13 attributes. For the first case, Table 2 compares the clustering results of the ABC-K-Prototypes algorithm and the other four algorithms on the Heart Disease dataset according to AC. The K-Prototypes, EKP, SBAC, and KL-FCM-GM algorithms give AC values of 0.546, 0.545, 0.545, and 0.653, respectively. The proposed ABC-K-Prototypes algorithm gives an AC value of 0.648, which is higher than those of the K-Prototypes, EKP, and SBAC algorithms and slightly lower than that of the KL-FCM-GM algorithm.
Table 2. The AC of the five algorithms on the Heart Disease dataset (5 classes and 14 attributes)
For the second case, where each data object in the Heart Disease dataset has 13 attributes and the 14th attribute is taken as the class attribute, Table 3 compares the clustering results of the ABC-K-Prototypes algorithm and the other four algorithms according to AC. The K-Prototypes, EKP, SBAC, and KL-FCM-GM algorithms give AC values of 0.577, 0.577, 0.752, and 0.758, respectively. In contrast, the proposed ABC-K-Prototypes algorithm gives a higher AC value of 0.809.
Table 3. The AC of the five algorithms on the Heart Disease dataset (2 classes and 13 attributes)
The Credit Approval dataset consists of 690 data objects from credit card organizations, where each data object has ten categorical attributes and six numeric attributes (the last categorical one is the class attribute). The data objects in this dataset belong to one of two classes: negative (383) and positive (307). Table 4 compares the clustering results of the ABC-K-Prototypes algorithm and the other four algorithms on this dataset according to AC. The K-Prototypes, EKP, SBAC, and KL-FCM-GM algorithms give AC values of 0.562, 0.560, 0.555, and 0.584, respectively. In contrast, the proposed ABC-K-Prototypes algorithm gives a higher AC value of 0.794.
Table 4. The AC of the five algorithms on the Credit Approval dataset
In the proposed ABC-K-Prototypes algorithm, the multi-source search is adopted to accelerate the convergence of the algorithm. To illustrate the efficiency of the multi-source search, we run the ABC-K-Prototypes algorithm with and without the multi-source search for twenty trials on each of the three datasets. Table 5 lists the average number of iterations of the proposed ABC-K-Prototypes algorithm with and without the multi-source search on the different datasets. From this table, we can see that the average number of iterations of the ABC-K-Prototypes algorithm with the multi-source search is lower than that of the same algorithm without the multi-source search.
Table 5. The average number of iterations of the ABC-K-Prototypes algorithm with and without the multi-source search on the different datasets
Table 6 summarizes the number of iterations of the ABC-K-Prototypes algorithm and the EKP algorithm on each of the three datasets. The results in Table 6 show that the number of iterations required by the ABC-K-Prototypes algorithm is lower than that required by the EKP algorithm in most cases.
Table 6. The number of iterations of the ABC-K-Prototypes algorithm and the EKP algorithm on the different datasets
The experimental results in Tables 1-6 show that the proposed ABC-K-Prototypes algorithm achieves the highest AC values on most datasets, and therefore the proposed algorithm outperforms the other four algorithms in most cases according to the measure AC. Furthermore, the ABC-K-Prototypes algorithm requires fewer iterations in most cases. The reason for the success of the ABC-K-Prototypes algorithm is as follows: by introducing the OKP operator and the ABC optimization framework, this approach has the ability of both global search (exploration) and local search (exploitation). More specifically, the employed and onlooker bees implement the local search by utilizing the OKP operator, and the scout bees execute the global search in a chaotic way. Therefore, the proposed ABC-K-Prototypes algorithm can obtain optimal or near-optimal results.
Conclusion
Data objects with both numeric and categorical attributes are ubiquitous in real-world applications. The k-prototypes type algorithms are well-known for their high efficiency in clustering this type of data. However, these algorithms are prone to becoming trapped in local optima.
To solve this issue, the novel clustering algorithm ABC-K-Prototypes, which is based on the traditional k-prototypes algorithm, the ABC optimization strategy, and chaos theory, is presented in this paper. In the proposed algorithm, the employed bees and onlookers utilize the OKP procedure to exploit the food sources around the existing food sources, and the scouts explore the food sources in the entire search space in a chaotic way. To accelerate the convergence of the ABC-K-Prototypes algorithm, the multi-source search is utilized in the search process of scout bees. The time complexity, space complexity, and convergence of the ABC-K-Prototypes algorithm are analyzed, and this algorithm is tested on three datasets with both numeric and categorical attributes. The experimental results validate the performance of the proposed algorithm.
For simplicity, the Kent map is adopted to generate the chaotic sequences in this paper. In future work, we will focus on applying other chaotic maps and swarm intelligence algorithms to cluster mixed data.
Acknowledgments
This work was supported by the National Key R&D Program of China under Grant No. 2017YFC0909200, the National Natural Science Foundation of China (NSFC) under Grant Nos. (61403077, 61502093, 11501095, 81502291, 61802057, 61872076), the Natural Science Foundation of the Education Department of Jilin Province under Grant Nos. (2016504, 2016505), and the Science and Technology Development Plan of Jilin Province under Grant Nos. (20170520058JH, 20170520051JH, 20180414006GH, 20180520028JH, 20150101057JC).
