Abstract
This paper proposes a modified single-objective binary cuckoo search for association rule mining (MBCS-ARM). The proposed algorithm includes a novel representations of individuals, which tackles the problems of large dimensionality with an increasing number of attributes. The MBCS-ARM also supports the mining of rules, where intervals of attributes can either be negative or positive. It uses an objective function composed of support and confidence weighted by two parameters, which control the importance of each measure in the found rules. It is tested on eight publicly available databases, while also compared to several single-objective evolutionary algorithms, and traditional algorithms, all found in the KEEL software tool. The experiments show promising results of the MBCS-ARM, compared to other algorithms, by producing rules, which are interesting, simple, and also easy to understand, which is of great importance in domains like medicine.
Introduction
In the modern world, there is a huge amount of data stored in real-world databases and this trend continues to grow daily. There are different kinds of databases such as medical, scientific, financial, and marketing transaction data. Many of them hold large amounts of data. Therefore, it has become an issue how to effectively analyze these data and find the interesting information hidden beneath. In the past decade, the most successful method for solving this kind of problems has been data mining, or to be more specific: classification, clustering, regression, and association rule mining [6].
Recently association rule mining has again gained a lot of attention within the scientific community. This is a popular method for discovering relations between attributes in large databases. The goal of the method is to find those rules having the high level of interestingness based on various measures. The pioneer of association rule mining is Agrawal [1, 2] who presented the idea of discovering regularities between products in large-scale transaction databases. An Apriori algorithm was proposed, which has become a standard approach in association rule mining. Normally, the algorithm selects the mined association rules according to interestingness measures, like support and confidence identifying the most important relationship of attributes in transaction databases.
Since the datasets are getting larger and more complex [16], the traditional algorithms, like Apriori, usually face problems of high computational complexity when generating association rules. To overcome this problem, researchers have proposed new stochastic population-based nature-inspired algorithms that treat the rule mining process as an optimization process by applying search heuristics to the underlying optimization problem.
The heuristics for association rule mining normally involve one or more interestingness measures depending on whether we approach the as a single-objective or multi-objective problem. The measures are disguised as fitness values guiding the optimization process towards the more promising regions of the search space, where more association rules of better quality can be found.
Roughly speaking, we can divide the stochastic nature-inspired algorithms into two large groups: evolutionary algorithms (EAs) and swarm intelligence (SI) based algorithms. Both families are population-based algorithms, which generate new solutions by applying appropriate variation operators (e.g., crossover, mutation, etc.).
Many single and multi-objective methods have been developed based on EAs, as follows: genetic algorithms (GA) [11], genetic programming (GP) [20], differential evolution (DE) [31]; and SI: particle swarm optimization (PSO) [19], artificial bee colony (ABC) [18], bat algorithm (BA) [33], and cuckoo search (CS) [34].
Luna et al. [22] developed a GP based algorithm for mining rare class association rules that cannot be found using the traditional data mining algorithms. Song et al. [30] discuss the effectiveness of a multi-objective based binary BA, comparing it to single objective binary PSO and BA, and the Apriori algorithm. They conclude that the proposed algorithm is feasible and highly effective. Yan et al. [32] designed a genetic algorithm-based strategy for identifying association rules without specifying the minimum support. Confidence is used as a fitness function, while the FP-Tree algorithm [12] is implemented to improve the algorithm efficiency. In [10] Ghosh et al. presented a Pareto based genetic algorithm for extraction of interestingness rules on large market-basket type dataset. They use three commonly used measures to evaluate individuals: support, comprehensiveness, and interestingness. Other interesting approaches are described in the following works [5, 28].
Another multi-objective genetic algorithm is described in [25], where the problem of dealing with numerical data is tackled. Same rule measures are used as in [10]. Heraguemi et al. [13] proposed a bat algorithm, which produced better results when compared to the FP-Growth algorithm [12]. Their method relies on the minimum support and confidence that must be supplied by the user. Sarath et al. [28] proposed a binary particle swarm optimization algorithm (BPSO), for operating on transactional databases. With a product of support and confidence as the fitness function, they evaluate their method on three transactional datasets, and a real life bank dataset. Based on the results, they conclude that BPSO is a good alternative to Apriori and FP-growth algorithms.
Algorithms based on binary encoding of individuals often suffer on the dimensionality of the problem when confronted with datasets introducing attributes with many possible attribute values. These methods are also mainly used for working on transactional databases, while not appropriate for dealing with datasets that include categorical or even numerical values. EA or SI-based algorithms also focus mainly on discovering positive association rules, while ignoring the potential rules that involve negation of attributes (an exception is the methodin [3]).
With this in mind, the paper proposes a modified binary cuckoo search for association rule mining (MBCS-ARM) based on a binary representation of individuals, where each individual encodes the corresponding association rule. Thus, all attributes in a database are encoded as binary strings with additional three control bits determining the presence/absence of the attribute in the corresponding association rule, when the attribute is part of antecedent/consequent part of the rule, and when the attribute values are taken from positive/negative domain of attribute values. The MBCS-ARM was tested on eight publicly available datasets, where seven of them are available in the KEEL tool [4] or the UCI machine learning repository [21], and one dataset supplied by a professional cyclist (produced for our research purpose) [17]. The performance of the proposed algorithm was compared to the four single-objective EAs for association rule mining, and three traditional algorithms for solving the same problem.
In summary, the proposed MBCS-ARM algorithm brings the following two key novelties: (1) the novel (i.e., binary) representation of individuals, and (2) the application of a single-objective modified binary CS algorithm to association rule mining.
The remainder of the paper is organized as follows. In Section 2, a formal definition of association rule mining is given, along with some descriptions of traditional algorithms. Section 3 presents the original and the proposed modified binary CS algorithms. Section 4 deals with describing the experimental environment (i.e. parameter settings, and used databases), and a brief review of EAs used for association rule mining. In Section 5, the results are depicted, while in Section 6, the paper is concluded outlining the directions for the future work.
Association rule mining
Mathematically, the association rule mining is defined as follows. Let us suppose a set of attributes I = {a1, …, a
m
} called items and a set of transactions D = {t1, …, t
n
} called a dataset, where each transaction has a unique identification and contains a subset of items t
i
⊂ I called an item-set, and m denotes the number of items and n the number of transactions. Then, the association rule is an implication X → Y, where X and Y are two item-sets and it holds X∩ Y = ∅. The criteria to identify the most important relationships in the specific rule are estimated according to two interestingness measures as follows: Minimum support that is defined as proportion of transactions containing X and Y, and the total number of transactions in database, as follows:
Minimum confidence that is defined as proportion of transactions that contains X also contains Y, as follows:
There are many traditional association rule mining algorithms, but the most used are Apriori, Eclat, and FP-Growth. Therefore, a brief description of the mentioned algorithms follows in the nextsections.
The Apriori algorithm proposed by Agrawal [2] is the most representative association rule mining algorithm. It works in two steps, where the first step is dedicated for generating the most frequently arisen item-sets in a transaction database. They are selected from all possible item-sets using the parameter minimum support as defined by Equation (1). After the most frequently arisen item-sets are selected, those occurring less than the specified minimum support are removed. From these item-sets, the rules are selected in the second step using the parameter minimum confidence as defined by Equation (2). Thus, all rules having the lower confidence than the one provided are removed.
Eclat
First introduced by Zaki et al. [35] in 1997, the Eclat algorithm finds the frequently arisen item-sets by a depth first search on the subset lattice and determines the support of these item-sets by intersecting transaction lists. It is suitable for parallel processing with local enhancing properties. For more information readers are referred to [35].
FP-Growth
FP-Growth algorithm was proposed by Han et al. [12] in 2000 with an intention to overcome the bottlenecks of Apriori. It uses an extended prefix-tree structure for storing crucial information about frequently arisen patterns found in the transaction database. All item-sets are generated by only two passes over the whole database. More information can be found in [12].
Cuckoo search algorithm for association rule mining
Original cuckoo search
Cuckoo search (CS) is a stochastic population-based nature-inspired algorithm that has been introduced by Yang and Deb in 2009. CS belongs to the SI-based algorithms [9], and it is inspired by natural behavior of some cuckoo species and their brood parasitism. Typically, some cuckoo species lay their eggs in the nests of other birds in order to care for them as if they were their own. To trap the behavior of cuckoos in nature on the one hand and to adapt it to be suitable for using as the computer algorithm on the other hand, authors idealized threerules [34]: Each cuckoo lays one egg at a time, and dumps it in a randomly chosen nest. The best nests with high-quality eggs will be carried over to the next generations. The number of available host nests is fixed and the egg laid by a cuckoo may be discovered by the host bird with a probability p
a
∈ (0, 1). In this case, the host bird can either get rid of the egg, or simply abandon the nest and build a completely new nest.
Initially, a solution corresponding to a cuckoo egg placed inside the nest is randomly generated within the search space. Mathematically, the position of a nest is defined as:
Equation (4) describes the local random walk that is mainly intended for exploitation of the current solutions. This random walk is governed by Lévy flight distribution expressed as L (s, λ) and scaled by the scaling factor α > 0 of the step size s.
The pseudocode of the CS is presented in Algorithm 1 (see Fig. 1 for the flowchart).
Since the CS was applied to association rule mining, it seems that a binary representation of solutions is more promising to achieve the better results. In line with this, the modified binary CS (MBCS) is proposed, where the solution is represented as a n dimensional boolean lattice, in which the solutions are updated across the corners of a hypercube [27].
In order to prepare the MBCS for association rule mining, the following five steps need to be preformed: identifying an attribute domain, rule representation, candidate solution generation, fitness evaluation, association rule mining using MBCS.
In the remainder of the paper the mentioned steps are discussed in detail.
1: generate_initial_host_nest_locations;
2: FE = 0;
3:
4:
5:
6: f
trial
= evaluate_the_new_solution(
7: FE = FE+ 1;
8: j =⌊ rand (0, 1) * NP + 1 ⌋;
9:
10:
11:
12:
13: init _ nest (
14:
15:
16:
18:
19:
20:
1:
2:
3:
4:
5:
6:
7:
8:
9:
Identifying an attribute domain
Attributes in MBCS are represented as binary strings. The size of the strings is determined with the number of attributes. It is well known that the maximum 2 n different attributes can be represented using a binary string of length n.
In order to identify the number of attributes, an analysis of a transaction database D needs to be performed. As a result, sets of attribute values a i = {αi,1, …, αi,m i } are defined after performing the analysis that contain the unique attribute values αi,j for i = 1, …, m and j = 1, …, m i , where m is the number of attributes and m i the number of attribute values for attribute a i . Let us notice that data in a database can either be categorical, numerical or transactional. However, if data are continuous, a discretization must be performed.
Rule representation
As found in literature, there are two well established encodings for representing association rules in EA domain. The first is the Pittsburgh approach [15], where each individual represents a set of association rules, while by the second so called Michigan approach [15], each individual represents the separate association rule. The main downfall of both approaches is that the search space of the EA grows quite big by increasing the number of attributes as well as the number of attribute values. In this work we present a novel representation of individuals for association rule mining, which tries to overcome this problem.
Each individual represents a separate rule as in the Michigan encoding approach, where each attribute of the rule is encoded with three control bits as follows: the first control bit indicates presence/absence of this attribute in the corresponding association rule, the second control bit denotes, whether the attribute belongs to the antecedent (bit one) or the consequent of the rule (bit zero), the third control bit defines, whether the attribute values are taken from a positive/negative attribute domain, and the actual value of the corresponding attribute.
It is obvious that, when the first control bit of an attribute is set then it is present in the corresponding association rule. Contrary, when this bit is not set means that the attribute does not belong to the rule. The similar is true for the second bit. Third control bit is reserved for defining the positive or negative attribute domain, where the positive domain means, that the actual value of the attribute is used and negative whether all other values except the actual value are used. Let us assume an attribute a = {1, 2, 3}. Then, the positive domain for an attribute value {2} is the same value, i.e.,
Finally, the attribute value is represented as a binary value, where the number of bits depends on the minimum value of bits m i required to represent the maximum number of different attribute values 2 m i for an attribute a i . However, when using this individual representation, a problem of redundant values can occur when an invalid attribute value isrepresented.
For instance, the attribute a2 has five different attribute values. That means that it needs three bits for representing the attribute values, in other words {0, 1, 2, 3, 4, 5, 6, 7}. Consequently, only first five attribute values are needed for representing attribute values, while other three values {5, 6, 7} are redundant. However, these redundant values are simply fixed by reinitializing the invalid value to the correct domain in the study.
Each attribute is thus encoded with as little information as possible. Figure 2 depicts this process for easier understanding.

Flowchart of the CS algorithm.

Representation of the i-th individual, where each attribute is presented using 3 control bits, and the actual attribute value. For example the control bits of attribute a1 are 1-0-1, which means that the attribute is present in the rule, it is part of the antecedent, and the interval of the attribute is negative. The whole example can be interpreted as follows: the first and last attributes are present in the rule and are a part of the antecedents, while the second attribute is part of the consequent. ((a1¬ =2) ∧ (a m = 4) ∧ … → (a2 = 3) ∧ …).
The candidate solution is generated by the original CS according to Equation (4) and this enters in the selection process with randomly selected solution in the current population. If the candidate solution is better than the existing one, it becomes the new population member. However, the generation in MBCS slightly differs from those in the original CS.
The generation of candidate solution in MBCS algorithm is accomplished using Algorithm 2, where the candidate solution is generated using the same Equation (4) as the original CS. However, this equation is conducted on the binary vector in the case of MBCS, where only two values 0/1 are allowed. This values are preserved by calculating the threshold using the following function [27]:
Since the proposed method for association rule mining is applied using the MBCS algorithm, a metric for identifying the quality of rules (i.e. a fitness function) must be determined. There are several different fitness functions applied in the literature, varying from simple to very complex. Our method relies on the already mentioned measures of support (Equation (1)) and confidence (Equation (2)) for the mined association rule X → Y as follows [13]:
Both confidence and support are weighted by factors β and γ, which control the importance of both measures. Increasing the value β, would result in the algorithm finding rules with an increased confidence. In contrary, increasing the value γ would result in rules favoring the increased support. Depending on the application of rule mining, the user can control the importance of both measures.
A detailed explanation of the proposed algorithm for association rule mining (MBSC-ARM) is given in this subsection. As described in the beginning of Section 3.2, the algorithm consists of two parts. In the first part, the data are mapped from real-coded elements of the CS to binary encoded strings of the MBSC, while in the second part the MBCS is employed for finding the optimal association rules. The pseudo-code of the proposed algorithm can be seen in Algorithm 3.
1: k = 0
2: R = {ø}
3:
4: rule = BCS ()
5:
6: R = R ∪ rule
7: k = 0
8:
9: k = k + 1
10:
11:
The algorithm is implemented in such a way, that when the termination condition is reached by the algorithm, the best association rule found is put into an archive of solutions. If the same solution is already in the archive, a counter k is increased. When this counter reaches the maximum number of K, the archive becomes the final result of different solutions. The K parameter is supplied by the user.
Experimental environment
This section describes the experiment settings, the used databases and PC configuration on which the experiments were conducted.
Parameter settings
During testing the parameters of MBCS-ARM were as described in Table 1. As can be seen from Table 1, NP denotes the population size, G the maximum number of generations, and p a is the switching probability. The parameters β and γ are used to weigh the importance of either support or confidence, and lastly K is the number of restarts in a row, which is dataset dependent, i.e. the bigger datasets with more attributes require a smaller number of restarts, because to many rules would be produced otherwise.
Parameters of BCS for association rule mining
Parameters of BCS for association rule mining
All runs were made on a desktop computer, with the following configuration: Processor: Intel(R) Core(TM) i5-4570 CPU @ 3.20 GHz, RAM: 16 GB, Operating system: Linux Mint 17 Qiana.
The MBCS-ARM was implemented in the C++ programming language.
Test databases
The datasets listed in Table 2 were used for evaluating the performance of the MBCS-ARM. The characteristics of each database is also presented in terms of the number of transactions, attributes and attribute types. The latter can be one of the following: categorical, integer, and real. All quantitative attributes have been partitioned into four categories (as suggested by [3]), by equal frequencysampling.
Datasets used in the study. The abbreviations for attribute types are: C (categorical), I (integer), R (real)
Datasets used in the study. The abbreviations for attribute types are: C (categorical), I (integer), R (real)
In the experimental part of this work, our algorithm was compared to all single-objective evolutionary, and three traditional association rule mining algorithms found in the KEEL software tool [4]. The evolutionary algorithms are: while the traditional algorithms are:
All listed evolutionary algorithms (Alatasetal, EARMGA, GENAR, and GAR) are GAs. The Alatasetal algorithm is able to mine both positive and negative intervals for attribute values, without specifying the minimum support and confidence. Another interesting fact is that the initialization is not done randomly, but close to final solutions. The rules are obtained from the last generation of a single run. The EARMGA is also independent of minimum support and confidence. It uses a generalized FP-Tree to improve the algorithm efficiency. Another characteristic of this algorithm is the specification of rule length, which must be supplied by the user. GENAR is able to find rules in numeric datasets by incorporating the lower and upper bounds of each attribute in the solution. Rule length is always the same as the number of attributes, where the last attribute forms the consequent. The used objective function penalizes those solutions, which cover the same records in the dataset. The GAR algorithm is an extension of GENAR, where frequent item sets are firstly discovered, on which the rules are later produced. Like GENAR this algorithm also searches for rules in numeric dataset with no need for discretization.
All traditional algorithms have previously been described in Section 2.
Let us notice, that all algorithms were run with their default parameters, as set in the KEEL tool. Each algorithm in the comparison was executed 10 times, so the reported results are an average of those 10 runs.
Results
In this section, the results of the MBCS-ARM are analyzed and compared to the algorithms in the KEEL tool. Let us notice that two variants of the MBCS-ARM are reported, denoted as MBCS-ARM± and MBCS-ARM+, where the first includes the possibility of mining positive an negative intervals of a attribute in association rules, while the latter searches only for positive intervals.
The following experiments have been performed: comparison of MBCS-ARM± and MBCS-ARM+, comparison of the best MBCS-ARM variant with evolutionary algorithms in the KEEL tool, comparison of the best MBCS-ARM variant with traditional algorithms.
Firstly, both variants of the proposed MBCS-ARM will be compared against each other, then the better one will be compared to the evolutionary algorithms, which were presented in Section 4. The comparison will be made on the basis of the following measures: number of rules produced, support, confidence, coverage (defined as the support of the antecedent), and antecedent length. Secondly the best MBCS-ARM variant will also be compared to traditional association rule mining algorithms, also using the same measures as with the EAs. Then lastly some interesting rules will be presented, obtained with the MBCS-ARM. All reported values are averages of 10 runs.
All experiments are discussed in detail in the following subsections.
Comparison of MBCS-ARM± and MBCS-ARM+
Let us analyze both variants of the MBCS-ARM from Table 3. The following conclusions can be made: the MBCS-ARM+ almost always produces more rules than MBCS-ARM±, and these rules are longer, based on the average number of antecedents. Longer association rules are often harder to understand than those of shorter length. It is also obvious that the MBCS-ARM+ produces rules, which in general have lower support than MBCS-ARM±, which implies those rules may be uninteresting. Also the coverage of the records in the database is way lower in all cases for the MBCS-ARM+, so it is obvious that the MBCS-ARM± is the better performing algorithm, and is going to be the base for comparing to other EAs in this study.
Comparison MBCS-ARM± and MBCS-ARM+. The reported results are averages of 10 individuals runs
Comparison MBCS-ARM± and MBCS-ARM+. The reported results are averages of 10 individuals runs
A similar analysis can be made when comparing the MBCS-ARM± to other algorithms. The results of this experiment are collated in Table 4. The proposed algorithm obtains higher support averages than others on 5 datasets, while the confidence obtains fairly good results. The balance between these two measures can be easily regulated by the β and γ parameters, which weigh the values of support and confidence in the objective value. Although no attention was payed to those parameters, good confidence was obtained on all datasets. When comparing the coverage of records, the EARMGA was best with 100% on all datasets, but support is low on some datasets. On the other hand, the lowest coverage of records was obtained by GENAR, but due to the fact that the algorithm involves all attributes in rule generation. It is also worth noting that both GAR and GENAR had trouble finding any rules at all on some datasets. The rules mined by Alatasetal are the closest to MBCS-ARM± in quality based on results in Table 4, since both algorithms include the possibility for finding positive and negative intervals of attributes in association rules. Let us emphasize that the rules found by the proposed algorithm are short in length, thus ensuring an easier understanding from the user’sperspective.
Comparison with single-objective evolutionary algorithms. The reported results are averages of 10 individuals runs
Comparison with single-objective evolutionary algorithms. The reported results are averages of 10 individuals runs
In this section the proposed algorithm will be compared with three traditional association rule mining algorithms, all presented in Section 2. A mandatory discretization of datasets, which include quantitative attributes was performed in order to effectively mine association rules with traditional algorithms. The used discretization was equal frequency sampling, which introduces new attributes with intervals. Many other discretization algorithms exist in the literature, but they often require some knowledge of the data being processed. To keep things simple the mentioned sampling into four intervals was used for each quantitative attribute (as used in [3]).
The results of the last experiment are collated in Table 5. The proposed MBCS-ARM± achieved the highest support for all datasets, with a high value of confidence. The reason lies in the possibility of finding negative attribute intervals. The obtained rules also cover the datasets better in 7/8 cases, and these rules are on average shorter, thus easier tounderstand.
Comparison with traditional algorithms (Apriori, Eclat, and FP-Growth)
Comparison with traditional algorithms (Apriori, Eclat, and FP-Growth)
In this section we present and analyze some rules mined by the MBCS-ARM±. Table 6 shows some interesting rules obtained from the Car, Quake, and Stock datasets. All rules have their interesting measures reported alongside.
Some examples of interesting rules found by the MBCS-ARM±
Some examples of interesting rules found by the MBCS-ARM±
The rules in Table 6 can be interpreted as follows: Car: if the car safety is proven to be low, then the car is not accepted very good by the consumers (it is tend to sell less). Quake: if the longitude is not in the range[-180.0,-67.8), then the Richter scale of the quake is not in the range [6.2,6.9, 6.2,6.9]. Stock: if the stock price of company3 is between 12.8 and 16.2, then the stock price of company2 is not between 49.1 and 53.4.
A single-objective binary cuckoo search using a novel individual representation was proposed in this paper. The representation tackles the problems of large dimensionality, while also support the mining of both positive and negative intervals of attributes in association rules. The proposed MBCS-ARM algorithm produces rules, which are interesting, simple, easy to understand, and offer good coverage of the dataset. It uses an objective function composed of the support and confidence, weighted by two parameters. These parameters control the importance of each measure, and give the user the control for finding rules with either greater support or confidence.
MBCS-ARM was tested on many publicly available databases, and compared to several evolutionary and traditional algorithms, all available in the KEEL tool. The experiments show promising result compared to other algorithms, based on the interesting rules, and also provide rules which include a lower number of attributes.
For future work we would like to test the proposed method on larger databases (with millions of records), perform a large scale study of parameters of the algorithm, while introducing new objective functions. There is also an interesting growth in hybridizing SI algorithms [26], so a step in that direction could definitely improve the results.
