Abstract
Association rule mining (ARM) is an important research issue in the field of data mining that aims to find relations among different items in binary databases. Conventional ARM algorithms consider only the frequency of the items in binary databases, which is not sufficient for real-time applications. In this paper, a novel hash table based Type-2 fuzzy mining algorithm (T2FM) with an efficient pruning strategy is presented for discovering multiple fuzzy frequent itemsets from quantitative databases. The algorithm employs a hash table based structure for efficient storage and retrieval of items and itemsets, which reduces the search cost to O(1), i.e., constant time. Previously proposed type-2 fuzzy frequent itemset mining methods based on Apriori and FP-growth require a large amount of computation and generate and process a large number of candidates. In contrast, the proposed approach saves a large amount of computation by finding the common keys before the actual intersection operation takes place. An efficient pruning strategy is proposed to avoid unpromising candidates in order to speed up the computations. Several experiments are carried out to verify the efficiency of the approach in terms of runtime and memory for different minimum support thresholds, and the results show that the designed approach provides better performance compared to the state-of-the-art algorithms.
Keywords
Introduction
Knowledge Discovery in Databases (KDD), also called pattern mining [1–3], is an important research problem in many tasks since it can discover implicit and hidden information from a dataset. Apriori [1] is the first fundamental algorithm that can be used to find the associations among the items in a dataset. Since Apriori is a level-wise approach, a high computational cost is incurred to generate the candidates and then evaluate them level by level for generating frequent itemsets. FP-growth [4] is an improved algorithm that generates frequent itemsets by compressing the transactions into a tree structure called the FP-tree. Most ARM algorithms focus on mining frequent itemsets (FIs) or association rules (ARs) from binary datasets, in which they consider only whether an item appears in a transaction. Important factors like weight, interestingness, quantity and importance are not considered in ARM.
Fuzzy-set theory [5, 6] can be used in many intelligent systems, such as manufacturing, engineering fields, or medical diagnosis, since fuzzy-set based knowledge representation is more interpretable for human reasoning. A fuzzy-based system also converts the quantitative values of items into meaningful linguistic terms with the corresponding degrees, which can be used for making efficient decisions.
Many algorithms have been proposed to handle quantitative datasets for mining fuzzy frequent itemsets based on fuzzy set theory [7–9]. However, most of the introduced approaches consider only one linguistic term with the maximum scalar cardinality of an item. So the discovered information may not be complete for decision making. Several other algorithms consider multiple fuzzy terms [10–13] to generate a complete set of fuzzy frequent itemsets, which provide more complete and sufficient knowledge for better and useful decision making.
The above methods mostly use type-1 fuzzy set theory for discovering the required knowledge, such as ARs or FIs. Mendel and John [14] introduced type-2 fuzzy set theory, which incorporates an uncertainty factor to discover the required information for effective decision making. Chen et al. [15] use a type-2 fuzzy set model with a level-wise approach for the generation of frequent itemsets from quantitative datasets. However, this approach does not provide complete information because it considers a single linguistic term of an item. Lin et al. [16] designed a list-based method for the mining of type-2 fuzzy frequent itemsets, which speeds up the mining process compared with the level-wise approach. However, this approach does not have effective pruning methods to reduce the search space for the mining of frequent itemsets.
In this study, a novel hash table based Type-2 fuzzy mining algorithm (T2FM) is introduced for the effective mining of fuzzy frequent itemsets from quantitative datasets. T2FM is a simple and fast algorithm for knowledge mining since it uses the hash table data structure for the storage and retrieval of fuzzy itemsets, which reduces the search cost to O(1), i.e., constant time. The algorithm finds the common keys before doing the actual intersection operation, which increases the execution speed of the algorithm. It uses two hash tables, one for storing 1-itemsets and one for storing fuzzy frequent itemsets. No other data structure is used in the proposed approach.
The remainder of the paper is organized as follows: Section 2 reviews the related works. The preliminaries and the problem statement of type-2 fuzzy frequent itemsets mining are given in Section 3. The proposed T2FM algorithm and the data structure used are described in Section 4. The application of T2FM to breast cancer detection is described in Section 5. Experimental evaluations and discussions are presented in Section 6. Finally, the conclusion is given in Section 7.
Literature review
Association rule mining [1–3] is an important research issue in data mining that finds the relationship among the item(s) in the databases. Apriori [2] is the first algorithm that generates frequent itemsets step by step using prior knowledge, while infrequent candidates are removed at each step. This method requires either n or n + 1 searches of the entire transaction database, where n is the maximum length of the frequent itemset. That is, the method uses a lot of memory and takes a long time to search the frequent itemsets due to repetitive searches of the database.
To solve this problem, Han et al. [4] introduced the FP-growth algorithm, which finds the complete set of FIs without generating a huge number of candidate itemsets. FP-growth transforms the transactional dataset into an FP-tree structure in which the frequent itemsets are searched through a recursive exploration process. However, the size of the tree and the number of branches become too large in the case of rare frequent itemsets. Furthermore, several extensions of association rule mining, such as a maintenance algorithm [17] for data streams, sequential pattern mining [18], and high-utility itemset mining [19], have been proposed for different applications.
Even though these algorithms find frequent items effectively, they cannot be applied to quantitative datasets. It is not easy to discover the relationships between items in the quantitative database. Fuzzy set theory [6, 20] is a solution for discovering knowledge for human reasoning and is used in many intelligent systems. Fuzzy set theory is used for converting quantitative numbers of items into meaningful linguistic terms with the corresponding degrees. Thus, several fuzzy mining algorithms were then introduced. For example, Au and Chen [21] developed F-APACS for mining fuzzy association rules based on fuzzy linguistic terms to recognize the discovered regularities.
Hong et al. [7] designed an algorithm for mining fuzzy association rules based on the generate-and-test approach that deals with quantitative datasets. Chen et al. [22] designed a model for mining multi-level fuzzy association rules that is based on a cumulative probability distribution. Watanabe and Fujioka [23] have implemented the fuzzy association rule mining algorithm based on redundancy equivalence and theorems. Chang et al. [24] designed an ISPFTI algorithm to mine the complete frequent sequences within the fuzzy time intervals based on fuzzy set theory.
Several other algorithms have been developed for mining the necessary information in different application domains based on fuzzy set theory [25–30]. Lin et al. [8] proposed the FFP-tree (Fuzzy Frequent Pattern Tree) algorithm for the mining of fuzzy frequent itemsets, which compresses the fuzzy 1-itemsets into a tree structure for the subsequent mining process. Since this approach has a loose tree structure, a compressed fuzzy frequent pattern (CFFP) tree [31] algorithm was proposed to reduce the number of nodes in the tree structure. However, this approach needs more memory for storing the attached array, and it also has the problem of memory leakage. The UBFFP-tree (Upper Bound Fuzzy Frequent Pattern Tree) algorithm [9] was then designed to reduce the memory leakage problem by keeping a more condensed tree structure to handle big datasets.
These algorithms [5, 32] use a single linguistic term as the representative for the processing of the item, so the information derived may not be sufficient for better decision making. Thus, several multiple fuzzy frequent itemsets mining algorithms [12, 33] are developed for making better strategies or decisions with sufficient information. These algorithms consider multiple linguistic terms as the representative terms of an item for deriving complete information, which helps make better decisions.
The above specified approaches follow type-1 fuzzy-set theory [6], in which the uncertainty factor is not taken into account. Since the type-1 fuzzy-set membership functions are crisp, they are not enough to handle the uncertainty model in real time applications. To solve this problem, type-2 fuzzy set theory [34, 35] is then developed, which takes the uncertainty factor into account. Chen et al. [15] designed the first Apriori-based approach for type-2 fuzzy set based FFIs mining. The algorithm uses a level-wise generate-and-test strategy based on type-2 fuzzy membership functions. Since this is a time consuming process, the transformed fuzzy values of the linguistic terms are converted into type-1 fuzzy values based on the centroid mechanism. However, this approach consumes more time for the generation of candidates. Also, a single linguistic term of an item is considered by the model, which may not produce sufficient information for better decision making.
Hence, there is a need to derive complete information that is sufficient for efficient decision making. In this paper, an efficient system is designed that provides: i) an efficient structure for storing and retrieving information; ii) a reduction in the number of scans of the dataset; iii) an efficient pruning strategy to speed up the mining process; and iv) complete information for efficient decision making (i.e., MFFIs, Multiple Fuzzy Frequent Itemsets).
Preliminaries and problem statement
The background preliminaries of the problem and the problem statement of hash table based type-2 fuzzy frequent itemsets mining are stated below.
Preliminaries
Suppose that $I = \{i_1, i_2, i_3, \ldots, i_m\}$ is a finite set of m distinct items. A quantitative database D is a set of n transactions such that $D = \{T_1, T_2, T_3, \ldots, T_n\}$. Each transaction in the dataset has its own unique identifier, referred to as its TID. Moreover, each item $i_j$ in a transaction $T_q$ has a purchase quantity, denoted as $q(i_j, T_q)$. An itemset with k distinct elements is called a k-itemset. The minimum support threshold is denoted as δ and can be adjusted according to the user's preference. Table 1 is a set of quantitative transactions that is considered as a running example in this paper. The example dataset from Table 1 has 10 transactions with distinct items from (A) to (F). The minimum support threshold δ is assumed to be 20%. To illustrate the steps of the proposed algorithm, the membership function shown in Fig. 1 is used. Figure 1 shows the membership functions (μ) with three linguistic terms in type-2 fuzzy set theory: Low (L), Middle (M), and High (H). The user can specify the number of linguistic terms based on the application and requirements.

Membership functions (μ) with linguistic 3-terms in type-2 fuzzy set theory.
An example quantitative dataset
For example, from the sample dataset in Table 1, there are six linguistic variables from (A) to (F) and each linguistic variable has three linguistic terms: low, middle and high.
For example, the values for the items (A), (C), and (E) in T1 are vA1 (= 5), vC1 (= 4), and vE1 (= 1) respectively.
For example, HTC1 stores data in the form of an item as the key, and the values are in the form of another hash table in which values are stored in the form of objects.
Let us assume that B@H.keySet = {5, 6, 8, 9} and E@M.keySet = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}; then B@H:E@M.keySet = {5, 6, 8, 9}.
For example, the value for the itemset ‘B@H:E@M’ at TID ‘9’ is the minimum of the values of B@H (0.1) and E@M (0.55), i.e., min(0.1, 0.55) = 0.1.
Each entry in the hash table of fuzzy frequent itemsets consists of three ordered fields: tid, fv and MRF. tid identifies the transaction T in which the fuzzy term X exists. fv indicates the fuzzy membership value of the term X in T. MRF indicates the maximum remaining fuzzy value after X in T.
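The two examples above can be sketched in a few lines of Python. This is only an illustration: the key sets and the two membership values at TID 9 are taken from the text, while the variable names are hypothetical and not part of T2FM.

```python
# Key sets from the example: the TIDs stored for each fuzzy term.
bh_keys = {5, 6, 8, 9}            # B@H.keySet
em_keys = set(range(1, 11))       # E@M.keySet = {1, ..., 10}

# Step 1: determine the common keys before touching any fuzzy value.
common = bh_keys & em_keys

# Step 2: the fuzzy value of the joined itemset at a common TID is the
# minimum of the two membership values (only TID 9 is given in the text).
bh_fv = {9: 0.1}
em_fv = {9: 0.55}
joined_at_9 = min(bh_fv[9], em_fv[9])

print(sorted(common), joined_at_9)   # [5, 6, 8, 9] 0.1
```

Computing the common keys first means that the minimum is evaluated only for TIDs that can contribute to the joined itemset.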
The aim of this paper is to propose an efficient method for mining multiple fuzzy frequent itemsets (MFFIs) by using a type-2 fuzzy membership function with an efficient pruning strategy. The problem can be defined formally as,
Since the traditional methods of fuzzy frequent itemsets [8, 31] consider only itemsets with maximum scalar cardinality for processing, the information derived may not be complete and not enough to make better and more accurate decisions.
In this paper, multiple fuzzy frequent itemsets are considered to represent the discovered fuzzy frequent itemsets. Thus, all the linguistic terms that satisfy the minimum support count threshold are considered, i.e., more than one term of an item may form the association rules. Also, a hash table data structure is used for maintaining and generating the itemsets, which significantly reduces the execution time since its searching time is O(1) or constant time. An efficient pruning strategy is proposed to avoid unpromising candidates in order to speed up the computations. The detailed description of the proposed T2FM (Type-2 based fuzzy frequent itemsets mining) algorithm is given in the following section.
Proposed T2FM Algorithm
The T2FM algorithm is a hash table based type-2 fuzzy mining algorithm with an efficient pruning strategy to speed up the process. It consists of two major parts, namely i) fuzzy C1-itemset formation and ii) generation of fuzzy frequent itemsets. During the first part, called data pre-processing, the quantitative value of each item in the transactions is converted into a form appropriate for fuzzy mining. During the second part, the algorithm discovers the complete set of fuzzy frequent itemsets. Figure 2 shows the overall architecture of the proposed fuzzy mining system. Notations used in the algorithm are given below.

Architecture of proposed fuzzy mining system.
Notation
D - a quantitative dataset
N - number of transactions in D
μ - a set of type-2 fuzzy membership functions
h - number of fuzzy regions
$T_i$ - i-th transaction in the dataset (1 ≤ i ≤ N)
$i_j.R_{jl}$ - l-th fuzzy region of item $i_j$
$f_{ijl}^{upper}$ - upper fuzzy membership value of $v_{ij}$ for region $R_{jl}$
HTC1 – Hash table stores candidate 1-itemsets
HTC1S - Sub table of Hash table HTC1
HTFZ – Hash table stores fuzzy frequent itemsets
HTFZS - Sub table of Hash table HTFZ
δ - minimum support threshold
keySize() - total number of keys in the Hash Table
keySet() - set of keys in the Hash Table
fv - fuzzy membership value
MRF - Maximum remaining fuzzy value
Fuzzy Candidate 1-itemsets (C1) are the item(s) based on which all the fuzzy frequent itemsets are generated. In this phase, the quantitative value of each item in the transactions is converted into a type-2 fuzzy value by using the given type-2 fuzzy membership function μ. Then, the calculated type-2 fuzzy values are reduced into type-1 fuzzy values with the centroid type reduction method. To maintain the C1-itemsets, the algorithm uses a hash table called HTC1 (Hash table for Candidate 1-itemsets), which has the form of item as the key and another hash table called HTC1S (Sub of HTC1) as the value. Each entry in HTC1S has the form of (key, value) pairs, with TID as the key and fuzzy values for all the linguistic terms of the corresponding variable as the values. For each entry in HTC1S, the corresponding “Total” is updated when the item is an existing member of HTC1. Otherwise, “Total” is assigned with the calculated value “cval” and added as the new entry for HTC1S. In both cases, HTC1S is added as the value for the key of the item. Figure 3 shows the structure of the HTC1 hash table. The algorithm for the generation of fuzzy c1-itemsets is shown in Fig. 4.

Structure of HTC1 Table.

Algorithm for Fuzzy c1-itemsets generation.
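The first phase can be sketched as follows. This is a minimal illustration and not the listing of Fig. 4: it assumes that type reduction amounts to averaging the lower and upper membership values of each interval, which reproduces the (0, 0.2) to 0.1 reduction of the running example but may simplify the centroid method. The function names and the placement of "Total" inside the sub-table are also assumptions.

```python
def type_reduce(interval):
    """Reduce a type-2 interval (lower, upper) to a single type-1 value.
    Assumption: the midpoint of the interval is used."""
    lower, upper = interval
    return (lower + upper) / 2


def add_to_htc1(htc1, item, tid, type2_values):
    """Insert one fuzzified item occurrence into HTC1.
    htc1: item -> HTC1S, where HTC1S maps tid -> {term: value} plus a "Total" row."""
    sub = htc1.setdefault(item, {"Total": {}})       # new HTC1S for an unseen item
    reduced = {term: type_reduce(iv) for term, iv in type2_values.items()}
    sub[tid] = reduced
    for term, cval in reduced.items():               # update the running totals
        sub["Total"][term] = sub["Total"].get(term, 0.0) + cval
    return htc1


htc1 = {}
# Item "A:5" in T1 is fuzzified as (0, 0.2)/A.M + (1.0, 1.0)/A.H in the example.
add_to_htc1(htc1, "A", 1, {"M": (0.0, 0.2), "H": (1.0, 1.0)})
print(htc1["A"][1])   # {'M': 0.1, 'H': 1.0}
```

A nested dictionary mirrors the description of HTC1 as a hash table whose values are themselves hash tables keyed by TID.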
In this phase, the actual fuzzy frequent itemsets mining takes place with the help of the minimum support threshold δ. The output of the first phase (the set of C1-itemsets) and δ are the inputs to this phase. It consists of two parts: i) L1-itemset generation and ii) Lk-itemset (k > 1) generation. L1-itemsets are formed by filtering all the elements that satisfy δ from HTC1. In this phase, a hash table HTFZ (Hash table for Fuzzy Frequent Itemsets) is used to store all fuzzy frequent itemsets, with Item@Region as the key and HTFZS (Sub of HTFZ) as the value. HTFZS stores a set of TIDs with the fuzzy membership value and the maximum remaining fuzzy value for the respective region of the corresponding item. The algorithm initializes the maximum remaining fuzzy value of each item by comparing the item's current MRF with the fuzzy values of the successive items and keeping the greater value.
In the second part, the algorithm generates Lk (k > 1) from the set of L1-itemsets. Every item x in HTFZ is intersected with every other item y in HTFZ if x and y do not belong to the same item i. The algorithm first determines the intersection of the TIDs of x and y, referred to as xy.keySet. Then, for each element in xy.keySet, the fuzzy value of the intersection is the minimum of the fuzzy values of x and y. Meanwhile, the maximum remaining fuzzy value is the MRF of the item y in the respective TID. For every intersection operation, the “Total” value of the respective key “xy” is updated. After completion of the intersection operation on two items, the resultant itemset is added to HTFZ only when its support count is greater than or equal to δ. A separate list called “itemToProcess” is maintained to store every new itemset that has an MRF greater than δ. This list is used later for generating the next level of frequent itemsets. Steps 24 to 45 of the algorithm are repeated until the size of the list “itemToProcess” becomes less than or equal to 1. Figure 5 shows the structure of the fuzzy frequent itemsets table. The T2FM algorithm for mining multiple fuzzy frequent itemsets is given in Fig. 6.
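The MRF initialization can be illustrated with a small sketch. It is an interpretation under stated assumptions rather than the exact procedure of the paper: the terms are taken in a fixed processing order, every later term in that order counts as a "successive item" (including terms of the same item), and the toy membership values are made up. The function name is hypothetical.

```python
def init_mrf(order, fv):
    """Build HTFZ-like entries term -> {tid: (fv, mrf)}.
    order: terms (Item@Region keys) in processing order.
    fv:    term -> {tid: fuzzy membership value}.
    mrf is the largest fuzzy value of any later term in the same transaction."""
    table = {}
    for pos, term in enumerate(order):
        entries = {}
        for tid, value in fv[term].items():
            mrf = 0.0
            for later in order[pos + 1:]:
                # keep the greater of the current MRF and the later term's value
                mrf = max(mrf, fv[later].get(tid, 0.0))
            entries[tid] = (value, mrf)
        table[term] = entries
    return table


# Toy values, for illustration only.
toy_fv = {
    "B@H": {5: 1.0, 9: 0.1},
    "C@H": {5: 0.5},
    "E@M": {5: 1.0, 9: 0.55},
}
mrf_table = init_mrf(["B@H", "C@H", "E@M"], toy_fv)
print(mrf_table["B@H"])   # {5: (1.0, 1.0), 9: (0.1, 0.55)}
```

The last term in the order has nothing after it, so its MRF is 0 in every transaction.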

Structure of fuzzy frequent itemsets table.

T2FM Algorithm for mining multiple fuzzy frequent itemsets.
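The level-wise generation described above can be sketched as follows. This is a simplified illustration and not the listing of Fig. 6: it omits the MRF-based pruning, extends itemsets in a fixed sorted order so that each combination is built once (an assumption), and uses hypothetical names. The B@H values are chosen to reproduce the join reported in the running example; the other values are made up.

```python
def item_of(term):
    """Return the item part of an Item@Region key, e.g. 'B@H' -> 'B'."""
    return term.split("@")[0]


def mine(l1, min_count):
    """l1 maps each frequent term to its {tid: fuzzy value} table.
    Returns an HTFZ-like dict: itemset key -> {tid: fuzzy value}."""
    htfz = {term: dict(rows) for term, rows in l1.items()}
    terms = sorted(l1)                       # fixed order of the L1 terms
    to_process = [(term,) for term in terms]
    while to_process:
        next_level = []
        for itemset in to_process:
            x = htfz[":".join(itemset)]
            used = {item_of(t) for t in itemset}
            for y_term in terms:
                # extend only with later terms that belong to a different item
                if y_term <= itemset[-1] or item_of(y_term) in used:
                    continue
                y = l1[y_term]
                common = x.keys() & y.keys()             # common TIDs first
                joined = {tid: min(x[tid], y[tid]) for tid in common}
                if sum(joined.values()) >= min_count:    # support count check
                    candidate = itemset + (y_term,)
                    htfz[":".join(candidate)] = joined
                    next_level.append(candidate)
        to_process = next_level
    return htfz


toy_l1 = {
    "B@H": {5: 1.0, 6: 0.55, 8: 0.55, 9: 0.1},
    "C@H": {1: 1.0, 5: 0.5, 6: 0.5, 7: 0.65},            # made-up values
    "E@M": {1: 0.5, 5: 1.0, 6: 0.55, 8: 0.55, 9: 0.55},  # made-up except TID 9
}
result = mine(toy_l1, 2)
print(sorted(result))   # ['B@H', 'B@H:E@M', 'C@H', 'E@M']
```

With these toy values only B@H:E@M reaches the support count of 2; the other two pairs sum to 1.0 and 1.5 and are discarded.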
In this section, a running example is given to illustrate how the MFFIs are discovered from the set of quantitative transactions by the proposed approach. The quantitative dataset shown in Table 1 is considered as an example. The dataset consists of 10 transactions and 6 distinct items, denoted A to F.
The minimum support threshold δ is assumed to be 20% and the type-2 fuzzy membership function shown in Fig. 1 is used for fuzzification. In this example, the fuzzy membership function has three linguistic terms: low, middle and high; and their range of values is assumed to be low[∞ and (3, 3.5)], middle[(0.5, 1) and (5, 5.5)] and high[(2.5, 3) and ∞]. The fuzzy C1-itemset generation for the example is described below.
First, the quantitative values of the items are transformed into type-2 fuzzy values. Consider the first item “A:5” in transaction 1 as an example. The value of “5” is converted into (0, 0.2)/A.M + (1.0, 1.0)/A.H by using μ in Fig. 1. The same step is repeated for all the other items in all transactions. The transformed results are shown in Table 2. Then, the calculated type-2 fuzzy values are reduced into type-1 values by the centroid type-reduction method. In this example, for the item “A:5”, the transformed fuzzy value (0, 0.2)/A.M + (1.0, 1.0)/A.H is reduced into (0.1)/A.M + (1.0)/A.H, and the resultant representation for all the items is shown in Table 3. For each occurrence of an item in successive transactions, the corresponding “Total” is accumulated. In this example, the item “A” exists in TIDs T1, T2, T3 and T5, and the “Total” calculated is [A.Low: 1.1; A.Middle: 1.75; A.High: 1.65]. At the end of the first phase, the hash table HTC1 contains all the fuzzy C1-itemsets of the example dataset.
Fuzzified data with Type-2 membership function from Table 1
Reduced dataset with centroid type-reduction method
During the second phase of the algorithm, the L1 and Lk (k > 1) itemset generation steps take place. In this phase, L1-itemsets are formed by filtering all the elements that satisfy δ = 20%. Since the example dataset has 10 transactions, the minimum support count is 2 (10 × 20/100 = 2). From Table 3, it is seen that B@M (2.75), B@H (2.2), C@H (2.65), E@L (3.85), E@M (5.05) and E@H (2.3) satisfy the threshold, and therefore only these elements are retained and stored in the hash table HTFZ. For every level of update to HTFZ, the keys are copied into the list “itemToProcess” for the generation of the next level of frequent itemsets. In this example, the list “itemToProcess” initially has the elements [B@M, B@H, C@H, E@L, E@M, E@H] once the L1-itemsets are copied to HTFZ. Then, successive levels of Lk-itemsets are generated.
In this example, since B@M and B@H belong to the same item, they are not intersected with each other. Similarly, E@L, E@M and E@H are not intersected with each other. But they can be intersected with other itemsets. Consider the items B@H and E@M; initially, the TIDs that are common to both itemsets are determined by choosing the common keys from the keySets of these two itemsets. Since T5, T6, T8 and T9 are common to both B@H and E@M, the intersection is calculated by taking the minimum value of the two items at each of these TIDs. Here, the values of min(B@H, E@M) for the TIDs T5, T6, T8 and T9 are 1, 0.55, 0.55 and 0.1, whose sum is 2.2. Since this sum satisfies the minimum support count threshold, the combination B@H:E@M is inserted into HTFZ as the key and [T5:1, T6:0.55, T8:0.55, T9:0.1, Total:2.2] is stored as the value in the form of the hash table HTFZS. The same process is repeated until “itemToProcess” becomes empty. In order to speed up the computation process, the algorithm identifies and removes the unpromising nodes based on the maximum remaining fuzzy values of the itemsets. For every intersection operation, T2FM takes the MRF of the second base itemset as the resultant MRF. Even though the resultant itemset is a fuzzy frequent itemset, it is considered for further processing only when its MRF is greater than δ. After the completion of this phase, the hash table HTFZ contains the complete set of multiple fuzzy frequent itemsets of the example dataset, as shown in Table 4.
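The L1 filtering step of the running example can be reproduced with a short sketch. The totals are the ones quoted in the text (the totals of the remaining terms are not reported here and are therefore left out); the variable names are hypothetical.

```python
n_transactions = 10
min_support_pct = 20
# Minimum support count: 10 * 20 / 100 = 2
min_count = n_transactions * min_support_pct / 100

# "Total" values quoted in the running example.
totals = {
    "A@L": 1.1, "A@M": 1.75, "A@H": 1.65,
    "B@M": 2.75, "B@H": 2.2,
    "C@H": 2.65,
    "E@L": 3.85, "E@M": 5.05, "E@H": 2.3,
}

# Keep only the terms whose total reaches the minimum support count.
item_to_process = [term for term, total in totals.items() if total >= min_count]
print(item_to_process)   # ['B@M', 'B@H', 'C@H', 'E@L', 'E@M', 'E@H']
```

None of the three terms of item A reaches the count of 2, so A does not take part in the later levels.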
Set of Fuzzy frequent itemsets derived from the example dataset
According to a study from a cancer research institute, breast cancer is the most serious disease that impacts people worldwide and is the fifth-largest cause of death in women. The two types of breast lesions that can be diagnosed are malignant and benign lesions [39]. In the UK, about 48,000 cases occur every year, and around one in nine women is affected in her lifetime [40]. This disease can be cured easily if diagnosed at an early stage. Mammography is one of the methods frequently used to diagnose breast cancer. However, mammography identifies breast abnormalities with an accuracy of 85–90%. On the other hand, fine needle aspiration cytology (FNAC) has a correct identification rate of 90%. Hence, there is a need to implement a better identification method for the diagnosis of breast cancer.
In this section, the T2FM is applied to the detection of breast cancer. The core features with the appropriate ranges of values are detected through the algorithm. The algorithm is applied to the Wisconsin Breast Cancer Database (WBCD) [38] and the features are analyzed. The dataset has 699 instances, among which 241 records (34.5%) are malignant, whereas 458 records (65.5%) are benign. In the experiment, the dataset is divided into benign and malignant categories and the features with an appropriate range of values are observed separately. Table 5 shows the set of frequent items generated with an appropriate range of values through T2FM on WBCD datasets.
Frequent itemsets formed on WBCD through T2FM
The output shows that the feature ‘mitoses’ with a low range of values has the highest support percentage in the benign category (98.03%). But it also has a support of 63.04% in the malignant category. Thus, the feature ‘mitoses’ does not contribute to the diagnosis of breast cancer. The output also shows that high ranges of clump thickness (above 90%), shape uniformity (71.74%) and Bare Nuclei (82.38%) indicate that the person has a higher possibility of breast cancer. It is also seen that low ranges of Bare Nuclei (bare), Normal Nucleoli (normal), Marginal Adhesion (adh), and Uniformity of Cell Size (sizeu) confirm the non-existence of breast cancer. Table 5 shows the set of fuzzy frequent itemsets generated through T2FM on the WBCD dataset. For readability, only the maximal frequent itemsets are shown in the table.
In this section, the performance of the proposed algorithm T2FM is compared with the recently proposed LFFT2 [36] and the Apriori-based approach [15] in terms of execution time and memory usage. All three algorithms were implemented in Java using NetBeans IDE 6.0.1. The experiments were run on a computer with an Intel(R) Core(TM) i5-3210M CPU @ 2.50 GHz and 6.00 GB of main memory. It is to be noted that the Apriori-based approach considers only a single linguistic term of an item. On the other hand, LFFT2 considers multiple fuzzy frequent itemsets. Both real and synthetic datasets, like Chess [37], Mushroom [37] and Breast Cancer [38], were used in the experimental evaluations. Parameters and characteristics of the datasets used for the implemented algorithms are shown in Tables 6 and 7, respectively.
Parameters of the dataset used
In this section, the time taken by the proposed algorithm is compared with the other state-of-the-art algorithms, Apriori and LFFT2. Runtime is measured by executing the three algorithms on the datasets given in Table 7 with varying values of the minimum support threshold. Figure 7 shows the time taken by the proposed T2FM, Apriori, and LFFT2 for the different datasets. It is observed that the proposed T2FM takes less time than the other two algorithms since it uses a hash table data structure, whose cost for insertion or retrieval of items is O(1). Also, the algorithm does not consider an itemset as frequent if the “Total” of the fuzzy term is less than the minimum support count threshold, and thus it is removed directly. Apart from the candidate 1-itemsets and the fuzzy frequent itemsets, nothing else is maintained in T2FM, which also increases the speed of the algorithm. On the other hand, the Apriori-based approach takes more time due to level-wise candidate generation and multiple scans of the database, which considerably reduce its execution speed. Even though LFFT2 avoids multiple scans of the dataset, it takes comparatively more time than T2FM because of the use of the list-based structure and an explicit pruning strategy, while in T2FM, non-frequent itemsets are automatically removed from the structure.

Runtime comparisons of Apriori, LFFT2 and T2FM on (a) Breast cancer (b) Chess and (c) Mushroom datasets.
Characteristics of the dataset used
In this section, the memory usage of the proposed algorithm is compared with the other state-of-the-art algorithms, Apriori and LFFT2. Memory usage is measured by executing the three algorithms on the Breast cancer, Chess and Mushroom datasets for varying values of the minimum support threshold. Figure 8 shows the memory usage comparison of the three algorithms. It shows that T2FM consumes less memory than the other algorithms due to the compact representation of itemsets in the hash table structure. Also, T2FM maintains only the needed data, namely the candidate 1-itemsets and the frequent itemsets. Other data is not maintained in T2FM, which considerably reduces the memory usage of the algorithm. On the other hand, Apriori maintains candidate itemsets in addition to frequent itemsets. Even though LFFT2 does not maintain candidate itemsets, it stores the maximum remaining fuzzy membership value for each TID of the fuzzy itemsets, which in turn consumes a little more memory than T2FM.

Memory usage comparisons of Apriori, LFFT2 and T2FM on (a) Breast cancer (b) Chess and (c) Mushroom datasets.
This paper proposes a novel type-2 based fuzzy mining algorithm for the generation of multiple fuzzy frequent itemsets. The algorithm uses a hash table structure for efficient storage and retrieval of frequent itemsets, which reduces the search cost to O(1), i.e., constant time. An efficient pruning strategy was designed to avoid the unpromising candidates in advance so that execution speed is increased. Based on the proposed model, complete and sufficient information is produced, which helps to make efficient strategies for better decision making. The algorithm was applied to the Wisconsin Breast Cancer Dataset to find the features with appropriate ranges of values for the diagnosis of breast cancer. Experimental evaluation performed on real and synthetic datasets shows that the developed T2FM achieves better performance compared to the other recent state-of-the-art algorithms, Apriori and LFFT2, in terms of execution time and memory usage.
The dataset used by the algorithm is assumed to be static. In real-world applications, however, data is dynamically added to the database. Thus, in the future, T2FM will be extended to handle dynamic datasets. Further improvements to T2FM will also be investigated.
Author contributions statement
Funding
“Not Applicable”
