An intuitionistic fuzzy-rough set model and its application to feature selection

Abstract

Owing to the development of modern internet-based technology, electronically stored information is growing exponentially with time. Selecting relevant and non-redundant features from real-valued, high-dimensional datasets is highly challenging. Feature selection, a preprocessing technique, reduces the dimension of the input data in order to extract the most meaningful features for processing and analysis. Attribute or feature selection is one of the many useful applications of rough set theory, but rough set based feature selection can handle discrete data only and therefore cannot be applied to real-valued datasets directly. To deal with real-valued datasets, a discretization method is applied to convert the data from real-valued to discrete, which usually leads to information loss. Fuzzy rough set theory has been applied profitably to address this problem and to retain the semantics of real-valued datasets. However, an intuitionistic fuzzy set can handle uncertainty better than a fuzzy set, as it considers the membership, non-membership and hesitancy degrees of an object simultaneously. In this paper, an intuitionistic fuzzy rough set model is established by combining intuitionistic fuzzy sets and rough sets. Furthermore, we propose a novel feature selection approach derived from this model and develop an algorithm based on it. Finally, our approach is applied to some benchmark datasets and compared with the existing fuzzy rough set based technique. The experiments indicate that our approach generally produces smaller reducts with similar or better classification accuracy.

Keywords

Rough set, fuzzy set, intuitionistic fuzzy set, degree of dependency, feature selection

1 Introduction

Owing to advances in internet-based technology, business organizations now regularly receive data on millions of observations across diverse subjects, time periods, brands, predictor variables and storage locations. Every day, quintillions of bytes of data are recorded from diverse sources, such as sensor data on climate, agricultural and census information, digital pictures and videos posted to social media sites, purchase and sale transaction records, and GPS signals of cell phones, to name a few [9, 10]. Such a huge amount of data from various sources makes learning through classifiers difficult, because the datasets contain many redundant features. The computational cost and the time needed for classification are extremely sensitive to the number of features used to build a rule-based classifier. The features in a dataset can be categorized as redundant, irrelevant, misleading, and predictive or relevant. Irrelevant and redundant features increase the computational load, while misleading features reduce the overall classification accuracy because they steer the classification task in the wrong direction. Predictive or relevant features, in contrast, discriminate well between classes and lead to better average classification accuracy. It is therefore essential to remove misleading, redundant and irrelevant features from a dataset while preserving the predictive ones. To achieve this, various dimensionality reduction techniques are applied to obtain features that are more significant and have little or no redundancy. Feature selection, or attribute reduction, is the most frequently used dimensionality reduction technique. Several factors motivate dimensionality reduction in problem-solving systems [12].
Feature selection is a valuable preprocessing step for tasks that involve high-dimensional datasets, such as gene expression microarray data, which some learning algorithms might otherwise be unable to process. It is also applied successfully to small and medium-sized datasets with the purpose of finding the most informative features. Feature selection is used as a preprocessing step in various important fields such as data mining, signal processing, machine learning and bioinformatics [1–8].

Rough set theory (proposed by Pawlak) [13–15] can be used as a tool to determine data dependencies and to reduce the dimension of a dataset using the information in the data alone, without any additional information. Rough set based attribute selection has been implemented successfully to reduce the number of features while preserving their essence. Rough set theory can produce the most informative subset of the original attribute set of a dataset with discretized feature values. It struggles with real-valued datasets, however, because it can be applied only to datasets whose features take symbolic values. Discretization is therefore applied to real-valued datasets before rough set based feature selection, which usually leads to information loss. This problem was addressed by fuzzy rough set based feature selection, which handles real-valued datasets better.

Dubois and Prade [16, 17] combined the fuzzy set [23, 24] (proposed by Zadeh [18, 19]) and the rough set (proposed by Pawlak) and presented the concept of the fuzzy rough set. Fuzzy-rough set theory addresses two related and complementary concepts, vagueness (for fuzzy sets) and indiscernibility (for rough sets); both arise from uncertainty in knowledge. It can be used for feature selection on datasets containing discrete, continuous or heterogeneous attributes. Jensen and Shen [35] extended the dependency function of the traditional rough set model to the fuzzy case and introduced a feature selection algorithm based on fuzzy rough sets, which was subsequently improved in [11, 29–34] and [36–42].

Many uncertainty problems cannot be handled suitably by fuzzy sets. For example, some base patterns exist in most medical diagnosis problems, and experts decide on the basis of the similarity between an unknown sample and the base patterns; the uncertainty, however, lies not only in the judgment but also in the identification. Hence, a different kind of fuzzy set is needed that can support the latter uncertainty.

2 State of the art

The intuitionistic fuzzy (IF) set [20–22] is an extension of Zadeh's fuzzy set. IF sets describe the ambiguities of the objective world better than traditional fuzzy sets, as they consider the positive, negative and hesitancy degrees of an object simultaneously. They have been applied efficiently to many decision problems. In recent years, IF set theory has been applied effectively in pattern recognition, decision analysis, medical image processing and other fields. Although rough sets and IF sets both capture specific aspects of the same idea, imprecision, the combination of IF set theory and rough set theory has rarely been discussed. Jena and Ghosh [55] demonstrated that the lower and upper approximations of IF rough sets are again IF sets. In the last few years, some IF rough set models have been proposed in [23–25] and [44–46]. Coker [43] established a relationship between rough sets and IF sets and showed that a fuzzy rough set is in fact an intuitionistic L-fuzzy set. Huang et al. [26] presented the dominance intuitionistic fuzzy rough set and showed its various applications. Moreover, some research articles have presented intuitionistic fuzzy rough set based feature selection or attribute reduction approaches [57–64]. However, none of the proposed feature selection techniques based on intuitionistic fuzzy set theory considers the individual objects of the information system, and none of them has been implemented for real-world datasets.

In the current work, we present a novel approach to feature selection based on an intuitionistic fuzzy rough set model that takes the details of individual objects in the information system into account. We define lower and upper approximations by combining intuitionistic fuzzy sets and rough sets and propose a novel intuitionistic fuzzy rough set framework. Furthermore, we define a dependency function based on this framework to calculate the degree of dependency between the decision attributes and the conditional attributes. Moreover, we give an algorithm that uses the proposed concept to calculate the reduct set. Finally, this algorithm is applied to benchmark datasets and compared with the existing fuzzy rough set based approach.

3 Preliminaries

3.1 Rough set theory

Rough set theory (RST) [13–15] can be used to extract knowledge from a domain in a concise way while retaining the information content. In RST, the discernibility of two objects plays a crucial role in feature selection. Let (U, A) be an information system, where U is a non-empty finite set of objects and A is a non-empty finite set of attributes such that a : U → V_a for every a ∈ A, where V_a is the set of values that attribute a may take. For any P ⊆ A there is an associated equivalence relation R_P on U defined as: $R_{P} = \{(x, y) \in U^{2} \mid \forall a \in P, a(x) = a(y)\}$ (1)

If (x, y) ∈ R_P, then x and y are said to be indiscernible by the attributes from P. We denote the equivalence classes generated by the equivalence relation R_P by [x]_P. The lower and upper approximations of X ⊆ U are defined as: $R_{P} \downarrow X = \{x \in U \mid [x]_{P} \subseteq X\}$ (2) $R_{P} \uparrow X = \{x \in U \mid [x]_{P} \cap X \neq \emptyset\}$ (3)

The ordered pair 〈R_P↓ X, R_P ↑ X 〉 is known as rough set.
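As an illustration of Eqs. (1) to (3), the following minimal Python sketch computes the equivalence classes induced by an attribute subset and the resulting lower and upper approximations. The toy table, the object names and the function names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def equivalence_classes(table, attrs):
    """Partition the objects by their values on attrs (relation R_P, Eq. 1)."""
    classes = defaultdict(set)
    for obj, values in table.items():
        key = tuple(values[a] for a in attrs)
        classes[key].add(obj)
    return list(classes.values())


def approximations(table, attrs, target):
    """Return the crisp lower and upper approximations of target (Eqs. 2, 3)."""
    lower, upper = set(), set()
    for cls in equivalence_classes(table, attrs):
        if cls <= target:   # class entirely inside X
            lower |= cls
        if cls & target:    # class overlaps X
            upper |= cls
    return lower, upper


# Hypothetical toy information system with two discrete attributes.
table = {
    "x1": {"a": 0, "b": 1},
    "x2": {"a": 0, "b": 1},
    "x3": {"a": 1, "b": 0},
    "x4": {"a": 1, "b": 1},
}
lower, upper = approximations(table, ["a", "b"], {"x1", "x3"})
print(sorted(lower), sorted(upper))  # ['x3'] ['x1', 'x2', 'x3']
```

Here x1 and x2 are indiscernible, so x1 cannot be placed in the lower approximation of {x1, x3} even though it belongs to the target set.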

3.2 Fuzzy-rough set theory

Most datasets contain real-valued attributes. Discretization methods are therefore applied to real-valued information systems before rough set based attribute selection, and this may lead to a loss of information. This is the main drawback of RST. It can be overcome by fuzzy rough set theory (FRST), which instead computes the similarity between attribute values using a fuzzy relation that assigns a degree of similarity to each pair of objects. Given a fuzzy set X ⊆ U and a fuzzy tolerance relation R, the lower and upper approximations of X can be calculated in several ways. Following [16, 39], a general definition is: $R \downarrow X(x) = \inf_{y \in U} i(R(x, y), X(y))$ (4) $R \uparrow X(x) = \sup_{y \in U} t(R(x, y), X(y))$ (5)

Here, i and t represent a fuzzy implicator and a fuzzy t-norm, respectively.
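Eqs. (4) and (5) leave the implicator and the t-norm open. The sketch below is one possible instance, assuming the Łukasiewicz implicator and t-norm; this choice, the similarity matrix R and the fuzzy set X are illustrative assumptions, not values from the paper.

```python
def luk_implicator(a, b):
    """Lukasiewicz implicator (an assumed choice for i in Eq. 4)."""
    return min(1.0, 1.0 - a + b)


def luk_tnorm(a, b):
    """Lukasiewicz t-norm (an assumed choice for t in Eq. 5)."""
    return max(0.0, a + b - 1.0)


def fuzzy_lower(R, X, x):
    """Membership of object x in the fuzzy lower approximation of X."""
    return min(luk_implicator(R[x][y], X[y]) for y in range(len(X)))


def fuzzy_upper(R, X, x):
    """Membership of object x in the fuzzy upper approximation of X."""
    return max(luk_tnorm(R[x][y], X[y]) for y in range(len(X)))


# Hypothetical fuzzy tolerance relation (reflexive, symmetric) on 3 objects.
R = [
    [1.0, 0.75, 0.25],
    [0.75, 1.0, 0.5],
    [0.25, 0.5, 1.0],
]
X = [1.0, 0.5, 0.0]  # hypothetical fuzzy set on the same objects

print(fuzzy_lower(R, X, 0), fuzzy_upper(R, X, 0))  # 0.75 1.0
print(fuzzy_lower(R, X, 1), fuzzy_upper(R, X, 1))  # 0.5 0.75
```

For a reflexive R, the lower approximation never exceeds X(x) and the upper approximation never falls below it, which the two printed rows illustrate.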

3.3 Intuitionistic fuzzy set

Let U (≠ φ) be a universe of discourse of objects. An intuitionistic fuzzy set B in U is a collection of objects represented in the form B = { 〈x, l_B(x), m_B(x)〉 | x ∈ U }, where l_B : U → [0, 1] and m_B : U → [0, 1] are called the degree of membership and the degree of non-membership of the element x, respectively, satisfying 0 ≤ l_B(x) + m_B(x) ≤ 1, ∀ x ∈ U. π_B(x) = 1 - l_B(x) - m_B(x) represents the degree of hesitancy of x to B. It is obvious that 0 ≤ π_B(x) ≤ 1, ∀ x ∈ U.

Any fuzzy set B ={ 〈 x, l_B (x) 〉 |x ∈ U } can be recognized as a particular case of intuitionistic fuzzy set in the form {〈 x, l_B (x) , 1 - l_B (x) 〉 |x ∈ U }. Therefore an intuitionistic fuzzy set is considered as an extension of fuzzy set.

The cardinality of an intuitionistic fuzzy set B is defined by [56]: $| B | = \sum_{x \in U} \frac{1 + l_{B} (x) - m_{B} (x)}{2} .$
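The definitions of this subsection can be checked numerically with a short sketch. The set B below and the function names are hypothetical; the pairs are (membership, non-membership) grades.

```python
def hesitancy(mu, nu):
    """Degree of hesitancy pi = 1 - mu - nu of an intuitionistic fuzzy value."""
    if not (0.0 <= mu <= 1.0 and 0.0 <= nu <= 1.0 and mu + nu <= 1.0):
        raise ValueError("not a valid intuitionistic fuzzy value")
    return 1.0 - mu - nu


def cardinality(ifs):
    """Cardinality of an IF set: sum over x of (1 + mu(x) - nu(x)) / 2."""
    return sum((1.0 + mu - nu) / 2.0 for mu, nu in ifs.values())


# Hypothetical intuitionistic fuzzy set on three objects.
B = {"x1": (0.5, 0.25), "x2": (1.0, 0.0), "x3": (0.0, 0.75)}

print(hesitancy(*B["x1"]))  # 0.25
print(cardinality(B))       # 0.625 + 1.0 + 0.125 = 1.75
```

An ordinary fuzzy value, written as 〈l, 1 - l〉, has zero hesitancy and contributes exactly l to the cardinality, so the formula reduces to the usual fuzzy cardinality in that case.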

3.4 Intuitionistic fuzzy information system

An intuitionistic fuzzy information system can be defined as a quadruple IFIS = (U, J ∪ K, S, T), where U (≠ φ) is a finite set of objects, called the universe of discourse; J (≠ φ) and K are finite sets of conditional and decision features such that J ∩ K = φ; S is the collection of all intuitionistic fuzzy values such that S = S₁ ∪ S₂, where S₁ and S₂ are the domains of the conditional and decision features; and T is the information function, defined as T : U × (J ∪ K) → S such that T(x, j) ∈ S_j, ∀ j ∈ J, S_j ⊆ S₁ and T(x, d) ∈ S₂ for K = { d }, where T(x, j) and T(x, d) are intuitionistic fuzzy values.

4 Intuitionistic fuzzy rough feature selection (IFRFS)

In 1998, Chakrabarty et al. [23] proposed a way to construct an intuitionistic fuzzy rough set (IFRS) (L, M) of a rough set (A, B), where L and M are both intuitionistic fuzzy sets in U (a non-empty set of objects) such that L ⊆ M, i.e. μ_L ≤ μ_M and υ_L ≥ υ_M. In this case, the lower approximation L and the upper approximation M are both intuitionistic fuzzy sets. In 2001, Samanta and Mondal [46] defined an IFRS as a couple (E, F) such that E and F are both fuzzy rough sets (as proposed by Nanda and Majumdar [44]) and E ⊆ Complement(F). From [44], it is obvious that an IFRS is a generalization of an intuitionistic fuzzy set in which the membership and non-membership functions are fuzzy rough sets. In 2002, Rizvi et al. [45] proposed the rough intuitionistic fuzzy set, which also contains a hesitation margin on the lower and upper approximations.

In 2003, Cornelis et al. [24] defined the lower and upper approximations of X ⊆ U (the universe of discourse) as follows: $R \downarrow_{I} X(y) = \inf_{x \in U} I(R(x, y), X(x))$ $R \uparrow_{T} X(y) = \sup_{x \in U} T(R(x, y), X(x)), \quad \forall y \in U,$ where T, I and R are an intuitionistic fuzzy triangular norm, an intuitionistic fuzzy implicator and an intuitionistic fuzzy relation on U, respectively. The pair (R↓_I X(y), R↑_T X(y)) represents an intuitionistic fuzzy rough set.

However, none of the definitions above considers the memberships and non-memberships of individual objects when obtaining the approximations. Following [16, 35], we can define intuitionistic fuzzy lower and upper approximations that consider individual objects as follows:

A relation is said to be an intuitionistic fuzzy tolerance relation if it is reflexive and symmetric [54]. Now, we define an intuitionistic fuzzy tolerance relation as follows:

Let $\alpha = 1 - \frac{| \mu_{a}(x) - \mu_{a}(y) |}{| \mu_{a_{max}} - \mu_{a_{min}} |}$, $\beta = \frac{| \nu_{a}(x) - \nu_{a}(y) |}{| \nu_{a_{max}} - \nu_{a_{min}} |}$

Then, $\langle \mu_{R_{a}}(x, y), \nu_{R_{a}}(x, y) \rangle = \begin{cases} \langle \alpha, \beta \rangle & \text{if } \alpha + \beta \leq 1 \\ \langle 1, 0 \rangle & \text{if } \alpha + \beta > 1 \end{cases}$

where μ_{R_a}(x, y) and ν_{R_a}(x, y) are the membership and non-membership grades of the intuitionistic fuzzy tolerance relation between x and y, and μ_{a_max}, μ_{a_min} and ν_{a_max}, ν_{a_min} are the maximum and minimum membership and non-membership grades for attribute a.

If R_P [24, 39] is the intuitionistic fuzzy tolerance relation induced by the subset of features P, then $\langle \mu_{R_{P}}(x, y), \nu_{R_{P}}(x, y) \rangle = \inf_{a \in P} \langle \mu_{R_{a}}(x, y), \nu_{R_{a}}(x, y) \rangle$
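The tolerance relation for a single attribute and its combination over a feature subset can be sketched as follows. Two points are assumptions rather than statements from the text: the infimum of intuitionistic fuzzy values is taken in the usual lattice sense (minimum of the memberships, maximum of the non-memberships), and an attribute with zero range is treated as producing no difference. The table and the function names are hypothetical.

```python
def attribute_relation(values, x, y):
    """IF tolerance grade <mu, nu> between objects x and y for one attribute.

    values is a list of (membership, non-membership) pairs, one per object.
    """
    mus = [m for m, _ in values]
    nus = [n for _, n in values]
    mu_range = max(mus) - min(mus)
    nu_range = max(nus) - min(nus)
    # Assumption: a zero range (constant attribute) means no difference.
    d_mu = abs(values[x][0] - values[y][0]) / mu_range if mu_range else 0.0
    d_nu = abs(values[x][1] - values[y][1]) / nu_range if nu_range else 0.0
    alpha, beta = 1.0 - d_mu, d_nu
    # Fall back to <1, 0> when <alpha, beta> is not a valid IF value.
    return (alpha, beta) if alpha + beta <= 1.0 else (1.0, 0.0)


def subset_relation(table, attrs, x, y):
    """Relation induced by a feature subset: infimum over its attributes."""
    grades = [attribute_relation(table[a], x, y) for a in attrs]
    # Assumption: lattice infimum = (min of memberships, max of non-memberships).
    return (min(g[0] for g in grades), max(g[1] for g in grades))


# Hypothetical IF-valued table: attribute name -> one pair per object.
table = {
    "a": [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)],
    "b": [(0.25, 0.75), (0.25, 0.5), (0.75, 0.25)],
}
print(attribute_relation(table["a"], 0, 1))      # (0.5, 0.5)
print(attribute_relation(table["b"], 0, 1))      # (1.0, 0.0)
print(subset_relation(table, ["a", "b"], 0, 1))  # (0.5, 0.5)
```

For attribute b the pair 〈α, β〉 = 〈1, 0.5〉 violates α + β ≤ 1, so the second branch of the definition applies. The relation is reflexive and symmetric by construction, since both α and β depend only on absolute differences.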

Let IFIS be an intuitionistic fuzzy information system as defined in Section 3.4. We can then define the lower and upper approximations of X ⊆ U over a set of attributes P ⊆ J by: $\underline{approx_{P}}(X) = \langle \mu_{\underline{R_{P}} X}(x), \nu_{\underline{R_{P}} X}(x) \rangle = \min \{ \langle \mu(x), \nu(x) \rangle, \inf_{y \in U} I(\langle \mu_{R_{P}}(x, y), \nu_{R_{P}}(x, y) \rangle, \langle \mu_{X}(y), \nu_{X}(y) \rangle) \}$ $\overline{approx_{P}}(X) = \langle \mu_{\overline{R_{P}} X}(x), \nu_{\overline{R_{P}} X}(x) \rangle = \min \{ \langle \mu(x), \nu(x) \rangle, \sup_{y \in U} T(\langle \mu_{R_{P}}(x, y), \nu_{R_{P}}(x, y) \rangle, \langle \mu_{X}(y), \nu_{X}(y) \rangle) \}$

We take the intuitionistic fuzzy triangular norm T_w and the intuitionistic fuzzy implicator I_w as follows [24]: $T_{w}(x, y) = \langle \max(0, x_{1} + y_{1} - 1), \min(1, x_{2} + y_{2}) \rangle$ $I_{w}(x, y) = \langle \min(1, 1 + y_{1} - x_{1}, 1 + x_{2} - y_{2}), \max(0, y_{2} - x_{2}) \rangle$

where x = 〈x₁, x₂〉 and y = 〈y₁, y₂〉 are two intuitionistic fuzzy values.
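The two operators translate directly into code. The sketch below represents an intuitionistic fuzzy value as a (membership, non-membership) tuple; the sample values are illustrative only.

```python
def t_w(x, y):
    """IF triangular norm T_w on (membership, non-membership) pairs."""
    return (max(0.0, x[0] + y[0] - 1.0), min(1.0, x[1] + y[1]))


def i_w(x, y):
    """IF implicator I_w on (membership, non-membership) pairs."""
    return (
        min(1.0, 1.0 + y[0] - x[0], 1.0 + x[1] - y[1]),
        max(0.0, y[1] - x[1]),
    )


x = (0.75, 0.25)
y = (0.5, 0.25)
print(t_w(x, y))           # (0.25, 0.5)
print(i_w(x, y))           # (0.75, 0.0)
print(i_w((1.0, 0.0), y))  # (0.5, 0.25), i.e. I_w(<1, 0>, y) returns y
```

The last line shows a property that matters for a reflexive relation: since R(x, x) = 〈1, 0〉, the term I_w(R(x, x), X(x)) equals X(x).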

Now, the lower approximation of X over a set of features P can be rewritten as: $\underline{approx_{P}}(X) = \min \big( \langle \mu(x), \nu(x) \rangle, \inf_{y \in U} \langle \min(1, 1 + \mu_{X}(y) - \mu_{R_{P}}(x, y), 1 + \nu_{R_{P}}(x, y) - \nu_{X}(y)), \max(0, \nu_{X}(y) - \nu_{R_{P}}(x, y)) \rangle \big)$

Now, the intuitionistic fuzzy positive region can be defined by: $\langle \mu_{POS_{R_{P}}(Q)}(x), \nu_{POS_{R_{P}}(Q)}(x) \rangle = \sup_{X \in U/Q} \langle \mu_{\underline{R_{P}} X}(x), \nu_{\underline{R_{P}} X}(x) \rangle$ where Q is the set of decision attributes.

An object fails to belong to the positive region only if the equivalence class to which it belongs is not part of the positive region.

Therefore, the degree of dependency can be defined by [63]: $\gamma''_{P}(Q) = \frac{\sum_{x \in U} \frac{1 + \mu_{POS_{R_{P}}(Q)}(x) - \nu_{POS_{R_{P}}(Q)}(x)}{2}}{|U|},$ where |U| is the cardinality of U.
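The chain from lower approximation to positive region to degree of dependency can be traced with a small sketch. It rests on several assumptions that the text does not fix: decision classes are crisp and encoded as 〈1, 0〉 for members and 〈0, 1〉 for non-members; inf and sup are taken in the usual lattice sense; and the term 〈μ(x), ν(x)〉 in the outer minimum is read as the grade of x in X, in which case reflexivity of the relation makes the outer minimum redundant and it is omitted. The relation R and the labels are hypothetical.

```python
def i_w(r, v):
    """IF implicator I_w applied to a relation grade r and a set grade v."""
    return (
        min(1.0, 1.0 + v[0] - r[0], 1.0 + r[1] - v[1]),
        max(0.0, v[1] - r[1]),
    )


def if_inf(values):
    """Assumed lattice infimum: (min membership, max non-membership)."""
    return (min(m for m, _ in values), max(n for _, n in values))


def if_sup(values):
    """Assumed lattice supremum: (max membership, min non-membership)."""
    return (max(m for m, _ in values), min(n for _, n in values))


def lower_approx(R, X, x):
    """Inf part of the lower approximation of X at object x."""
    return if_inf([i_w(R[x][y], X[y]) for y in range(len(X))])


def positive_region(R, labels):
    """IF positive region: sup of the lower approximations over all classes."""
    n = len(labels)
    region = []
    for x in range(n):
        lowers = []
        for c in sorted(set(labels)):
            # Assumption: crisp decision class encoded as IF values.
            X = [(1.0, 0.0) if labels[y] == c else (0.0, 1.0) for y in range(n)]
            lowers.append(lower_approx(R, X, x))
        region.append(if_sup(lowers))
    return region


def dependency(R, labels):
    """Degree of dependency: mean of (1 + mu - nu) / 2 over the universe."""
    region = positive_region(R, labels)
    return sum((1.0 + m - v) / 2.0 for m, v in region) / len(labels)


# Hypothetical IF tolerance relation on three objects and their class labels.
R = [
    [(1.0, 0.0), (0.75, 0.25), (0.0, 1.0)],
    [(0.75, 0.25), (1.0, 0.0), (0.5, 0.5)],
    [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)],
]
labels = [0, 0, 1]

print(positive_region(R, labels))  # [(1.0, 0.0), (0.5, 0.5), (0.5, 0.5)]
print(dependency(R, labels))
```

Object 0 is fully discernible from the only object of the other class, so it lies fully in the positive region; objects 1 and 2 are partly similar across classes and receive the grade 〈0.5, 0.5〉. The resulting dependency is (1 + 0.5 + 0.5) / 3.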

Now, we can calculate the reduct set by using concepts from [35].

5 Algorithm for reduct computation

Now, we present an algorithm to calculate the reduct of a high-dimensional information system based on our above proposed technique.

IntuitionisticFuzzyRoughQuickReduct(C, D)

Input: C, the collection of all conditional features;

D, the collection of all decision features

Output: L, the feature subset

1. L ← { }, $\gamma''_{best}$ ← 0, $\gamma''_{prev}$ ← 0

2. do

3. K ← L

4. $\gamma''_{prev}$ ← $\gamma''_{best}$

5. ∀x ∈ (C - L)

6. if $\gamma''_{L \cup \{x\}}(D) > \gamma''_{K}(D)$

7. K ← L ∪ {x}

8. $\gamma''_{best}$ ← $\gamma''_{K}(D)$

9. L ← K

10. until $\gamma''_{best} = \gamma''_{prev}$ OR $(\gamma''_{best} - \gamma''_{prev}) \leq \varepsilon$

11. return L
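The greedy search above can be sketched in Python as follows. The dependency measure is passed in as a callable, so the sketch is independent of how the degree of dependency is computed; the toy scoring function and the feature names are hypothetical stand-ins, and the function is an illustration rather than the authors' WEKA implementation.

```python
def quick_reduct(features, gamma, eps=0.0):
    """Greedy forward selection driven by a dependency measure.

    features: iterable of conditional feature names (C)
    gamma:    callable mapping a frozenset of features to a score in [0, 1]
    eps:      stop when the improvement of a full pass is at most eps
    """
    selected = frozenset()            # L
    best = 0.0                        # gamma_best
    while True:
        prev = best                   # gamma_prev
        candidate = selected          # K
        for f in features:
            if f in selected:
                continue
            score = gamma(selected | {f})
            if score > best:          # best always equals gamma(K) here
                candidate = selected | {f}
                best = score
        selected = candidate
        if best - prev <= eps:        # covers both stopping conditions
            break
    return set(selected)


# Hypothetical scoring function: features contribute fixed, additive weights.
weights = {"a": 0.5, "b": 0.25, "c": 0.0}


def toy_gamma(subset):
    return min(1.0, sum(weights[f] for f in subset))


print(sorted(quick_reduct(["a", "b", "c"], toy_gamma)))  # ['a', 'b']
```

With eps = 0 the test `best - prev <= eps` reduces to equality of the two scores, because the score never decreases from one pass to the next. Each pass adds at most one feature, the one that raises the dependency the most.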

6 Application example

In the current study, the performance of the proposed intuitionistic fuzzy-rough feature selection is evaluated and compared with the existing fuzzy-rough feature selection. All the algorithms are implemented in WEKA [49]. Fuzzification of the conditional features is performed using a simple algorithm, which generates a membership function for every conditional feature. The fuzzy information system is then converted into an intuitionistic fuzzy information system using the concept of Jurio et al. [27], and the hesitancy is chosen as the smallest non-zero value in the corresponding fuzzy information table. Hill climbing is used as the search method.

We use the following experimental setup:

6.1 Dataset

We select eleven benchmark datasets from the University of California, Irvine, Machine Learning Repository [28] and a Mass Spectrometry dataset from Lu et al. [50] to evaluate the performance of our proposed approach. The details of these datasets are given in Table 1. These are small to medium-sized datasets, with the number of instances ranging from 214 to 6118 and the number of attributes from 6 to 51.

Table 1
Dataset characteristics and Reduct size

Dataset	Instances	Attributes	Reduct size (FRFS)	Reduct size (IFRFS)
Column 2C	310	6	6	5
Glass	214	9	9	7
Ionosphere	351	34	7	14
Abalone-3 class	4177	8	8	6
Cardiotocography	2126	35	14	18
Qsar-biodegradation	1055	41	22	18
Page Blocks	5473	10	10	5
Segment	2310	19	16	10
Mass Spectrometry	4023	22	21	13
KR-VS-KP	3196	36	31	23
First-order-theorem	6118	51	51	22
Chronic-kidney-disease	400	24	14	10

6.2 Classifiers

Seven classifiers from different families of machine learning algorithms available in WEKA, namely Naive Bayes (NB), Multilayer Perceptron (MLP) [53], Rotation Forest (ROF) [52], Support Vector Machines with sequential minimal optimization (SMO) [51], the Nearest Neighbor method (IBK), Random Forest (RF) [47] and PART [48], are employed to measure the average classification accuracy, in percent, on the reduced datasets. Furthermore, we compare intuitionistic fuzzy-rough feature selection with the existing fuzzy-rough feature selection on the basis of the change in the overall accuracies of the classifiers, along with the standard deviations, for the reduced datasets obtained by both techniques.

6.3 Dataset split

We apply 10-fold cross validation to the twelve benchmark datasets during classification. Each dataset is randomly divided into ten parts; in each round, nine parts are used for training and the remaining part is used as the test set. After ten rounds, the average value is taken as the final performance measure.
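The splitting and averaging scheme can be sketched as below. This is a plain random split written for illustration and is not WEKA's own cross-validation routine; the `evaluate` callback, the seed and the function names are hypothetical.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Randomly partition the indices 0..n-1 into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(n, evaluate, k=10, seed=0):
    """Average evaluate(train, test) over k train/test rounds."""
    folds = k_fold_indices(n, k, seed)
    scores = []
    for i, test in enumerate(folds):
        # Training set: every fold except the i-th one.
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(evaluate(train, test))
    return sum(scores) / k


# Hypothetical usage on a dataset with 214 instances (the size of Glass):
# the callback here only reports the test-fold size instead of an accuracy.
folds = k_fold_indices(214)
print([len(f) for f in folds])
print(cross_validate(214, lambda train, test: len(test)))
```

In a real experiment the callback would train a classifier on the `train` indices and return its accuracy on the `test` indices, so that the returned mean is the reported 10-fold accuracy.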

6.4 Bioinformatics dataset

In this section, we describe the protein dataset of Lu et al. [50] used in our experiments. This dataset was generated from the forty most abundant proteins in the shotgun sequencing of the yeast proteome, and it contains about 3309 unobserved peptides and 714 observed peptides. The ratio of the positive (observed) class to the negative (unobserved) class is 1:4.6. The dataset is therefore imbalanced, with the positive class as the minority class, since there are fewer observed peptides than unobserved ones. The dataset can be downloaded from the web page http://www.nature.com/nbt/journal/v25/n1/suppinfo/nbt1270_S1.html.

6.5 Performance evaluation metrics

The prediction performance of the six machine learning algorithms is evaluated using threshold-dependent and threshold-independent parameters. These parameters are calculated from the entries of the confusion matrix: true positives (TP), the number of correctly predicted observed peptides; true negatives (TN), the number of correctly predicted unobserved peptides; false negatives (FN), the number of observed peptides incorrectly predicted as unobserved; and false positives (FP), the number of unobserved peptides incorrectly predicted as observed.

Sensitivity : It provides the percentage of correctly predicted observed peptides and is represented as $Sensitivity = \frac{TP}{(TP + FN)} \times 100$ (6)

Specificity : It provides the percentage of correctly predicted unobserved peptides and is represented by $Specificity = \frac{TN}{(TN + FP)} \times 100$ (7)

Accuracy : It is the percentage of correctly predicted observed and unobserved peptides, which can be calculated as $Accuracy = \frac{TP + TN}{TP + FP + TN + FN} \times 100$ (8)

AUC : It is the area under the receiver operating characteristic (ROC) curve; the nearer its value is to 1, the better the predictor. This metric is considered robust to the imbalanced nature of datasets [36].

MCC : The Matthews correlation coefficient is calculated as follows: $MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP) (TP + FN) (TN + FP) (TN + FN)}}$ (9)

It is broadly used as a performance parameter for binary classification. An MCC value of 1 indicates a perfect predictor.

In this study, the entire experiment is performed using the open-source, Java-based machine learning platform WEKA [49].
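Eqs. (6) to (9) can be computed directly from the four confusion-matrix counts, as in the sketch below. The counts in the example are made up for illustration and do not come from the experiments; returning an MCC of 0 when the denominator vanishes is a common convention assumed here, not something stated in the text.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity and accuracy in percent, plus MCC."""
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is 0 when a row or column of the matrix is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts: 100 positive and 100 negative test instances.
m = confusion_metrics(tp=50, tn=90, fp=10, fn=50)
print(m["sensitivity"], m["specificity"], m["accuracy"])  # 50.0 90.0 70.0
print(round(m["mcc"], 4))
```

On an imbalanced dataset the accuracy is dominated by the majority class, which is why sensitivity, specificity and MCC are reported alongside it.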

6.6 Experimental results

Table 1 lists the dimensions of the datasets and the sizes of the reducts produced by fuzzy rough feature selection (FRFS) and by IFRFS. Our proposed method generally yields smaller subsets than the existing FRFS approach: IFRFS gives the smaller reduct for ten of the twelve datasets, with the largest reduction on First-order-theorem (22 features against 51). Many researchers have experimentally compared FRFS with other leading feature selection approaches and found that it outperforms them in classification performance, so we compare our proposed method with FRFS only. Tables 2 and 3 show the average classification accuracy with standard deviation, in percent, obtained by 10-fold cross validation on the datasets reduced by FRFS and IFRFS, respectively. For most datasets, IFRFS improves or maintains the overall classification accuracy with a smaller number of features. For some datasets, such as Ionosphere and Cardiotocography, IFRFS yields a larger reduct than FRFS but a higher classification accuracy. Hence, IFRFS gives a more accurate reduct [35].

Table 2
Comparison of classification accuracies of reduced dataset by FRFS

Dataset	Naive Bayes	MLP	SMO	IBK	Rotation Forest	PART	RF
Column 2C	77.74±7.04	84.52±6.77	78.71±8.36	81.61±5.70	86.45±6.23	80.97±4.92	82.84±6.28
Glass	48.59±6.55	67.73±7.32	56.13±7.27	70.50±7.27	73.79±6.81	68.14±7.27	79.89±5.03
Ionosphere	89.76±3.52	92.03±2.41	84.05±5.89	89.76±4.00	92.90±3.54	91.75±3.85	92.61±4.01
Abalone-3 class	57.70±2.25	66.27±3.01	62.77±2.86	57.94±2.50	64.97±2.69	61.34±2.74	64.23±1.96
Cardiotocography	88.33±2.66	98.50±1.06	97.17±0.94	98.21±0.94	98.26±1.36	97.98±0.97	98.54±1.16
Qsar-biodegradation	72.71±6.03	85.79±3.30	82.85±4.20	83.32±2.97	86.25±1.52	82.37±3.73	87.20±1.75
Page-Blocks	90.85±1.59	96.24±0.72	92.93±0.86	96.02±0.60	97.70±0.41	97.00±0.59	97.53±0.42
Segment	79.83±1.85	95.97±1.62	92.90±1.72	97.49±0.86	97.97±0.82	96.45±0.76	97.79±0.43
Mass Spectrometry	72.81±2.41	86.11±2.39	82.13±0.33	78.28±1.57	87.89±1.58	86.40±1.50	88.42±1.45
KR-VS-KP	88.71±1.89	99.34±0.65	96.31±1.13	96.84±0.98	99.31±0.48	99.12±0.57	99.25±0.40
First-order-theorem	17.18±1.14	53.12±1.34	44.97±0.66	58.34±1.46	61.67±1.48	55.59±2.12	62.85±1.41
Chronic-kidney-disease	94.50±5.37	96.50±2.42	97.75±1.84	97.75±2.49	99.50±1.05	98.50±2.42	100.0±0.00

Table 3

Comparison of classification accuracies of reduced dataset by IFRFS

Dataset	Naive Bayes	MLP	SMO	IBK	Rotation Forest	PART	RF
Column 2C	77.74±7.04	84.52±6.77	78.71±8.36	81.61±5.70	86.45±6.23	80.97±4.92	84.84±6.28
Glass	48.59±6.55	67.73±7.32	56.13±7.27	70.50±7.27	73.79±6.81	68.14±7.21	79.89±5.03
Ionosphere	90.31±4.46	93.16±4.05	88.32±3.65	90.88±3.52	94.59±2.84	92.02±3.48	93.73±3.23
Abalone-3 class	57.22±2.77	65.53±2.88	62.51±2.61	57.77±2.42	65.55±2.50	61.53±2.55	63.97±2.32
Cardiotocography	94.03±1.73	99.20±0.63	98.40±0.81	98.87±0.74	99.01±0.71	98.45±1.02	98.82±0.71
Qsar-biodegradation	76.50±4.47	85.78±3.66	83.32±3.27	82.38±3.02	86.26±2.75	81.33±3.86	85.97±2.73
Page-Blocks	87.06±2.06	95.60±0.75	92.34±0.63	95.10±0.52	96.67±0.72	96.62±0.77	97.02±0.70
Segment	88.57±1.87	95.89±1.49	92.64±1.65	97.27±0.65	97.97±0.65	96.54±1.00	98.31±0.06
Mass Spectrometry	79.62±1.71	86.38±1.68	82.18±0.22	78.75±2.24	87.50±1.27	85.70±0.74	87.80±1.38
KR-VS-KP	91.36±0.29	98.62±0.10	95.87±0.20	98.37±0.12	98.34±0.13	98.59±0.11	98.84±0.52
First-order-theorem	19.97±0.95	51.59±1.69	43.85±0.68	58.34±1.63	60.82±1.55	55.77±1.48	62.63±1.27
Chronic-kidney-disease	96.50±4.12	97.75±2.49	97.75±2.99	97.75±2.75	97.25±3.62	96.25±2.70	97.50±3.12

For the dataset described in Section 6.4, we conducted experiments with six machine learning algorithms, namely Naive Bayes (NB), Multilayer Perceptron (MLP), the Nearest Neighbor method (IBK), Rotation Forest (ROF), Random Forest (RF) and PART. The performance metrics of the six classifiers trained on the imbalanced training dataset are recorded in Tables 4 and 5 for the datasets reduced by FRFS and IFRFS, respectively.

Table 4

Performance metrics for different classifiers using reduced dataset by FRFS

Learning Algorithms	Imbalanced Dataset
	Sensitivity	Specificity	Accuracy	AUC	MCC
Naive Bayes	69.20	73.60	72.80	0.800	0.345
MLP	52.00	93.40	86.00	0.858	0.489
IBK	32.40	88.20	78.30	0.617	0.217
Rotation Forest	45.00	97.20	87.89	0.874	0.529
PART	47.50	94.80	86.40	0.852	0.485
RF	46.10	97.60	88.41	0.886	0.551

Table 5

Performance metrics for different classifiers using reduced dataset by IFRFS

Learning Algorithms	Imbalanced Dataset
	Sensitivity	Specificity	Accuracy	AUC	MCC
Naive Bayes	63.00	87.50	79.61	0.790	0.304
MLP	51.10	94.00	86.37	0.855	0.496
IBK	33.60	88.50	78.70	0.618	0.234
Rotation Forest	46.40	97.20	87.50	0.872	0.509
PART	48.30	95.10	85.70	0.843	0.447
RF	46.80	96.60	87.80	0.878	0.529

The performance evaluation metrics show that the values for the dataset reduced by IFRFS are similar to, and sometimes better than, those for the dataset reduced by FRFS. Since IFRFS yields a more strongly reduced dataset than FRFS, it also enhances interpretability.

The sensitivity, specificity, accuracy, AUC and MCC values are either better or similar for all six classifiers when they are trained on the reduced imbalanced training set produced by IFRFS as compared with FRFS. This indicates that, with IFRFS, the accuracy of predicting the negative class dominates the accuracy of predicting the positive class. Based on the values of the performance evaluation parameters, RF performed better than the other five machine learning algorithms.

A convenient way to observe the overall performance of different classifiers at different decision thresholds is the Receiver Operating Characteristic (ROC) curve, which gives a visual representation of classifier performance. The ROC curves of the classifiers for the datasets reduced by FRFS and IFRFS are given in Figs. 1 and 2, respectively. It can be seen that the performance of the classifiers is almost unchanged or better for the dataset reduced by IFRFS than for the dataset reduced by FRFS.

Fig.1

AUC of seven machine learning algorithms for Reduced Dataset by FRFS.

Fig.2

AUC of seven machine learning algorithms for Reduced Dataset by IFRFS.

7 Conclusion

In this paper, an intuitionistic fuzzy rough set assisted attribute selection technique is proposed, which can be applied on high-dimensional data sets in order to get more reduced data sets with higher classification accuracy. A novel pair of lower and upper approximations has been proposed by combining intuitionistic fuzzy set and rough set. Furthermore, a novel feature selection approach has been developed by using lower approximation expression. Moreover, we have given an appropriate algorithm based on our proposed concept. Finally, our concept is applied on twelve benchmark datasets and it is observed that our proposed method provided more reduced datasets andbetter classification accuracies than by already existing FRFS technique. In this work, the hesitancy is calculated from the information available in the dataset itself.

In the future, we can propose a more accurate conversion of a fuzzy information system into an intuitionistic fuzzy information system by defining the hesitancy available in the data. Moreover, we can define more accurate intuitionistic fuzzy rough set models, such as a type-2 intuitionistic fuzzy rough set model and a probabilistic variable precision intuitionistic fuzzy rough set model, which can tackle uncertainty in a much better way.
