Abstract
Technological advancement in the area of computing has led to production of huge amount of structured as well as unstructured data. This high dimensional data is very complex to process. Feature selection is one of the widely used techniques for preprocessing of this huge data in predictive analytics. Rough set based feature selection is an approach for handling the vagueness in data and works fine on discrete data but struggles in the continuous case as it requires discretization. This process of discretization leads to information loss. Solution for this problem was given by various authors in form of fuzzy rough set as well as intuitionistic fuzzy rough set based approaches for feature selection. Intuitionistic fuzzy set has certain benefits over the theory of traditional fuzzy sets such as its ability in a better expression of underlying information as well as its aptness to recite fragile ambiguities of the uncertainty of the objective world. The benefits offered by Intuitionistic fuzzy sets is due to the concurrent contemplation of positive, negative and hesitancy degrees for an object to belong to a set. In this paper, three novel approaches of feature reduction based on intuitionistic fuzzy rough set are presented. For this, a new intuitionistic fuzzy rough set model is established by defining a pair of lower and upper approximations. Furthermore, three new approaches of feature selection based on the degree of dependency by using score function, membership grade and cardinality of intuitionistic fuzzy numbers are introduced. Moreover, the basic results on lower and upper approximations based on rough sets are extended for intuitionistic fuzzy rough sets and analogous results are established. Moreover, a suitable algorithm is given based on our proposed approaches. Finally, the proposed algorithm is applied to an arbitrary example data set and comparison has been made with the previous fuzzy rough set based technique. The proposed algorithm is found to be better performing in terms of selected features.
Keywords
Introduction
Feature selection or feature reduction is one of the key issues encountered in data mining, signal processing and bioinformatics [11, 30]. The leading objective of this function is to retain the optimum prominent characteristics required for the pattern recognition process and to reduce the dimensionality space so that low complexity algorithms can be formulated for efficient classification [10]. Feature selection describes the problem of specifying those input features that are most prognostic of a given outcome. Unlike other feature reduction methods [13], feature selectors preserve the indigenous meaning of the features after reduction. It has applications in pattern recognition, which involves datasets containing very large number of features.
Although, it might be anticipated that the inclusion of a large number of features would increase the likelihood of encoding enough information to differentiate between classes, however, it is usually found that high-dimensional data sets increase the possibilities that a data mining algorithm will find bogus patterns that are invalid in general [21, 22]. Almost all techniques require some degree of reduction in order to cope with huge amounts of data. So, it is necessary to use an efficient and productive reduction method.
Pattern processing with insufficient or incomplete and uncertain knowledge is the center of major research in computational intelligence and cognitive sciences. Rough set theory (1982) [27] (proposed by Pawlak) provides one of the most natural approaches for modeling knowledge [28]. The rough set theory is an extension of conventional crisp set theory. The core in the rough set theory is “Indiscernibility Relation”, which is an equivalence relation. With the help of indiscernibility relation, the universe of discourse can be split into various indiscernibility classes. These classes are fundamental concepts for building of lower and upper approximations. Most of the information systems consist of real-valued data and rough set theory was not having the ability to tackle these types of data sets directly. Therefore, several discretization methods were applied in order to transform the data in discrete form. This may result in information loss. Moreover, there was no idea of handling noisy data directly.
Fuzzy rough feature selection is a way through which discrete or real-valued noisy data (or combination of both) can be reduced without the requirement of user-supplied information. Moreover, this method is frequently used to data with continuous or nominal decision features and intrinsically can be implemented to regression as well as classification problems.
As crisp equivalence classes are core to rough set, fuzzy rough sets are based on fuzzy equivalence classes. Fuzzy rough set was proposed by generalization of crisp rough set [25]. Dubois and Prade [12] combined fuzzy set (proposed by Zadeh [35]) and rough set (proposed by Pawlak). Fuzzy rough set was generalized by Richard Jensen et al. [17, 18] for feature selection.
In Intuitionistic fuzzy set [1, 2], an extension of Zadeh’s fuzzy set, positive, negative and hesitancy degrees can be applied simultaneously for an object related to a set. So, it has much stronger ability to deal with information system and draw a better glimpse of fragile ambiguities of the objective world. Intuitionistic fuzzy set concept has already been used in pattern recognition and decision making [3, 32–34, 36].
Jena et al. [16], Chakrabarty et al. [4] and Nanda et al. [26] combined intuitionistic fuzzy set and rough set and proposed “Intuitionistic Fuzzy Rough Set” by using different approaches. Coker [8] added that a fuzzy rough set is in fact an intuitionistic L-fuzzy set. However, very few researchers are working in the area of intuitionistic fuzzy rough set based feature selection. Lu, Lei, & Hua [24] proposed the genetic algorithm for attribute reduction of intuitionistic fuzzy information system (IFIS). Chen & Yang [5] introduced a new attribute reduction method by combining intuitionistic fuzzy and rough sets with information entropy. Huang, Li, & Wei [15] presented an intuitionistic fuzzy rough set based attribute reduction model by using the concept of distance measure. Esmail, Maryam, & Habibolla [14] designed their method of attribute reduction by considering structure of intuitionistic fuzzy rough set model and its properties along with rule extraction. Z. Zhang [37] proposed an attribute reduction method by using the concept of discernibility matrix. However, none of the proposed approaches for feature selection is based on dependency function by considering similarity between two objects.
In this paper, we propose a novel feature selection method by combining intuitionistic fuzzy set and rough set. First, a new intuitionistic fuzzy rough set model is established by defining a pair of lower and upper approximations based on T-equivalence relation. Second, we define three new approaches to calculate degree of dependency based on score function, membership grade and cardinality of intuitionistic fuzzy numbers. Third, validity of the model analogous to rough set theory is established. Fourth, a suitable algorithm for feature selection based on our approaches is developed. Finally, these approaches are applied on an arbitrary example dataset. We show that these techniques perform better than fuzzy rough set based approach by calculating the reduct.
We organize rest of the paper as follows: Section 2 briefly introduces preliminary concepts. In Section 3, we define a novel intuitionistic fuzzy rough set model and validate it. Moreover, three novel approaches for feature selection based on dependency function are introduced. In Section 4, an algorithm regarding our approaches is presented. In Section 5, an arbitrary example of fuzzy information system is taken and fuzzy rough set based feature selection technique is applied. Furthermore, this information system is converted into intuitionistic fuzzy information system by using Jurio et al. [20] concept. Moreover, our proposed methods are applied to calculate the reduct set. In Section 6, entire work is concluded.
Preliminaries
In this section, we discuss some basic definitions regarding intuitionistic fuzzy number (IFN) and IFIS.
The cardinality of L is defined by
〈m1, n1 〉 = 〈 m2, n2 〉 ⇔ m1 = m2 ∧ n1 = n2 〈m1, n1〉 ∩ 〈 m2, n2 〉 = 〈 min { m1, m2 }, max { n1, n2 } 〉 〈m1, n1〉 ∪ 〈 m2, n2 〉 = 〈 max { m1, m2 }, min { n1, n2 } 〉
Intuitionistic fuzzy rough set (IFRS) based approach for feature selection
In 1998, Chakrabarty et al. [4] proposed a method to construct an IFRS (U, V) of a rough set (A, B), where U and V are both intuitionistic fuzzy sets in X (non-empty set of objects) such that U ⊆ V, i. e. μ U ≤ μ V and υ U ≥ υ V . In this case, lower approximation U and upper approximation V are both intuitionistic fuzzy sets.
In 2001, Samanta and Mondal [31] proposed their method to define intuitionistic fuzzy rough set. They defined a couple (U, V) as intuitionistic fuzzy rough set such that U and V are both fuzzy rough sets (as proposed by Nanda and Majumdar [26]) and U ⊆ Complement (V). From [26], it is obvious that IFRS is a generalization of an intuitionistic fuzzy set, in which membership and non-membership functions are fuzzy rough sets.
In 2002, Rizvi et al. [29] reported their proposal as rough intuitionistic fuzzy set, which also contains hesitation margin on lower and upper approximations.
In 2003, Cornelis et al. [9] defined the lower and upper approximations of X ⊆ U (Universe of discourse) as follows:
All above proposed definitions do not consider memberships and non-memberships of individual objects to the approximations. From the literature [12, 19], we can define intuitionistic fuzzy lower and upper approximations by considering individual objects as follows:
Let IFIS be an intuitionistic fuzzy information system as defined in Section 2.3. Let P ⊆ J and X ⊆ U So, we can define lower and upper approximations of X over a set of features P by:
Taking intuitionistic fuzzy triangular norm T
w
and intuitionistic fuzzy implicator I
w
as follows [9]:
Now, we can redefine lower and upper approximations as follows:
Now, intuitionistic fuzzy positive region can be defined by:
Therefore, we can define degree of dependency in three ways based on membership grade, score function and cardinality by:
Every time we add one feature in the feature subset and calculate degree of dependency. When degree of dependency has no increment, process stops and we get the required reduct.
Let (U, C ∪ D, V, f) be an intuitionistic fuzzy information system (IFIS), where U = (≠ φ) is universe of discourse, C and D are finite sets of conditional and decision features respectively, V is the collection of all intuitionistic fuzzy values and f is called information function which is defined as f : U × C ∪ D → V, such that f (x, b) ∈ V, ∀ b ∈ C ∪ D, ∀ x ∈ U.
Where,
Now for y ∈ X, μ
X
(y) =1, υ
X
(y) =0
Using Equations (1–4), we get
Hence,
Now, for y ∈ X, μ
X
(y) =1, υ
X
(y) =0,
Since, P1 ⊆ P2, hence, μ P 1 (y) ≤ μ P 2 (y) and υ P 1 (y) ≥ υ P 2 (y).
From Equations (5–8) and along with above two conditions we get the required result.
Hence,
Now, for y ∈ X, μ
X
(y) =1, υ
X
(y) =0,
Since, P1 ⊆ P2, therefore, μ P 1 (y) ≤ μ P 2 (y) and υ P 1 (y) ≥ υ P 2 (y). From Equations (9–12) along with above two conditions, we get the required result.
Hence,
In this section, we present a quick reduct algorithm for feature selection. Algorithm starts with a null set and adds those attributes one by one, which provide greatest increase in degree of dependency of decision attribute over subset of conditional attributes until it gains highest possible value for any data set (it will be 1 in case of consistent system). This algorithm generates a close-to-minimal reduct of a decision system without exhaustively checking all possible subsets of conditional attributes, which is the main advantage of our proposed algorithm. The algorithm can be given as follows:
In the current algorithm, all the three proposed approaches are considered and it can be easily applied on any intuitionistic fuzzy information system to calculate the smallest possible reduct set. For a data set with dimension n, the worst case data set will result in (n2 + n)/2 evaluations of the dependency function in all the three cases.
Worked example
In order to illustrate our approach of intuitionistic fuzzy rough set based feature selection, a data set inspired from [19] is given in Table 1.
Fuzzy information system
Fuzzy information system
From Table 1, decision class can be given as follows:
Degree of dependencies of Q over A = {a}, B = {b}, C = {c}, D = {d}, E = {e} and F = {f} can be calculated using [31] as follows:
Since, feature b will cause the greatest increase in dependency degree. Hence, this feature is chosen and added to the potential reduct set.
Now, adding other features to potential reduct set we calculate degree of dependencies for {a, b}, {b, c}, {b, d}, {b, e}, {b, f}
On adding feature d to the reduct candidate causes the larger increase of degree of dependency. So, new reduct becomes {b, d}. This process iterates and other degrees of dependencies are:
It is obvious that by adding rest of the features with {b, d} cause no increase in degree of dependency, the algorithm stops and outputs the reduct {b, d}.
Now we convert the above fuzzy information system into intuitionistic fuzzy information system by using Jurio et al. [20] concept. The transformed information system is given in Table 2.
Intuitionistic fuzzy information system
From Table 2, decision class for intuitionistic fuzzy information system is given by:
Setting A = {a},
For the first decision equivalence class X ={ x1, x3, x6 } lower approximations of different objects can be calculated by using Section 3 as follows:
So, inf {〈 0.32, 0.48 〉, 〈 1, 0 〉, 〈 0.64, 0.16 〉, 〈 1, 0 〉, 〈 1, 0 〉, 〈 0.48, 0.32 〉} = 〈 0.32, 0.48 〉.
Therefore, lower approximations of each object can be calucated using section 3 as follows:
Similarly, for the second decision class X = { x2, x4, x5 }, lower approximations are:
Now, positive regions of every object are:
Now, we determine degree of dependencies using following three approaches: On the basis of membership grade:
Degree of dependency of Q upon A using membership function can be calculated as:
For B = {b}, C = {c}, D = {d}, E = {e} and F = {f}, other degree of dependencies are:
Since, degree of dependency of Q over B = {b} is largest. Hence, B is added to the reduct set. Now, On adding other features to the set {b} one by one, degree of dependencies of Q over {a, b}, {b, c}, {b, d}, {b, e}, {b, f} are:
Since, adding other features to the set {b} causes no increment in terms of degree of dependencies, hence algorithm terminates and we get {b} as reduct set. On the basis of score function:
Degree of dependency of Q upon A using score function can be calculated as:
Finding other degree of dependencies on the basis of “score function”, we get,
Since, B ={ b } causes the greatest effect on degree of dependency, hence, b will be a reduct candidate. Now, we add other features to the set {b} and get other degree of dependencies as follows:
Since there is no increment in degree of dependency, therefore, process stops and we get the reduct {b}.
Now, the value of degree of dependency of Q over A, B, C, D, E and F is determined using intuitionistic fuzzy cardinality as follows: On the basis of cardinality of an intuitionistic fuzzy set:
Degree of dependencies of Q over A = { a }, B = {b}, C = {c}, D = {d}, E = {e} and F = {f} by using cardinality of an intuitionistic fuzzy set are as follows:
Since, B = {b} causes the greatest effect on dependency, hence, b is added to the reduct set. On addition of other attributes to the set {b}, degree of dependencies of decision attribute over {a, b}, {b, c}, {b, d}, {b, e}, {b, f} are:
So, this approach also gives the same reduct {b}.
We observe that, in case of fuzzy rough set based approach the obtained reduct is {b, d} and after applying our proposed methods one by one, we get the same reduct set as {b}. It is obvious from the given example that our model works fine in order to find the smallest reduct set from a decision system. Moreover, our approach can perform better to handle uncertainty and noise available in the decision system by adjusting different types of intuitionistic fuzzy t-norms and implicators.
In this paper, we have presented three novel approaches for intuitionistic fuzzy rough set based feature selection by calculating degree of dependencies derived from membership grade, score function and cardinality of an intuitionistic fuzzy set. Moreover, the analogous results on lower and upper approximations have been established for intuitionistic fuzzy rough set. In this paper, the proposed algorithm has been applied to an example data set and comparison has been made with the previous fuzzy rough set based algorithm. We observed that our proposed method is superior to fuzzy rough set based approach as it considers membership, non-membership and hesitancy of an object in a set and produces the smallest reduct set of a decision system. The proposed work is capable to maintain the consistency of the system (a decision system is said to be consistent if degree of dependency of decision attribute over subset of conditional attributes is 1) as well as it can easily handle vagueness, uncertainty and noise available in data sets. Nowadays, the dimension of digital data is continuously growing with the enhancement of computer as well as database technology. Experts as well as machine learning processes on large volumes of high-dimensional data are main sources of knowledge. Knowledge extraction is an essential step in framing expert and intelligent systems. However, the knowledge extraction phase is very slow due to noise and large size of high-dimensional data. To enhance the productivity of learning, feature selection or attribute reduction plays a vital role in selection of predictive and non-redundant features to improve the performance of machine learning algorithms and interpretability of data. Many areas of real life applications like machine learning, image processing, data mining, natural language processing and bioinformatics, etc., which have high relevancy to expert and intelligent systems, are applications of feature selection. Our proposed approach handles uncertainty in much better way and produces the smallest possible reduct set as it is based on the concept of dependency function and it considers two important tools of uncertainty, i.e. intuitionistic fuzzy set and rough set, which is not considered till date.
In future, we intend to establish type-2 intuitionistic fuzzy rough set model and want to generalize it for feature selection. Some more accurate models like probabilistic variable precision intuitionistic fuzzy rough set model can be presented for feature reduction. Moreover, some more generalized methods for conversion of fuzzy information system into intuitionistic fuzzy information system can be presented so that these approaches can be implemented for real valued data sets.
