Feature genes selection based on fuzzy neighborhood conditional entropy

Abstract

In feature genes selection, most existing models handle poorly the key samples whose neighborhood is not completely contained in their decision equivalence class. To address this problem, we propose a new model, the fuzzy neighborhood conditional entropy model. First, a parameterized fuzzy similarity relation between samples is used to introduce the fuzzy neighborhood granule and fuzzy decision-making, which depict gene expression profile data more accurately. Then, we introduce both into conditional entropy and define the fuzzy neighborhood conditional entropy, for which monotonicity and other theorems are proved. Combining the algebraic definition with the information-theoretic definition of the importance of a feature genes subset makes the measurement mechanism more comprehensive. We also set parameters to tolerate noise in the data and discuss the importance of their selection. Finally, we use the monotonicity of fuzzy neighborhood conditional entropy to evaluate the significance of a candidate feature gene and design a feature genes selection algorithm for the proposed model. Compared with existing related algorithms on data sets selected from public data sources, the experimental results show that the proposed algorithm selects relatively few feature genes and achieves higher classification performance.

Keywords

1. Introduction

With the rapid development of science and technology, human life has steadily improved. At the same time, side effects on human life have become increasingly apparent, for instance the rising incidence and mortality of cancers. Tumors, one of the major threats to human health, come in great variety and have complex pathogenesis, which makes them a difficult research field. The prevention and treatment of tumors have therefore become a focus for researchers. Nowadays, machine learning and data mining are applied extensively to the processing of gene expression profiles. On the one hand, this has played a significant role in the study of tumor generation and development; on the other hand, it has particularly important practical significance for tumor molecular diagnosis [1].

Gene expression profile generates panoramic records of cells in gene expression messages, which can approximately embody the entire genome of a biological tissue and reflect the information of a large number of cells at different levels. In recent years, with the development of gene microarrays, the reliable acquisition of gene expression profiling data has become possible. However, the high dimensionality, small sample size, and noise redundancy of gene expression profiling data pose a challenge for data detection and feature selection [2]. From a biological point of view, there are only a few genes that are truly related to the sample category. Therefore, it is critical to select the most contributing features for the classification of the tumor through the correlation measure between the feature gene and the tumor sample during gene expression profiling.

At present, there are many research topics and solutions in feature genes selection. As a knowledge acquisition tool, rough set theory has been widely used in the analysis of gene expression profiles and has achieved good results. The classical rough set model, proposed by Pawlak [3], has been successfully applied in feature selection [4-7]. However, the classical rough set can only deal with discrete data, so continuous attributes must be discretized before attribute reduction, which might lead to the loss of original data information. To solve this problem, the neighborhood rough set and the fuzzy rough set were proposed in succession as two important models and have been studied in depth. Lin generalized the classical rough set with neighborhood operators [8]. Subsequently, Yao et al. studied the relationship between rough approximation operators and neighborhood operators and proposed the axiomatic properties of the model [9]. Hu et al. constructed an attribute reduction algorithm based on the neighborhood rough set model, which took the positive region as the heuristic information from the algebraic definition [10]. By using neighborhood relations, continuous attributes can be processed directly, which solves the problem of the classical rough set analysed above. However, when the background is fuzzy, the fuzziness of the samples cannot be accurately described. Dubois and Prade combined rough sets with fuzzy sets and defined the concept of the fuzzy rough set [11]. Then, Jensen introduced the dependence function of the classical rough set into the fuzzy rough set and presented a greedy algorithm for reducing redundant attributes [12]. Bhatt and Gopal proposed the concept of a compact computational domain, which greatly improved computational efficiency [13]. Hu et al. used kernel functions to define the fuzzy similarity relation and employed a greedy algorithm for dimensionality reduction [14]. For noisy data, Mieszkowicz-Rolka constructed a variable-precision fuzzy rough set model [15]. However, in many fuzzy rough set models a sample is usually described through its relationship with the nearest sample, so noise in the data set may make the calculation result unreliable. In view of the drawbacks analysed above, this paper combines the fuzzy and neighborhood concepts in the data characterization stage: it describes the samples in the form of fuzzy neighborhoods and constructs the fuzzy decision-making based on the decision attributes, so as to keep as much of the original information of the data as possible during the calculation and to reduce misjudgment and the loss of data information.

The construction of the feature evaluation function plays a crucial role in the performance of the finally selected feature subset. In general, effective feature evaluation methods can improve the classification performance. Methods for measuring the quality of candidate subsets have been proposed, including those based on positive region and dependence [16-18], information volume [19] and information entropy [20-21]. Some scholars have integrated information entropy with rough sets, for example conditional entropy [21-22] and mixed entropy [23]. It has been pointed out in [24-25] that the algebraic definition of attribute importance focuses on the significance of attributes for the determined classification subsets in the universe of discourse, while the information-theoretic definition considers the significance of attributes for the uncertain classification subsets in the universe of discourse. The two are highly complementary, so combining them makes the measurement mechanism more comprehensive. Chen et al. proposed an entropy measure based on a neighborhood rough set, which can process real-valued data and reduce the impact of data noise [26]. For mixed data sets, Zhang et al. proposed a fuzzy rough set information entropy for feature selection [27]. Neighborhood similarity is used to approximately describe the decision equivalence class [18], and the fuzzy relation can express the original information of the data more accurately. In the analysis of gene expression profiling data, there are always samples whose neighborhood is not completely contained in their decision equivalence class, and these samples are often the key to improving the classification performance of the selected feature genes subset. Therefore, aiming at this problem and building on the fuzzy neighborhoods studied by C. Wang [18], we redefine the fuzzy neighborhood and introduce it into the conditional entropy. Thus, a new model is proposed, which is named the fuzzy neighborhood conditional entropy model (FNCE).

This model combines the advantages of neighborhood rough set with fuzzy rough set, which makes the data more clearly described, maximizes the use of data information contained in the computation of conditional entropy, and makes the research on the uncertainty of the data more accurate. In this model, the fuzzy similarity relation which is employed in the feature space to parameterize the degree of association between the samples, combines with the sample’s decision equivalence class to obtain the fuzzy decision-making. Meanwhile, the fuzzy neighborhood granule of sample is described by using the neighborhood radius. The intersection between them is used to calculate the conditional entropy of the corresponding attribute, and a new model is proposed. Then, we define the significance between feature genes and decision and design an algorithm for feature genes selection. In order to get fewer and more effective feature genes subset, we introduce a variable-precision method to tolerate noises in data. Ultimately, the experimental results, which are obtained by the designed algorithm, demonstrate the effectiveness of the proposed model.

The remainder of this paper is organized as follows. In the second section, we review some basic theoretical knowledge about fuzzy and neighborhood. In the third section, we establish a new model: fuzzy neighborhood conditional entropy model, and design a feature genes selection algorithm for this model. In the fourth section, we verify the effectiveness of the proposed algorithm by comparing the experimental results with other similar algorithms. The fifth section is a summary of our study.

2. Related theoretical knowledge

In this section, we briefly review some basic concepts in rough set theory such as equivalence relation, neighborhood relation and fuzzy relation.

During data processing, the classical rough set uses the equivalence relation to granulate the domain into several equivalence classes. Its measurement method for data is mainly based on equivalence relation, so it is only suitable for processing data with discrete attributes.

Definition 1. [3] Let DIS=<U, A, V, f> be a decision information system. U={x₁, x₂, …, x_n} is a nonempty finite set, which is called the universe of discourse. A=C ∪ D, C ∩ D=∅, where C is the set of condition attributes of the samples and D is the decision attribute of the samples. $V = \bigcup_{a \in A} V_{a}$, where V_a represents the range of the attribute a, and f (x, a)∈V_a indicates that each sample has a corresponding value for each attribute.

The neighborhood rough set uses the neighborhood relation to granulate the domain mainly by neighborhood radius. It solves some information loss caused by discretized data and can effectively process numerical data.

Definition 2. [10] Let <U, L> be a non-empty metric space, where U={x₁, x₂, …, x_n} and L is a metric on U. For ∀x_i∈U, its neighborhood is denoted as ɛ (x_i)={x|x∈U, L (x, x_i)≤ɛ}, with ɛ≥0. The following follow from the properties of the metric: (1) ɛ (x_i)≠∅; (2) x_j∈ɛ (x_i) ⇒ x_i∈ɛ (x_j); (3) $\bigcup_{i = 1}^{n} ɛ (x_{i}) = U$.
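The neighborhood of Definition 2 can be illustrated with a minimal Python sketch. The sample points, the radius and the choice of the Euclidean metric are assumptions made only for illustration; the definition itself admits any metric L.

```python
import math

def neighborhood(points, i, eps, dist=math.dist):
    """Indices j whose distance to points[i] is at most eps (Definition 2)."""
    return {j for j, p in enumerate(points) if dist(p, points[i]) <= eps}

# Hypothetical samples in a 2-D feature space (not taken from the paper).
points = [(0.0, 0.0), (0.1, 0.0), (0.5, 0.5), (1.0, 1.0)]
eps = 0.2
hoods = [neighborhood(points, i, eps) for i in range(len(points))]

# Property (1): every neighborhood is non-empty because it contains x_i itself.
nonempty = all(i in hoods[i] for i in range(len(points)))
# Property (2): membership is symmetric.
symmetric = all((j in hoods[i]) == (i in hoods[j])
                for i in range(len(points)) for j in range(len(points)))
# Property (3): the neighborhoods together cover U.
covers = set().union(*hoods) == set(range(len(points)))
```

With these made-up points, the first two samples are mutual neighbors while the other two are isolated, and all three properties hold.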

In the fuzzy rough set, the fuzzy relation can describe the degree of association between elements and parameterize the relationship between them. It is a relatively basic concept in fuzzy sets.

Definition 3. [11] Given a set of attributes A and the universe of discourse U={x₁, x₂, …, x_n}, let B⊆A be a subset of attributes on U that induces a binary fuzzy relation R_B. R_B is a fuzzy similarity relation on U if it satisfies the following conditions: reflexivity: $R_{B} (x, x) = 1, \forall x \in U$ (1); symmetry: $R_{B} (x, y) = R_{B} (y, x), \forall x, y \in U$ (2).

Therefore, for ∀x∈U, the fuzzy neighborhood of sample x with respect to R_B can be expressed as [x]_{R_B} (y) = R_B (x, y), y∈U, which is a fuzzy set on U.

3. Fuzzy neighborhood conditional entropy model

In this section, in order to reduce the loss of the original data information during the calculation process, a parameterized fuzzy similarity relation is introduced to characterize the fuzzy neighborhood granule and fuzzy decision-making for the analysis of gene expression profile data. Then, the fuzzy neighborhood conditional entropy is defined, and its monotonicity and other theorems are proved. Finally, based on the proposed model, we design a feature genes selection algorithm.

3.1 Fuzzy neighborhood granule and fuzzy decision-making

By using fuzzy similarity relation, the degree of association between the samples is parameterized to form a relation matrix that satisfies the reflexivity and symmetry. Each row or column represents the relevance value of one sample to the remaining samples. In order to avoid the impact of data noise, we need to set a parameter α as the neighborhood radius to describe the similarity of the samples, which is beneficial to the study of feature genes classification under different information granularity.

Definition 4. Let <U, A, D> be a decision information system, U={x₁, x₂, …, x_n}, B⊆A, a∈A, and let R_a be a fuzzy similarity relation on U. For ∀x, y∈U, the fuzzy similarity matrix of attribute a is defined as [x]_a (y)=R_a (x, y), and the fuzzy similarity matrix based on B is defined as $[x]_{B} (y) = \min_{b \in B} ([x]_{b} (y))$.

Definition 5. [18] Let <U, A, D> be a decision information system. For ∀B⊆A, its parameterized fuzzy neighborhood granule is constructed as: $α_{B} (x) = [x]_{B}^{α} (y) = \begin{cases} R_{B} (x, y), & R_{B} (x, y) \geq α \\ 0, & R_{B} (x, y) < α \end{cases}$ (3)

When α₁≤α₂, for ∀x∈U, α_{2_B} (x) ⊆ α_{1_B} (x).
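The thresholding in Definition 5 and the containment property above can be sketched as follows. This is a minimal illustration: the similarity row and the two radii are made-up values, and fuzzy-set containment is taken to mean pointwise ≤ on membership degrees.

```python
def fuzzy_granule(similarity_row, alpha):
    """Eq. (3): keep similarities that reach alpha, set the rest to 0."""
    return [r if r >= alpha else 0.0 for r in similarity_row]

def is_contained(inner, outer):
    """Fuzzy-set containment, read here as pointwise <= on memberships."""
    return all(a <= b for a, b in zip(inner, outer))

# Hypothetical similarity row R_B(x, y) of one sample x against four samples.
row = [0.5294, 0.4706, 1.0, 0.0]

granule_small_radius = fuzzy_granule(row, 0.2)   # alpha_1 = 0.2
granule_large_radius = fuzzy_granule(row, 0.5)   # alpha_2 = 0.5
```

Raising the radius from 0.2 to 0.5 drops the entry 0.4706, so the granule for α₂ is contained in the granule for α₁, as stated above.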

Definition 6. Let <U, A, D> be a decision information system, U={x₁, x₂, …, x_n}, U/D = {D₁, D₂, ⋯ , D_r}. For ∀x, y∈U, its parameterized fuzzy decision-making is constructed as follows: $FD_{j} (x) = \frac{\sum_{d \in D_{j}} [x]_{A} (d)}{\sum_{y \in U} [x]_{A} (y)}$ (4) where j = 1, 2, …, r. $FD_{j} = \{FD_{j} (x_{1}), FD_{j} (x_{2}), \ldots, FD_{j} (x_{n})\}$ (5) where j = 1, 2, …, r. $FD = \{FD_{1}^{T}, FD_{2}^{T}, \ldots, FD_{r}^{T}\}$ (6) where FD_j (x) represents the membership degree of x in FD_j, FD_j is the fuzzy set of the sample decision equivalence class, and FD is the fuzzy decision-making of the samples.

Example 1. A decision table <U, A, D> is given in Table 1, where A = {a₁, a₂, a₃, a₄} is the set of condition attributes and D is the decision attribute.

The values in Table 1 are normalized to [0,1] by the formula f (x_i) = (x_i - x_min)/(x_max - x_min) as shown in Table 2:

Table 1

Decision table

U	a₁	a₂	a₃	a₄	D
x₁	15.5	2.68	4.6	3.22	1
x₂	24	1.84	3.74	2.78	2
x₃	19.5	2.58	2.9	2.81	2
x₄	21.5	0.65	7.65	1.86	3

Table 2

Normalized data

U	a₁	a₂	a₃	a₄	D
x₁	0	1	0.3579	1	0
x₂	1	0.5862	0.1768	0.6765	0.5000
x₃	0.4706	0.9507	0	0.6985	0.5000
x₄	0.7059	0	1	0	1
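The normalization step from Table 1 to Table 2 can be checked with a short Python sketch. The function name and the handling of a constant column are assumptions added for illustration; the formula is the min-max rule stated above.

```python
def min_max_normalize(column):
    """Scale a list of numbers to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column carries no information; map it to 0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

# Condition attribute columns of Table 1 (rows x1..x4).
table1 = {
    "a1": [15.5, 24, 19.5, 21.5],
    "a2": [2.68, 1.84, 2.58, 0.65],
    "a3": [4.6, 3.74, 2.9, 7.65],
    "a4": [3.22, 2.78, 2.81, 1.86],
}

# Round to four decimals, the precision used in Table 2.
table2 = {name: [round(v, 4) for v in min_max_normalize(col)]
          for name, col in table1.items()}
```

Rounded to four decimals, the resulting columns agree with the condition attribute values listed in Table 2.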

The fuzzy similarity relation R_{a_k} between samples x_i and x_j with respect to attribute a_k is calculated as: $R_{a_{k}} (x_{i}, x_{j}) = 1 - | x_{ik} - x_{jk} |$ (7) where a_k ∈ A, k = 1, 2, 3, 4; x_i, x_j ∈ U, i, j = 1, 2, 3, 4.

So we can get the fuzzy similarity matrix [x]_{a_k} (y) for each attribute a_k and, in conjunction with Definition 4, [x]_A (y) = $\min_{a_{k} \in A} ([x]_{a_{k}} (y))$. Because the fuzzy similarity relation R_{a_k} is reflexive, R_{a_k} = 1 when i = j, which gives: $[x]_{A} (y) = \begin{bmatrix} 1 & 0 & 0.5294 & 0 \\ 0 & 1 & 0.4706 & 0.1768 \\ 0.5294 & 0.4706 & 1 & 0 \\ 0 & 0.1768 & 0 & 1 \end{bmatrix}$
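The matrix above can be reproduced from the normalized values of Table 2 with a short Python sketch of Eq. (7) and the minimum rule of Definition 4. The function names are illustrative, not from the paper.

```python
def attribute_similarity(values):
    """Eq. (7): R_a(x_i, x_j) = 1 - |x_i - x_j| for one normalized attribute."""
    n = len(values)
    return [[1.0 - abs(values[i] - values[j]) for j in range(n)]
            for i in range(n)]

def subset_similarity(columns):
    """Definition 4: element-wise minimum over the per-attribute matrices."""
    mats = [attribute_similarity(col) for col in columns]
    n = len(mats[0])
    return [[min(m[i][j] for m in mats) for j in range(n)] for i in range(n)]

# Normalized columns a1..a4 from Table 2 (rows x1..x4).
normalized = [
    [0, 1, 0.4706, 0.7059],   # a1
    [1, 0.5862, 0.9507, 0],   # a2
    [0.3579, 0.1768, 0, 1],   # a3
    [1, 0.6765, 0.6985, 0],   # a4
]

R_A = subset_similarity(normalized)
```

Each off-diagonal entry is decided by the attribute on which the two samples differ most; for example, x₁ and x₂ differ by 1 on a₁, so their similarity is 0.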

From Table 1, it is apparent that U/D={D₁, D₂, D₃}, with D₁={x₁}, D₂={x₂, x₃}, D₃={x₄}. According to Definition 6 we get: $\begin{array}{rcl} FD_{1} & = & \{FD_{1} (x_{1}), FD_{1} (x_{2}), FD_{1} (x_{3}), FD_{1} (x_{4})\} = \{0.6538, 0, 0.2647, 0\} \\ FD_{2} & = & \{FD_{2} (x_{1}), FD_{2} (x_{2}), FD_{2} (x_{3}), FD_{2} (x_{4})\} = \{0.3462, 0.8927, 0.7353, 0.1503\} \\ FD_{3} & = & \{FD_{3} (x_{1}), FD_{3} (x_{2}), FD_{3} (x_{3}), FD_{3} (x_{4})\} = \{0, 0.1073, 0, 0.8497\} \end{array}$

So the final fuzzy decision-making is: $FD = \{FD_{1}^{T}, FD_{2}^{T}, FD_{3}^{T}\} = \begin{bmatrix} 0.6538 & 0.3462 & 0 \\ 0 & 0.8927 & 0.1073 \\ 0.2647 & 0.7353 & 0 \\ 0 & 0.1503 & 0.8497 \end{bmatrix}$
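The fuzzy decision-making of Example 1 can be recomputed with a brief Python sketch of Eq. (4). It starts from the similarity matrix as printed to four decimals, so the results agree with the example only up to rounding in the last digit; the function name is illustrative.

```python
def fuzzy_decision(R, labels):
    """Eq. (4): for each sample, the share of its similarity mass that
    falls on each decision class. Returns one row per sample."""
    classes = sorted(set(labels))
    FD = []
    for row in R:
        total = sum(row)  # positive, since R(x, x) = 1 by reflexivity
        FD.append([sum(r for r, lab in zip(row, labels) if lab == c) / total
                   for c in classes])
    return FD

# Fuzzy similarity matrix [x]_A(y) from Example 1 (four-decimal values).
R_A = [
    [1, 0, 0.5294, 0],
    [0, 1, 0.4706, 0.1768],
    [0.5294, 0.4706, 1, 0],
    [0, 0.1768, 0, 1],
]
labels = [1, 2, 2, 3]  # decision attribute D of x1..x4 in Table 1

FD = fuzzy_decision(R_A, labels)
```

Each row of `FD` sums to 1 by construction, so a row can be read as the degree to which a sample belongs to each decision class.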

3.2 Fuzzy neighborhood conditional entropy

Based on the fuzzy neighborhood granule and fuzzy decision-making, we propose the fuzzy neighborhood conditional entropy. In addition, strict proofs of the related theorems are given.

Definition 7. Let <U, A, D> be a fuzzy neighborhood decision information system, B⊆A a subset of condition attributes, and α_B (x) the fuzzy neighborhood granule with radius α. Then the fuzzy neighborhood rough entropy of B is defined as: $E_{fr} (B) = - \frac{1}{| U |} \sum_{i = 1}^{| U |} \log_{2} \frac{1}{| α_{B} (x_{i}) |}$ (8) where |α_B (x_i)| is the number of non-zero values in the fuzzy neighborhood granule of object x_i, and $\frac{1}{| α_{B} (x_{i}) |}$ is the probability of an element in the fuzzy neighborhood granule α_B (x_i).
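As a rough illustration of Eq. (8), the following Python sketch evaluates the fuzzy neighborhood rough entropy on the similarity matrix of Example 1 with α = 0, so that each granule equals the corresponding matrix row. The resulting number is produced by this sketch and is not a value reported in the paper.

```python
import math

def granule_size(granule):
    """|alpha_B(x_i)|: number of non-zero membership values in the granule."""
    return sum(1 for v in granule if v != 0)

def rough_entropy(granules):
    """Eq. (8): E_fr(B) = -(1/|U|) * sum_i log2(1 / |alpha_B(x_i)|)."""
    n = len(granules)
    return -sum(math.log2(1.0 / granule_size(g)) for g in granules) / n

# Granules for alpha = 0, i.e. the rows of [x]_A(y) from Example 1.
granules = [
    [1, 0, 0.5294, 0],
    [0, 1, 0.4706, 0.1768],
    [0.5294, 0.4706, 1, 0],
    [0, 0.1768, 0, 1],
]

sizes = [granule_size(g) for g in granules]
entropy = rough_entropy(granules)
```

Here the granule sizes are 2, 3, 3 and 2, so the entropy equals (2 + 2·log₂3)/4, roughly 1.2925. Larger granules give a larger entropy, reflecting coarser knowledge.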

Definition 8. Let M and N be two fuzzy sets on U. |M ∩ N| is defined as the number of samples with non-zero membership in M whose degree of membership in M is not greater than that in N.

Example 2. Given a set U={x₁, x₂, ⋯ , x₁₀}, M and N are two fuzzy sets defined on U, and represent the degree of membership of the sample respectively, as follows:

$M = \{\frac{0.5}{x_{1}}, \frac{0.2}{x_{2}}, \frac{0.7}{x_{3}}, \frac{0.1}{x_{4}}, \frac{0.3}{x_{5}}, \frac{0.2}{x_{6}}, \frac{0.4}{x_{7}}, \frac{0.3}{x_{8}}, \frac{0.1}{x_{9}}, \frac{0.6}{x_{10}}\}$, $N = \{\frac{0.6}{x_{1}}, \frac{0.2}{x_{2}}, \frac{0.5}{x_{3}}, \frac{0.3}{x_{4}}, \frac{0.4}{x_{5}}, \frac{0.1}{x_{6}}, \frac{0.3}{x_{7}}, \frac{0.8}{x_{8}}, \frac{0.1}{x_{9}}, \frac{0.9}{x_{10}}\}$, so we can get:

|M ∩ N| = | {x₁, x₂, x₄, x₅, x₈, x₉, x₁₀} |=7,

|N ∩ M| = | {x₂, x₃, x₆, x₇, x₉} |=5.
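The two counts in Example 2 can be reproduced with a small Python sketch of Definition 8. It reads "non-zero" as applying to the membership degree in the first fuzzy set, which is an interpretation consistent with the wording used later in Definition 9.

```python
def intersection_count(M, N):
    """|M ∩ N| per Definition 8: samples with non-zero membership in M
    whose membership in M does not exceed their membership in N."""
    return sum(1 for m, n in zip(M, N) if m != 0 and m <= n)

# Membership degrees of x1..x10 from Example 2.
M = [0.5, 0.2, 0.7, 0.1, 0.3, 0.2, 0.4, 0.3, 0.1, 0.6]
N = [0.6, 0.2, 0.5, 0.3, 0.4, 0.1, 0.3, 0.8, 0.1, 0.9]

m_and_n = intersection_count(M, N)
n_and_m = intersection_count(N, M)
```

The operation is not symmetric: samples with equal membership (x₂ and x₉) are counted in both directions, while every other sample is counted in exactly one.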

Definition 9. Let <U, A, D> be a fuzzy neighborhood decision information system and B⊆A a subset of condition attributes. The fuzzy neighborhood granule α_B (x) and the fuzzy decision-making FD = {FD₁, FD₂, …, FD_r} are both fuzzy matrices. Then the fuzzy neighborhood conditional entropy of decision attribute D with respect to attribute set B on U is defined as: $E_{fr} (D | B) = \frac{1}{| U |} \sum_{i = 1}^{| U |} \sum_{j = 1}^{r} \frac{| α_{B} (x_{i}) \cap FD_{j} |}{| α_{B} (x_{i}) |} \log_{2} \frac{| α_{B} (x_{i}) |}{| α_{B} (x_{i}) \cap FD_{j} |}$ (9) where |α_B (x_i)| is the number of non-zero values in the fuzzy neighborhood granule of object x_i, and |α_B (x_i) ∩ FD_j| represents the number of non-zero values whose degree of membership in α_B (x_i) is not greater than that in FD_j.

Theorem 1. Let <U, A, D> be a fuzzy neighborhood decision information system, and let P, Q⊆A be any two sets of condition attributes. If α_P (x)⊆α_Q (x), then E_fr (D|P) ≤ E_fr (D|Q), and equality holds if and only if α_P (x)=α_Q (x).

Corollary 1. Let P, Q⊆A be any two sets of condition attributes. If Q⊆P, then E_fr (D|P) ≤ E_fr (D|Q).

Proof. It can be achieved by Definition 6 in [21] combined with Lemma 4.1 in [28].

Theorem 2. Let <U, A, D> be a fuzzy neighborhood decision information system, B⊆A is a conditional attributes subset, and fuzzy decision-making FD={FD₁, FD₂,. . . , FD_r}. Then E_fr (D|B) ≥0.

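Proof. By Definition 9, |α_B (x_i) ∩ FD_j| counts only non-zero values of α_B (x_i), so |α_B (x_i) ∩ FD_j| ≤ |α_B (x_i)|. Assume E_fr (D|B) < 0. Since every coefficient $\frac{| α_{B} (x_{i}) \cap FD_{j} |}{| α_{B} (x_{i}) |}$ is non-negative, at least one term must satisfy $\log_{2} \frac{| α_{B} (x_{i}) |}{| α_{B} (x_{i}) \cap FD_{j} |} < 0$, which is equivalent to $\frac{| α_{B} (x_{i}) |}{| α_{B} (x_{i}) \cap FD_{j} |} < 1$, that is, |α_B (x_i)| < |α_B (x_i) ∩ FD_j|. This contradicts the inequality above. So $\frac{| α_{B} (x_{i}) |}{| α_{B} (x_{i}) \cap FD_{j} |} \geq 1$, that is, $\log_{2} \frac{| α_{B} (x_{i}) |}{| α_{B} (x_{i}) \cap FD_{j} |} \geq 0$ for every i and j. Therefore, E_fr (D|B) ≥ 0.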

Theorem 3. Let <U, A, D> be a fuzzy neighborhood decision information system, B⊆A, b∈B. If E_fr (D|B - {b}) = E_fr (D|B), then attribute b is unnecessary.

Proof. Suppose that some b ∈ B satisfies E_fr (D|B - {b}) = E_fr (D|B) and that b is necessary. From Definitions 4-5, α_{B-{b}} (x) ≠ α_B (x), and since B - {b} ⊆ B, Theorem 1 and Corollary 1 give E_fr (D|B - {b}) > E_fr (D|B), which contradicts the hypothesis. So for ∀b ∈ B, if E_fr (D|B - {b}) = E_fr (D|B), the attribute b is unnecessary.

Definition 10. Let <U, A, D> be a fuzzy neighborhood decision information system. For ∀B⊆A, we call B a reduction of A in the decision information system relative to decision attribute D if it satisfies:

(1) E_fr (D|B) = E_fr (D|A);

(2) For ∀b ∈ B, E_fr (D|B - {b}) > E_fr (D|B).

Definition 11. Let <U, A, D> be a fuzzy neighborhood decision information system. The attribute significance of a ∈ A relative to D is defined as: $SIG (a, A, D) = E_{fr} (D | A - \{a\}) - E_{fr} (D | A)$ (10)

To obtain a relative reduction subset, the two conditions of Definition 10 must be satisfied. However, because data sets are inconsistent and noisy [29-30] and finding the minimum reduction is NP-complete [31], a small deviation needs to be allowed. In this model, the constraint is that the difference between the conditional entropy of the reduced subset and that of the original condition attribute set, relative to the decision attribute, is not greater than the parameter β. The relative approximate reduced subset red is selected under this constraint, that is, red needs to satisfy: $E_{fr} (D | red) - E_{fr} (D | A) \leq β$ (11)

Then the attribute significance of r ∈ A - red relative to D is defined as: $SIG (r, red, D) = E_{fr} (D | red) - E_{fr} (D | red \cup \{r\})$ (12)

3.3 Feature genes selection algorithm based on fuzzy neighborhood conditional entropy model

According to the above definitions, this paper designs a feature genes selection algorithm for the fuzzy neighborhood conditional entropy model. The significance of a candidate gene is calculated by using the monotonicity principle of fuzzy neighborhood conditional entropy. The specific calculation process is illustrated by Example 3 and the detailed design flow is shown in Algorithm 1.

Example 3. This example builds on Example 1. The data here only illustrate Algorithm 1, so the neighborhood radius α and the parameter β are both set to 0 (the parameter settings used in the actual experiments are discussed in Section 4.3).

From Example 1, we have the fuzzy similarity matrix [x]_A (y) and the fuzzy decision-making FD. Because α = 0, the fuzzy neighborhood granule is α_A (x) = [x]_A (y). Then we can get: $\begin{array}{l} E_{fr} (D | A) = 0.25 \\ E_{fr} (D | A - \{a_{1}\}) = 0.2642 \\ E_{fr} (D | A - \{a_{2}\}) = 0.25 \\ E_{fr} (D | A - \{a_{3}\}) = 0.6392 \\ E_{fr} (D | A - \{a_{4}\}) = 0.25 \end{array}$

So the attribute significance of each a ∈ A relative to D is:

$\begin{array}{l} SIG (a_{1}, A, D) = E_{fr} (D | A - \{a_{1}\}) - E_{fr} (D | A) = 0.0142 \\ SIG (a_{2}, A, D) = E_{fr} (D | A - \{a_{2}\}) - E_{fr} (D | A) = 0 \\ SIG (a_{3}, A, D) = E_{fr} (D | A - \{a_{3}\}) - E_{fr} (D | A) = 0.3892 \\ SIG (a_{4}, A, D) = E_{fr} (D | A - \{a_{4}\}) - E_{fr} (D | A) = 0 \end{array}$ We can see that SIG (a₃, A, D) = $\max_{a \in A} SIG (a, A, D)$, so red = {a₃}, B = {a₁, a₂, a₄}.

Then we get E_fr (D|red) = 0.8412. Since E_fr (D|red) - E_fr (D|A) > 0, we need to calculate: $\begin{array}{l} E_{fr} (D | red \cup \{a_{1}\}) = 0.75 \\ E_{fr} (D | red \cup \{a_{2}\}) = 0.2642 \\ E_{fr} (D | red \cup \{a_{4}\}) = 0.2642 \end{array}$

The attribute significance of each r ∈ B relative to D is: $\begin{array}{l} SIG (a_{1}, red, D) = E_{fr} (D | red) - E_{fr} (D | red \cup \{a_{1}\}) = 0.0912 \\ SIG (a_{2}, red, D) = E_{fr} (D | red) - E_{fr} (D | red \cup \{a_{2}\}) = 0.5771 \\ SIG (a_{4}, red, D) = E_{fr} (D | red) - E_{fr} (D | red \cup \{a_{4}\}) = 0.5771 \end{array}$ We can see that SIG (a₂, red, D) = SIG (a₄, red, D) = $\max_{r \in B} SIG (r, red, D)$, so the tie is broken by attribute order, giving red = {a₃, a₂}, B = {a₁, a₄}.

Then we get E_fr (D|red) = 0.2642. Since E_fr (D|red) - E_fr (D|A) > 0, we need to continue to calculate: $\begin{array}{l} E_{fr} (D | red \cup \{a_{1}\}) = 0.25 \\ E_{fr} (D | red \cup \{a_{4}\}) = 0.2642 \end{array}$

The attribute significance of each r ∈ B relative to D is:

$\begin{array}{l} SIG (a_{1}, red, D) = E_{fr} (D | red) - E_{fr} (D | red \cup \{a_{1}\}) = 0.0142 \\ SIG (a_{4}, red, D) = E_{fr} (D | red) - E_{fr} (D | red \cup \{a_{4}\}) = 0 \end{array}$

We can see that SIG (a₁, red, D) = $\max_{r \in B} SIG (r, red, D)$, so red = {a₃, a₂, a₁}, B = {a₄}.

Then we get E_fr (D|red) = 0.25. Since E_fr (D|red) - E_fr (D|A) = 0 ≤ β, the algorithm finally returns red = {a₃, a₂, a₁}.

Algorithm 1:
Feature genes selection based on fuzzy neighborhood conditional entropy (FNCE)

Input: Fuzzy neighborhood decision system <U, A, D>, parameters α and β. // α is the fuzzy neighborhood radius, β controls the selection of the feature genes subset.

Output: feature genes set red

Step1: Initialize red=∅

Step2: For ∀a∈A, calculate the fuzzy similarity matrix R_a

Step3: Calculate the fuzzy decision FD according to Definition 6

Step4: Calculate the fuzzy neighborhood conditional entropy E_fr (D|A) of decision attribute D with respect to A according to Definition 5 and Definition 9

Step5: For ∀a∈A, calculate SIG (a, A, D); if a satisfies SIG (a, A, D) = $\max_{a \in A} SIG (a, A, D)$, then red = red ∪ {a}

Step6: Calculate E_fr (D|red); if E_fr (D|red) - E_fr (D|A)≤β, execute Step10, otherwise execute Step7

Step7: Let B = A - red; for ∀r ∈ B, calculate SIG (r, red, D)

Step8: Choose r that satisfies SIG (r, red, D) = $\max_{r \in B} SIG (r, red, D)$

Step9: Let B=B - {r}, red=red ∪ {r}, execute Step6

Step10: Return red
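The greedy control flow of Algorithm 1 (Steps 5 to 10) can be sketched in Python as follows. This is an illustrative outline, not the authors' implementation: the conditional entropy is passed in as a callable, and here it is a lookup table filled with the E_fr values quoted in Example 3, so the sketch exercises only the selection logic and does not recompute Eq. (9). The function name `fnce_select` and the extra stop once all attributes are selected are assumptions.

```python
def fnce_select(attributes, entropy, beta=0.0):
    """Greedy forward selection following Steps 5-10 of Algorithm 1.

    attributes: ordered list of attribute names (ties go to the earlier one)
    entropy:    callable, frozenset of attributes -> E_fr(D | subset)
    beta:       tolerated entropy gap between red and the full set A
    """
    full = frozenset(attributes)
    e_full = entropy(full)

    # Step 5: seed red with the attribute of largest SIG(a, A, D), Eq. (10).
    best, best_sig = None, float("-inf")
    for a in attributes:
        sig = entropy(full - {a}) - e_full
        if sig > best_sig:
            best, best_sig = a, sig
    red = [best]

    # Steps 6-9: add the attribute of largest SIG(r, red, D), Eq. (12),
    # until Eq. (11) holds (or, as an added guard, nothing is left to add).
    while entropy(frozenset(red)) - e_full > beta and len(red) < len(attributes):
        e_red = entropy(frozenset(red))
        best, best_sig = None, float("-inf")
        for r in attributes:
            if r in red:
                continue
            sig = e_red - entropy(frozenset(red) | {r})
            if sig > best_sig:
                best, best_sig = r, sig
        red.append(best)
    return red  # Step 10

# E_fr(D | subset) values quoted in Example 3 (alpha = beta = 0).
E = {
    frozenset({"a1", "a2", "a3", "a4"}): 0.25,
    frozenset({"a2", "a3", "a4"}): 0.2642,
    frozenset({"a1", "a3", "a4"}): 0.25,
    frozenset({"a1", "a2", "a4"}): 0.6392,
    frozenset({"a1", "a2", "a3"}): 0.25,
    frozenset({"a3"}): 0.8412,
    frozenset({"a1", "a3"}): 0.75,
    frozenset({"a2", "a3"}): 0.2642,
    frozenset({"a3", "a4"}): 0.2642,
}

selected = fnce_select(["a1", "a2", "a3", "a4"], lambda s: E[s], beta=0.0)
```

With these tabulated values the sketch selects a₃, then a₂ (the tie with a₄ is broken by attribute order), then a₁, matching the walk-through in Example 3. Passing the entropy as a callable keeps the selection loop independent of how Eq. (9) is evaluated.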

4. Experimental results and analysis

In this section, we select four data sets from public data sources, compare our designed algorithm with other existing related algorithms by the classification accuracy and number of selected feature genes and discuss the selection of suitable parameters for different data sets.

4.1 Experiment preparation

In order to ensure the reliability of the experimental results, the data sets are all selected from public data sources. The first three are common standard data sets from the UCI Machine Learning Repository, and the last one is taken from http://datam.i2r.a-star.edu.sg/datasets/krbd/. Details are given in Table 3. To make the correlation between the data easier to calculate, the experimental data are all normalized to [0,1] by the formula f (x_i) = (x_i - x_min)/(x_max - x_min).

Table 3
Description of data sets

No	Data sets	Sample	Attributes	Classes
1	WPBC	198	32	2
2	WDBC	569	30	2
3	Heart-Cle	303	13	5
4	Colon	62	2000	2

4.2 Experimental comparison

To validate the proposed algorithm (FNCE), the data not processed by any algorithm (Raw data) are used as a baseline. In addition, FNCE is compared with existing related algorithms in the experiment, including the algorithm based on fuzzy entropy (FISEN) [32] and the algorithm based on the fuzzy neighborhood rough set (FNRS) [18]. The numbers of features selected by the different algorithms are listed in Table 4. We use two classification learning algorithms, Support Vector Machine (Linear-SVM) and K-Nearest Neighbor (KNN, K = 3), to evaluate the performance of the different algorithms. To make the classification accuracy of the classifiers more representative, 10-fold cross validation is employed. Finally, the classification accuracies obtained with the different algorithms are shown in Tables 5-6.

Table 4
Number of selected feature genes

Data Sets	Raw data	FISEN	FNRS	FNCE
WPBC	32	16	8	3
WDBC	30	16	18	6
Heart-Cle	13	11	7	8
Colon	2000	10	5	4
Average	518.75	13.25	9.50	6.25

Table 5

Classification accuracy (%) for reduced data with Linear-SVM

Data Sets	Raw data	FISEN	FNRS	FNCE
WPBC	74.24	81.82	82.50	85.00
WDBC	93.16	95.26	97.37	99.12
Heart-Cle	59.41	56.44	65.57	68.85
Colon	75.00	83.33	92.31	92.31
Average	75.45	79.21	84.44	86.32

Table 6

Classification accuracy for reduced data with 3NN

Data Sets	Raw data	FISEN	FNRS	FNCE
WPBC	74.84	77.72	77.50	85.00
WDBC	97.00	96.13	97.37	98.25
Heart-Cle	57.38	58.42	70.49	72.13
Colon	76.92	84.62	84.62	92.31
Average	76.54	79.22	82.50	86.92

From Table 4, the number of feature genes selected by each of the three algorithms is less than the number of feature genes in the original data set, indicating that these algorithms can eliminate redundant genes to a certain extent. According to the average number of feature genes after reduction, the FNCE algorithm proposed in this paper is superior to the other two algorithms because it yields relatively fewer feature genes.

Table 5 shows the classification accuracy of the Raw data and of the reduced feature genes subsets obtained by the different algorithms under the Linear-SVM classifier. The classification accuracy of the FNCE algorithm on the four data sets is higher than that of the Raw data and the FISEN algorithm. It is higher than that of the FNRS algorithm on the WPBC, WDBC and Heart-Cle data sets and the same as that of the FNRS algorithm on the Colon data set. The average classification accuracy obtained by the FNCE algorithm is also the highest. Table 6 shows the classification accuracy of the Raw data and of the reduced feature genes subsets obtained by the different algorithms under the 3NN classifier. The accuracy of the FNCE algorithm on the four data sets is higher than that of the Raw data and the other two algorithms, and it is about 7 to 8 percentage points higher than the FISEN and FNRS algorithms on the WPBC and Colon data sets.

This experiment shows that the FNCE algorithm can not only remove redundant features to obtain a smaller subset of feature genes, but also obtain higher classification accuracy. In particular, on the WPBC and WDBC data sets, the number of feature genes obtained by FNCE is less than half of that obtained by the FNRS algorithm, while the classification accuracy is higher than that of the other two algorithms. Therefore, the results indicate that the FNCE algorithm proposed in this paper can extract an effective feature genes subset.

4.3 Parameter discussion

This algorithm has two parameters, α and β. α is the fuzzy neighborhood radius. In order to reduce the noise and the unnecessary computation cost caused by weak correlation between samples, an appropriate fuzzy neighborhood radius should be selected when calculating the fuzzy neighborhood granule. β controls the selection of the feature genes subset. Because of the inconsistency and redundant noise of the data sets, a deviation between the conditional entropy of the resulting reduction subset and that of the original condition attribute set, relative to the decision attribute, needs to be allowed. Different data sets have different strengths of relationship, so we need to select the most suitable parameters for each data set.

We vary α and β from 0 to 0.5 with a step of 0.05. The number of feature genes selected under the different parameters, together with the classification accuracy under the Linear-SVM classifier (the experimental results using the 3NN classifier are roughly the same), is shown in Figs.1-4. The final parameters used for the different data sets are shown in Table 7.
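The parameter sweep described above can be outlined with a short Python sketch. The grid follows the stated range and step; the scoring function is a hypothetical stand-in, since in the experiments each (α, β) pair is judged by the classification accuracy and the number of selected genes, which are not recomputed here.

```python
def parameter_grid(start=0.0, stop=0.5, step=0.05):
    """Evenly spaced values from start to stop inclusive, rounded to
    two decimals to avoid floating-point drift."""
    n = round((stop - start) / step)
    return [round(start + k * step, 2) for k in range(n + 1)]

def grid_search(score, alphas, betas):
    """Return the (alpha, beta) pair that maximizes score(alpha, beta)."""
    return max(((a, b) for a in alphas for b in betas),
               key=lambda pair: score(*pair))

alphas = parameter_grid()
betas = parameter_grid()

# Hypothetical score with a single peak, used only to exercise the search;
# a real score would run the selection algorithm and a classifier.
def toy_score(alpha, beta):
    return -((alpha - 0.1) ** 2 + (beta - 0.05) ** 2)

best_alpha, best_beta = grid_search(toy_score, alphas, betas)
```

The grid has 11 values per parameter, so a full sweep evaluates 121 (α, β) pairs per data set.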

Fig.1

The classification accuracy varying with parameters α and β (WPBC).

Fig.2

The classification accuracy varying with parameters α and β (WDBC).

Fig.3

The classification accuracy varying with parameters α and β (Heart-Cle).

Fig.4

The classification accuracy varying with parameters α and β (Colon).

Table 7

Suitable parameters for different data sets

No	Data sets	Values of α	Values of β
1	WPBC	0.1	0.1
2	WDBC	0.1	0.05
3	Heart-Cle	0.2	0
4	Colon	0.2	0.5

5. Conclusions and future works

For gene expression profile data, this paper focuses on the key samples whose neighborhood is not completely contained in their decision equivalence class, and proposes a feature genes selection algorithm based on the fuzzy neighborhood conditional entropy model. The fuzzy neighborhood granule and fuzzy decision-making describe the information contained in the original data more accurately. Under the fuzzy neighborhood conditional entropy model, the feature genes subset is obtained while tolerating a small amount of noise. The experimental results obtained with the designed algorithm show higher classification accuracy with fewer feature genes, which verifies the effectiveness of the proposed model. This study assumes a complete information decision-making system. In future work, we will therefore extend and validate the model for incomplete information decision-making systems, and further optimize the parameter selection to find suitable general-purpose parameters.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 61370169, 61402153, 61772176) and the Key Project of Science and Technology Department of Henan Province (No. 162102210261).

References

1. T.H. Xu, Y.Y. Ma and J.C. Xu, Efficient Gene Selection Technique Based on Maximum Neighborhood Mutual Information and Particle Swarm Optimization, Journal of Chinese Computer Systems 37(8) (2016), 1775–1779.

2. S.M. Brown, Analysis of Microarray Data, Journal of Computational Biology 1(1) (2012), 82–83.

3. Pawlak, Rough sets, International Journal of Computer and Information Sciences 11(5) (1982), 341–356.

4. Lang, Li and Yang, An incremental approach to attribute reduction of dynamic set-valued information systems, International Journal of Machine Learning and Cybernetics 5(5) (2014), 775–788.

5. J.Y. Liang, Wang, C.Y. Dang, et al., A group incremental approach to feature selection applying rough set technique, IEEE Transactions on Knowledge and Data Engineering 26(2) (2014), 294–308.

6. X.W. Hu, Jiang, et al., New classification method based on neighborhood relation fuzzy rough set, Journal of Computer Applications 35(11) (2015), 3116–3121.

7. C.Z. Wang, Y.L. Qi, M.W. Shao, et al., A Fitting Model for Feature Selection With Fuzzy Rough Sets, IEEE Transactions on Fuzzy Systems 25(4) (2017), 741–753.

8. T.Y. Lin, Neighborhood systems-application to qualitative fuzzy and rough sets, in: P.P. Wang (Ed.), Advances in Machine Intelligence and Soft Computing (1997), 132–155.

9. Y.Y. Yao, Relational interpretations of neighborhood operators and rough set approximation operators, Information Sciences 111(1-4) (1998), 239–259.

10. Q.H. Hu, Yu, J.F. Liu and C.X. Wu, Neighborhood-rough-set based heterogeneous feature subset selection, Information Sciences 178(18) (2008), 3577–3594.

11. Dubois and Prade, Rough fuzzy sets and fuzzy rough sets, Int J Gen Syst 17(2-3) (1990), 191–209.

12. Jensen and Shen, Fuzzy-rough attribute reduction with application to web categorization, Fuzzy Sets and Systems 141 (2004), 469–485.

13. R.B. Bhatt and Gopal, On the compact computational domain of fuzzy rough sets, Pattern Recognit Lett 26(11) (2005), 1632–1640.

14. Hu, Yu, Pedrycz and Chen, Kernelized fuzzy rough sets and their applications, IEEE Trans Knowl Data Eng 23(11) (2011), 1649–1667.

15. Mieszkowicz-Rolka and Rolka, Variable precision fuzzy rough sets, Lecture Notes in Computer Sci 3100 (2004), 144–160.

16. Y.H. Qian, J.Y. Liang, Pedrycz and C.Y. Dang, Positive approximation: an accelerator for attribute reduction in rough set theory, Artificial Intelligence 174(9) (2010), 597–618.

17. Hu, Yu, Pedrycz and Chen, Kernelized fuzzy rough sets and their applications, IEEE Trans Knowl Data Eng 23(11) (2011), 1649–1667.

18. C.Z. Wang, M.W. Shao, He, Y.H. Qian and Y.L. Qi, Feature subset selection based on fuzzy neighborhood rough sets, Knowledge-Based Systems 111 (2016), 173–179.

19. Huang, X.Z. Zhou and R.R. Zhang, Attribute reduction based on information quantity under incomplete information systems, Systems Engineering-Theory and Practice 25(4) (2005), 55–60.

20. Yao, Liu, Zhang, et al., Type-2 fuzzy cross-entropy and entropy measures and their applications, Journal of Intelligent & Fuzzy Systems 30(4) (2016), 2169–2180.

21. Sun, J.C. Xu and Tian, Feature selection using rough entropy-based uncertainty measures in incomplete decision systems, Knowledge-Based Systems 36 (2012), 206–216.

22. Zhang and N.B. Fan, Heuristic attribute reduction based on neighborhood approximate conditional entropy, Application Research of Computers 35(5) (2018).

23. Yao, Wang, et al., Uncertainty measure and attribute reduction in incomplete neighborhood rough set, Journal of Computer Applications 38(1) (2018), 97–103.

24. Q.H. Hu, D.R. Yu and Z.X. Xie, Numerical Attribute Reduction Based on Neighborhood Granulation and Rough Approximation, Journal of Software 19(3) (2008), 640–649.

25. Xie, J.S. Lei and F.F. Xu, Vibrant Fault Diagnosis for Hydroturbine Generating Unit Based on Improved Neighborhood Rough Sets and PNN, Journal of Shanghai University of Electric Power 32(2) (2016), 181–187.

26. Y. Chen, Zhang, Zheng, et al., Gene selection for tumor classification using neighborhood rough sets and entropy measures, Journal of Biomedical Informatics 67 (2017), 59–68.

27. Zhang, Mei, Chen, et al., Feature selection in mixed data: A method using a novel fuzzy rough set-based information entropy, Pattern Recognition 56(1) (2016), 1–15.

28. Wang, Rough reduction in algebra view and information view, International Journal of Intelligent Systems 18(6) (2003), 679–688.

29. Slezak and Wroblewski, Order Based Genetic Algorithms for the Search of Approximate Entropy Reducts, in: RSFDGrC 2003 (2003), 308–311.

30. Rashedi and Nezamabadi-pour, Feature subset selection using improved binary gravitational search algorithm, Journal of Intelligent & Fuzzy Systems 26(3) (2014), 1211–1221.

31. S.K.M. Wong and Ziarko, On optimal decision rules in decision tables, Bulletin of Polish Academy of Science 33(11-12) (1985), 693–696.

32. Q.H. Hu, Yu and Z.X. Xie, Information-preserving hybrid data reduction based on fuzzy-rough techniques, Pattern Recognition Letters 27(5) (2006), 414–423.

Table 3

Description of data sets

No	Data sets	Samples	Attributes	Classes
1	WPBC	198	32	2
2	WDBC	569	30	2
3	Heart-Cle	303	13	5
4	Colon	62	2000	2

Table 4

Number of selected feature genes

Data sets	Raw data	FISEN	FNRS	FNCE
WPBC	32	16	8	3
WDBC	30	16	18	6
Heart-Cle	13	11	7	8
Colon	2000	10	5	4
Average	518.75	13.25	9.50	6.25