Feature selection for interval-valued data via FRIC-model

Abstract

Feature selection is a basic technology for data mining. This paper investigates feature selection for interval-valued data via a fuzzy rough iterative computation model (FRIC-model). To depict the similarity between samples in an interval-valued decision information system (IVDIS), the fuzzy symmetry relation in an IVDIS is first introduced from the perspective of “The similarity between information values is fed back to the feature set”. After that, several attribute evaluation functions, such as fuzzy positive regions, dependency functions and attribute importance functions, are defined. Subsequently, the FRIC-model for interval-valued data is established by using the iterations of these functions. Next, a feature selection algorithm in an IVDIS based on this model is presented. Lastly, numerical experiments and statistical tests are carried out to evaluate the performance of the presented algorithm. The experimental results illustrate that the presented algorithm maintains high classification accuracy and does not occupy too much memory. These findings provide a new perspective for feature selection in an IVDIS.

Keywords

IVDIS; Feature selection; FRS; Attribute evaluation function

1 Introduction

Rough set theory (RST) is a significant method for handling uncertainty. Its idea is to use known knowledge to describe uncertain knowledge. The information system (IS) was presented by Pawlak [30, 31]. RST can effectively deal with the uncertainty of an IS, and most applications of RST are related to an IS [25, 43].

In real life, some data are expressed as interval-valued numbers, for example the estimated price of an item, the age of a person or the length of a sample. Thus, interval-valued data is an important type of data. An interval-valued information system (IVIS) is an IS whose attribute values are interval-valued numbers. When the conditional attribute values of a decision information system (DIS) are interval-valued numbers, it is called an interval-valued decision information system (IVDIS). IVISs and IVDISs are regarded as generalizations of ISs and DISs, respectively, and can describe imprecise samples more accurately. Some scholars have studied IVISs and IVDISs. For example, Dai et al. [9] studied uncertainty measurement for an IVDIS via extended conditional entropy; Zhang et al. [43] presented incremental updating of rough approximations in an IVIS under attribute generalization; Xie et al. [37] considered new measurements of uncertainty for an IVIS; Liu et al. [23] proposed an attribute reduction algorithm via the fuzzy α-similarity relation in an incomplete IVIS.

Feature selection, as an important data-processing technology in machine learning, can effectively reduce redundant features and improve classification accuracy. According to whether the feature selection process relies on the learner, feature selection methods are usually divided into three categories: filter [21], wrapper [20] and embedded [18]. Feature selection based on RST can delete redundant attributes of the conditional feature set while maintaining the dependency between conditional attributes and decision attributes in a decision table. It is a filter method. Many researchers have studied feature selection. Dai et al. [8] presented attribute selection via conditional entropies for an incomplete DIS. Zhao et al. [44] put forward mixed feature selection in an incomplete DIS. Chen et al. [1] introduced a fuzzy kernel feature selection method for heterogeneous data. Jia et al. [19] proposed similarity-based feature selection in RST from a clustering perspective. Hu et al. [16] presented a robust and fast feature selection method on the basis of separability for a fuzzy DIS. Wang et al. [34] explored feature selection by means of local conditional entropy.

Fuzzy rough set (FRS) was proposed by Dubois et al. [6]. Moresi et al. [26] presented the axiomatic definition for FRS. Cornelis et al. [2] considered feature selection with FRS. Chen et al. [4] gave a feature selection algorithm based on FRS. Hu et al. [17] developed kernelized FRS. Wang et al. [35] investigated feature selection with FRS. Yuan et al. [40] studied unsupervised feature selection for mixed data based on FRS. Thippa et al. [33] discussed two of the prominent dimensionality reduction techniques. Gadekallu et al. [13] used the improved hybrid bat and firefly algorithms to calculate feature selection. Wang et al. [36] studied feature selection for categorical data based on FRS. The biggest advantage of intelligent algorithms is that they are fast and can deal with all kinds of datasets, but it is difficult for them to reduce the number of attributes while ensuring the classification accuracy.

The classical rough set model can deal with categorical data. Its deficiency is that it is sensitive to noise in classification learning due to the stringent condition of an equivalence relation. Thus, a fuzzy relation is introduced to describe the similarity between samples with categorical data. In this way, the FRS model is established to manage feature selection for categorical data.

In order to handle interval-valued data, we usually use the perspective “The similarity between information values is fed back to the sample set”. However, we can also deal with interval-valued data from the perspective “The similarity between information values is fed back to the feature set”. Based on this perspective, a fuzzy relation can be defined to describe the similarity between samples, a variable parameter can be introduced to control the similarity between samples, and the FRIC-model can be established. This paper studies feature selection for interval-valued data via the FRIC-model.

This paper is organized as follows. Section 2 recalls interval-valued numbers and IVDISs, and then introduces the fuzzy symmetry relation induced by each subsystem. Section 3 presents several attribute evaluation functions. Section 4 proposes FRIC-model in an IVDIS. Section 5 gives a feature selection algorithm in an IVDIS via FRIC-model. Section 6 implements comparative analysis. Section 7 executes statistical analysis. Section 8 concludes the paper.

2 Preliminaries

Throughout this paper, O signifies a finite set, 2^O denotes the family of all subsets of O, and |X| expresses the cardinality of X ∈ 2^O.

Put $O = \{o_{1}, o_{2}, \dots, o_{n}\}$ (1)

2.1 Interval-valued numbers

Put $Ω = \{s = [s_{1}, s_{2}] : s_{1} \text{ and } s_{2} \text{ are real numbers}, s_{1} \leq s_{2}\}$ (2)

For any s = [s₁, s₂], t = [t₁, t₂] ∈ Ω, define

(1) s = t ⇔ s₁ = t₁, s₂ = t₂;

(2) s ≤ t ⇔ s₁ ≤ t₁, s₂ ≤ t₂;

s < t ⇔ s ≤ t, s ≠ t.

Definition 2.1. [27, 29] Let s, t ∈ Ω. Then the possible degree of s relative to t is computed by $PD (s, t) = \min \{1, \max \{\frac{s_{2} - t_{1}}{(s_{2} - s_{1}) + (t_{2} - t_{1})}, 0\}\}$ (3)

Proposition 2.2. [11, 29] The following properties hold:

(1) ∀ s, t ∈ Ω, 0 ≤ PD(s, t) ≤1;

(2) ∀ s ∈ Ω, PD(s, s) =0.5;

(3) ∀ s, t ∈ Ω, PD(s, t) + PD(t, s) =1.

Definition 2.3. [9] Let s, t ∈ Ω. Then the similarity degree between s and t is computed by $SD (s, t) = 1 - | PD (s, t) - PD (t, s) |$ (4)

Proposition 2.4. [9] The following properties hold:

(1) ∀ s, t ∈ Ω, SD(s, t) = SD(t, s);

(2) ∀ s, t ∈ Ω, 0 ≤ SD(s, t) ≤1;

(3) ∀ s, t ∈ Ω, SD(s, t) =1 ⇒ s = t.

Example 2.5. Pick s = [4, 10] and t = [6, 9]. Then

$PD (s, t) = \min \{1, \max \{\frac{10 - 6}{(10 - 4) + (9 - 6)}, 0\}\} = \frac{4}{9},$ $PD (t, s) = \min \{1, \max \{\frac{9 - 4}{(9 - 6) + (10 - 4)}, 0\}\} = \frac{5}{9},$ $SD (s, t) = 1 - | PD (s, t) - PD (t, s) | = 1 - | \frac{4}{9} - \frac{5}{9} | = \frac{8}{9} .$
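The arithmetic of Example 2.5 can be reproduced with a short Python sketch of Eqs. (3) and (4). The function names `possible_degree` and `similarity_degree` are illustrative and not from the paper, and the handling of two single-point intervals (where the denominator of Eq. (3) vanishes) is an assumption added only so the sketch does not divide by zero.

```python
from fractions import Fraction


def possible_degree(s, t):
    """Possible degree PD(s, t) of Eq. (3); s and t are (lower, upper) pairs."""
    s1, s2 = s
    t1, t2 = t
    total_width = (s2 - s1) + (t2 - t1)
    if total_width == 0:
        # Assumption (not covered by Eq. (3)): both intervals are single points,
        # so fall back to a plain comparison of the two points.
        if s1 == t1:
            return Fraction(1, 2)
        return Fraction(1) if s1 > t1 else Fraction(0)
    ratio = Fraction(s2 - t1) / Fraction(total_width)
    return min(Fraction(1), max(ratio, Fraction(0)))


def similarity_degree(s, t):
    """Similarity degree SD(s, t) of Eq. (4)."""
    return 1 - abs(possible_degree(s, t) - possible_degree(t, s))


s, t = (4, 10), (6, 9)
print(possible_degree(s, t))    # 4/9
print(possible_degree(t, s))    # 5/9
print(similarity_degree(s, t))  # 8/9
```

Exact fractions are used here so that the printed values match the hand computation; floating-point inputs would work as well, at the cost of rounding.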

2.2 An IVDIS

Definition 2.6. [31] Let O be a finite sample set and F a finite feature set. Then (O, F) is referred to as an information system (IS) if each a ∈ F determines a function a : O → V_a, where V_a = {a(o) : o ∈ O}.

(O, A, d) is known as a decision information system (DIS), if (O, A ∪ {d}) is an IS, where A is the conditional feature set and d is the decision feature.

Definition 2.7. [39] Let (O, A, d) be a DIS. Then (O, A, d) is called an interval-valued decision information system (IVDIS) if ∀ a ∈ A, ∀ o ∈ O, a(o) is an interval-valued number.

If B ⊆ A, then (O, B, d) is known as a subsystem of (O, A, d).

Example 2.8. Table 1 depicts an IVDIS (O, A, d), where O = {o₁, ⋯ , o₁₂} and A = {a₁, ⋯ , a₆}.

Table 1
An IVDIS

	a₁	a₂	a₃	a₄	a₅	a₆	d
o₁	[68.75,80.79]	[12.06,22.79]	[51.52,77.06]	[30.61,45.38]	[70.11,81.16]	[30.35,33.43]	1
o₂	[75.71,81.46]	[11.97,19.62]	[64.77,80.64]	[36.87,43.52]	[77.54,80.59]	[30.35,33.43]	2
o₃	[61.52,77.06]	[23.24,34.75]	[51.52,77.06]	[30.61,45.38]	[70.11,81.16]	[31.24,35.75]	2
o₄	[75.71,81.46]	[11.97,19.62]	[58.75,70.79]	[43.26,54.95]	[77.54,80.59]	[30.35,33.43]	3
o₅	[60.77,82.64]	[12.06,22.79]	[51.52,77.06]	[54.77,70.64]	[77.54,80.59]	[30.35,33.43]	2
o₆	[68.75,80.79]	[12.06,22.79]	[64.77,80.64]	[43.26,54.95]	[33.26,44.95]	[57.54,60.59]	1
o₇	[75.71,81.46]	[11.97,19.62]	[51.52,77.06]	[43.26,54.95]	[70.11,81.16]	[31.24,35.75]	3
o₈	[44.77,70.64]	[23.24,34.75]	[51.52,77.06]	[36.87,43.52]	[70.11,81.16]	[30.35,33.43]	1
o₉	[75.71,81.46]	[11.97,19.62]	[65.71,71.46]	[36.87,43.52]	[77.54,80.59]	[30.35,33.43]	2
o₁₀	[75.71,81.46]	[23.24,34.75]	[64.77,80.64]	[30.61,45.38]	[77.54,80.59]	[30.35,33.43]	1
o₁₁	[68.75,80.79]	[12.06,22.79]	[64.77,80.64]	[43.26,54.95]	[70.11,81.16]	[31.24,35.75]	2
o₁₂	[75.71,81.46]	[23.24,34.75]	[64.77,80.64]	[36.87,43.52]	[54.77,60.64]	[31.24,35.75]	3

Let (O, A, d) be an IVDIS with B ⊆ A and θ ∈ [0, 1]. Define $R_{B}^{θ} = \{(o, o^{'}) \in O \times O : \forall a \in B, SD (a (o), a (o^{'})) \geq θ\}$ (5) $R_{d} = \{(o, o^{'}) \in O \times O : d (o) = d (o^{'})\}$ (6)

Apparently, $R_{B}^{θ}$ is a tolerance relation on O and R_d is an equivalence relation on O.

In this paper, denote $\forall o \in O, R_{d} (o) = \{o^{'} \in O : d (o) = d (o^{'})\}$ (7) $O / d = \{R_{d} (o) : o \in O\} = \{D_{1}, D_{2}, \dots, D_{r}\}$ (8)

2.3 The fuzzy symmetry relation induced by the subsystem of an IVDIS

In $R_{B}^{θ}$, “SD(a(o), a(o′)) ≥ θ” is fed back to the sample set of an IVDIS. If “SD(a(o), a(o′)) ≥ θ” is instead fed back to the feature set of an IVDIS, then we can introduce the following definition.

Definition 2.9. Let (O, A, d) be an IVDIS. Given B ⊆ A and θ ∈ [0, 1]. Then the fuzzy relation on O is computed by $S_{B}^{θ} (o, o^{'}) = \frac{1}{λ} | \{a \in B : SD (a (o), a (o^{'})) \geq θ\} |$ (9) $R_{d} = \{(o, o^{'}) \in O \times O : d (o) = d (o^{'})\}$ (10) where λ is a constant with λ ∈ [|B|, |A|], for instance, λ = |A|.

For convenience, denote $B_{{oo}^{'}}^{θ} = \{a \in B : SD (a (o), a (o^{'})) \geq θ\}$ (11) Then $S_{B}^{θ} (o, o^{'}) = \frac{1}{λ} | B_{{oo}^{'}}^{θ} |$ (12)

Clearly, $S_{B}^{θ}$ is a fuzzy symmetry relation on O, which is referred to as the fuzzy symmetry relation induced by the subsystem (O, B, d). If λ = |B|, then $S_{B}^{θ}$ is a fuzzy tolerance relation on O.
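As an illustration of Definition 2.9, the following Python sketch counts, for every pair of samples, the features on which the two samples are θ-similar and divides by λ. The toy table, the function name `fuzzy_relation` and the choice λ = |B| are assumptions made for this example only; the helper functions restate Eqs. (3) and (4) for non-degenerate intervals.

```python
from fractions import Fraction


def possible_degree(s, t):
    """PD(s, t) of Eq. (3) for non-degenerate intervals (lower, upper)."""
    ratio = Fraction(s[1] - t[0]) / Fraction((s[1] - s[0]) + (t[1] - t[0]))
    return min(Fraction(1), max(ratio, Fraction(0)))


def similarity_degree(s, t):
    """SD(s, t) of Eq. (4)."""
    return 1 - abs(possible_degree(s, t) - possible_degree(t, s))


def fuzzy_relation(table, features, theta, lam):
    """S_B^theta of Eq. (9) as a dict keyed by sample pairs.

    table maps each sample name to a dict {feature name: (lower, upper)}.
    """
    samples = list(table)
    relation = {}
    for o in samples:
        for p in samples:
            similar = [a for a in features
                       if similarity_degree(table[o][a], table[p][a]) >= theta]
            relation[(o, p)] = Fraction(len(similar), lam)
    return relation


# Hypothetical toy IVDIS (not taken from Table 1).
toy = {
    "o1": {"a1": (4, 10), "a2": (0, 2)},
    "o2": {"a1": (6, 9), "a2": (0, 2)},
    "o3": {"a1": (20, 30), "a2": (1, 3)},
}
S = fuzzy_relation(toy, ["a1", "a2"], theta=Fraction(4, 5), lam=2)
print(S[("o1", "o2")], S[("o1", "o3")])  # 1 0
```

With λ = |B| the diagonal entries equal 1, which is the reflexivity needed for a fuzzy tolerance relation; with a larger λ such as |A| the relation stays symmetric but the diagonal drops to |B|/λ.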

Proposition 2.10. Let (O, A, d) be an IVDIS.

(1) If B₁ ⊆ B₂ ⊆ A, then ∀ θ ∈ [0, 1], $S_{B_{1}}^{θ} \subseteq S_{B_{2}}^{θ}$.

(2) If 0 ≤ θ₁ < θ₂ ≤ 1, then ∀ B ⊆ A, $S_{B}^{θ_{2}} \subseteq S_{B}^{θ_{1}}$.

Proof. (1) By Definition 2.9, with the same constant λ for B₁ and B₂, $S_{B_{1}}^{θ} (o, o^{'}) = \frac{1}{λ} | (B_{1})_{{oo}^{'}}^{θ} |, S_{B_{2}}^{θ} (o, o^{'}) = \frac{1}{λ} | (B_{2})_{{oo}^{'}}^{θ} | .$

Since B₁ ⊆ B₂, we have ∀ o, o′ ∈ O, $(B_{1})_{{oo}^{'}}^{θ} \subseteq (B_{2})_{{oo}^{'}}^{θ} .$

So ∀ o, o′ ∈ O, $S_{B_{1}}^{θ} (o, o^{'}) \leq S_{B_{2}}^{θ} (o, o^{'})$.

Thus, $S_{B_{1}}^{θ} \subseteq S_{B_{2}}^{θ}$.

(2) By Definition 2.9, $S_{B}^{θ_{1}} (o, o^{'}) = \frac{1}{λ} | B_{{oo}^{'}}^{θ_{1}} |, S_{B}^{θ_{2}} (o, o^{'}) = \frac{1}{λ} | B_{{oo}^{'}}^{θ_{2}} | .$

Since 0 ≤ θ₁ < θ₂ ≤ 1, we have ∀ o, o′ ∈ O, $B_{{oo}^{'}}^{θ_{2}} \subseteq B_{{oo}^{'}}^{θ_{1}} .$

So ∀ o, o′ ∈ O, $S_{B}^{θ_{2}} (o, o^{'}) \leq S_{B}^{θ_{1}} (o, o^{'})$.

Thus, $S_{B}^{θ_{2}} \subseteq S_{B}^{θ_{1}}$. □

3 Attribute evaluation functions

Definition 3.1. Let (O, A, d) be an IVDIS. Given B ⊆ A, θ ∈ [0, 1] and X ∈ 2^O, define $\underline{S_{B}^{θ}} (X) (o) = ⋀_{o^{'} \notin X} [1 - S_{B}^{θ} (o, o^{'})], \forall o \in O$ (13) $\bar{S_{B}^{θ}} (X) (o) = ⋁_{o^{'} \in X} S_{B}^{θ} (o, o^{'}), \forall o \in O$ (14) Then $\underline{S_{B}^{θ}} (X)$ and $\bar{S_{B}^{θ}} (X)$ are referred to as the lower and upper fuzzy approximations of X, respectively.

Proposition 3.2. Let (O, A, d) be an IVDIS. Then the following properties hold.

(1) ∀ B ⊆ A, ∀ θ ∈ [0, 1], ∀ X ∈ 2^O, $\underline{S_{B}^{θ}} (X) \subseteq X \subseteq \bar{S_{B}^{θ}} (X)$, where $X (o) = \begin{cases} 1, & o \in X; \\ 0, & o \notin X . \end{cases}$

(2) If B₁ ⊆ B₂ ⊆ A, then ∀ θ ∈ [0, 1], ∀ X ∈ 2^O, $\underline{S_{B_{2}}^{θ}} (X) \subseteq \underline{S_{B_{1}}^{θ}} (X), \bar{S_{B_{1}}^{θ}} (X) \subseteq \bar{S_{B_{2}}^{θ}} (X) .$

(3) If 0 ≤ θ₁ < θ₂ ≤ 1, then ∀ B ⊆ A, ∀ X ∈ 2^O, $\underline{S_{B}^{θ_{1}}} (X) \subseteq \underline{S_{B}^{θ_{2}}} (X), \bar{S_{B}^{θ_{2}}} (X) \subseteq \bar{S_{B}^{θ_{1}}} (X) .$

Proof.

(1) (i) Given B ⊆ A, θ ∈ [0, 1], X ∈ 2^O. Then by Definition 2.9, we can obtain that $\forall o, o^{'} \in O, 0 \leq S_{B}^{θ} (o, o^{'}) \leq 1 .$

This implies that $0 \leq ⋀_{o^{'} \notin X} [1 - S_{B}^{θ} (o, o^{'})] \leq 1 .$

So ∀ o ∈ O, $0 \leq \underline{S_{B}^{θ}} (X) (o) \leq 1 .$

Thus ∀ o ∈ X, $\underline{S_{B}^{θ}} (X) (o) \leq 1 = X (o) .$

Note that ∀ o ∉ X, $\underline{S_{B}^{θ}} (X) (o) = ⋀_{o^{'} \notin X} [1 - S_{B}^{θ} (o, o^{'})] \leq 1 - S_{B}^{θ} (o, o) = 0 .$ Then ∀ o ∉ X, $\underline{S_{B}^{θ}} (X) (o) = 0 \leq 0 = X (o) .$

It follows that ∀ o ∈ O, $\underline{S_{B}^{θ}} (X) (o) \leq X (o) .$

Thus $\underline{S_{B}^{θ}} (X) \subseteq X .$

(ii) Clearly, $\forall o \in O, 0 \leq \bar{S_{B}^{θ}} (X) (o) \leq 1 .$

Then ∀ o ∉ X, $X (o) = 0 \leq \bar{S_{B}^{θ}} (X) (o) .$

Note that ∀ o ∈ X, $\bar{S_{B}^{θ}} (X) (o) = ⋁_{o^{'} \in X} S_{B}^{θ} (o, o^{'}) \geq S_{B}^{θ} (o, o) = 1 .$ Then ∀ o ∈ X, $X (o) = 1 \leq 1 = \bar{S_{B}^{θ}} (X) (o) .$

This implies that ∀ o ∈ O, $X (o) \leq \bar{S_{B}^{θ}} (X) (o) .$

Thus $X \subseteq \bar{S_{B}^{θ}} (X) .$

From the above, $\underline{S_{B}^{θ}} (X) \subseteq X \subseteq \bar{S_{B}^{θ}} (X) .$

(2) (i) Note that B₁ ⊆ B₂ ⊆ A. Then by Proposition 2.10, $S_{B_{1}}^{θ} \subseteq S_{B_{2}}^{θ}$.

So ∀ o ∈ O, o′ ∉ X, $S_{B_{1}}^{θ} (o, o^{'}) \leq S_{B_{2}}^{θ} (o, o^{'}) .$

This implies that ∀ o ∈ O, $⋀_{o^{'} \notin X} [1 - S_{B_{2}}^{θ} (o, o^{'})] \leq ⋀_{o^{'} \notin X} [1 - S_{B_{1}}^{θ} (o, o^{'})] .$

Thus ∀ o ∈ O, $\underline{S_{B_{2}}^{θ}} (X) (o) \leq \underline{S_{B_{1}}^{θ}} (X) (o) .$

Therefore, $\underline{S_{B_{2}}^{θ}} (X) \subseteq \underline{S_{B_{1}}^{θ}} (X)$.

(ii) Note that B₁ ⊆ B₂ ⊆ A. Then by Proposition 2.10, $S_{B_{1}}^{θ} \subseteq S_{B_{2}}^{θ}$.

This implies that ∀ o ∈ O, o′ ∈ X, $S_{B_{1}}^{θ} (o, o^{'}) \leq S_{B_{2}}^{θ} (o, o^{'}) .$

So $⋁_{o^{'} \in X} S_{B_{1}}^{θ} (o, o^{'}) \leq ⋁_{o^{'} \in X} S_{B_{2}}^{θ} (o, o^{'}) .$

Thus $\bar{S_{B_{1}}^{θ}} (X) (o) \leq \bar{S_{B_{2}}^{θ}} (X) (o) .$

Hence $\bar{S_{B_{1}}^{θ}} (X) \subseteq \bar{S_{B_{2}}^{θ}} (X)$.

(3) (i) Note that 0 ≤ θ₁ < θ₂ ≤ 1. Then by Proposition 2.10, $S_{B}^{θ_{2}} \subseteq S_{B}^{θ_{1}}$.

So ∀ o ∈ O, o′ ∉ X, $S_{B}^{θ_{2}} (o, o^{'}) \leq S_{B}^{θ_{1}} (o, o^{'}) .$

This implies that ∀ o ∈ O, $⋀_{o^{'} \notin X} [1 - S_{B}^{θ_{1}} (o, o^{'})] \leq ⋀_{o^{'} \notin X} [1 - S_{B}^{θ_{2}} (o, o^{'})] .$

Thus ∀ o ∈ O, $\underline{S_{B}^{θ_{1}}} (X) (o) \leq \underline{S_{B}^{θ_{2}}} (X) (o) .$

Therefore, $\underline{S_{B}^{θ_{1}}} (X) \subseteq \underline{S_{B}^{θ_{2}}} (X)$.

(ii) Note that 0 ≤ θ₁ < θ₂ ≤ 1. Then by Proposition 2.10, $S_{B}^{θ_{2}} \subseteq S_{B}^{θ_{1}}$.

It follows that ∀ o ∈ O, o′ ∈ X, $S_{B}^{θ_{2}} (o, o^{'}) \leq S_{B}^{θ_{1}} (o, o^{'}) .$

So $⋁_{o^{'} \in X} S_{B}^{θ_{2}} (o, o^{'}) \leq ⋁_{o^{'} \in X} S_{B}^{θ_{1}} (o, o^{'}) .$

Thus $\bar{S_{B}^{θ_{2}}} (X) (o) \leq \bar{S_{B}^{θ_{1}}} (X) (o) .$

Hence $\bar{S_{B}^{θ_{2}}} (X) \subseteq \bar{S_{B}^{θ_{1}}} (X)$. □

Definition 3.3. Let (O, A, d) be an IVDIS with B ⊆ A and θ ∈ [0, 1]. The fuzzy positive region of decision d relative to B is computed by ${POS}_{B}^{θ} (d) = ⋃_{D \in O / d} \underline{S_{B}^{θ}} (D) = ⋃_{i = 1}^{r} \underline{S_{B}^{θ}} (D_{i})$ (15)

Proposition 3.4. Let (O, A, d) be an IVDIS.

(1) If B₁ ⊆ B₂ ⊆ A, then ∀ θ ∈ [0, 1], ${POS}_{B_{2}}^{θ} (d) \subseteq {POS}_{B_{1}}^{θ} (d) .$

(2) If 0 ≤ θ₁ < θ₂ ≤ 1, then ∀ B ⊆ A, ${POS}_{B}^{θ_{1}} (d) \subseteq {POS}_{B}^{θ_{2}} (d) .$

Proof. It follows from Proposition 3.2. □

Definition 3.5. Let (O, A, d) be an IVDIS. Suppose B ⊆ A and θ ∈ [0, 1]. Then the θ-degree of dependence of d relative to B is computed by $Γ_{B}^{θ} (d) = \frac{| {POS}_{B}^{θ} (d) |}{n}$ (16)

Proposition 3.6. Let (O, A, d) be an IVDIS.

(1) If B₁ ⊆ B₂ ⊆ A, then ∀ θ ∈ [0, 1], $Γ_{B_{2}}^{θ} (d) \leq Γ_{B_{1}}^{θ} (d) .$

(2) If 0 ≤ θ₁ < θ₂ ≤ 1, then ∀ B ⊆ A, $Γ_{B}^{θ_{1}} (d) \leq Γ_{B}^{θ_{2}} (d) .$

Proof. It follows from Proposition 3.4. □

Definition 3.7. Let (O, A, d) be an IVDIS. Suppose B ⊆ A, θ ∈ [0, 1] and a ∈ A - B. Then the θ-significance of a relative to B about d is computed by ${sig}^{θ} (a, B, d) = Γ_{B}^{θ} (d) - Γ_{B \cup \{a\}}^{θ} (d)$ (17)
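The evaluation functions of this section can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the relation matrices `S_B` and `S_Ba` are hypothetical values for a three-sample system with a fixed λ = 2, the fuzzy union in Eq. (15) is taken as a pointwise maximum, the cardinality in Eq. (16) as the sum of membership degrees, and the infimum over an empty index set is assumed to be 1.

```python
from fractions import Fraction


def lower_approximation(S, labels, target):
    """Eq. (13) for the decision class D = {o : labels[o] == target}."""
    n = len(labels)
    outside = [p for p in range(n) if labels[p] != target]
    # Assumption: the infimum over an empty index set is 1.
    return [min((1 - S[o][p] for p in outside), default=Fraction(1))
            for o in range(n)]


def positive_region(S, labels):
    """Eq. (15): pointwise maximum of the lower approximations of all classes."""
    lowers = [lower_approximation(S, labels, c) for c in sorted(set(labels))]
    return [max(values) for values in zip(*lowers)]


def dependency(S, labels):
    """Eq. (16): fuzzy cardinality of the positive region divided by n."""
    return Fraction(sum(positive_region(S, labels)), len(labels))


def significance(S_B, S_Ba, labels):
    """Eq. (17): dependency on B minus dependency on B together with a."""
    return dependency(S_B, labels) - dependency(S_Ba, labels)


half = Fraction(1, 2)
labels = [1, 2, 2]                                  # hypothetical decisions
S_B = [[half, half, 0], [half, half, 0], [0, 0, half]]   # B = {a1}
S_Ba = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]                  # B plus a2
print(dependency(S_B, labels))           # 2/3
print(dependency(S_Ba, labels))          # 1/3
print(significance(S_B, S_Ba, labels))   # 1/3
```

In this toy case the dependency drops when the second feature is added, in line with Proposition 3.6 (1), because the samples of different classes become more similar under a fixed λ.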

4 FRIC-model

In this section, we establish FRIC-model in an IVDIS with the help of the iterations of fuzzy positive regions and dependency functions.

Let (O, A, d) be an IVDIS. Given B ⊆ A and θ ∈ [0, 1]. For i = |B|, |B|+1, ⋯ , |A|, denote $(S_{B}^{θ})_{i} (o, o^{'}) = \frac{1}{i} | B_{{oo}^{'}}^{θ} | \quad (o, o^{'} \in O)$ (18) $\underline{S_{B}^{θ}} (D)_{i} (o) = ⋀_{o^{'} \notin D} [1 - (S_{B}^{θ})_{i} (o, o^{'})] \quad (D \in O / d, o \in O)$ (19) $\bar{S_{B}^{θ}} (D)_{i} (o) = ⋁_{o^{'} \in D} (S_{B}^{θ})_{i} (o, o^{'}) \quad (D \in O / d, o \in O)$ (20) ${POS}_{B}^{θ} (d)_{i} = ⋃_{D \in O / d} \underline{S_{B}^{θ}} (D)_{i}$ (21) $Γ_{B}^{θ} (d)_{i} = \frac{| {POS}_{B}^{θ} (d)_{i} |}{n}$ (22) ${sig}^{θ} (a, B, d)_{i} = Γ_{B}^{θ} (d)_{i} - Γ_{B \cup \{a\}}^{θ} (d)_{i} \quad (B \subseteq A, a \in A - B)$ (23)

Theorem 4.1. Let (O, A, d) be an IVDIS. Given B ⊆ A and θ ∈ [0, 1]. Then ∀ D ∈ O/d, ∀ o ∈ O, ∀ i ∈ [|B|, |A|-1], $\underline{S_{B}^{θ}} (D)_{i + 1} (o) = \frac{1}{i + 1} + \frac{i}{i + 1} \underline{S_{B}^{θ}} (D)_{i} (o)$ (24)

Proof. $\begin{aligned} \underline{S_{B}^{θ}} (D)_{i} (o) & = ⋀_{o^{'} \notin D} (1 - \frac{1}{i} | B_{{oo}^{'}}^{θ} |) \\ & = 1 - ⋁_{o^{'} \notin D} \frac{1}{i} | B_{{oo}^{'}}^{θ} | \\ & = 1 - \frac{1}{i} ⋁_{o^{'} \notin D} | B_{{oo}^{'}}^{θ} | . \end{aligned}$

Similarly, $\underline{S_{B}^{θ}} (D)_{i + 1} (o) = 1 - \frac{1}{i + 1} ⋁_{o^{'} \notin D} | B_{{oo}^{'}}^{θ} | .$

Then $\begin{matrix} \frac{1}{i + 1} + \frac{i}{i + 1} \underline{S_{B}^{θ}} (D)_{i} (o) \\ = \frac{1}{i + 1} + \frac{i}{i + 1} (1 - \frac{1}{i} ⋁_{o^{'} \notin D} | B_{{oo}^{'}}^{θ} |) \\ = 1 - \frac{1}{i + 1} ⋁_{o^{'} \notin D} | B_{{oo}^{'}}^{θ} | . \end{matrix}$

Thus $\underline{S_{B}^{θ}} (D)_{i + 1} (o) = \frac{1}{i + 1} + \frac{i}{i + 1} \underline{S_{B}^{θ}} (D)_{i} (o) .$ □

Theorem 4.2. Let (O, A, d) be an IVDIS. Given B ⊆ A, θ ∈ [0, 1] and D ∈ O/d. Then ∀ i ∈ [|B|, |A|-1], $\bar{S_{B}^{θ}} (D)_{i + 1} = \frac{i}{i + 1} \bar{S_{B}^{θ}} (D)_{i}$ (25)

Proof. This follows directly from (18) and (20). □

Theorem 4.3. Let (O, A, d) be an IVDIS. Given B ⊆ A and θ ∈ [0, 1]. Then ∀ o ∈ O, ∀ i ∈ [|B|, |A|-1], ${POS}_{B}^{θ} (d)_{i + 1} (o) = \frac{1}{i + 1} + \frac{i}{i + 1} {POS}_{B}^{θ} (d)_{i} (o)$ (26)

Proof. By Theorem 4.1, we can obtain that ∀ o ∈ O, ∀ i, $\begin{aligned} {POS}_{B}^{θ} (d)_{i + 1} (o) & = ⋁_{D \in O / d} \underline{S_{B}^{θ}} (D)_{i + 1} (o) \\ & = ⋁_{D \in O / d} (\frac{1}{i + 1} + \frac{i}{i + 1} \underline{S_{B}^{θ}} (D)_{i} (o)) \\ & = \frac{1}{i + 1} + ⋁_{D \in O / d} \frac{i}{i + 1} \underline{S_{B}^{θ}} (D)_{i} (o) \\ & = \frac{1}{i + 1} + \frac{i}{i + 1} ⋁_{D \in O / d} \underline{S_{B}^{θ}} (D)_{i} (o) . \end{aligned}$

Thus ${POS}_{B}^{θ} (d)_{i + 1} (o) = \frac{1}{i + 1} + \frac{i}{i + 1} {POS}_{B}^{θ} (d)_{i} (o) .$ □

Theorem 4.4. Let (O, A, d) be an IVDIS. Given B ⊆ A and θ ∈ [0, 1]. Then ∀ i ∈ [|B|, |A|-1], $Γ_{B}^{θ} (d)_{i + 1} = \frac{1}{i + 1} + \frac{i}{i + 1} Γ_{B}^{θ} (d)_{i}$ (27)

Proof. By Theorem 4.3, $\begin{aligned} | {POS}_{B}^{θ} (d)_{i + 1} | & = \sum_{j = 1}^{n} {POS}_{B}^{θ} (d)_{i + 1} (o_{j}) \\ & = \sum_{j = 1}^{n} (\frac{1}{i + 1} + \frac{i}{i + 1} {POS}_{B}^{θ} (d)_{i} (o_{j})) \\ & = \frac{n}{i + 1} + \frac{i}{i + 1} \sum_{j = 1}^{n} {POS}_{B}^{θ} (d)_{i} (o_{j}) \\ & = \frac{n}{i + 1} + \frac{i}{i + 1} | {POS}_{B}^{θ} (d)_{i} | . \end{aligned}$

Then $\frac{| {POS}_{B}^{θ} (d)_{i + 1} |}{n} = \frac{1}{i + 1} + \frac{i}{i + 1} \frac{| {POS}_{B}^{θ} (d)_{i} |}{n} .$

Thus $Γ_{B}^{θ} (d)_{i + 1} = \frac{1}{i + 1} + \frac{i}{i + 1} Γ_{B}^{θ} (d)_{i} .$ □

Theorem 4.5. Let (O, A, d) be an IVDIS. Suppose B ⊆ A, θ ∈ [0, 1] and a ∈ A - B. Then ∀ i ∈ [|B|, |A|-1], ${sig}^{θ} (a, B, d)_{i + 1} = \frac{i}{i + 1} {sig}^{θ} (a, B, d)_{i}$ (28)

Proof. By Theorem 4.4, $\begin{aligned} {sig}^{θ} (a, B, d)_{i + 1} & = Γ_{B}^{θ} (d)_{i + 1} - Γ_{B \cup \{a\}}^{θ} (d)_{i + 1} \\ & = \frac{1}{i + 1} + \frac{i}{i + 1} Γ_{B}^{θ} (d)_{i} - (\frac{1}{i + 1} + \frac{i}{i + 1} Γ_{B \cup \{a\}}^{θ} (d)_{i}) \\ & = \frac{i}{i + 1} Γ_{B}^{θ} (d)_{i} - \frac{i}{i + 1} Γ_{B \cup \{a\}}^{θ} (d)_{i} \\ & = \frac{i}{i + 1} (Γ_{B}^{θ} (d)_{i} - Γ_{B \cup \{a\}}^{θ} (d)_{i}) . \end{aligned}$

Thus ${sig}^{θ} (a, B, d)_{i + 1} = \frac{i}{i + 1} {sig}^{θ} (a, B, d)_{i} .$ □
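The recurrences (27) and (28) can be checked numerically with a small Python sketch. The count matrices below (the values of |B_{oo′}^θ| for B = {a₁} and for B ∪ {a₂}), the decision labels and |A| = 4 are hypothetical toy values; the fuzzy union is taken as a pointwise maximum and the cardinality as a sum of membership degrees, as in the proofs above.

```python
from fractions import Fraction


def dependency(counts, labels, i):
    """Gamma_B^theta(d)_i of Eq. (22), given the counts |B_{oo'}^theta| and normaliser i."""
    n = len(labels)
    classes = sorted(set(labels))
    total = Fraction(0)
    for o in range(n):
        lowers = []
        for c in classes:
            outside = [p for p in range(n) if labels[p] != c]
            # Eq. (19); the infimum over an empty index set is assumed to be 1.
            lowers.append(min((1 - Fraction(counts[o][p], i) for p in outside),
                              default=Fraction(1)))
        total += max(lowers)  # Eq. (21) evaluated at sample o
    return total / n


def significance(counts_B, counts_Ba, labels, i):
    """sig^theta(a, B, d)_i of Eq. (23)."""
    return dependency(counts_B, labels, i) - dependency(counts_Ba, labels, i)


labels = [1, 2, 2]                               # hypothetical decisions
counts_B = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]     # B = {a1}
counts_Ba = [[2, 2, 0], [2, 2, 0], [0, 0, 2]]    # B plus a2
size_A = 4                                       # hypothetical |A|

for i in range(2, size_A):
    lhs = dependency(counts_Ba, labels, i + 1)
    rhs = Fraction(1, i + 1) + Fraction(i, i + 1) * dependency(counts_Ba, labels, i)
    print(i, lhs, rhs)  # both sides of Eq. (27) agree
```

The practical point of the recurrences is that the dependency for normaliser i + 1 follows from the value for i by one multiplication and one addition, without recomputing the positive region.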

5 A feature selection algorithm in an IVDIS via FRIC-model

This part presents a feature selection algorithm in an IVDIS via the FRIC-model.

Definition 5.1. Let (O, A, d) be an IVDIS. Given B ⊆ A and θ ∈ [0, 1]. Then B is referred to as a feature selection subset of A relative to d, if ${POS}_{B}^{θ} (d) = {POS}_{A}^{θ} (d)$ and ∀ a ∈ B, ${POS}_{B - \{a\}}^{θ} (d) \neq {POS}_{A}^{θ} (d)$.

Algorithm 1 Obtain a feature selection subset in an IVDIS (FRIC)

REQUIRE An IVDIS (O, A, d), D ∈ O/d, θ, δ.

ENSURE A feature selection subset B.

1: Let n = |O|, B = ∅, i = 1;
2: Compute the matrix R = (r_oo′)_{n×n}, where r_oo′ = 0 if o ∈ D and o′ ∈ O - D, and r_oo′ = NaN otherwise;
3: for each a ∈ A do
4:   for o, o′ ∈ O do
5:     Compute $S_{a}^{θ} (o, o^{'}) = \frac{1}{i} | \{a\}_{{oo}^{'}}^{θ} |$ ;
6:   end for
7:   Get the matrix $S_{a}^{θ} = (S_{a}^{θ} (o, o^{'}))_{n \times n}$ ;
8: end for
9: i = 0;
10: while true do
11:   for each a ∈ A - B do
12:     $temp = S_{a}^{θ} + R$ ;
13:     for each D ∈ O/d do
14:       Compute $\underline{S_{B ⋃ \{a\}}^{θ}} (D)_{i} (o) = \min \{1 - \frac{1}{i + 1} temp\}$ ;
15:     end for
16:     $Γ_{B ⋃ \{a\}}^{θ} (d)_{i} = \frac{| {POS}_{B ⋃ \{a\}}^{θ} (d)_{i} |}{n}$ ;
17:   end for
18:   Find a^* with maximum value $Γ_{B ⋃ \{a^{*}\}}^{θ} (d)_{i}$ ;
19:   Compute $sig (a^{*}, B, d)_{i + 1} = \frac{i}{i + 1} (Γ_{B}^{θ} (d)_{i} - Γ_{B ⋃ \{a^{*}\}}^{θ} (d)_{i})$ ;
20:   if sig(a^*, B, d)_{i+1} > δ then
21:     B = B ∪ {a^*};
22:     $R = R + S_{a^{*}}^{θ}$ ;
23:     i = |B|;
24:   else
25:     break;
26:   end if
27: end while
28: return B.

Algorithm 1 has a parameter δ, a threshold that controls the termination condition. While the significance of the selected attribute a^* relative to B exceeds δ, the algorithm continues; once it does not, the algorithm terminates. We now analyze the time and space complexity of Algorithm 1. Step 2 computes one matrix, with time complexity O(|O|²). Steps 3 to 8 compute |A| matrices, with time complexity O(|A||O|²). The time complexity of steps 10 through 27 is $O (\frac{| A | + | B |}{2} \cdot (| A | - | B | + 1) \cdot | O |^{2}) = O (\frac{| A |^{2} + | A | - | B |^{2} + | B |}{2} \cdot | O |^{2})$. Therefore, the time complexity of the whole algorithm is $O (| O |^{2} + | A | | O |^{2} + \frac{| A |^{2} + | A | - | B |^{2} + | B |}{2} \cdot | O |^{2}) = O ((| A |^{2} - | B |^{2}) \cdot | O |^{2})$. The space complexity of the whole algorithm is O(|A||O|²).

In order to verify the effectiveness of the algorithm, 10 datasets were used for testing. Fish, Car and Water are real-life interval-valued datasets [5, 14]. The other seven datasets are real-valued datasets from the UCI database [12]. See Table 2 for details of each dataset. Since the UCI datasets are not interval-valued, they need to be transformed into interval-valued data; we adopted the method in [7]. For each a ∈ A and o_i ∈ O, write a_i = a(o_i) and compute $a_{i}^{-} = a_{i} - 2 σ_{d}, a_{i}^{+} = a_{i} + 2 σ_{d}$, so that a(o_i) is replaced by the interval [a_i^-, a_i^+], where σ_d is the standard deviation for the attribute a.

Table 2
Data sets

Data sets	Abbreviation	Instances	Attributes	Classes
Fish	Fish	12	13	4
Car	Car	33	7	4
Water	Water	316	48	2
Ionosphere	Ion	351	33	2
Spectf Heart	SH	267	44	2
ImageSegmentation	IS	2310	19	7
Wdbc	Wdbc	569	30	2
Diabetic	Dia	1151	19	2
Hillvalley	Hill	1212	100	2
Waveform	Wave	5000	40	3
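The transformation of real-valued attributes into intervals described before Table 2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `to_intervals` is hypothetical, and the standard deviation is taken as the population standard deviation over all samples of one attribute, which may differ from the exact variant used in [7].

```python
from statistics import pstdev


def to_intervals(values, width=2.0):
    """Turn the real values of one attribute into intervals [v - width*sigma, v + width*sigma]."""
    # Assumption: population standard deviation over all samples of the attribute.
    sigma = pstdev(values)
    return [(v - width * sigma, v + width * sigma) for v in values]


print(to_intervals([2, 4, 4, 4, 5, 5, 7, 9])[:2])  # [(-2.0, 6.0), (0.0, 8.0)]
```

The sample list has standard deviation 2, so every value v becomes the interval [v - 4, v + 4].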

Ten graphs are drawn to show the effectiveness of the proposed algorithm and how the size of the reduction set changes with the parameters θ and δ. Figures 1-10 show the curves of the reduction sets obtained on the 10 datasets by running the FRM-IV algorithm proposed in this paper. The horizontal axis represents the value of δ, from 0 to 0.18. The vertical axis represents the size of the reduction set. Each figure has four curves, one for each value of θ: blue for θ = 0.6, orange for θ = 0.7, yellow for θ = 0.8 and purple for θ = 0.9. It can be seen that, for each curve, the size of the reduction set decreases as δ increases. For the four values of θ, the downward trends of the curves are close. This shows that the reduction effect of the algorithm is closely related to δ, but has little to do with θ.

In order to find the optimal values of δ and θ, we establish the mathematical model $z^{*} = \max_{δ, θ} z (δ, θ)$, where z(δ, θ) represents the classification accuracy obtained with parameters δ and θ. This model has no explicit expression and is difficult to solve by the usual methods, so it is solved by traversal. First fix the value of δ, and then let θ range from 0.5 to 0.99 in steps of 0.01. Then adjust the value of δ and traverse θ again. This is repeated until the parameters δ and θ with the maximum classification accuracy are found.
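The traversal over δ and θ can be sketched as a plain exhaustive search in Python. The function `toy_accuracy` is a hypothetical stand-in for "run the feature selection algorithm and classify the reduced data", and the step of 0.01 for δ is an assumption, since the text only gives the range from 0 to 0.18.

```python
def traverse_parameters(accuracy, deltas, thetas):
    """Exhaustive search: return (best_accuracy, delta, theta) over the grid."""
    best = None
    for delta in deltas:        # fix delta ...
        for theta in thetas:    # ... then sweep theta
            score = accuracy(delta, theta)
            if best is None or score > best[0]:
                best = (score, delta, theta)
    return best


# theta from 0.50 to 0.99 in steps of 0.01, as described in the text.
thetas = [k / 100 for k in range(50, 100)]
# Assumption: delta from 0 to 0.18 in steps of 0.01 (the step is not stated).
deltas = [k / 100 for k in range(0, 19)]


def toy_accuracy(delta, theta):
    """Hypothetical accuracy surface with its peak at delta = 0.05, theta = 0.8."""
    return 1.0 - (delta - 0.05) ** 2 - (theta - 0.8) ** 2


print(traverse_parameters(toy_accuracy, deltas, thetas))  # (1.0, 0.05, 0.8)
```

With 19 values of δ and 50 values of θ the search evaluates 950 parameter pairs, each of which would require one run of feature selection and classification in practice.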

Fig. 1 Reduct (Fish).
Fig. 2 Reduct (Car).
Fig. 3 Reduct (Water).
Fig. 4 Reduct (Ion).
Fig. 5 Reduct (SH).
Fig. 6 Reduct (IS).
Fig. 7 Reduct (Wdbc).
Fig. 8 Reduct (Dia).
Fig. 9 Reduct (Hill).
Fig. 10 Reduct (Wave).

Next, the accuracy of the algorithm is tested. We randomly select 80% of each dataset as the training set and the remaining 20% as the test set. Each dataset is trained 10 times, and the average accuracy is reported. Because standard classifiers cannot process interval-valued datasets directly, we use the same K-Nearest Neighbor (KNN, K = 3) and Probabilistic Neural Network (PNN) classifiers as in [7]; the formula for measuring the distance between two samples is also given in [7]. Figures 11-20 show the change of KNN classification accuracy when θ and δ take different values. Figures 21-30 show the accuracy change of the PNN classifier when θ and δ take different values. From Figures 11-30, it can be seen that the accuracy curves in most figures change regularly, with an overall trend that increases or decreases gradually. However, Figures 11, 12, 21 and 23 show that the four accuracy curves for the Fish, Car and Water datasets fluctuate markedly and follow no obvious pattern, indicating that, as δ increases and the reduction set becomes smaller, the choice of θ affects the accuracy on these datasets.

Fig. 11 KNN classification accuracy (Fish).
Fig. 12 KNN classification accuracy (Car).
Fig. 13 KNN classification accuracy (Water).
Fig. 14 KNN classification accuracy (Ion).
Fig. 15 KNN classification accuracy (SH).
Fig. 16 KNN classification accuracy (IS).
Fig. 17 KNN classification accuracy (Wdbc).
Fig. 18 KNN classification accuracy (Dia).
Fig. 19 KNN classification accuracy (Hill).
Fig. 20 KNN classification accuracy (Wave).
Fig. 21 PNN classification accuracy (Fish).
Fig. 22 PNN classification accuracy (Car).
Fig. 23 PNN classification accuracy (Water).
Fig. 24 PNN classification accuracy (Ion).
Fig. 25 PNN classification accuracy (SH).
Fig. 26 PNN classification accuracy (IS).
Fig. 27 PNN classification accuracy (Wdbc).
Fig. 28 PNN classification accuracy (Dia).
Fig. 29 PNN classification accuracy (Hill).
Fig. 30 PNN classification accuracy (Wave).

6 Comparative analysis

In this section, we implement comparative analysis.

In order to show the advantages and disadvantages of the proposed algorithm, we compare it with five other algorithms. One is PCA, mentioned in [33], whose cumulative contribution rate of the characteristic roots is set to 0.95; one is FSRAR from [23]; one is “DAI” from [7]. The other two algorithms are COEN and DEP from [3]. The computer used for the experiment is a Lenovo PC configured with an Intel Core i7-4790 CPU @ 3.60 GHz and 16 GB of memory. The algorithms were programmed in Matlab 2015b. All algorithms are tested on the 10 datasets given in Table 2. Because the parameters selected by each algorithm differ, one algorithm may produce multiple reduction sets. We choose the set with the highest average classification accuracy as the reduction set. To compute the classification accuracy, each algorithm randomly selects 80% of the data as the training set and the remaining 20% as the test set, runs 10 times and takes the average value.

Table 3 shows the size of the best reduction set, under the maximum accuracy, for each of the compared algorithms. The ‘Raw data’ column gives the number of features before reduction. It can be seen from Table 3 that the Hillvalley dataset has not been reduced under COEN. The reason is that the dataset must be α-consistent, but no α from 0.1 to 0.99 satisfies this condition, so the algorithm has no effect on the dataset. The reduction effect of the DEP and FRM-IV algorithms is the best, with an average set size of 7.1. Figure 31 graphically shows the reduction effects of the various algorithms on the 10 datasets.

Fig. 31 Reduced set comparison.

Table 3

Number of selected features

Data sets	Raw data	PCA	FSRAR	DAI	COEN	DEP	FRM-IV
Fish	13	2	5	2	1	2	2
Car	7	2	6	2	1	1	2
Water	48	24	11	5	4	4	5
Ion	33	24	8	10	14	9	5
SH	44	23	24	10	15	13	5
IS	19	4	13	7	4	5	8
Wdbc	30	1	27	12	12	11	9
Dia	19	2	5	8	8	6	9
Hill	100	1	15	15	100	9	12
Wave	40	16	35	12	14	11	14
Average	35.3	9.9	14.9	8.3	17.3	7.1	7.1

Table 4 shows the classification accuracy obtained on the 10 reduced datasets with the KNN classifier, where k = 3. Underlined values indicate the highest accuracy. It can be seen that FRM-IV does not perform well on the Fish and Car datasets, but shows high accuracy on the other eight datasets. Figure 32 shows the KNN classification accuracy curves on the 10 datasets; the horizontal axis represents the datasets, and the vertical axis represents the classification accuracy. Figure 32 shows more intuitively that, on most datasets, the data reduced by FRM-IV yield higher classification accuracy than the data reduced by the other five algorithms.

Table 4

Comparison of classification accuracies of reduced data with KNN

Data sets	Raw data	PCA	FSRAR	DAI	COEN	DEP	FRM-IV
Fish	0.6667±0.0000	0.3100±0.0000	0.7576±0.0000	0.7500±0.0000	0.5567±0.0367	0.7500±0.0000	0.4567±0.0213
Car	0.6447±0.0172	0.6717±0.0000	0.7665±0.0000	0.8788±0.0012	0.6486±0.0230	0.7310±0.0142	0.6887±0.0154
Water	0.7934±0.0269	0.8714±0.0065	0.8890±0.0025	0.7288±0.0056	0.6703±0.0189	0.6818±0.0125	0.7860±0.0140
Ion	0.8466±0.0118	0.8714±0.0065	0.8890±0.0025	0.8926±0.0084	0.8797±0.0049	0.8846±0.0207	0.9012±0.0072
SH	0.7426±0.0148	0.7486±0.0053	0.7303±0.0110	0.7940±0.0120	0.7446±0.0232	0.7510±0.0080	0.7980±0.0063
IS	0.9556±0.0028	0.9053±0.0034	0.9434±0.0025	0.9238±0.0142	0.8576±0.0064	0.6246±0.0036	0.9442±0.0083
Wdbc	0.9236±0.0048	0.8832±0.0065	0.9285±0.0140	0.9162±0.0075	0.9214±0.0094	0.9235±0.0124	0.9347±0.0171
Dia	0.6386±0.0144	0.5895±0.0124	0.5258±0.0010	0.5540±0.0113	0.6308±0.0030	0.6526±0.0036	0.6626±0.0235
Hill	0.5924±0.0126	0.4837±0.0124	0.5866±0.0110	0.5743±0.0098	0.5924±0.0126	0.5924±0.0126	0.5953±0.0215
Wave	0.8100±0.0086	0.8267±0.0070	0.7540±0.0045	0.7640±0.0101	0.7698±0.0069	0.7308±0.0050	0.8178±0.0025
Average	0.7398±0.0114	0.7065±0.0050	0.7638±0.0061	0.7777±0.0080	0.7272±0.0145	0.7322±0.0093	0.7585±0.0137

Fig. 32

Comparison of classification accuracy (KNN).

Table 5 shows the accuracy of the reduced datasets classified by the PNN classifier. Again, FRM-IV does not perform well on the Fish and Car datasets, where its classification accuracy is lower than that of several other algorithms. However, it achieves high classification accuracy on the other eight datasets. Figure 33 is a visual display of the data in Table 5. In general, the proposed algorithm can reduce the number of attributes and, on most datasets, obtain better classification accuracy than the other algorithms.

Table 5

Comparison of classification accuracies of reduced data with PNN

Data sets	Raw data	PCA	FSRAR	DAI	COEN	DEP	FRM-IV
Fish	0.8333±0.0333	0.2900±0.0000	0.8212±0.0000	0.9167±0.0064	0.4667±0.0400	0.6667±0.0000	0.6667±0.0075
Car	0.6447±0.0172	0.7106±0.0050	0.7886±0.0000	0.9091±0.0350	0.6311±0.0356	0.6161±0.0286	0.5360±0.0290
Water	0.7357±0.0029	0.7097±0.0115	0.7215±0.0120	0.7139±0.0136	0.6423±0.0288	0.6772±0.0040	0.7331±0.0100
Ion	0.8680±0.0259	0.6486±0.0055	0.7416±0.0036	0.8758±0.0041	0.8715±0.0080	0.8570±0.0039	0.8958±0.0056
SH	0.7076±0.0162	0.7508±0.0030	0.7754±0.0130	0.7042±0.0029	0.7026±0.0260	0.7429±0.0046	0.8048±0.0105
IS	0.8037±0.0086	0.8924±0.0025	0.9429±0.0015	0.8572±0.0048	0.8440±0.0109	0.6505±0.0035	0.8655±0.0018
Wdbc	0.9121±0.0051	0.8985±0.0045	0.9287±0.0020	0.9269±0.0033	0.9101±0.0099	0.9190±0.0127	0.9428±0.0046
Dia	0.5419±0.0074	0.6332±0.0025	0.5365±0.0033	0.6017±0.0024	0.5522±0.0111	0.6745±0.0027	0.6968±0.0110
Hill	0.5303±0.0038	0.5214±0.0035	0.5244±0.0065	0.5182±0.0026	0.5303±0.0038	0.5303±0.0038	0.5552±0.0135
Wave	0.7789±0.0065	0.7593±0.0050	0.7850±0.0065	0.8102±0.0046	0.7419±0.0069	0.7791±0.0041	0.8255±0.0075
Average	0.7469±0.0186	0.6815±0.0043	0.7566±0.0048	0.7834±0.0080	0.6893±0.0181	0.7113±0.0068	0.7522±0.0101

Fig. 33

Comparison of classification accuracy (PNN).

In addition to classification accuracy, the reduction speed of the algorithms is also compared. Because PCA performs dimensionality reduction rather than feature selection, it is excluded from the time comparison, so the time consumption of the remaining five algorithms is compared. The termination conditions are adjusted so that an algorithm does not stop until all attributes in the set have been reduced, and timing starts when the reduction begins. Figures 34-41 show the time consumption of the five algorithms on different datasets. It is easy to see from the figures that the proposed algorithm FRM-IV is faster than the other four algorithms: it takes less time to reduce all attributes. The second fastest is FSRAR, the third fastest is COEN and the slowest is DEP. On datasets with a large number of attributes, such as Water, Hill and Wave, DEP does not perform well, and its time consumption curve shows a straight-line trend. In general, FRM-IV is better than the other algorithms in terms of reduction set size, classification accuracy and operation speed.
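The timing setup described above (no early stopping, clock started when the reduction begins, one time value per reduced attribute) can be illustrated as follows. This is a minimal Python sketch under stated assumptions, not the measurement code used in the experiments: `score` is a hypothetical placeholder for whichever attribute evaluation function an algorithm uses, and the greedy loop is only a generic stand-in for the compared algorithms.

```python
import time


def timed_reduction(attributes, score, clock=time.perf_counter):
    """Process attributes greedily and record the cumulative elapsed time.

    `score(selected, candidate)` is a hypothetical evaluation function;
    the candidate with the largest score is processed next.
    """
    remaining = list(attributes)
    selected, elapsed = [], []
    start = clock()  # timing starts when the reduction starts
    while remaining:  # no early termination: run until every attribute is processed
        best = max(remaining, key=lambda a: score(selected, a))
        remaining.remove(best)
        selected.append(best)
        elapsed.append(clock() - start)
    return selected, elapsed


# Hypothetical usage: five attributes, preferring the smallest index first.
order, times = timed_reduction(range(5), lambda chosen, a: -a)
print(order)
```

Plotting `times` against the number of processed attributes gives a curve of the kind compared in the time consumption figures.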

Fig. 34

Time consuming comparison (Water).

Fig. 35

Time consuming comparison (Ion).

Fig. 36

Time consuming comparison (SH).

Fig. 37

Time consuming comparison (IS).

Fig. 38

Time consuming comparison (Wdbc).

Fig. 39

Time consuming comparison (Dia).

Fig. 40

Time consuming comparison (Hill).

Fig. 41

Time consuming comparison (Wave).

7 Statistical tests of different algorithms

In this section, the Friedman test and its post-hoc test are performed to assess statistically the differences in classification accuracy among the compared algorithms.

The Friedman test [10, 32] is a non-parametric test based on the following ideas: first, the algorithms are ranked on each dataset according to the values of the comparison index, which gives the rank of each algorithm on each dataset; second, the average rank of each algorithm and the differences between the average ranks of each pair of algorithms are calculated; finally, the average rank differences are compared with the corresponding critical value of the test.

The specific test process is as follows. In a test with m datasets and n algorithms, the n algorithms are ranked by their classification accuracies on each dataset. For the i-th dataset, the algorithm with the highest classification accuracy ranks 1st, the algorithm with the second highest classification accuracy ranks 2nd, the algorithm with the third highest classification accuracy ranks 3rd, and so on. If two algorithms have the same classification accuracy, their ranks are averaged. Denote the rank of the classification accuracy of the j-th algorithm on the i-th dataset by $a_{ij}$; then the average rank of the j-th algorithm is $a_{j} = \frac{1}{m} \sum_{i = 1}^{m} a_{ij}$. The significance of the difference can be tested by the statistic

$F = \frac{(m - 1) χ^{2}}{m (n - 1) - χ^{2}}$ (29)

where $χ^{2} = \frac{12 m}{n (n + 1)} \sum_{j = 1}^{n} a_{j}^{2} - 3 m (n + 1)$. The statistic F follows the F-distribution with n - 1 and (m - 1)(n - 1) degrees of freedom. If the value of F is larger than the critical value $F_{α}(n - 1, (m - 1)(n - 1))$, which is determined by the F(n - 1, (m - 1)(n - 1)) distribution and the given significance level α, there are significant differences among the algorithms, and a post-hoc test, such as the Nemenyi test, is needed. The Nemenyi test compares all algorithms against each other. It yields a critical distance

${CD}_{α} = q_{α} \sqrt{\frac{n (n + 1)}{6 m}}$ (30)

where $q_{α}$ is the critical value based on the Studentized range (Tukey) statistic for n algorithms divided by $\sqrt{2}$ [10]. If the difference between the average ranks of two algorithms exceeds the critical distance $CD_{α}$, then the classification performances of the two algorithms are significantly different.
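As an illustration of Eq. (29) and the accompanying χ² formula, the following Python sketch computes the average ranks, χ² and F from an m-by-n rank matrix. The function name and the small rank matrix are hypothetical and are not taken from the experiments.

```python
def friedman_statistics(ranks):
    """Return (average_ranks, chi2, F) for an m-by-n matrix of ranks.

    ranks[i][j] is the rank a_ij of the j-th algorithm on the i-th dataset.
    """
    m, n = len(ranks), len(ranks[0])
    # Average rank a_j of each algorithm over the m datasets.
    avg = [sum(row[j] for row in ranks) / m for j in range(n)]
    chi2 = 12 * m / (n * (n + 1)) * sum(a * a for a in avg) - 3 * m * (n + 1)
    # Eq. (29); undefined when chi2 equals m * (n - 1), i.e. when all datasets
    # produce exactly the same ranking without ties.
    f_stat = (m - 1) * chi2 / (m * (n - 1) - chi2)
    return avg, chi2, f_stat


# Hypothetical example with m = 4 datasets and n = 3 algorithms.
toy_ranks = [[1, 2, 3], [1, 3, 2], [1, 2, 3], [2, 1, 3]]
print(friedman_statistics(toy_ranks))
```

For this toy matrix the average ranks are 1.25, 2.0 and 2.75, so χ² = 4 × 13.125 − 48 = 4.5 and F = 3 × 4.5 / (8 − 4.5) ≈ 3.857.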

(1) Ranking the six algorithms by their classification accuracies on the ten datasets, respectively (see Tables 6 and 7).

Table 6

Ranks of the six algorithms on different datasets with KNN classifier

Data sets	PCA	FSRAR	DAI	COEN	DEP	FRM-IV
Fish	6	1	2.5	4	2.5	5
Car	5	2	1	6	3	4
Water	2	1	4	6	5	3
Ion	6	3	2	5	4	1
SH	4	6	2	5	3	1
IS	4	2	3	5	6	1
Wdbc	6	2	5	4	3	1
Dia	4	6	5	3	2	1
Hill	6	4	5	2.5	2.5	1
Wave	1	5	4	3	6	2
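The ranking rule (highest accuracy ranks 1st, tied accuracies share the average of their ranks) can be sketched in Python as follows. The function name is hypothetical; the input is the Fish row of Table 4, and the result agrees with the Fish row of Table 6.

```python
def rank_descending(accuracies):
    """Rank values so that the highest gets rank 1; ties share their average rank."""
    order = sorted(range(len(accuracies)), key=lambda i: -accuracies[i])
    ranks = [0.0] * len(accuracies)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and accuracies[order[end + 1]] == accuracies[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # mean of the 1-based positions pos+1 .. end+1
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks


# Fish row of Table 4 (KNN), in the order PCA, FSRAR, DAI, COEN, DEP, FRM-IV.
fish = [0.3100, 0.7576, 0.7500, 0.5567, 0.7500, 0.4567]
print(rank_descending(fish))  # [6.0, 1.0, 2.5, 4.0, 2.5, 5.0]
```

Because each row is a set of ranks over six algorithms, every row should sum to 21, which gives a quick consistency check on a rank table.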

Table 7

Ranks of the six algorithms on different datasets with PNN classifier

Data sets	PCA	FSRAR	DAI	COEN	DEP	FRM-IV
Fish	6	2	1	5	3.5	3.5
Car	3	2	1	4	5	6
Water	4	2	3	6	5	1
Ion	6	5	2	3	4	1
SH	3	2	5	6	4	1
IS	2	1	4	5	6	3
Wdbc	6	2	3	5	4	1
Dia	3	6	4	5	2	1
Hill	5	4	6	2.5	2.5	1
Wave	5	3	2	6	4	1

(2) Conducting the Friedman test. With six algorithms and ten datasets, F follows the distribution F(5, 45). Given the significance level α = 0.05, the critical value is F_α(5, 45) = 2.422. Since χ² = 26.157 and F = 9.873, we have F > F_α(5, 45), which means that the performances of the six algorithms are significantly different under the KNN and PNN classifiers.
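The reported value of F can be checked directly from the reported χ² by substituting m = 10 and n = 6 into Eq. (29); the degrees of freedom are n - 1 = 5 and (m - 1)(n - 1) = 45:

$F = \frac{(10 - 1) \times 26.157}{10 \times (6 - 1) - 26.157} = \frac{235.413}{23.843} \approx 9.873$

which exceeds the critical value 2.422.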

(3) To check which algorithms perform better, a post-hoc test, namely the Nemenyi test, is introduced. The results of the Nemenyi test are visually represented in Figure 42. The top line in the figure is the interval of CD_α, and the second line is the axis on which the average ranks of the different algorithms are plotted. Groups of algorithms that are not significantly different are connected by a horizontal line. From the figure, the following results are obtained:

Fig. 42

The Nemenyi test result with KNN and PNN.

a) The classification accuracy of FRM-IV is significantly higher than that of PCA, COEN and DEP;

b) The classification accuracy of FSRAR is significantly higher than that of PCA, COEN and DEP;

c) The classification accuracy of DAI is significantly higher than that of PCA, COEN and DEP;

d) There is no significant difference among the classification accuracies of FRM-IV, FSRAR and DAI;

e) There is no significant difference among the classification accuracies of PCA, COEN and DEP.
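Pairwise conclusions of this kind follow from comparing average rank differences with the critical distance of Eq. (30). The Python sketch below is only an illustration: the table of q values at α = 0.05 is the one commonly tabulated for the Nemenyi test in [10], while the function names and the average ranks in the usage example are hypothetical and are not the ranks obtained in the experiments.

```python
import math
from itertools import combinations

# Critical values q_alpha at alpha = 0.05 for n = 2..10 compared algorithms,
# as tabulated for the Nemenyi test in Demsar (2006).
Q_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
         7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def nemenyi_cd(n, m, q_table=Q_005):
    """Critical distance of Eq. (30) for n algorithms and m datasets."""
    return q_table[n] * math.sqrt(n * (n + 1) / (6 * m))


def significant_pairs(avg_ranks, m):
    """Pairs of algorithms whose average ranks differ by more than the critical distance."""
    cd = nemenyi_cd(len(avg_ranks), m)
    return [(a, b) for a, b in combinations(avg_ranks, 2)
            if abs(avg_ranks[a] - avg_ranks[b]) > cd]


# Hypothetical average ranks of three algorithms over m = 10 datasets.
print(significant_pairs({"A": 1.2, "B": 2.0, "C": 2.8}, m=10))
```

Under the tabulated value q = 2.850, the setting of this section (n = 6, m = 10) would give a critical distance of about 2.385.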

8 Conclusion and future work

In this paper, several attribute evaluation functions that express the classification ability of feature subsets have been put forward. Based on these functions, the FRIC-model in an IVDIS has been established, and a feature selection algorithm in an IVDIS has been presented by using this model. Experiments have been carried out and statistical tests have been used to evaluate the performance of the presented algorithm. The experimental results show that the presented algorithm is more effective than some existing algorithms and does not occupy too much memory. In the future, we will consider applications of the FRIC-model.


Acknowledgments

The authors would like to thank the editors and the anonymous reviewers for their valuable comments and suggestions, which have helped immensely in improving the quality of the paper. This work is supported by National Natural Science Foundation of Guangxi (2022GXNSFAA035552, 2021GXNSFAA220114).

References

1. Chen L.L., Chen D.G. and Wang, Fuzzy kernel alignment with application to attribute reduction of heterogeneous data, IEEE Transactions on Fuzzy Systems 27(7) (2019), 1469–1478.

2. Cornelis, Jensen, Martin G.H. and Slezak, Attribute selection with fuzzy decision reducts, Information Sciences 180 (2010), 209–224.

3. Chen Y.Y., Li Z.W. and Zhang G.Q., Attribute reduction in an incomplete interval-valued decision information system, IEEE Access 9 (2021), 64539–64557.

4. Chen D.G., Zhang, Zhao S.Y., Hu Q.H. and Zhu P.F., A novel algorithm for finding reducts with fuzzy rough sets, IEEE Transactions on Fuzzy Systems 20 (2012), 385–389.

5. De Carvalho F.D.A.T., De Souza R.M.C.R., Chavent and Lechevallier, Adaptive Hausdorff distances and dynamic clustering of symbolic interval data, Pattern Recognition Letters 27(3) (2006), 167–179.

6. Dubois and Prade, Rough fuzzy sets and fuzzy rough sets, International Journal of General Systems 17 (1990), 191–208.

7. Dai J.H. and Yan Y.J., Dominance-based fuzzy rough set approach for incomplete interval-valued data, Journal of Intelligent & Fuzzy Systems 34 (2018), 423–436.

8. Dai J.H., Wang W.T., Tian H.W. and Liu, Attribute selection based on a new conditional entropy for incomplete decision systems, Knowledge-Based Systems 39 (2013), 207–213.

9. Dai J.H., Wang W.T., Xu and Tian H.W., Uncertainty measurement for interval-valued decision systems based on extended conditional entropy, Knowledge-Based Systems 27 (2012), 443–450.

10. Demsar, Statistical comparisons of classifiers over multiple data sets, The Journal of Machine Learning Research 7 (2006), 1–30.

11. Facchinetti, Ricci and Muzzioli, Note on ranking fuzzy triangular numbers, International Journal of Intelligent Systems 13 (1998), 613–622.

12. Frank and Asuncion, UCI Machine Learning Repository. [Online]. Available: http://archive.ics.uci.edu/ml.

13. Gadekallu T.R. and Gao X.Z., An efficient attribute reduction and fuzzy logic classifier for heart disease and diabetes prediction, Recent Advances in Computer Science and Communications 14(1) (2021), 158–165.

14. Hedjazi, Aguilar-Martin and Lann M.V.L., Similarity-margin based feature selection for symbolic interval data, Pattern Recognition Letters 32(4) (2011), 578–585.

15. Huang Y.Y., Li T.R., Luo, Fujita and Horng S.J., Dynamic variable precision rough set approach for probabilistic set-valued information systems, Knowledge-Based Systems 122 (2017), 131–147.

16. , Tsang E.C.C., Guo Y.T. and Xu W.H., Fast and robust attribute reduction based on the separability in fuzzy decision systems, IEEE Transactions on Cybernetics, doi: 10.1109/TCYB.2020.3040803.

17. , Yu, Pedrycz and Chen, Kernelized fuzzy rough sets and their applications, IEEE Transactions on Knowledge and Data Engineering 23 (2011), 1649–1667.

18. Weston, Andre and Bernhard, Use of the zero-norm with linear models and kernel methods, Journal of Machine Learning Research 3 (2003), 1439–1461.

19. Jia X.Y., Rao, Shang and Li T.J., Similarity-based attribute reduction in rough set theory: a clustering perspective, International Journal of Machine Learning and Cybernetics 11 (2020), 1047–1060.

20. Kohavi and John G.H., Wrappers for feature subset selection, Artificial Intelligence 97(1-2) (1997), 273–324.

21. Kira and Rendell L.A., The feature selection problem: traditional methods and a new algorithm, Proceedings of the 10th National Conference on Artificial Intelligence, Menlo Park, USA: AAAI Press (1992), 129–134.

22. Y.L., Chen, Lv M.Q., Li Y.J. and Li Y.Y., Extracting semantic event information from distributed sensing devices using fuzzy sets, Fuzzy Sets and Systems 337 (2018), 74–92.

23. Liu X.F., Dai J.H., Chen J.L. and Zhang C.C., A fuzzy α-similarity relation-based attribute reduction approach in incomplete interval-valued information systems, Applied Soft Computing (2021), 107593.

24. Z.W., Huang, Liu X.F., Xie N.X. and Zhang G.Q., Information structures in a covering information system, Information Sciences 507 (2020), 449–471.

25. Z.W., Zhang P.F., Ge, Xie N.X., Zhang G.Q. and Wen C.F., Uncertainty measurement for a fuzzy relation information system, IEEE Transactions on Fuzzy Systems 27(12) (2019), 2338–2352.

26. Moresi N.N. and Yankout M.M., Axiomatic for fuzzy rough sets, Fuzzy Sets and Systems 100 (1998), 327–342.

27. Navarrete, Viejo and Cazorla, Color smoothing for RGBD data using entropy information, Applied Soft Computing 46 (2016), 361–380.

28. Navara and Mária Navarová, Principles of inclusion and exclusion for interval-valued fuzzy sets and IF-sets, Fuzzy Sets and Systems 324 (2017), 60–73.

29. Nakahara, Sasaki and Gen, On the linear programming problems with set coefficients, Computers and Industrial Engineering 23 (1992), 301–304.

30. Pawlak, Rough sets, International Journal of Computer and Information Science 11 (1982), 341–356.

31. Pawlak, Rough sets: Theoretical aspects of reasoning about data, Kluwer Academic Publishers, Dordrecht, 1991.

32. Salvador and Herrera, An extension on “Statistical Comparisons of Classifiers over Multiple Data Sets” for all pairwise comparisons, Journal of Machine Learning Research 9(12) (2008), 2677–2694.

33. Thippa R.G., Praveen K.R.M., Lakshmanna, Kaluri and Baker, Analysis of dimensionality reduction techniques on big data, IEEE Access 8 (2020), 54776–54788.

34. Wang Y.B., Chen X.J. and Dong, Attribute reduction via local conditional entropy, International Journal of Machine Learning and Cybernetics 10(12) (2019), 3619–3634.

35. Wang C.Z., Qi, Shao M.W., Hu Q.H., Chen D.G., Qian Y.H. and Lin Y.J., A fitting model for feature selection with fuzzy rough sets, IEEE Transactions on Fuzzy Systems 25 (2016), 741–753.

36. Wang C.Z., Wang, Shao M.W., Qian Y.H. and Chen D.G., Fuzzy rough attribute reduction for categorical data, IEEE Transactions on Fuzzy Systems 28(5) (2020), 818–830.

37. Xie N.X., Liu, Li Z.W. and Zhang G.Q., New measures of uncertainty for an interval-valued information system, Information Sciences 470 (2019), 156–174.

38. Xie N.X., Li Z.W., Wu W.Z. and Zhang G.Q., Fuzzy information granular structures: A further investigation, International Journal of Approximate Reasoning 114 (2019), 127–150.

39. Yao Y.Y., Relational interpretations of neighborhood operators and rough set approximation operators, Information Sciences 111 (1998), 239–259.

40. Yuan, Chen H.M., Li T.R., Yu, Sang B.B. and Luo, Unsupervised attribute reduction for mixed data based on fuzzy rough sets, Information Sciences 572 (2021), 67–87.

41. Zadeh L.A., Fuzzy sets, Information and Control 8 (1965), 338–356.

42. Zhang Y.Y., Li T.R., Luo, Zhang J.B. and Chen H.M., Incremental updating of rough approximations in interval-valued information systems under attribute generalization, Information Sciences 373 (2016), 461–475.

43. Zhang G.Q., Li Z.W., Wu W.Z., Liu X.F. and Xie N.X., Information structures and uncertainty measures in a fully fuzzy information system, International Journal of Approximate Reasoning 101 (2018), 119–149.

44. Zhao and Qin K.Y., Mixed feature selection in incomplete decision table, Knowledge-Based Systems 57 (2014), 181–190.


Table 2

Data sets

Data sets	Abbreviation	Instances	Attributes	Classes
Fish	Fish	12	13	4
Car	Car	33	7	4
Water	Water	316	48	2
Ionosphere	Ion	351	33	2
Spectf Heart	SH	267	44	2
ImageSegmentation	IS	2310	19	7
Wdbc	Wdbc	569	30	2
Diabetic	Dia	1151	19	2
Hillvalley	Hill	1212	100	2
Waveform	Wave	5000	40	3
