Attribute reduction for multiset-valued data based on FRIC-model

Abstract

Rough set theory (RST) can effectively deal with uncertainty in information systems, and attribute reduction is one of its key applications. This paper studies attribute reduction for multiset-valued data (MV-data). Given a multiset-valued decision information system (MSVDIS) and a threshold λ, fuzzy symmetric relations on the object set are defined by counting the attributes on which the Hellinger distance between the probability distributions induced by two objects' information values does not exceed λ. This leads to fuzzy rough sets for MV-data. Fuzzy positive regions and fuzzy dependency functions associated with these fuzzy rough sets are constructed, which yields a fuzzy rough iterative computation model (FRIC-model) for MV-data. Attribute reduction algorithms in an MSVDIS based on the FRIC-model are then given. Experiments on UCI data sets, with incomplete data standing in for MV-data, show that the given algorithm is more effective than some existing algorithms.

Keywords

MSVDIS; Evaluation function

1 Introduction

Rough set theory (RST), proposed by Pawlak [21, 22], can effectively deal with the uncertainty of an information system (IS). Most applications of RST are related to ISs [12, 36].

Attribute reduction (A-reduction) is a data mining technique that uses an attribute evaluation function to evaluate attributes and filter out the redundant attributes in the conditional attribute set, so as to reduce the data dimension and to improve the generalization performance and efficiency of algorithms. The core step of A-reduction is to construct the attribute evaluation function, which is used to select key representative attributes from high-dimensional data. Many results have been obtained so far. Yao et al. [33] proposed class-specific A-reduction in RST. Lang et al. [14] studied A-reduction in dynamic fuzzy covering information systems based on homomorphisms. Li et al. [15] investigated A-reduction for heterogeneous data based on information entropy. Trabelsi et al. [26] put forward a heuristic method for A-reduction from partially uncertain data using rough sets.

As an extension of RST, fuzzy rough sets (FR-sets) can effectively deal with uncertainty and ambiguity in real-world scenarios, and have been successfully applied to A-reduction. Wang et al. [29] proposed a fitting model for A-reduction based on FR-sets. Chen et al. [4] presented a novel A-reduction algorithm based on FR-sets. Dai et al. [6] presented a maximal-discernibility-pair-based approach to A-reduction based on FR-sets. Wang et al. [28] constructed A-reduction based on FR-sets using distance measures.

RST usually deals with multiset-valued data (MV-data) as follows. First, a similarity measure between information values in a multiset-valued decision information system (MSVDIS) is introduced and transferred to the object set. Then, tolerance relations on the object set are constructed from it. However, such tolerance relations have deficiencies when they are used in fuzzy rough computation.

Given an MSVDIS and a threshold λ, this paper defines fuzzy symmetric relations on the object set by counting the attributes on which the Hellinger distance between the probability distributions induced by two objects' information values does not exceed λ; thus λ controls the required similarity between information values. This leads to FR-sets for MV-data. Fuzzy positive regions and fuzzy dependency functions associated with these FR-sets are constructed, which yields a fuzzy rough iterative computation model (FRIC-model) for MV-data. A-reduction algorithms in an MSVDIS based on the FRIC-model are then given. Finally, experimental results show that the given algorithm is more effective than some existing algorithms.

This paper is organized as follows. Section 2 recalls some notions about FR-sets, multisets and probability distribution sets. Section 3 introduces MSVDISs. Section 4 introduces fuzzy symmetric relations in an MSVDIS. Section 5 defines some evaluation functions in an MSVDIS. Section 6 establishes the FRIC-model. Section 7 presents an A-reduction algorithm based on the FRIC-model. Section 8 carries out numerical experiments to show the feasibility of the presented algorithm. Section 9 performs the Friedman test and a post-hoc test to further verify the stability of the classification performance of the presented algorithm. Section 10 concludes this paper.

2 Preliminaries

In this section, we give an overview of FR-sets, multisets and probability distribution sets.

Throughout this paper, O denotes a finite set, $2^{O}$ denotes the family of all subsets of O, and |X| denotes the cardinality of $X \in 2^{O}$. Put $O = \{o_{1}, o_{2}, \dots, o_{n}\}$.

2.1 FR-sets

Suppose that R is a fuzzy relation on O. Then R may be represented by the matrix $M(R) = (R(o_{i}, o_{j}))_{n \times n}$, where $R(o_{i}, o_{j}) \in [0, 1]$ measures the similarity between the objects $o_{i}$ and $o_{j}$. For a fuzzy set F on O, the lower and upper fuzzy rough approximations of F are defined as $\underline{R} (F) (o) = ⋀_{o^{'} \in O} ((1 - R (o, o^{'})) \lor F (o^{'})), \forall o \in O;$ $\bar{R} (F) (o) = ⋁_{o^{'} \in O} (R (o, o^{'}) \land F (o^{'})), \forall o \in O .$

The fuzzy rough set model is a generalization of the classical rough set model.
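As a minimal sketch (not taken from the paper), the two approximations above can be computed directly from a similarity matrix, using `min` for ⋀ and `max` for ⋁. The function names `lower_approx` and `upper_approx`, as well as the 3 × 3 relation and the fuzzy set F, are hypothetical illustrations.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def lower_approx(R: Matrix, F: Sequence[float]) -> List[float]:
    """Lower approximation: min over o' of max(1 - R(o, o'), F(o'))."""
    n = len(F)
    return [min(max(1.0 - R[i][j], F[j]) for j in range(n)) for i in range(n)]


def upper_approx(R: Matrix, F: Sequence[float]) -> List[float]:
    """Upper approximation: max over o' of min(R(o, o'), F(o'))."""
    n = len(F)
    return [max(min(R[i][j], F[j]) for j in range(n)) for i in range(n)]


# Hypothetical reflexive, symmetric fuzzy relation on O = {o1, o2, o3}
R = [[1.0, 0.6, 0.2],
     [0.6, 1.0, 0.3],
     [0.2, 0.3, 1.0]]
F = [1.0, 0.5, 0.0]  # membership degrees of a fuzzy set F on O

print(lower_approx(R, F))  # [0.5, 0.5, 0.0]
print(upper_approx(R, F))  # [1.0, 0.6, 0.3]
```

Because this R is reflexive, the lower approximation lies below F and the upper approximation lies above it, pointwise.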

2.2 Multisets and probability distribution sets

Definition 2.1 ([10]). Let X be a universe. A multiset (or bag) M drawn from X is represented by a count function $C_{M} : X \to \mathbb{N} \cup \{0\}$.

For convenience, $C_{M} (x)$ is also denoted by $M (x)$ $(x \in X)$.

If $M(x) = m$, then x appears m times in M; this is denoted by $m/x \in M$ or $x \in^{m} M$.

Let $X = \{x_{1}, x_{2}, \dots, x_{s}\}$. If $M(x_{i}) = m_{i}$ $(i = 1, 2, \dots, s)$, then M is written as $M = \{m_{1} / x_{1}, m_{2} / x_{2}, \dots, m_{s} / x_{s}\}$.

Definition 2.2 ([10]). Let A and B be two multisets drawn from a universe X. Then the following are defined:

(1) A = B ⇔ A(x) = B(x) (∀ x ∈ X);

(2) A ⊑ B ⇔ A(x) ≤ B(x) (∀ x ∈ X);

(3) (A ⊔ B)(x) = max {A(x), B(x)} (∀ x ∈ X);

(4) (A ⊓ B)(x) = min {A(x), B(x)} (∀ x ∈ X).
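The four items of Definition 2.2 can be sketched as follows. Each multiset is represented by a Python dictionary that maps elements of X to counts, and missing keys count as 0. The helper names are hypothetical. `collections.Counter` could be used instead, but writing explicit functions keeps the correspondence with items (1) to (4) visible.

```python
from typing import Dict, Iterable

Multiset = Dict[str, int]  # count function C_M: X -> N ∪ {0}


def count(M: Multiset, x: str) -> int:
    return M.get(x, 0)  # elements not listed occur 0 times


def ms_equal(A: Multiset, B: Multiset, X: Iterable[str]) -> bool:
    return all(count(A, x) == count(B, x) for x in X)      # (1) A = B


def ms_included(A: Multiset, B: Multiset, X: Iterable[str]) -> bool:
    return all(count(A, x) <= count(B, x) for x in X)      # (2) A ⊑ B


def ms_union(A: Multiset, B: Multiset, X: Iterable[str]) -> Multiset:
    return {x: max(count(A, x), count(B, x)) for x in X}   # (3) A ⊔ B


def ms_intersection(A: Multiset, B: Multiset, X: Iterable[str]) -> Multiset:
    return {x: min(count(A, x), count(B, x)) for x in X}   # (4) A ⊓ B


X = ["S", "M", "N"]
A = {"S": 2, "M": 1}          # {2/S, 1/M, 0/N}
B = {"S": 1, "M": 3, "N": 1}  # {1/S, 3/M, 1/N}
print(ms_union(A, B, X))         # {'S': 2, 'M': 3, 'N': 1}
print(ms_intersection(A, B, X))  # {'S': 1, 'M': 1, 'N': 0}
```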

Definition 2.3 ([20]). Let X = {x₁, x₂, ⋯ , x_s} and $P = {\frac{x_{1}, x_{2}, \dots, x_{s}}{p_{1}, p_{2}, \dots, p_{s}}}$. If $0 \leq p_{i} \leq 1$ for each i and $\sum_{i = 1}^{s} p_{i} = 1$, then P is referred to as a probability distribution set over X.

P can be seen as a map P : X → [0, 1], i.e., ∀ i, P(x_i) = p_i.

Definition 2.4. Given X = {x₁, x₂, ⋯ , x_s}. Suppose $P = {\frac{x_{1}, x_{2}, \dots, x_{s}}{p_{1}, p_{2}, \dots, p_{s}}}$ is a probability distribution set over X. Then P is called a rational probability distribution set over X, if ∀ i, p_i is a rational number; otherwise, P is called an irrational probability distribution set over X.

Definition 2.5. Given X = {x₁, x₂, ⋯ , x_s} and Y = {y₁, y₂, ⋯ , y_t}. Suppose that P and Q are probability distribution sets over X and Y, respectively. Denote $P = {\frac{x_{1}, x_{2}, \dots, x_{s}}{p_{1}, p_{2}, \dots, p_{s}}}, Q = {\frac{y_{1}, y_{2}, \dots, y_{t}}{q_{1}, q_{2}, \dots, q_{t}}} .$

(1) If s = t and ∀ i, x_i = y_i, p_i = q_i, then P and Q are said to be equal. Denote P = Q.

(2) If s = t and ∀ i, x_i = y_i, then P and Q are said to be approximately equal. Denote P ≃ Q.

Obviously, P = Q ⇒ P ≃ Q.

Definition 2.6 ([20]). Let P and Q be two probability distribution sets over X, written as $P = {\frac{x_{1}, x_{2}, \dots, x_{s}}{p_{1}, p_{2}, \dots, p_{s}}}, Q = {\frac{x_{1}, x_{2}, \dots, x_{s}}{q_{1}, q_{2}, \dots, q_{s}}} .$ Then the Hellinger distance between P and Q is defined as $HD (P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i = 1}^{s} (\sqrt{p_{i}} - \sqrt{q_{i}})^{2}} .$

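Since $\sum_{i = 1}^{s} p_{i} = \sum_{i = 1}^{s} q_{i} = 1$, expanding the squares gives $\frac{1}{2} \sum_{i = 1}^{s} (\sqrt{p_{i}} - \sqrt{q_{i}})^{2} = 1 - \sum_{i = 1}^{s} \sqrt{p_{i} q_{i}}$, and hence $HD (P, Q) = \sqrt{1 - \sum_{i = 1}^{s} \sqrt{p_{i} q_{i}}} .$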
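Below is a short sketch that checks numerically that the two expressions for the Hellinger distance agree. The helper names and the example distributions are illustrative assumptions. The pair (5/7, 2/7) versus (1, 0) corresponds to the Muscle pain values of o₃ and o₁ in Example 3.6.

```python
import math
from typing import Sequence


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """HD(P, Q) = (1/sqrt(2)) * sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2)."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s) / math.sqrt(2)


def hellinger_bc(p: Sequence[float], q: Sequence[float]) -> float:
    """Equivalent form sqrt(1 - sum_i sqrt(p_i q_i)); valid when both sum to 1."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - bc))  # clamp tiny negative rounding errors


p = (5 / 7, 2 / 7)
q = (1.0, 0.0)
print(round(hellinger(p, q), 4), round(hellinger_bc(p, q), 4))  # 0.3935 0.3935
```

The first form is used in later sketches because it can never take the square root of a negative rounding error.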

Definition 2.7. Given X = {x₁, x₂, ⋯ , x_s}. Suppose that M = {m₁/x₁, m₂/x₂, …, m_s/x_s} is a multiset drawn from X. Put $P_{M} = {\frac{x_{1}, x_{2}, \dots, x_{s}}{p_{1}, p_{2}, \dots, p_{s}}},$ where $p_{i} = \frac{m_{i}}{m_{1} + m_{2} + \dots + m_{s}} (i = 1, 2, \dots, s)$ . Then P_M is a rational probability distribution set over X. We call P_M the probability distribution set induced by M.
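Definition 2.7 can be sketched with Python's `fractions.Fraction`. This keeps each p_i an exact rational number, which makes it visible that P_M is a rational probability distribution set. The function name `induced_distribution` is hypothetical.

```python
from fractions import Fraction
from typing import Dict


def induced_distribution(M: Dict[str, int]) -> Dict[str, Fraction]:
    """P_M(x_i) = m_i / (m_1 + ... + m_s) for a multiset M = {x_i: m_i}."""
    total = sum(M.values())
    if total == 0:
        raise ValueError("the multiset must contain at least one element")
    return {x: Fraction(m, total) for x, m in M.items()}


P = induced_distribution({"Y": 5, "N": 2})
print(P)  # {'Y': Fraction(5, 7), 'N': Fraction(2, 7)}
```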

3 MSVDISs

Definition 3.1. Suppose that O is a finite set of objects (samples) and A is a finite set of attributes. Then (O, A) is referred to as an information system (IS) if each a ∈ A determines an information function $f_{a} : O \to V_{a}$, where $V_{a} = \{f_{a}(o) : o \in O\}$.

(O, C, d) is called a decision information system (DIS) if (O, C ∪ {d}) is an IS, where C is a set of conditional attributes and d is a decision attribute.

Suppose that * denotes a missing value. If there exists a ∈ C with * ∈ V_a, while * ∉ V_d = {f_d(o) : o ∈ O}, then the DIS (O, C, d) is called an incomplete decision information system (IDIS).

Let (O, C, d) be an IDIS. For each a ∈ C, denote $V_{a}^{*} = V_{a} - \{f_{a} (o) : f_{a} (o) = *\}$, i.e., the set of known (non-missing) values of a.

Example 3.2. Suppose that a₁, a₂, a₃ and d denote “Headache", “Muscle pain", “Temperature" and “Symptom", respectively. Put O = {o₁, o₂, ⋯ , o₉} and C = {a₁, a₂, a₃}, and let d be the decision attribute. Then Table 1 is an IDIS (O, C, d).

Table 1
An IDIS

O	Headache (a₁)	Muscle pain (a₂)	Temperature (a₃)	Symptom (d)
o₁	Sick	Yes	High	Flu
o₂	Sick	Yes	Low	Flu
o₃	Middle	*	Normal	Flu
o₄	No	Yes	Normal	Flu
o₅	*	Yes	Normal	Rhinitis
o₆	Middle	No	*	Rhinitis
o₇	No	No	Low	Health
o₈	No	*	*	Health
o₉	*	Yes	Low	Health

Example 3.3. (Continued from Example 3.2) $V_{a_{1}}^{*} = \{Sick (S), Middle (M), No (N)\}$, $V_{a_{2}}^{*} = \{Yes (Y), No (N)\}$, $V_{a_{3}}^{*} = \{High (H), Normal (N), Low (L)\}$, $V_{d}^{*} = V_{d} = \{Flu, Rhinitis, Health\}$.

Definition 3.4. Let (O, C, d) be a DIS with O = {o₁, o₂, ⋯ , o_n}. Then (O, C, d) is called a multiset-valued decision information system (MSVDIS) if, for each a ∈ C, f_a(o₁), f_a(o₂), ⋯ , f_a(o_n) are multisets drawn from the same set.

If P ⊆ C, then (O, P, d) is known as a subsystem of (O, C, d).

Definition 3.5. Let (O, C, d) be an IDIS with O = {o₁, o₂, ⋯ , o_n}. Given a ∈ C, denote $V_{a}^{*} = \{x_{1}, x_{2}, \dots, x_{s}\}$. Then $V_{a}^{*}$ is an ordinary set.

Let M_a = {f_a(o₁), f_a(o₂), ⋯ , f_a(o_n)} - {*} be the multiset of the known values of a (repeated values are kept; e.g., f_a(o₁) = f_a(o₃) is allowed).

Suppose that, for each i, m_i is the number of occurrences of x_i in M_a. Then $M_{a} = \{m_{1} / x_{1}, m_{2} / x_{2}, \dots, m_{s} / x_{s}\}$.

If f_a(o) = *, then f_a(o) is replaced by {m₁/x₁, m₂/x₂, ⋯ , m_s/x_s}; if f_a(o) = x_j, then f_a(o) is replaced by $\{0 / x_{1}, \dots, 0 / x_{j - 1}, 1 / x_{j}, 0 / x_{j + 1}, \dots, 0 / x_{s}\}$.

After this treatment, (O, C, d) is an MSVDIS. We call it the MSVDIS induced by the IDIS (O, C, d).
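The induction procedure of Definition 3.5 can be sketched below on the data of Table 1, using the conditional attributes only. This is an illustrative sketch, not the authors' code. The name `induce_msvdis` is hypothetical, and each attribute's values are listed alphabetically rather than in the order used in Table 2.

```python
from collections import Counter
from typing import Dict, List

MISSING = "*"

# Conditional attributes of Table 1, objects o1..o9 in order
TABLE1: Dict[str, List[str]] = {
    "a1": ["Sick", "Sick", "Middle", "No", "*", "Middle", "No", "No", "*"],
    "a2": ["Yes", "Yes", "*", "Yes", "Yes", "No", "No", "*", "Yes"],
    "a3": ["High", "Low", "Normal", "Normal", "Normal", "*", "Low", "*", "Low"],
}


def induce_msvdis(column: List[str]) -> List[Dict[str, int]]:
    """Replace each value of one attribute by a multiset (Definition 3.5)."""
    observed = [v for v in column if v != MISSING]
    values = sorted(set(observed))   # V_a^*, listed alphabetically here
    m_a = Counter(observed)          # M_a: occurrences of each known value
    induced = []
    for v in column:
        if v == MISSING:
            # a missing value becomes the whole multiset M_a
            induced.append({x: m_a[x] for x in values})
        else:
            # a known value x_j becomes {0/x_1, ..., 1/x_j, ..., 0/x_s}
            induced.append({x: int(x == v) for x in values})
    return induced


MSVDIS = {a: induce_msvdis(col) for a, col in TABLE1.items()}
print(MSVDIS["a2"][2])  # o3: {'No': 2, 'Yes': 5}
```

The entries produced this way agree with Table 2, for example {5/Y, 2/N} for o₃ under Muscle pain.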

The following example shows how an IDIS induces an MSVDIS. Thus, an IDIS can be treated as an MSVDIS.

Example 3.6. (Continued from Example 3.2) Table 2 gives the MSVDIS induced by the IDIS (O, C, d) in Table 1.

Table 2

An MSVDIS (O, C, d)

O	Headache (a₁)	Muscle pain (a₂)	Temperature (a₃)	Symptom (d)
o₁	{1/S, 0/M, 0/N}	{1/Y, 0/N}	{1/H, 0/N, 0/L}	Flu
o₂	{1/S, 0/M, 0/N}	{1/Y, 0/N}	{0/H, 0/N, 1/L}	Flu
o₃	{0/S, 1/M, 0/N}	{5/Y, 2/N}	{0/H, 1/N, 0/L}	Flu
o₄	{0/S, 0/M, 1/N}	{1/Y, 0/N}	{0/H, 1/N, 0/L}	Flu
o₅	{2/S, 2/M, 3/N}	{1/Y, 0/N}	{0/H, 1/N, 0/L}	Rhinitis
o₆	{0/S, 1/M, 0/N}	{0/Y, 1/N}	{1/H, 3/N, 3/L}	Rhinitis
o₇	{0/S, 0/M, 1/N}	{0/Y, 1/N}	{0/H, 0/N, 1/L}	Health
o₈	{0/S, 0/M, 1/N}	{5/Y, 2/N}	{1/H, 3/N, 3/L}	Health
o₉	{2/S, 2/M, 3/N}	{1/Y, 0/N}	{0/H, 0/N, 1/L}	Health

4 Fuzzy symmetric relations in an MSVDIS

Definition 4.1. Let (O, C, d) be an MSVDIS. Given P ⊆ C and λ ∈ [0, 1], define the fuzzy relation $R_{P}^{λ}$ on O by $R_{P}^{λ} (o, o^{'}) = \frac{1}{θ} | \{a \in P : HD (P_{f_{a} (o)}, P_{f_{a} (o^{'})}) \leq λ\} |,$ where θ ∈ [|P|, +∞) is a constant, for instance, θ = |C|.

For convenience, denote $P_{{oo}^{'}}^{λ} = \{a \in P : HD (P_{f_{a} (o)}, P_{f_{a} (o^{'})}) \leq λ\}$. Then $R_{P}^{λ} (o, o^{'}) = \frac{1}{θ} | P_{{oo}^{'}}^{λ} |$.

Clearly, $R_{P}^{λ}$ is a fuzzy symmetric relation on O (i.e., $\forall o, o^{'} \in O, R_{P}^{λ} (o, o^{'}) = R_{P}^{λ} (o^{'}, o)$), which is called the fuzzy symmetric relation induced by the subsystem (O, P, d).

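If θ = |P|, then $R_{P}^{λ}$ is a fuzzy tolerance relation on O (i.e., $\forall o \in O, R_{P}^{λ} (o, o) = 1$, and $\forall o, o^{'} \in O, R_{P}^{λ} (o, o^{'}) = R_{P}^{λ} (o^{'}, o)$). Reflexivity holds because $HD (P_{f_{a} (o)}, P_{f_{a} (o)}) = 0 \leq λ$ for every a ∈ P, so that $P_{{oo}}^{λ} = P$.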
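Here is a minimal sketch of Definition 4.1. It assumes that the induced probability distributions are already available as tuples. Only o₁ to o₄ of Table 2 are encoded, with categories ordered as in Table 2 (a₁: S, M, N; a₂: Y, N; a₃: H, N, L). The function `fuzzy_relation` and the choices λ = 0.5 and θ = |C| = 3 are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Dist = Tuple[float, ...]


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s) / math.sqrt(2)


# Induced distributions of o1..o4 from Table 2 (a1: S,M,N; a2: Y,N; a3: H,N,L)
DIST: Dict[str, List[Dist]] = {
    "a1": [(1, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)],
    "a2": [(1, 0), (1, 0), (5 / 7, 2 / 7), (1, 0)],
    "a3": [(1, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 0)],
}


def fuzzy_relation(dist: Dict[str, List[Dist]], P: List[str],
                   lam: float, theta: float) -> List[List[float]]:
    """R_P^lam(o, o') = |{a in P : HD(...) <= lam}| / theta (Definition 4.1)."""
    if theta < len(P):
        raise ValueError("theta must satisfy theta >= |P|")
    n = len(next(iter(dist.values())))
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # symmetric, so fill both halves at once
            agree = sum(1 for a in P if hellinger(dist[a][i], dist[a][j]) <= lam)
            R[i][j] = R[j][i] = agree / theta
    return R


R = fuzzy_relation(DIST, ["a1", "a2", "a3"], lam=0.5, theta=3)
print(R[0][1], R[2][3])  # both equal 2/3 with these settings
```

Lowering λ to 0.3 drops a₂ from $P_{{oo}^{'}}^{λ}$ for the pair (o₃, o₄), because their Hellinger distance on a₂ is about 0.39. That lowers the degree from 2/3 to 1/3.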

Proposition 4.2. Let (O, C, d) be an MSVDIS, and let the constant θ be the same for all relations being compared.

(1) If P₁ ⊆ P₂ ⊆ C and θ ≥ |P₂|, then ∀ λ ∈ [0, 1], $R_{P_{1}}^{λ} \subseteq R_{P_{2}}^{λ}$ .

(2) If 0 ≤ λ₁ < λ₂ ≤ 1, then ∀ P ⊆ C, $R_{P}^{λ_{1}} \subseteq R_{P}^{λ_{2}}$ .

Proof. (1) By Definition 4.1, $R_{P_{1}}^{λ} (o, o^{'}) = \frac{1}{θ} | (P_{1})_{{oo}^{'}}^{λ} |, R_{P_{2}}^{λ} (o, o^{'}) = \frac{1}{θ} | (P_{2})_{{oo}^{'}}^{λ} | .$

Since P₁ ⊆ P₂, we have ∀ o, o′ ∈ O, $(P_{1})_{{oo}^{'}}^{λ} \subseteq (P_{2})_{{oo}^{'}}^{λ} .$

So ∀ o, o′ ∈ O, $R_{P_{1}}^{λ} (o, o^{'}) \leq R_{P_{2}}^{λ} (o, o^{'})$ .

Thus, $R_{P_{1}}^{λ} \subseteq R_{P_{2}}^{λ}$ .

(2) By Definition 4.1, $R_{P}^{λ_{1}} (o, o^{'}) = \frac{1}{θ} | P_{{oo}^{'}}^{λ_{1}} |, R_{P}^{λ_{2}} (o, o^{'}) = \frac{1}{θ} | P_{{oo}^{'}}^{λ_{2}} | .$

Since 0 ≤ λ₁ < λ₂ ≤ 1, HD(P_{f_a(o)}, P_{f_a(o′)}) ≤ λ₁ implies HD(P_{f_a(o)}, P_{f_a(o′)}) ≤ λ₂. Hence ∀ o, o′ ∈ O, $P_{{oo}^{'}}^{λ_{1}} \subseteq P_{{oo}^{'}}^{λ_{2}} .$

So ∀ o, o′ ∈ O, $R_{P}^{λ_{1}} (o, o^{'}) \leq R_{P}^{λ_{2}} (o, o^{'})$ .

Thus, $R_{P}^{λ_{1}} \subseteq R_{P}^{λ_{2}}$ . □

Proposition 4.2 shows that $R_{P}^{λ}$ is monotonically increasing with respect to both P and λ. This monotonicity makes $R_{P}^{λ}$ easy to deal with.

5 Some evaluation functions in an MSVDIS

This section presents some evaluation functions in an MSVDIS.

Definition 5.1. Let (O, C, d) be an MSVDIS. Given P ⊆ C and λ ∈ [0, 1], define $\underline{R_{P}^{λ}} (X) (o) = ⋀_{o^{'} \notin X} [1 - R_{P}^{λ} (o, o^{'})], \forall o \in O;$ $\bar{R_{P}^{λ}} (X) (o) = ⋁_{o^{'} \in X} R_{P}^{λ} (o, o^{'}), \forall o \in O .$ Then $\underline{R_{P}^{λ}} (X)$ and $\bar{R_{P}^{λ}} (X)$ are called the lower and upper fuzzy approximations of X ∈ 2^O, respectively.

Proposition 5.2. Let (O, C, d) be an MSVDIS. Then the following properties hold.

(1) ∀ P ⊆ C, ∀ λ ∈ [0, 1], ∀ X ∈ 2^O, if θ = |P| (so that $R_{P}^{λ}$ is reflexive), then $\underline{R_{P}^{λ}} (X) \subseteq X \subseteq \bar{R_{P}^{λ}} (X)$, where $X (o) = \begin{cases} 1, & o \in X; \\ 0, & o \notin X . \end{cases}$

(2) If P₁ ⊆ P₂ ⊆ C, then ∀ λ ∈ [0, 1], ∀ X ∈ 2^O, $\underline{R_{P_{2}}^{λ}} (X) \subseteq \underline{R_{P_{1}}^{λ}} (X), \bar{R_{P_{1}}^{λ}} (X) \subseteq \bar{R_{P_{2}}^{λ}} (X) .$

(3) If 0 ≤ λ₁ < λ₂ ≤ 1, then ∀ P ⊆ C, ∀ X ∈ 2^O, $\underline{R_{P}^{λ_{2}}} (X) \subseteq \underline{R_{P}^{λ_{1}}} (X), \bar{R_{P}^{λ_{1}}} (X) \subseteq \bar{R_{P}^{λ_{2}}} (X) .$

Proof. (1) (i) Given P ⊆ C, λ ∈ [0, 1], X ∈ 2^O. Then by Definition 4.1, we can obtain that $\forall o, o^{'} \in O, 0 \leq R_{P}^{λ} (o, o^{'}) \leq 1 .$

Then $0 \leq ⋀_{o^{'} \notin X} [1 - R_{P}^{λ} (o, o^{'})] \leq 1 .$

So ∀ o ∈ O, $0 \leq \underline{R_{P}^{λ}} (X) (o) \leq 1 .$

Thus ∀ o ∈ X, $\underline{R_{P}^{λ}} (X) (o) \leq 1 = X (o) .$

Note that for o ∉ X, the infimum includes o′ = o, and $R_{P}^{λ} (o, o) = 1$ since θ = |P|. Hence ∀ o ∉ X, $\underline{R_{P}^{λ}} (X) (o) \leq 1 - R_{P}^{λ} (o, o) = 0$, i.e., $\underline{R_{P}^{λ}} (X) (o) = 0 = X (o) .$

This implies that ∀ o ∈ O, $\underline{R_{P}^{λ}} (X) (o) \leq X (o) .$

Thus $\underline{R_{P}^{λ}} (X) \subseteq X .$

(ii) Obviously, $\forall o \in O, 0 \leq \bar{R_{P}^{λ}} (X) (o) \leq 1 .$

Then ∀ o ∉ X, $X (o) = 0 \leq \bar{R_{P}^{λ}} (X) (o) .$

Note that for o ∈ X, the supremum includes o′ = o, so $\bar{R_{P}^{λ}} (X) (o) \geq R_{P}^{λ} (o, o) = 1$. Then ∀ o ∈ X, $X (o) = 1 = \bar{R_{P}^{λ}} (X) (o) .$

This implies that ∀ o ∈ O, $X (o) \leq \bar{R_{P}^{λ}} (X) (o) .$

Thus $X \subseteq \bar{R_{P}^{λ}} (X) .$

From the above, $\underline{R_{P}^{λ}} (X) \subseteq X \subseteq \bar{R_{P}^{λ}} (X) .$

(2) (i) Since P₁ ⊆ P₂ ⊆ C, by Proposition 4.2, we have $R_{P_{1}}^{λ} \subseteq R_{P_{2}}^{λ}$ .

Then ∀ o ∈ O, o′ ∉ X, $R_{P_{1}}^{λ} (o, o^{'}) \leq R_{P_{2}}^{λ} (o, o^{'}) .$

This implies that ∀ o ∈ O, $⋀_{o^{'} \notin X} [1 - R_{P_{2}}^{λ} (o, o^{'})] \leq ⋀_{o^{'} \notin X} [1 - R_{P_{1}}^{λ} (o, o^{'})] .$

Thus ∀ o ∈ O, $\underline{R_{P_{2}}^{λ}} (X) (o) \leq \underline{R_{P_{1}}^{λ}} (X) (o) .$

Therefore, $\underline{R_{P_{2}}^{λ}} (X) \subseteq \underline{R_{P_{1}}^{λ}} (X)$ .

(ii) Since P₁ ⊆ P₂ ⊆ C, by Proposition 4.2, we have $R_{P_{1}}^{λ} \subseteq R_{P_{2}}^{λ}$ .

Then ∀ o ∈ O, o′ ∈ X, $R_{P_{1}}^{λ} (o, o^{'}) \leq R_{P_{2}}^{λ} (o, o^{'}) .$

So $⋁_{o^{'} \in X} R_{P_{1}}^{λ} (o, o^{'}) \leq ⋁_{o^{'} \in X} R_{P_{2}}^{λ} (o, o^{'}) .$

Thus $\bar{R_{P_{1}}^{λ}} (X) (o) \leq \bar{R_{P_{2}}^{λ}} (X) (o) .$

Therefore, $\bar{R_{P_{1}}^{λ}} (X) \subseteq \bar{R_{P_{2}}^{λ}} (X)$ .

(3) (i) Since 0 ≤ λ₁ < λ₂ ≤ 1, by Proposition 4.2, we have $R_{P}^{λ_{1}} \subseteq R_{P}^{λ_{2}}$ .

Then ∀ o ∈ O, o′ ∉ X, $R_{P}^{λ_{1}} (o, o^{'}) \leq R_{P}^{λ_{2}} (o, o^{'}) .$

This implies that ∀ o ∈ O, $⋀_{o^{'} \notin X} [1 - R_{P}^{λ_{2}} (o, o^{'})] \leq ⋀_{o^{'} \notin X} [1 - R_{P}^{λ_{1}} (o, o^{'})] .$

Thus ∀ o ∈ O, $\underline{R_{P}^{λ_{2}}} (X) (o) \leq \underline{R_{P}^{λ_{1}}} (X) (o) .$

Therefore, $\underline{R_{P}^{λ_{2}}} (X) \subseteq \underline{R_{P}^{λ_{1}}} (X)$ .

(ii) Since 0 ≤ λ₁ < λ₂ ≤ 1, by Proposition 4.2, we have $R_{P}^{λ_{1}} \subseteq R_{P}^{λ_{2}}$ .

Then ∀ o ∈ O, o′ ∈ X, $R_{P}^{λ_{1}} (o, o^{'}) \leq R_{P}^{λ_{2}} (o, o^{'}) .$

So $⋁_{o^{'} \in X} R_{P}^{λ_{1}} (o, o^{'}) \leq ⋁_{o^{'} \in X} R_{P}^{λ_{2}} (o, o^{'}) .$

Thus $\bar{R_{P}^{λ_{1}}} (X) (o) \leq \bar{R_{P}^{λ_{2}}} (X) (o) .$

Therefore, $\bar{R_{P}^{λ_{1}}} (X) \subseteq \bar{R_{P}^{λ_{2}}} (X)$ . □

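Definition 5.3. Let (O, C, d) be an MSVDIS, and let $O/d = \{D_{1}, D_{2}, \dots, D_{r}\}$ be the partition of O induced by d. Suppose P ⊆ C and λ ∈ [0, 1]. The fuzzy positive region of decision d relative to P is defined as ${POS}_{P}^{λ} (d) = ⋃_{D \in O / d} \underline{R_{P}^{λ}} (D) = ⋃_{i = 1}^{r} \underline{R_{P}^{λ}} (D_{i})$, where the union of fuzzy sets is taken pointwise as the maximum.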

Proposition 5.4. Let (O, C, d) be an MSVDIS.

(1) If P₁ ⊆ P₂ ⊆ C, then ∀ λ ∈ [0, 1], ${POS}_{P_{2}}^{λ} (d) \subseteq {POS}_{P_{1}}^{λ} (d) .$

(2) If 0 ≤ λ₁ < λ₂ ≤ 1, then ∀ P ⊆ C, ${POS}_{P}^{λ_{2}} (d) \subseteq {POS}_{P}^{λ_{1}} (d) .$

Proof. It follows from Proposition 5.2. □

Definition 5.5. Let (O, C, d) be an MSVDIS. Suppose P ⊆ C and λ ∈ [0, 1]. Then the λ-degree of dependence of d relative to P is defined as $Γ_{P}^{λ} (d) = \frac{| {POS}_{P}^{λ} (d) |}{n}$, where the cardinality of a fuzzy set F on O is $|F| = \sum_{o \in O} F(o)$.

Proposition 5.6. Let (O, C, d) be an MSVDIS.

(1) If P₁ ⊆ P₂ ⊆ C, then ∀ λ ∈ [0, 1], $Γ_{P_{2}}^{λ} (d) \leq Γ_{P_{1}}^{λ} (d) .$

(2) If 0 ≤ λ₁ < λ₂ ≤ 1, then ∀ P ⊆ C, $Γ_{P}^{λ_{2}} (d) \leq Γ_{P}^{λ_{1}} (d) .$

Proof. It follows from Proposition 5.4. □

Definition 5.7. Let (O, C, d) be an MSVDIS. Suppose P ⊆ C, λ ∈ [0, 1] and a ∈ C - P. Then the λ-significance of a relative to P with respect to d is defined as ${sig}^{λ} (a, P, d) = Γ_{P}^{λ} (d) - Γ_{P \cup \{a\}}^{λ} (d) .$
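The following sketch evaluates Definitions 5.3, 5.5 and 5.7 on the MSVDIS of Table 2. It is a hypothetical illustration with λ = 0.5 and a fixed θ = |C| = 3. The names `gamma` and `significance` are ours, and ⋀ over an empty set is taken to be 1.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Induced distributions of Table 2 (a1: S,M,N; a2: Y,N; a3: H,N,L), objects o1..o9
E1, E2, E3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
DIST: Dict[str, List[Tuple[float, ...]]] = {
    "a1": [E1, E1, E2, E3, (2/7, 2/7, 3/7), E2, E3, E3, (2/7, 2/7, 3/7)],
    "a2": [(1, 0), (1, 0), (5/7, 2/7), (1, 0), (1, 0), (0, 1), (0, 1), (5/7, 2/7), (1, 0)],
    "a3": [E1, E3, E2, E2, E2, (1/7, 3/7, 3/7), E3, (1/7, 3/7, 3/7), E3],
}
LABELS = ["Flu"] * 4 + ["Rhinitis"] * 2 + ["Health"] * 3


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s) / math.sqrt(2)


def gamma(P: List[str], lam: float, theta: float) -> float:
    """Gamma_P^lam(d) = |POS_P^lam(d)| / n, where |F| sums membership degrees."""
    n = len(LABELS)

    def R(o: int, q: int) -> float:  # fuzzy relation of Definition 4.1
        return sum(hellinger(DIST[a][o], DIST[a][q]) <= lam for a in P) / theta

    pos = 0.0
    for o in range(n):
        # POS(o) is the max over classes D of the lower approximation of D at o;
        # the lower approximation is the min over o' outside D of 1 - R(o, o')
        pos += max(min((1.0 - R(o, q) for q in range(n) if LABELS[q] != D), default=1.0)
                   for D in set(LABELS))
    return pos / n


def significance(a: str, P: List[str], lam: float, theta: float) -> float:
    """sig^lam(a, P, d) = Gamma_P(d) - Gamma_{P ∪ {a}}(d) (Definition 5.7)."""
    return gamma(P, lam, theta) - gamma(P + [a], lam, theta)


print(round(gamma(["a1"], 0.5, 3), 4), round(gamma(["a1", "a3"], 0.5, 3), 4))
print(round(significance("a3", ["a1"], 0.5, 3), 4))
```

By our hand calculation, these settings give $Γ_{\{a_{1}\}}^{0.5}(d) = 20/27$ and $Γ_{\{a_{1}, a_{3}\}}^{0.5}(d) = 19/27$, so ${sig}^{0.5}(a_{3}, \{a_{1}\}, d) = 1/27$.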

6 FRIC-model in an MSVDIS

This section presents the FRIC-model in an MSVDIS.

Let (O, C, d) be an MSVDIS. Given P ⊆ C and λ ∈ [0, 1], suppose |P| ≤ θ₁ < θ₂ < ⋯ ≤ |C|. Then for each i, denote
$(R_{P}^{λ})_{i} (o, o^{'}) = \frac{1}{θ_{i}} | P_{{oo}^{'}}^{λ} | \ (o, o^{'} \in O);$
$\underline{R_{P}^{λ}} (D)_{i} (o) = ⋀_{o^{'} \notin D} [1 - (R_{P}^{λ})_{i} (o, o^{'})] \ (D \in O / d, o \in O);$
$\bar{R_{P}^{λ}} (D)_{i} (o) = ⋁_{o^{'} \in D} (R_{P}^{λ})_{i} (o, o^{'}) \ (D \in O / d, o \in O);$
${POS}_{P}^{λ} (d)_{i} = ⋃_{D \in O / d} \underline{R_{P}^{λ}} (D)_{i};$
$Γ_{P}^{λ} (d)_{i} = \frac{| {POS}_{P}^{λ} (d)_{i} |}{n};$
${sig}^{λ} (a, P, d)_{i} = Γ_{P}^{λ} (d)_{i} - Γ_{P \cup \{a\}}^{λ} (d)_{i} \ (P \subseteq C, a \in C - P) .$

Theorem 6.1. Let (O, C, d) be an MSVDIS. Given P ⊆ C and λ ∈ [0, 1]. Then ∀ D ∈ O/d, ∀ o ∈ O, ∀ i, $\underline{R_{P}^{λ}} (D)_{i + 1} (o) = \frac{θ_{i + 1} - θ_{i}}{θ_{i + 1}} + \frac{θ_{i}}{θ_{i + 1}} \underline{R_{P}^{λ}} (D)_{i} (o) .$

Proof. $\begin{matrix} \underline{R_{P}^{λ}} (D)_{i} (o) & = ⋀_{o^{'} \notin D} (1 - \frac{1}{θ_{i}} | P_{{oo}^{'}}^{λ} |) \\ = 1 - ⋁_{o^{'} \notin D} \frac{1}{θ_{i}} | P_{{oo}^{'}}^{λ} | \\ = 1 - \frac{1}{θ_{i}} ⋁_{o^{'} \notin D} | P_{{oo}^{'}}^{λ} | . \end{matrix}$

Similarly, $\underline{R_{P}^{λ}} (D)_{i + 1} (o) = 1 - \frac{1}{θ_{i + 1}} ⋁_{o^{'} \notin D} | P_{{oo}^{'}}^{λ} | .$

Then, $\frac{θ_{i + 1} - θ_{i}}{θ_{i + 1}} + \frac{θ_{i}}{θ_{i + 1}} \underline{R_{P}^{λ}} (D)_{i} (o)$ $\begin{matrix} = \frac{θ_{i + 1} - θ_{i}}{θ_{i + 1}} + \frac{θ_{i}}{θ_{i + 1}} (1 - \frac{1}{θ_{i}} ⋁_{o^{'} \notin D} | P_{{oo}^{'}}^{λ} |) \\ = 1 - \frac{1}{θ_{i + 1}} ⋁_{o^{'} \notin D} | P_{{oo}^{'}}^{λ} | . \end{matrix}$

Thus $\underline{R_{P}^{λ}} (D)_{i + 1} (o) = \frac{θ_{i + 1} - θ_{i}}{θ_{i + 1}} + \frac{θ_{i}}{θ_{i + 1}} \underline{R_{P}^{λ}} (D)_{i} (o) .$ □
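Here is a quick numerical instance of Theorem 6.1 with hypothetical values that are not taken from the paper's data. Take θ_i = 2 and θ_{i+1} = 3, and suppose $⋁_{o^{'} \notin D} | P_{{oo}^{'}}^{λ} | = 1$ for some object o. Then

$$\underline{R_{P}^{λ}} (D)_{i} (o) = 1 - \frac{1}{2} = \frac{1}{2}, \qquad \underline{R_{P}^{λ}} (D)_{i + 1} (o) = 1 - \frac{1}{3} = \frac{2}{3},$$

and the right-hand side of Theorem 6.1 gives

$$\frac{3 - 2}{3} + \frac{2}{3} \cdot \frac{1}{2} = \frac{1}{3} + \frac{1}{3} = \frac{2}{3},$$

so the two sides agree.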

Corollary 6.2. Let (O, C, d) be an MSVDIS. Given P ⊆ C and λ ∈ [0, 1], take θ_i = i. Then ∀ D ∈ O/d, ∀ o ∈ O, ∀ i ∈ [|P|, |C|-1], $\underline{R_{P}^{λ}} (D)_{i + 1} (o) = \frac{1}{i + 1} + \frac{i}{i + 1} \underline{R_{P}^{λ}} (D)_{i} (o) .$

Theorem 6.3. Let (O, C, d) be an MSVDIS. Given P ⊆ C, λ ∈ [0, 1] and D ∈ O/d. Then ∀ i, $\bar{R_{P}^{λ}} (D)_{i + 1} = \frac{θ_{i}}{θ_{i + 1}} \bar{R_{P}^{λ}} (D)_{i} .$

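Proof. For every i and every o ∈ O, $\bar{R_{P}^{λ}} (D)_{i} (o) = ⋁_{o^{'} \in D} \frac{1}{θ_{i}} | P_{{oo}^{'}}^{λ} | = \frac{1}{θ_{i}} ⋁_{o^{'} \in D} | P_{{oo}^{'}}^{λ} |$, so $\bar{R_{P}^{λ}} (D)_{i + 1} (o) = \frac{θ_{i}}{θ_{i + 1}} \bar{R_{P}^{λ}} (D)_{i} (o)$. □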

Corollary 6.4. Let (O, C, d) be an MSVDIS. Given P ⊆ C, λ ∈ [0, 1] and D ∈ O/d, take θ_i = i. Then ∀ i ∈ [|P|, |C|-1], $\bar{R_{P}^{λ}} (D)_{i + 1} = \frac{i}{i + 1} \bar{R_{P}^{λ}} (D)_{i} .$

Theorem 6.5. Let (O, C, d) be an MSVDIS. Given P ⊆ C and λ ∈ [0, 1]. Then ∀ o ∈ O, ∀ i, ${POS}_{P}^{λ} (d)_{i + 1} (o) = \frac{θ_{i + 1} - θ_{i}}{θ_{i + 1}} + \frac{θ_{i}}{θ_{i + 1}} {POS}_{P}^{λ} (d)_{i} (o) .$

Proof. By Theorem 6.1, we can obtain that ∀ o ∈ O, ∀ i, ${POS}_{P}^{λ} (d)_{i + 1} (o)$ $\begin{matrix} = ⋁_{D \in O / d} \underline{R_{P}^{λ}} (D)_{i + 1} (o) \\ = ⋁_{D \in O / d} (\frac{θ_{i + 1} - θ_{i}}{θ_{i + 1}} + \frac{θ_{i}}{θ_{i + 1}} \underline{R_{P}^{λ}} (D)_{i} (o)) \\ = \frac{θ_{i + 1} - θ_{i}}{θ_{i + 1}} + ⋁_{D \in O / d} \frac{θ_{i}}{θ_{i + 1}} \underline{R_{P}^{λ}} (D)_{i} (o) \\ = \frac{θ_{i + 1} - θ_{i}}{θ_{i + 1}} + \frac{θ_{i}}{θ_{i + 1}} ⋁_{D \in O / d} \underline{R_{P}^{λ}} (D)_{i} (o) . \end{matrix}$

Thus ${POS}_{P}^{λ} (d)_{i + 1} (o) = \frac{θ_{i + 1} - θ_{i}}{θ_{i + 1}} + \frac{θ_{i}}{θ_{i + 1}} {POS}_{P}^{λ} (d)_{i} (o) .$ □

Corollary 6.6. Let (O, C, d) be an MSVDIS. Given P ⊆ C and λ ∈ [0, 1], take θ_i = i. Then ∀ o ∈ O, ∀ i ∈ [|P|, |C|-1], ${POS}_{P}^{λ} (d)_{i + 1} (o) = \frac{1}{i + 1} + \frac{i}{i + 1} {POS}_{P}^{λ} (d)_{i} (o) .$

Theorem 6.7. Let (O, C, d) be an MSVDIS. Given P ⊆ C and λ ∈ [0, 1]. Then ∀ i, $Γ_{P}^{λ} (d)_{i + 1} = \frac{θ_{i + 1} - θ_{i}}{θ_{i + 1}} + \frac{θ_{i}}{θ_{i + 1}} Γ_{P}^{λ} (d)_{i} .$

Proof. By Theorem 6.5, $\begin{matrix} | {POS}_{P}^{λ} (d)_{i + 1} | & = \sum_{j = 1}^{n} {POS}_{P}^{λ} (d)_{i + 1} (o_{j}) \\ = \sum_{j = 1}^{n} (\frac{θ_{i + 1} - θ_{i}}{θ_{i + 1}} + \frac{θ_{i}}{θ_{i + 1}} {POS}_{P}^{λ} (d)_{i} (o_{j})) \\ = \frac{θ_{i + 1} - θ_{i}}{θ_{i + 1}} n + \frac{θ_{i}}{θ_{i + 1}} \sum_{j = 1}^{n} {POS}_{P}^{λ} (d)_{i} (o_{j}) \\ = \frac{θ_{i + 1} - θ_{i}}{θ_{i + 1}} n + \frac{θ_{i}}{θ_{i + 1}} | {POS}_{P}^{λ} (d)_{i} | . \end{matrix}$

Then $\frac{| {POS}_{P}^{λ} (d)_{i + 1} |}{n} = \frac{θ_{i + 1} - θ_{i}}{θ_{i + 1}} + \frac{θ_{i}}{θ_{i + 1}} \frac{| {POS}_{P}^{λ} (d)_{i} |}{n} .$

Thus $Γ_{P}^{λ} (d)_{i + 1} = \frac{θ_{i + 1} - θ_{i}}{θ_{i + 1}} + \frac{θ_{i}}{θ_{i + 1}} Γ_{P}^{λ} (d)_{i} .$ □

Corollary 6.8. Let (O, C, d) be an MSVDIS. Given P ⊆ C and λ ∈ [0, 1], take θ_i = i. Then ∀ i ∈ [|P|, |C|-1], $Γ_{P}^{λ} (d)_{i + 1} = \frac{1}{i + 1} + \frac{i}{i + 1} Γ_{P}^{λ} (d)_{i} .$

Theorem 6.9. Let (O, C, d) be an MSVDIS. Suppose P ⊆ C, λ ∈ [0, 1] and a ∈ C - P. Then ∀ i, ${sig}^{λ} (a, P, d)_{i + 1} = \frac{θ_{i}}{θ_{i + 1}} {sig}^{λ} (a, P, d)_{i} .$

Proof. By Theorem 6.7, ${sig}^{λ} (a, P, d)_{i + 1}$ $\begin{matrix} = Γ_{P}^{λ} (d)_{i + 1} - Γ_{P \cup \{a\}}^{λ} (d)_{i + 1} \\ = (\frac{θ_{i + 1} - θ_{i}}{θ_{i + 1}} + \frac{θ_{i}}{θ_{i + 1}} Γ_{P}^{λ} (d)_{i}) - (\frac{θ_{i + 1} - θ_{i}}{θ_{i + 1}} + \frac{θ_{i}}{θ_{i + 1}} Γ_{P \cup \{a\}}^{λ} (d)_{i}) \\ = \frac{θ_{i}}{θ_{i + 1}} Γ_{P}^{λ} (d)_{i} - \frac{θ_{i}}{θ_{i + 1}} Γ_{P \cup \{a\}}^{λ} (d)_{i} \\ = \frac{θ_{i}}{θ_{i + 1}} (Γ_{P}^{λ} (d)_{i} - Γ_{P \cup \{a\}}^{λ} (d)_{i}) . \end{matrix}$

Thus ${sig}^{λ} (a, P, d)_{i + 1} = \frac{θ_{i}}{θ_{i + 1}} {sig}^{λ} (a, P, d)_{i} .$ □

Corollary 6.10. Let (O, C, d) be an MSVDIS. Suppose P ⊆ C, λ ∈ [0, 1] and a ∈ C - P, and take θ_i = i. Then ∀ i ∈ [|P|, |C|-1], ${sig}^{λ} (a, P, d)_{i + 1} = \frac{i}{i + 1} {sig}^{λ} (a, P, d)_{i} .$
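The recurrences in Corollaries 6.8 and 6.10 depend only on the counts $|P_{{oo}^{'}}^{λ}|$ and on the decision classes. They can therefore be checked with exact rational arithmetic on randomly generated counts. This is a sketch under stated assumptions: the counts are hypothetical random data rather than values derived from an MSVDIS, θ_i = i, and ⋀ over an empty set is taken to be 1.

```python
import random
from fractions import Fraction
from typing import List

Counts = List[List[int]]


def gamma(counts: Counts, labels: List[int], theta: int) -> Fraction:
    """Gamma(d) for R(o, o') = counts[o][o'] / theta, where counts[o][o'] plays
    the role of |P_{oo'}^lam|; exact arithmetic, min over an empty set is 1."""
    n = len(labels)
    total = Fraction(0)
    for o in range(n):
        total += max(
            min((1 - Fraction(counts[o][q], theta)
                 for q in range(n) if labels[q] != D), default=Fraction(1))
            for D in set(labels))
    return total / n


def random_counts(n: int, k: int, rng: random.Random) -> Counts:
    """Hypothetical symmetric agreement counts in [0, k] with counts[o][o] = k."""
    c = [[k] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            c[i][j] = c[j][i] = rng.randint(0, k)
    return c


rng = random.Random(42)
n, k, size_C = 8, 2, 6                      # hypothetical |O|, |P| and |C|
labels = [rng.randint(0, 2) for _ in range(n)]
c_P = random_counts(n, k, rng)              # counts for P
extra = random_counts(n, 1, rng)            # agreements contributed by a new attribute a
c_Pa = [[c_P[i][j] + extra[i][j] for j in range(n)] for i in range(n)]  # counts for P ∪ {a}

thetas = list(range(k + 1, size_C + 1))     # theta_i = i, with i >= |P ∪ {a}|
gam = {t: gamma(c_P, labels, t) for t in thetas}
sig = {t: gam[t] - gamma(c_Pa, labels, t) for t in thetas}

for t in thetas[:-1]:
    # Corollary 6.8 and Corollary 6.10
    assert gam[t + 1] == Fraction(1, t + 1) + Fraction(t, t + 1) * gam[t]
    assert sig[t + 1] == Fraction(t, t + 1) * sig[t]
print("recurrences verified for theta in", thetas)
```

In practice, this is what makes the model iterative: once $Γ_{P}^{λ}(d)_{i}$ is known, the value for the next θ follows from one multiplication and one addition, with no need to recompute the lower approximations.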

7 A-reduction in an MSVDIS

This section studies A-reduction in an MSVDIS and proposes the corresponding algorithm.

Algorithm 1 Computing $R_{P}^{λ}$

REQUIRE (O, C, d), P ⊆ C and λ ∈ [0, 1].

ENSURE $R_{P}^{λ}$ .

1: for o, o′ ∈ O do

2: for a ∈ P do

3: Calculate HD(P_{f_a(o)}, P_{f_a(o′)});

4: if HD(P_{f_a(o)}, P_{f_a(o′)}) ≤ λ then

5: add a to $P_{{oo}^{'}}^{λ}$ (initialized as ∅ for each pair o, o′);

6: end if

7: end for

8: Obtain $P_{{oo}^{'}}^{λ}$ ;

9: Calculate $R_{P}^{λ} (o, o^{'}) = | P_{{oo}^{'}}^{λ} |$ ;

10: end for

11: return $R_{P}^{λ} = (R_{P}^{λ} (o, o^{'}))_{n \times n}$ ;

Algorithm 1 consists of two nested loops, so its time complexity is mainly determined by the sizes of O and C: it is O(|C| * |O|²). Based on the above FRIC-model, we propose Algorithm 2 to compute a reduct. Because this algorithm uses the FRIC-model to handle MV-data, we call it MFRIC.

Algorithm 2 Obtain a reduct in an MSVDIS (MFRIC)

REQUIRE (O, C, d), O/d, $R_{P}^{λ}$ , λ ∈ [0, 1], δ ∈ [0, 1].

ENSURE A reduct P.

1: let P = ∅;

2: while true do

3: θ = |P|;

4: for a ∈ C - P do

5: Compute $\underline{R_{P \cup \{a\}}^{λ}} (D) (o) = \min_{o^{'} \notin D} [1 - \frac{1}{θ + 1} | (P \cup \{a\})_{{oo}^{'}}^{λ} |]$ for all D ∈ O/d and o ∈ O;

$Γ_{P \cup \{a\}}^{λ} (d) = \frac{1}{| O |} \sum_{D \in O / d} | \underline{R_{P \cup \{a\}}^{λ}} (D) |$ ;

6: end for

7: Find a^* ∈ C - P such that $Γ_{P \cup \{a^{*}\}}^{λ} (d) = \max \{Γ_{P \cup \{a\}}^{λ} (d) : a \in C - P\}$ ;

Then compute $sig (a^{*}, P, d) = \frac{θ}{θ + 1} (Γ_{P}^{λ} (d) - Γ_{P \cup \{a^{*}\}}^{λ} (d))$ ;

8: if sig(a^*, P, d) > δ then

9: P = P ∪ {a^*};

10: else

11: break;

12: end if

13: end while

14: return P.

Algorithm 2 (MFRIC) returns a reduct of the data. It uses a threshold parameter δ in step 8. This parameter is usually taken in (0.05, 0.3) and controls the significance an attribute must have to be added, so different values of δ can yield different reducts. The time complexity of Algorithm 2 is polynomial. The time complexity of steps 4 to 7 is O(|C - P| * |O|²). The time complexity of step 8 is O(|C - P|). The remaining steps contain no loops, and their time complexity is O(1). Steps 2 to 13 carry out the attribute reduction, reducing the number of attributes from |C| to |P|. The overall time complexity is $O (\frac{(| C | + | P |)}{2} * (| C | - | P | + 1) * | O |^{2}) = O (\frac{| C |^{2} + | C | - | P |^{2} + | P |}{2} * | O |^{2})$ . Its space complexity is O(|C| * |O|²).
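For illustration, here is a greedy forward-selection sketch in the spirit of Algorithm 2, run on Table 2. It is not a literal transcription, and it rests on three assumptions. First, Γ of the empty set is taken to be 0. Second, each candidate is evaluated with θ = |P ∪ {a}| as in step 5. Third, the loop stops when the gain $Γ_{P \cup \{a^{*}\}}^{λ}(d) - Γ_{P}^{λ}(d)$ does not exceed δ; the sig expression of step 7 is not reproduced. All names and the values λ = 0.5 and δ = 0.05 are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Induced distributions of Table 2 (a1: S,M,N; a2: Y,N; a3: H,N,L), objects o1..o9
E1, E2, E3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
DIST: Dict[str, List[Tuple[float, ...]]] = {
    "a1": [E1, E1, E2, E3, (2/7, 2/7, 3/7), E2, E3, E3, (2/7, 2/7, 3/7)],
    "a2": [(1, 0), (1, 0), (5/7, 2/7), (1, 0), (1, 0), (0, 1), (0, 1), (5/7, 2/7), (1, 0)],
    "a3": [E1, E3, E2, E2, E2, (1/7, 3/7, 3/7), E3, (1/7, 3/7, 3/7), E3],
}
LABELS = ["Flu"] * 4 + ["Rhinitis"] * 2 + ["Health"] * 3


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s) / math.sqrt(2)


def gamma(P: List[str], lam: float, theta: float) -> float:
    """Dependency degree with R(o, o') = |P_{oo'}^lam| / theta."""
    n = len(LABELS)
    cnt = [[sum(hellinger(DIST[a][i], DIST[a][j]) <= lam for a in P)
            for j in range(n)] for i in range(n)]
    pos = 0.0
    for o in range(n):
        pos += max(min((1.0 - cnt[o][q] / theta for q in range(n) if LABELS[q] != D),
                       default=1.0) for D in set(LABELS))
    return pos / n


def greedy_reduct(C: List[str], lam: float, delta: float) -> List[str]:
    """Greedy forward selection (a sketch in the spirit of Algorithm 2)."""
    P: List[str] = []
    gamma_P = 0.0  # assumption: Gamma of the empty attribute set is 0
    while len(P) < len(C):
        # step 5: evaluate each candidate with theta = |P ∪ {a}|
        scores = {a: gamma(P + [a], lam, len(P) + 1) for a in C if a not in P}
        a_star = max(scores, key=scores.get)       # step 7: best candidate
        if scores[a_star] - gamma_P <= delta:      # assumed stopping rule
            break
        P.append(a_star)
        gamma_P = scores[a_star]
    return P


print(greedy_reduct(["a1", "a2", "a3"], lam=0.5, delta=0.05))
```

On this toy table the sketch first selects a₁ and then a₃. It stops there because adding a₂ would lower the dependency degree. This only illustrates the control flow and is not a result reported in the paper.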

8 Experiments

To verify the effectiveness of the proposed algorithm, this section reports the results of numerical experiments. All experiments were run on a Lenovo computer with an Intel Core i7-4790 CPU @ 3.60 GHz and 16 GB of memory. Eight classification data sets were selected from the UCI machine learning repository [27]. Since UCI contains no MV-data, incomplete data are used in place of MV-data. These eight data sets are described in Table 3.

Table 3
A description of data sets

Data set	Abbreviation	Instance	Attribute	Class
Annealing	Ann	798	38	6
Contraceptive Method Choice	CMC	1473	9	3
German	Ger	1000	20	2
Chess	Che	3196	36	2
Primary Tumor	PT	339	17	22
Tic-Tac-Toe	TTT	958	9	2
Molecular Biology	MB	3190	61	3
Mushroom	Mus	8124	22	2

To observe the impact of the deletion rate on the algorithm, 5%, 10% and 20% of the values in each data set are deleted at random. Our goal is to reduce the number of attributes while keeping the classification accuracy unchanged. During reduction, the subset with the highest classification accuracy is recorded and taken as the optimal subset. The MFRIC algorithm is implemented in MATLAB to find the best reduct. The proposed algorithm has a key parameter λ, whose value directly affects the reduced subset, so λ is varied from 0.5 to 0.9 in steps of 0.05.
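The random deletion step can be sketched as follows. The paper does not state how cells are sampled. This sketch therefore assumes that the deletion rate applies uniformly to the cells of the conditional attributes, that the decision column is left intact, and that `*` marks a missing value. The function name `delete_values` is hypothetical.

```python
import random
from typing import List


def delete_values(table: List[List[str]], rate: float, seed: int = 0,
                  missing: str = "*") -> List[List[str]]:
    """Return a copy of `table` in which round(rate * #cells) randomly chosen
    cells are replaced by `missing`; the input table is not modified."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(table) for j in range(len(row))]
    k = round(rate * len(cells))
    out = [list(row) for row in table]
    for i, j in rng.sample(cells, k):  # sampling without replacement
        out[i][j] = missing
    return out


table = [["v"] * 10 for _ in range(20)]   # hypothetical 20 x 10 block of conditional values
for rate in (0.05, 0.10, 0.20):
    damaged = delete_values(table, rate)
    print(rate, sum(row.count("*") for row in damaged))
```

The resulting incomplete table can then be converted into an MSVDIS as in Definition 3.5.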

The attribute subset with the highest classification accuracy is selected as the final result. Table 4 shows the number of attributes in the reduced subsets and the corresponding values of λ. To assess the strengths and weaknesses of the proposed algorithm, we compare it with three other algorithms. Because there is little literature on multiset-valued information systems, the algorithms KGIRA-M [35], FSRS [24] and FSNTDJE [25] were selected for comparison. KGIRA-M and FSNTDJE address incomplete information systems, and FSRS addresses set-valued information systems. Table 4 also lists the average size of the reduced subsets of these three algorithms over the three deletion rates.

Table 4

Number of selected attributes

Data set	Raw data	KGIRA-M	FSRS	FSNTDJE	MFRIC λ (5%)	MFRIC number (5%)	MFRIC λ (10%)	MFRIC number (10%)	MFRIC λ (20%)	MFRIC number (20%)
Ann	38	7	8.67	6.67	0.70	8	0.80	9	0.80	8
CMC	9	4.33	4.67	4.67	0.75	4	0.85	4	0.8	5
Ger	20	8.33	9.67	6.33	0.65	8	0.85	9	0.85	12
Che	36	12	11.33	10.33	0.65	5	0.65	9	0.55	12
PT	17	10.67	9.33	9	0.80	5	0.80	7	0.75	10
TTT	9	5	5.67	5.67	0.70	6	0.75	5	0.75	6
MB	61	13.67	13	14	0.75	6	0.75	10	0.80	14
Mus	22	8	7	7.33	0.70	9	0.75	9	0.70	9
Average	26.50	7.75	8.13	7.63	0.71	6.38	0.78	7.75	0.75	9.50

Table 4 shows that the average numbers of attributes selected by MFRIC at deletion rates of 5%, 10% and 20% are 6.38, 7.75 and 9.50, respectively, with an overall average of 7.86. The reduction performance of MFRIC is therefore comparable to that of the other three algorithms, which indicates that it is effective.

Next, we compare the classification accuracy of the subsets reduced by the four algorithms under the same classifiers. Two classifiers, Classification And Regression Trees (CART) and Support Vector Machines (SVM), are used to compute the classification accuracy of the reduced subsets, and 5-fold cross-validation is used to test the stability of the four algorithms. Tables 5 and 6 compare the accuracy of the four algorithms after reduction under the two classifiers. For reference, the classification accuracy on the full data set is given in column 2. Black bold font indicates the best result in each row. The tables show that MFRIC attains higher accuracy than the other three algorithms in slightly more cases. The accuracy of the CART classifier is noticeably higher than that of SVM. Further, the classification accuracy decreases as the deletion rate increases, which is consistent with intuition. However, does this mean that the proposed algorithm is necessarily better than the other algorithms? The next section addresses this question with a statistical analysis that determines whether the performance differences are significant.

Table 5

Comparison of classification accuracies of reduced data with Classification And Regression Trees (CART)

Data set	Raw data	KGIRA-M 5%	KGIRA-M 10%	KGIRA-M 20%	FSRS 5%	FSRS 10%	FSRS 20%	FSNTDJE 5%	FSNTDJE 10%	FSNTDJE 20%	MFRIC 5%	MFRIC 10%	MFRIC 20%
Ann	0.9799	0.9060	0.8270	0.8020	0.9135	0.8396	0.9022	0.9010	0.8258	0.9197	0.9160	0.8436	0.9340
CMC	0.5377	0.5322	0.4869	0.4670	0.5126	0.4840	0.4664	0.5173	0.4949	0.4657	0.5234	0.4969	0.4786
Ger	0.7160	0.7170	0.7070	0.6950	0.7150	0.7060	0.6840	0.7220	0.7010	0.6880	0.7250	0.7130	0.7020
Che	0.9727	0.9080	0.6884	0.6827	0.8833	0.6824	0.7337	0.8479	0.8356	0.7024	0.8874	0.8091	0.7775
PT	0.4248	0.3540	0.3363	0.3156	0.3362	0.3069	0.2802	0.3274	0.3126	0.3038	0.3304	0.3392	0.2891
TTT	0.7871	0.7244	0.7015	0.6848	0.6972	0.6868	0.6681	0.7139	0.7098	0.6973	0.7401	0.7174	0.6812
MB	0.8618	0.7853	0.7649	0.7508	0.7833	0.7740	0.7602	0.7857	0.7937	0.7749	0.8119	0.8045	0.7543
Mus	1	0.9857	0.9761	0.9650	0.9934	0.9844	0.9724	0.9921	0.9835	0.9712	0.9970	0.9877	0.9746
Average	0.7850	0.7390	0.6860	0.6704	0.7293	0.6830	0.6834	0.7259	0.7071	0.6904	0.7368	0.7119	0.7013

Table 6

Comparison of classification accuracies of reduced data with Support Vector Machines (SVM)

Data set	Raw data	KGIRA-M			FSRS			FSNTDJE			MFRIC
		5%	10%	20%	5%	10%	20%	5%	10%	20%	5%	10%	20%
Ann	0.9453	0.8640	0.8537	0.8324	0.8516	0.8428	0.8317	0.8763	0.8505	0.8320	0.9040	0.7644	0.8567
CMC	0.5154	0.5010	0.4823	0.4606	0.4937	0.4812	0.4600	0.5002	0.4817	0.4693	0.5079	0.4984	0.4476
Ger	0.7570	0.7320	0.7150	0.7000	0.6940	0.6850	0.6630	0.7330	0.7010	0.6810	0.7410	0.7050	0.7010
Che	0.9725	0.5713	0.5147	0.5006	0.5450	0.5319	0.5075	0.5128	0.5009	0.4950	0.5282	0.5219	0.5172
PT	0.4130	0.3363	0.3097	0.2537	0.3333	0.3009	0.2389	0.3304	0.3038	0.2625	0.3274	0.3128	0.2772
TTT	0.6545	0.6419	0.6284	0.6169	0.6441	0.6378	0.6263	0.6430	0.6325	0.6273	0.6534	0.6482	0.6545
MB	0.8288	0.5172	0.5147	0.5166	0.5047	0.5053	0.5025	0.5125	0.5119	0.5110	0.5191	0.5188	0.5185
Mus	0.9490	0.8063	0.7447	0.6145	0.8099	0.7376	0.5539	0.7999	0.7509	0.6155	0.8154	0.7607	0.6278
Average	0.7544	0.6213	0.5954	0.5619	0.6095	0.5903	0.5480	0.6135	0.5917	0.5617	0.6250	0.5921	0.5751

In the proposed algorithm, the parameter λ affects the classification accuracy. Figure 1 plots the classification accuracy of the reduced subsets against λ for each data set. Under the same deletion rate, the CART curves lie slightly above the SVM curves, which again indicates that CART classifies slightly better than SVM here. The curves follow no consistent pattern: the value of λ does affect classification accuracy, but which value has the greatest effect depends on the data set and the deletion rate.

Fig. 1

Classification accuracy of the reduced subsets of algorithm MFRIC under the CART and SVM classifiers.

To compare the classification performance of the four algorithms further, the reduced subset with the best classification accuracy is selected for each algorithm, and the corresponding receiver operating characteristic (ROC) curves are drawn. A deletion rate of 5% is used for all experimental data. Figure 2 shows the ROC curves obtained with the CART classifier for the reduced subsets of the four algorithms; the area under the curve (AUC) is computed for each curve and given in the legend of each subgraph. The horizontal axis of Figure 2 is the false positive rate and the vertical axis is the true positive rate. As Figure 2 shows, most AUC values exceed 0.5, that is, the classifiers perform better than random guessing. In most cases, the AUC of MFRIC is slightly larger than that of the other algorithms, which shows that MFRIC not only has higher classification accuracy but also good classification balance. To check whether the pattern observed with CART also holds for SVM, we selected four data sets and drew their ROC curves and AUC values under the SVM classifier. As Figure 3 shows, SVM performs slightly worse than CART, but its AUC values still exceed 0.5. Thus, although SVM does not classify as well as CART, its performance still meets the requirements of classification.
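The AUC values in the figure legends can be computed without drawing the curve, because the area under a binary ROC curve equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counting one half. The sketch below assumes binary labels coded 1/0 and real-valued classifier scores; the paper does not state how AUC was obtained for the multi-class data sets (for example by one-vs-rest averaging), so that part is left open. The function name `roc_auc` is illustrative.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # → 0.75
```

An AUC of 0.5 corresponds to a random ranking, which is why 0.5 serves as the reference value above. The pairwise loop is quadratic in the number of instances; a sort-based rank computation is the usual choice for large data sets.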

Fig. 2

ROC curve and AUC value under CART classifier.

Fig. 3

ROC curve and AUC value under SVM classifier.

9 Friedman test and post-hoc test

The Friedman test is a nonparametric hypothesis test used to compare the performance of classification algorithms statistically [5]. If the Friedman test result is significant, the null hypothesis “all algorithms have the same performance” is rejected. A post-hoc test, such as the Nemenyi test, is then carried out to determine which algorithms perform differently.

For T data sets and G algorithms, let $C_{i}$ be the average rank of the $i$-th algorithm, $i = 1, 2, \ldots, G$. The original Friedman statistic is defined as $χ_{F}^{2} = \frac{T}{G (G + 1)} \left(12 \sum_{i = 1}^{G} C_{i}^{2} - 3 G (G + 1)^{2}\right).$

Since the original Friedman statistic is undesirably conservative, that is, it tends not to reject the null hypothesis [5], a modified Friedman test with the statistic $F_{F}$ is usually used instead. $F_{F}$ follows the F-distribution with $G - 1$ and $(G - 1)(T - 1)$ degrees of freedom and is defined as $F_{F} = \frac{(T - 1) χ_{F}^{2}}{T (G - 1) - χ_{F}^{2}}.$
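Both statistics can be evaluated directly from the average ranks. In this sketch, `friedman_chi2` and `iman_davenport_ff` are hypothetical helper names (the second refers to Iman and Davenport, whose statistic $F_F$ is described in [5]). Following the degrees of freedom in Table 9, it assumes T = 24 (eight data sets × three deletion rates) and G = 4; plugging the reported $χ_{F}^{2}$ values into the second formula reproduces the tabulated $F_{F}$ values.

```python
from typing import Sequence


def friedman_chi2(avg_ranks: Sequence[float], T: int) -> float:
    """Original Friedman statistic from the G average ranks C_i over T data sets."""
    G = len(avg_ranks)
    sum_sq = sum(c * c for c in avg_ranks)
    return T / (G * (G + 1)) * (12 * sum_sq - 3 * G * (G + 1) ** 2)


def iman_davenport_ff(chi2: float, T: int, G: int) -> float:
    """F_F statistic; compare it with F(G - 1, (G - 1)(T - 1))."""
    return (T - 1) * chi2 / (T * (G - 1) - chi2)


T, G = 24, 4  # assumed: 8 data sets x 3 deletion rates, 4 algorithms
print(round(iman_davenport_ff(21.05, T, G), 4))  # → 9.5025 (CART row of Table 9)
print(round(iman_davenport_ff(20.65, T, G), 4))  # → 9.2493 (SVM row of Table 9)
print((G - 1, (G - 1) * (T - 1)))               # → (3, 69) degrees of freedom
```

As a sanity check, if every algorithm has average rank $(G + 1)/2$ then $χ_{F}^{2} = 0$, while perfectly consistent ranks 1, 2, 3, 4 give the maximum $T(G - 1) = 72$, at which $F_{F}$ is undefined.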

Table 9
Results of the Friedman tests with CART and SVM

Classifier	$χ_{F}^{2}$	$F_{F}$	α	$F_{α}(G - 1, (G - 1)(T - 1))$	p-value
CART	21.05	9.5025	0.05	$F_{0.05}(3, 69) = 2.7375$	0.0001028
SVM	20.65	9.2493	0.05	$F_{0.05}(3, 69) = 2.7375$	0.0001244

The Nemenyi test can be used to test whether the performance of two algorithms differs. Its critical difference is defined as ${CD}_{α} = q_{α} \sqrt{\frac{G (G + 1)}{6 T}},$ where α is the significance level and $q_{α}$ is the critical value of the Tukey (Studentized range) distribution with parameters 1 - α and G, divided by $\sqrt{2}$ [5]. The difference between the average ranks of each pair of algorithms is compared with $CD_{α}$: when the difference is larger than $CD_{α}$, the two algorithms perform significantly differently.
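A sketch of the Nemenyi comparison, assuming α = 0.05. The $q_{α}$ values below are those tabulated in [5] for 2 to 5 algorithms; `nemenyi_cd` and `significant_pairs` are hypothetical names. Assuming T = 24, as implied by the degrees of freedom in Table 9, and G = 4, the critical difference is about 0.957.

```python
import math
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# q_0.05 for the Nemenyi test, indexed by the number of algorithms G (values from [5])
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728}


def nemenyi_cd(G: int, T: int, q_alpha: Optional[float] = None) -> float:
    """Critical difference CD = q_alpha * sqrt(G(G + 1) / (6T))."""
    q = Q_ALPHA_005[G] if q_alpha is None else q_alpha
    return q * math.sqrt(G * (G + 1) / (6 * T))


def significant_pairs(avg_ranks: Dict[str, float], T: int,
                      q_alpha: Optional[float] = None) -> List[Tuple[str, str]]:
    """Pairs of algorithms whose average ranks differ by more than CD."""
    cd = nemenyi_cd(len(avg_ranks), T, q_alpha)
    return [(a, b) for a, b in combinations(avg_ranks, 2)
            if abs(avg_ranks[a] - avg_ranks[b]) > cd]


print(round(nemenyi_cd(4, 24), 3))  # → 0.957
```

Passing a dictionary of average ranks keyed by algorithm name to `significant_pairs` lists the pairs that the test separates; for other significance levels the matching $q_{α}$ can be supplied explicitly.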

The ranks of the four algorithms under each classifier are computed from the classification accuracies in Tables 5 and 6 and are shown in Tables 7 and 8; for each data set and deletion rate, the algorithm with the highest accuracy receives rank 1.
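Ranking can be sketched as follows: within each row (one data set at one deletion rate) the highest accuracy gets rank 1, and, following the convention in [5], tied accuracies share the average of the ranks they span. Tables 7 and 8 contain only integer ranks, so the tie rule is an assumption here; `rank_row` and `average_ranks` are hypothetical names. Applied to the Ann row at the 5% deletion rate in Table 6 (KGIRA-M, FSRS, FSNTDJE, MFRIC), it gives the ranks 3, 4, 2, 1 listed in Table 8.

```python
from typing import List, Sequence


def rank_row(accuracies: Sequence[float]) -> List[float]:
    """Rank algorithms in one row: 1 = best accuracy; ties get the mean of their positions."""
    order = sorted(range(len(accuracies)), key=lambda i: -accuracies[i])
    ranks = [0.0] * len(accuracies)
    start = 0
    while start < len(order):
        end = start
        # extend the block while the next value ties with the first value in the block
        while end + 1 < len(order) and accuracies[order[end + 1]] == accuracies[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start+1 .. end+1
        for j in range(start, end + 1):
            ranks[order[j]] = shared
        start = end + 1
    return ranks


def average_ranks(rows: Sequence[Sequence[float]]) -> List[float]:
    """Average rank C_i of each algorithm (column) over all rows."""
    ranked = [rank_row(r) for r in rows]
    return [sum(col) / len(ranked) for col in zip(*ranked)]


print(rank_row([0.8640, 0.8516, 0.8763, 0.9040]))  # → [3.0, 4.0, 2.0, 1.0]
```

The resulting average ranks are the $C_{i}$ used in the Friedman statistic above.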

Table 7

Ranks of different algorithms with CART

Data set	KGIRA-M			FSRS			FSNTDJE			MFRIC
	5%	10%	20%	5%	10%	20%	5%	10%	20%	5%	10%	20%
Ann	3	3	4	2	2	3	4	4	2	1	1	1
CMC	2	3	2	4	4	3	3	2	4	1	1	1
Ger	3	2	2	4	3	4	2	4	3	1	1	1
Che	1	3	4	3	4	2	4	1	3	2	2	1
PT	1	2	1	2	4	4	4	3	2	3	1	3
TTT	2	3	2	4	4	4	3	2	1	1	1	3
MB	3	3	4	4	4	2	2	2	1	1	1	3
Mus	4	4	4	2	2	2	3	3	3	1	1	1

Table 8

Ranks of different algorithms with SVM

Data set	KGIRA-M			FSRS			FSNTDJE			MFRIC
	5%	10%	20%	5%	10%	20%	5%	10%	20%	5%	10%	20%
Ann	3	1	2	4	3	4	2	2	3	1	4	1
CMC	2	2	2	4	4	3	3	3	1	1	1	4
Ger	3	1	2	4	4	4	2	3	3	1	2	1
Che	1	3	3	2	1	2	4	4	4	3	2	1
PT	1	2	3	2	4	4	4	3	2	3	1	1
TTT	4	4	4	2	2	3	3	3	2	1	1	1
MB	2	2	2	4	4	4	3	3	3	1	1	1
Mus	3	3	3	2	4	4	4	2	2	1	1	1

Table 9 lists the Friedman test results with CART and SVM. Since each combination of data set and deletion rate is treated as one data set, T = 8 × 3 = 24 and G = 4, so $F_{F}$ is compared with $F_{0.05}(3, 69)$. For both classifiers, $F_{F} > F_{0.05}(3, 69)$ and the p-value is smaller than α. Hence the null hypothesis “there is no significant difference between the classification accuracies of the four algorithms” is rejected under both CART and SVM, and a Nemenyi test is carried out for each classifier. The results of the Nemenyi tests, shown in Figures 4 and 5, are summarized as follows:

(1) Under the CART classifier

a) MFRIC has higher classification accuracy than KGIRA-M, FSRS and FSNTDJE;

b) There is no significant difference in classification accuracy among KGIRA-M, FSRS and FSNTDJE.

(2) Under the SVM classifier

a) MFRIC has higher classification accuracy than FSRS and FSNTDJE;

b) KGIRA-M has higher classification accuracy than FSRS and FSNTDJE;

c) There is no significant difference in classification accuracy between MFRIC and KGIRA-M;

d) There is no significant difference in classification accuracy between FSRS and FSNTDJE.

Fig. 4

The result of Nemenyi test with CART.

Fig. 5

The result of Nemenyi test with SVM.

10 Conclusions

In this paper, fuzzy symmetry relations on the object set of an MSVDIS have been established based on the similarity between information values with respect to the attribute set, and FR-sets have been defined from them. FR-sets use the fuzzy symmetry relations to define fuzzy rough approximations, the fuzzy positive region and fuzzy dependency, thereby overcoming a deficiency of classical rough sets. The fuzzy symmetry relation contains a parameter that controls the similarity between two objects. An FRIC-model in an MSVDIS has been presented based on the iterative computation of the fuzzy positive region and fuzzy dependency, and an A-reduction algorithm based on the FRIC-model has been given. Experiments have been carried out on eight data sets from UCI [27]. The experimental results and statistical analysis show that the given algorithm is fast and requires little memory. Its main advantage is that it provides an iterative formula for A-reduction in an MSVDIS. In future work, we will extend the FRIC-model to other types of data.


Acknowledgments

The authors would like to thank the editors and the anonymous reviewers for their valuable comments and suggestions, which have helped immensely in improving the quality of the paper. This work is supported by Natural Science Foundation of Guangxi (2021GXNSFAA220114).

References

1. Beaubouef and Petry F.E., Fuzzy rough set techniques for uncertainty processing in a relational database, International Journal of Intelligent Systems 15 (2000), 389–424.

2. Cornelis, Jensen, Martin G.H. and Slezak, Attribute selection with fuzzy decision reducts, Information Sciences 180 (2010), 209–224.

3. Chen Y.M., Xue, Ma and Xu F.F., Measures of uncertainty for neighborhood rough sets, Knowledge-Based Systems 120 (2017), 226–235.

4. Chen D.G., Zhang, Zhao S.Y., Hu Q.H. and Zhu P.F., A novel algorithm for finding reducts with fuzzy rough sets, IEEE Transactions on Fuzzy Systems 20 (2012), 385–389.

5. Demšar, Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research 7 (2006), 1–30.

6. Dai J.H., Hu, Wu W.Z., Qian Y.H. and Huang D.B., Maximal-discernibility-pair-based approach to attribute reduction in fuzzy rough sets, IEEE Transactions on Fuzzy Systems 26(4) (2018), 2175–2187.

7. Dai J.H. and Tian H.W., Fuzzy rough set model for set-valued data, Fuzzy Sets and Systems 229 (2013), 54–68.

8. Dai J.H., Wang W.T., Tian H.W. and Liu, Attribute selection based on a new conditional entropy for incomplete decision systems, Knowledge-Based Systems 39 (2013), 207–213.

9. Holcapek, A graded approach to cardinal theory of finite fuzzy sets, part I: Graded equipollence, Fuzzy Sets and Systems 298 (2016), 158–193.

10. Girish K.P. and John S.J., Relations and functions in multiset context, Information Sciences 179 (2009), 758–768.

11. Y.L., Chen, Lv M.Q., Li Y.J. and Li Y.Y., Extracting semantic event information from distributed sensing devices using fuzzy sets, Fuzzy Sets and Systems 337 (2018), 74–92.

12. Li Z.W., Huang, Liu X.F., Xie N.X. and Zhang G.Q., Information structures in a covering information system, Information Sciences 507 (2020), 449–471.

13. Li Z.W., Liu X.F., Dai J.H., Chen J.L. and Fujita, Measures of uncertainty based on Gaussian kernel for a fully fuzzy information system, Knowledge-Based Systems 196 (2020), 105791.

14. Lang G.M., Li Q.G. and Guo L.K., Homomorphisms-based attribute reduction of dynamic fuzzy covering information systems, International Journal of General Systems 44(7-8) (2015), 791–811.

15. Li Z.W., Qu L.D., Zhang G.Q. and Xie N.X., Attribute selection for heterogeneous data based on information entropy, International Journal of General Systems 50(5) (2021), 548–566.

16. Liu and Zhong, Attribute reduction of set-valued decision information system based on dominance relation, Journal of Interdisciplinary Mathematics 19(3) (2016), 469–479.

17. Liu Y.J., Zhang W.G. and Gupta, International asset allocation optimization with fuzzy return, Knowledge-Based Systems 139 (2018), 189–199.

18. Li Z.W., Zhang P.F., Ge, Xie N.X., Zhang G.Q. and Wen C.F., Uncertainty measurement for a fuzzy relation information system, IEEE Transactions on Fuzzy Systems 27(12) (2019), 2338–2352.

19. Meng Z.Q. and Shi Z.Z., A fast approach to attribute reduction in incomplete decision systems with tolerance relation-based rough sets, Information Sciences 179 (2009), 2774–2793.

20. Nikulin M.S., “Hellinger distance”, in Hazewinkel, Michiel, Encyclopedia of Mathematics, Springer Science+Business Media B.V. / Kluwer Academic Publishers, 2001.

21. Pawlak, Rough sets, International Journal of Computer and Information Sciences 11 (1982), 341–356.

22. Pawlak, Rough sets: Theoretical aspects of reasoning about data, Kluwer Academic Publishers, Dordrecht, 1991.

23. Qian Y.H., Liang J.Y., Wu W.Z. and Dang C.Y., Information granularity in fuzzy binary GrC model, IEEE Transactions on Fuzzy Systems 19(2) (2011), 253–264.

24. Singh, Shreevastava, Som and Somani, A fuzzy similarity-based rough set approach for attribute selection in set-valued information systems, Soft Computing 24(6) (2020), 4675–4691.

25. Sun, Wang L.Y. and Qian Y.H., et al., Feature selection using Lebesgue and entropy measures for incomplete neighborhood decision systems, Knowledge-Based Systems 186 (2019), 104942.

26. Trabelsi and Elouedi, Heuristic method for attribute selection from partially uncertain data using rough sets, International Journal of General Systems 39(3) (2010), 271–290.

27. UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/datasets.html.

28. Wang C.Z., Huang, Shao M.W. and Fan X.D., Fuzzy rough set-based attribute reduction using distance measures, Knowledge-Based Systems 164 (2019), 205–212.

29. Wang C.Z., Qi, Shao M.W., Hu Q.H., Chen D.G., Qian Y.H. and Lin Y.J., A fitting model for feature selection with fuzzy rough sets, IEEE Transactions on Fuzzy Systems 25 (2016), 741–753.

30. Wang C.Z., Wang, Shao M.W., Qian Y.H. and Chen D.G., Fuzzy rough attribute reduction for categorical data, IEEE Transactions on Fuzzy Systems 28(5) (2020), 818–830.

31. Xie N.X., Liu, Li Z.W. and Zhang G.Q., New measures of uncertainty for an interval-valued information system, Information Sciences 470 (2019), 156–174.

32. Yao Y.Y., Probabilistic approaches to rough sets, Expert Systems 20 (2003), 287–297.

33. Yao Y.Y. and Zhang X.Y., Class-specific attribute reducts in rough set theory, Information Sciences 418-419 (2017), 601–618.

34. Zadeh L.A., Fuzzy sets, Information and Control 8 (1965), 338–356.

35. Zhang C.C., Dai J.H. and Chen J.L., Knowledge granularity based incremental attribute reduction for incomplete decision systems, International Journal of Machine Learning and Cybernetics 11(5) (2020), 1141–1157.

36. Zhang G.Q., Li Z.W., Wu W.Z., Liu X.F. and Xie N.X., Information structures and uncertainty measures in a fully fuzzy information system, International Journal of Approximate Reasoning 101 (2018), 119–149.
