Abstract
Attribute selection in an information system (IS) is an important issue when dealing with a large amount of data. An IS with incomplete interval-value data is called an incomplete interval-valued information system (IIVIS). This paper proposes attribute selection approaches for an IIVIS. Firstly, the similarity degree between two information values of a given attribute in an IIVIS is proposed. Then, the tolerance relation on the object set with respect to a given attribute subset is obtained. Next, θ-reduction in an IIVIS is studied. What is more, connections between the proposed reduction and information entropy are revealed. Lastly, three reduction algorithms base on θ-discernibility matrix, θ-information entropy and θ-significance in an IIVIS are given.
Introduction
Rough set theory was put forward by Pawlak [20–23]. This theory is a significant approach for managing uncertainty. One of the advantages of rough set is that it does not need any preliminary or additional data information, but it is directly based on the original data, so it is more objective and credible. Many applications of rough set theory are connected with information systems (ISs) [2, 27].
Information system (IS) based on rough set theory was also introduced by Pawlak. They may reveal large databases and knowledge discovery process mathematically. An interval-valued information system (IVIS) means an IS where its information values are interval numbers. An incomplete interval-valued information system (IIVIS) means an IVIS with missing values. As a common IS, IVISs have been studied by many scholars. For example, Dai et al. introduced θ-similarity entropy and then proposed θ-rough degree based on θ-similarity entropy to measure the uncertainty of rough sets in IIVISs. Yao [30] presented an interval set model for IVISs with upper and lower approximations, as well as introduced generalized decision logic; Leung et al. [15] investigated a rough set approach based on knowledge induction process for selecting decision rules with minimum feature sets in IVISs; Xie et al. [28] considered new measures of uncertainty for an IVIS; Yang et al. [31] raised a dominance relation and generated the optimal decision rules in IIVISs; Sakai et al. [25] developed a rule generation prototype system for incomplete information databases in Lipski that can process IVISs.
As an important tool for estimating incomplete information, information entropy has been applied. We refer to the articles about information entropy of other scholars. For instance, D
In general, some attributes in the same IIVIS are redundant. we want to find a reduct that has the fewest attributes. Attribute selection in a given IIVIS mean deleting irrelevant or unimportant attributes under the condition of keeping the classification ability of this IIVIS. As one of the core contents of rough set theory, attribute selection has attracted a great deal of attention. For instance, Zhang et al. [33] put forward multi-confidence rule acquisition and confidence-preserved attribute selection in interval-valued decision systems. Yang et al. [31] discussed dominance-based rough set approach to IIVIS.
Given an IIVIS, we define tolerance relation based on similarity degree and put forward a rough set model based on this relation. The θ-discernibility matrix of this IIVIS can be obtained by the given threshold θ. Reduction algorithms based on θ-discernibility matrix, θ-information entropy and θ-significance in an IIVIS are proposed. And the work process is displayed in Fig. 1.

The work process of the paper.
The rest of this article is designed as follows. Section 2 retrospects the essential notions of binary relations, interval numbers and IIVISs. Section 3 gives similarity degree and tolerance relations in an IIVIS. Section 4 investigates θ-reduction in an IIVIS and reveals connections between the proposed the proposed reduction and information entropy. Section 5 obtains three algorithms on θ-reduction in an IIVIS. Section 6 compares with reduction in other kinds of ISs and concludes this article.
In this section, the essential notions of binary relations, interval numbers and IIVISs are reviewed.
In this article, U signifies the finite universe, 2 U expresses a set of all subsets of U and |X| means the number of elements in X ∈ 2 U .
Put
Binary relations
R is said to be a binary relation on U whenever R ⊆ U × U. If (u, v) ∈ R, then uRv.
Let R be a binary relation on U. Then R is regarded as a universal relation on U whenever R = δ; R is regarded as an identity relation on U whenever R =▵.
Assume that R is a binary relation on U. Then R is referred to as an equivalence relation on U, if R meets:
(1) reflexive: ∀ u ∈ U, uRu;
(2) symmetric: ∀ u, v ∈ U, uRv implies vRu;
(3) transitive: ∀ u, v, w ∈ U, uRv and vRw imply uRw.
In addition, R is said to be a tolerance relation on U if it is reflexive and symmetric.
Interval-valued numbers
Let
For any m ∈ R, express
For any m, n ∈ [R], define
(1) m = n ⇒ m- = n-, m+ = n+.
(2) m ≤ n ⇒ m- ≤ n-, m+ ≤ n+; m < n ⇒ m ≤ n, m ≠ n.
(1) ∀ m, n ∈ [R], 0 ≤ p (m, n) ≤1;
(2) ∀ m ∈ [R], p (m, m) =0.5;
(3) ∀ m, n ∈ [R], p (m, n) + p (n, m) =1.
(1) ∀ m, n ∈ [R], q (m, n) = q (n, m);
(2) ∀ m, n ∈ [R], 0 ≤ q (m, n) ≤1;
(3) ∀ m, n ∈ [R], q (m, n) =1 ⇒ m = n.
An IIVIS
If (U, A) is an IS. Given B ⊆ A. Then we can define
Evident, ind (B) is an equivalence relation on U, ind (B) = ⋂ a∈Bind ({a}) .
Denote
We call (U, A) is an IIS. Given B ⊆ A. Then a binary relation on U can be defined as
sim (B) = {(u, v) ∈ U × U : ∀ a ∈ B, a (u) = a (v) or a (u) = * or a (v) = *} ,
here * is a missing value.Evident, sim (B) is a tolerance relation on U and sim (B) = ⋂ a∈Bsim ({a}) .
Let (U, A) be an IIS. For each a ∈ A, denote
If B ⊆ A, then (U, B) is known as the subsystem of (U, A).
An IIVIS
An IIVIS
In this section, the concept of IIVIS was proposed, the similarity degree between two information values on a given attribute in an IIVIS is constructed and the tolerance relation induced by a given subsystem is given.
The similarity degree between information values on an attribute in an IIVIS
s (a (u) , a (v)) =
For the convenience of expression, denote
Obviously,
(1) If B1 ⊆ B2 ⊆ A, then ∀ θ ∈ (0, 1],
(2) If 0 ≤ θ1 ≤ θ2 ≤ 1, then ∀ B ⊆ A,
By Definition 3.3, we have
∀ a ∈ B, s (a (u) , a (v)) ≥ θ and ∀ a ∈ C, s (a (u) , a (v)) ≥ θ.
Then ∀ a ∈ B ∪ C, s (a (u) , a (v)) ≥ θ. By Definition 3.3,
Consequently,
“ ⇐ " Owing to B ⊆ B ∪ C and C ⊆ B ∪ C, by Proposition 3.4,
Clearly,
In addition, put
(1) If C ⊆ B ⊆ A, then ∀ θ ∈ (0, 1] and u ∈ U,
(2) If 0 ≤ θ1 ≤ θ2 ≤ 1, then ∀ B ⊆ A and u ∈ U,
Pick θ = 0.6. Then ∀ i, ∀ u ∈ U, the tolerance class
(i = 1, 2, 3, θ = 0.6, u ∈ U)
An algorithm for computing the tolerance class is designed as follows.
θ-reduction, θ-core and θ-discernibility matrix in an IIVIS
Given an IIVIS (U, A), each subset B of A determines a tolerance relation (or indiscernibility relation)
(1) B is called θ-coordinate subset of A, if
(2) a ∈ B is called θ-independent in B, if
(3) B is called a θ-reduct of A, if B is called θ-coordinate subset of A and B is θ-independent.
“
In this paper, the family of all θ-coordinate subsets (resp., θ-reducts) of A is denoted by co θ (A) (resp., red θ (A)).
Obviously,
On the base of the previous analysis, B ∈ red θ (A) means that B is a minimal attribute subset where the subsystem (U, B) has the same classification ability as the system (U, A) with respect to a given θ.
Suppose that ∃ a1 ∈ A, A - {a1} ∈ co θ (A). Then, we consider A - {a1}. Again suppose that ∀ a ∈ A - {a1}, (A - {a1}) - {a} ∉ co θ (A). Then A - {a1} ∈ red θ (A). Again suppose that ∃ a2 ∈ A - {a1}, (A - {a1}) - {a2} ∈ co θ (A). Then, we consider A - {a1, a2}. Repeat this process. Since A is finite, we can find a θ-reduct of A.
Thus, there always exists a θ-reduct of A.□
(1) a ∈ A is called a necessary θ-attribute, if a ∈ core θ (A).
(2) a ∈ A is called a relatively necessary θ-attribute, if a ∈ ⋃ B∈red θ (A)B - core θ (A).
(3) a ∈ A is called an unnecessary θ-attribute, if a ∈ A - ⋃ B∈red θ (A)B.
(1) D θ (u, v) is called the θ-discernibility set on u and v.
(2) Dθ (A) = (d ij ) n×n is called the θ-discernibility matrix where U = {u1, u2, ·· · , u n } and d ij = D θ (u i , u j ) (1 ≤ i, j ≤ n).
Pick θ = 0.6, Dθ (A) of A is obtained as follows:
Some properties
(1) D θ (u, u) =∅.
(2) D θ (u, v) = D θ (v, u).
(3) D θ (u, v) ⊆ D θ (u, w) ∪ D θ (w, v).
(3) Suppose that D
θ
(u, v) ⊊ D
θ
(u, w) ∪ D
θ
(w, v). Then D
θ
(u, v) - D
θ
(u, w)∪ D
θ
(w, v) ≠ ∅. Pick
a ∈ D
θ
(u, v) implies
Since a ∉ D
θ
(u, w) ∪ D
θ
(w, v), we have a ∉ D
θ
(u, w) and a ∉ D
θ
(w, v). Then
Consequently, D θ (u, v) ⊆ D θ (u, w) ∪ D θ (w, v). □
Consequently, B ∩ D θ (u, v) ≠ ∅ .
“⇐". Suppose B ∉ co
θ
(A). Then
Since
Note that
Hence B ∈ co θ (A).□
θ-discernibility sets can easily determine θ-reduction.
B ∈ red
θ
(A) ⇒ (i) if
(ii) ∀ a ∈ B,
“⇐". Denote red θ (A) = {B k : 1 ≤ k ≤ n}. We only need to prove n = 1.
Suppose n ≥ 2. Since core
θ
(A) ∈ red
θ
(A), there exists i such that core
θ
(A) = B
i
. Pick j ≠ i. Then
Thus n = 1.□
θ-discernibility sets can easily determine the θ-core.
(1) a is a necessary θ-attribute;
(2) a is θ-independent in A;
(3) ∃ u, v ∈ U, D θ (u, v) = {a}.
(1) ⇒ (2). Let a be a necessary θ-attribute. Suppose that a is not θ-independent in A. Then
B ⊆ A - {a} implies a ∉ B. Then a is not a necessary θ-attribute. This is a contradiction.
(2) ⇒ (1). Let a be θ-independent in A. Suppose that a is not a necessary θ-attribute. Then ∃ B ∈ red
θ
(A), a ∉ B. So B ⊆ A - {a} ⊆ A. This implies
Since B ∈ red
θ
(A), we have
(2) ⇒ (3). Since a is θ-independent in A, we have
Then a j ∈ D θ (u, v), a i ∉ D θ (u, v) (i ≠ j) .
Consequently, D θ (u, v) = {a j } = {a}.
(3) ⇒ (2). Since ∃ u, v ∈ U, D
θ
(u, v) = {a}, we have
Then
Consequently,
(1) a is an unnecessary θ-attribute;
(2) ∀ B ∈ co θ (A), B - {a}∈ co θ (A) ;
(3)
(4)
(1) ⇒ (2). Given B ∈ co
θ
(A). By Proposition 4.2, ∃ C ⊆ B, C ∈ red
θ
(A). Since a is an unnecessary θ-attribute, we have a ∉ C, which implies C ⊆ A - {a}. Then
Note that B ∈ co
θ
(A) and C ∈ red
θ
(A). Then
Consequently,
Hence B - {a} ∈ co θ (A) .
(2) ⇒ (3) ⇒ (4) are obvious.
(4) ⇒ (1). Suppose that a is not an unnecessary θ-attribute. Then ∃ B ∈ red
θ
(A), a ∈ B. This implies B - {a} ⊂ B. Since B ∈ red
θ
(A), we have B - {a} ∉ co
θ
(A). Then
Pick
Since B ∈ co
θ
(A) and
(1) a is a necessary θ-attribute ⇒ A - {a} ∉ co θ (A).
(2) a is a relatively necessary θ-attribute ⇒ A - {a} ∈ co
θ
(A),
(3) a is an unnecessary θ-attribute ⇒ ∀ B ∈ co θ (A), B - {a} ∈ co θ (A) .
Pick θ = 0.6, we have
(1) a2, a3 are two necessary θ-attributes.
(2) a1, a4, a5 and a6 are four relatively necessary θ-attributes.
Entropy measurement for an IIVIS
By Definition 4.15,
If
If
(1) If B1 ⊆ B2 ⊆ A, then for any θ ∈ (0, 1], H θ (B1) ≤ H θ (B2).
(2) If 0 < θ1 ≤ θ2 ≤ 1, then for any B ⊆ A, H θ 1 (B) ≤ H θ 2 (B).
(2) ⇒ (1). Suppose H
θ
(B) = H
θ
(A). Then, we have
So
Note that
So ∀ i,
This implies that ∀ i,
Consequently, R B = R A .
Hence
(1) ⇒ (3). This is clear.
(3) ⇒ (1). The proof is similar to (2) ⇒ (1).□
(1) B ∈ red (A);
(2) H θ (B) = H θ (A) and ∀ a ∈ B, H θ (B - {a}) ≠ H θ (A).
We stipulate
(1) B ∈ red (A);
(2) H θ (B) = H θ (A) and ∀ a ∈ B, Sig θ (a, B) >0.
Algorithms on θ-reduction in an IIVIS
It is more convenient to calculate all θ-reduction and the θ-core in an IIVIS by using the following θ-discernibility function.
Below, we give an algorithm on θ-reduction in an IIVIS by using mathematical logic.
“⋁”(disjunction), “⋀”(conjunction), “→”(implication), “↔”(biimplication) are propositional connectives in mathematical logic. They are read as “or”, “and”, “if-then”, “if and only if”, respectively.
Let (U, A) be an IIVIS. ∀ a ∈ A, we specify a Boolean variable “a”. If D θ (u, v) = {a1, a2, ·· · , a k } with u, v ∈ U, then we specify a Boolean function a1 ∨ a2 ∨ ·· · ∨ a k .
Denote
We stipulate that ∨ ∅ =1 and ∧ ∅ =0 where 0 and 1 are two Boolean constants.
Pick θ = 0.6, we have
Δ0.6 (A) = a3 ∧ a2 ∧ (a1 ∨ a2) ∧ (a3 ∨ a4) ∧ (a1 ∨ a6) ∧ (a5 ∨ a6) ∧ (a1 ∨ a3) ∧ (a2 ∨ a3) ∧ (a1 ∨ a3 ∨ a5) ∧ (a1 ∨ a2 ∨ a6) ∧ (a1 ∨ a4 ∨ a5) ∧ (a1 ∨ a4 ∨ a6) ∧ (a3 ∨ a4 ∨ a6) ∧ (a1 ∨ a3 ∨ a4) ∧ (a2 ∨ a5 ∨ a6) ∧ (a1 ∨ a2 ∨ a4) ∧ (a2 ∨ a3 ∨ a4) ∧ (a3 ∨ a5 ∨ a6) ∧ (a1 ∨ a3 ∨ a4 ∨ a5) ∧ (a3 ∨ a4 ∨ a5 ∨ a6) ∧ (a1 ∨ a4 ∨ a5 ∨ a6) ∧ (a1 ∨ a2 ∨ a3 ∨ a5) ∧ (a1 ∨ a3 ∨ a5 ∨ a6) ∧ (a1 ∨ a2 ∨ a3 ∨ a4) ∧ (a1 ∨ a2 ∨ a4 ∨ a6) ∧ (a1 ∨ a2 ∨ a4 ∨ a5] ∧ (a2 ∨ a3 ∨ a5 ∨ a6) ∧ (a2 ∨ a3 ∨ a4 ∨ a5) ∧ (a1 ∨ a2 ∨ a3 ∨ a5 ∨ a6) ∧ (a1 ∨ a2 ∨ a4 ∨ a5 ∨ a6) ∧ (a1 ∨ a3 ∨ a4 ∨ a5 ∨ a6) ∧ (a1 ∨ a2 ∨ a3 ∨ a4 ∨ a5) ∧ (a2 ∨ a3 ∨ a4 ∨ a5 ∨ a6) ∧ (a1 ∨ a2 ∨ a3 ∨ a4 ∨ a6) ∧ (a1 ∨ a2 ∨ a3 ∨ a4 ∨ a5 ∨ a6)
Denote
A binary relation “≤” on L (A) is defined as follows:
For any ⋁d ij , ⋁ d kl ∈ L (A), we denote
Consequently, (L (A) , ≤ , ⊔ , ⊓) is a lattice with top element and bottom element.□
Pick θ = 0.6, we have a2 ∧ (a1 ∨ a2) = a2, a2∧ (a2 ∨ a3) = a2, a2 ∧ (a1 ∨ a2 ∨ a6) = a2, a2 ∧ (a2∨ a5 ∨ a6) = a2, a2 ∧ (a1 ∨ a2 ∨ a4) = a2, a2 ∧ (a2 ∨ a3 ∨ a4) = a2, a2 ∧ (a1 ∨ a2 ∨ a3 ∨ a5) = a2, a2∧ (a1 ∨ a2 ∨ a3 ∨ a4) = a2, a2 ∧ (a1 ∨ a2 ∨ a4 ∨ a6) = a2, a2∧ (a1 ∨ a2 ∨ a4 ∨ a5) = a2, a2 ∧ (a2 ∨ a3 ∨ a5 ∨ a6) = a2, a2 ∧ (a2 ∨ a3 ∨ a4 ∨ a5) = a2, a2∧ (a1 ∨ a2 ∨ a3 ∨ a5 ∨ a6) = a2, a2 ∧ (a1 ∨ a2 ∨ a4 ∨ a5 ∨ a6) = a2, a2 ∧ (a1 ∨ a2 ∨ a3 ∨ a4 ∨ a5) = a2, a2 ∧ (a2 ∨ a3 ∨ a4 ∨ a5 ∨ a6) = a2, a2 ∧ (a1 ∨ a2 ∨a3∨ a4 ∨ a6) = a2, a2 ∧ (a1 ∨ a2 ∨ a3 ∨ a4 ∨ a5 ∨ a6) = a2, a3 ∧ (a3 ∨ a4) = a3, a3 ∧ (a1 ∨ a3) = a3, a3 ∧ (a1 ∨ a3 ∨ a5) = a3, a3 ∧ (a3 ∨ a4 ∨ a6) = a3, a3 ∧ (a1 ∨ a3 ∨ a4) = a3, a3 ∧ (a3 ∨ a5 ∨ a6) = a3, a3 ∧ (a1 ∨ a3 ∨ a4 ∨ a5) = a3, a3 ∧ (a3 ∨ a4 ∨ a5 ∨ a6) = a3, a3 ∧ (a1 ∨ a3 ∨ a5 ∨ a6) = a3, a3 ∧ (a1 ∨ a3 ∨ a4 ∨ a5 ∨ a6) = a3, (a1 ∨ a6) ∧ (a1 ∨ a4 ∨ a6) = (a1 ∨a6) , (a1 ∨ a6) ∧ (a1 ∨ a4 ∨ a5 ∨ a6) = (a1 ∨ a6) .
Then Δ0.6 (A) = a2 ∧ a3 ∧ (a1 ∨ a6) ∧ (a5 ∨ a6) ∧ (a1 ∨ a4 ∨ a5) = (a1 ∧ a2 ∧ a3 ∧ a5) ∨ (a1 ∧ a2 ∧ a3 ∧ a6) ∨ (a2 ∧ a3 ∧ a4 ∧ a6) ∨ (a2 ∧ a3 ∧ a5 ∧a6) ∨ (a1 ∧ a2 ∧ a3 ∧ a4 ∧ a5) ∨ (a1 ∧ a2 ∧ a3 ∧ a4 ∧ a6) ∨ (a1 ∧ a2 ∧ a3 ∧ a5 ∧ a6) ∨ (a2 ∧ a3 ∧ a4 ∧ a5 ∧ a6).
Consequently,
(i) Clearly,
Since
Then ∀ u, v ∈ U, ⋀B k 0 → ⋁ D θ (u, v).
So
Now ⋀B
k
0
⇒ a
k
0
l
for any l ≤ p
k
0
and ⋁D
θ
(u, v) ↔ a for some a ∈ D
θ
(u, v). Then
So
Consequently,
By Proposition 4.8, B k 0 ∈ co θ (A).
(ii) To prove B k 0 ∈ red θ (A), by Theorem 3.8, we only need to show that
Suppose that ∃ a0 ∈ B
k
0
such that (B
k
0
- {a0})∩ D
θ
(u, v) ≠ ∅ for any
Consequently,
It follows that ∀ u, v ∈ U, ⋀ (B k 0 - {a0}) → ⋁ D θ (u, v) .
Since
= (⋀ (B k 0 - {a0})) ⋀ ({a0} ⋁1)
= (⋀ (B k 0 - {a0})) ⋀ 1
= ⋀ (B k 0 - {a0}).
So B k 0 ∉ {B k : k ≤ q}. This is a contradiction.
Consequently, B k 0 ∈ red θ (A). This shows that red θ (A) ⊇ {B k : k ≤ q}.
(2) Let B ∈ red
θ
(A). Then B ∈ co
θ
(A). By Proposition 4.8, B∩ D
θ
(u, v) ≠ ∅ for any
Similar to the proof of (1) (ii), we can show that B ∈ {B k : k ≤ q} .
Consequently, red θ (A) ⊆ {B k : k ≤ q}.
Hence red θ (A) = {B k : k ≤ q} . □
When
In order to ensure that there always exists a θ-reduct of A, we demand
From Example 5.6, we obtain Δ0.6 (A) and
Consequently, red0.6 (A) = {{a1, a2, a3, a5} , {a1, a2, a3, a6} , {a2, a3, a4, a6} , {a2, a3, a5, a6} , {a1, a2, a3, a4, a5} , {a1, a2, a3, a4, a6} , {a1, a2, a3, a5, a6} , {a2, a3, a4, a5, a6}},
Obviously, core0.6 (A) = {a2, a3}
We can get reduction algorithm based on θ-discernibility matrix in Algorithm 2.
Below, we analyze the time complexity and space complexity of Algorithm 2. |A| and |U| are applied to respectively denote the the numbers of attributes and samples. It should be pointed out that, we need to compute such the tolerance relation matrices with respect to each attribute that cost O (|A||U|2) and the θ-discernibility matrix that cost O (|A| (|U|2 - |U|)/2). We compute reduce set cost O (((|U|2 - |U|)/2) ((|U|2 - |U|)/2)). Thus, the overall time complexity of Algorithm 2 is O ((|U|4 + 2|U|3)/4 + (6|A||U|2 + 1)/4 - |A||U|/2). The space for all the subsequent matrices and local variables can be reused. So, the space complexity of Algorithm 2 is O (|A||U|2).
Reduction algorithm based on θ-information entropy is obtained which is showed in Algorithm 3.
Algorithm 3 employs θ-information entropy H θ to determine the optimal attribute which is added into the current selected attribute subset in each loop. In the worst case, this process is terminated when the whole feature set has been exhausted. The worst search time for a reduct will result in |A| evaluations. Thus, the overall time complexity of Algorithm 3 is O (|A||U|2 + |A|). Also, at the initial stage, |C| fuzzy relation matrices of |U| × |U| are computed and stored corresponding to each individual features. Similarly, the space complexity of Algorithm 3 is O (|A||U|2).
We can get reduction algorithm based on θ-significance which is showed in Algorithm 4.
Here, the worst search time for a reduct will result in |A| (|A|+1)/2 evaluations. So the time complexity and space complexity of Algorithm 4 are O (|A||U|2 + (|A|2 + |C|)/2) and O (|A||U|2), respectively.
The testing data sets
The testing data sets
Based on the above preparation, we take the testing data sets to do experiments.
(1) We randomly generate six interval-value data sets with Table 10 and then use Algorithm 2 to reduce. The result of one experiment is shown in the table 11. We get 112 reduction sets, and the average size of these reduction sets is 9.4 on wine data, 13 reduction sets, and the average size of these reduction sets is 9.4 on Breast Tissue data, 14 reduction sets, and the average size of these reduction sets is 2.9 on seeds data, 2847 reduction sets, and the average size of these reduction sets is 22.2 on wdbc data, 97 reduction sets, and the average size of these reduction sets is 7.3 on leaf data, 110 reduction sets, and the average size of these reduction sets is 10.2 on parkinsons data,respectively. θ-discernibility matrix algorithm can obtain multiple reductions. Figures 2–7 shows the number of attributes for each reduction with θ-discernibility matrix.

Number of attributes by Algorithm 2(Wine).

Number of attributes by Algorithm 2(Breast-Tissue).

Number of attributes by Algorithm 2(Seeds).

Number of attributes by Algorithm 2(Wbdc).

Number of attributes by Algorithm 2(Leaf).

Number of attributes by Algorithm 2(Parkinsons).
(2) Each time of Algorithm 3 is run, it will get a reduction set. Here we run it 20 times on each data set and get the reduction results in Table 12 as follows. Figure 8 shows the average of the number of selected features on each data set.
Numbers of selected features with θ-discernibility matrix
Numbers of selected features with θ-information entropy

Number of attributes by Algorithm 3.
(3) Each time of Algorithm 4 is run, it will get a result. Here we run it 20 times and get the reduction results in Table 13 as follows. Figure 9 shows the average of the number of attributes each data set.
Numbers of selected features with θ-significance

Number of attributes by Algorithm 4.
In this section, comparisons with reduction in other kinds of ISs are carried out and conclusions are given.
1) Aiming at the problem of attribute reduction in interval-value IS, Chen et al. [4] introduced a variable precision tolerance relation and a kind of maximal variable precision tolerance class. Then they defined a kind of discernibility function and relative discernibility function based on the discernibility matrix. Finally, they proposed attribute reduction and relative attribute reduction in interval-valued ISs.
2) Considering that the existing attribute reduction method for single valued data is not suitable for interval valued data. From the viewpoint of information theory, Dai et al. [6] raised attribute reduction in interval-valued data. They gave some information theory concepts in interval-valued IS. And thay put forward an information theory view for attribute reduction in interval-valued IS based on these concepts.
3) Zhang et al. [33] thought over compact decision rules in an interval-valued decision system for attribute reduction. First, they put forward the concept of interval-value granular rules in interval-value decision system. Next, they presented an index to measure the confidence of an interval-valued granular rule and defined implication relationship between the interval-valued granular rules whose confidences are not less than the threshold. Last, they proposed a confidence-preserved attribute reduction approach based on the implication relationship.
4) Generally, people study attribute selection from two aspects of relation reduction and information entropy reduction. In this way, two approaches of attribute selection are formed. This article has given attribute selection approaches for incomplete interval-value data. θ-reduction and θ-information entropy-reduction for incomplete interval-value data have been proposed. It is worth mentioning that connections between these two reducts have been researched. We has proved that these two reducts are essentially the same. This is one of the main contributions of this paper. By using these two approaches of attribute selection, reduction algorithms based on θ-discernibility matrix, θ-information entropy and θ-significance in an IIVIS have been given, respectively. Time complexity and space complexity of these three algorithms have been presented. In future work, we will study applications of attribute selection approaches for incomplete interval-value data.
Footnotes
Acknowledgments
The authors would like to thank the editors and the anonymous reviewers for their valuable comments and suggestions, which have helped immensely in improving the quality of the paper. This work is supported by National Natural Science Foundation of China (11971420), Natural Science Foundation of Guangxi (2018GXNSFDA294003, 2018GXNSFAA294134), Guangxi Higher Education Institutions of China ([2019] 52), Special Scientific Research Project of Young Innovative Talents in Guangxi (2019 AC20052), Key Laboratory of Software Engineering in Guangxi University for Nationalities (2018-18XJSY-03), Research Project of Institute of Big Data in Yulin (YJKY03) and Engineering Project of Undergraduate Teaching Reform of Higher Education in Guangxi (2017JGA179).
