Abstract
Set-valued data is a significant kind of data, such as data obtained from different search engines, market data, patients’ symptoms and behaviours. An information system (IS) based on incomplete set-valued data is called an incomplete set-valued information system (ISVIS), which generalized model of a single-valued incomplete information system. This paper gives feature selection for an ISVIS by means of uncertainty measurement. Firstly, the similarity degree between two information values on a given feature of an ISVIS is proposed. Then, the tolerance relation on the object set with respect to a given feature subset in an ISVIS is obtained. Next, λ-reduction in an ISVIS is presented. What’s more, connections between the proposed feature selection and uncertainty measurement are exhibited. Lastly, feature selection algorithms based on λ-discernibility matrix, λ-information granulation, λ-information entropy and λ-significance in an ISVIS are provided. In order to better prove the practical significance of the provided algorithms, a numerical experiment is carried out, and experiment results show the number of features and average size of features by each feature selection algorithm.
Introduction
Research background and related works
Rough set theory (RST) put forward by Pawlak [15]. The rough set model has been extensively extended. One is to put equivalence relations into tolerance (reflexive and symmetric) relations, dominance (reflexive and transitive) relations, and even reflexive relations; the other is to extend the division of equivalence classes to cover, which can be used to deal with more complex knowledge acquisition and feature selection. RST is a significant method for dealing with uncertainty. Many applications of RST are connected with an information system (IS) [1, 15].
An IS based on RST was also introduced by Pawlak. A set-valued information system (SVIS) means an IS where its information values are sets. As a common IS, a SVIS has been studied by many scholars.
As an important tool for estimating information, uncertainty measurement has been applied. We refer to the papers about uncertainty measurement of other scholars. For instance, Dai et al. [4] thought about entropy and granularity measures for a SVIS. Dai et al. [4] thought about entropy measure in a SVIS; Li et al. [14] used four kinds of entropy to measure the uncertainty of a fuzzy relation IS.
Feature selection is a significant issue in artificial intelligence. In the framework of RST [15], feature selection is also called attribute reduction, which is to find a minimal feature subset that provides the same discriminating information as the whole set of features. Specifically, some features in an IS are redundant. We want to find a reduct that has the fewest features. Feature selection in an IS mean deleting redundant features under keeping the classification ability. Feature selection has attracted a great deal of attention. For instance, Song et al. [19] discussed feature selection in a set-valued decision information system (SVDIS); Liu et al. [14] presented feature selection in a SVDIS based on dominance relation; Chen et al. [3] proposed feature selection in a SVIS based on a tolerance relation.
Sometimes we encounter incomplete data sets with missing values when analyzing data. Many scholars have carried out research on this situation. Tkachenko et al. [20] proposed a method to deal with the missing data in the Internet of Things (IoT) based on the GRNN-SGTM Ensemble. Izonin et al. [9] used high-performance extended-input neural-like structure to recover the missing values of IoT sensed Data. Izonin et al. [10] presented a prediction method for probable recovery of partially missing or completely lost data based on the improved GRNN-SGTM ensemble method. Li et al. [11, 13] studied fast assignment feature selection and quick feature selection in inconsistent incomplete decision information systems, respectively.
Motivation and inspiration
Handling incomplete data is becoming increasingly significant. However, the problem of feature selection for incomplete set-valued data has not been reported. So this paper focuses on it. Similarity measurement in an incomplete set-valued information system (ISVIS) is studied and the tolerance relations based on this measurement are put forward. Then, λ-discernibility matrix in an ISVIS is obtained by means of the given threshold λ. Next, λ-information granulation, λ-information entropy and λ-significance in an ISVIS are defined. Thus, some feature selection algorithms in an ISVIS based on them can given. And the work process is displayed in Fig. 1.

The work process of the paper.
In this part, we discuss several references for ISVISs so as to see the innovation and contribution of this paper.
1) Lang et al. [12] proposed three relations for feature selection of a SVDIS. Then, they designed an incremental algorithm to compress a dynamic SVDIS. Concretely, they mainly addressed the compression updating from three aspects: variations of attribute set, immigration and emigration of objects and alterations of attribute values.
2) Dai et al. [5] gave the relative bound difference similarity degree between two objects and presented a λ-tolerance relation by using this similarity degree. Moreover, they brought up some information theory concepts, including entropy, conditional entropy, and joint entropy. Based on these concepts, they provided an information theory view for feature selection in a SVDIS.
3) Wang et al. [21] dealt with feature selection of a SVDIS based on λ-level tolerance relation. They introduced the concepts of distribution reduct based on a SVDIS and examined the judgement theorems and discernibility matrices associated with the λ-level tolerance relation.
4) Liu et al. [14] defined a dominance relation in a SVDIS based on similarity degree of attribute value sets. They a discernibility matrix by this dominance relation and studied the problem of feature selection and the judgment in a SVDIS.
5) Singh et al. [18] introduced a new similarity degree between two set-valued attribute values and then proposed a fuzzy similarity-based rough set approach based on this fuzzy tolerance relation. In addition, feature selection of a SVDIS based on degree of dependency is postulated.
6) This paper aims to concentrate on feature selection for incomplete set-valued data. Given an ISVIS, we propose similarity measurement in an ISVIS and put forward the tolerance relations in an ISVIS based on the similarity measurement. Then, we can obtain θ-conditional entropy in an ISVIS by means of this tolerance relation. Lastly, we propose feature selection algorithms based on λ-discernibility matrix, λ-information granulation, λ-information entropy and λ-significance in an ISVIS. Generally, people study feature selection from two aspects of relation reduction and information entropy reduction. In this way, two methods of feature selection are formed. It has been proved that these two reducts are essentially the same. Thus, these two methods are actually the same. This is the main contribution of this paper.
The comparison and discussion between this paper and several representative literatures are shown in Table 1.
Organization and structure
The rest of this paper is designed as below. Section 2 retrospects an ISVIS. Section 3 gives the tolerance relation in an ISVIS. Section 4 investigates λ-reduction in an ISVIS. Section 5 gives feature selection algorithms in an ISVIS based on λ-discernibility matrix, λ-information granulation, λ-information entropy and λ-significance. Section 6 concludes this paper.
Preliminaries
In this paper, U and AT denote two finite universes, 2 U expresses a set of all subsets of U and |X| means the number of elements in X ∈ 2 U .
Put
Let R be a binary relation on U. Then R is said to be a universal relation on U whenever R = δ; R is said to be an identity relation on U whenever R =▵.
Consider that U is a finite object set and AT is a finite feature set. If each feature a ∈ AT determines an information function a : U → V a , where V a = {a (u) : u ∈ U} is the set of information function values of the feature a, then (U, AT) is said to be an information system (IS).
Let (U, AT) be an IS. If there are u ∈ U and a ∈ AT such that a (u) is missing, then (U, AT) is regarded as an incomplete information system (IIS). For each a ∈ AT, denote
If P ⊆ AT, then (U, P) is referred to as a subsystem of (U, AT).
The comparison of this paper with other literatures
The comparison of this paper with other literatures
We have
In this section, the tolerance relation induced by each subsystem in an ISVIS is given.
The similarity degree between information values on a feature
s (a (u) , a (v)) =
For the convenience of expression, denote
An ISVIS
An ISVIS
Obviously,
In addition, if P = {a}, then
(1) If P1 ⊆ P2 ⊆ AT, then ∀ λ ∈ (0, 1],
(2) If 0 ≤ λ1 ≤ λ2 ≤ 1, then ∀ P ⊆ AT,
For any λ ∈ (0, 1], P ⊆ AT and u ∈ U, denote
Clearly,
In addition, express
(1) If C ⊆ P ⊆ AT, then ∀ λ ∈ (0, 1] and u ∈ U,
(2) If 0 ≤ λ1 ≤ λ2 ≤ 1, then ∀ P ⊆ AT and u ∈ U,
□
Pick λ = 0.4. Then ∀ i and u ∈ U, the tolerance class
The tolerance class of each object under
The tolerance class of each object under
The tolerance class of each object under
λ-reduction, λ-core and λ-discernibility matrix in an ISVIS
Given an ISVIS (U, AT), each subset P of AT determines a tolerance relation (or indiscernibility relation)
(1) P is regarded as a λ-coordinate subset of AT, if
(2) a ∈ P is said to be λ-independent in P, if
(3) P is regarded as a λ-reduct of AT, if P is both λ-coordinate and λ-independent.
“
In this paper, the family of all λ-coordinate subsets (resp., λ-reducts) of AT is denoted by co λ (AT) (resp., red λ (AT)).
Obviously,
P ∈ red λ (AT) means that P is a minimal feature subset where the subsystem (U, P) has the same classification ability as the system (U, AT) with respect to a given λ.
If ∃ a1 ∈ AT, AT - {a1} ∈ co λ (AT), then we consider AT - {a1}. Again if ∀ a ∈ AT - {a1}, (AT - {a1}) - {a} ∉ co λ (AT), then AT - {a1} ∈ red λ (AT). Again if ∃ a2 ∈ AT - {a1}, (AT - {a1}) - {a2} ∈ co λ (AT), then we consider AT - {a1, a2}. Repeat this process. Since AT is finite, a λ-reduct of AT can be found.□
Then
(1) D λ (u, v) is said to be the λ-discernibility set on u and v.
(2)
Some properties
(1) D λ (u, u) =∅.
(2) D λ (u, v) = D λ (v, u).
(3) D λ (u, v) ⊆ D λ (u, w) ∪ D λ (w, v).
(3) Suppose that D
λ
(u, v) ⊊ D
λ
(u, w) ∪ D
λ
(w, v). Then D
λ
(u, v) - D
λ
(u, w)∪ D
λ
(w, v) ≠ ∅. Pick
a ∈ D
λ
(u, v) implies
Since a ∉ D
λ
(u, w) ∪ D
λ
(w, v), we have a ∉ D
λ
(u, w) and a ∉ D
λ
(w, v). Then
Thus D λ (u, v) ⊆ D λ (u, w) ∪ D λ (w, v).□
Thus P ∩ D λ (u, v) ≠ ∅ .
“⇐". Suppose P ∉ co
λ
(AT). Then
Since
Note that
Hence P ∈ co λ (AT).□ λ-discernibility sets can easily determine λ-reducts.
P ∈ red
λ
(AT) ⇔ (i) if
(ii) ∀ a ∈ P,
(1) a ∈ AT is regarded as a necessary λ-feature, if a ∈ core λ (AT).
(2) a ∈ AT is said to be a relatively necessary λ-feature, if a ∈ ⋃ P∈red λ (AT)P - core λ (AT).
(3) a ∈ AT is regarded as an unnecessary λ-feature, if a ∈ AT - ⋃ P∈red λ (AT)P.
“⇐". Denote red λ (AT) = {P k : 1 ≤ k ≤ n}. We only need to prove n = 1.
Suppose n ≥ 2. Since core
λ
(AT) ∈ red
λ
(AT), there exists i such that core
λ
(AT) = P
i
. Pick j ≠ i. Then
Thus n = 1.□ λ-discernibility sets can easily determine the λ-core.
(1) a is a necessary λ-feature;
(2) a is λ-independent in AT;
(3) ∃ u, v ∈ U, D λ (u, v) = {a}.
(1) ⇒ (2). Let a be a necessary λ-feature. Suppose that a is not λ-independent in AT. Then
P ⊆ AT - {a} implies a ∉ P. Then a is not a necessary λ-feature. This is a contradiction.
(2) ⇒ (1). Let a be λ-independent in AT. Suppose that a is not a necessary λ-feature. Then ∃ P ∈ red
λ
(AT), a ∉ P. So P ⊆ AT - {a} ⊆ AT. This implies
Since P ∈ red
λ
(AT), we have
(2) ⇒ (3). Since a is λ-independent in AT, we have
Then a j ∈ D λ (u, v), a i ∉ D λ (u, v) (i ≠ j) .
Thus D λ (u, v) = {a j } = {a}.
(3) ⇒ (2). Since ∃ u, v ∈ U, D
λ
(u, v) = {a}, we have
Then
Thus
(1) a is an unnecessary λ-feature;
(2) ∀ P ∈ co λ (AT), P - {a}∈ co λ (AT) ;
(3)
(4)
(1) ⇒ (2). Suppose P ∈ co
λ
(AT). By Proposition 4.2, ∃ C ⊆ P, C ∈ red
λ
(AT). Since a is an unnecessary λ-feature, we have a ∉ C, which implies C ⊆ AT - {a}. Then
Note that P ∈ co
λ
(AT) and C ∈ red
λ
(AT). Then
Thus
Hence P - {a} ∈ co λ (AT) .
(2) ⇒ (3) ⇒ (4) are obvious.
(4) ⇒ (1). Suppose that a is not an unnecessary λ-feature. Then ∃ P ∈ red
λ
(AT), a ∈ P. This implies P - {a} ⊂ P. Since P ∈ red
λ
(AT), we have P - {a} ∉ co
λ
(AT). Then
Pick
Since P ∈ co
λ
(AT) and
(1) a is a necessary λ-feature ⇔ AT - {a} ∉ co λ (AT);
(2) a is a relatively necessary λ-feature ⇔ AT - {a} ∈ co
λ
(AT) and
(3) a is an unnecessary λ-feature ⇔ ∀ P ∈ co λ (AT), P - {a} ∈ co λ (AT) .
(1) a4, a7 and a8 are three necessary λ-feature;
(2) a2, a3, a5 and a6 are four relatively necessary λ-features;
(3) a1 is an unnecessary λ-features.
Uncertainty measurement for an ISVIS
By Definition 4.15,
If
If
(1) If P1 ⊆ P2 ⊆ AT, then ∀ λ ∈ (0, 1], G λ (P2) ≤ G λ (P1).
(2) If 0 < λ1 ≤ λ2 ≤ 1, then for any P ⊆ AT, G λ 2 (P) ≤ G λ 1 (P).
By Definition 4.18,
If
If
(1) If P1 ⊆ P2 ⊆ AT, then ∀ λ ∈ (0, 1], H λ (P1) ≤ H λ (P2).
(2) If 0 < λ1 ≤ λ2 ≤ 1, then for any P ⊆ AT, H λ 1 (P) ≤ H λ 2 (P).
(1) P ∈ co (AT);
(2) G λ (P) = G λ (AT) ;
(3) H λ (P) = H λ (AT) .
(1) ⇒ (2). This is obvious.
(2) ⇒ (1). Suppose G
λ
(P) = G
λ
(AT). Then, we have
So
Note that
So ∀ i,
Thus R P = R AT .
Hence
(1) ⇒ (3). This is clear.
(3) ⇒ (1). Suppose H
λ
(P) = H
λ
(AT). Then, we have
So
Note that
Thus ∀ i,
Consequently, R P = R AT .
Hence
We stipulate
(1) P ∈ red (AT);
(2) G λ (P) = G λ (AT) and ∀ a ∈ P, G λ (P - {a}) ≠ G λ (AT);
(3) H λ (P) = H λ (AT) and ∀ a ∈ P, H λ (P - {a}) ≠ H λ (AT);
(4) G
λ
(P) = G
λ
(AT) and ∀ a ∈ P,
(5) H
λ
(P) = H
λ
(AT) and ∀ a ∈ P,
Algorithms on λ-reduction in an ISVIS
Below, we give an algorithm on λ-reduction in an ISVIS by using mathematical logic.
Let (U, AT) be an ISVIS. ∀ a ∈ AT, we specify a Boolean variable “a”. If D λ (u, v) = {a1, a2, ·· · , a k } with u, v ∈ U, then we specify a Boolean function a1 ∨ a2 ∨ ·· · ∨ a k .
Denote
We stipulate that ∨ ∅ =1 and ∧ ∅ =0 where 0 and 1 are two Boolean constants.
▵0.4 (AT) = a8 ∧ a4 ∧ a7 ∧ (a4 ∨ a8) ∧ (a3 ∨ a4) ∧ (a1 ∨ a4) ∧ (a4 ∨ a7) ∧ (a4 ∨ a5) ∧ (a1 ∨ a7) ∧ (a4 ∨ a7 ∨ a8) ∧ (a4 ∨ a5 ∨ a7) ∧ (a1 ∨ a4 ∨ a6) ∧ (a4 ∨ a6 ∨ a7) ∧ (a4 ∨ a6 ∨ a8) ∧ (a3 ∨ a4 ∨ a8) ∧ (a3 ∨ a7 ∨ a8) ∧ (a2 ∨ a4 ∨ a8) ∧ (a2 ∨ a6 ∨ a7) ∧ (a2 ∨ a4 ∨ a7) ∧ (a2 ∨ a7 ∨ a8) ∧ (a6 ∨ a7 ∨ a8) ∧ (a3 ∨ a4 ∨ a6 ∨ a7) ∧ (a4 ∨ a5 ∨ a6 ∨ a7) ∧ (a4 ∨ a6 ∨ a7 ∨ a8) ∧ (a1 ∨ a3 ∨ a4 ∨ a7) ∧ (a1 ∨ a4 ∨ a7 ∨ a8) ∧ (a2 ∨ a4 ∨ a7 ∨ a8) ∧ (a1 ∨ a4 ∨ a6 ∨ a7) ∧ (a2 ∨ a5 ∨ a6 ∨ a8) ∧ (a1 ∨ a4 ∨ a5 ∨ a8) ∧ (a2 ∨ a4 ∨ a5 ∨ a7) ∧ (a1 ∨ a3 ∨ a5 ∨ a7) ∧ (a1 ∨ a2 ∨ a4 ∨ a7) ∧ (a3 ∨ a4 ∨ a5 ∨ a7) ∧ (a2 ∨ a3 ∨ a5 ∨ a6) ∧ (a3 ∨ a4 ∨ a6 ∨ a7 ∨ a8) ∧ (a2 ∨ a3 ∨ a5 ∨ a7 ∨ a8) ∧ (a2 ∨ a3 ∨ a4 ∨ a5 ∨ a7) ∧ (a1 ∨ a2 ∨ a4 ∨ a5 ∨ a7) ∧ (a2 ∨ a4 ∨ a5 ∨ a7 ∨ a8) ∧ (a1 ∨ a3 ∨ a4 ∨ a7 ∨ a8) ∧ (a1 ∨ a2 ∨ a5 ∨ a6 ∨ a7) ∧ (a1 ∨ a3 ∨ a4 ∨ a5 ∨ a8) ∧ (a1 ∨ a3 ∨ a4 ∨ a6 ∨ a7) ∧ (a2 ∨ a3 ∨ a4 ∨ a5 ∨ a6 ∨ a7) ∧ (a1 ∨ a2 ∨ a4 ∨ a5 ∨ a6 ∨ a7) ∧ (a2 ∨ a4 ∨ a5 ∨ a6 ∨ a7 ∨ a8) ∧ (a1 ∨ a2 ∨ a3 ∨ a4 ∨ a5 ∨ a7) ∧ (a1 ∨ a2 ∨ a3 ∨ a5 ∨ a6 ∨ a7) ∧ (a2 ∨ a3 ∨ a4 ∨ a5 ∨ a6 ∨ a8) ∧ (a1 ∨ a3 ∨ a4 ∨ a5 ∨ a6 ∨ a7) ∧ (a1 ∨ a2 ∨ a3 ∨ a4 ∨ a7 ∨ a8) ∧ (a2 ∨ a3 ∨ a4 ∨ a5 ∨ a6 ∨ a7 ∨ a8)
Denote
Define
⋁d ij ≤ ⋁ d kl ⇔ d ij ⊆ d kl , ∀ ⋁ d ij , ⋁ d kl ∈ L (AT) .
Denote
(⋁ d ij ) ⋃ (⋁ d kl ) = ⋁ (d ij ∪ d kl ) ,
(⋁ d ij ) ⋂ (⋁ d kl ) = ⋁ (d ij ∩ d kl ) .
Thus (L (AT) , ≤ , ⋃ , ⋂) is a lattice with top element and bottom element.□
Pick λ = 0.4, we have a4 ∧ (a4 ∨ a8) = a4, a4 ∧ (a3 ∨ a4) = a4, a4 ∧ (a1 ∨ a4) = a4, a4 ∧ (a4 ∨ a7) = a4, a4 ∧ (a4 ∨ a5) = a4, a4 ∧ (a4 ∨ a7 ∨ a8) = a4, a4 ∧ (a4 ∨ a5 ∨ a7) = a4, a4 ∧ (a1 ∨ a4 ∨ a6) = a4, a4 ∧ (a4 ∨ a6 ∨ a7) = a4, a4 ∧ (a4 ∨ a6 ∨ a8) = a4, a4 ∧ (a3 ∨ a4 ∨ a8) = a4, a4 ∧ (a2 ∨ a4 ∨ a8) = a4, a4 ∧ (a2 ∨ a4 ∨ a7) = a4, a4 ∧ (a3 ∨ a4 ∨ a6 ∨ a7) = a4, a4 ∧ (a4 ∨ a5 ∨ a6 ∨ a7) = a4, a4 ∧ (a4 ∨ a6 ∨ a7 ∨ a8) = a4, a4 ∧ (a1 ∨ a3 ∨ a4 ∨ a7) = a4, a4 ∧ (a1 ∨ a4 ∨ a7 ∨ a8) = a4, a4 ∧ (a2 ∨ a4 ∨ a7 ∨ a8) = a4, a4 ∧ (a1 ∨ a4 ∨ a6 ∨ a7) = a4, a4 ∧ (a1 ∨ a4 ∨ a5 ∨ a8) = a4, a4 ∧ (a2 ∨ a4 ∨ a5 ∨ a7) = a4, a4 ∧ (a1 ∨ a2 ∨ a4 ∨ a7) = a4, a4 ∧ (a3 ∨ a4 ∨ a5 ∨ a7) = a4, a4 ∧ (a3 ∨ a4 ∨ a6 ∨ a7 ∨ a8) = a4, a4 ∧ (a2 ∨ a3 ∨ a4 ∨ a5 ∨ a7) = a4, a4 ∧ (a1 ∨ a2 ∨ a4 ∨ a5 ∨ a7) = a4, a4 ∧ (a2 ∨ a4 ∨ a5 ∨ a7 ∨ a8) = a4, a4 ∧ (a1 ∨ a3 ∨ a4 ∨ a7 ∨ a8) = a4, a4 ∧ (a1 ∨ a3 ∨ a4 ∨ a5 ∨ a8) = a4, a4 ∧ (a1 ∨ a3 ∨ a4 ∨ a6 ∨ a7) = a4, a4 ∧ (a2 ∨ a3 ∨ a4 ∨ a5 ∨ a6 ∨ a7) = a4, a4 ∧ (a1 ∨ a2 ∨ a4 ∨ a5 ∨ a6 ∨ a7) = a4, a4 ∧ (a2 ∨ a4 ∨ a5 ∨ a6 ∨ a7 ∨ a8) = a4, a4 ∧ (a1 ∨ a2 ∨ a3 ∨ a4 ∨ a5 ∨ a7) = a4, a4 ∧ (a2 ∨ a3 ∨ a4 ∨ a5 ∨ a6 ∨ a8) = a4, a4 ∧ (a1 ∨ a3 ∨ a4 ∨ a5 ∨ a6 ∨ a7) = a4, a4 ∧ (a1 ∨ a2 ∨ a3 ∨ a4 ∨ a7 ∨ a8) = a4, a4 ∧ (a2 ∨ a3 ∨ a4 ∨ a5 ∨ a6 ∨ a7 ∨ a8) = a4, a7 ∧ (a1 ∨ a7) = a7, a7 ∧ (a3 ∨ a7 ∨ a8) = a7, a7 ∧ (a2 ∨ a6 ∨ a7) = a7, a7 ∧ (a2 ∨ a7 ∨ a8) = a7, a7 ∧ (a1 ∨ a3 ∨ a5 ∨ a7) = a7, a7 ∧ (a2 ∨ a3 ∨ a5 ∨ a7 ∨ a8) = a7, a7 ∧ (a1 ∨ a2 ∨ a5 ∨ a6 ∨ a7) = a7, a7 ∧ (a1 ∨ a2 ∨ a3 ∨ a5 ∨ a6 ∨ a7) = a7, a8 ∧ (a2 ∨ a5 ∨ a6 ∨ a8) = a8 .
Then
Thus
(i) Clearly,
Since
Then ∀ u, v ∈ U, ⋀P k 0 ⟶ ⋁ D λ (u, v).
So
Now ⋀P
k
0
⇔ a
k
0
l
∀ l ≤ p
k
0
and ⋁D
λ
(u, v) ⟷ a for some a ∈ D
λ
(u, v). Then
So
Thus
By Proposition 4.7, P k 0 ∈ co λ (AT).
(ii) To prove P k 0 ∈ red λ (AT), by Theorem 3.8, we only need to show that
Suppose that ∃ a0 ∈ P
k
0
such that (P
k
0
- {a0})∩ D
λ
(u, v) ≠ ∅ for any
Thus
It follows that ∀ u, v ∈ U,
Since
(⋀ P k 0 ) ⋁ (⋀ (P k 0 - {a0}))
= ((⋀ (P k 0 - {a0})) ⋀ {a0}) ⋁ ((⋀ (P k 0 - {a0})) ⋀1)
= (⋀ (P k 0 - {a0})) ⋀ ({a0} ⋁1)
= (⋀ (P k 0 - {a0})) ⋀1
= ⋀ (P k 0 - {a0}).
So P k 0 ∉ {P k : k ≤ q}. This is a contradiction.
Thus P k 0 ∈ red λ (AT). This shows that red λ (AT) ⊇ {P k : k ≤ q}.
(2) Let P ∈ red
λ
(AT). Then P ∈ co
λ
(AT). By Proposition 4.7, P∩ D
λ
(u, v) ≠ ∅ for any
Similar to the proof of (1) (ii), we can show that P ∈ {P k : k ≤ q} .
Thus red λ (AT) ⊆ {P k : k ≤ q}.
Hence
In order to ensure that there always exists a λ-reduct of AT, we demand
We can get a feature selection algorithm based on λ-discernibility matrix in an ISVIS (see Algorithm 2).
Below, we analyze the time complexity and space complexity of Algorithm 2. |AT| and |U| are denoted by the the numbers of features and objects, respectively. It should be pointed out that we need to compute such the tolerance relation matrices with respect to each feature that cost O (|AT||U|2) and the λ-discernibility matrix that cost O (|AT| (|U|2 - |U|)/2). Now we analyze the time of computing the reducts. Firstly, we use idempotent properties and absorptive properties to reduce conjunctive normal form. In the worst case, it costs O (((|U|2 - |U|)/2) ((|U|2 - |U|)/2)). Conjunctive normal form can be changed into disjunctive normal form by using the incremental algorithm in [24]. Then, we can call this algorithm to solve the reducts. We ignore the computation time here. Thus, the overall time complexity of Algorithm 2 is O ((|U|4 + 2|U|3)/4 + (6|AT||U|2 + 1)/4 - |AT||U|/2). The space for all the subsequent matrices and local variables can be reused. So, the space complexity of Algorithm 2 is O (|AT||U|2).
From Example 5.6, we obtain
Thus red0.4 (AT) = {{a2, a4, a7, a8} , {a3, a4, a7, a8} , {a4, a5, a7, a8} , {a4, a6, a7, a8}}.
Obviously, core0.4 (AT) = {a4, a7, a8}.
Based on relationships between λ-information granulation and the proposed feature selection, we can get a feature selection algorithm based on λ-information granulation in an ISVIS (see Algorithm 3).
Based on relationships between λ-information entropy and the proposed feature selection, we can get the following algorithm.
Algorithms 3-4 employ λ-information entropy G λ and H λ to determine the optimal feature which is added into the current selected feature subset in each loop. In the worst case, this process is terminated when the whole feature set has been exhausted. The worst search time for a reduct will result in |AT| evaluations. Thus, the overall time complexity of Algorithm 3-4 are O (|AT||U|2 + |AT|). Also, at the initial stage, |AT| fuzzy relation matrices of |U| × |U| are computed and stored corresponding to each individual features. Similarly, the space complexity of Algorithm 3 is O (|AT||U|2).
Based on relationships between λ-significance and the proposed feature selection, we can get two feature selection algorithms based on λ-significance in an ISVIS (see Algorithms 5 and 6).
Here, the worst search time for a reduct will result in |AT| (|AT|+1)/2 evaluations by Algorithm 5-6. So the time complexity and space complexity of Algorithm 5-6 are O (|AT||U|2 + (|AT|2 + |AT|)/2) and O (|AT||U|2), respectively.
The tolerance class of each object under
(a = a5, a6, λ = 0.4)
The tolerance class of each object under
In fact, the data set is a complete data. However, we need to study incomplete set value data. Therefore, We delete some data from raw data set to get the missing data set. In my paper, we randomly missed at a probability of 0.6 in |U||A|. In summary, some incomplete set value data are gained by this way.
Based on the above preparation, we take the new data to do the experiment.
(1) λ-feature selection algorithm:
We randomly generate ten incomplete set-valued data sets and then use λ-feature selection algorithm to reduce. The result is as follows in Table 15. Table 16 gives reduction set by λ-feature selection algorithms.
From Table 15, no reduction is one time, 1 reduction is one time, 2 reductions are four times and 3 reductions are four times in the 10 tests. The average size of the reduct is 2.95.
A practical conjunctive set-valued information system from the test for foreign language ability in Shanxi University
Reduction set by λ-feature selection algorithm
Fig. 2 shows the number of features for each reduction with λ-red algorithm.

Number of features by λ-feature selection algorithm.
(2) G λ -feature selection algorithm:
We randomly generate ten incomplete set-valued data sets. Here we run G λ -red algorithm on each data set and get ten reduction results. The average size of the reduct is 3.6. Fig. 3 shows the average of the number of selected features on each data set in 10 tests.

Number of features by G λ -red algorithm.

Number of features by H λ -red algorithm.

Number of features by G λ -sig algorithm.
(3) H λ -feature selection algorithm:
We randomly generate ten incomplete set-valued data sets. Here we run H λ -red algorithm on each data set and get ten reduction results. The average size of the reduct is 3.4. Fig.4 shows the average of the number of selected features on each data set in 10 tests.
(4) G λ -significance feature selection algorithm:
We randomly generate ten incomplete set-valued data sets. Here we run G λ -sig algorithm on each data set and get ten reduction results. The average size of the reduct is 3.2. Fig.5 shows the average of the number of selected features on each data set in 10 tests.
(5) H λ -significance feature selection algorithm:
We randomly generate ten incomplete set-valued data sets. Here we run H λ -sig algorithm on each data set and get ten reduction results. The average size of the reduct is 3.1. Fig.6 shows the average of the number of selected features on each data set in 10 tests.

Number of features by H λ -sig algorithm.
Fig.7 compares the average reduct size of the five algorithms.

Average size of features by each feature selection algorithm.
This paper has given feature selection for an ISVIS by using the combination of similarity measurement and uncertainty measurement. Based on λ-reduction, features in an ISVIS have been divided into three categories according to their importance. Connections between the proposed feature selection and uncertainty measurement have been researched. Feature selection algorithms based on λ-discernibility matrix, λ-information granulation, λ-information entropy and λ-significance in an ISVIS have been proposed. Time complexity and space complexity of these algorithms have been presented. Feature selection is studied from two aspects of relation reduction and information entropy reduction. In this way, two methods of feature selection are formed. Theorem 4.23 proves that relation reduction and information entropy reduction for incomplete set-valued data are essentially the same. Thus, these two methods are actually the same. The proposed algorithms has been compared by means of an example. Due to the lack of research on feature selection for incomplete set-valued data, we did not compare the proposed algorithms with other algorithms, this is the deficiency of this paper. Before long, we will study applications of feature selection for incomplete set-valued data.
Footnotes
Acknowledgments
The authors would like to thank the editors and the anonymous reviewers for their valuable comments and suggestions, which have helped immensely in improving the quality of the paper. This work is supported by the Project of Improving the Basic Scientific Research Ability of Young and Middle-aged Teachers in Guangxi Universities (2018KY0470).
