Information granularity-based incremental feature selection for partially labeled hybrid data

Abstract

Feature selection can effectively reduce the dimensionality of data. Most existing rough set-based feature selection approaches focus on static data of a single type. However, in many real-world applications, data sets are hybrid, containing symbolic, numerical and missing feature values, and the object set often changes dynamically over time. Moreover, since acquiring all the decision labels of hybrid data is expensive and time-consuming, only a small portion of the decision labels is usually obtained. Therefore, in this paper, incremental feature selection algorithms based on information granularity are developed for dynamic partially labeled hybrid data with the variation of an object set. First, the information granularity is given to measure the feature significance for partially labeled hybrid data. Then, incremental mechanisms of information granularity are proposed with the variation of an object set. On this basis, incremental feature selection algorithms with the variation of a single object and of a group of objects are proposed, respectively. Finally, extensive experimental results on different UCI data sets demonstrate that, compared with the non-incremental feature selection algorithms, the incremental feature selection algorithms can select a subset of features in a shorter time without losing classification accuracy. In particular, when a group of objects changes dynamically, the group incremental feature selection algorithm is more efficient.

Keywords

Partially labeled hybrid data, incremental learning, information granularity, feature selection

1. Introduction

With the development of the Internet of Things, artificial intelligence and other information technologies, many features are collected to characterize the data in different practical applications. However, the high dimensionality of data may reduce the efficiency of classification algorithms and degrade the performance of classifiers. Feature selection [1, 2, 3, 4, 5, 6] is an effective data preprocessing method, which can reduce the data dimension and improve the data compactness and the efficiency of knowledge discovery. As an important theory of granular computing [4], rough set theory has become an active research topic in the fields of feature selection, knowledge discovery, data mining and so on [7, 8, 9, 10]. The biggest advantage of this theory is that it does not need any prior knowledge other than the data itself. It can directly process the data and mine potentially useful knowledge from data sets. Therefore, we focus on the feature selection method using rough set theory in this paper. Many real-world data sets are hybrid data including symbolic, numerical and missing features [11, 12, 13, 14, 15, 16, 17, 18]. At the same time, acquiring the label information of data is expensive, since manual labeling requires costly resources or a long experimental process. Therefore, partially labeled hybrid data arise, i.e., only a small portion of the decision labels of the hybrid data is obtained. For example, a patients’ health system is a hybrid information system which consists of numerical features such as blood glucose and symbolic features such as pain site. In addition, for privacy protection, some patients’ information is hidden, so some feature values are missing. Meanwhile, doctors need to judge whether a patient is sick or not according to the patient’s health information, i.e., to provide the label information.
Since this judgment process is time-consuming and laborious, a large number of ‘unlabeled’ patients exist in the system. How to select a feature subset from partially labeled hybrid data is a hot topic in the fields of knowledge discovery, data mining and intelligent information processing [19, 20, 21]. The relevant feature selection algorithms are mainly designed for static partially labeled data. However, in the era of big data, data sets change dynamically in many practical applications [27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]. The dynamic changes of data sets include the variations of the object set, the feature set and the feature values. As the main method of processing dynamic data, incremental learning has attracted much attention from many researchers. At present, research on feature selection for dynamic partially labeled hybrid data is still scarce and needs further study. Therefore, this paper proposes incremental feature selection algorithms for partially labeled hybrid data with the variation of an object set.

The main contributions of this paper are as follows. (1) Incremental computations of information granularity are given when an object set in the partially labeled hybrid data changes dynamically; (2) Incremental feature selection algorithms are developed when a single object and a group of objects change dynamically in the partially labeled hybrid data, respectively; (3) The efficiency and effectiveness of the proposed incremental feature selection algorithms are demonstrated by experimental results.

The remainder of the paper is organized as follows. Section 2 discusses previous studies on feature selection and incremental learning. Section 3 briefly reviews some basic concepts involved in this paper. In Section 4, the non-incremental feature selection algorithm for dynamic partially labeled hybrid data is given. In Section 5, the incremental feature selection algorithms with the variation of a single object and of a group of objects based on the information granularity are proposed, respectively. In Section 6, the experimental analyses of the non-incremental and the incremental feature selection algorithms on six UCI data sets are given. Section 7 presents the conclusions and future work.

2. Related works

At present, some achievements have been made in the research of feature selection for the partially labeled data. For large-scale single type data, Wang et al. [19] proposed a feature selection algorithm based on the complementary information entropy for partially labeled symbolic data. Dai et al. [20] proposed a feature selection algorithm based on discernible matrix for partially labeled symbolic data. Liu et al. [21] proposed a rough set based feature selection algorithm via ensemble selector for partially labeled numerical data. In recent years, many scholars have proposed corresponding feature selection algorithms for other different types of data. For large-scale hybrid data consisting of categorical and numerical data, Xiao et al. [22] proposed a feature selection algorithm based on feature dependence. Wang and Liang [23] proposed an efficient feature selection algorithm based on the technology of decomposition and fusion. For the heterogeneous data with symbolic and real-valued condition features, Chen and Yang [24] proposed an attribute reduction algorithm by the combination of the classical rough set and the fuzzy rough set. Zhang et al. [25] proposed a feature selection algorithm based on the fuzzy rough set theory for mixed-type data including symbolic and numerical features. Kim et al. [26] proposed the feature selection algorithm based on rough set model for mixed-type data including categorical and numerical features. The above feature selection algorithms are proposed for static data.

Incremental feature selection has attracted much research interest because of the dynamic nature of data in practical applications. For example, with the variation of an object set, Yang et al. [27] proposed an incremental feature selection algorithm based on fuzzy rough set for the dynamic incremental objects. Liang et al. [28] proposed a group incremental feature selection algorithm based on information entropy for discrete data. Ma et al. [29] proposed a group incremental attribute reduction based on compressed binary discernibility matrix for discrete data. Shu et al. [30] proposed two incremental feature selection algorithms through the use of the dependency function when a group of objects is added into or deleted from the discrete data, respectively. Zhang et al. [31] proposed an incremental feature selection algorithm by updating the approximations based on neighborhood rough set when a single object or a group of objects changes dynamically. With the variation of a feature set, Jing et al. [32] proposed an incremental feature selection algorithm based on relation matrix for multiple objects and features changing simultaneously. Zheng et al. [33] proposed an incremental attribute reduction algorithm by using the relationship matrix when the attributes increase. Wang et al. [34] proposed the attribute reduction algorithm based on three representative entropies for data sets with increasing attributes. With the variation of feature values, Wei et al. [35] proposed an incremental feature selection algorithm based on the discernible matrix of compressed decision table. Luo et al. [36] proposed an incremental method for updating approximations based on dominance rough set. Shu and Hong [37] proposed a rough set-based incremental feature selection algorithm in dynamic incomplete data. However, there is little research on feature selection algorithms for the partially labeled hybrid data with the variation of an object set.

3. Preliminaries

In this section, we briefly review some basic concepts involved in this paper. In addition, the concepts of the partially labeled hybrid decision system and the information granularity are introduced.

3.1 The partially labeled hybrid decision system

In granular computing theory, a data table is represented as a 4-tuple $IS=(U,A,V,f)$ , where $U=\{x_{1},x_{2},\ldots,x_{n}\}$ is a finite nonempty set of objects, also called the universe; $A=\{a_{1},a_{2},\ldots,a_{m}\}$ is a finite nonempty set of features; $V={\textstyle\bigcup_{a\in A}V_{a}}$ is a finite set of feature values, where $V_{a}$ denotes the set of feature values under feature $a$ ; $f:U\times A\to V$ is an information function with $f(x,a)\in V_{a}$ for each $x\in U$ and $a\in A$ . If $A=C\cup D$ and $C\cap D=\varnothing$ , where $C$ is a condition feature set and $D$ is a decision feature set, then the system is called a decision system.

Given a decision system $DS=(U,A=C\cup D,V,f)$ , for $\forall B\subseteq C$ , an indiscernibility relation is denoted by $\textit{IND}(B)=\{(x,y)\in U\times U|\forall b\in B,f(x,b)=f(y,b)\}$ . The relation $\textit{IND}(B)$ divides the universe $U$ into some equivalence classes, which is denoted by $U/\textit{IND}(B)=\{X_{1},X_{2},\ldots,X_{j}\}$ , where $X_{i}(1\leqslant i\leqslant j)$ represents the equivalence class.

The partially labeled hybrid decision system is represented by $\textit{PDS}=(U=U_{u}\cup U_{l},A=C\cup D,V,f)$ , where $U_{u}$ is the set of unlabeled data, $U_{l}$ is the set of labeled data, and $U_{u}\cap U_{l}=\varnothing$ ; the condition feature set is $C=C_{s}\cup C_{n}$ , where $C_{s}$ is the symbolic feature set and $C_{n}$ is the numerical feature set; missing feature values may exist in both the symbolic and the numerical features and are represented by *; the symbols $V$ and $f$ have the same meaning as in the decision system. Table 1 shows a partially labeled hybrid decision system that records patients’ health and personal information. It consists of eight patients, i.e. $U=\{x_{1},x_{2},\ldots,x_{8}\}$ , where $U_{u}=\{x_{1},x_{4},x_{5},x_{7}\}$ is the set of unlabeled data and $U_{l}=\{x_{2},x_{3},x_{6},x_{8}\}$ is the set of labeled data. Blood glucose $c_{1}$ , Cholesterol $c_{2}$ and Blood pressure $c_{3}$ are numerical features, i.e. $C_{n}=\{c_{1},c_{2},c_{3}\}$ . Gender $c_{4}$ and Pain site $c_{5}$ are symbolic features, i.e. $C_{s}=\{c_{4},c_{5}\}$ , and the condition feature set is $C=C_{s}\cup C_{n}$ . Heart attack $d$ is the decision feature, i.e. $D=\{d\}$ . For privacy protection of patients, some information is hidden or missing and is represented by *. The symbolic feature values are given as follows: $V_{c_{4}}=\{\textit{Male},\textit{Female}\}$ , $V_{c_{5}}=\{\textit{Sternum},\textit{Else}\}$ and $V_{d}=\{\textit{Yes},\textit{No}\}$ . The numerical feature values are normalized in $[0,1]$ .

Table 1
The partially labeled hybrid decision system

	Blood glucose $c_{1}$	Cholesterol $c_{2}$	Blood pressure $c_{3}$	Gender $c_{4}$	Pain site $c_{5}$	Heart attack $d$
Patient $x_{1}$	0.1	0.3	0.1	Male	Sternum	*
Patient $x_{2}$	*	*	0.5	Female	Else	No
Patient $x_{3}$	0.8	0.1	0.8	Female	Else	Yes
Patient $x_{4}$	0.2	0.3	0.1	Male	Sternum	*
Patient $x_{5}$	*	0.9	0.3	Male	*	*
Patient $x_{6}$	0.9	0.7	*	Female	Sternum	Yes
Patient $x_{7}$	0.7	0.6	0.3	Female	Sternum	*
Patient $x_{8}$	0.8	0.7	0.3	*	Sternum	No

3.2 Information granularity

This subsection introduces the neighborhood relation, the information granularity and the related concepts for the partially labeled hybrid decision system.

Definition 1. Let $\textit{PDS}=(U=U_{u}\cup U_{l},A=C\cup D,V,f)$ be a partially labeled hybrid decision system, for $\forall B\subseteq C$ and $B=B_{s}\cup B_{n}$ , the neighborhood relation $NR_{B}$ under $B$ is defined as

$\displaystyle NR_{B}=\{(x,y)\in U\times U|\forall b\in B_{s},f(x,b)=f(y,b)\vee f(x,b)=*\vee f(y,b)=*\}\cap\{(x,y)\in U\times U|\forall b\in B_{n},\textit{DIS}_{B_{n}}(x,y)\leqslant\varepsilon\vee f(x,b)=*\vee f(y,b)=*\}$ (1)

where $\varepsilon\in[0,1]$ , $\textit{DIS}_{B_{n}}(x,y)=\sqrt{\sum_{b\in B_{n}}|f(x,b)-f(y,b)|^{2}}$ is the Euclidean distance between two objects $x$ and $y$ on $B_{n}$ . The neighborhood granules of $U$ by the neighborhood relation $NR_{B}$ are denoted by $U/NR_{B}=\{\delta_{B}(x_{1}),\delta_{B}(x_{2}),\ldots,\delta_{B}(x_{|U|})\}$ .

Example 1. Consider the partially labeled hybrid decision system $\textit{PDS}=(U=U_{u}\cup U_{l},A=C\cup D,V,f)$ shown in Table 1, where $U_{u}=\{x_{1},x_{4},x_{5},x_{7}\}$ and $U_{l}=\{x_{2},x_{3},x_{6},x_{8}\}$ . Let $\varepsilon=0.3$ . The neighborhood granules of the unlabeled objects $U_{u}$ under the feature set $C$ are computed as follows: $\delta_{C}(x_{1})=\{x_{1},x_{4}\}$ , $\delta_{C}(x_{4})=\{x_{1},x_{4}\}$ , $\delta_{C}(x_{5})=\{x_{5}\}$ , $\delta_{C}(x_{7})=\{x_{7}\}$ . The neighborhood granules of the labeled objects $U_{l}$ under the feature set $C$ are computed as follows: $\delta_{C}(x_{2})=\{x_{2},x_{3}\}$ , $\delta_{C}(x_{3})=\{x_{2},x_{3}\}$ , $\delta_{C}(x_{6})=\{x_{6},x_{8}\}$ , $\delta_{C}(x_{8})=\{x_{6},x_{8}\}$ . The equivalence classes of the labeled objects under the decision feature set $D$ are computed as follows: $D_{1}=\{x_{2},x_{8}\}$ , $D_{2}=\{x_{3},x_{6}\}$ .
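The granules of Example 1 can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the missing value * is encoded as `None`, the Euclidean distance is taken only over numerical features on which both objects are known (the reading of Definition 1 that reproduces Example 1), granules are formed separately inside $U_{u}$ and $U_{l}$ , and a small tolerance `TOL` is added because $|0.5-0.8|$ is not exactly $0.3$ in floating point. The names `TABLE`, `related` and `granules` are hypothetical.

```python
import math

# Table 1 rows as (c1, c2, c3, c4, c5, d); None encodes the missing value '*'.
TABLE = {
    "x1": (0.1, 0.3, 0.1, "Male", "Sternum", None),
    "x2": (None, None, 0.5, "Female", "Else", "No"),
    "x3": (0.8, 0.1, 0.8, "Female", "Else", "Yes"),
    "x4": (0.2, 0.3, 0.1, "Male", "Sternum", None),
    "x5": (None, 0.9, 0.3, "Male", None, None),
    "x6": (0.9, 0.7, None, "Female", "Sternum", "Yes"),
    "x7": (0.7, 0.6, 0.3, "Female", "Sternum", None),
    "x8": (0.8, 0.7, 0.3, None, "Sternum", "No"),
}
NUMERICAL = (0, 1, 2)  # column indices of c1, c2, c3
SYMBOLIC = (3, 4)      # column indices of c4, c5
TOL = 1e-9             # assumed slack for floating-point rounding


def related(x, y, eps):
    """True if rows x and y are neighbors (assumed reading of Definition 1)."""
    # Symbolic features: known values must coincide; a missing value matches anything.
    for b in SYMBOLIC:
        if x[b] is not None and y[b] is not None and x[b] != y[b]:
            return False
    # Numerical features: Euclidean distance over coordinates known for both rows.
    squared = sum((x[b] - y[b]) ** 2 for b in NUMERICAL
                  if x[b] is not None and y[b] is not None)
    return math.sqrt(squared) <= eps + TOL


def granules(names, eps):
    """Neighborhood granule of every object, restricted to the given object set."""
    return {p: {q for q in names if related(TABLE[p], TABLE[q], eps)}
            for p in names}


granules_u = granules(["x1", "x4", "x5", "x7"], 0.3)
granules_l = granules(["x2", "x3", "x6", "x8"], 0.3)
for name in sorted(granules_u):
    print(name, sorted(granules_u[name]))
for name in sorted(granules_l):
    print(name, sorted(granules_l[name]))
```

Under these assumptions the printed granules coincide with those listed in Example 1.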

Definition 2. Let $\textit{PDS}=(U=U_{u}\cup U_{l},A=C\cup D,V,f)$ be a partially labeled hybrid decision system, for $\forall B\subseteq C$ , $U_{u}=\{x_{1},x_{2},\ldots,x_{|U_{u}|}\}$ , the classification is $U_{u}/NR_{B}=\{\delta_{B}(x_{1}),\delta_{B}(x_{2}),\ldots,\delta_{B}(x_{|U_{u}|})\}$ , then the information granularity of $B$ is defined as

$\displaystyle IG_{u}(B)=\frac{1}{|U_{u}|}{\textstyle\sum_{i=1}^{|U_{u}|}\frac{|\delta_{B}(x_{i})|}{|U_{u}|}},x_{i}\in U_{u}$ (2)

Example 2. (Continuation of Example 1) It follows from Eq. (2) that the information granularity of $C$ is $IG_{u}(C)=\frac{1}{|U_{u}|}{\textstyle\sum_{i=1}^{|U_{u}|}\frac{|\delta_{C}(x_{i})|}{|U_{u}|}}=\frac{1}{4}\times\left(\frac{2+2+1+1}{4}\right)=\frac{3}{8}$ .
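As a quick check of Eq. (2), the following sketch evaluates $IG_{u}(C)$ from the granules listed in Example 1, using exact rational arithmetic. The function name `information_granularity` is hypothetical, and the granules are entered by hand rather than computed.

```python
from fractions import Fraction


def information_granularity(granules):
    """Eq. (2): sum of granule sizes divided by the squared number of objects."""
    n = len(granules)
    return Fraction(sum(len(g) for g in granules.values()), n * n)


# Granules of the unlabeled objects under C, copied from Example 1.
granules_u = {
    "x1": {"x1", "x4"},
    "x4": {"x1", "x4"},
    "x5": {"x5"},
    "x7": {"x7"},
}
ig_u = information_granularity(granules_u)
print(ig_u)  # 3/8
```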

Definition 3. Let $\textit{PDS}=(U=U_{u}\cup U_{l},A=C\cup D,V,f)$ be a partially labeled hybrid decision system, for $\forall B\subseteq C$ , $U_{l}=\{x_{1},x_{2},\ldots,x_{|U_{l}|}\}$ , the classification is $U_{l}/NR_{B}=\{\delta_{B}(x_{1}),\delta_{B}(x_{2}),\ldots,\delta_{B}(x_{|U_{l}|})\}$ , $U_{l}/\textit{IND}(D)=\{D_{1},D_{2},\ldots,D_{j}\}$ , then the information granularity of $D$ relative to $B$ is defined as

$\displaystyle IG_{l}(D|B)=IG_{l}(B)-IG_{l}(B\cup D)=\frac{1}{|U_{l}|}{\textstyle\sum_{i=1}^{|U_{l}|}\frac{|\delta_{B}(x_{i})|-|\delta_{B}(x_{i})\cap D_{k}|}{|U_{l}|}}=\frac{1}{|U_{l}|}{\textstyle\sum_{i=1}^{|U_{l}|}\frac{|\delta_{B}(x_{i})-D_{k}|}{|U_{l}|}}(1\leqslant k\leqslant j)$ (3)

where $IG_{l}(B\cup D)=\frac{1}{|U_{l}|}{\textstyle\sum_{i=1}^{|U_{l}|}\frac{|\delta_{B}(x_{i})\cap D_{k}|}{|U_{l}|}}$ and $D_{k}(1\leqslant k\leqslant j)$ is the decision class that contains $x_{i}$ .

Example 3. (Continuation of Example 1) It follows from Eq. (3) that the information granularity of $D$ relative to $C$ is $IG_{l}(D|C)=\frac{1}{|U_{l}|}{\textstyle\sum_{i=1}^{|U_{l}|}\frac{|\delta_{C}(x_{i})-D_{k}|}{|U_{l}|}}=\frac{1}{4}\times\left(\frac{1+1+1+1}{4}\right)=\frac{1}{4}$ .
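Eq. (3) can be checked in the same way. The sketch below assumes that, for each labeled object $x_{i}$ , the class $D_{k}$ in Eq. (3) is the decision class containing $x_{i}$ , which is the reading that reproduces Example 3. The function name `relative_information_granularity` and the hand-entered granules and labels are for illustration only.

```python
from fractions import Fraction


def relative_information_granularity(granules, label):
    """Eq. (3): count neighbors whose label differs, divided by |U_l|^2."""
    n = len(granules)
    total = 0
    for x, granule in granules.items():
        # |delta_B(x) - D_k| with D_k the decision class of x (assumption).
        total += sum(1 for y in granule if label[y] != label[x])
    return Fraction(total, n * n)


# Granules of the labeled objects under C and their labels, from Example 1 and Table 1.
granules_l = {
    "x2": {"x2", "x3"},
    "x3": {"x2", "x3"},
    "x6": {"x6", "x8"},
    "x8": {"x6", "x8"},
}
label = {"x2": "No", "x3": "Yes", "x6": "Yes", "x8": "No"}
ig_l = relative_information_granularity(granules_l, label)
print(ig_l)  # 1/4
```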

Theorem 1 (Monotonicity). Let $\textit{PDS}=(U=U_{u}\cup U_{l},A=C\cup D,V,f)$ be a partially labeled hybrid decision system, for $\forall B_{1},B_{2}\subseteq C$ and $B_{1}\subseteq B_{2}$ , we have $IG_{u}(B_{2})\leqslant IG_{u}(B_{1})$ and $IG_{l}(D|B_{2})\leqslant IG_{l}(D|B_{1})$ .

Proof. Since $B_{1}\subseteq B_{2}$ , for $\forall x_{i}\in U_{u}$ , it follows from Definition 1 that $\delta_{B_{2}}(x_{i})\subseteq\delta_{B_{1}}(x_{i})$ , so ${\textstyle\sum_{i=1}^{|U_{u}|}|\delta_{B_{2}}(x_{i})|}\leqslant{\textstyle\sum_{i=1}^{|U_{u}|}|\delta_{B_{1}}(x_{i})|}$ . It follows from Definition 2 that $IG_{u}(B_{2})\leqslant IG_{u}(B_{1})$ . Similarly, for $\forall x_{i}\in U_{l}$ , we have $\delta_{B_{2}}(x_{i})\subseteq\delta_{B_{1}}(x_{i})$ . Suppose $U_{l}/\textit{IND}(D)=\{D_{1},D_{2},\ldots,D_{j}\}$ . For each $D_{k}(1\leqslant k\leqslant j)$ , we have $\delta_{B_{2}}(x_{i})-D_{k}\subseteq\delta_{B_{1}}(x_{i})-D_{k}$ and hence $|\delta_{B_{2}}(x_{i})-D_{k}|\leqslant|\delta_{B_{1}}(x_{i})-D_{k}|$ . It follows from Definition 3 that $IG_{l}(D|B_{2})\leqslant IG_{l}(D|B_{1})$ .

Definition 4. Let $\textit{PDS}=(U=U_{u}\cup U_{l},A=C\cup D,V,f)$ be a partially labeled hybrid decision system, for $\forall B\subseteq C$ and $b\in B$ , the feature significance of $b$ in $B$ is defined as

$\displaystyle\textit{Sig}_{in}(b,B,D)=\textit{Sig}_{U_{u}}^{in}(b,B)+\textit{Sig}_{U_{l}}^{in}(b,B,D)$ (4)

where $\textit{Sig}_{U_{u}}^{in}(b,B)=IG_{u}(B-\{b\})-IG_{u}(B)$ is the feature significance of $b$ in $B$ for the unlabeled data, and $\textit{Sig}_{U_{l}}^{in}(b,B,D)=IG_{l}(D|B-\{b\})-IG_{l}(D|B)$ is the feature significance of $b$ in $B$ for the labeled data.

From Definition 4, it can be seen that if $\textit{Sig}_{in}(b,B,D)=0$ , then the feature $b$ in $B$ is redundant. Thus, we can use it to remove redundant features from the feature subset $B$ .

Definition 5. Let $\textit{PDS}=(U=U_{u}\cup U_{l},A=C\cup D,V,f)$ be a partially labeled hybrid decision system, for $\forall B\subseteq C$ and $b\in C-B$ , the feature significance of $b$ to $B$ is defined as

$\displaystyle\textit{Sig}_{\textit{out}}(b,B,D)=\frac{\textit{Sig}_{U_{u}}^{\textit{out}}(b,B,D)}{IG_{u}(B)}+\frac{1+\textit{Sig}_{U_{l}}^{\textit{out}}(b,B,D)}{1+IG_{l}(D|B)}$ (5)

where $\textit{Sig}_{U_{u}}^{\textit{out}}(b,B,D)=IG_{u}(B)-IG_{u}(B\cup\{b\})$ is the feature significance of $b$ in $C-B$ for the unlabeled data, and $\textit{Sig}_{U_{l}}^{\textit{out}}(b,B,D)=IG_{l}(D|B)-IG_{l}(D|B\cup\{b\})$ is the feature significance of $b$ in $C-B$ for the labeled data.

From Definition 5 and the monotonicity of the information granularity, it follows that $\textit{Sig}_{\textit{out}}(b,B,D)\geqslant 0$ . The feature significance of $b$ in $C-B$ measures the change of information granularity when $b$ is added to $B$ . It can therefore be used to select the most significant feature relative to the candidate feature subset in the process of feature selection.

Definition 6. Let $\textit{PDS}=(U=U_{u}\cup U_{l},A=C\cup D,V,f)$ be a partially labeled hybrid decision system, $B\subseteq C$ is a selected feature subset if and only if

(1) $IG_{u}(B)=IG_{u}(C)$ , $IG_{l}(D|B)=IG_{l}(D|C)$ ;
(2) $\forall b\in B$ , $IG_{u}(B)<IG_{u}(B-\{b\})$ , $IG_{l}(D|B)<IG_{l}(D|B-\{b\})$ .

Condition (1) indicates that the feature subset $B$ has the same information granularity as $C$ in $U_{u}$ and the same information granularity relative to $D$ as $C$ in $U_{l}$ ; Condition (2) indicates that $B$ is a minimal feature subset, i.e., no feature in $B$ is redundant.

4. The non-incremental feature selection based on the information granularity for the dynamic partially labeled hybrid data

In this section, the non-incremental feature selection algorithm based on the information granularity is introduced for the dynamic partially labeled hybrid data. When an object set changes dynamically, the traditional (non-incremental) method regards the changed data set as a new data set and recomputes the feature subset from scratch. The detailed description is given in Algorithm NIFS.

Algorithm 1. The non-incremental feature selection algorithm based on information granularity (NIFS)

Input: A partially labeled hybrid decision system $\textit{PDS}=(U=U_{u}\cup U_{l},A=C\cup D,V,f)$ , a dynamic object set $U_{ch}=U_{ad}\cup U_{de}$ , where $U_{ad}$ is the set of added objects and $U_{de}$ is the set of deleted objects.
Output: A feature subset Red.
Step 1. Set $U^{\prime}=U\cup U_{ad}-U_{de}$ , $\textit{Red}\leftarrow\varnothing$ and normalize the numerical feature values;
Step 2. Compute the information granularity $IG_{u}(C)$ and $IG_{l}(D|C)$ ;
Step 3. while $IG_{u}(\textit{Red})\neq IG_{u}(C)$ or $IG_{l}(D|\textit{Red})\neq IG_{l}(D|C)$ do
Step 4. select the feature $a_{k}$ that satisfies $\textit{Sig}_{\textit{out}}(a_{k},\textit{Red},D)=\max_{a\in C-\textit{Red}}\textit{Sig}_{\textit{out}}(a,\textit{Red},D)$ ;
Step 5. set $\textit{Red}\leftarrow\textit{Red}\cup\{a_{k}\}$ ;
Step 6. end while
Step 7. for each $b\in\textit{Red}$ do
Step 8. if $\textit{Sig}_{in}(b,\textit{Red},D)=0$ then
Step 9. set $\textit{Red}\leftarrow\textit{Red}-\{b\}$ ;
Step 10. end if
Step 11. end for
Step 12. return Red;

The time complexity of NIFS is analyzed as follows. Step 2 computes the information granularity of $C$ and the information granularity of $D$ relative to $C$ , and its time complexity is $O(|U^{\prime}|^{2}|C|)$ . In Steps 3–6, the most significant feature in each loop is selected and added into Red until the termination condition is reached, and the time complexity is $O(|U^{\prime}|^{2}|C|\cdot|C|+|U^{\prime}|^{2}|C|\cdot(|C|-1)+\ldots+|U^{\prime}|^{2}|C|\cdot 1)=O(|U^{\prime}|^{2}|C|^{3})$ . Steps 7–11 remove the redundant features, and the time complexity is $O(|U^{\prime}|^{2}|C|^{2})$ . Therefore, the time complexity of Algorithm NIFS is $O(|U^{\prime}|^{2}|C|^{3})$ .
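The greedy structure of Algorithm NIFS, together with the significance measures of Definitions 4 and 5, can be sketched in Python on the data of Table 1. This is a minimal illustration under assumptions, not the authors' implementation: the table is taken as the already updated object set $U^{\prime}$ , missing values are `None`, the distance uses only numerical features known for both objects, a tolerance `TOL` absorbs floating-point rounding, the loop continues while either granularity differs from its value on $C$ , and ties in the maximum are broken by feature index. All function and variable names are hypothetical.

```python
import math
from fractions import Fraction

# Table 1 rows as (c1, c2, c3, c4, c5, d); None encodes the missing value '*'.
ROWS = {
    "x1": (0.1, 0.3, 0.1, "Male", "Sternum", None),
    "x2": (None, None, 0.5, "Female", "Else", "No"),
    "x3": (0.8, 0.1, 0.8, "Female", "Else", "Yes"),
    "x4": (0.2, 0.3, 0.1, "Male", "Sternum", None),
    "x5": (None, 0.9, 0.3, "Male", None, None),
    "x6": (0.9, 0.7, None, "Female", "Sternum", "Yes"),
    "x7": (0.7, 0.6, 0.3, "Female", "Sternum", None),
    "x8": (0.8, 0.7, 0.3, None, "Sternum", "No"),
}
NUMERICAL = {0, 1, 2}                 # indices of c1, c2, c3; 3 and 4 are symbolic
UNLABELED = ["x1", "x4", "x5", "x7"]
LABELED = ["x2", "x3", "x6", "x8"]
EPS, TOL = 0.3, 1e-9                  # radius from Example 1; TOL is an assumed slack


def related(x, y, feats):
    """Neighborhood relation restricted to the feature indices in feats."""
    squared = 0.0
    for b in feats:
        if x[b] is None or y[b] is None:
            continue                  # a missing value never separates two objects
        if b in NUMERICAL:
            squared += (x[b] - y[b]) ** 2
        elif x[b] != y[b]:
            return False
    return math.sqrt(squared) <= EPS + TOL


def ig_u(feats):
    """Eq. (2) on the unlabeled objects."""
    total = sum(related(ROWS[p], ROWS[q], feats)
                for p in UNLABELED for q in UNLABELED)
    return Fraction(total, len(UNLABELED) ** 2)


def ig_l(feats):
    """Eq. (3) on the labeled objects (neighbors with a different label)."""
    total = sum(related(ROWS[p], ROWS[q], feats) and ROWS[p][5] != ROWS[q][5]
                for p in LABELED for q in LABELED)
    return Fraction(total, len(LABELED) ** 2)


def sig_out(b, red):
    """Eq. (5): significance of a feature b outside red."""
    gain_u = ig_u(red) - ig_u(red | {b})
    gain_l = ig_l(red) - ig_l(red | {b})
    return gain_u / ig_u(red) + (1 + gain_l) / (1 + ig_l(red))


def sig_in(b, red):
    """Eq. (4): significance of a feature b inside red."""
    rest = red - {b}
    return (ig_u(rest) - ig_u(red)) + (ig_l(rest) - ig_l(red))


def nifs(all_feats):
    red = set()
    target = (ig_u(all_feats), ig_l(all_feats))
    # Forward selection: add the most significant remaining feature.
    while (ig_u(red), ig_l(red)) != target:
        best = max(sorted(all_feats - red), key=lambda a: sig_out(a, red))
        red.add(best)
    # Backward pass: drop features whose inner significance is zero.
    for b in sorted(red):
        if sig_in(b, red) == 0:
            red.discard(b)
    return red


C = {0, 1, 2, 3, 4}
red = nifs(C)
print(sorted(red), ig_u(red), ig_l(red))
```

Because every candidate evaluation recomputes all pairwise relations, each call to `ig_u` or `ig_l` is quadratic in the number of objects, which is the cost that the incremental mechanisms of Section 5 aim to avoid.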

5. Incremental feature selection with the variation of an object set for the partially labeled hybrid data

Since the traditional (non-incremental) feature selection algorithm performs many repeated calculations when selecting a new feature subset, it is inefficient and time-consuming. Therefore, we select a feature subset by an incremental strategy when an object set changes dynamically. Incremental learning is an effective method for feature selection in dynamic data. On the basis of the original feature selection result, a new feature subset is quickly obtained by updating the information granularity in an incremental manner. Section 5.1 gives the incremental mechanism and the corresponding algorithm when a single object changes dynamically. Section 5.2 presents the group incremental algorithm when a group of objects changes dynamically.

5.1 Incremental feature selection with the variation of a single object for the partially labeled hybrid data

When a single object is added into or deleted from the system, the incremental mechanism can effectively reduce the calculation time by updating the information granularity based on the changes of the neighborhood granules and decision classes and on the existing information granularity of the original system. On this basis, the incremental feature selection algorithm with the variation of a single object is developed. In the following, a new feature subset is selected based on the previous feature selection result of the system, so there is no need to select a new feature subset from scratch.

The incremental mechanisms of the information granularity when a new unlabeled object or a new labeled object is added to the system are given by Theorem 1 and Theorem 2, respectively.

Theorem 1. Let $\textit{PDS}=(U=U_{u}\cup U_{l},A=C\cup D,V,f)$ be a partially labeled hybrid decision system, for $\forall B\subseteq C$ , suppose a new unlabeled object $x$ is added into PDS, $\delta_{B}(x)$ is the neighborhood granule with $x$ under $B$ . Then the information granularity of $U_{u}\cup\{x\}$ under $B$ becomes

$\displaystyle IG_{U_{u}\cup\{x\}}(B)=\left(\frac{|U_{u}|}{|U_{u}|+1}\right)^{2}IG_{u}(B)+\frac{1}{(|U_{u}|+1)^{2}}(2|\delta_{B}(x)|-1)$ (6)

Proof. Suppose $U_{u}/NR_{B}=\{\delta_{B}(x_{1}),\delta_{B}(x_{2}),\ldots,\delta_{B}(x_{|U_{u}|})\}$ . When a new unlabeled object $x$ is added into PDS, there are two cases about the changes of the neighborhood granules under $B$ as follows. (1) Some neighborhood granules of the unlabeled objects change to $\delta^{\prime}_{B}(x_{k})=\delta_{B}(x_{k})\cup\{x\}(1\leqslant k\leqslant|U_{u}|)$ and, by the symmetry of the neighborhood relation, $x_{k}\in\delta_{B}(x)$ . Thus, it follows from Eq. (2) that

$\displaystyle{\textstyle\sum_{i=1}^{|U_{u}\cup\{x\}|}|\delta^{\prime}_{B}(x_{i})|}={\textstyle\sum_{x_{i}\in\delta_{B}(x)}|\delta^{\prime}_{B}(x_{i})|}+{\textstyle\sum_{x_{i}\notin\delta_{B}(x)}|\delta^{\prime}_{B}(x_{i})|}={\textstyle\sum_{x_{i}\in\delta_{B}(x)\wedge x_{i}\neq x}|\delta_{B}(x_{i})\cup\{x\}|}+|\delta_{B}(x)|+{\textstyle\sum_{x_{i}\notin\delta_{B}(x)}|\delta_{B}(x_{i})|}.$

Since

$\displaystyle{\textstyle\sum_{x_{i}\in\delta_{B}(x)\wedge x_{i}\neq x}|\delta_{B}(x_{i})\cup\{x\}|}={\textstyle\sum_{x_{i}\in\delta_{B}(x)\wedge x_{i}\neq x}|\delta_{B}(x_{i})|}+(|\delta_{B}(x)|-1)|\{x\}|,$

then

$\displaystyle{\textstyle\sum_{i=1}^{|U_{u}\cup\{x\}|}|\delta^{\prime}_{B}(x_{i})|}={\textstyle\sum_{i=1}^{|U_{u}|}|\delta_{B}(x_{i})|}+2|\delta_{B}(x)|-1.$

(2) The neighborhood granule of the new object $x$ is $\delta_{B}(x)=\{x\}$ , i.e., no existing granule changes. Thus, it follows from Eq. (2) that

$\displaystyle{\textstyle\sum_{i=1}^{|U_{u}\cup\{x\}|}|\delta^{\prime}_{B}(x_{i})|}={\textstyle\sum_{i=1}^{|U_{u}|}|\delta_{B}(x_{i})|}+|\delta_{B}(x)|.$

Since $|\{x\}|=2|\{x\}|-|\{x\}|$ , namely, $|\delta_{B}(x)|=2|\delta_{B}(x)|-1$ , then

$\displaystyle{\textstyle\sum_{i=1}^{|U_{u}\cup\{x\}|}|\delta^{\prime}_{B}(x_{i})|}={\textstyle\sum_{i=1}^{|U_{u}|}|\delta_{B}(x_{i})|}+2|\delta_{B}(x)|-1.$

In both cases, dividing by $(|U_{u}|+1)^{2}$ and substituting ${\textstyle\sum_{i=1}^{|U_{u}|}|\delta_{B}(x_{i})|}=|U_{u}|^{2}IG_{u}(B)$ from Eq. (2), the information granularity of $B$ becomes $IG_{U_{u}\cup\{x\}}(B)=\left(\frac{|U_{u}|}{|U_{u}|+1}\right)^{2}IG_{u}(B)+\frac{1}{(|U_{u}|+1)^{2}}(2|\delta_{B}(x)|-1)$ .

Example 4. (Continuation of Example 2) Suppose a new unlabeled object $x=\{0.7,*,0.1,*,\textit{Else},*\}$ is added into the partially labeled hybrid decision system shown in Table 1. From Example 2, the original information granularity is $IG_{u}(C)=\frac{3}{8}$ . A direct computation gives $\delta_{C}(x)=\{x,x_{5}\}$ . From Eq. (6), the new information granularity of $C$ is $IG_{U_{u}\cup\{x\}}(C)=\left(\frac{|U_{u}|}{|U_{u}|+1}\right)^{2}IG_{u}(C)+\frac{1}{(|U_{u}|+1)^{2}}(2|\delta_{C}(x)|-1)=\frac{16}{25}\times\frac{3}{8}+\frac{3}{25}=\frac{9}{25}$ .
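The update in Eq. (6) can be checked numerically against a recomputation from scratch. In the sketch below, the function name `add_unlabeled` is hypothetical, and the granule sizes after the insertion are entered by hand from Examples 1 and 4 (the granule of $x_{5}$ grows from one to two objects, and the new object has a granule of size two).

```python
from fractions import Fraction


def add_unlabeled(ig_old, n_old, granule_size):
    """Eq. (6): update IG_u(B) after adding one unlabeled object.

    ig_old       -- IG_u(B) before the insertion
    n_old        -- |U_u| before the insertion
    granule_size -- |delta_B(x)| of the new object, counting x itself
    """
    n_new = n_old + 1
    return (Fraction(n_old, n_new) ** 2 * ig_old
            + Fraction(2 * granule_size - 1, n_new ** 2))


# Incremental update with the values of Example 4.
updated = add_unlabeled(Fraction(3, 8), 4, 2)

# Recomputation from scratch: granule sizes of x1, x4, x5, x7 and the new object x.
sizes_after = [2, 2, 2, 1, 2]
recomputed = Fraction(sum(sizes_after), len(sizes_after) ** 2)

print(updated, recomputed)  # 9/25 9/25
```

The incremental update needs only the granule of the new object, whereas the recomputation needs all granules.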

Theorem 2. Let $\textit{PDS}=(U=U_{u}\cup U_{l},A=C\cup D,V,f)$ be a partially labeled hybrid decision system, for $\forall B\subseteq C$ , suppose a new labeled object $x$ is added into PDS, $\delta_{B}(x)$ is the neighborhood granule with $x$ under $B$ , $[x]_{D}$ is the decision equivalence class of $x$ . The information granularity of $D$ relative to $B$ becomes

$\displaystyle IG_{U_{l}\cup\{x\}}(D|B)=\left(\frac{|U_{l}|}{|U_{l}|+1}\right)^{2}IG_{l}(D|B)+\frac{1}{(|U_{l}|+1)^{2}}(2|\delta_{B}(x)-[x]_{D}|)$ (7)

Proof. Suppose $U_{l}/NR_{B}=\{\delta_{B}(x_{1}),\delta_{B}(x_{2}),\ldots,\delta_{B}(x_{|U_{l}|})\}$ , $U_{l}/\textit{IND}(D)=\{D_{1},D_{2},\ldots,D_{j}\}$ . When a new labeled object $x$ is added into PDS, there are four cases about the changes of the neighborhood granules of $U_{l}\cup\{x\}$ under $B$ and the equivalence classes of $D$ as follows. (1) Some neighborhood granules of the labeled objects change to $\delta^{\prime}_{B}(x_{p})=\delta_{B}(x_{p})\cup\{x\}(1\leqslant p\leqslant|U_{l}|)$ and, by the symmetry of the neighborhood relation, $x_{p}\in\delta_{B}(x)$ . The decision class of $x$ is $[x]_{D}=D^{\prime}_{q}=D_{q}\cup\{x\}(1\leqslant q\leqslant j)$ . Then it follows from Eq. (3) that

$\displaystyle{\textstyle\sum_{i=1}^{|U_{l}\cup\{x\}|}|\delta^{\prime}_{B}(x_{i})-D^{\prime}_{k}|}={\textstyle\sum_{x_{i}\in\delta_{B}(x)\wedge x_{i}\in[x]_{D}}|\delta^{\prime}_{B}(x_{i})-D^{\prime}_{k}|}+{\textstyle\sum_{x_{i}\in\delta_{B}(x)\wedge x_{i}\notin[x]_{D}}|\delta^{\prime}_{B}(x_{i})-D^{\prime}_{k}|}+{\textstyle\sum_{x_{i}\notin\delta_{B}(x)\wedge x_{i}\in[x]_{D}}|\delta^{\prime}_{B}(x_{i})-D^{\prime}_{k}|}+{\textstyle\sum_{x_{i}\notin\delta_{B}(x)\wedge x_{i}\notin[x]_{D}}|\delta^{\prime}_{B}(x_{i})-D^{\prime}_{k}|}.$

Since

$\displaystyle{\textstyle\sum_{x_{i}\in\delta_{B}(x)\wedge x_{i}\in[x]_{D}}|\delta^{\prime}_{B}(x_{i})-D^{\prime}_{k}|}={\textstyle\sum_{x_{i}\in\delta_{B}(x)\wedge x_{i}\in[x]_{D}\wedge x_{i}\neq x}|(\delta_{B}(x_{i})\cup\{x\})-(D_{q}\cup\{x\})|}+|\delta_{B}(x)-[x]_{D}|={\textstyle\sum_{x_{i}\in\delta_{B}(x)\wedge x_{i}\in[x]_{D}\wedge x_{i}\neq x}|\delta_{B}(x_{i})-D_{q}|}+|\delta_{B}(x)-[x]_{D}|,$

$\displaystyle{\textstyle\sum_{x_{i}\in\delta_{B}(x)\wedge x_{i}\notin[x]_{D}}|\delta^{\prime}_{B}(x_{i})-D^{\prime}_{k}|}={\textstyle\sum_{x_{i}\in\delta_{B}(x)\wedge x_{i}\notin[x]_{D}}|(\delta_{B}(x_{i})\cup\{x\})-D_{k}|}={\textstyle\sum_{x_{i}\in\delta_{B}(x)\wedge x_{i}\notin[x]_{D}}|\delta_{B}(x_{i})-D_{k}|}+|\delta_{B}(x)-[x]_{D}||\{x\}|,$

$\displaystyle{\textstyle\sum_{x_{i}\notin\delta_{B}(x)\wedge x_{i}\in[x]_{D}}|\delta^{\prime}_{B}(x_{i})-D^{\prime}_{k}|}={\textstyle\sum_{x_{i}\notin\delta_{B}(x)\wedge x_{i}\in[x]_{D}}|\delta_{B}(x_{i})-(D_{q}\cup\{x\})|}={\textstyle\sum_{x_{i}\notin\delta_{B}(x)\wedge x_{i}\in[x]_{D}}|\delta_{B}(x_{i})-D_{q}|},$

and

$\displaystyle{\textstyle\sum_{x_{i}\notin\delta_{B}(x)\wedge x_{i}\notin[x]_{D}}|\delta^{\prime}_{B}(x_{i})-D^{\prime}_{k}|}={\textstyle\sum_{x_{i}\notin\delta_{B}(x)\wedge x_{i}\notin[x]_{D}}|\delta_{B}(x_{i})-D_{k}|},$

it can be obtained by merging the four sums that

$\displaystyle{\textstyle\sum_{i=1}^{|U_{l}\cup\{x\}|}|\delta^{\prime}_{B}(x_{i})-D^{\prime}_{k}|}={\textstyle\sum_{i=1}^{|U_{l}|}|\delta_{B}(x_{i})-D_{k}|}+2|\delta_{B}(x)-[x]_{D}|.$

(2) Some neighborhood granules of the labeled objects change to $\delta^{\prime}_{B}(x_{p})=\delta_{B}(x_{p})\cup\{x\}(1\leqslant p\leqslant|U_{l}|)$ and, by the symmetry of the neighborhood relation, $x_{p}\in\delta_{B}(x)$ . The decision class of $x$ is $[x]_{D}=\{x\}$ . Then it follows from Eq. (3) that

$\displaystyle{\textstyle\sum_{i=1}^{|U_{l}\cup\{x\}|}|\delta^{\prime}_{B}(x_{i})-D^{\prime}_{k}|}={\textstyle\sum_{x_{i}\in\delta_{B}(x)}|\delta^{\prime}_{B}(x_{i})-D^{\prime}_{k}|}+{\textstyle\sum_{x_{i}\notin\delta_{B}(x)}|\delta^{\prime}_{B}(x_{i})-D^{\prime}_{k}|}.$

Similarly to case (1), we obtain

$\displaystyle{\textstyle\sum_{i=1}^{|U_{l}\cup\{x\}|}|\delta^{\prime}_{B}(x_{i})-D^{\prime}_{k}|}={\textstyle\sum_{i=1}^{|U_{l}|}|\delta_{B}(x_{i})-D_{k}|}+2|\delta_{B}(x)-[x]_{D}|.$

(3) The neighborhood granule of $x$ is $\delta_{B}(x)=\{x\}$ . The decision class of $x$ is $[x]_{D}=D^{\prime}_{q}=D_{q}\cup\{x\}(1\leqslant q\leqslant j)$ . Then it follows from Eq. (3) that

$\displaystyle{\textstyle\sum_{i=1}^{|U_{l}\cup\{x\}|}|\delta^{\prime}_{B}(x_{i% })-D^{\prime}_{k}|}={\textstyle\sum_{x_{i}\in D^{\prime}_{q}}|\delta^{\prime}_% {B}(x_{i})-D^{\prime}_{k}|}+{\textstyle\sum_{x_{i}\notin D^{\prime}_{q}}|% \delta^{\prime}_{B}(x_{i})-D^{\prime}_{k}|}.$

Since

$\displaystyle{\textstyle\sum_{x_{i}\in D^{\prime}_{q}}|\delta^{\prime}_{B}(x_{% i})-D^{\prime}_{k}|}={\textstyle\sum_{x_{i}\in D^{\prime}_{q}\wedge x_{i}\neq x% }|\delta_{B}(x_{i})-D_{q}\cup\{x\}|}+|\delta_{B}(x)-[x]_{D}|={\textstyle\sum_{% x_{i}\in D^{\prime}_{q}\wedge x_{i}\neq x}|\delta_{B}(x_{i})-D_{q}|}+|\delta_{% B}(x)-[x]_{D}|,$

and

$\displaystyle{\textstyle\sum_{x_{i}\notin D^{\prime}_{q}}|\delta^{\prime}_{B}(% x_{i})-D^{\prime}_{k}|}={\textstyle\sum_{x_{i}\notin D^{\prime}_{q}}|\delta_{B% }(x_{i})-D_{k}|},$

then

$\displaystyle{\textstyle\sum_{i=1}^{|U_{l}\cup\{x\}|}|\delta^{\prime}_{B}(x_{i% })-D^{\prime}_{k}|}={\textstyle\sum_{i=1}^{|U_{l}|}|\delta_{B}(x_{i})-D_{k}|}+% |\delta_{B}(x)-[x]_{D}|.$

Because $\delta_{B}(x)-[x]_{D}=\varnothing$ , then

$\displaystyle{\textstyle\sum_{i=1}^{|U_{l}\cup\{x\}|}|\delta^{\prime}_{B}(x_{i% })-D^{\prime}_{k}|}={\textstyle\sum_{i=1}^{|U_{l}|}|\delta_{B}(x_{i})-D_{k}|}+% 2|\delta_{B}(x)-[x]_{D}|.$

(4) The neighborhood granule of $x$ is $\delta_{B}(x)=\{x\}$ and the decision class of $x$ is $[x]_{D}=\{x\}$ . Then it follows from Eq. (3) that

$\displaystyle{\textstyle\sum_{i=1}^{|U_{l}\cup\{x\}|}|\delta^{\prime}_{B}(x_{i% })-D^{\prime}_{k}|}={\textstyle\sum_{i=1}^{|U_{l}|}|\delta_{B}(x_{i})-D_{k}|}+% |\delta_{B}(x)-[x]_{D}|.$

Since $\delta_{B}(x)-[x]_{D}=\varnothing$ , then

$\displaystyle{\textstyle\sum_{i=1}^{|U_{l}\cup\{x\}|}|\delta^{\prime}_{B}(x_{i% })-D^{\prime}_{k}|}={\textstyle\sum_{i=1}^{|U_{l}|}|\delta_{B}(x_{i})-D_{k}|}+% 2|\delta_{B}(x)-[x]_{D}|.$

As analyzed above, the information granularity of $D$ relative to $B$ becomes

$\displaystyle IG_{U_{l}\cup\{x\}}(D|B)=\left(\frac{|U_{l}|}{|U_{l}|+1}\right)^{2}IG_{l}(D|B)+\frac{1}{(|U_{l}|+1)^{2}}(2|\delta_{B}(x)-[x]_{D}|).$

Example 5. (Continuation of Example 3) Suppose a new object $x=\{0.1,0.3,0.7,*,\textit{Sternum},\textit{Yes}\}$ is added into the partially labeled hybrid decision system shown in Table 1. From Example 3, the original information granularity of $D$ relative to $C$ is $IG_{l}(D|C)=\frac{1}{4}$ . By computation, we obtain $\delta_{C}(x)=\{x\}$ and $[x]_{D}=\{x,x_{3},x_{6}\}$ , so $\delta_{C}(x)-[x]_{D}=\varnothing$ . From Eq. (7), the new information granularity of $D$ relative to $C$ is $IG_{U_{l}\cup\{x\}}(D|C)=\left(\frac{|U_{l}|}{|U_{l}|+1}\right)^{2}IG_{l}(D|C)+\frac{1}{(|U_{l}|+1)^{2}}(2|\delta_{C}(x)-[x]_{D}|)=\frac{4}{25}$ .
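The arithmetic of Example 5 can be checked with a short sketch of the update in Eq. (7). The function name and argument names are illustrative and not part of the paper, and $|U_{l}|=4$ is an assumption inferred from the stated result $\frac{4}{25}$ rather than read from Table 1.

```python
from fractions import Fraction

def ig_after_adding_labeled(ig_old, n_labeled, delta_x, decision_class_x):
    """Eq. (7): update IG(D|B) after one labeled object x joins U_l.

    delta_x and decision_class_x are the neighborhood granule and the
    decision class of x, both computed on U_l united with {x}.
    """
    n_new = n_labeled + 1
    outside = len(set(delta_x) - set(decision_class_x))  # |delta_B(x) - [x]_D|
    return Fraction(n_labeled, n_new) ** 2 * ig_old + Fraction(2 * outside, n_new ** 2)

# Values quoted in Example 5; |U_l| = 4 is an assumption (see lead-in).
new_ig = ig_after_adding_labeled(Fraction(1, 4), 4, {"x"}, {"x", "x3", "x6"})
print(new_ig)  # 4/25
```

Exact fractions are used so that the result can be compared with the value in the example without rounding error.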

In the following, the incremental mechanisms of information granularity are analyzed when an unlabeled object or a labeled object is deleted from the system. The detailed descriptions are given as Theorem 3 and Theorem 4, respectively.

Theorem 3. Let $\textit{PDS}=(U=U_{u}\cup U_{l},A=C\cup D,V,f)$ be a partially labeled hybrid decision system, for $\forall B\subseteq C$ , suppose an object $y\in U_{u}$ is deleted from PDS, $\delta_{B}(y)$ is the neighborhood granule with $y$ under $B$ . Then the information granularity of $U_{u}-\{y\}$ under B becomes

$\displaystyle IG_{U_{u}-\{y\}}(B)=\left(\frac{|U_{u}|}{|U_{u}|-1}\right)^{2}IG% _{u}(B)-\frac{1}{(|U_{u}|-1)^{2}}(2|\delta_{B}(y)|-1)$ (8)

Proof. Suppose $U_{u}/NR_{B}=\{\delta_{B}(x_{1}),\delta_{B}(x_{2}),\ldots,\delta_{B}(x_{y}),% \ldots,\delta_{B}(x_{|U_{u}|})\}$ , when an object $y\in U_{u}$ is deleted from PDS, $\delta^{\prime}_{B}(y)=\delta_{B}(y)-\{y\}$ , then for $x\in\delta^{\prime}_{B}(y)$ , $\delta^{\prime}_{B}(x)=\delta_{B}(x)-\{y\}$ . It follows from Definition 2 that

$\displaystyle{\textstyle\sum_{i=1}^{|U_{u}-\{y\}|}|\delta^{\prime}_{B}(x_{i})|% }={\textstyle\sum_{x_{i}\in\delta^{\prime}_{B}(y)}|\delta^{\prime}_{B}(x_{i})|% }+{\textstyle\sum_{x_{i}\notin\delta^{\prime}_{B}(y)}|\delta^{\prime}_{B}(x_{i% })|}={\textstyle\sum_{x_{i}\in\delta^{\prime}_{B}(y)}|\delta_{B}(x_{i})-\{y\}|% }+{\textstyle\sum_{x_{i}\notin\delta^{\prime}_{B}(y)}|\delta_{B}(x_{i})|}+|% \delta_{B}(y)|-|\delta_{B}(y)|.$

Since

$\displaystyle{\textstyle\sum_{x_{i}\in\delta^{\prime}_{B}(y)}|\delta_{B}(x_{i}% )-\{y\}|}={\textstyle\sum_{x_{i}\in\delta^{\prime}_{B}(y)}|\delta_{B}(x_{i})|}% -|\delta^{\prime}_{B}(y)||\{y\}|,$

it follows that

$\displaystyle{\textstyle\sum_{i=1}^{|U_{u}-\{y\}|}|\delta^{\prime}_{B}(x_{i})|% }={\textstyle\sum_{i=1}^{|U_{u}|}|\delta_{B}(x_{i})|}-|\delta^{\prime}_{B}(y)|% -|\delta_{B}(y)|.$

Because $|\delta^{\prime}_{B}(y)|=|\delta_{B}(y)-\{y\}|$ , then

$\displaystyle{\textstyle\sum_{i=1}^{|U_{u}-\{y\}|}|\delta^{\prime}_{B}(x_{i})|% }={\textstyle\sum_{i=1}^{|U_{u}|}|\delta_{B}(x_{i})|}-(2|\delta_{B}(y)|-1).$

As analyzed above, the information granularity of $B$ becomes

$\displaystyle IG_{U_{u}-\{y\}}(B)=\left(\frac{|U_{u}|}{|U_{u}|-1}\right)^{2}IG% _{u}(B)-\frac{1}{(|U_{u}|-1)^{2}}(2|\delta_{B}(y)|-1).$

Example 6. (Continuation of Example 2) Suppose the object $y=x_{7}$ is deleted from the partially labeled hybrid decision system shown in Table 1. From Example 2, the original information granularity of $C$ is $IG_{u}(C)=\frac{3}{8}$ . It can be obtained that $\delta_{C}(y)=\{y\}$ . From Eq. (8), the new information granularity of $C$ is $IG_{U_{u}-\{y\}}(C)=\left(\frac{|U_{u}|}{|U_{u}|-1}\right)^{2}IG_{u}(C)-\frac{1}{(|U_{u}|-1)^{2}}(2|\delta_{C}(y)|-1)=\frac{5}{9}$ .
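A minimal sketch of Eq. (8) follows. It first reproduces Example 6 (with $|U_{u}|=4$ assumed, as implied by the stated result), and then compares the incremental value with a from-scratch computation on a small hypothetical family of symmetric, reflexive granules that is not taken from Table 1. All function names are illustrative.

```python
from fractions import Fraction

def ig_from_scratch(granules):
    """IG(B) = (1/|U|^2) * sum of |delta_B(x_i)| over all objects."""
    n = len(granules)
    return Fraction(sum(len(g) for g in granules.values()), n * n)

def ig_after_deleting_unlabeled(ig_old, n_unlabeled, delta_y_size):
    """Eq. (8): update IG(B) after one unlabeled object y leaves U_u."""
    n_new = n_unlabeled - 1
    return (Fraction(n_unlabeled, n_new) ** 2 * ig_old
            - Fraction(2 * delta_y_size - 1, n_new ** 2))

# Example 6: IG_u(C) = 3/8 and delta_C(y) = {y}; |U_u| = 4 is assumed.
example_6 = ig_after_deleting_unlabeled(Fraction(3, 8), 4, 1)
print(example_6)  # 5/9

# Hypothetical granules: the relation is reflexive and symmetric.
granules = {"a": {"a", "b"}, "b": {"a", "b", "c"}, "c": {"b", "c"}, "d": {"d"}}
y = "b"
incremental = ig_after_deleting_unlabeled(
    ig_from_scratch(granules), len(granules), len(granules[y]))
remaining = {k: g - {y} for k, g in granules.items() if k != y}
assert incremental == ig_from_scratch(remaining)
```

The update needs only $|\delta_{B}(y)|$, whereas the from-scratch computation revisits every granule.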

Theorem 4. Let $\textit{PDS}=(U=U_{u}\cup U_{l},A=C\cup D,V,f)$ be a partially labeled hybrid decision system, for $\forall B\subseteq C$ , suppose an object $y\in U_{l}$ is deleted from PDS, $\delta_{B}(y)$ is the neighborhood granule with $y$ under $B$ , $[y]_{D}$ is the decision equivalence class of $y$ . The information granularity of $D$ relative to $B$ becomes

$\displaystyle IG_{U_{l}-\{y\}}(D|B)=\left(\frac{|U_{l}|}{|U_{l}|-1}\right)^{2}% IG_{l}(D|B)-\frac{1}{(|U_{l}|-1)^{2}}(2|\delta_{B}(y)-[y]_{D}|)$ (9)

Proof. Suppose that $U_{l}/NR_{B}=\{\delta_{B}(x_{1}),\delta_{B}(x_{2}),\ldots,\delta_{B}(x_{y}),\ldots,\delta_{B}(x_{|U_{l}|})\}$ and $U_{l}/\textit{IND}(D)=\{D_{1},D_{2},\ldots,D_{j}\}$ . When an object $y\in U_{l}$ is deleted from PDS, $\delta^{\prime}_{B}(y)=\delta_{B}(y)-\{y\}$ . According to the symmetrical relation, for $x\in\delta^{\prime}_{B}(y)$ , $\delta^{\prime}_{B}(x)=\delta_{B}(x)-\{y\}$ , and $D^{\prime}_{q}=D_{q}-\{y\}=[y]_{D}-\{y\}$ . It follows from Eq. (3) that

$\displaystyle{\textstyle\sum_{i=1}^{|U_{l}-\{y\}|}|\delta^{\prime}_{B}(x_{i})-D^{\prime}_{k}|}={\textstyle\sum_{x_{i}\in\delta^{\prime}_{B}(y)\wedge x_{i}\in D^{\prime}_{q}}|\delta^{\prime}_{B}(x_{i})-D^{\prime}_{k}|}+{\textstyle\sum_{x_{i}\in\delta^{\prime}_{B}(y)\wedge x_{i}\notin D^{\prime}_{q}}|\delta^{\prime}_{B}(x_{i})-D^{\prime}_{k}|}$ $\displaystyle{}+{\textstyle\sum_{x_{i}\notin\delta^{\prime}_{B}(y)\wedge x_{i}\in D^{\prime}_{q}}|\delta^{\prime}_{B}(x_{i})-D^{\prime}_{k}|}+{\textstyle\sum_{x_{i}\notin\delta^{\prime}_{B}(y)\wedge x_{i}\notin D^{\prime}_{q}}|\delta^{\prime}_{B}(x_{i})-D^{\prime}_{k}|}.$

Since

$\displaystyle{\textstyle\sum_{x_{i}\in\delta^{\prime}_{B}(y)\wedge x_{i}\in D^{\prime}_{q}}|\delta^{\prime}_{B}(x_{i})-D^{\prime}_{k}|}={\textstyle\sum_{x_{i}\in\delta^{\prime}_{B}(y)\wedge x_{i}\in D^{\prime}_{q}}|(\delta_{B}(x_{i})-\{y\})-(D_{q}-\{y\})|}={\textstyle\sum_{x_{i}\in\delta^{\prime}_{B}(y)\wedge x_{i}\in D^{\prime}_{q}}|\delta_{B}(x_{i})-D_{q}|},$

$\displaystyle{\textstyle\sum_{x_{i}\in\delta^{\prime}_{B}(y)\wedge x_{i}\notin D^{\prime}_{q}}|\delta^{\prime}_{B}(x_{i})-D^{\prime}_{k}|}={\textstyle\sum_{x_{i}\in\delta^{\prime}_{B}(y)\wedge x_{i}\notin D^{\prime}_{q}}|(\delta_{B}(x_{i})-\{y\})-D_{k}|}={\textstyle\sum_{x_{i}\in\delta^{\prime}_{B}(y)\wedge x_{i}\notin D^{\prime}_{q}}|\delta_{B}(x_{i})-D_{k}|}-|\delta^{\prime}_{B}(y)-D^{\prime}_{q}||\{y\}|,$

$\displaystyle{\textstyle\sum_{x_{i}\notin\delta^{\prime}_{B}(y)\wedge x_{i}\in D^{\prime}_{q}}|\delta^{\prime}_{B}(x_{i})-D^{\prime}_{k}|}={\textstyle\sum_{x_{i}\notin\delta^{\prime}_{B}(y)\wedge x_{i}\in D^{\prime}_{q}}|\delta_{B}(x_{i})-(D_{q}-\{y\})|}={\textstyle\sum_{x_{i}\notin\delta^{\prime}_{B}(y)\wedge x_{i}\in D^{\prime}_{q}}|\delta_{B}(x_{i})-D_{q}|},$

$\displaystyle{\textstyle\sum_{x_{i}\notin\delta^{\prime}_{B}(y)\wedge x_{i}\notin D^{\prime}_{q}}|\delta^{\prime}_{B}(x_{i})-D^{\prime}_{k}|}={\textstyle\sum_{x_{i}\notin\delta^{\prime}_{B}(y)\wedge x_{i}\notin D^{\prime}_{q}}|\delta_{B}(x_{i})-D_{k}|},$

by merging operations, we have

$\displaystyle{\textstyle\sum_{i=1}^{|U_{l}-\{y\}|}|\delta^{\prime}_{B}(x_{i})-% D^{\prime}_{k}|}={\textstyle\sum_{i=1}^{|U_{l}|}|\delta_{B}(x_{i})-D_{k}|}-|% \delta^{\prime}_{B}(y)-D^{\prime}_{q}|-|\delta_{B}(y)-D_{q}|.$

Because $\delta^{\prime}_{B}(y)-D^{\prime}_{q}=\delta_{B}(y)-D_{q}$ , we have

$\displaystyle{\textstyle\sum_{i=1}^{|U_{l}-\{y\}|}|\delta^{\prime}_{B}(x_{i})-% D^{\prime}_{k}|}={\textstyle\sum_{i=1}^{|U_{l}|}|\delta_{B}(x_{i})-D_{k}|}-2|% \delta_{B}(y)-[y]_{D}|.$

As analyzed above, the information granularity of $D$ relative to $B$ becomes

$\displaystyle IG_{U_{l}-\{y\}}(D|B)=\left(\frac{|U_{l}|}{|U_{l}|-1}\right)^{2}% IG_{l}(D|B)-\frac{1}{(|U_{l}|-1)^{2}}(2|\delta_{B}(y)-[y]_{D}|).$

Example 7. (Continuation of Example 3) Suppose the object $y=x_{3}$ is deleted from the partially labeled hybrid decision system shown in Table 1. From Example 3, the original information granularity of $D$ relative to $C$ is $IG_{l}(D|C)=\frac{1}{4}$ . It can be obtained that $\delta_{C}(y)=\{x_{2},y\}$ and $[y]_{D}=\{y,x_{6}\}$ , so $|\delta_{C}(y)-[y]_{D}|=1$ . From Eq. (9), the new information granularity of $D$ relative to $C$ is $IG_{U_{l}-\{y\}}(D|C)=\left(\frac{|U_{l}|}{|U_{l}|-1}\right)^{2}IG_{l}(D|C)-\frac{1}{(|U_{l}|-1)^{2}}(2|\delta_{C}(y)-[y]_{D}|)=\frac{2}{9}$ .
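The update of Eq. (9) can be sketched in the same way. The function below is illustrative only; it takes the granule and the decision class of the deleted object as sets, and $|U_{l}|=4$ is an assumption consistent with the result stated in Example 7.

```python
from fractions import Fraction

def ig_after_deleting_labeled(ig_old, n_labeled, delta_y, decision_class_y):
    """Eq. (9): update IG(D|B) after one labeled object y leaves U_l.

    delta_y and decision_class_y are computed on U_l before the deletion.
    """
    n_new = n_labeled - 1
    outside = len(set(delta_y) - set(decision_class_y))  # |delta_B(y) - [y]_D|
    return Fraction(n_labeled, n_new) ** 2 * ig_old - Fraction(2 * outside, n_new ** 2)

# Example 7: y = x3, delta_C(y) = {x2, x3}, [y]_D = {x3, x6}; |U_l| = 4 is assumed.
new_ig = ig_after_deleting_labeled(Fraction(1, 4), 4, {"x2", "x3"}, {"x3", "x6"})
print(new_ig)  # 2/9
```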

According to the above analyses, the corresponding algorithm is developed based on the incremental mechanisms for adding and deleting a single object simultaneously. The detailed description is given as Algorithm IFS.

Algorithm 2. The single incremental feature selection algorithm based on the information granularity (IFS)

Input: A partially labeled hybrid decision system $\textit{PDS}=(U=U_{u}\cup U_{l},A=C\cup D,V,f)$ , the original feature subset Red, $IG_{u}(\textit{Red})$ and $IG_{l}(D|\textit{Red})$ , a single adding object $x$ , a single deleting object $y$ .
Output: A new feature subset $\textit{Red}^{\prime}$ .
1: Set $\textit{Red}^{\prime}\leftarrow\textit{Red}$ and normalize the numerical feature values;
2: if the object $y$ is deleted from PDS then
3:     set $U^{\prime}=U-\{y\}$ ;
4: else
5:     go to Step 13;
6: end if
7: Compute the information granularity $IG_{U^{\prime}}(\textit{Red}^{\prime})$ and $IG_{U^{\prime}}(D|\textit{Red}^{\prime})$ by Eqs (8) and (9);
8: for each $a\in\textit{Red}^{\prime}$ do
9:     if $\textit{Sig}_{in}(a,\textit{Red}^{\prime},D)=0$ then
10:         set $\textit{Red}^{\prime}\leftarrow\textit{Red}^{\prime}-\{a\}$ ;
11:     end if
12: end for
13: if a single new object $x$ is added into PDS then
14:     set $U^{\prime}=U\cup\{x\}$ and normalize the numerical feature values;
15: else
16:     go to Step 29;
17: end if
18: Compute the neighborhood granules $\delta_{\textit{Red}^{\prime}}(x)$ and $\delta_{C}(x)$ of $x$ under $\textit{Red}^{\prime}$ and $C$ ;
19: if $\delta_{\textit{Red}^{\prime}}(x)=\delta_{C}(x)$ then
20:     go to Step 29;
21: else
22:     go to Step 24;
23: end if
24: For each $a\in C-\textit{Red}^{\prime}$ , compute $\textit{Sig}_{\textit{out}}(a,\textit{Red}^{\prime},D)$ and sort the remaining features in descending order as $\{a_{1},a_{2},\ldots,a_{|C-\textit{Red}^{\prime}|}\}$ ;
25: while $IG_{U^{\prime}}(\textit{Red}^{\prime})\neq IG_{U^{\prime}}(C)$ , $IG_{U^{\prime}}(D|\textit{Red}^{\prime})\neq IG_{U^{\prime}}(D|C)$ do
26:     for $j=1$ to $|C-\textit{Red}^{\prime}|$ do
27:         select $\textit{Red}^{\prime}\leftarrow\textit{Red}^{\prime}\cup\{a_{j}\}$ and compute $IG_{U^{\prime}}(\textit{Red}^{\prime})$ , $IG_{U^{\prime}}(D|\textit{Red}^{\prime})$ by Eqs (6) and (7);
28: end while
29: for each $b\in\textit{Red}^{\prime}$ do
30:     if $\textit{Sig}_{in}(b,\textit{Red}^{\prime},D)=0$ then
31:         set $\textit{Red}^{\prime}\leftarrow\textit{Red}^{\prime}-\{b\}$ ;
32:     end if
33: end for
34: return $\textit{Red}^{\prime}$ ;

The time complexity of IFS. Step 7 computes the information granularity of the feature subset $\textit{Red}^{\prime}$ incrementally when an object is deleted, with time complexity $O(|U^{\prime}|)$ . In Steps 8–12, the redundant features are removed by computing the feature significance, with time complexity $O(|U^{\prime}||\textit{Red}^{\prime}|)$ . Step 18 computes the neighborhood granules of $x$ under $\textit{Red}^{\prime}$ and $C$ , with time complexity $O(|U^{\prime}||C|)$ . In Step 24, the remaining features are sorted according to their feature significance, with time complexity $O(|U^{\prime}||\textit{Red}^{\prime}||C-\textit{Red}^{\prime}|)$ . Steps 25–28 add the most significant features to the candidate feature subset until the termination condition is reached, with time complexity $O(|U^{\prime}||C||C-\textit{Red}^{\prime}|)$ . Steps 29–33 remove the redundant features, with time complexity $O(|U^{\prime}||\textit{Red}^{\prime}|^{2})$ . Therefore, the time complexity of IFS is $O(|U^{\prime}||C|^{2})$ . Compared with the non-incremental algorithm NIFS, the time complexity of IFS is effectively reduced.
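The redundancy-removal loop of Steps 29–33 can be sketched as follows. This is only an outline under stated assumptions: the inner significance is passed in as a caller-supplied function `sig_in(b, red)` (a simplified, hypothetical signature that omits the decision feature $D$), and the toy significance used in the demonstration is invented for illustration and is not the measure defined in the paper.

```python
from typing import Callable, Hashable, Iterable, Set

def remove_redundant(red: Iterable[Hashable],
                     sig_in: Callable[[Hashable, Set[Hashable]], float]) -> Set[Hashable]:
    """Steps 29-33: drop every feature whose inner significance is zero."""
    reduced = set(red)
    # Iterate over a sorted copy so the set can shrink safely and the
    # visiting order (an assumption, not fixed by the paper) is reproducible.
    for b in sorted(reduced, key=str):
        if sig_in(b, reduced) == 0:
            reduced.remove(b)
    return reduced

# Hypothetical significance: a feature is redundant while its "twin" is present.
twins = {"a1": "a2", "a2": "a1"}

def toy_sig(b, red):
    return 0.0 if twins.get(b) in red else 1.0

result = remove_redundant({"a1", "a2", "a3"}, toy_sig)
print(sorted(result))  # ['a2', 'a3']
```

Because the significance is re-evaluated against the current subset, removing one of two mutually redundant features makes the other one significant again, which is why only one of the twins is dropped.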

5.2 Incremental feature selection with the variation of a group of objects for the partially labeled hybrid data

When a group of objects is added into or deleted from the system, the group incremental mechanism regards the dynamic objects as a whole, so the new information granularity can be updated quickly by analyzing only the locally changed information granularity between the dynamic objects and the original data set. On this basis, the proposed group incremental algorithm obtains a new feature subset by updating the existing feature selection result of the original data set.

The incremental updating mechanisms of the information granularity when a group of unlabeled objects or labeled objects is added are given as Theorem 5 and Theorem 6, respectively.

Theorem 5. Let $\textit{PDS}=(U=U_{u}\cup U_{l},A=C\cup D,V,f)$ be a partially labeled hybrid decision system, for $\forall B\subseteq C$ , $U_{u}=\{x_{1},x_{2},\ldots,x_{|U_{u}|}\}$ , suppose $U_{ad}=U_{ad}^{u}\cup U_{ad}^{l}$ is a group of adding objects, $U_{ad}^{u}=\{y_{1},y_{2},\ldots,y_{|U_{ad}^{u}|}\}$ . Then the new information granularity of $U_{u}\cup U_{ad}^{u}$ under $B$ becomes

$\displaystyle IG^{\prime}_{u}(B)=\frac{1}{|U_{u}\cup U_{ad}^{u}|^{2}}(|U_{u}|^% {2}IG_{U_{u}}(B)+|U_{ad}^{u}|^{2}IG_{U_{ad}^{u}}(B)+2{\textstyle\sum_{i=1}^{|U% _{ad}^{u}|}|X_{B}(y_{i})|})$ (10)

where $X_{B}(y)$ is the object set of $U_{u}$ that satisfies the neighborhood relation with the object $y\in U_{ad}^{u}$ under $B$ .

Proof. Let $Y_{B}(x)$ be the object set of $U_{ad}^{u}$ that satisfies the neighborhood relation with the object $x\in U_{u}$ under $B$ . It follows from Definition 2 that the information granularity of $B$ is

$\displaystyle IG^{\prime}_{u}(B)=\frac{1}{|U_{u}\cup U_{ad}^{u}|}{\textstyle% \sum_{i=1}^{|U_{u}\cup U_{ad}^{u}|}\frac{|\delta^{\prime}_{B}(x_{i})|}{|U_{u}% \cup U_{ad}^{u}|}}.$

Since

$\displaystyle{\textstyle\sum_{i=1}^{|U_{u}\cup U_{ad}^{u}|}|\delta^{\prime}_{B% }(x_{i})|}={\textstyle\sum_{i=1}^{|U_{u}|}|\delta^{\prime}_{B}(x_{i})|}+{% \textstyle\sum_{i=1}^{|U_{ad}^{u}|}|\delta^{\prime}_{B}(y_{i})|}={\textstyle% \sum_{i=1}^{|U_{u}|}|\delta_{B}(x_{i})\cup Y_{B}(x_{i})|}+{\textstyle\sum_{i=1% }^{|U_{ad}^{u}|}|\delta_{B}(y_{i})\cup X_{B}(y_{i})|}={\textstyle\sum_{i=1}^{|% U_{u}|}|\delta_{B}(x_{i})|}+{\textstyle\sum_{i=1}^{|U_{u}|}|Y_{B}(x_{i})|}+{% \textstyle\sum_{i=1}^{|U_{ad}^{u}|}|\delta_{B}(y_{i})|}+{\textstyle\sum_{i=1}^% {|U_{ad}^{u}|}|X_{B}(y_{i})|},$

based on the symmetrical relation,

$\displaystyle{\textstyle\sum_{i=1}^{|U_{u}|}|Y_{B}(x_{i})|}={\textstyle\sum_{i% =1}^{|U_{ad}^{u}|}|X_{B}(y_{i})|},$

then

$\displaystyle{\textstyle\sum_{i=1}^{|U_{u}\cup U_{ad}^{u}|}|\delta^{\prime}_{B% }(x_{i})|}={\textstyle\sum_{i=1}^{|U_{u}|}|\delta_{B}(x_{i})|}+{\textstyle\sum% _{i=1}^{|U_{ad}^{u}|}|\delta_{B}(y_{i})|}+2{\textstyle\sum_{i=1}^{|U_{ad}^{u}|% }|X_{B}(y_{i})|}.$

Thus, the information granularity of $B$ becomes

$\displaystyle IG^{\prime}_{u}(B)=\frac{1}{|U_{u}\cup U_{ad}^{u}|^{2}}(|U_{u}|^% {2}IG_{U_{u}}(B)+|U_{ad}^{u}|^{2}IG_{U_{ad}^{u}}(B)+2{\textstyle\sum_{i=1}^{|U% _{ad}^{u}|}|X_{B}(y_{i})|}).$

Example 8. (Continuation of Example 2) Suppose a group of objects $U_{ad}=\{y_{1},y_{2},y_{3},y_{4}\}$ is added into the partially labeled hybrid decision system shown in Table 1, where $U_{ad}^{u}=\{y_{1},y_{2}\}$ , $y_{1}=\{0.8,0.9,0.4,\textit{Male},\textit{Else},*\}$ , $y_{2}=\{0.7,*,0.1,*,\textit{Else},*\}$ , $U_{ad}^{l}=\{y_{3},y_{4}\}$ , $y_{3}=\{0.6,*,0.9,\textit{Female},*,\textit{Yes}\}$ , $y_{4}=\{0.1,0.3,0.7,*,\textit{Sternum},\textit{Yes}\}$ . From Example 2, the original information granularity is $IG_{u}(C)=\frac{3}{8}$ . By computation, we obtain $IG_{U_{ad}^{u}}(C)=\frac{1}{2}$ , $X_{C}(y_{1})=\{x_{5}\}$ , $X_{C}(y_{2})=\{x_{5}\}$ . From Eq. (10), the new information granularity of $C$ is $IG^{\prime}_{u}(C)=\frac{1}{|U_{u}\cup U_{ad}^{u}|^{2}}(|U_{u}|^{2}IG_{U_{u}}(C)+|U_{ad}^{u}|^{2}IG_{U_{ad}^{u}}(C)+2{\textstyle\sum_{i=1}^{|U_{ad}^{u}|}|X_{C}(y_{i})|})=\frac{1}{3}$ .
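A short sketch of the group update in Eq. (10) is given below. It reproduces Example 8 (with $|U_{u}|=4$ assumed from the stated result) and then cross-checks the formula against a from-scratch computation on hypothetical one-dimensional integer data, where the neighborhood relation $|v-w|\leqslant\varepsilon$ is an assumption made only for this illustration and not the hybrid distance of the paper.

```python
from fractions import Fraction

def ig_after_adding_group(ig_old, n_old, ig_added, n_added, cross_sizes):
    """Eq. (10); cross_sizes[i] = |X_B(y_i)|, neighbours of y_i inside the old U_u."""
    total = n_old ** 2 * ig_old + n_added ** 2 * ig_added + 2 * sum(cross_sizes)
    return Fraction(total) / (n_old + n_added) ** 2

# Example 8: IG_u(C) = 3/8, IG of the added pair = 1/2, |X_C(y_1)| = |X_C(y_2)| = 1.
example_8 = ig_after_adding_group(Fraction(3, 8), 4, Fraction(1, 2), 2, [1, 1])
print(example_8)  # 1/3

def ig_from_scratch(values, eps):
    """IG = (1/|U|^2) * sum of granule sizes under the assumed relation."""
    sizes = [sum(abs(v - w) <= eps for w in values) for v in values]
    return Fraction(sum(sizes), len(values) ** 2)

# Hypothetical data (integers avoid floating-point comparisons).
old, added, eps = [10, 15, 50, 90], [20, 55], 10
cross = [sum(abs(y - x) <= eps for x in old) for y in added]
incremental = ig_after_adding_group(ig_from_scratch(old, eps), len(old),
                                    ig_from_scratch(added, eps), len(added), cross)
assert incremental == ig_from_scratch(old + added, eps)
```

Only the cross terms between the added objects and the original set are computed anew; the granularity of the original set is reused.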

Theorem 6. Let $\textit{PDS}=(U=U_{u}\cup U_{l},A=C\cup D,V,f)$ be a partially labeled hybrid decision system, for $\forall B\subseteq C$ , $U_{l}=\{x_{1},x_{2},\ldots,x_{|U_{l}|}\}$ , suppose $U_{ad}=U_{ad}^{u}\cup U_{ad}^{l}$ is a group of adding objects, $U_{ad}^{l}=\{y_{1},y_{2},\ldots,y_{|U_{ad}^{l}|}\}$ . Then, the information granularity of $D$ relative to $B$ becomes

$\displaystyle IG^{\prime}_{l}(D|B)=\frac{1}{|U_{l}\cup U_{ad}^{l}|^{2}}(|U_{l}% |^{2}IG_{U_{l}}(D|B)+|U_{ad}^{l}|^{2}IG_{U_{ad}^{l}}(D|B)+2{\textstyle\sum_{i=% 1}^{|U_{ad}^{l}|}|X_{B}(y_{i})-[y_{i}]_{D}^{U_{l}}|})$ (11)

where $X_{B}(y)$ is the object set of $U_{l}$ that satisfies the neighborhood relation with the object $y\in U_{ad}^{l}$ under $B$ , and $[y]_{D}^{U_{l}}$ is the object set of $U_{l}$ that satisfies the decision equivalence relation with the object $y\in U_{ad}^{l}$ under $D$ .

Proof. Let $Y_{B}(x)$ be the object set of $U_{ad}^{l}$ that satisfies the neighborhood relation with the object $x\in U_{l}$ under $B$ , and let $[y]_{D}^{U_{ad}^{l}}$ be the object set of $U_{ad}^{l}$ that satisfies the decision equivalence relation with the object $y\in U_{ad}^{l}$ under $D$ . It follows from Eq. (3) that the information granularity of $D$ relative to $B$ is

$\displaystyle IG^{\prime}_{l}(D|B)=\frac{1}{|U_{l}\cup U_{ad}^{l}|}{\textstyle% \sum_{i=1}^{|U_{l}\cup U_{ad}^{l}|}\frac{|\delta^{\prime}_{B}(x_{i})-D^{\prime% }_{k}|}{|U_{l}\cup U_{ad}^{l}|}}.$

Since

$\displaystyle{\textstyle\sum_{i=1}^{|U_{l}\cup U_{ad}^{l}|}|\delta^{\prime}_{B% }(x_{i})-D^{\prime}_{k}|}={\textstyle\sum_{i=1}^{|U_{l}|}|\delta^{\prime}_{B}(% x_{i})-D^{\prime}_{k}|}+{\textstyle\sum_{i=1}^{|U_{ad}^{l}|}|\delta^{\prime}_{% B}(y_{i})-D^{\prime}_{k}|}$ $\displaystyle={\textstyle\sum_{i=1}^{|U_{l}|}|\delta_{B}(x_{i})-D_{k}|}+{% \textstyle\sum_{i=1}^{|U_{l}|}|Y_{B}(x_{i})-[x_{i}]_{D}|}+{\textstyle\sum_{i=1% }^{|U_{ad}^{l}|}|\delta_{B}(y_{i})-[y_{i}]_{D}^{U_{ad}^{l}}|}$ $\displaystyle{}+{\textstyle\sum_{i=1}^{|U_{ad}^{l}|}|X_{B}(y_{i})-[y_{i}]_{D}^% {U_{l}}|}={\textstyle\sum_{i=1}^{|U_{l}|}|\delta_{B}(x_{i})-D_{k}|}+{% \textstyle\sum_{i=1}^{|U_{ad}^{l}|}|\delta_{B}(y_{i})-[y_{i}]_{D}^{U_{ad}^{l}}|}$ $\displaystyle{}+{\textstyle\sum_{i=1}^{|U_{l}|}|Y_{B}(x_{i})-[x_{i}]_{D}|}+{% \textstyle\sum_{i=1}^{|U_{ad}^{l}|}|X_{B}(y_{i})-[y_{i}]_{D}^{U_{l}}|}.$

According to the symmetrical relation,

$\displaystyle{\textstyle\sum_{i=1}^{|U_{l}|}|Y_{B}(x_{i})-[x_{i}]_{D}|}={% \textstyle\sum_{i=1}^{|U_{ad}^{l}|}|X_{B}(y_{i})-[y_{i}]_{D}^{U_{l}}|},$

then

$\displaystyle{\textstyle\sum_{i=1}^{|U_{l}\cup U_{ad}^{l}|}|\delta^{\prime}_{B% }(x_{i})-D^{\prime}_{k}|}={\textstyle\sum_{i=1}^{|U_{l}|}|\delta_{B}(x_{i})-D_% {k}|}+{\textstyle\sum_{i=1}^{|U_{ad}^{l}|}|\delta_{B}(y_{i})-[y_{i}]_{D}^{U_{% ad}^{l}}|}+2{\textstyle\sum_{i=1}^{|U_{ad}^{l}|}|X_{B}(y_{i})-[y_{i}]_{D}^{U_{% l}}|}.$

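Thus, the information granularity of $D$ relative to $B$ becomes

$\displaystyle IG^{\prime}_{l}(D|B)=\frac{1}{|U_{l}\cup U_{ad}^{l}|^{2}}(|U_{l}|^{2}IG_{U_{l}}(D|B)+|U_{ad}^{l}|^{2}IG_{U_{ad}^{l}}(D|B)+2{\textstyle\sum_{i=1}^{|U_{ad}^{l}|}|X_{B}(y_{i})-[y_{i}]_{D}^{U_{l}}|}).$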

Example 9. (Continuation of Example 8) From Example 3, the original information granularity of $D$ relative to $C$ is $IG_{l}(D|C)=\frac{1}{4}$ . From Example 8, $U_{ad}^{l}=\{y_{3},y_{4}\}$ , $y_{3}=\{0.6,*,0.9,\textit{Female},*,\textit{Yes}\}$ , $y_{4}=\{0.1,0.3,0.7,*,\textit{Sternum},\textit{Yes}\}$ . By computation, it can be obtained that $IG_{U_{ad}^{l}}(D|C)=0$ , $X_{C}(y_{3})=\{x_{3},x_{6}\}$ , $X_{C}(y_{4})=\varnothing$ , $[y_{3}]_{D}^{U_{l}}=\{x_{3},x_{6}\}$ , $[y_{4}]_{D}^{U_{l}}=\{x_{3},x_{6}\}$ . From Eq. (11), the new information granularity of $D$ relative to $C$ is $IG^{\prime}_{l}(D|C)=\frac{1}{|U_{l}\cup U_{ad}^{l}|^{2}}(|U_{l}|^{2}IG_{U_{l}}(D|C)+|U_{ad}^{l}|^{2}IG_{U_{ad}^{l}}(D|C)+2{\textstyle\sum_{i=3,4}|X_{C}(y_{i})-[y_{i}]_{D}^{U_{l}}|})=\frac{1}{9}$ .
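The computation in Example 9 can be traced with the following sketch of Eq. (11). The sets are those quoted in the example; the function name is illustrative, and $|U_{l}|=4$ is an assumption consistent with the stated result.

```python
from fractions import Fraction

def cond_ig_after_adding_group(ig_old, n_old, ig_added, n_added, cross_outside):
    """Eq. (11); cross_outside[i] = |X_B(y_i) - [y_i]_D^{U_l}|."""
    total = n_old ** 2 * ig_old + n_added ** 2 * ig_added + 2 * sum(cross_outside)
    return Fraction(total) / (n_old + n_added) ** 2

# Neighbours of each added labeled object inside the original U_l (Example 9).
x_c = {"y3": {"x3", "x6"}, "y4": set()}
# Objects of U_l sharing the decision label of each added object (Example 9).
same_label = {"y3": {"x3", "x6"}, "y4": {"x3", "x6"}}

cross = [len(x_c[y] - same_label[y]) for y in ("y3", "y4")]
new_ig = cond_ig_after_adding_group(Fraction(1, 4), 4, Fraction(0), 2, cross)
print(new_ig)  # 1/9
```

Both cross terms vanish here because every neighbor of $y_{3}$ in $U_{l}$ carries the same decision label and $y_{4}$ has no neighbor in $U_{l}$ .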

The incremental updating mechanisms of the information granularity when a group of unlabeled objects or labeled objects is deleted are given as Theorem 7 and Theorem 8, respectively.

Theorem 7. Let $\textit{PDS}=(U=U_{u}\cup U_{l},A=C\cup D,V,f)$ be a partially labeled hybrid decision system, for $\forall B\subseteq C$ , $U_{u}=\{x_{1},x_{2},\ldots,x_{|U_{u}|}\}$ , suppose $U_{de}=U_{de}^{u}\cup U_{de}^{l}$ is a group of deleting objects, $U_{de}^{u}=\{y_{1},y_{2},\ldots,y_{|U_{de}^{u}|}\}$ . Then the information granularity of $U_{u}-U_{de}^{u}$ under $B$ becomes

$\displaystyle IG^{\prime}_{u}(B)=\frac{1}{|U_{u}-U_{de}^{u}|^{2}}(|U_{u}|^{2}% IG_{U_{u}}(B)-|U_{de}^{u}|^{2}IG_{U_{de}^{u}}(B)-2{\textstyle\sum_{i=1}^{|U_{% de}^{u}|}|X_{B}(y_{i})|})$ (12)

where $X_{B}(y)$ is the object set of $U_{u}-U_{de}^{u}$ that satisfies the neighborhood relation with the object $y\in U_{de}^{u}$ under $B$ .

Proof. It follows from Eq. (2) that the information granularity of $U_{u}-U_{de}^{u}$ under $B$ is

$\displaystyle IG^{\prime}_{u}(B)=\frac{1}{|U_{u}-U_{de}^{u}|}{\textstyle\sum_{% i=1}^{|U_{u}-U_{de}^{u}|}\frac{|\delta^{\prime}_{B}(x_{i})|}{|U_{u}-U_{de}^{u}% |}},$

since

$\displaystyle{\textstyle\sum_{i=1}^{|U_{u}-U_{de}^{u}|}|\delta^{\prime}_{B}(x_% {i})|}={\textstyle\sum_{i=1}^{|U_{u}-U_{de}^{u}|}|\delta_{B}(x_{i})|}-{% \textstyle\sum_{i=1}^{|U_{de}^{u}|}|X_{B}(y_{i})|}+{\textstyle\sum_{i=1}^{|U_{% de}^{u}|}|\delta_{B}(y_{i})|}-{\textstyle\sum_{i=1}^{|U_{de}^{u}|}|\delta_{B}(% y_{i})|},$

let

$\displaystyle{\textstyle\sum_{i=1}^{|U_{de}^{u}|}|\delta_{B}(y_{i})|}={% \textstyle\sum_{i=1}^{|U_{de}^{u}|}|X_{B}(y_{i})\cup Y_{B}(y_{i})|},$

where $X_{B}(y)\cap Y_{B}(y)=\varnothing$ and $X_{B}(y)=\delta_{B}(y)\cap\{U_{u}-U_{de}^{u}\}$ , $Y_{B}(y)=\delta_{B}(y)\cap U_{de}^{u}$ (For $U_{de}^{u}$ , $Y_{B}(y)$ is the neighborhood granule of $y$ ). Then

$\displaystyle{\textstyle\sum_{i=1}^{|U_{u}-U_{de}^{u}|}|\delta^{\prime}_{B}(x_% {i})|}={\textstyle\sum_{i=1}^{|U_{u}|}|\delta_{B}(x_{i})|}-{\textstyle\sum_{i=% 1}^{|U_{de}^{u}|}|Y_{B}(y_{i})|}-2{\textstyle\sum_{i=1}^{|U_{de}^{u}|}|X_{B}(y% _{i})|},$

the new information granularity becomes

$\displaystyle IG^{\prime}_{u}(B)=\frac{1}{|U_{u}-U_{de}^{u}|^{2}}(|U_{u}|^{2}% IG_{U_{u}}(B)-|U_{de}^{u}|^{2}IG_{U_{de}^{u}}(B)-2{\textstyle\sum_{i=1}^{|U_{% de}^{u}|}|X_{B}(y_{i})|}).$

Example 10. (Continuation of Example 2) Suppose a group of objects $U_{de}=\{y_{1},y_{2},y_{3},y_{4}\}$ is deleted from the partially labeled hybrid decision system shown in Table 1, where $U_{de}^{u}=\{y_{1},y_{2}\}$ , $y_{1}=x_{5}$ , $y_{2}=x_{7}$ ; $U_{de}^{l}=\{y_{3},y_{4}\}$ , $y_{3}=x_{2}$ , $y_{4}=x_{3}$ . From Example 2, the original information granularity is $IG_{u}(C)=\frac{3}{8}$ . By computation, it can be obtained that $IG_{U_{de}^{u}}(C)=\frac{1}{2}$ , $X_{C}(y_{1})=\varnothing$ , $X_{C}(y_{2})=\varnothing$ . From Eq. (12), the new information granularity of $C$ is $IG^{\prime}_{u}(C)=\frac{1}{|U_{u}-U_{de}^{u}|^{2}}(|U_{u}|^{2}IG_{U_{u}}(C)-|U_{de}^{u}|^{2}IG_{U_{de}^{u}}(C)-2{\textstyle\sum_{i=1}^{|U_{de}^{u}|}|X_{C}(y_{i})|})=1$ .
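As a sketch, Eq. (12) can be evaluated as below for the values of Example 10 (with $|U_{u}|=4$ assumed from the stated result). The function name is illustrative. The guard against deleting every object is an addition made here for safety, since the formula divides by $|U_{u}-U_{de}^{u}|^{2}$ .

```python
from fractions import Fraction

def ig_after_deleting_group(ig_old, n_old, ig_deleted, n_deleted, cross_sizes):
    """Eq. (12); cross_sizes[i] = |X_B(y_i)|, neighbours of y_i among the kept objects."""
    n_new = n_old - n_deleted
    if n_new <= 0:
        raise ValueError("at least one unlabeled object must remain")
    total = n_old ** 2 * ig_old - n_deleted ** 2 * ig_deleted - 2 * sum(cross_sizes)
    return Fraction(total) / n_new ** 2

# Example 10: IG_u(C) = 3/8, IG of the deleted pair = 1/2, both X_C(y_i) are empty.
example_10 = ig_after_deleting_group(Fraction(3, 8), 4, Fraction(1, 2), 2, [0, 0])
print(example_10)  # 1
```

The value 1 is the maximum of the measure as computed here: with no cross neighbors removed, the result implies that the two remaining unlabeled objects are neighbors of each other.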

Theorem 8. Let $\textit{PDS}=(U=U_{u}\cup U_{l},A=C\cup D,V,f)$ be a partially labeled hybrid decision system, for $\forall B\subseteq C$ , $U_{l}=\{x_{1},x_{2},\ldots,x_{|U_{l}|}\}$ , suppose $U_{de}=U_{de}^{u}\cup U_{de}^{l}$ is a group of deleting objects, $U_{de}^{l}=\{y_{1},y_{2},\ldots,y_{|U_{de}^{l}|}\}$ . Then, the information granularity of $D$ relative to $B$ becomes

$\displaystyle IG^{\prime}_{l}(D|B)=\frac{1}{|U_{l}-U_{de}^{l}|^{2}}(|U_{l}|^{2% }IG_{U_{l}}(D|B)-|U_{de}^{l}|^{2}IG_{U_{de}^{l}}(D|B)-2{\textstyle\sum_{i=1}^{% |U_{de}^{l}|}|X_{B}(y_{i})-X_{D_{k}}|})$ (13)

where $X_{B}(y)$ is the object set of $U_{l}-U_{de}^{l}$ that satisfies the neighborhood relation with $y\in U_{de}^{l}$ under $B$ , $X_{D_{k}}(1\leqslant k\leqslant j)$ is the object set of $U_{l}-U_{de}^{l}$ that satisfies the decision equivalence relation with $y\in U_{de}^{l}$ under $D$ .

Proof. It follows from Eq. (3) that

$\displaystyle IG^{\prime}_{l}(D|B)=\frac{1}{|U_{l}-U_{de}^{l}|}{\textstyle\sum% _{i=1}^{|U_{l}-U_{de}^{l}|}\frac{|\delta^{\prime}_{B}(x_{i})-D^{\prime}_{k}|}{% |U_{l}-U_{de}^{l}|}},$

since

$\displaystyle{\textstyle\sum_{i=1}^{|U_{l}-U_{de}^{l}|}|\delta^{\prime}_{B}(x_% {i})-D^{\prime}_{k}|}={\textstyle\sum_{i=1}^{|U_{l}-U_{de}^{l}|}|X_{B}(x_{i})-% X_{D_{k}}|}={\textstyle\sum_{i=1}^{|U_{l}-U_{de}^{l}|}|X_{B}(x_{i})-X_{D_{k}}|}$ $\displaystyle{}+{\textstyle\sum_{i=1}^{|U_{l}-U_{de}^{l}|}|Y_{B}(x_{i})-Y_{D_{% k}}|}-{\textstyle\sum_{i=1}^{|U_{l}-U_{de}^{l}|}|Y_{B}(x_{i})-Y_{D_{k}}|}+{% \textstyle\sum_{i=1}^{|U_{de}^{l}|}|\delta_{B}(y_{i})-D_{k}|}$ $\displaystyle{}-{\textstyle\sum_{i=1}^{|U_{de}^{l}|}|\delta_{B}(y_{i})-D_{k}|},$

where $X_{B}(x)\cap Y_{B}(x)=\varnothing$ , $X_{B}(x)\cup Y_{B}(x)=\delta_{B}(x)$ (For $U_{l}$ , $\delta_{B}(x)$ is the neighborhood granule of object $x$ under $B$ ), let $X_{B}(x)=\delta_{B}(x)\cap\{U_{l}-U_{de}^{l}\}$ , $Y_{B}(x)=\delta_{B}(x)\cap U_{de}^{l}$ ; $X_{D_{k}}\cap Y_{D_{k}}=\varnothing$ , $X_{D_{k}}\cup Y_{D_{k}}=D_{k}(1\leqslant k\leqslant j)$ (For $U_{l}$ , $D_{k}$ is the decision class of object $x$ under $D$ ), let $X_{D_{k}}=D_{k}\cap\{U_{l}-U_{de}^{l}\}$ , $Y_{D_{k}}=D_{k}\cap U_{de}^{l}$ . It can be obtained by merging that

$\displaystyle{\textstyle\sum_{i=1}^{|U_{l}-U_{de}^{l}|}|\delta^{\prime}_{B}(x_% {i})-D^{\prime}_{k}|}={\textstyle\sum_{i=1}^{|U_{l}-U_{de}^{l}|}|\delta_{B}(x_% {i})-D_{k}|}+{\textstyle\sum_{i=1}^{|U_{de}^{l}|}|\delta_{B}(y_{i})-D_{k}|}$ $\displaystyle{}-{\textstyle\sum_{i=1}^{|U_{l}-U_{de}^{l}|}|Y_{B}(x_{i})-Y_{D_{% k}}|}-{\textstyle\sum_{i=1}^{|U_{de}^{l}|}|\delta_{B}(y_{i})-D_{k}|},$

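let $\delta_{B}(y)=X_{B}(y)\cup Y_{B}(y)$ , $X_{B}(y)\cap Y_{B}(y)=\varnothing$ , where $X_{B}(y)=\delta_{B}(y)\cap\{U_{l}-U_{de}^{l}\}$ , $Y_{B}(y)=\delta_{B}(y)\cap U_{de}^{l}$ , and $D_{k}=X_{D_{k}}\cup Y_{D_{k}}$ , $X_{D_{k}}\cap Y_{D_{k}}=\varnothing$ , where $X_{D_{k}}=D_{k}\cap\{U_{l}-U_{de}^{l}\}$ , $Y_{D_{k}}=D_{k}\cap U_{de}^{l}$ . Then

$\displaystyle{\textstyle\sum_{i=1}^{|U_{l}-U_{de}^{l}|}|\delta^{\prime}_{B}(x_{i})-D^{\prime}_{k}|}={\textstyle\sum_{i=1}^{|U_{l}|}|\delta_{B}(x_{i})-D_{k}|}-{\textstyle\sum_{i=1}^{|U_{l}-U_{de}^{l}|}|Y_{B}(x_{i})-Y_{D_{k}}|}-{\textstyle\sum_{i=1}^{|U_{de}^{l}|}|Y_{B}(y_{i})-Y_{D_{k}}|}-{\textstyle\sum_{i=1}^{|U_{de}^{l}|}|X_{B}(y_{i})-X_{D_{k}}|}.$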

According to the symmetrical relation,

$\displaystyle{\textstyle\sum_{i=1}^{|U_{l}-U_{de}^{l}|}|Y_{B}(x_{i})-Y_{D_{k}}% |}={\textstyle\sum_{i=1}^{|U_{de}^{l}|}|X_{B}(y_{i})-X_{D_{k}}|},$

thus

$\displaystyle{\textstyle\sum_{i=1}^{|U_{l}-U_{de}^{l}|}|\delta^{\prime}_{B}(x_% {i})-D^{\prime}_{k}|}={\textstyle\sum_{i=1}^{|U_{l}|}|\delta_{B}(x_{i})-D_{k}|% }-{\textstyle\sum_{i=1}^{|U_{de}^{l}|}|Y_{B}(y_{i})-Y_{D_{k}}|}-2{\textstyle% \sum_{i=1}^{|U_{de}^{l}|}|X_{B}(y_{i})-X_{D_{k}}|},$

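the information granularity of $D$ relative to $B$ becomes

$\displaystyle IG^{\prime}_{l}(D|B)=\frac{1}{|U_{l}-U_{de}^{l}|^{2}}(|U_{l}|^{2}IG_{U_{l}}(D|B)-|U_{de}^{l}|^{2}IG_{U_{de}^{l}}(D|B)-2{\textstyle\sum_{i=1}^{|U_{de}^{l}|}|X_{B}(y_{i})-X_{D_{k}}|}).$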

Example 11. (Continuation of Example 10) From Example 3, the original information granularity of $D$ relative to $C$ is $IG_{l}(D|C)=\frac{1}{4}$ . From Example 10, $U_{de}^{l}=\{y_{3},y_{4}\}$ , $y_{3}=x_{2}$ , $y_{4}=x_{3}$ . By computation, we obtain $IG_{U_{de}^{l}}(D|C)=\frac{1}{2}$ , $X_{C}(y_{3})=\varnothing$ , $X_{D_{1}}=\{x_{8}\}$ ; $X_{C}(y_{4})=\varnothing$ , $X_{D_{2}}=\{x_{6}\}$ . From Eq. (13), the new information granularity of $D$ relative to $C$ is $IG^{\prime}_{l}(D|C)=\frac{1}{|U_{l}-U_{de}^{l}|^{2}}(|U_{l}|^{2}IG_{U_{l}}(D|C)-|U_{de}^{l}|^{2}IG_{U_{de}^{l}}(D|C)-2{\textstyle\sum_{i=1}^{|U_{de}^{l}|}|X_{C}(y_{i})-X_{D_{k}}|})=\frac{1}{2}$ .

Based on the above discussions, the group incremental feature selection algorithm based on the information granularity is presented for the case where a group of objects changes dynamically. The detailed description is given as Algorithm GIFS.
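The following sketch evaluates Eq. (13) for Example 11 (with $|U_{l}|=4$ assumed from the stated result) and then cross-checks the formula on hypothetical labeled one-dimensional objects. The neighborhood relation $|v-w|\leqslant\varepsilon$ , the data and all names are assumptions made for illustration only.

```python
from fractions import Fraction

def cond_ig_after_deleting_group(ig_old, n_old, ig_deleted, n_deleted, cross_outside):
    """Eq. (13); cross_outside[i] = |X_B(y_i) - X_{D_k}| for each deleted y_i."""
    total = n_old ** 2 * ig_old - n_deleted ** 2 * ig_deleted - 2 * sum(cross_outside)
    return Fraction(total) / (n_old - n_deleted) ** 2

# Example 11: IG_l(D|C) = 1/4, IG of the deleted pair = 1/2, both X_C(y_i) are empty.
example_11 = cond_ig_after_deleting_group(Fraction(1, 4), 4, Fraction(1, 2), 2, [0, 0])
print(example_11)  # 1/2

def cond_ig_from_scratch(objs, eps):
    """IG(D|B) = (1/|U|^2) * number of neighbour pairs with different labels."""
    mismatches = sum(1 for v, a in objs.values() for w, b in objs.values()
                     if abs(v - w) <= eps and a != b)
    return Fraction(mismatches, len(objs) ** 2)

# Hypothetical objects: name -> (feature value, decision label).
objs = {"a": (10, "Y"), "b": (15, "N"), "c": (20, "N"), "d": (50, "N"), "e": (55, "Y")}
deleted = {k: objs[k] for k in ("b", "e")}
kept = {k: v for k, v in objs.items() if k not in deleted}
# For each deleted object: kept neighbours that carry a different label.
cross = [sum(1 for w, b in kept.values() if abs(v - w) <= 10 and a != b)
         for v, a in deleted.values()]
incremental = cond_ig_after_deleting_group(
    cond_ig_from_scratch(objs, 10), len(objs),
    cond_ig_from_scratch(deleted, 10), len(deleted), cross)
assert incremental == cond_ig_from_scratch(kept, 10)
```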

Algorithm 3. The group incremental feature selection algorithm based on the information granularity (GIFS)

Input: A partially labeled hybrid decision system $\textit{PDS}=(U=U_{u}\cup U_{l},A=C\cup D,V,f)$ , the original feature subset Red, $IG_{u}(\textit{Red})$ and $IG_{l}(D|\textit{Red})$ , a group of adding objects $U_{ad}$ and a group of deleting objects $U_{de}$ .
Output: A new feature subset $\textit{Red}^{\prime}$ .
1: Set $\textit{Red}^{\prime}\leftarrow\textit{Red}$ and normalize the numerical feature values;
2: if a group of deleting objects $U_{de}$ is deleted from PDS then
3:     set $U^{\prime}=U-U_{de}$ ;
4: else
5:     go to Step 13;
6: end if
7: Compute the information granularity $IG_{U^{\prime}}(\textit{Red}^{\prime})$ and $IG_{U^{\prime}}(D|\textit{Red}^{\prime})$ by Eqs (12) and (13);
8: for each $a\in\textit{Red}^{\prime}$ do
9:     if $\textit{Sig}_{in}(a,\textit{Red}^{\prime},D)=0$ then
10:         set $\textit{Red}^{\prime}\leftarrow\textit{Red}^{\prime}-\{a\}$ ;
11:     end if
12: end for
13: if a group of adding objects $U_{ad}$ is added into PDS then
14:     set $U^{\prime}=U\cup U_{ad}$ and normalize the numerical feature values;
15: else
16:     go to Step 29;
17: end if
18: Compute the information granularity $IG_{U^{\prime}}(\textit{Red}^{\prime})$ , $IG_{U^{\prime}}(D|\textit{Red}^{\prime})$ and $IG_{U^{\prime}}(C)$ , $IG_{U^{\prime}}(D|C)$ by Eqs (10) and (11);
19: if $IG_{U^{\prime}}(\textit{Red}^{\prime})=IG_{U^{\prime}}(C)$ and $IG_{U^{\prime}}(D|\textit{Red}^{\prime})=IG_{U^{\prime}}(D|C)$ then
20:     go to Step 29;
21: else
22:     go to Step 24;
23: end if
24: For each $a\in C-\textit{Red}^{\prime}$ , compute $\textit{Sig}_{\textit{out}}(a,\textit{Red}^{\prime},D)$ and sort the remaining features in descending order as $\{a_{1},a_{2},\ldots,a_{|C-\textit{Red}^{\prime}|}\}$ ;
25: while $IG_{U^{\prime}}(\textit{Red}^{\prime})\neq IG_{U^{\prime}}(C)$ , $IG_{U^{\prime}}(D|\textit{Red}^{\prime})\neq IG_{U^{\prime}}(D|C)$ do
26:     for $j=1$ to $|C-\textit{Red}^{\prime}|$ do
27:         select $\textit{Red}^{\prime}\leftarrow\textit{Red}^{\prime}\cup\{a_{j}\}$ and compute $IG_{U^{\prime}}(\textit{Red}^{\prime})$ , $IG_{U^{\prime}}(D|\textit{Red}^{\prime})$ by Eqs (10) and (11);
28: end while
29: for each $b\in\textit{Red}^{\prime}$ do
30:     if $\textit{Sig}_{in}(b,\textit{Red}^{\prime},D)=0$ then
31:         set $\textit{Red}^{\prime}\leftarrow\textit{Red}^{\prime}-\{b\}$ ;
32:     end if
33: end for
34: return $\textit{Red}^{\prime}$ ;

The time complexity of GIFS. Step 7 computes the information granularity of the feature subset $\textit{Red}^{\prime}$ , with time complexity $O(|U_{de}||U^{\prime}||\textit{Red}^{\prime}|+|U_{de}|^{2}|\textit{Red}^{\prime}|)$ . In Steps 8–12, the redundant features are removed after the group of objects is deleted, with time complexity $O((|U_{de}||U^{\prime}||\textit{Red}^{\prime}|+|U_{de}|^{2}|\textit{Red}^{\prime}|)\cdot|\textit{Red}^{\prime}|)$ . Step 18 computes the information granularity of the feature subsets $\textit{Red}^{\prime}$ and $C$ , with time complexity $O(|U_{ad}||U||\textit{Red}^{\prime}|+|U_{ad}|^{2}|\textit{Red}^{\prime}|)$ . In Steps 24–28, the remaining features are sorted, and the most significant features are added into the candidate feature subset $\textit{Red}^{\prime}$ until the termination condition is reached, with time complexity $O((|U_{ad}||U||\textit{Red}^{\prime}|+|U_{ad}|^{2}|\textit{Red}^{\prime}|)\cdot|C-\textit{Red}^{\prime}|)$ . In Steps 29–33, the redundant features are deleted from the candidate feature subset $\textit{Red}^{\prime}$ , with time complexity $O(|U_{ad}||U||\textit{Red}^{\prime}|^{2}+|U_{ad}|^{2}|\textit{Red}^{\prime}|^{2})$ . Therefore, the time complexity of GIFS is $O(|U^{\prime}||C|^{2}\cdot\textit{max}(|U_{ad}|,|U_{de}|))$ . When a group of objects changes dynamically, the time complexity of IFS is $O(|U^{\prime}||C|^{2}\cdot|U_{ad}|\cdot|U_{de}|)$ , so the advantage of GIFS over IFS is evident. As analyzed above, GIFS is more efficient than IFS when the original data set contains a large number of objects.

6. Experimental analysis

In this section, a series of experiments is conducted to verify the efficiency and effectiveness of the proposed feature selection algorithms. The experiments are conducted on six data sets from UCI [40], and the specific information of each data set is shown in Table 2. The experiments are implemented using PyCharm 2017 on a PC with Windows 10, Core(TM) Duo CPU 2.40 GHz and 8 GB of RAM. All of the algorithms are coded in Python.

Table 2
Data sets table

Data set	Objects	Features	Classes	Missing
Hepatitis	155	19	3	Y
Credit	680	15	2	Y
Australian	691	14	2	N
German	1000	20	2	N
Wpbc	198	34	2	Y
Anneal	798	38	6	Y

For each data set in Table 2, the data are preprocessed as follows. First, the numerical features are normalized to the range $[0,1]$. Second, 10% of the objects are randomly selected as labeled objects and the remaining 90% are treated as unlabeled objects. Four data sets, Hepatitis, Credit, Wpbc and Anneal, are incomplete data with missing conditional feature values. For the complete data sets, namely Australian and German, we randomly change 5% of the known feature values into missing values to create incomplete data sets. To illustrate the efficiency of the proposed algorithms IFS and GIFS in processing dynamic data, let $U$ denote the whole object set of a data set. 70% of the objects, i.e., $0.7\times|U|$ objects, are selected as the original data set, and the remaining 30% are divided into five parts of equal size, i.e., $|x_{i}|=\frac{0.3\times|U|}{5},i=1,2,\ldots,5$. The first part is viewed as the first incremental object set, the first two parts as the second incremental object set, …, and all five parts as the fifth incremental object set, i.e., $X_{i}={\textstyle\bigcup_{j=1}^{i}x_{j}},i=1,2,\ldots,5$. In addition, we randomly select 10% of each part as the deleting object set.
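The partitioning protocol described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the experiments: the function name `make_dynamic_splits`, its parameters, the fixed random seed, and the choice to draw the deleting objects from within each part $x_{i}$ are all hypothetical, and objects are represented only by their indices.

```python
import random


def make_dynamic_splits(n_objects, base_frac=0.7, n_parts=5,
                        delete_frac=0.1, labeled_frac=0.1, seed=0):
    """Split object indices into a base set, cumulative incremental sets
    X_1..X_n, per-part deleting sets, and a labeled subset (all assumed)."""
    rng = random.Random(seed)
    indices = list(range(n_objects))
    rng.shuffle(indices)

    # 70% of the objects form the original data set.
    n_base = int(round(base_frac * n_objects))
    base, rest = indices[:n_base], indices[n_base:]

    # The remaining objects are cut into equally sized parts x_1..x_n;
    # any leftover objects are appended to the last part.
    part_size = len(rest) // n_parts
    parts = [rest[i * part_size:(i + 1) * part_size] for i in range(n_parts)]
    parts[-1].extend(rest[n_parts * part_size:])

    # X_i is the union of the first i parts.
    incremental, cumulative = [], []
    for part in parts:
        cumulative = cumulative + part
        incremental.append(list(cumulative))

    # 10% of each part is drawn as a deleting object set.
    deleting = [rng.sample(part, int(delete_frac * len(part)))
                for part in parts]

    # 10% of all objects keep their decision label.
    labeled = set(rng.sample(indices, int(round(labeled_frac * n_objects))))

    return {"base": base, "parts": parts, "incremental": incremental,
            "deleting": deleting, "labeled": labeled}


if __name__ == "__main__":
    splits = make_dynamic_splits(1000)
    print(len(splits["base"]), [len(x) for x in splits["incremental"]])
```

For a data set with 1000 objects, such as German, this gives a base set of 700 objects and incremental sets that grow by 60 objects at a time.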

6.1 Parameter setting

For the proposed algorithms IFS and GIFS, the parameter $\varepsilon$ is important in computing the neighborhood granules. It can be regarded as a neighborhood radius that controls the size of the neighborhood granules, so different radius settings form different granularity levels. An appropriate value of $\varepsilon$ plays an important role in the feature selection process because it affects the feature significance. In this section, to analyze the influence of $\varepsilon$ on the classification performance of the feature selection results, a series of experiments is conducted on the preprocessed original data sets. $\varepsilon$ ranges from 0.1 to 0.4 in steps of 0.02 to select the appropriate parameter for each data set. In each sub-figure of Fig. 1, the $x$-coordinate represents the value of $\varepsilon$, and the $y$-coordinates represent the number of selected features and the classification accuracy of the feature selection result.

It can be seen from Fig. 1 that applying feature selection to the original data sets with different values of $\varepsilon$ yields feature subsets with different sizes and classification accuracies; both change as $\varepsilon$ increases. Take the German data set as an example: when $\varepsilon=0.10$, the feature selection algorithm obtains the maximum classification accuracy of 75.3% and the minimum feature subset size of 11. Therefore, the optimal $\varepsilon$ value for the German data set is 0.10. For the Australian data set, the feature selection algorithm obtains a high classification accuracy and a small feature subset when $\varepsilon=0.10$. For the Credit data set, the experimental results are hardly affected by $\varepsilon$, mainly because the neighborhood granules of Credit do not change greatly when $\varepsilon$ ranges from 0.1 to 0.4; $\varepsilon$ is set to 0.10 for this data set. For the three data sets Hepatitis, Anneal and Wpbc, the values of $\varepsilon$ are set to 0.28, 0.26 and 0.12, respectively.
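The parameter search described in this subsection can be sketched as follows. The grid reproduces the stated range of 0.1 to 0.4 in steps of 0.02; the selection rule (highest accuracy first, then smallest feature subset, then smallest $\varepsilon$) is an assumption that merely mirrors the reasoning given for the German data set, and the function names `epsilon_grid` and `pick_epsilon` are hypothetical.

```python
def epsilon_grid(start=0.10, stop=0.40, step=0.02):
    """Candidate neighborhood radii from start to stop inclusive."""
    n_steps = int(round((stop - start) / step))
    # Rounding avoids floating-point drift such as 0.30000000000000004.
    return [round(start + k * step, 2) for k in range(n_steps + 1)]


def pick_epsilon(results):
    """Choose a radius from results = {epsilon: (accuracy, subset_size)}.

    Assumed rule: prefer higher accuracy, then a smaller feature subset,
    then a smaller radius.
    """
    return max(results,
               key=lambda eps: (results[eps][0], -results[eps][1], -eps))


if __name__ == "__main__":
    print(epsilon_grid())
```

In practice, the dictionary passed to `pick_epsilon` would be filled by running the feature selection algorithm and a classifier once per grid value on the preprocessed original data set.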

Figure 1.

Classification accuracies and sizes of feature subset on different parameters $\varepsilon$ in six data sets.

6.2 Performance of the proposed algorithm

To verify the efficiency and effectiveness of the proposed incremental algorithms IFS and GIFS, they are compared with the non-incremental algorithm NIFS presented in Section 4 and with the standard non-incremental feature selection algorithm based on information entropy, denoted by NSFS [41]. The comparison is carried out when the object set of each data set in Table 2 changes dynamically, in terms of the selected feature subset, the classification accuracy and the computational time.

6.2.1 Feature subset size and classification accuracy

To evaluate the classification performance of IFS and GIFS, the classifiers C4.5, SVM and KNN are applied to the selected feature subsets. 10-fold cross-validation is used to obtain the final classification accuracy: each data set is divided into ten parts of equal size, nine parts are used to find the feature subset, and the classification performance of the feature subset is tested on the remaining part. The average classification accuracy over the ten experiments is taken as the final classification accuracy. Table 3 shows the feature selection results of Algorithms GIFS, IFS, NIFS and NSFS, and Table 4 records their average classification accuracies.
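The evaluation protocol can be illustrated with the following minimal, library-free Python sketch. It assumes that the classifier is trained on the same nine parts that are used for feature selection, which the text does not state explicitly, and it ignores the partial labeling for brevity. The callables `select_features` and `fit_predict` are hypothetical placeholders for a feature selection algorithm (e.g., GIFS) and a classifier (e.g., KNN).

```python
import random


def k_fold_indices(n_objects, k=10, seed=0):
    """Shuffle object indices and deal them into k near-equal folds."""
    rng = random.Random(seed)
    order = list(range(n_objects))
    rng.shuffle(order)
    return [order[i::k] for i in range(k)]


def cross_validated_accuracy(X, y, select_features, fit_predict, k=10, seed=0):
    """Average accuracy (in %) over k folds.

    select_features(X_train, y_train) -> list of column indices
    fit_predict(X_train, y_train, X_test) -> list of predicted labels
    """
    folds = k_fold_indices(len(X), k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        X_train = [X[j] for j in train_idx]
        y_train = [y[j] for j in train_idx]

        # The feature subset is found on the nine training parts only.
        subset = select_features(X_train, y_train)
        X_train_sel = [[row[c] for c in subset] for row in X_train]
        X_test_sel = [[X[j][c] for c in subset] for j in test_idx]

        # The held-out part is used to test the selected subset.
        predictions = fit_predict(X_train_sel, y_train, X_test_sel)
        correct = sum(p == y[j] for p, j in zip(predictions, test_idx))
        accuracies.append(correct / len(test_idx))
    return 100.0 * sum(accuracies) / k
```

Selecting features inside each fold, rather than once on the whole data set, keeps the held-out part unseen by the feature selection step.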

Table 3
The feature subset size of GIFS, IFS, NIFS and NSFS

Data set	Algorithm	Feature subset size	Feature selection results
Hepatitis	GIFS	13	1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 15, 19
	IFS	10	1, 2, 3, 4, 5, 6, 10, 11, 12, 15
	NIFS	11	1, 2, 4, 5, 6, 7, 9, 10, 11, 12, 15
	NSFS	11	1, 2, 4, 5, 6, 7, 9, 10, 11, 12, 15
Credit	GIFS	14	1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
	IFS	14	1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
	NIFS	12	1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 13, 14
	NSFS	14	1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
Australian	GIFS	10	1, 2, 3, 4, 5, 8, 10, 11, 13, 14
	IFS	10	1, 2, 3, 4, 5, 8, 10, 11, 13, 14
	NIFS	10	1, 2, 3, 4, 5, 8, 10, 12, 13, 14
	NSFS	10	1, 2, 3, 4, 5, 8, 10, 11, 13, 14
German	GIFS	13	1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 16, 17
	IFS	13	1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 16, 17
	NIFS	13	1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 16, 17
	NSFS	13	1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 16, 17
Wpbc	GIFS	7	2, 3, 9, 15, 19, 21, 27
	IFS	7	2, 3, 9, 10, 15, 21, 27
	NIFS	6	2, 3, 15, 20, 21, 27
	NSFS	7	2, 3, 5, 15, 21, 27, 31
Anneal	GIFS	17	1, 3, 4, 5, 7, 8, 9, 12, 13, 17, 32, 33, 34, 35, 36, 37, 38
	IFS	17	1, 3, 4, 5, 7, 8, 9, 12, 13, 17, 32, 33, 34, 35, 36, 37, 38
	NIFS	17	1, 3, 4, 5, 7, 8, 9, 12, 13, 17, 32, 33, 34, 35, 36, 37, 38
	NSFS	17	1, 3, 4, 5, 7, 8, 9, 12, 13, 17, 32, 33, 34, 35, 36, 37, 38

Table 4

Classification accuracies of GIFS, IFS, NIFS and NSFS with the classifiers C4.5, SVM and KNN

Data set	Features	Original C4.5	Original SVM	Original KNN	GIFS C4.5	GIFS SVM	GIFS KNN	IFS C4.5	IFS SVM	IFS KNN	NIFS C4.5	NIFS SVM	NIFS KNN	NSFS C4.5	NSFS SVM	NSFS KNN
Hepatitis	19	81.50	81.29	81.90	81.90	84.52	83.00	81.33	82.67	81.90	81.31	82.54	82.60	81.31	82.54	82.60
Credit	15	84.75	85.70	85.00	85.63	85.70	85.06	85.63	85.70	85.06	85.48	85.52	85.86	85.63	85.70	85.06
Australian	14	85.13	85.52	85.33	86.49	85.93	85.73	86.49	85.93	85.73	86.49	85.93	85.73	86.49	85.93	85.73
German	20	73.03	76.70	75.11	76.03	76.81	75.30	76.03	76.81	75.30	76.03	76.81	75.30	76.03	76.81	75.30
Wpbc	34	73.83	80.68	77.63	75.87	80.84	79.30	74.28	80.68	78.96	74.06	81.80	80.63	72.72	80.30	80.86
Anneal	38	98.53	98.91	98.83	98.63	99.06	98.90	98.63	99.06	98.90	98.63	99.06	98.90	98.63	99.06	98.90
Average		82.79	84.80	83.96	84.09	85.47	84.54	83.73	85.14	84.25	83.66	85.27	84.83	83.46	85.05	84.74

Figure 2.

Computational time of Algorithms NIFS, IFS, GIFS and NSFS in six data sets.

It can be seen from Table 3 that the feature selection results obtained by the three algorithms GIFS, IFS and NIFS are similar in most cases, because all three are based on the information granularity and compute the feature significance in the same way. It is also noted that the proposed algorithms GIFS and IFS obtain the same or similar feature selection results as the algorithm NSFS. Table 3 thus shows that GIFS and IFS can effectively select the feature subset.

It can be seen from Table 4 that, compared with the classification accuracies without feature selection, the classification accuracies of Algorithms GIFS, IFS, NIFS and NSFS are improved in most cases. Meanwhile, the classification accuracies of the four algorithms are similar under the classifiers C4.5, SVM and KNN in most cases. Take the Australian data set as an example: the classification accuracies of GIFS, IFS, NIFS and NSFS are identical under C4.5, SVM and KNN, namely 86.49%, 85.93% and 85.73%, respectively. From the last row of Table 4, the average classification accuracies obtained by GIFS and IFS are similar to those of NIFS and NSFS under the classifiers C4.5, SVM and KNN, indicating that the proposed algorithms GIFS and IFS can effectively obtain the feature subset.

6.2.2 Computational time

To verify the efficiency of the proposed algorithms IFS and GIFS, Fig. 2 presents the computational time of Algorithms GIFS, IFS, NIFS and NSFS when the object set changes dynamically. In each sub-figure, the $x$-coordinate represents the variation size of the object set, and the $y$-coordinate represents the computational time of the different algorithms.

It can be seen from Fig. 2 that the computational time of Algorithms GIFS, IFS, NIFS and NSFS increases as the variation size of the object set increases. In each sub-figure, the computational time of the proposed algorithms IFS and GIFS is clearly shorter than that of NIFS. Take the German data set as an example: when the object set changes dynamically for the third time, the computational times of NSFS, NIFS, IFS and GIFS are 233.26 s, 213.10 s, 68.63 s and 30.58 s, respectively. The time consumption of IFS decreases by 70.57% and 67.79% relative to NSFS and NIFS, and that of GIFS decreases by 86.89% and 85.64% relative to NSFS and NIFS. Hence, the proposed algorithms IFS and GIFS are more efficient than the non-incremental algorithms NIFS and NSFS. In addition, as the variation size of the object set increases, the computational time of GIFS grows more gently than that of IFS and the advantage of GIFS becomes more significant, indicating that GIFS saves more computational time than IFS.
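The relative reductions quoted for the German data set follow from the four reported running times. The short Python sketch below recomputes them; the helper name `time_reduction` is an illustrative assumption, and the recomputed values agree with the quoted percentages to within about 0.01 percentage point, the small differences being due to rounding.

```python
def time_reduction(baseline_seconds, incremental_seconds):
    """Percentage by which the incremental time undercuts the baseline."""
    return 100.0 * (baseline_seconds - incremental_seconds) / baseline_seconds


# Running times reported for the third dynamic change on German (seconds).
times = {"NSFS": 233.26, "NIFS": 213.10, "IFS": 68.63, "GIFS": 30.58}

if __name__ == "__main__":
    for fast in ("IFS", "GIFS"):
        for slow in ("NSFS", "NIFS"):
            reduction = time_reduction(times[slow], times[fast])
            print(f"{fast} vs {slow}: {reduction:.2f}%")
```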

As discussed above, the experimental results show that the proposed algorithms IFS and GIFS are more efficient than NIFS and NSFS, and can greatly reduce the computational time of feature selection while maintaining the classification accuracy. Meanwhile, GIFS takes less computational time than IFS when a group of objects changes dynamically.

7. Conclusions and future work

In many practical applications, partially labeled hybrid data are ubiquitous. For dynamic partially labeled hybrid decision data, a traditional (non-incremental) feature selection algorithm is first given. However, traditional feature selection is time-consuming when an object set changes dynamically, so the feature subset cannot be obtained in real time. To solve this problem, we proposed incremental feature selection algorithms based on information granularity. When a single object or a group of objects changes dynamically, the information granularity is updated by the incremental mechanisms, so that a lot of repeated calculations can be avoided. When a group of objects changes dynamically, the group incremental algorithm is more efficient than the single incremental algorithm. Furthermore, a series of experimental results on six data sets from UCI show that the proposed incremental feature selection algorithms can efficiently obtain a new feature subset in an incremental manner when an object set changes dynamically. In future work, we will consider how to efficiently select the feature subset according to the dynamic changes of the feature set in partially labeled hybrid data.

Acknowledgments

This work is supported by National Natural Science Foundation of China (61662023 and 61966016), and Natural Science Foundation of Jiangxi Province (20202BABL202037 and 20192BAB207018).

References

1. Zhao and Qian, K.Y., Mixed feature selection in incomplete decision table, Knowledge-Based Systems 57(2) (2014), 181–190.

2. Wang, Qian, Y.H., Liang, X.Y., Guo and Liang, J.Y., Local neighborhood rough set, Knowledge-Based Systems 153(8) (2018), 53–64.

3. Yan, Y.J. and Dai, J.H., Unsupervised feature selection for interval ordered information systems, Pattern Recognition and Artificial Intelligence 30(10) (2017), 928–936.

4. T.H., Jia, D.D., Zhou, H.H., Xue and Cao, Feature selection using forest optimization algorithm based on contribution degree, Intelligent Data Analysis 22(6) (2018), 1189–1207.

5. Qiu, C.Y. and Xiang, Feature selection using a set based discrete particle swarm optimization and a novel feature subset evaluation criterion, Intelligent Data Analysis 23(1) (2019), 5–21.

6. Pawlak, Rough sets, International Journal of Parallel Programming 11(5) (1982), 341–356.

7. Zhang, Miao, D.Q. and Gao, Semi-supervised data feature reduction based on rough-subspace ensemble learning, Journal of Chinese Computer Systems 37(12) (2016), 2727–2732.

8. Zhang, Z.W., Jing, X.Y. and Wang, T.J., Label propagation based semi-supervised learning for software defect prediction, Automated Software Engineering 24(1) (2017), 47–69.

9. Wan, Chen, X.L., Zhang, J.H. and Ou, Z.L., Semi-supervised feature selection based on low-rank sparse graph embedding, Journal of Image and Graphics 23(9) (2018), 1316–1325.

10. Feng, Q.R. and Jing, H.B., Approximations and uncertainty measurements in ordered information systems, Intelligent Data Analysis 20(4) (2016), 723–743.

11. Q.H., Xie, Z.X. and Yu, D.R., Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation, Pattern Recognition 40(12) (2007), 3509–3521.

12. Zeng, A.P., T.R., Liu, Zhang, J.B. and Chen, H.M., A fuzzy rough set approach for incremental feature selection on hybrid information systems, Fuzzy Sets and Systems 258(1) (2015), 39–60.

13. H.Z., Feng and Wang, Hephaistos: A fast and distributed outlier detection approach for big mixed attribute data, Intelligent Data Analysis 23(4) (2019), 759–778.

14. Q.H., D.R., Liu, J.F. and Wu, C.X., Neighborhood rough set based heterogeneous feature subset selection, Information Sciences 178(18) (2008), 3577–3594.

15. Sun, J.C. and Tian, Feature selection using rough-entropy based uncertainty measures in incomplete decision system, Knowledge-Based Systems 36(12) (2012), 206–216.

16. Sun, Wang, L.Y., Qian, Y.H., J.C. and Zhang, S.G., Feature selection using Lebesgue and entropy measures for incomplete neighborhood decision systems, Knowledge-Based Systems 186(12) (2019), 1–19.

17. Wei, Liang, J.Y. and Qian, Y.H., A comparative study of rough sets for hybrid data, Information Sciences 190(6) (2012), 1–16.

18. Han, Jin, X.N. and Li, J.X., An assessment method for the impact of missing data in the rough set-based decision fusion, Intelligent Data Analysis 20(6) (2016), 1267–1284.

19. Wang, Liu, J.C. and Wei, Semi-supervised feature selection algorithm based on information entropy, Computer Science 45(11) (2018), 427–430.

20. Dai, J.H., Q.H., Zhang, J.H. and Zheng, N.G., Attribute selection for partially labeled categorical data by rough set approach, IEEE Transactions on Cybernetics 47(9) (2017), 2460–2471.

21. Liu, K.Y., Yang, X.B., H.L., J.S., Wang, P.X. and Chen, X.J., Rough set based semi-supervised feature selection via ensemble selector, Knowledge-Based Systems 165 (2018), 282–296.

22. Xiao, L.S., Wang, H.J. and Yang, Semi-supervised feature selection based on attribute dependency and hybrid constraint, Journal of Computer Applications 35(12) (2015), 80–84.

23. Wang and Liang, An efficient feature selection algorithm for hybrid data, Neurocomputing 53(12) (2016), 33–41.

24. Chen, D.G. and Yang, Y.Y., Attribute reduction for heterogeneous data based on the combination of classical and fuzzy rough set models, IEEE Transactions on Fuzzy Systems 22(5) (2014), 1325–1334.

25. Zhang, Mei, C.L., Chen, D.G. and Li, J.H., Feature selection in mixed data: A method using a novel fuzzy rough set-based information entropy, Pattern Recognition 56(8) (2016), 1–15.

26. Kim and Jun, Rough set model based feature selection for mixed-type data with feature space decomposition, Expert Systems with Applications 103(8) (2018), 196–205.

27. Yang, Y.Y., Chen, D.G. and Wang, Incremental perspective for feature selection based on fuzzy rough sets, IEEE Transactions on Fuzzy Systems 26(3) (2018), 1257–1273.

28. Liang, J.Y., Wang, Dang, C.Y. and Qian, Y.H., A group incremental approach to feature selection applying rough set technique, IEEE Transactions on Knowledge and Data Engineering 26(2) (2013), 294–308.

29. F.M., Ding, M.W., Zhang, T.F. and Jie, Compressed binary discernibility matrix based incremental attribute reduction algorithm for group dynamic data, Neurocomputing 344(7) (2019), 20–27.

30. Shu, W.H., Qian, W.B. and Xie, Y.H., Incremental approaches for feature selection from dynamic data with the variation of multiple objects, Knowledge-Based Systems 163(1) (2019), 320–331.

31. Zhang, J.B., T.R., Ruan and Liu, Neighborhood rough sets for dynamic data mining, International Journal of Intelligent Systems 27(4) (2012), 317–342.

32. Jing, Y.G., T.R., Hamido, Wang, B.L. and Cheng, An incremental attribute reduction method for dynamic data mining, Information Sciences 465(10) (2018), 202–218.

33. Zheng, Wang and Hong, T.T., Incremental attribute reduction based on relational matrix, Journal of Chinese Computer Systems 39(5) (2018), 1000–1004.

34. Wang, Liang, J.Y. and Qian, Y.H., Attribute reduction: A dimension incremental strategy, Knowledge-Based Systems 39 (2013), 95–108.

35. Wei, X.Y., Liang, J.Y., Cui, J.B. and Sun, Y.J., Discernibility matrix based incremental attribute reduction for dynamic data, Knowledge-Based Systems 140(1) (2018), 142–157.

36. Luo, T.R., Chen, H.M., Hamido and Yi, Incremental rough set approach for hierarchical multicriteria classification, Information Sciences 429(3) (2018), 72–87.

37. Shu, W.H. and Shen, Incremental feature selection based on rough set in dynamic incomplete data, Pattern Recognition 47(12) (2014), 3890–3906.

38. Liu, T.R. and Zhang, J.B., A rough set-based incremental approach for learning knowledge in dynamic information systems, International Journal of Approximate Reasoning 55(9) (2014), 1764–1786.

39. Tang, Liu, Z.Y., K.L. and Li, K.Q., Real-time incremental recommendation for streaming data based on Apache Flink, Intelligent Data Analysis 23(6) (2019), 1421–1437.

40. UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/datasets.html.

41. Mariello and Battiti, Feature selection based on the neighborhood entropy, IEEE Transactions on Neural Networks and Learning Systems 29(12) (2018), 6313–6322.
