A multi-measure feature selection method for decision systems with preference relation

Abstract

Feature selection aims to select important features that improve the accuracy and simplicity of a learning model. However, for the ordered data found in many real-world applications, most existing feature selection algorithms rely on a single measure when selecting candidate features, which may degrade classification performance. Motivated by this observation, a multi-measure feature selection algorithm is developed for ordered data; it considers not only the certain information captured by the dominance-based dependence but also the discern information provided by the dominance-based information granularity. Extensive experiments on UCI data sets evaluate the proposed algorithm in terms of the size of the selected feature subset and the classification accuracy. The experimental results demonstrate that the proposed algorithm finds relevant feature subsets and that its classification performance is better than, or comparable to, that of other feature selection algorithms.

Keywords

Ordered decision system, dominance-based rough set, multi-measure, feature selection

1 Introduction

Feature selection [1–3] is an efficient data pre-processing method that selects important features and deletes redundant ones to reduce the influence of data dimensionality and noisy data, thereby improving data compactness and classification ability. It has therefore been widely used in knowledge discovery, pattern recognition, data mining and other fields [4–6]. For instance, Pan et al. [7] established an evaluation model for the severity of children’s foot and ankle deformity based on a feature selection algorithm. Hu et al. [8] proposed a feature selection method that accelerates global convergence and improves exploration efficiency. Zhao et al. [9] studied a feature selection approach for analyzing high-dimensional small-sample data sets. Kaur et al. [10] designed a feature selection algorithm to detect depression. Zhang et al. [11] investigated a new model based on a filter feature selection method that improves classification performance. Rough set theory is an effective mathematical tool for dealing with uncertain, inconsistent and incomplete knowledge [12–15]. Its main advantage is that it deals with data directly and mines potentially useful knowledge from data sets. Consequently, this theory has been widely used for feature selection in the field of artificial intelligence.

The feature measure plays an important role in feature selection as the basis for selecting important features. Based on rough sets, many different feature measures have been proposed, and they can be divided into two categories: the algebra theory approach and the information theory approach. For the algebra theory approach, Pawlak [16] and other scholars proposed efficient feature measures, such as approximation accuracy [20], approximation roughness [17], dependence measure [2] and discernibility matrix measure [18]. This type of feature measure characterizes the dependence between the conditional features and the decision feature, or evaluates the certain knowledge under a feature subset. For the information theory approach, Shannon [19] introduced the concept of entropy from physics into information systems and extended it, proposing the concept of information entropy. Information entropy quantifies the amount of information in the data and is used to evaluate the uncertain knowledge under a feature subset. Many scholars use information entropy and its extensions as the feature measure for selecting feature subsets [21–23]. However, the feature measures proposed in the above literature cannot deal with ordered data.

In many real-world applications, the feature values of the collected objects are preference-ordered, that is, the feature values can be ranked. For example, in a car information system comprising several indexes and a total evaluation, the conditional features and the decision feature are consistent: the total evaluation is better when a car has higher indexes in all aspects. Data of this kind are called ordered data. In order to accurately describe the relationship between the conditional features and the decision feature in ordered data, Greco et al. [24] proposed the dominance-based rough set and introduced the dominance relation, which extends the traditional equivalence relation. The dominance relation is more in line with how people reason about data in practical applications. Under the dominance-based rough set model, many scholars have extended the feature measures of traditional rough set theory to the ordered decision system and have achieved some results [29, 30]. For robust fuzzy dominance rough sets, Sang et al. [25] designed a feature selection method based on self-adaptive weighted interaction. For the algebra theory approach, Du et al. [26] extended the traditional dominance relation to two different situations in which incomplete values appear in incomplete ordered information systems, and proposed a feature selection algorithm based on the discernibility matrix method. Qian et al. [28] proposed a feature selection algorithm based on the dependence measure for incomplete ordered information systems with fuzzy decisions by combining the dominance-based rough set with the α-cut set theory. Yang et al. [27] proposed a new attribute reduction method based on fuzzy preference relations. For the information theory approach, Sang et al. [31] proposed an incremental feature selection algorithm based on the dominance conditional entropy for the case in which a single object changes dynamically in an ordered decision system. Hu et al. [34] proposed an information entropy measure for ordinal classification. Palangetic et al. [33] studied the application of Ordered Weighted Average (OWA) operators in dominance-based rough sets. The advantage of the dominance-based rough set is that it can describe the dominance relation between objects in ordered data, and that it describes concepts through the upward and downward unions of classes instead of particular classes. It is therefore suited to handling ordered data in practical problems. Accordingly, this paper focuses on feature measures based on the dominance-based rough set for ordered data.

However, most of the existing feature measures using the dominance-based rough set are single measures, which may not evaluate candidate features sufficiently and can result in relatively low classification accuracy after feature selection. A multi-measure [35, 39] combines different single-criterion measures, so that different criteria are considered at the same time. Feature subsets selected by combining different criteria are more comprehensive. In the process of feature selection, the more evaluation criteria are used, the more the bias of any single criterion can be avoided. Li et al. [36] proposed a multi-criterion attribute reduction method, which considers both the neighborhood decision error rate and the neighborhood decision consistency. Shu et al. [37] proposed a multi-criteria evaluation function for cost-sensitive data with missing values. Sun et al. [38] proposed a multi-criteria fusion feature selection algorithm for fault diagnosis of helicopter planetary gear trains. However, the above-mentioned feature selection algorithms do not consider data with a dominance relation. Therefore, multi-measure feature selection for ordered data remains an open problem.

The contributions of this paper are as follows: (1) A multi-measure combining the dominance-based dependence and the dominance-based information granularity is proposed for the ordered decision system, which not only considers the certain information, but also uses the discern information. (2) A heuristic feature selection algorithm based on the feature multi-measure is proposed, which selects the feature subset from multiple perspectives. (3) Comparative experiments show that the proposed multi-measure feature selection algorithm has an advantage in classification accuracy over other feature selection algorithms.

The remainder of this paper is organized as follows. Section 2 briefly explains the basic concepts of the ordered decision system and the existing feature measures under the ordered decision system. In Section 3, the multi-measure feature selection algorithm is developed for ordered data, which combines the dominance-based dependence and the dominance-based information granularity. In Section 4, a series of experiments is conducted to compare the proposed algorithm with three existing feature selection algorithms on UCI data sets. Finally, Section 5 summarizes this paper and presents future work.

2 Preliminaries

This section mainly introduces the basic knowledge of the dominance-based rough sets and the single-measures of features for ordered decision systems.

In rough set theory, the data is defined as a 4-tuple DS = (U, A = C ∪ D, V, f), where U = { x₁, x₂, . . . , x_|U| } is a nonempty finite set of objects, also called the universe; C = { c₁, c₂, . . . , c_|C| } is a set of conditional features which characterize the objects, D = { d } is the decision feature, and C ∩ D = ∅; V = ⋃_{a∈A} V_a is the union of feature values and V_a represents the domain of feature a ∈ A; f : U × A → V is an information function, specifically, ∀a ∈ C ∪ D, x ∈ U, f (x, a) ∈ V_a is the feature value of object x under feature a.

In the decision system DS, if all features have an increasing or a decreasing preference, the system is called an ordered decision system, denoted by ODS = (U, A = C ∪ D, V, f). Without loss of generality, the features in this paper are characterized by an increasing preference. Table 1 shows an ordered decision system describing students’ grades and general evaluation, where U = { x₁, x₂, . . . , x₅ } represents five students and C = { c₁, c₂, c₃, c₄ } is the set of conditional features describing the students’ performance in various disciplines: Chinese c₁, Math c₂, English c₃ and Computer c₄. The domain of each conditional feature contains the values Excellent, Good, Passing and Failing, whose increasing preference relation is Excellent > Good > Passing > Failing. These values are encoded numerically as Excellent=4, Good=3, Passing=2, Failing=1, so that 4 > 3 > 2 > 1 is consistent with the increasing preference of the raw data. Evaluation d is the decision feature; its domain contains three values with increasing preference, Excellent=3, Good=2 and Poor=1. In addition, the preference relation between Evaluation d and the conditional features is consistent, that is, the better the students’ performance in the various disciplines, the higher their evaluation.

Table 1
The ordered decision system on the students’ grades and general evaluation information

	Chinese c₁	Math c₂	English c₃	Computer c₄	Evaluation d
student x1	2	3	2	3	1
student x2	3	1	2	1	3
student x3	1	2	3	2	3
student x4	4	3	4	2	2
student x5	3	1	2	3	3

Definition 1. Let ODS = (U, A = C ∪ D, V, f) be an ordered decision system, ∀B ⊆ C, the dominance relation $R_{B}^{\geq}$ with respect to B is defined as $R_{B}^{\geq} = \{(x, y) \in U \times U \mid f(x, a) \geq f(y, a), \forall a \in B\}$ (1)

The dominance granules of U induced by the dominance relation $R_{B}^{\geq}$ are expressed by $U / R_{B}^{\geq} = \{[x_{1}]_{B}^{\geq}, [x_{2}]_{B}^{\geq}, \ldots, [x_{|U|}]_{B}^{\geq}\}$, where $[x_{i}]_{B}^{\geq}$ $(1 \leq i \leq |U|)$ is the dominance granule of object x_i under B, that is, the set of objects that dominate x_i with respect to B: $[x_{i}]_{B}^{\geq} = \{y \in U \mid (y, x_{i}) \in R_{B}^{\geq}\} = \{y \in U \mid f(y, a) \geq f(x_{i}, a), \forall a \in B\}$.
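As a minimal illustration of Definition 1, the following Python sketch computes dominance granules for the ordered decision system of Table 1. The tuple encoding of Table 1, the 0-based indices (row 0 stands for x₁, column 0 for c₁) and the function name `dominance_granule` are assumptions made for this sketch and are not part of the original method description.

```python
# Table 1 encoded as rows (c1, c2, c3, c4, d); row 0 stands for x1 (assumed encoding).
TABLE_1 = [
    (2, 3, 2, 3, 1),
    (3, 1, 2, 1, 3),
    (1, 2, 3, 2, 3),
    (4, 3, 4, 2, 2),
    (3, 1, 2, 3, 3),
]


def dominance_granule(data, i, features):
    """Return [x_i]_B^>=: indices of objects y with f(y, a) >= f(x_i, a) for all a in B."""
    return {
        j
        for j, row in enumerate(data)
        if all(row[a] >= data[i][a] for a in features)
    }


# Dominance granules of all five students under B = {c1} (column 0).
granules_c1 = [dominance_granule(TABLE_1, i, [0]) for i in range(len(TABLE_1))]
print(granules_c1)
```

Under c₁, the granule of x₁ consists of x₁, x₂, x₄ and x₅, because these are exactly the students whose Chinese grade is at least 2.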

Definition 2. Let ODS = (U, A = C ∪ D, V, f) be an ordered decision system, ∀B ⊆ C, $U / R_{B}^{\geq} = \{[x_{1}]_{B}^{\geq}, [x_{2}]_{B}^{\geq}, \ldots, [x_{|U|}]_{B}^{\geq}\}$, $U / R_{D}^{\geq} = \{[x_{1}]_{D}^{\geq}, [x_{2}]_{D}^{\geq}, \ldots, [x_{|U|}]_{D}^{\geq}\}$. The lower and upper approximations of decision D relative to the dominance relation $R_{B}^{\geq}$ are defined as $\underline{R_{B}^{\geq}}(D) = \{x_{i} \in U \mid [x_{i}]_{B}^{\geq} \subseteq [x_{i}]_{D}^{\geq}\}, 1 \leq i \leq |U|$ (2) $\overline{R_{B}^{\geq}}(D) = \{x_{i} \in U \mid [x_{i}]_{B}^{\geq} \cap [x_{i}]_{D}^{\geq} \neq \emptyset\}, 1 \leq i \leq |U|$ (3)

Based on the lower approximation, the dominance-based dependence is presented as follows.

Definition 3. [40] Let ODS = (U, A = C ∪ D, V, f) be an ordered decision system, ∀B ⊆ C, the dominance-based dependence is defined as $\gamma_{B}^{\geq}(D) = \frac{|\underline{R_{B}^{\geq}}(D)|}{|U|}$ (4)

From Definition 3, the dominance-based dependence $\gamma_{B}^{\geq}(D)$ reflects the degree of dominance consistency of the objects between the feature subset B and the decision feature D. It can be used to quantify the consistent knowledge of features. Thus, $1 - \gamma_{B}^{\geq}(D)$ describes the inconsistent risk of the feature subset B: the smaller the inconsistent risk, the better the classification performance of the feature subset.
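The dependence of Definition 3 can be sketched in a few lines of Python. The encoding of Table 1, the 0-based column indices and the helper names `granule` and `dependence` are assumptions of this sketch; the computation itself follows Equations (2) and (4).

```python
# Table 1 encoded as rows (c1, c2, c3, c4, d); row 0 stands for x1 (assumed encoding).
TABLE_1 = [
    (2, 3, 2, 3, 1),
    (3, 1, 2, 1, 3),
    (1, 2, 3, 2, 3),
    (4, 3, 4, 2, 2),
    (3, 1, 2, 3, 3),
]
D = 4  # column index of the decision feature d


def granule(data, i, features):
    """Dominance granule [x_i]_B^>= as a set of object indices."""
    return {
        j
        for j, row in enumerate(data)
        if all(row[a] >= data[i][a] for a in features)
    }


def dependence(data, features, decision):
    """gamma_B^>=(D): share of objects whose B-granule lies inside their D-granule."""
    n = len(data)
    consistent = sum(
        1
        for i in range(n)
        if granule(data, i, features) <= granule(data, i, [decision])
    )
    return consistent / n


# Dependence of D on each single conditional feature c1..c4.
gammas = [dependence(TABLE_1, [a], D) for a in range(4)]
print(gammas)  # [0.4, 0.2, 0.4, 0.2]
```

For Table 1, only x₁ and x₄ belong to the lower approximation under c₁, which gives a dependence of 2/5 = 0.4 and an inconsistent risk of 0.6.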

Definition 4. [40] Let ODS = (U, A = C ∪ D, V, f) be an ordered decision system, ∀B ⊆ C, the dominance-based information granularity of the dominance relation $R_{B}^{\geq}$ under U is defined as $IG(R_{B}^{\geq}) = \frac{1}{|U|} \sum_{i=1}^{|U|} \frac{|[x_{i}]_{B}^{\geq}|}{|U|}$ (5)

The dominance-based information granularity of D relative to B is defined as $\begin{aligned} IG^{\geq}(D \mid B) &= IG(R_{B}^{\geq}) - IG(R_{B \cup D}^{\geq}) \\ &= \frac{1}{|U|} \sum_{i=1}^{|U|} \frac{|[x_{i}]_{B}^{\geq} - [x_{i}]_{B \cup D}^{\geq}|}{|U|} \\ &= \frac{1}{|U|} \sum_{i=1}^{|U|} \frac{|[x_{i}]_{B}^{\geq} - [x_{i}]_{D}^{\geq}|}{|U|} \end{aligned}$ (6)

From Equation (6), the dominance-based information granularity is utilized to evaluate the distinguishing ability of D relative to B. In addition, a smaller dominance-based information granularity of D relative to B means that the classification ability of the feature subset B is closer to that of the decision feature set.
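A short Python sketch may help to make Equations (5) and (6) concrete. It evaluates the relative granularity on Table 1 and checks numerically that the first and last forms of Equation (6) agree for B = {c₁}. The data encoding and the function names are assumptions of this sketch.

```python
# Table 1 encoded as rows (c1, c2, c3, c4, d); row 0 stands for x1 (assumed encoding).
TABLE_1 = [
    (2, 3, 2, 3, 1),
    (3, 1, 2, 1, 3),
    (1, 2, 3, 2, 3),
    (4, 3, 4, 2, 2),
    (3, 1, 2, 3, 3),
]
D = 4  # column index of the decision feature d


def granule(data, i, features):
    """Dominance granule [x_i]_B^>= as a set of object indices."""
    return {
        j
        for j, row in enumerate(data)
        if all(row[a] >= data[i][a] for a in features)
    }


def info_granularity(data, features):
    """IG(R_B^>=) from Equation (5)."""
    n = len(data)
    return sum(len(granule(data, i, features)) for i in range(n)) / (n * n)


def relative_granularity(data, features, decision):
    """IG^>=(D|B) using the last form of Equation (6)."""
    n = len(data)
    outside = sum(
        len(granule(data, i, features) - granule(data, i, [decision]))
        for i in range(n)
    )
    return outside / (n * n)


ig = [relative_granularity(TABLE_1, [a], D) for a in range(4)]
# First form of Equation (6) for B = {c1}: IG(R_B) - IG(R_{B ∪ D}).
first_form = info_granularity(TABLE_1, [0]) - info_granularity(TABLE_1, [0, D])
print(ig, first_form)
```

The values for the single features c₁ to c₄ are 4/25, 7/25, 5/25 and 6/25, that is, 0.16, 0.28, 0.20 and 0.24, which agree with the values reported in Example 1.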

3 A multi-measure feature selection algorithm for the ordered decision system

This section first presents several drawbacks of a single feature measure (Section 3.1). A feature multi-measure is then proposed by combining two single-criterion measures (Section 3.2). Finally, a multi-measure feature selection algorithm is developed to select the feature subset for the ordered decision system (Section 3.3).

3.1 The drawbacks of a single feature measure

At present, single feature measures based on the dominance relation have been proposed for ordered data [34, 40], and the corresponding feature selection algorithms can effectively select feature subsets. However, using a single measure to evaluate features has the following shortcomings.

(1). When a single measure is used to evaluate features and multiple candidate features have the same evaluation value, the most suitable candidate cannot be selected from among them.

(2). A single evaluation measure cannot simultaneously quantify the certain information (algebra theory) and the discern information (information theory) contained in candidate features. A single measure either considers only the certain information through the dominance-based dependence or uses only the discern information through the dominance-based information granularity. Features selected in this way are often not the most suitable ones, so the selected feature subset may be inadequate and the classification performance may suffer.

Important features should balance the inconsistent risk and the distinguishing ability: the selected features should have both a low inconsistent risk and a small dominance-based information granularity relative to the decision labels. In view of the possible disadvantages mentioned above, the following example gives a detailed explanation.

Example 1. The ordered decision system is shown in Table 1, where U ={ x₁, x₂, x₃, x₄, x₅ }, C ={ a₁, a₂, a₃, a₄ } (the conditional features c₁, c₂, c₃, c₄ of Table 1) and D ={ d }. According to Definition 3, the dominance-based dependence of D relative to a₁, a₂, a₃ and a₄ is $\gamma_{\{a_{1}\}}^{\geq}(D) = 0.4$, $\gamma_{\{a_{2}\}}^{\geq}(D) = 0.2$, $\gamma_{\{a_{3}\}}^{\geq}(D) = 0.4$, $\gamma_{\{a_{4}\}}^{\geq}(D) = 0.2$. Naturally, the feature with the greatest dependence will be selected. However, the two features a₁ and a₃ both attain the maximum value, so the single feature measure cannot distinguish between them. Similarly, the dominance-based information granularity of D relative to {a₁, a₂}, {a₁, a₃}, {a₁, a₄} is calculated by Definition 4 as IG^≥ (D| { a₁, a₂ }) = 0.16, IG^≥ (D| { a₁, a₃ }) = 0.12, IG^≥ (D| { a₁, a₄ }) = 0.12. According to Definition 4, the smaller the dominance-based information granularity, the better the performance of the feature subset. Since the feature subsets {a₁, a₃} and {a₁, a₄} have the same minimum dominance-based information granularity, the performance of these two feature subsets cannot be distinguished. In this case, a feature is usually selected at random as the candidate, and the classification performance of the resulting feature subset is likely to be affected.

Meanwhile, the dominance-based information granularity of D relative to a₁, a₂, a₃ and a₄ is IG^≥ (D| { a₁ }) = 0.16, IG^≥ (D| { a₂ }) = 0.28, IG^≥ (D| { a₃ }) = 0.20, IG^≥ (D| { a₄ }) = 0.24, and the dominance-based dependence of D relative to a₁, a₂, a₃ and a₄ is $\gamma_{\{a_{1}\}}^{\geq}(D) = 0.4$, $\gamma_{\{a_{2}\}}^{\geq}(D) = 0.2$, $\gamma_{\{a_{3}\}}^{\geq}(D) = 0.4$, $\gamma_{\{a_{4}\}}^{\geq}(D) = 0.2$. It is not difficult to see that the dominance-based information granularity of a₂ is the largest, while its dominance-based dependence is the smallest (tied with a₄).

3.2 A multi-measure for selecting feature subset

In this subsection, to address the shortcomings described in the previous subsection, a multi-measure that considers the dominance-based dependence and the dominance-based information granularity simultaneously is proposed for the ordered decision system. Combining the two single measures overcomes the limitations of a single-criterion measure.

Definition 5. Let ODS = (U, A = C ∪ D, V, f) be an ordered decision system, ∀B ⊆ C, the dominance-based multi-measure of D relative to B is defined as $RG^{\geq}(D \mid B) = \frac{1}{2}\left(IG^{\geq}(D \mid B) + (1 - \gamma_{B}^{\geq}(D))\right)$ (7)

From Definition 5, it can be observed that the proposed multi-measure considers both the inconsistency risk and the distinguishing ability, so that features are evaluated from the viewpoints of both the algebra theory and the information theory. The lower the value of RG^≥ (D|B), the smaller the inconsistency risk and the smaller the dominance-based information granularity of D relative to B.

Theorem 1. (Monotonicity) Let ODS = (U, A = C ∪ D, V, f) be an ordered decision system, ∀B₁, B₂ ⊆ C and B₁ ⊆ B₂, then RG^≥ (D|B₁) ≥ RG^≥ (D|B₂).

Proof. Since B₁ ⊆ B₂, for ∀x ∈ U, according to Definition 1, it is obvious that $[x]_{B_{1}}^{\geq} \supseteq [x]_{B_{2}}^{\geq}$. It follows from Definition 2 that $\underline{R_{B_{1}}^{\geq}}(D) \subseteq \underline{R_{B_{2}}^{\geq}}(D)$, so $\gamma_{B_{1}}^{\geq}(D) \leq \gamma_{B_{2}}^{\geq}(D)$ by Definition 3 and hence $1 - \gamma_{B_{1}}^{\geq}(D) \geq 1 - \gamma_{B_{2}}^{\geq}(D)$. Similarly, $[x]_{B_{1}}^{\geq} \supseteq [x]_{B_{2}}^{\geq}$ implies $|[x]_{B_{1}}^{\geq} - [x]_{D}^{\geq}| \geq |[x]_{B_{2}}^{\geq} - [x]_{D}^{\geq}|$ for every x ∈ U, so it follows from Equation (6) that IG^≥ (D|B₁) ≥ IG^≥ (D|B₂). Adding the two inequalities and applying Equation (7) gives RG^≥ (D|B₁) ≥ RG^≥ (D|B₂).
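Theorem 1 can also be checked numerically on a small example. The sketch below enumerates every pair of nested feature subsets of Table 1 and collects any pair that would violate the monotonicity; it is only a sanity check on one data set, not a substitute for the proof. The data encoding, the use of exact fractions and the function names are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import combinations

# Table 1 encoded as rows (c1, c2, c3, c4, d); row 0 stands for x1 (assumed encoding).
TABLE_1 = [
    (2, 3, 2, 3, 1),
    (3, 1, 2, 1, 3),
    (1, 2, 3, 2, 3),
    (4, 3, 4, 2, 2),
    (3, 1, 2, 3, 3),
]
D = 4  # column index of the decision feature d


def granule(data, i, features):
    """Dominance granule [x_i]_B^>= as a set of object indices."""
    return {
        j
        for j, row in enumerate(data)
        if all(row[a] >= data[i][a] for a in features)
    }


def multi_measure(data, features, decision):
    """RG^>=(D|B) from Equation (7), computed exactly with fractions."""
    n = len(data)
    consistent = 0  # size of the lower approximation, Equation (2)
    outside = 0     # sum of |[x_i]_B - [x_i]_D|, Equation (6)
    for i in range(n):
        g_b = granule(data, i, features)
        g_d = granule(data, i, [decision])
        if g_b <= g_d:
            consistent += 1
        outside += len(g_b - g_d)
    gamma = Fraction(consistent, n)
    ig = Fraction(outside, n * n)
    return (ig + 1 - gamma) / 2


# All subsets of the four conditional features, including the empty set.
subsets = [set(s) for r in range(5) for s in combinations(range(4), r)]
violations = [
    (b1, b2)
    for b1 in subsets
    for b2 in subsets
    if b1 <= b2
    and multi_measure(TABLE_1, sorted(b1), D) < multi_measure(TABLE_1, sorted(b2), D)
]
print(violations)  # an empty list means no nested pair breaks the inequality
```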

Example 2. The ordered decision system is shown in Table 1, where U ={ x₁, x₂, x₃, x₄, x₅ }, C = {a₁, a₂, a₃, a₄}, D ={ d }. Definition 5 is used to calculate the dominance-based multi-measure of D relative to a₁, a₂, a₃ and a₄ as RG^≥ (D| {a₁}) = 0.38, RG^≥ (D| {a₂}) = 0.54, RG^≥ (D| {a₃}) = 0.40, RG^≥ (D| {a₄}) = 0.52. Since RG^≥ (D| {a₁}) < RG^≥ (D| {a₃}), a₁ is better than a₃.

Similarly, the dominance-based multi-measure of D relative to {a₁, a₂}, {a₁, a₃}, {a₁, a₄} is RG^≥ (D| {a₁, a₂}) = 0.38, RG^≥ (D| { a₁, a₃ }) = 0.36, RG^≥ (D| {a₁, a₄}) = 0.26. Since RG^≥ (D| { a₁, a₄ }) < RG^≥ (D| {a₁, a₃}), the feature subset {a₁, a₄} is better than {a₁, a₃}. In Example 1, the single measures cannot distinguish a₁ from a₃, or {a₁, a₃} from {a₁, a₄}. Comparing Example 2 with Example 1, it can be seen that the dominance-based multi-measure overcomes the drawback that a single criterion cannot distinguish multiple features or feature subsets with the same evaluation value, and it selects the more important candidate features where possible.
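The figures of Example 2 can be reproduced with a short Python sketch of Equation (7). The data encoding, the 0-based column indices (column 0 is a₁) and the function names are assumptions of this sketch; values are rounded to two decimals only for display.

```python
# Table 1 encoded as rows (a1, a2, a3, a4, d); row 0 stands for x1 (assumed encoding).
TABLE_1 = [
    (2, 3, 2, 3, 1),
    (3, 1, 2, 1, 3),
    (1, 2, 3, 2, 3),
    (4, 3, 4, 2, 2),
    (3, 1, 2, 3, 3),
]
D = 4  # column index of the decision feature d


def granule(data, i, features):
    """Dominance granule [x_i]_B^>= as a set of object indices."""
    return {
        j
        for j, row in enumerate(data)
        if all(row[a] >= data[i][a] for a in features)
    }


def multi_measure(data, features, decision):
    """RG^>=(D|B) = (IG^>=(D|B) + (1 - gamma_B^>=(D))) / 2, Equation (7)."""
    n = len(data)
    consistent = 0  # size of the lower approximation, Equation (2)
    outside = 0     # sum of |[x_i]_B - [x_i]_D|, Equation (6)
    for i in range(n):
        g_b = granule(data, i, features)
        g_d = granule(data, i, [decision])
        if g_b <= g_d:
            consistent += 1
        outside += len(g_b - g_d)
    gamma = consistent / n
    ig = outside / (n * n)
    return 0.5 * (ig + (1 - gamma))


singles = {f"a{a + 1}": round(multi_measure(TABLE_1, [a], D), 2) for a in range(4)}
pairs = {f"a1,a{a + 1}": round(multi_measure(TABLE_1, [0, a], D), 2) for a in (1, 2, 3)}
print(singles)
print(pairs)
```

The single-feature values come out as 0.38, 0.54, 0.40 and 0.52, and the pair values as 0.38, 0.36 and 0.26, so the ties seen under each single measure in Example 1 are broken: a₁ has a lower value than a₃, and {a₁, a₄} has the lowest value among the three pairs.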

On the basis of the proposed multi-measure for ordered data, the inner significance and outer significance of features are defined as follows.

Definition 6. Let ODS = (U, A = C ∪ D, V, f) be an ordered decision system, ∀B ⊆ C and b ∈ B, the inner significance of b in B is defined as $Sig_{in}(b, B, D) = RG^{\geq}(D \mid B - \{b\}) - RG^{\geq}(D \mid B)$ (8)

It can be seen from Definition 6 that the greater the change of the dominance-based multi-measure when feature b is deleted from the feature subset B, the more important feature b is. In particular, feature b is redundant when Sig_in (b, B, D) = 0. In this paper, Definition 6 is therefore used to delete redundant features from candidate feature subsets.

Definition 7. Let ODS = (U, A = C ∪ D, V, f) be an ordered decision system, ∀B ⊆ C and b ∈ C - B, the outer significance of b in C - B is defined as $Sig_{out}(b, B, D) = RG^{\geq}(D \mid B) - RG^{\geq}(D \mid B \cup \{b\})$ (9)

It is easy to find that Sig_out (b, B, D) is the variation value of the dominance-based multi-measure when feature b is added into B. Specifically, the greater the change value, the more important the feature is. Therefore, in the feature selection process, Definition 7 is used to select important features.
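The two significance measures of Equations (8) and (9) can be sketched as follows on Table 1. Exact fractions are used so that the test Sig_in = 0 is not affected by floating-point rounding; this choice, the data encoding and the function names `sig_in` and `sig_out` are assumptions of this sketch.

```python
from fractions import Fraction

# Table 1 encoded as rows (a1, a2, a3, a4, d); row 0 stands for x1 (assumed encoding).
TABLE_1 = [
    (2, 3, 2, 3, 1),
    (3, 1, 2, 1, 3),
    (1, 2, 3, 2, 3),
    (4, 3, 4, 2, 2),
    (3, 1, 2, 3, 3),
]
D = 4  # column index of the decision feature d


def granule(data, i, features):
    """Dominance granule [x_i]_B^>= as a set of object indices."""
    return {
        j
        for j, row in enumerate(data)
        if all(row[a] >= data[i][a] for a in features)
    }


def multi_measure(data, features, decision):
    """RG^>=(D|B) from Equation (7), computed exactly."""
    n = len(data)
    consistent = 0
    outside = 0
    for i in range(n):
        g_b = granule(data, i, features)
        g_d = granule(data, i, [decision])
        if g_b <= g_d:
            consistent += 1
        outside += len(g_b - g_d)
    return (Fraction(outside, n * n) + 1 - Fraction(consistent, n)) / 2


def sig_in(data, b, subset, decision):
    """Inner significance, Equation (8): RG(D | B - {b}) - RG(D | B)."""
    reduced = [a for a in subset if a != b]
    return multi_measure(data, reduced, decision) - multi_measure(data, subset, decision)


def sig_out(data, b, subset, decision):
    """Outer significance, Equation (9): RG(D | B) - RG(D | B ∪ {b})."""
    return multi_measure(data, subset, decision) - multi_measure(data, subset + [b], decision)


# Outer significance of a2, a3, a4 with respect to B = {a1}.
outer = {f"a{b + 1}": sig_out(TABLE_1, b, [0], D) for b in (1, 2, 3)}
# Inner significance of every feature within the full set C.
inner = {f"a{b + 1}": sig_in(TABLE_1, b, [0, 1, 2, 3], D) for b in range(4)}
print(outer)
print(inner)
```

On this example, a₄ has the largest outer significance with respect to {a₁} (3/25 = 0.12), while a₂ has an inner significance of 0 in C and would therefore be treated as redundant.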

Definition 8. Let ODS = (U, A = C ∪ D, V, f) be an ordered decision system, ∀B ⊆ C, B is referred to as an optimal feature subset iff

(1) RG^≥ (D|B) = RG^≥ (D|C);

(2) ∀b ∈ B, RG^≥ (D|B) < RG^≥ (D|B - { b }).

By Definition 8, condition (1) ensures that the feature subset has the same distinguishing ability and inconsistency risk as the whole feature set; condition (2) ensures that every feature in the feature subset is indispensable. It is obvious that the optimal feature subset is a minimal set of features which satisfies the same information ability as all features.

3.3 A multi-measure feature selection algorithm for the ordered decision system

Based on Definition 8, the multi-measure feature selection algorithm for the ordered decision system is proposed and the flow chart is presented in Fig. 1.

Fig. 1. Flow chart of Algorithm MMFS.

The detailed descriptions of the heuristic feature selection algorithm for ODS are given as Algorithm MMFS.

Algorithm 1 A multi-measure feature selection algorithm for the ordered decision system (Algorithm MMFS)
Require: An ordered decision system ODS = (U, A = C ∪ D, V, f).
Ensure: A feature subset RED.
1: RED ← ∅;
2: for each a ∈ C do
3: compute Sig_in (a, C, D) by Equation (8);
4: if Sig_in (a, C, D) > 0 then
5: RED ← RED ∪ { a };
6: end if
7: end for
8: while RG^≥ (D|RED) ≠ RG^≥ (D|C) do
9: for each a ∈ C - RED do
10: compute Sig_out (a, RED, D) by Equation (9);
11: end for
12: a_k ← argmax_{a ∈ C - RED} Sig_out (a, RED, D); RED ← RED ∪ { a_k };
13: end while
14: for each b ∈ RED do
15: compute Sig_in (b, RED, D) by Equation (8);
16: if Sig_in (b, RED, D) = 0 then
17: RED ← RED - { b };
18: end if
19: end for
20: return RED;

The time complexity of MMFS is analyzed as follows. Steps 2-7 select the core features, with time complexity O (|U|²|C|²). Steps 8-13 repeatedly select the most significant feature and add it to RED until the termination condition is satisfied, with time complexity O (|U|²|C||C| + |U|²|C||C - 1| + . . . + |U|²|C| * 1) = O (|U|²|C|³). Steps 14-19 delete redundant features from the selected feature subset, with time complexity O (|U|²|C|²). Therefore, the overall time complexity of Algorithm MMFS is O (|U|²|C|³).
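The three phases of Algorithm MMFS can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: the data are a list of tuples with 0-based column indices, exact fractions are used so that the equality tests in Steps 8 and 16 are not affected by rounding, ties in the argmax are broken by the first candidate encountered, and all function names are hypothetical.

```python
from fractions import Fraction

# Table 1 encoded as rows (c1, c2, c3, c4, d); row 0 stands for x1 (assumed encoding).
TABLE_1 = [
    (2, 3, 2, 3, 1),
    (3, 1, 2, 1, 3),
    (1, 2, 3, 2, 3),
    (4, 3, 4, 2, 2),
    (3, 1, 2, 3, 3),
]


def granule(data, i, features):
    """Dominance granule [x_i]_B^>= as a set of object indices."""
    return {
        j
        for j, row in enumerate(data)
        if all(row[a] >= data[i][a] for a in features)
    }


def multi_measure(data, features, decision):
    """RG^>=(D|B) from Equation (7), computed exactly."""
    n = len(data)
    consistent = 0
    outside = 0
    for i in range(n):
        g_b = granule(data, i, features)
        g_d = granule(data, i, [decision])
        if g_b <= g_d:
            consistent += 1
        outside += len(g_b - g_d)
    return (Fraction(outside, n * n) + 1 - Fraction(consistent, n)) / 2


def mmfs(data, conditions, decision):
    """Heuristic multi-measure feature selection following Algorithm 1."""
    target = multi_measure(data, conditions, decision)
    # Steps 1-7: keep features whose inner significance in C is positive.
    red = [
        a
        for a in conditions
        if multi_measure(data, [c for c in conditions if c != a], decision) > target
    ]
    # Steps 8-13: add the feature with the largest outer significance until
    # RED carries the same multi-measure as the full feature set C.
    while multi_measure(data, red, decision) != target:
        current = multi_measure(data, red, decision)
        candidates = [a for a in conditions if a not in red]
        best = max(
            candidates,
            key=lambda a: current - multi_measure(data, red + [a], decision),
        )
        red.append(best)
    # Steps 14-19: drop features whose inner significance in RED is zero.
    for b in list(red):
        reduced = [a for a in red if a != b]
        if multi_measure(data, reduced, decision) == multi_measure(data, red, decision):
            red = reduced
    return sorted(red)


selected = mmfs(TABLE_1, [0, 1, 2, 3], 4)
print(selected)  # [0, 2, 3]
```

On Table 1 the sketch returns the columns of c₁, c₃ and c₄: these three features already have positive inner significance in C and reach the same multi-measure value as the full set, so the greedy loop is not entered and no feature is removed in the final pass.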

4 Experimental analysis

In this section, a series of comparative experiments is performed. Eight data sets are selected from the UCI Machine Learning Repository [41]. The condition features and the decision feature of these data sets are preference-ordered, and there is a monotonic relationship between them. For example, in the Car data set, the acceptability of a car is evaluated by various indexes: the lower the price and the maintenance cost, and the greater the passenger capacity, the luggage capacity and the safety, the more acceptable the car. The eight ordered data sets are described in Table 2. For the data sets in Table 2, all numerical features are discretized using the data tool Rosetta. The experiments are implemented in Python using PyCharm 2017 and run on a personal computer with Windows 10, an Intel(R) Core(TM) i5-9400 CPU @2.40 GHz and 8.00 GB of memory.

Table 2
Description of eight ordered datasets

Data set	Objects	Features	Classes
Post Operative	87	8	3
Dermatology	366	34	6
Libra Movement	360	90	15
Sonar	208	60	2
Car	1728	6	4
Steel Plates Faults	798	38	6
SPECTF	267	45	2
Wdbc	569	31	2

4.1 Performance of the proposed algorithm

To verify the efficiency of the proposed algorithm MMFS, three single feature measure-based feature selection algorithms are adopted for comparative experiments, which are denoted by FSDD [40], FSDI [40] and FSDE [34], respectively. FSDD is the feature selection algorithm based on the dominance-based dependence. FSDI is the feature selection algorithm based on the dominance-based information granularity. FSDE is the feature selection algorithm based on the dominance-based information entropy.

Three classifiers, C4.5, SVM and KNN, are used to evaluate the classification performance of the different feature selection algorithms, and 10-fold cross-validation is used to obtain the final classification accuracy. Each data set is divided into ten parts of equal size; in turn, nine of them are used as the training set and the remaining one is used as the test set. The average classification accuracy over the ten runs is taken as the classification accuracy. The feature subsets selected by MMFS, FSDD, FSDI and FSDE are shown in Table 3.
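The 10-fold protocol described above can be sketched with the Python standard library alone. The source does not say how the folds were generated or which classifier implementations were used, so the shuffling with a fixed seed, the near-equal fold sizes and the callback `fit_and_score` (a hypothetical function that trains a classifier on the training indices and returns its accuracy on the test indices) are all assumptions of this sketch.

```python
import random


def k_fold_indices(n_objects, k=10, seed=0):
    """Shuffle object indices and split them into k folds of near-equal size."""
    indices = list(range(n_objects))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(n_objects, fit_and_score, k=10, seed=0):
    """Average the accuracy over k runs, each using one fold as the test set."""
    folds = k_fold_indices(n_objects, k, seed)
    scores = []
    for i, test in enumerate(folds):
        # The other k - 1 folds together form the training set.
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(fit_and_score(train, test))
    return sum(scores) / k


# Example: fold sizes for a data set with 87 objects (Post Operative in Table 2).
folds = k_fold_indices(87)
print([len(fold) for fold in folds])
```

In practice `fit_and_score` would restrict the data to the features selected by one of the algorithms before training C4.5, SVM or KNN, so that the reported accuracy reflects the quality of the selected subset.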

Table 3
The feature subset sizes of MMFS, FSDD, FSDI and FSDE

Data set	Algorithm	Feature subset size	Feature selection results
Post Operative	MMFS	7	1,2,3,4,5,7,8
	FSDD	4	2,3,7,8
	FSDI	7	1,2,3,4,5,7,8
	FSDE	7	1,2,3,4,5,7,8
Dermatology	MMFS	17	1-5,8,10,13,14,16,18,19,21,26,28,32,34
	FSDD	16	1-5,8,10,14,16,18,19,21,26,28,32,34
	FSDI	17	1-6,10,13,14,16,18,19,21,26,28,32,34
	FSDE	17	1-5,8,10,13,14,16,18,19,21,26,28,32,34
Libra Movement	MMFS	23	2-4,6,10,17,18,21,26,27,35,37,48,49,57,60,68,71,75,78,85,89,90
	FSDD	44	1-37,51,54,57,66,81,82,87
	FSDI	24	2-4,6,8,17,18,21,24,27,35,36,37,53,56,57,66,73,74,77,78,85,89,90
	FSDE	84	1-7,9-49,51,53,55,57,59-90
Sonar	MMFS	13	2,5,7,11,15,17,21,24,29,30,33,42,45
	FSDD	18	2-7,11,13,15,18,23,24,26,29,30,34,44,48
	FSDI	11	9,11,17,20,24,26,29,33,38,43,48
	FSDE	49	2-6,8,9,11-14,18,21,22,23,25,27-36,39-60
Car	MMFS	5	1,2,4,5,6
	FSDD	1	1
	FSDI	5	1,2,4,5,6
	FSDE	5	1,2,4,5,6
Steel Plates Faults	MMFS	11	1,3,8,9,10,11,16,17,18,22,25
	FSDD	12	1,3,8,9,10,11,16,17,18,19,22,25
	FSDI	12	3,5,6,8,9,10,11,16,17,18,19,25
	FSDE	14	1,3,7,8,9,10,14,15,16,17,18,19,25,26
SPECTF	MMFS	20	1,2,3,7,9,11,14,16,17,19,21,24,28,31,33,34,37,38,41,42
	FSDD	23	1-8,10,13,15,17,18,22,23,24,27,31,32,33,34,37,42
	FSDI	18	2,3,9,11,13,16,17,21,24,27,28,30,32,33,34,37,38,41
	FSDE	38	2,3,4,6,7,9-15,17,19-31,33-44
Wdbc	MMFS	9	2,7,13,14,23,24,26,30,31
	FSDD	8	3,4,7,14,25,26,27,30
	FSDI	8	2,7,10,14,23,24,26,31
	FSDE	28	1-10,12-22,24-30

As shown in Table 3, Algorithms MMFS, FSDD, FSDI and FSDE can all effectively reduce the feature dimension of the data sets. Compared with FSDD, FSDI and FSDE, Algorithm MMFS may select different feature subsets. Most features in the subset selected by MMFS are also selected by FSDD or FSDI, although some different features may be selected. For the Wdbc data set, for example, feature 13 is the only feature selected by MMFS that is selected by neither FSDD nor FSDI. In particular, for the Car data set, FSDD selects only one feature (buying price). This is because the dominance-based dependence of each feature in the Car data set is the same as that of the complete feature set. However, it is clearly impossible to evaluate a car by the buying price alone, and a single feature cannot satisfy subsequent complex machine learning tasks. Furthermore, compared with FSDE, the feature subset selected by MMFS often has fewer features. For example, for Libra Movement, MMFS selects 23 features whereas FSDE selects 84. Compared with FSDD and FSDI, the feature subset sizes are similar; for the Dermatology data set, for instance, the subset sizes of MMFS, FSDD and FSDI are 17, 16 and 17, respectively. Therefore, Algorithm MMFS selects relatively few features while retaining potentially important ones.

In the following, Tables 4–6 present the classification accuracies of MMFS, FSDD, FSDI and FSDE under the classifiers C4.5, SVM and KNN. The last row ‘Average’ is the average classification accuracy over the eight data sets.

Table 4

The classification accuracies (%) of MMFS, FSDD, FSDI and FSDE under classifier C4.5

Data set	MMFS	FSDD	FSDI	FSDE
Post Operative	70.53±0.56	70.53±0.56	70.53±0.56	70.53±0.56
Dermatology	91.06±0.32	89.90±0.24	90.23±0.23	91.06±0.32
Libra Movement	62.76±1.02	59.53±0.82	60.53±0.60	62.35±2.74
Sonar	77.85±1.22	71.96±0.89	75.33±1.85	70.26±3.89
Car	95.26±0.04	70.00±0.00	95.26±0.04	95.26±0.04
Steel Plates Faults	68.86±0.16	67.63±0.41	65.86±0.51	66.92±0.64
SPECTF	78.83±0.45	76.41±2.14	78.16±1.59	76.66±0.91
Wdbc	94.43±0.32	92.26±0.44	94.40±0.28	92.43±0.54
Average	79.94	74.86	78.78	78.18

Table 5

The classification accuracies (%) of MMFS, FSDD, FSDI and FSDE under classifier SVM

Data set	MMFS	FSDD	FSDI	FSDE
Post Operative	71.33±0.12	71.33±0.12	71.33±0.12	71.33±0.12
Dermatology	87.06±0.33	86.98±0.35	86.96±0.47	87.06±0.33
Libra Movement	83.96±0.57	83.90±0.65	83.86±0.53	83.72±0.14
Sonar	81.13±0.23	77.86±0.65	82.26±0.44	81.46±0.80
Car	96.03±0.09	70.00±0.00	96.03±0.09	96.03±0.09
Steel Plates Faults	71.34±0.23	70.90±0.32	67.73±0.55	71.23±0.26
SPECTF	80.26±0.32	79.53±0.18	79.36±0.20	79.76±0.28
Wdbc	97.13±0.09	96.73±0.20	96.93±0.18	97.06±0.24
Average	85.53	79.65	83.05	83.45

Table 6

The classification accuracies (%) of MMFS, FSDD, FSDI and FSDE under classifier KNN

Data set	MMFS	FSDD	FSDI	FSDE
Post Operative	70.00±0.81	70.00±0.81	70.00±0.81	70.00±0.81
Dermatology	90.53±0.12	90.50±0.02	90.17±0.63	90.27±0.01
Libra Movement	86.13±0.14	86.93±0.88	85.37±0.67	85.47±0.06
Sonar	82.50±1.28	79.80±1.50	79.47±0.39	81.63±0.63
Car	94.40±0.35	70.00±0.00	94.40±0.35	94.40±0.35
Steel Plates Faults	74.70±0.06	72.83±2.68	70.77±0.62	71.93±2.72
SPECTF	75.77±0.28	71.10±0.69	71.30±1.50	71.53±3.11
Wdbc	95.73±0.01	94.33±1.88	93.67±0.74	94.40±0.86
Average	83.72	79.43	81.89	74.58

As shown in Tables 4–6, the proposed algorithm MMFS achieves higher classification accuracy than FSDD, FSDI and FSDE on most data sets. For example, for the SPECTF data set, the classification accuracies of FSDD, FSDI and FSDE under the C4.5 classifier are 76.41%, 78.16% and 76.66%, respectively. The classification accuracy of MMFS is 78.83%, an improvement of 2.42, 0.67 and 2.17 percentage points over FSDD, FSDI and FSDE, respectively. Under the SVM classifier, MMFS also has the highest classification accuracy among the four algorithms on SPECTF, namely 80.26%. For the Steel Plates Faults data set under C4.5, the classification accuracy of MMFS is 68.86%, an improvement of 1.23, 3.00 and 1.94 percentage points over FSDD, FSDI and FSDE, respectively. Under SVM, the accuracy of MMFS on Steel Plates Faults is 71.34%, which is 0.44, 3.61 and 0.11 percentage points higher than that of FSDD, FSDI and FSDE, respectively. For the Car data set, the classification accuracy of FSDD is 70.00% under all three classifiers, which is lower than that of the other algorithms. This shows that, for some data sets, a single criterion cannot select an adequate feature subset. For the Post Operative data set, the four algorithms obtain the same classification accuracy because they select the same feature subset. In addition, for the Wdbc data set under the KNN classifier, the classification accuracies of MMFS, FSDD, FSDI and FSDE are 95.73%, 94.33%, 93.67% and 94.40%, respectively, so MMFS obtains the highest accuracy. According to the last row of Tables 4–6, MMFS has the highest average classification accuracy under each of the classifiers C4.5, SVM and KNN.
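To make the phrase "most data sets" concrete, the following Python sketch tallies, for the C4.5 means in Table 4, how often MMFS is strictly best and how often it ties with the best competitor. Only the mean accuracies are used (the values after ± are ignored), and the function name `tally` is illustrative.

```python
# Mean accuracies (%) copied from Table 4 (C4.5); order: MMFS, FSDD, FSDI, FSDE.
TABLE4 = {
    "Post Operative":      (70.53, 70.53, 70.53, 70.53),
    "Dermatology":         (91.06, 89.90, 90.23, 91.06),
    "Libra Movement":      (62.76, 59.53, 60.53, 62.35),
    "Sonar":               (77.85, 71.96, 75.33, 70.26),
    "Car":                 (95.26, 70.00, 95.26, 95.26),
    "Steel Plates Faults": (68.86, 67.63, 65.86, 66.92),
    "SPECTF":              (78.83, 76.41, 78.16, 76.66),
    "Wdbc":                (94.43, 92.26, 94.40, 92.43),
}

def tally(table):
    """Count data sets where MMFS is strictly best, or tied with the best competitor."""
    strictly_best = tied_best = 0
    for mmfs, *others in table.values():
        best_other = max(others)
        if mmfs > best_other:
            strictly_best += 1
        elif mmfs == best_other:
            tied_best += 1
    return strictly_best, tied_best

print(tally(TABLE4))  # (5, 3)
```

Under C4.5, MMFS is therefore strictly best on five data sets and tied for best on the remaining three.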

Moreover, Fig. 2 plots the classification accuracy under the three classifiers (C4.5, SVM and KNN) against the number of features chosen by MMFS. Firstly, the overall classification accuracy shows an upward trend as the number of features increases. Secondly, once the important features have been selected, the classification accuracy stabilizes, with no further significant increase or decrease. Finally, the same data set yields slightly different results under different classifiers.

Fig. 2

Variation of the classification accuracy with the number of features for four representative datasets under three different classifiers.

The above analyses show that the proposed algorithm MMFS, with its multi-measure for features, can select feature subsets with higher classification accuracy in most cases.

4.2 Evaluation metrics

The ROC curve is an important tool for evaluating the effectiveness of an algorithm. AUC is the area under the ROC curve; it takes values in [0, 1] and lies in [0.5, 1] for a classifier that performs no worse than random guessing. The closer the ROC curve is to the upper left corner of the first quadrant, the larger the AUC, which in turn indicates better performance. Thus, we use the ROC curve and AUC to demonstrate the effectiveness of MMFS. The ROC curves of MMFS and the other algorithms are compared in Fig. 3, from which we can see that the ROC curve of MMFS is closest to the upper left corner of the first quadrant on the tested data sets. For example, for the Sonar data set, MMFS is represented by the red line, FSDD by the blue line, FSDI by the black line and FSDE by the green line. In Fig. 3(a), the ROC curve of MMFS is closest to the upper left corner and has the largest area under the curve, which confirms that MMFS performs better than the other algorithms. The AUC values are recorded in Table 8. Under the SVM classifier, the AUC values of MMFS, FSDD, FSDI and FSDE on the Sonar data set are 0.8774, 0.7701, 0.7500 and 0.8676, respectively, so MMFS obtains the maximum. Similarly, on the Wdbc data set, the AUC value of MMFS is the highest among the four algorithms. In other words, MMFS has the largest area under the ROC curve and thus performs better than FSDD, FSDI and FSDE.
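As a minimal illustration of how an ROC curve and its AUC are obtained for a binary problem, the following Python sketch sweeps the decision threshold over classifier scores and integrates the resulting curve with the trapezoidal rule. The labels and scores are made-up toy values, not results from the paper, and the function names are illustrative; the sketch assumes that both classes are present.

```python
def roc_points(labels, scores):
    """ROC points (FPR, TPR) obtained by lowering the decision threshold step by step."""
    pos = sum(labels)            # number of positive samples (label 1)
    neg = len(labels) - pos      # number of negative samples (label 0)
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Samples sharing the same score cross the threshold together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points

def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy example: 1 = positive class, 0 = negative class.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auc(roc_points(labels, scores)), 4))  # 0.8889
```

In the toy example one negative sample is scored above one positive sample, so 8 of the 9 positive-negative pairs are ordered correctly and the AUC is 8/9.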

Fig. 3

The ROC curve comparisons between Algorithm MMFS and other algorithms.

Table 7

Computational time (in seconds) of MMFS, FSDD, FSDI and FSDE

Data set	MMFS	FSDD	FSDI	FSDE
Post Operative	0.07	0.05	0.07	0.09
Dermatology	32.80	45.14	28.76	35.33
Libra Movement	325.33	716.54	307.04	351.59
Sonar	32.80	45.14	28.76	116.79
Car	27.80	13.16	27.93	29.82
Steel Plates Faults	1119.62	1036.93	1110.59	1169.32
SPECTF	39.08	48.16	33.99	72.37
Wdbc	50.75	58.03	48.23	138.61
Average	206.90	245.39	198.17	239.24

Table 8

The AUC values of MMFS, FSDD, FSDI and FSDE under classifier SVM

Data set	MMFS	FSDD	FSDI	FSDE
Sonar	0.8774	0.7701	0.7500	0.8676
Wdbc	0.9788	0.9687	0.9305	0.9657
Average	0.9281	0.8693	0.8402	0.9166

4.3 Statistical analysis

To further compare the experimental results of the different algorithms statistically, the Friedman test and the Nemenyi test are used.

The Friedman test is a nonparametric statistical test whose null hypothesis is that all compared algorithms have the same classification performance. Its statistic is defined as

$\begin{matrix} F_{F} = \frac{(T - 1) \chi_{F}^{2}}{T (s - 1) - \chi_{F}^{2}} \end{matrix}$ (10)

$\begin{matrix} \chi_{F}^{2} = \frac{12 T}{s (s + 1)} \left( \sum_{i = 1}^{s} R_{i}^{2} - \frac{s {(s + 1)}^{2}}{4} \right) \end{matrix}$ (11)

where T and s are the numbers of data sets and algorithms, respectively, and R_i is the average rank of the classification accuracy of algorithm i over the T data sets under a given classifier. The significance level is set to α = 0.05. With four algorithms and eight data sets, F_F follows the F-distribution with s − 1 = 3 and (s − 1)(T − 1) = 21 degrees of freedom, whose critical value is 3.072. The Friedman test results for C4.5, SVM and KNN are shown in Table 9: the F_F values for the three classifiers are 5.344, 7.615 and 5.451, respectively, all of which are clearly greater than 3.072. The null hypothesis is therefore rejected, which indicates that there are significant differences in the performance of the algorithms under each classifier.
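Eqs. (10) and (11) can be evaluated with a few lines of Python. The sketch below uses the C4.5 mean accuracies from Table 4 and assumes that tied accuracies share the mean of the ranks they occupy (the paper does not state its tie-handling rule); the function names are illustrative. Under this assumption the result is about 5.34, close to the value 5.344 reported for C4.5 in Table 9.

```python
# Mean accuracies (%) from Table 4 (C4.5); columns: MMFS, FSDD, FSDI, FSDE.
ACC = [
    [70.53, 70.53, 70.53, 70.53],  # Post Operative
    [91.06, 89.90, 90.23, 91.06],  # Dermatology
    [62.76, 59.53, 60.53, 62.35],  # Libra Movement
    [77.85, 71.96, 75.33, 70.26],  # Sonar
    [95.26, 70.00, 95.26, 95.26],  # Car
    [68.86, 67.63, 65.86, 66.92],  # Steel Plates Faults
    [78.83, 76.41, 78.16, 76.66],  # SPECTF
    [94.43, 92.26, 94.40, 92.43],  # Wdbc
]

def average_ranks(rows):
    """Average rank R_i of each column; rank 1 = highest accuracy, ties share the mean rank."""
    s = len(rows[0])
    totals = [0.0] * s
    for row in rows:
        for i, v in enumerate(row):
            higher = sum(1 for w in row if w > v)
            equal = sum(1 for w in row if w == v)  # counts v itself
            totals[i] += higher + (equal + 1) / 2
    return [t / len(rows) for t in totals]

def friedman_ff(rows):
    """F_F statistic of Eq. (10), computed from chi_F^2 of Eq. (11)."""
    T, s = len(rows), len(rows[0])
    R = average_ranks(rows)
    chi2 = 12 * T / (s * (s + 1)) * (sum(r * r for r in R) - s * (s + 1) ** 2 / 4)
    return (T - 1) * chi2 / (T * (s - 1) - chi2)

ff = friedman_ff(ACC)
print(average_ranks(ACC))  # [1.375, 3.4375, 2.5625, 2.625]
print(ff > 3.072)          # True: exceeds the critical value for (3, 21) degrees of freedom
```

The small gap to the published value may come from rounding or from a different treatment of ties, so the sketch should be read as a consistency check rather than a reproduction of the authors' computation.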

Table 9

The Friedman test results

Classifier	F_F	Critical value (α=0.05)
C4.5	5.344	3.072
SVM	7.615	3.072
KNN	5.451	3.072

In addition, the Nemenyi Test can further analyze the relative performance and differences of all the compared algorithms, and its critical difference formula is defined as:

$\begin{matrix} {CD}_{\alpha} = q_{\alpha} \sqrt{\frac{s (s + 1)}{6 T}} \end{matrix}$ (12)

In the comparison experiments, T = 8, s = 4 and α = 0.05, so q_α = 2.569 and CD_0.05 = 1.658. Following the Friedman test results in Table 9 and using CD_0.05, the Nemenyi test results of MMFS and the compared algorithms under the three classifiers (C4.5, SVM and KNN) are displayed in Fig. 4. In Fig. 4(a) and 4(c), for the classifiers C4.5 and KNN, no segment connects MMFS with the other algorithms, which means that the accuracy of MMFS is statistically significantly better than that of FSDE, FSDD and FSDI. Under the SVM classifier, however, there is no evidence of a difference in accuracy between MMFS and FSDE.
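The critical difference follows from Eq. (12) by direct substitution. The Python sketch below reproduces the value 1.658 and shows how the threshold is applied to a pair of average ranks; the helper names and the two example ranks are hypothetical and are not taken from the paper.

```python
import math

def nemenyi_cd(q_alpha, s, T):
    """Critical difference of Eq. (12) for s algorithms compared over T data sets."""
    return q_alpha * math.sqrt(s * (s + 1) / (6 * T))

def significantly_different(rank_a, rank_b, cd):
    """Two algorithms differ significantly if their average ranks are more than CD apart."""
    return abs(rank_a - rank_b) > cd

cd = nemenyi_cd(q_alpha=2.569, s=4, T=8)
print(round(cd, 3))  # 1.658

# Hypothetical average ranks, used only to illustrate the decision rule.
print(significantly_different(1.4, 3.4, cd))  # True
```

In a Nemenyi diagram such as Fig. 4, algorithms whose average ranks lie within one CD of each other are joined by a segment.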

Fig. 4

Comparisons between MMFS and the other three algorithms under the Nemenyi test.

As can be seen from Fig. 4, MMFS is significantly better than FSDD, FSDI and FSDE under the classifiers C4.5 and KNN, whereas under SVM its advantage over FSDE is not statistically significant. In addition, there is no significant difference in classification accuracy between FSDE and FSDI.

4.4 Computational time

To analyze the computational time of the different algorithms, Table 7 shows the running time of MMFS, FSDD, FSDI and FSDE. The row ‘Average’ gives the average computational time over the eight data sets.

As can be seen in Table 7, on most data sets the computational time of MMFS is lower than that of FSDD and FSDE, but slightly higher than that of FSDI. For example, on the Sonar data set, the computational time of MMFS is 32.80 s, whereas those of FSDD, FSDI and FSDE are 45.14 s, 28.76 s and 116.79 s, respectively. In terms of average running time, MMFS is 38.49 s and 32.34 s faster than FSDD and FSDE, respectively, and 8.73 s slower than FSDI. Consequently, the computational time of the proposed algorithm MMFS is acceptable.

In summary, the experimental results show that the proposed algorithm MMFS can select feature subsets with higher classification accuracy without significantly increasing the computational time. Therefore, MMFS is feasible and efficient, and can be used to deal effectively with feature subset selection for ordered data sets.

5 Conclusion and future work

Ordered data are ubiquitous in many practical applications. Most existing feature selection algorithms for ordered data use a single measure to select the feature subset, which may evaluate candidate features insufficiently and result in relatively low classification accuracy. To overcome this shortcoming, this paper proposed a multi-measure for features that combines dominance-based dependence with dominance-based information granularity. The multi-measure can select different feature subsets by evaluating the inconsistency risk and the distinguishing ability at the same time. On this basis, a heuristic feature selection algorithm was proposed. Experimental results on UCI ordered data sets demonstrate the effectiveness and efficiency of the proposed feature selection algorithm, which obtains higher classification accuracy than the compared algorithms on most data sets. In future work, we will consider feature selection for ordered data in more complex settings, such as hybrid ordered data and partially labeled ordered data.

Acknowledgment

This work is supported by the National Natural Science Foundation of China (62266018 and 61966016) and the Natural Science Foundation of Jiangxi Province (20202BABL202037 and 20192BAB207018).

