Attribute reductions based on δ-fusion condition entropy and harmonic similarity degree in interval-valued decision systems

Abstract

This paper defines an improved similarity degree based on inclusion degree as well as an advanced system of information measures based on interval coverage and credibility, and on this basis systematically constructs an attribute reduction framework comprising 4×2 = 8 reduct algorithms for application and optimization in interval-valued decision systems. Firstly, a harmonic similarity degree is constructed by introducing the interval inclusion degree and a harmonic average mechanism, which gives it better semantic interpretation and robustness. Secondly, interval credibility degree and coverage degree are defined for information fusion, and they are combined to propose a δ-fusion condition entropy. The improved condition entropy achieves information reinforcement and integrity through the dual quantization fusion of credibility and coverage, and it extends the measure from granularity monotonicity to non-monotonicity. In addition, information and joint entropies are constructed to obtain system equations. Furthermore, 8 reduct algorithms are designed by using attribute significance for heuristic searches. Finally, data experiments show that our five novel reduct algorithms are superior to the three contrast algorithms in classification performance, which further verifies the effectiveness of the proposed similarity degree, information measures and attribute reductions.

Keywords

Attribute reductions; interval-valued decision systems; information measurements; harmonic similarity degree; interval coverage degree and credibility degree

1 Introduction

Attribute reductions (ARs) play an important role in data pre-processing and have become a key research direction in areas such as data mining [1, 2], information processing [3, 4] and machine learning [5, 6]. Specifically, knowledge granularity and information measurement underlie ARs, and thus they are the main driving forces for improving ARs [7, 8].

ARs come from rough set theory, which can effectively deal with uncertain information [9]. Classical rough set theory mainly involves single-valued systems. Due to the complexity of practical problems, single-valued systems have been extended to interval-valued information systems (IVISs) and interval-valued decision systems (IVDSs).

In IVISs/IVDSs, knowledge granularity and information measurement are two important factors for ARs. Therefore, this paper begins by elaborating the existing research status and the room for improvement from three aspects: knowledge granularity, information measurement and ARs.

(1) The similarity degree between intervals is the foundation for building knowledge granularity, and it has attracted the attention of many researchers. The classical similarity degree (recorded as SD) in Equations (2)(3) is proposed by [10], which mainly considers the integrated maximum and minimum of the intervals’ endpoints. In [11], an α-weak similarity degree is put forward by combining the maximum and minimum similarity degrees. Ma et al. [12] utilize a kernel function and a distance to define a kernel similarity degree. Liu et al. [13] put forward a new distance to define a similarity degree. In [14], the intersection-union similarity degree (recorded as IUSD) is defined by the ratio of the intersection length to the union length. Dai et al. [15] define the relative bound difference similarity degree (recorded as RBD) by using the distance between the endpoints. Generally speaking, these SDs mainly consider endpoints or distances and may lack sufficient semantic analysis. Specifically, we focus on SD and IUSD, which are related to inclusion degrees. IUSD between $A = [a^{-}, a^{+}]$ and $B = [b^{-}, b^{+}]$ is defined as ${IUSD}_{A B} = \frac{| A \cap B |}{| A \cup B |}$ [14], where $A \cup B = [\min (a^{-}, b^{-}), \max (a^{+}, b^{+})]$ while $A \cap B = \emptyset$ if $a^{+} < b^{-}$ or $b^{+} < a^{-}$ , otherwise $A \cap B = [\max (a^{-}, b^{-}), \min (a^{+}, b^{+})]$ . Next, an example is provided to illustrate the deficiency of existing SDs. Given $A = [3, 7]$ and $B = [1, 9]$ , the inclusion degrees [16] of $A$ relative to $B$ and $B$ relative to $A$ are $D (A, B) = \frac{| A \cap B |}{| A |} = \frac{7 - 3}{7 - 3} = 1$ , $D (B, A) = \frac{| A \cap B |}{| B |} = \frac{7 - 3}{9 - 1} = 0.5$ . SD and IUSD between $A$ and $B$ are ${SD}_{A B} = 1$ and ${IUSD}_{A B} = \frac{| A \cap B |}{| A \cup B |} = \frac{7 - 3}{9 - 1} = 0.5$ , where ${SD}_{A B}$ is obtained by Definition 1. Thus, $\begin{matrix} D (B, A) & = 0.5 < {SD}_{A B} = 1 = D (A, B), \\ D (B, A) & = 0.5 = {IUSD}_{A B} < D (A, B) = 1 . \end{matrix}$ (1) It can be found that ${SD}_{A B}$ is too large while ${IUSD}_{A B}$ is too small in comparison with $D (A, B) = 1$ and $D (B, A) = 0.5$ . In other words, the ideal similarity degree should lie between the two inclusion degrees, which is in line with common sense.

(2) Uncertainty measures, especially information measures, essentially determine the dependency learning effectiveness of ARs through their representation quality and quantification ability [15, 17–21]. Xie et al. [19] put forward a θ-rough entropy by multiplying θ-information granulation and θ-roughness. Zhang et al. [20] propose the interval approximation accuracy and roughness by defining an interval-decision entropy. Liu et al. [21] define a condition entropy (CE) based on a fuzzy α-similarity relation in incomplete IVISs. Recently, Dai et al. [15] define information entropy, CE and joint entropy in IVDSs, and then establish a framework of ARs by using a CE H^δ (D|B) (see Equation (11)). However, the above measures do not use the cardinality of decision classes, so they may lose information and hinder the performance of ARs. To this end, novel information measures are constructed by introducing interval credibility and coverage degrees $α_{ij} = \frac{| {SC}_{B}^{δ} (x_{i}) \cap D_{j} |}{| {SC}_{B}^{δ} (x_{i}) |}$ , $β_{ij} = \frac{| {SC}_{B}^{δ} (x_{i}) \cap D_{j} |}{| D_{j} |}$ . In fact, credibility and coverage degrees come from single-valued systems [22, 23], and they evolve into neighborhood coverage and credibility degrees in neighborhood systems [24], which can effectively reflect the classification ability of condition attributes relative to decision attributes. Therefore, the improved information measures can supplement information and enhance measurement.

(3) Regarding ARs, monotonic/non-monotonic ARs are derived from monotonic/non-monotonic measures, respectively. In fact, non-monotonic ARs have been widely applied in other systems, such as neighborhood systems [24] and fuzzy neighborhood systems [7, 26]. These ideas encourage us to establish non-monotonic ARs.

Fig. 1

Research framework and development thoughts of similarity degrees, information measures and attribute reductions.

The above background shows that there is room for improvement in the existing knowledge granularity, information measures and ARs. More specifically, Dai et al. [15] utilize 3 SDs (i.e., SD, IUSD, RBD) and a CE H^δ (D|B) to establish a monotonic ARs framework. This paper puts forward a non-monotonic CE FH^δ (D|B) and a harmonic similarity degree (recorded as HSD) to reconstruct the uncertainty measures, and thus constructs a non-monotonic ARs framework. These two innovations lead to systematic comparison and optimal selection in a horizontal-vertical framework; our research framework and development thoughts of SDs, information measures and ARs are shown in Fig. 1.

(1) Horizontally, the proposed harmonic similarity degree HSD improves the existing SD, IUSD and RBD in [10, 15]. HSD is defined by using the harmonic average of inclusion degrees, each of which indicates the degree to which one interval is included in another. Inclusion degrees have good semantics, and they have many practical applications [20, 27]. In addition, the value of HSD lies between IUSD and SD, i.e., IUSD ≤ HSD ≤ SD (see Theorem 1), which shows that HSD can better depict the similarity degree because SD is often too large while IUSD is too small. Based on HSD, interval similarity relations and similarity classes are naturally established to further improve the subsequent measurements.

(2) Vertically, a non-monotonic δ-fusion CE FH^δ (D|B) in Equation (13) is proposed to improve the monotonic CE H^δ (D|B) in [15]. H^δ (D|B) only depends on the cardinality of similarity classes and of their intersections with decision classes, while the proposed FH^δ (D|B) can objectively reflect the classification decision-making ability because it also considers the interval credibility and coverage. Therefore, the improved FH^δ (D|B) can maintain information integrity and systematization. In addition, information entropy and joint entropy are systematically constructed.

(3) As shown in Fig. 1, 8 ARs are compared based on four SDs and two CEs; for example, FH-HSD-AR and HSD-AR represent the two combinations (HSD, FH^δ (D|B)) and (HSD, H^δ (D|B)). It is worth noting that our focus is on HSD, FH^δ (D|B) (recorded as FH(D|B)) and 5 improved ARs; these major contributions are marked with green borders in Fig. 1. Our proposed 5 ARs (i.e., HSD-AR, FH-SD-AR, FH-IU-AR, FH-RBD-AR, FH-HSD-AR) outperform the existing 3 methods (i.e., SD-AR, IU-AR, RBD-AR) in classification performance. Therefore, these horizontal-vertical comprehensive comparisons can lead to optimized choices of ARs over SDs and CEs.

The rest of this paper is organized as follows. Section 2 reviews some basic concepts in IVDSs. Section 3 proposes a novel harmonic similarity degree, which induces a similarity relation and similarity classes, and further studies their basic properties. Section 4 first defines the interval credibility degree and coverage degree, then comprehensively utilizes them to construct a non-monotonic CE. Furthermore, the corresponding information and joint entropies are systematically constructed, and their relationships and properties are verified. Section 5 constructs an ARs framework for heuristic search based on 4 SDs and 2 CEs, and designs the corresponding 4×2 = 8 reduct algorithms. In Section 6, experiments on datasets verify our methods in terms of SDs, CEs and classification accuracies of ARs. Finally, Section 7 summarizes this paper.

2 Preliminary knowledge

This section reviews some basic concepts in IVDSs, such as similarity degree, similarity relation and similarity class. The possible degree between two interval values is reviewed first.

Definition 1. [10] Given two interval values $A = [a^{-}, a^{+}]$ and $B = [b^{-}, b^{+}]$ , the possible degree of $A$ relative to $B$ is defined as $P_{(A \geq B)} = \min \{ 1, \max \{ \frac{a^{+} - b^{-}}{(a^{+} - a^{-}) + (b^{+} - b^{-})}, 0 \} \} .$ (2) Further, the similarity degree of $A$ relative to $B$ is ${SD}_{A B} = 1 - | P_{(A \geq B)} - P_{(B \geq A)} | .$ (3)
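Equations (2) and (3) can be sketched in a few lines of Python. This is only a minimal illustration: the function names `possible_degree` and `sd` are ours, intervals are represented as `(lower, upper)` tuples, and the handling of two single-point intervals (zero total length) is an assumption that Definition 1 does not cover.

```python
def possible_degree(a, b):
    """Possible degree P(A >= B) of Equation (2); a and b are (lower, upper) tuples."""
    a_lo, a_hi = a
    b_lo, b_hi = b
    total = (a_hi - a_lo) + (b_hi - b_lo)
    if total == 0:
        # Assumption (not in Definition 1): both intervals are single points,
        # so they are compared directly to avoid dividing by zero.
        return 1.0 if a_hi >= b_lo else 0.0
    return min(1.0, max((a_hi - b_lo) / total, 0.0))


def sd(a, b):
    """Classical similarity degree SD_AB of Equation (3)."""
    return 1.0 - abs(possible_degree(a, b) - possible_degree(b, a))


print(sd((3, 7), (1, 9)))                        # A and B from the Introduction
print(round(sd((0.15, 0.94), (0.20, 0.56)), 4))  # x_11 and x_31 from Table 1
```

The first call reproduces ${SD}_{A B} = 1$ from the Introduction, and the second reproduces the value 0.7130 reported for x₁ and x₃ on c₁.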

Consider an interval-valued decision system IVDS = (U, AT = C ∪ D, V, f), where U = {x₁, x₂, ⋯ , x_n} is a non-empty finite set of objects, C = {c_k|k = 1, 2, ⋯ , r} is a non-empty finite set of condition attributes, the set D = {d} is composed of a decision attribute d, f is a mapping function U × C → V_a, where V_a is an interval, and the value range is $V = \bigcup_{a \in C} V_{a}$ .

If x_i, x_j ∈ U, $x_{ik} = [x_{ik}^{-}, x_{ik}^{+}]$ and $x_{jk} = [x_{jk}^{-}, x_{jk}^{+}]$ are values of x_i and x_j on attribute c_k, then the similarity degree between x_i and x_j with respect to c_k is ${SD}_{c_{k}} (x_{i}, x_{j}) = 1 - | P_{(x_{ik} \geq x_{jk})} - P_{(x_{jk} \geq x_{ik})} | .$ (4)

Definition 2. [20] In an IVDS = (U, C ∪ D, V, f) with b ∈ B ⊆ C and δ ∈ [0, 1], the δ interval similarity relations with respect to b and B are $\begin{matrix} {SR}_{b}^{δ} & = \{(x_{i}, x_{j}) \in U \times U | {SD}_{b} (x_{i}, x_{j}) \geq δ\}, \\ {SR}_{B}^{δ} & = \{(x_{i}, x_{j}) \in U \times U | {SD}_{b} (x_{i}, x_{j}) \geq δ, \forall b \in B\} . \end{matrix}$

Table 1

An interval-valued decision table

U	c ₁	c ₂	c ₃	c ₄	c ₅	d
x ₁	[0.15, 0.94]	[0.07, 0.35]	[0.55, 0.74]	[0.13, 0.14]	[0.19, 0.57]	3
x ₂	[0.01, 0.12]	[0.43, 0.84]	[0.72, 0.74]	[0.50, 0.74]	[0.38, 0.48]	2
x ₃	[0.20, 0.56]	[0.28, 0.60]	[0.07, 0.95]	[0.57, 0.92]	[0.64, 0.70]	3
x ₄	[0.53, 0.64]	[0.57, 0.58]	[0.67, 0.80]	[0.21, 0.54]	[0.02, 0.13]	1
x ₅	[0.42, 0.75]	[0.51, 0.65]	[0.26, 0.43]	[0.60, 0.82]	[0.72, 0.74]	1
x ₆	[0.16, 0.96]	[0.25, 0.87]	[0.75, 1.00]	[0.47, 0.74]	[0.58, 0.78]	3
x ₇	[0.14, 0.28]	[0.49, 0.53]	[0.31, 0.32]	[0.23, 0.68]	[0.01, 0.57]	2
x ₈	[0.11, 0.89]	[0.19, 0.77]	[0.81, 0.87]	[0.10, 0.86]	[0.81, 0.99]	1

Table 2

Granulation ${SC}_{B}^{δ}$ on representative δ=0.3 and three B subsets

Similarity class (δ = 0.3)	B₁ = {c₁}	B₃ = {c₁, c₂, c₃}	B₅ = {c₁, c₂, c₃, c₄, c₅}
${SC}_{B}^{δ} (x_{1})$	{x₁, x₃, x₄, x₅, x₆, x₈}	{x₁}	{x₁}
${SC}_{B}^{δ} (x_{2})$	{x₂}	{x₂}	{x₂}
${SC}_{B}^{δ} (x_{3})$	{x₁, x₃, x₅, x₆, x₇, x₈}	{x₃, x₅, x₆, x₇}	{x₃, x₆}
${SC}_{B}^{δ} (x_{4})$	{x₁, x₄, x₅, x₆, x₈}	{x₄}	{x₄}
${SC}_{B}^{δ} (x_{5})$	{x₁, x₃, x₄, x₅, x₆, x₈}	{x₃, x₅}	{x₅}
${SC}_{B}^{δ} (x_{6})$	{x₁, x₃, x₄, x₅, x₆, x₈}	{x₃, x₆, x₈}	{x₃, x₆}
${SC}_{B}^{δ} (x_{7})$	{x₃, x₇, x₈}	{x₃, x₇}	{x₇}
${SC}_{B}^{δ} (x_{8})$	{x₁, x₃, x₄, x₅, x₆, x₇, x₈}	{x₆, x₈}	{x₈}

Further, the δ interval similarity class of x_i and the similarity granulation with respect to B are $\begin{matrix} {SC}_{B}^{δ} (x_{i}) = \{x_{j} \in U | {SD}_{b} (x_{i}, x_{j}) \geq δ, \forall b \in B\}, \\ {SC}_{B}^{δ} = ({SC}_{B}^{δ} (x_{1}), {SC}_{B}^{δ} (x_{2}), \dots, {SC}_{B}^{δ} (x_{n})) . \end{matrix}$ (5)

Proposition 1. Similarity classes exhibit granular monotonicity with respect to attribute subsets and δ.

(1) If B₁ ⊆ B₂ ⊆ C, then ${SC}_{B_{1}}^{δ} (x) \supseteq {SC}_{B_{2}}^{δ} (x)$ .

(2) If 0 ≤ δ₁ ≤ δ₂ ≤ 1, then ${SC}_{B}^{δ_{1}} (x) \supseteq {SC}_{B}^{δ_{2}} (x)$ .

Next, a useful example is offered to explain relevant concepts mentioned above in an IVDS.

Example 1. Table 1 offers an IVDS (U, C ∪ D, V, f) [15, 17], where C = {c₁, c₂, c₃, c₄, c₅}, D = {d} and U = {x₁, x₂, ⋯ , x₈}. Taking x₁, x₃ on attribute c₁ as a case, corresponding intervals x₁₁ = [0.15, 0.94] and x₃₁ = [0.20, 0.56] can be used to obtain the possible degrees and similarity degree between x₁ and x₃ on c₁, i.e.,

$\begin{matrix} P_{(x_{11} \geq x_{31})} = \frac{0.74}{1.15}, P_{(x_{31} \geq x_{11})} = \frac{0.41}{1.15}, \\ {SD}_{c_{1}} (x_{1}, x_{3}) = 1 - | P_{(x_{11} \geq x_{31})} - P_{(x_{31} \geq x_{11})} | = 0.71 . \end{matrix}$ By considering all objects, we further obtain the similarity degree matrix ${SD}_{c_{1}} = {[\begin{matrix} 1.00 & 0.00 & 0.71 & 0.91 & 0.93 & 0.98 & 0.28 & 0.94 \\ 0.00 & 1.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.02 \\ 0.71 & 0.00 & 1.00 & 0.12 & 0.41 & 0.69 & 0.32 & 0.79 \\ 0.91 & 0.00 & 0.12 & 1.00 & 1.00 & 0.95 & 0.00 & 0.81 \\ 0.93 & 0.00 & 0.41 & 1.00 & 1.00 & 0.96 & 0.00 & 0.85 \\ 0.98 & 0.00 & 0.69 & 0.95 & 0.96 & 1.00 & 0.26 & 0.92 \\ 0.28 & 0.00 & 0.32 & 0.00 & 0.00 & 0.26 & 1.00 & 0.37 \\ 0.94 & 0.02 & 0.79 & 0.81 & 0.85 & 0.92 & 0.37 & 1.00 \end{matrix}]}_{8 \times 8} .$

For similarity granulation, the condition attribute addition chain and the increasing δ sequence are set as $\begin{matrix} B & : B_{1} = \{c_{1}\} \subset B_{2} = \{c_{1}, c_{2}\} \subset \dots \subset B_{5} = \{c_{1}, c_{2}, c_{3}, c_{4}, c_{5}\} = C, \\ δ & : δ_{1} = 0.1 < δ_{2} = 0.2 < \dots < δ_{10} = 1.0 . \end{matrix}$ (6) Table 2 lists part of the resulting granular structures ${SC}_{B}^{δ}$ on B₁, B₃, B₅ with δ = 0.3.
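The granulation in Example 1 can be reproduced with a short Python sketch. It is only an illustration under stated assumptions: the helper names (`possible_degree`, `sd`, `similarity_classes`) and the dictionary layout of the table are ours, and only the c₁ column of Table 1 is included to keep the example short. Objects are indexed 1 to 8 to match x₁, ⋯ , x₈.

```python
def possible_degree(a, b):
    """Possible degree P(A >= B) of Equation (2) for positive-length intervals."""
    total = (a[1] - a[0]) + (b[1] - b[0])
    return min(1.0, max((a[1] - b[0]) / total, 0.0))


def sd(a, b):
    """Classical similarity degree of Equation (3)."""
    return 1.0 - abs(possible_degree(a, b) - possible_degree(b, a))


def similarity_classes(table, attrs, delta, degree):
    """Similarity granulation of Equation (5): one class per object.

    table maps an attribute name to the list of intervals of all objects;
    x_j joins the class of x_i when degree >= delta holds on every attribute.
    """
    n = len(table[attrs[0]])
    return [
        {j + 1 for j in range(n)
         if all(degree(table[c][i], table[c][j]) >= delta for c in attrs)}
        for i in range(n)
    ]


# Column c1 of Table 1 (objects x1..x8).
table = {"c1": [(0.15, 0.94), (0.01, 0.12), (0.20, 0.56), (0.53, 0.64),
                (0.42, 0.75), (0.16, 0.96), (0.14, 0.28), (0.11, 0.89)]}

for i, cls in enumerate(similarity_classes(table, ["c1"], 0.3, sd), start=1):
    print(f"SC(x{i}) = {sorted(cls)}")
```

With B₁ = {c₁} and δ = 0.3, the printed classes agree with the B₁ column of Table 2. Passing a different `degree` function is one possible way to switch between similarity degrees.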

3 An advanced harmonic similarity degree based on inclusion degree

In IVDSs, similarity degrees (SDs) deduce similarity relations and condition granularities, so they are the foundation of subsequent uncertainty measurements and ARs. An improved similarity degree is proposed by means of inclusion degree and harmonic average, so as to promote later uncertainty measurements (especially CE) and ARs.

By Equation (1) in the Introduction, the relationships between the two inclusion degrees and the two SDs (SD and IUSD) seem unreasonable. Specifically, if the two inclusion degrees have different values $D (B, A) = 0.5, D (A, B) = 1$ , then a reasonable similarity degree of $A$ relative to $B$ should lie between 0.5 and 1. However, the existing SDs cannot meet this requirement. Thus, the following mean value inequality offers an enlightening method. $\begin{matrix} \min \{a, b\} \leq \frac{2}{\frac{1}{a} + \frac{1}{b}} \leq \sqrt{ab} \leq \frac{a + b}{2} \\ \leq \sqrt{\frac{a^{2} + b^{2}}{2}} \leq \max \{a, b\}, \end{matrix}$ (7) where a, b > 0 and equality holds iff a = b. In brief, the four types of average values above (harmonic, geometric, arithmetic and quadratic) can deduce four SDs which obtain good properties and semantics based on inclusion degree and average strategy. In fact, the arithmetic average SD has been applied to measure the similarity degree between interval sets for interval-set decision systems [20]. Based on the above analysis, this paper focuses on the harmonic average method because of its simple formula, so a harmonic similarity degree is defined as follows.

Definition 3. Given two interval values $A = [a^{-}, a^{+}]$ and $B = [b^{-}, b^{+}]$ , the harmonic similarity degree of $A$ relative to $B$ is defined as ${HSD}_{A B} = \frac{2}{\frac{1}{D (A, B)} + \frac{1}{D (B, A)}} = \frac{2 | A \cap B |}{| A | + | B |},$ (8) where $D (A, B) = \frac{| A \cap B |}{| A |}$ and $D (B, A) = \frac{| A \cap B |}{| B |}$ are the inclusion degrees of $A$ relative to $B$ and $B$ relative to $A$ , respectively.
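As a small numerical illustration of Definition 3, the following Python sketch evaluates the inclusion degrees, HSD and IUSD for the intervals used in the Introduction and in Table 1. The function names are ours, and the sketch assumes both intervals have positive length so that no denominator vanishes.

```python
def overlap(a, b):
    """Length |A ∩ B| of the intersection of two (lower, upper) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def inclusion(a, b):
    """Inclusion degree D(A, B) = |A ∩ B| / |A| (assumes |A| > 0)."""
    return overlap(a, b) / (a[1] - a[0])


def hsd(a, b):
    """Harmonic similarity degree, closed form of Equation (8)."""
    return 2.0 * overlap(a, b) / ((a[1] - a[0]) + (b[1] - b[0]))


def iusd(a, b):
    """Intersection-union similarity degree |A ∩ B| / |A ∪ B|."""
    return overlap(a, b) / (max(a[1], b[1]) - min(a[0], b[0]))


A, B = (3, 7), (1, 9)
print(inclusion(A, B), inclusion(B, A))   # the two inclusion degrees
print(iusd(A, B), round(hsd(A, B), 4))    # HSD lies between them

x11, x31 = (0.15, 0.94), (0.20, 0.56)
print(round(hsd(x11, x31), 4), round(iusd(x11, x31), 4))
```

For $A = [3, 7]$ and $B = [1, 9]$ the harmonic value is 2/3, strictly between $D (B, A) = 0.5$ and $D (A, B) = 1$. The closed form is used instead of the reciprocal form so that disjoint intervals, whose inclusion degrees are 0, simply give HSD = 0.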

Definition 4. In IVDS = (U, C ∪ D, V, f), if $x_{ik} = [x_{ik}^{-}, x_{ik}^{+}] \equiv A$ and $x_{jk} = [x_{jk}^{-}, x_{jk}^{+}] \equiv B$ are values of x_i, x_j ∈ U on attribute c_k ∈ C, then three SDs between x_i and x_j regarding c_k are $\begin{matrix} {SD}_{c_{k}} (x_{i}, x_{j}) = {SD}_{A B}, \\ {IUSD}_{c_{k}} (x_{i}, x_{j}) = {IUSD}_{A B}, \\ {HSD}_{c_{k}} (x_{i}, x_{j}) = {HSD}_{A B} . \end{matrix}$ (9) Furthermore, three SDs regarding B ⊆ C are $\begin{matrix} {SD}_{B} (x_{i}, x_{j}) & = ⋀_{c_{k} \in B} {SD}_{c_{k}} (x_{i}, x_{j}), \\ {IUSD}_{B} (x_{i}, x_{j}) & = ⋀_{c_{k} \in B} {IUSD}_{c_{k}} (x_{i}, x_{j}), \\ {HSD}_{B} (x_{i}, x_{j}) & = ⋀_{c_{k} \in B} {HSD}_{c_{k}} (x_{i}, x_{j}) . \end{matrix}$ (10) They form n × n matrices ${SD}_{c_{k}}$ , ${IUSD}_{c_{k}}$ , ${HSD}_{c_{k}}$ and ${SD}_{B}$ , ${IUSD}_{B}$ , ${HSD}_{B}$ by letting i, j = 1, ⋯ , n.

By Equations (9) (10), 3 SDs are concretized in IVDS, thus IUSD-based and HSD-based granularity structures can be obtained by analogy with Definition 2.

Definition 5. The IUSD-based similarity relation and class of x_i ∈ U with respect to B ⊆ C are $\begin{matrix} {IUSR}_{B}^{δ} & = \{(x_{i}, x_{j}) \in U \times U | {IUSD}_{B} (x_{i}, x_{j}) \geq δ\}, \\ {IUSC}_{B}^{δ} (x_{i}) & = \{x_{j} \in U | {IUSD}_{B} (x_{i}, x_{j}) \geq δ\} . \end{matrix}$ Similarly, the HSD-based similarity relation and class are $\begin{matrix} {HSR}_{B}^{δ} = \{(x_{i}, x_{j}) \in U \times U | {HSD}_{B} (x_{i}, x_{j}) \geq δ\}, \\ {HSC}_{B}^{δ} (x_{i}) = \{x_{j} \in U | {HSD}_{B} (x_{i}, x_{j}) \geq δ\} . \end{matrix}$

Theorem 1. For ∀x_i, x_j, B, δ, we have

(1) IUSD_B (x_i, x_j) ≤ HSD_B (x_i, x_j) ≤ SD_B (x_i, x_j).

(2) ${SC}_{B}^{δ} (x) \supseteq {HSC}_{B}^{δ} (x) \supseteq {IUSC}_{B}^{δ} (x)$ .
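The ordering in Theorem 1 (1) can be spot-checked numerically for a single attribute. The Python sketch below is a sanity check on randomly generated intervals, not a proof; the function names, the sampling ranges, the seed and the tolerance `tol` are all our own assumptions.

```python
import random


def possible_degree(a, b):
    """Possible degree P(A >= B) of Equation (2) for positive-length intervals."""
    total = (a[1] - a[0]) + (b[1] - b[0])
    return min(1.0, max((a[1] - b[0]) / total, 0.0))


def sd(a, b):
    return 1.0 - abs(possible_degree(a, b) - possible_degree(b, a))


def overlap(a, b):
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def hsd(a, b):
    return 2.0 * overlap(a, b) / ((a[1] - a[0]) + (b[1] - b[0]))


def iusd(a, b):
    return overlap(a, b) / (max(a[1], b[1]) - min(a[0], b[0]))


def count_violations(trials=10000, seed=0, tol=1e-9):
    """Count random interval pairs that break IUSD <= HSD <= SD (up to tol)."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        a_lo, b_lo = rng.uniform(0, 1), rng.uniform(0, 1)
        a = (a_lo, a_lo + rng.uniform(0.01, 1))  # lengths kept positive
        b = (b_lo, b_lo + rng.uniform(0.01, 1))
        if not (iusd(a, b) <= hsd(a, b) + tol and hsd(a, b) <= sd(a, b) + tol):
            violations += 1
    return violations


print(count_violations())
```

Since the ordering holds attribute by attribute, it carries over to the minimum over c_k ∈ B in Equation (10), which is the form stated in Theorem 1.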

In general, the proposed HSD has the following three advantages compared with the existing SD and IUSD.

(1) According to Equation (8), HSD is defined as the harmonic mean of inclusion degrees, which gives it clearer semantics and richer information than SD and IUSD. The idea is inspired by a case on the arithmetic average of inclusion degrees [20], i.e., ${PD}_{\mathcal{A} \mathcal{B}} = \frac{1}{2} (\frac{| A_{1} \cap B_{1} |}{| A_{1} |} + \frac{| A_{2} \cap B_{2} |}{| A_{2} |})$ , where $\mathcal{A} = [A_{1}, A_{2}], \mathcal{B} = [B_{1}, B_{2}]$ are two interval sets for interval-set decision systems.

(2) By Theorem 1, the advanced HSD achieves a compromise from the perspective of size relationship. In fact, SD is too large and IUSD is too small, while HSD becomes a better case between them. In addition, ${HSD}_{A B}$ equals neither $D (A, B)$ nor $D (B, A)$ if $D (A, B) \neq D (B, A)$ , which follows from the strictness of the harmonic average inequality. In contrast, ${SD}_{A B} = D (A, B)$ and ${IUSD}_{A B} = D (B, A)$ although $D (A, B) \neq D (B, A)$ in the example of Equation (1). Therefore, HSD provides a compromise option, and it becomes more reasonable.

(3) Through Theorem 1, the HSD-based similarity class ${HSC}_{B}^{δ} (x)$ can collect a more reasonable set of samples, neither too many nor too few. More specifically, ${SC}_{B}^{δ} (x)$ tends to collect so many samples that it approaches the complete set U, while ${IUSC}_{B}^{δ} (x)$ tends to collect so few samples that it reduces to the single-element set {x}, as illustrated in Examples 1 and 2. Thus, the HSD-based similarity class is reasonable and applicable.

Therefore, HSD enriches the research of SDs, and it can obtain the mechanism improvement and development. Furthermore, HSD will be used for subsequent uncertainty measures and ARs to show its improvement effects. Before that, an example is utilized to illustrate the effectiveness of the proposed HSD.

Example 2. Continuing Example 1, the three SDs between x₁₁ = [0.15, 0.94] and x₃₁ = [0.20, 0.56] are ${SD}_{c_{1}} (x_{1}, x_{3}) = 0.7130$ , ${HSD}_{c_{1}} (x_{1}, x_{3}) = 0.6261$ , and ${IUSD}_{c_{1}} (x_{1}, x_{3}) = 0.4557$ .

Focusing on all objects on attribute c₁, we have

${HSD}_{c_{1}} = [\begin{matrix} 1.00 & 0.00 & 0.63 & 0.24 & 0.59 & 0.98 & 0.28 & 0.94 \\ 0.00 & 1.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.02 \\ 0.63 & 0.00 & 1.00 & 0.13 & 0.41 & 0.62 & 0.32 & 0.63 \\ 0.24 & 0.00 & 0.13 & 1.00 & 0.50 & 0.24 & 0.00 & 0.25 \\ 0.59 & 0.00 & 0.41 & 0.50 & 1.00 & 0.58 & 0.00 & 0.59 \\ 0.98 & 0.00 & 0.62 & 0.24 & 0.58 & 1.00 & 0.26 & 0.92 \\ 0.28 & 0.00 & 0.32 & 0.00 & 0.00 & 0.26 & 1.00 & 0.30 \\ 0.94 & 0.02 & 0.63 & 0.25 & 0.59 & 0.92 & 0.30 & 1.00 \end{matrix}],$

${IUSD}_{c_{1}} = [\begin{matrix} 1.00 & 0.00 & 0.46 & 0.14 & 0.42 & 0.96 & 0.16 & 0.89 \\ 0.00 & 1.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.01 \\ 0.46 & 0.00 & 1.00 & 0.07 & 0.25 & 0.45 & 0.19 & 0.46 \\ 0.14 & 0.00 & 0.07 & 1.00 & 0.33 & 0.14 & 0.00 & 0.14 \\ 0.42 & 0.00 & 0.25 & 0.33 & 1.00 & 0.41 & 0.00 & 0.42 \\ 0.96 & 0.00 & 0.45 & 0.14 & 0.41 & 1.00 & 0.15 & 0.86 \\ 0.16 & 0.00 & 0.19 & 0.00 & 0.00 & 0.15 & 1.00 & 0.18 \\ 0.89 & 0.01 & 0.46 & 0.14 & 0.42 & 0.86 & 0.18 & 1.00 \end{matrix}] .$ Thus, the size relationship with respect to the 3 SDs, i.e., ${IUSD}_{c_{1}} (x_{i}, x_{j}) \leq {HSD}_{c_{1}} (x_{i}, x_{j}) \leq {SD}_{c_{1}} (x_{i}, x_{j})$ , is consistent with Theorem 1.

As for similarity granulation, the granularity relationship ${SC}_{B}^{δ} (x) \supseteq {HSC}_{B}^{δ} (x) \supseteq {IUSC}_{B}^{δ} (x)$ in Theorem 1 can be verified by ${SC}_{B_{3}}^{0.3} = (\{x_{1}\}, \{x_{2}\}, \{x_{3}, x_{5}, x_{6}, x_{7}\}, \{x_{4}\}, \{x_{3}, x_{5}\}, \{x_{3}, x_{6}, x_{8}\}, \{x_{3}, x_{7}\}, \{x_{6}, x_{8}\})$ , ${HSC}_{B_{3}}^{0.3} = (\{x_{1}\}, \{x_{2}\}, \{x_{3}, x_{5}\}, \{x_{4}\}, \{x_{5}\}, \{x_{3}, x_{6}, x_{8}\}, \{x_{7}\}, \{x_{6}, x_{8}\})$ , ${IUSC}_{B_{3}}^{0.3} = (\{x_{1}\}, \{x_{2}\}, \{x_{3}\}, \{x_{4}\}, \{x_{5}\}, \{x_{6}\}, \{x_{7}\}, \{x_{8}\})$ . Therefore, the HSD-based ${HSC}_{B}^{δ} (x)$ is reasonable and effective.

4 The improved information measures for interval-valued decision systems

So far, three SDs (i.e., SD, IUSD, HSD) can obtain three granulation coverings (such as in Table 2), which are the foundations of information measures. For convenience, we mainly utilize the representative similarity class ${SC}_{B}^{δ} (x)$ to formulate information measures, while ${IUSC}_{B}^{δ} (x)$ and ${HSC}_{B}^{δ} (x)$ have similar forms.

In interval-valued systems, many information measures pursue granularity monotonicity [12, 19]. However, monotonic attribute reduction methods may not achieve ideal classification effects [28], because these measurement values may be low when the original data has poor classification performance. For this reason, more robust measures with larger values and granularity non-monotonicity are worth further exploration. In this section, the two concepts of interval credibility degree and coverage degree are first introduced into IVDSs, and then they are comprehensively utilized for improved information measures including CE, information entropy and joint entropy. First of all, the existing CE is reviewed as follows.

Definition 6. [15] The δ-condition entropy of B with respect to the decision attribute D is defined as $H^{δ} (D | B) = - \sum_{i = 1}^{| U |} \sum_{j = 1}^{| U / D |} \frac{| {SC}_{B}^{δ} (x_{i}) \cap D_{j} |}{| U |} \log_{2} \frac{| {SC}_{B}^{δ} (x_{i}) \cap D_{j} |}{| {SC}_{B}^{δ} (x_{i}) |} .$ (11)

Next, the interval credibility degree and coverage degree are generalized in IVDSs to establish improved information measures.

Definition 7. The interval credibility degree α_ij and coverage degree β_ij of x_i ∈ U on D_j ∈ U/D are $\begin{matrix} α_{ij} = \frac{| {SC}_{B}^{δ} (x_{i}) \cap D_{j} |}{| {SC}_{B}^{δ} (x_{i}) |}, β_{ij} = \frac{| {SC}_{B}^{δ} (x_{i}) \cap D_{j} |}{| D_{j} |} . \end{matrix}$ (12)

Definition 8. The improved δ-fusion condition entropy of B with respect to D is defined as ${FH}^{δ} (D | B) = - \sum_{i = 1}^{| U |} \sum_{j = 1}^{| U / D |} \frac{| {SC}_{B}^{δ} (x_{i}) \cap D_{j} |}{| U |} \log_{2} \frac{| {SC}_{B}^{δ} (x_{i}) \cap D_{j} |^{2}}{| {SC}_{B}^{δ} (x_{i}) | | D_{j} |} .$ (13)

Proposition 2. (Size relationship)

(1) $H^{δ} (D | B) = - \sum_{i = 1}^{| U |} \sum_{j = 1}^{| U / D |} \frac{| {SC}_{B}^{δ} (x_{i}) \cap D_{j} |}{| U |} \log_{2} (α_{ij})$ .

(2) ${FH}^{δ} (D | B) = - \sum_{i = 1}^{| U |} \sum_{j = 1}^{| U / D |} \frac{| {SC}_{B}^{δ} (x_{i}) \cap D_{j} |}{| U |} \log_{2} (α_{ij} β_{ij})$ .

(3) FH^δ (D|B) ≥ H^δ (D|B).
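Proposition 2 can be illustrated with a minimal Python sketch that accumulates both entropies from the credibility degree α_ij and the coverage degree β_ij. The function name, the toy similarity classes and the toy decision classes are hypothetical and are not taken from Table 1; empty intersections are skipped under the usual convention that 0 · log 0 = 0, which is an assumption here.

```python
from math import log2


def condition_entropies(classes, decision_classes):
    """Return (H, FH) of Equations (11) and (13) via alpha_ij and beta_ij.

    classes[i] is the similarity class of the i-th object (a set), and
    decision_classes is the partition U/D (a list of sets).
    """
    n = len(classes)  # |U|, one similarity class per object
    h = fh = 0.0
    for sc in classes:
        for d in decision_classes:
            k = len(sc & d)
            if k == 0:
                continue  # assumed convention: 0 * log 0 = 0
            alpha = k / len(sc)   # interval credibility degree
            beta = k / len(d)     # interval coverage degree
            h -= k / n * log2(alpha)
            fh -= k / n * log2(alpha * beta)
    return h, fh


# Hypothetical toy granulation over U = {1, 2, 3, 4}.
toy_classes = [{1, 2}, {1, 2, 3}, {2, 3, 4}, {4}]
toy_decisions = [{1, 2}, {3, 4}]
h, fh = condition_entropies(toy_classes, toy_decisions)
print(h, fh, fh >= h)
```

Because β_ij ≤ 1, every term −log₂ β_ij is non-negative, which is why FH^δ (D|B) cannot fall below H^δ (D|B).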

Proposition 3. (Maximum and minimum)

(1) If ${SC}_{B}^{δ} (x_{i}) = U$ for ∀x_i ∈ U and |D_j|=1 for ∀D_j ∈ U/D, then FH^δ (D|B)_max = log₂|U|.

(2) If ${SC}_{B}^{δ} (x_{i}) = \{x_{i}\}$ for ∀x_i ∈ U and |D_j|=1 for ∀D_j ∈ U/D, then FH^δ (D|B)_min = 0.

Theorem 2. (Granulation non-monotonicity)

(1) If ∅ ⊂ B₁ ⊆ B₂ ⊆ C, then neither FH^δ (D|B₁) ≤ FH^δ (D|B₂) nor FH^δ (D|B₁) ≥ FH^δ (D|B₂) always holds.

(2) If 0 ≤ δ₁ ≤ δ₂ ≤ 1, then neither ${FH}^{δ_{1}} (D | B) \leq {FH}^{δ_{2}} (D | B)$ nor ${FH}^{δ_{1}} (D | B) \geq {FH}^{δ_{2}} (D | B)$ always holds.

Proof. These conclusions are verified by Equation (17) and Fig. 3.

Based on the properties mentioned above, the construction mechanism and rationality of the improved FH^δ (D|B) are described as follows.

In fact, credibility degree and coverage degree have been applied in other systems, such as dual quantization fusion of $\frac{| [x_{i}]_{B} \cap D_{j} |}{| [x_{i}]_{B} |}$ and $\frac{| [x_{i}]_{B} \cap D_{j} |}{| D_{j} |}$ in classical decision systems [22 , 29] as well as $\frac{| n_{B}^{δ} (x_{i}) \cap [x_{i}]_{D} |}{| n_{B}^{δ} (x_{i}) |}$ and $\frac{| n_{B}^{δ} (x_{i}) \cap [x_{i}]_{D} |}{| [x_{i}]_{D} |}$ in neighborhood decision systems [24]. Thus, the concepts of interval credibility degree and coverage degree are naturally introduced into IVDSs.

In contrast, the existing H^δ (D|B) is mainly related to α_ij, while the advanced FH^δ (D|B) is concerned with α_ij and β_ij. In other words, H^δ (D|B) mainly emphasizes condition-oriented interval credibility degree α_ij with relative information, but ignores decision-oriented coverage degree β_ij with absolute information. Therefore, improved FH^δ (D|B) can contain more information by comprehensively fusing interval credibility degree and coverage degree, and thus it can strengthen subsequent ARs of IVDSs.

H^δ (D|B) leads to monotonic ARs, while non-monotonic ARs may achieve a better classification learning effect and are becoming a development direction of ARs, with applications in incomplete interval-valued decision systems [21], neighborhood decision systems [24] and fuzzy neighborhood decision systems [7, 26]. In contrast, FH^δ (D|B) obtains granularity non-monotonicity by Theorem 2, and thus it stimulates non-monotonic ARs to improve classification learning performance.

In general, the advanced FH^δ (D|B) fusing interval credibility degree with coverage degree can effectively reflect the classification ability from dual quantitative perspective of condition-oriented relative information and decision-oriented absolute information, and it obtains a larger value and measurement development from granularity monotonicity to non-monotonicity.

Furthermore, information entropy and joint entropy are successively constructed based on the proposed CE, and their relationships are further studied as follows.

Definition 9. The δ-fusion information entropy of B and joint entropy of B with respect to D are defined as $\begin{matrix} {FH}^{δ} (B) = - \sum_{i = 1}^{| U |} \frac{| {SC}_{B}^{δ} (x_{i}) |}{| U |} \log_{2} \frac{| {SC}_{B}^{δ} (x_{i}) |}{| U |}, \\ {FH}^{δ} (B, D) = - \sum_{i = 1}^{| U |} \sum_{j = 1}^{| U / D |} \frac{| {SC}_{B}^{δ} (x_{i}) \cap D_{j} |}{| U |} \log_{2} \frac{| {SC}_{B}^{δ} (x_{i}) \cap D_{j} |^{2}}{| U | | D_{j} |} . \end{matrix}$ (14)

Proposition 4. The following size relationship is true, ${FH}^{δ} (B, D) = {FH}^{δ} (D | B) + {FH}^{δ} (B) .$ (15)

Proof. For ∀x_i ∈ U, since U/D is a partition of U, we have $\sum_{j = 1}^{| U / D |} | {SC}_{B}^{δ} (x_{i}) \cap D_{j} | = | {SC}_{B}^{δ} (x_{i}) |$ . By Equation (14), $\begin{matrix} {FH}^{δ} (B) = - \sum_{i = 1}^{| U |} \frac{| {SC}_{B}^{δ} (x_{i}) |}{| U |} \log_{2} \frac{| {SC}_{B}^{δ} (x_{i}) |}{| U |} \\ = - \sum_{i = 1}^{| U |} \sum_{j = 1}^{| U / D |} \frac{| {SC}_{B}^{δ} (x_{i}) \cap D_{j} |}{| {SC}_{B}^{δ} (x_{i}) |} \frac{| {SC}_{B}^{δ} (x_{i}) |}{| U |} \log_{2} \frac{| {SC}_{B}^{δ} (x_{i}) |}{| U |} \\ = - \sum_{i = 1}^{| U |} \sum_{j = 1}^{| U / D |} \frac{| {SC}_{B}^{δ} (x_{i}) \cap D_{j} |}{| U |} \log_{2} \frac{| {SC}_{B}^{δ} (x_{i}) |}{| U |} . \end{matrix}$ (16)

By Equations (13) (14) (16), we have $\begin{matrix} {FH}^{δ} (D | B) + {FH}^{δ} (B) = \\ - \sum_{i = 1}^{| U |} \sum_{j = 1}^{| U / D |} \frac{| {SC}_{B}^{δ} (x_{i}) \cap D_{j} |}{| U |} \log_{2} \frac{| {SC}_{B}^{δ} (x_{i}) \cap D_{j} |^{2}}{| {SC}_{B}^{δ} (x_{i}) | | D_{j} |} \\ - \sum_{i = 1}^{| U |} \sum_{j = 1}^{| U / D |} \frac{| {SC}_{B}^{δ} (x_{i}) \cap D_{j} |}{| U |} \log_{2} \frac{| {SC}_{B}^{δ} (x_{i}) |}{| U |} \\ = - \sum_{i = 1}^{| U |} \sum_{j = 1}^{| U / D |} \frac{| {SC}_{B}^{δ} (x_{i}) \cap D_{j} |}{| U |} \log_{2} \frac{| {SC}_{B}^{δ} (x_{i}) \cap D_{j} |^{2} | {SC}_{B}^{δ} (x_{i}) |}{| {SC}_{B}^{δ} (x_{i}) | | D_{j} | | U |} \\ = - \sum_{i = 1}^{| U |} \sum_{j = 1}^{| U / D |} \frac{| {SC}_{B}^{δ} (x_{i}) \cap D_{j} |}{| U |} \log_{2} \frac{| {SC}_{B}^{δ} (x_{i}) \cap D_{j} |^{2}}{| U | | D_{j} |} \\ = {FH}^{δ} (B, D) . \end{matrix}$

Next, all the proposed uncertainty measurements are calculated in Algorithm 1. Steps 1-9 calculate similarity class ${SC}_{B}^{δ} (x_{i})$ by three “for” loops. Step 10 initializes uncertainty measurements. Steps 11-16 calculate measurements by two “for” loops. Finally, Step 17 outputs FH^δ (D|B), FH^δ (B) and FH^δ (B, D).

Algorithm 1 The calculation of FH^δ (D|B), FH^δ (B) and FH^δ (B, D)

Input: IVDS = (U, C ∪ D, V, f) with δ ∈ [0, 1] , B ⊆ C.

Output: FH^δ (D|B), FH^δ (B) and FH^δ (B, D).

1: for x_i ∈ U do

2:   for x_t ∈ U do

3:     for c_k ∈ B do

4:       Compute ${HSD}_{c_{k}} (x_{i}, x_{t})$ by Equation (9).

5:     end for

6:     Compute ${HSD}_{B} (x_{i}, x_{t})$ by Equation (10).

7:   end for

8:   By Definition 2, obtain the similarity class ${SC}_{B}^{δ} (x_{i})$ .

9: end for

10: Let FH^δ (D|B) = FH^δ (B) = FH^δ (B, D) = 0.

11: for x_i ∈ U do

12:   By Equation (14), update the information entropy: ${FH}^{δ} (B) \leftarrow {FH}^{δ} (B) - \frac{| {SC}_{B}^{δ} (x_{i}) |}{| U |} \log_{2} \frac{| {SC}_{B}^{δ} (x_{i}) |}{| U |}$ .

13:   for D_j ∈ U/D do

14:     By Equations (13) (14), update the condition entropy and joint entropy: ${FH}^{δ} (D | B) \leftarrow {FH}^{δ} (D | B) - \frac{| {SC}_{B}^{δ} (x_{i}) \cap D_{j} |}{| U |} \log_{2} \frac{| {SC}_{B}^{δ} (x_{i}) \cap D_{j} |^{2}}{| {SC}_{B}^{δ} (x_{i}) | | D_{j} |}$ , ${FH}^{δ} (B, D) \leftarrow {FH}^{δ} (B, D) - \frac{| {SC}_{B}^{δ} (x_{i}) \cap D_{j} |}{| U |} \log_{2} \frac{| {SC}_{B}^{δ} (x_{i}) \cap D_{j} |^{2}}{| U | | D_{j} |}$ .

15:   end for

16: end for

17: return FH^δ (D|B), FH^δ (B) and FH^δ (B, D).
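Algorithm 1 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the names `hsd` and `fusion_entropies`, the dictionary layout of the table and the zero-based object indices are our assumptions, terms with an empty intersection are skipped (assumed convention 0 · log 0 = 0), and only the c₁ column and the decision column of Table 1 are included.

```python
from math import log2


def hsd(a, b):
    """Harmonic similarity degree of Equation (8) for (lower, upper) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    return 2.0 * inter / ((a[1] - a[0]) + (b[1] - b[0]))


def fusion_entropies(table, attrs, decision, delta):
    """Return (FH(D|B), FH(B), FH(B, D)) following the steps of Algorithm 1."""
    n = len(decision)
    # Steps 1-9: HSD_B is the minimum over the attributes (Equation (10)),
    # and x_t joins the class of x_i when HSD_B >= delta.
    classes = [
        {t for t in range(n)
         if min(hsd(table[c][i], table[c][t]) for c in attrs) >= delta}
        for i in range(n)
    ]
    # Decision partition U/D built from the decision column.
    d_classes = [{i for i in range(n) if decision[i] == label}
                 for label in sorted(set(decision))]
    fh_cond = fh_info = fh_joint = 0.0  # Step 10
    for sc in classes:                  # Steps 11-16
        p = len(sc) / n
        fh_info -= p * log2(p)
        for d in d_classes:
            k = len(sc & d)
            if k == 0:
                continue  # assumed convention: 0 * log 0 = 0
            fh_cond -= k / n * log2(k * k / (len(sc) * len(d)))
            fh_joint -= k / n * log2(k * k / (n * len(d)))
    return fh_cond, fh_info, fh_joint   # Step 17


# Column c1 and decision column d of Table 1 (objects x1..x8).
table = {"c1": [(0.15, 0.94), (0.01, 0.12), (0.20, 0.56), (0.53, 0.64),
                (0.42, 0.75), (0.16, 0.96), (0.14, 0.28), (0.11, 0.89)]}
decision = [3, 2, 3, 1, 1, 3, 2, 1]

cond, info, joint = fusion_entropies(table, ["c1"], decision, 0.1)
print(round(cond, 4), round(info, 4), round(joint, 4))
```

For B₁ = {c₁} and δ = 0.1 the sketch gives 8.3758, 2.0956 and 10.4714, and the first two values sum to the third as stated in Proposition 4.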

Furthermore, we will illustrate the proposed uncertainty measures and their properties by an example below.

Example 3. Continue Example 1 with the relevant conditions and conclusions. The decision attribute D deduces a partition U/D = {D₁, D₂, D₃}, where D₁ = {x₄, x₅, x₈}, D₂ = {x₂, x₇}, D₃ = {x₁, x₃, x₆}.

For attribute subset B₁ = {c₁} and threshold δ₁ = 0.1 based on HSD, Algorithm 1 yields $\begin{matrix} {FH}^{0.1} (D | B_{1}) = 8.3758, {FH}^{0.1} (B_{1}) = 2.0956, \\ {FH}^{0.1} (B_{1}, D) = 10.4714, \\ {FH}^{0.1} (D | B_{1}) + {FH}^{0.1} (B_{1}) = 8.3758 + 2.0956 \\ = 10.4714 = {FH}^{0.1} (B_{1}, D) . \end{matrix}$

In order to justify the non-monotonicity of FH^δ (D|B) in Theorem 2, we offer two counterexamples, $\begin{matrix} {FH}^{0.1} (D | B_{3}) = 3.3711 > {FH}^{0.2} (D | B_{3}) \\ = 3.1793 < {FH}^{0.3} (D | B_{3}) = 3.3162, \\ {FH}^{0.1} (D | B_{1}) = 8.3758 > {FH}^{0.1} (D | B_{3}) \\ = 3.3711 < {FH}^{0.1} (D | B_{4}) = 3.4624 . \end{matrix}$ (17)
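The arithmetic of Example 3 can be re-checked with a few lines of Python. This sketch only re-examines the values printed above; the helper name `is_monotonic` is hypothetical.

```python
def is_monotonic(values):
    """Return True if the sequence is entirely non-increasing or non-decreasing."""
    pairs = list(zip(values, values[1:]))
    non_increasing = all(a >= b for a, b in pairs)
    non_decreasing = all(a <= b for a, b in pairs)
    return non_increasing or non_decreasing


# Values of FH(D|B3) for thresholds 0.1, 0.2, 0.3 (first line of Equation (17)).
by_threshold = [3.3711, 3.1793, 3.3162]
# Values of FH^0.1(D|B) for B1, B3, B4 (second line of Equation (17)).
by_subset = [8.3758, 3.3711, 3.4624]

print(is_monotonic(by_threshold), is_monotonic(by_subset))
# System equation check: FH(D|B1) + FH(B1) against FH(B1, D).
print(abs(8.3758 + 2.0956 - 10.4714) < 1e-9)
```

Both sequences first decrease and then increase, which is what the two counterexamples rely on.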

5 Monotonic/non-monotonic attribute reductions in interval-valued decision systems

This section will utilize two CEs and four SDs to construct ARs and corresponding algorithms. Thus, ARs based on H^δ (D|B) in [15] will be reviewed first to extend new ARs based on improved HSD and FH^δ (D|B).

Definition 10. [15] Given IVDS = (U, C ∪ D, V, f) with 0 ≤ δ ≤ 1, R ⊆ C is a reduct of IVDS iff

(1) H^δ (D|R) = H^δ (D|C).

(2) ∀a ∈ R, H^δ (D|R - {a}) ≠ H^δ (D|R).

Definition 11. Given IVDS = (U, C ∪ D, V, f) with 0 ≤ δ ≤ 1, R ⊆ C is a reduct of IVDS iff

(1) FH^δ (D|R) = FH^δ (D|C).

(2) ∀a ∈ R, FH^δ (D|R - {a}) ≠ FH^δ (D|R).

In addition, the significance of attribute a ∈ C - R with respect to R is defined as

${Sig}_{FH} (a, R, D) = {FH}^{δ} (D | R) - {FH}^{δ} (D | R \cup \{a\}) .$ (18)

According to Definitions 10 and 11, reduct R satisfies joint sufficient condition (1) and individual necessary condition (2). Regarding reduction policy, Dai et al. [15] use the minimum entropy H^δ (D|R ∪ {a}) (i.e., maximum significance Sig_H (a, R, D) = H^δ (D|R) - H^δ (D|R ∪ {a})) to add attribute a for heuristic search. The attribute subset R obtained by this policy contains few redundant attributes, so the steps of deleting redundant attributes can be omitted to reduce the complexity of the algorithm. Following the same strategy, our AR algorithms based on ♡^δ (D|B) (♡ ∈ {H, FH}) are designed in Algorithm 2. Step 1 calculates ♡^δ (D|C), then initializes R =∅ and ♡^δ (D|R) = -1. Steps 2-8 constitute a while loop to add attributes. In Step 5, the most important attribute is added by using maximum significance for heuristic search. Step 7 obtains incremental subset R and calculates ♡^δ (D|R). Finally, Step 9 outputs reduct R.

Algorithm 2 Attribute reductions based on ♡^δ (D|B) (♡ ∈ {H, FH}) in interval-valued decision system.

Input: IVDS = (U, C ∪ D, V, f) with δ ∈ [0, 1].

Output: An attribute subset R.

1: Compute the entropy ♡^δ (D|C) by Algorithm 1, initialize R =∅ and ♡^δ (D|R) = -1.

2: while ♡^δ (D|R) ≠ ♡^δ (D|C) do

3: R^★ = C - R.

4: for c_j ∈ R^★ do

5: Compute Sig_♡ (c_j, R, D) by Equation (18), and keep track of the c_j with maximum significance.

6: end for

7: Let R = R ∪ {c_j}, then compute ♡^δ (D|R).

8: end while

9: return R.
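The heuristic search of Algorithm 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the condition entropy is supplied as a callable `cond_entropy` (hypothetical), equality of entropies is tested with a small tolerance because floating-point values are compared, and the toy entropy table reuses the single-attribute values of Table 3 while assuming 1.44 for every larger subset.

```python
from math import isclose


def heuristic_reduct(attributes, cond_entropy, tol=1e-9):
    """Forward selection: repeatedly add the attribute of maximum significance."""
    target = cond_entropy(frozenset(attributes))  # entropy on the full set C
    reduct = []
    current = -1.0  # initial value for the empty subset, as in Step 1
    while not isclose(current, target, abs_tol=tol):
        remaining = [a for a in attributes if a not in reduct]
        # Maximum significance = current entropy minus entropy after adding a.
        best = max(
            remaining,
            key=lambda a: current - cond_entropy(frozenset(reduct + [a])),
        )
        reduct.append(best)
        current = cond_entropy(frozenset(reduct))
    return reduct


# Toy entropy oracle: single-attribute values from Table 3 (delta = 0.5);
# the value 1.44 for all other subsets is an assumption for illustration.
SINGLE = {"c1": 4.01, "c2": 5.44, "c3": 1.44, "c4": 8.84, "c5": 2.26}


def toy_entropy(subset):
    if len(subset) == 1:
        return SINGLE[next(iter(subset))]
    return 1.44


result = heuristic_reduct(["c1", "c2", "c3", "c4", "c5"], toy_entropy)
print(result)
```

The loop always terminates, since one attribute is added per iteration and the stopping condition holds at the latest when R = C.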

By Algorithm 2, the two CEs H^δ (D|C) and FH^δ (D|C) can be used to construct reduction algorithms under the same reduction strategy. In addition, 3 reduction algorithms are given in [15] based on 3 different SDs (IUSD, SD, RBD) and the same CE H^δ (D|C). In order to compare comprehensively with the 3 algorithms in [15], 4 SDs (IUSD, SD, RBD, HSD) and 2 CEs (H^δ (D|C), FH^δ (D|C)) are considered from the two dimensions of knowledge granularity and information measurement. Thus, we establish a reduct framework involving 4×2 = 8 AR algorithms, i.e., SD-AR, IU-AR, RBD-AR, HSD-AR, FH-SD-AR, FH-IU-AR, FH-RBD-AR, FH-HSD-AR, where the three existing algorithms SD-AR, IU-AR, RBD-AR serve as comparative algorithms and respectively correspond to marks PD, IU, RBD in [15]. For example, H^δ (D|C) and FH^δ (D|C) (or simply FH) give rise to algorithms HSD-AR and FH-HSD-AR based on HSD. Next, the detailed process of a representative AR is shown by an example.

Table 3

The detailed process of algorithm FH-HSD-AR on δ = 0.5

Addition step	Incremental subset R	Sufficient condition FH^0.5 (D|R) = FH^0.5 (D|C)	Rest attributes subset C - R	CEs/SIGs FH^0.5 (D|R ∪ {c_j})/Sig (c_j, R, D) (c_j ∈ C - R)	Added attribute c_j
(1)	∅	-1 ≠ 1.44	{c₁, c₂, c₃, c₄, c₅}	[4.01, 5.44, 1.44, 8.84, 2.26]/[-4.01, -5.44, -1.44, -8.84, -2.26]	c₃
Reduct R	{c₃}	1.44 = 1.44	{c₁, c₂, c₄, c₅}	-	-

Table 4

Details of the 8 datasets

No.	Dataset	Abbr	Object	Attribute	Class
(a)	Wine	Wine	178	13	3
(b)	Algerian Forest Fires	Algerian	244	10	2
(c)	Chemical Composition of Ceramic	Chemical	88	17	2
(d)	Vowel Context	Vowel	990	10	11
(e)	Facebook	Facebook	495	16	3
(f)	HCV	HCV	615	10	5
(g)	Breast Tissue	Breast	106	9	6
(h)	Leaf	Leaf	340	14	36

Fig. 2

Comparison of four SDs between x_i and x₁ on B₁.

Example 4. Continue Example 1 with its results. The detailed process of the representative algorithm FH-HSD-AR on δ = 0.5 is recorded in Table 3. By Table 3, FH-HSD-AR obtains the reduct {c₃}. Therefore, the proposed FH-HSD-AR can reduce attributes effectively.

6 Experimental verification of uncertainty measurements and attribute reductions

In this section, data experiments show the advantages of our proposed methods. In the simulation experiments, 8 classification datasets from the UCI machine learning repository [30] are used, whose details are described in Table 4. In dataset (e) Facebook, five objects with missing values are deleted. In dataset (h) Leaf, the number attribute is also deleted because it is independent of the classification. Since these datasets are real-valued, each real value r is converted to an interval I_r = [r - 2σ, r + 2σ], where σ is the standard deviation of the equivalence class [x_i]_D on attribute c_j [13, 32].
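The interval conversion described above can be sketched in Python for a single attribute. This is a hedged illustration: the function name `to_intervals` is hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not state whether the population or sample form is used.

```python
from collections import defaultdict
from statistics import pstdev


def to_intervals(values, labels):
    """Map each real value r to (r - 2*sigma, r + 2*sigma).

    sigma is the standard deviation, on this attribute, of the decision
    class that the object belongs to (population form assumed).
    """
    by_class = defaultdict(list)
    for v, y in zip(values, labels):
        by_class[y].append(v)
    sigma = {y: pstdev(vs) for y, vs in by_class.items()}
    return [(v - 2 * sigma[y], v + 2 * sigma[y]) for v, y in zip(values, labels)]


# Hypothetical toy attribute column with two decision classes.
intervals = to_intervals([1.0, 3.0, 10.0], ["a", "a", "b"])
print(intervals)
```

A class with a single object has σ = 0 under this assumption, so its value becomes a degenerate interval.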

Regarding SDs, we mainly focus on the four measures SD_{c₁} (x₁, x_i), IUSD_{c₁} (x₁, x_i), RBD_{c₁} (x₁, x_i) and HSD_{c₁} (x₁, x_i) (i = 1, 2, ⋯ , 50) on 4 representative datasets (a)-(d) for overall observation. Thus, a family of two-dimensional graphs is shown in Fig. 2 for comparison of the four SDs, where the horizontal and vertical axes represent x_i and SD values, respectively. By comparison in Fig. 2, RBD is too large and very close to 1 in most cases, IUSD is always too small and very close to 0, while HSD always lies between SD and IUSD. Therefore, this observation validates Theorem 1, and the proposed HSD is effective.

Fig. 3

Comparison of two CEs on fixed δ = 0.5 and B_i-chain.

As for CEs, H^δ (D|B) and FH^δ (D|B) are calculated based on HSD. Thus, CEs based on fixed δ = 0.5 and B_i-chain (B₁ = {c₁}⊆ ⋯ ⊆ {c₁, ⋯ , c_r} = B_r = C) are depicted in Fig. 3. By Fig. 3, we can verify non-monotonicity of FH^δ (D|B) in Theorem 2 and FH^δ (D|B) ≥ H^δ (D|B) in item (3) of Proposition 2.

Table 5

The classification accuracies and reduction lengths of 8 ARs

No.	Dataset	SD-AR	IU-AR	RBD-AR	HSD-AR	FH-SD-AR	FH-IU-AR	FH-RBD-AR	FH-HSD-AR
(a)	Wine	0.9607 (4)	0.9775 (1)	0.9888 (2)	0.9775 (1)	0.9888 (6)	0.9270 (4)	0.9944 (13)	0.9944 (5)
(b)	Algerian	0.8848 (8)	0.9794 (4)	0.9794 (10)	0.9877 (6)	0.9136 (8)	0.9959 (4)	1.0000 (9)	0.9918 (7)
(c)	Chemical	1.0000 (1)	1.0000 (1)	1.0000 (1)	1.0000 (1)	1.0000 (5)	1.0000 (4)	1.0000 (14)	1.0000 (5)
(d)	Vowel	0.9788 (8)	0.9596 (2)	0.9899 (10)	0.9505 (3)	0.9798 (9)	0.9909 (9)	0.9899 (10)	0.9929 (9)
(e)	Facebook	0.5152 (13)	0.9838 (9)	0.9556 (12)	0.9879 (13)	0.5192 (12)	0.9535 (7)	0.9333 (9)	0.9758 (11)
(f)	HCV	0.9202 (9)	0.9932 (6)	0.9915 (9)	0.9949 (9)	0.9270 (8)	0.9932 (6)	0.9898 (6)	0.9898 (8)
(g)	Breast	0.6415 (5)	0.9340 (1)	0.8868 (5)	0.9528 (1)	0.6321 (6)	0.8962 (3)	0.8868 (5)	0.9340 (6)
(h)	Leaf	0.6353 (6)	0.8059 (2)	0.8971 (13)	0.8265 (2)	0.6882 (6)	0.8647 (4)	0.8971 (12)	0.9441 (5)
(a-h)	Average	0.8171 (6.8)	0.9542 (3.3)	0.9611 (7.8)	0.9597 (4.5)	0.8311 (7.5)	0.9527 (5.1)	0.9614 (9.7)	0.9779 (7.0)
	Win	1	2	1	3	1	1	3	5

Next, the 8 algorithms in Equation (19) are implemented for comprehensive comparison, $\begin{matrix} SD - AR, IU - AR, RBD - AR, HSD - AR, \\ FH - SD - AR, FH - IU - AR, FH - RBD - AR, FH - HSD - AR . \end{matrix}$ (19) The reduction lengths and classification accuracies of the 8 ARs are used as two evaluation indicators, where classification accuracies are obtained by the commonly used K-Nearest Neighbour (KNN) classifier (K = 3). Since the classical KNN method deals with real-valued data, the distance between samples x_i and x_j on B ⊆ C is redefined as ${Dis}_{B} (x_{i}, x_{j}) = \sqrt{\sum_{b \in B} [1 - {HSD}_{b} (x_{i}, x_{j})]^{2}},$ where HSD can be replaced by the 3 other SDs, and δ = 0.8.
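A minimal Python sketch of the redefined distance and a 3-nearest-neighbour vote is given below. The similarity degree is passed in as a callable; `toy_similarity` is a simple overlap ratio used only as a stand-in and is not the HSD of this paper. All function names and the toy data are assumptions for illustration.

```python
from collections import Counter
from math import sqrt


def sd_distance(x, y, attrs, sim):
    """Dis_B(x, y) = sqrt(sum over b in B of (1 - sim_b(x, y))^2)."""
    return sqrt(sum((1.0 - sim(x[b], y[b])) ** 2 for b in attrs))


def knn_predict(query, train, labels, attrs, sim, k=3):
    """Majority vote among the k training samples closest to the query."""
    order = sorted(
        range(len(train)),
        key=lambda i: sd_distance(query, train[i], attrs, sim),
    )
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def toy_similarity(u, v):
    """Stand-in similarity: overlap length over span length (not HSD)."""
    overlap = max(0.0, min(u[1], v[1]) - max(u[0], v[0]))
    span = max(u[1], v[1]) - min(u[0], v[0])
    return overlap / span if span > 0 else 1.0


# Hypothetical interval-valued samples on one attribute c1.
train = [{"c1": (0.0, 2.0)}, {"c1": (0.5, 2.5)}, {"c1": (1.0, 3.0)},
         {"c1": (8.0, 10.0)}, {"c1": (9.0, 11.0)}]
labels = ["a", "a", "a", "b", "b"]
query = {"c1": (0.2, 2.2)}
predicted = knn_predict(query, train, labels, ["c1"], toy_similarity)
print(predicted)
```

Replacing `toy_similarity` by any of the four SDs changes only the `sim` argument, which mirrors the statement that HSD can be swapped for the other SDs in the distance.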

Table 5 records the reduct lengths and classification accuracies of the 8 ARs on the eight datasets, where bold values represent the maximum accuracies and the numbers in parentheses represent reduct lengths.

As shown in Table 5, all 8 algorithms can effectively remove redundant attributes. Specifically, the full attribute sets of the 8 datasets have average length 12.4, IU-AR obtains the minimum average reduct length 3.3, while FH-RBD-AR reaches the maximum average reduct length 9.7. Thus, the 8 ARs are reasonable.

By Table 5, our method FH-HSD-AR can achieve optimal accuracy in most cases. In the last 2 lines of Table 5, average accuracies and maximum frequencies on all the datasets are used for final evaluation, so the final ranking is $\begin{matrix} SD - AR (0.8171) ≺ FH - SD - AR (0.8311) \\ ≺ FH - IU - AR (0.9527) ≺ IU - AR (0.9542) \\ ≺ HSD - AR (0.9597) ≺ RBD - AR (0.9611) \\ ≺ FH - RBD - AR (0.9614) ≺ FH - HSD - AR (0.9779), \end{matrix}$ (20) where FH-RBD-AR and FH-HSD-AR obtain suboptimal and optimal mean accuracies 0.9614 and 0.9779. In addition, ranking results can also be verified by maximum frequencies in the last line of Table 5.
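The average accuracies and the ranking in Equation (20) can be recomputed from the per-dataset accuracies of Table 5. The short Python sketch below does only that; the dictionary layout and variable names are assumptions for illustration.

```python
# Accuracies copied from Table 5, datasets (a)-(h) in order.
table5 = {
    "SD-AR":     [0.9607, 0.8848, 1.0000, 0.9788, 0.5152, 0.9202, 0.6415, 0.6353],
    "IU-AR":     [0.9775, 0.9794, 1.0000, 0.9596, 0.9838, 0.9932, 0.9340, 0.8059],
    "RBD-AR":    [0.9888, 0.9794, 1.0000, 0.9899, 0.9556, 0.9915, 0.8868, 0.8971],
    "HSD-AR":    [0.9775, 0.9877, 1.0000, 0.9505, 0.9879, 0.9949, 0.9528, 0.8265],
    "FH-SD-AR":  [0.9888, 0.9136, 1.0000, 0.9798, 0.5192, 0.9270, 0.6321, 0.6882],
    "FH-IU-AR":  [0.9270, 0.9959, 1.0000, 0.9909, 0.9535, 0.9932, 0.8962, 0.8647],
    "FH-RBD-AR": [0.9944, 1.0000, 1.0000, 0.9899, 0.9333, 0.9898, 0.8868, 0.8971],
    "FH-HSD-AR": [0.9944, 0.9918, 1.0000, 0.9929, 0.9758, 0.9898, 0.9340, 0.9441],
}

# Mean accuracy per algorithm, then sort from worst to best.
means = {name: sum(acc) / len(acc) for name, acc in table5.items()}
ranking = sorted(means, key=means.get)

for name in ranking:
    print(f"{name}: {means[name]:.4f}")
```

Sorting by the unrounded means gives the same order as Equation (20), with FH-HSD-AR last, i.e., best.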

In general, our methods (especially the suboptimal FH-RBD-AR and the optimal FH-HSD-AR) outperform the existing methods in terms of classification performance. The ranking results in Equation (20) reflect the advantages of the proposed algorithms from the perspective of robustness and optimization.

The final results in Equation (20) are rearranged into Table 6, where the symbols ← and ↑ respectively indicate the horizontal and longitudinal maximums, while the italic and bold styles respectively indicate the suboptimal and optimal accuracies. Table 6 provides accuracies comparisons to show 2-dimensional (i.e., CEs-longitudinal and SDs-horizontal) improvements.

Firstly, 4 SDs can be compared on all CEs. As for H, HSD-AR (0.9597) and RBD-AR (0.9611) are superior to IU-AR (0.9542) and SD-AR (0.8171). Regarding FH, FH-HSD-AR (0.9779) is significantly better than FH-RBD-AR (0.9614), FH-IU-AR (0.9527) and FH-SD-AR (0.8311). The ranking results can deduce the sorting $SD ≺ IUSD ≺ RBD ≺ HSD .$ Thus, HSD actually outperforms 3 existing SDs.

Secondly, the two CEs can be contrasted on all SDs. As for IUSD, IU-AR (0.9542) and FH-IU-AR (0.9527) have almost equal accuracies. Regarding HSD, FH-HSD-AR (0.9779) achieves higher accuracy than HSD-AR (0.9597). The same conclusion holds for SD and RBD. The ranking results yield the following sorting of CEs, $H ≺ FH .$ Thus, the proposed FH is superior to the existing H.

Finally, the improvements of SDs and CEs are further compared by the increment of accuracies. As for SDs improvement, FH-RBD-AR (0.9614)⟶ FH-HSD-AR (0.9779) gets the increments 0.9779 - 0.9614 = 0.0165. As for CEs improvement, HSD-AR (0.9597)⟶ FH-HSD-AR (0.9779) reaches the increments 0.9779 - 0.9597 = 0.0182. Therefore, two dimensional improvements are comprehensively verified, and CEs improvement is slightly more than SDs improvement.

Table 6

Average accuracies of 8 ARs on 4 SDs and 2 CEs

CEs \ SDs	SD	IUSD	RBD	HSD
H	0.8171	0.9542↑	0.9611←	0.9597
FH	0.8311↑	0.9527	0.9614↑	0.9779←↑
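The horizontal and longitudinal maximums marked in Table 6, together with the two accuracy increments discussed above, can be re-derived from the 2×4 grid of average accuracies. The Python sketch below is only a check on the reported numbers; the nested-dictionary layout and variable names are assumptions.

```python
# Average accuracies from Table 6: rows are CEs, columns are SDs.
grid = {
    "H":  {"SD": 0.8171, "IUSD": 0.9542, "RBD": 0.9611, "HSD": 0.9597},
    "FH": {"SD": 0.8311, "IUSD": 0.9527, "RBD": 0.9614, "HSD": 0.9779},
}

# Horizontal maximum: the best SD for each CE.
row_best = {ce: max(row, key=row.get) for ce, row in grid.items()}
# Longitudinal maximum: the best CE for each SD.
col_best = {sd: max(grid, key=lambda ce: grid[ce][sd]) for sd in grid["H"]}

# Increments along the two dimensions, both ending at FH-HSD-AR.
sd_gain = grid["FH"]["HSD"] - grid["FH"]["RBD"]  # RBD -> HSD under FH
ce_gain = grid["FH"]["HSD"] - grid["H"]["HSD"]   # H -> FH under HSD

print(row_best, col_best)
print(f"{sd_gain:.4f} {ce_gain:.4f}")
```

In this grid, FH is the longitudinal maximum for SD, RBD and HSD, while H is marginally ahead for IUSD, which matches the near-equal accuracies noted for IU-AR and FH-IU-AR.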

Through the above results and analyses, our methods (especially FH-RBD-AR and FH-HSD-AR) bring about obvious improvements over the existing methods, in terms of SDs, CEs and ARs.

7 Conclusions

In this paper, an ARs framework of 4×2 = 8 algorithms is established by using improved similarity degree HSD and improved condition entropy FH, from knowledge granulation and information representation of IVDSs. The improvement of HSD is based on the harmonic average of inclusion degrees, and it can obtain intermediate value. The improvement of FH benefits from the fusion of comprehensive interval coverage and credibility which are richer and more robust than separate credibility. In addition, information entropy and joint entropy are correspondingly constructed to obtain an information system. Furthermore, 5 improved ARs (especially FH-HSD-AR) can achieve better classification performance than 3 contrastive algorithms in [15]. In general, HSD and FH jointly promote uncertainty measurements from two dimensions, thus their combined ARs provide strong robustness and applicability. Therefore, our studies can promote the uncertainty measurement and ARs of IVDSs.

The following three aspects are worthy of further study. (1) SDs based on inclusion degrees are reasonable and deserve broader application. (2) Our measurements and ARs can be applied to other systems, such as interval-set systems and neighborhood systems. (3) The study of incremental uncertainty measurements and ARs is valuable for dynamic interval-valued systems.

References

1. Che X.Y., Chen D.G. and Mi J.S., A novel approach for learning label correlation with application to feature selection of multilabel data, Information Sciences 512 (2020), 795–812.

2. Xie, Zhang X.Y., Zhang S.Y., Rough set theory and attribute reduction in interval-set information system, Journal of Intelligent & Fuzzy Systems (2022) (https://doi.org/10.3233/JIFS-210662).

3. Elaziz M.A., Abu-Donia H.M., Hosny R.A., Hazae S.L. and Ibrahim R.A., Improved evolutionary-based feature selection technique using extension of knowledge based on the rough approximations, Information Sciences 594 (2022), 76–94.

4. Gao L.B., Li Y.H., Zhang and Gao W.F., Feature-specific mutual information variation for multi-label feature selection, Information Sciences 593 (2022), 449–471.

5. Dhal and Azad, A comprehensive survey on feature selection in the various fields of machine learning, Applied Intelligence 52 (2021), 4543–4581.

6. Chen, Li, Zhong M.Y., Wang J.J., Qian, Ding, Yao J.F. and Guo Y.K., FTAP: Feature transferring autonomous machine learning pipeline, Information Sciences 593 (2022), 385–397.

7. Sun, Wang L.Y., Ding W.P., Qian Y.H. and Xu J.C., Feature selection using fuzzy neighborhood entropy-based uncertainty measures for fuzzy neighborhood multigranulation rough sets, IEEE Transactions on Fuzzy Systems 29 (2021), 19–33.

8. Chen B.W., Zhang X.Y., Yang J.L., Feature selections based on three improved condition entropies and one new similarity degree in interval-valued decision systems, Engineering Applications of Artificial Intelligence (2023) (https://doi.org/10.1016/j.engappai.2023.107165).

9. Pawlak, Rough set, International Journal of Computer and Information Sciences 11(5) (1982), 341–356.

10. Nakahara, Sasaki and Gen, On the linear programming problems with interval coefficients, Computers & Industrial Engineering 23 (1992), 301–304.

11. Dai J.H., Wei B.J., Zhang X.H. and Zhang Q.L., Uncertainty measurement for incomplete interval-valued information systems based on α-weak similarity, Knowledge-Based Systems 136 (2017), 159–171.

12. X.A., Measures associated with granularity and rough approximations in interval-valued information tables based on kernel similarity relations, Information Sciences 538 (2020), 337–357.

13. Liu X.F., Dai J.H., Chen J.L. and Zhang C.C., Unsupervised attribute reduction based on α-approximate equal relation in interval-valued information systems, International Journal of Machine Learning and Cybernetics 11(5) (2020), 2021–2038.

14. Dai J.H. and Tian H.W., Fuzzy rough set model for set-valued data, Fuzzy Sets and Systems 229 (2013), 54–68.

15. Dai J.H., Hu, Zheng G.J., Hu Q.H., Han H.F. and Shi, Attribute reduction in interval-valued information systems based on information entropies, Frontiers of Information Technology & Electronic Engineering 17(9) (2016), 919–928.

16. Zhang, Mei C.L., Chen D.G. and Li J.H., Multi-confidence rule acquisition and confidence-preserved attribute reduction in interval-valued decision systems, International Journal of Approximate Reasoning 55(8) (2014), 1787–1804.

17. Dai J.H., Wang W.T., Xu and Tian H.W., Uncertainty measurement for interval-valued decision systems based on extended conditional entropy, Knowledge-Based Systems 27(3) (2012), 443–450.

18. Dai J.H., Wang and Mi J.S., Uncertainty measurement for interval-valued information systems, Information Sciences 251(4) (2013), 63–78.

19. Xie N.X., Liu, Li Z.W. and Zhang G.Q., New measures of uncertainty for an interval-valued information system, Information Sciences 470 (2019), 156–174.

20. Zhang Y.M., Jia X.Y. and Tang Z.M., Information-theoretic measures of uncertainty for interval-set decision tables, Information Sciences 577 (2021), 81–104.

21. Liu X.F., Dai J.H., Chen J.L. and Zhang C.C., A fuzzy α-similarity relation-based attribute reduction approach in incomplete interval-valued information systems, Applied Soft Computing 109(5) (2021), 107593.

22. Wang G.Y., Rough set theory and knowledge acquisition, Journal of Tsinghua University (2001).

23. Tsumoto, Accuracy and coverage in rough set rule induction, International Conference on Rough Sets and Current Trends in Computing (2002), 373–380.

24. Sun, Zhang X.Y., Qian Y.H., Xu J.C. and Zhang S.G., Feature selection using neighborhood entropy-based uncertainty measures for gene expression data classification, Information Sciences 502 (2019), 18–41.

25. J.C., Yuan and Ma Y.Y., Feature selection using self-information and entropy-based uncertainty measure for fuzzy neighborhood rough set, Complex & Intelligent Systems 8 (2022), 287–305.

26. Gou H.Y. and Zhang X.Y., Feature selection based on double-hierarchical and multiplication-optimal fusion measurement in fuzzy neighborhood rough sets, Information Sciences 618 (2022), 437–467.

27. Bustince, Indicator of inclusion grade for interval-valued fuzzy sets. Application to approximate reasoning based on interval-valued fuzzy sets, International Journal of Approximate Reasoning 23(3) (2000), 137–209.

28. H.X., Zhou X.Z., Zhao J.B. and Liu, Non-monotonic attribute reduction in decision-theoretic rough sets, Fundamenta Informaticae 126(4) (2013), 415–432.

29. Zhang X.Y. and Miao D.Q., Double-quantitative fusion of accuracy and importance: Systematic measure mining, benign integration construction, hierarchical attribute reduction, Knowledge-Based Systems 91 (2016), 219–240.

30. Dua, Graff, UCI machine learning repository (2019) (http://archive.ics.uci.edu/ml).

31. Leung, Fischer M.M., Wu W.Z. and Mi J.S., A rough set approach for the discovery of classification rules in interval-valued information systems, International Journal of Approximate Reasoning 47(2) (2008), 233–246.

32. Zhang Y.Y., Li T.R., Luo, Zhang J.B. and Chen H.M., Incremental updating of rough approximations in interval-valued information systems under attribute generalization, Information Sciences 373 (2016), 461–475.