Abstract
In practical applications of machine learning, only part of the data is labeled because assessing class labels is relatively costly. Measure of uncertainty is abbreviated as MU. This paper explores MU for partially labeled real-valued data via a discernibility relation. First, a decision information system with partially labeled real-valued data (p-RVDIS) is separated into two decision information systems: one with labeled real-valued data (l-RVDIS) and one with unlabeled real-valued data (u-RVDIS). Then, based on a discernibility relation, dependence function, conditional information entropy and conditional information amount, four degrees of importance of an attribute subset in a p-RVDIS are defined. Each is calculated as the weighted sum of the corresponding importance in the l-RVDIS and the u-RVDIS, with weights determined by the missing rate of labels, and the four can be considered as MUs for a p-RVDIS. Combining the l-RVDIS and the u-RVDIS provides a more accurate assessment of the importance and classification ability of attribute subsets in a p-RVDIS, which is the novelty of this paper. Finally, experimental analysis on several datasets verifies the effectiveness of these MUs. These findings contribute to understanding the essence of uncertainty in a p-RVDIS.
Introduction
Research background
Data contain many uncertainties, typically caused by the limited resolution and incomplete description of the data. Such uncertainty is an inherent aspect of the real world, and capturing it is becoming increasingly widespread. Addressing uncertainty is crucial in artificial intelligence; to a certain extent, the level of artificial intelligence depends on how well uncertain problems are solved. Therefore, measure of uncertainty (MU) has emerged as a pivotal subject of study in numerous disciplines, such as environmental conflict analysis [13], face identification [8] and medical decision making [22]. Furthermore, Zhan et al. [38, 43] studied three-way behavioral decision making with hesitant fuzzy information systems, proposed a three-way decision methodology with regret theory via triangular fuzzy numbers in incomplete multi-scale decision information systems and gave a novel group decision-making approach in multi-scale environments. Wang et al. [35] discussed a regret theory-based three-way decision model in hesitant fuzzy environments and its application to medical decision making. Zhu et al. [39] proposed a probabilistic linguistic three-way decision method with regret theory via the fuzzy c-means clustering algorithm. Atef et al. [2] researched fuzzy topological structures via fuzzy graphs. El-Bably et al. [16, 17] considered medical diagnosis of Chikungunya disease using soft rough sets and proposed new topological approaches to generalized soft rough approximations with medical applications. El-Gayar et al. [18] investigated economic decision-making using rough topological structures. Abu-Gdairi et al. [1] studied topological visualization and graph analysis of rough sets via neighborhoods.
Rough set theory is a tool to handle uncertainty, and it has been applied to pattern recognition, data mining, image processing, as well as medical diagnosis [30]. Pawlak presented the concept of an information system (IS) based on rough set theory. Many applications of rough set theory, such as uncertain reasoning, feature selection, rule extraction and classification, are implemented in an IS [5]. MU has always been an important issue of rough set theory. It plays a significant role in attribute reduction and rule acquisition. Classification accuracy, rough membership, attribute dependence and attribute importance are the basic MUs in rough set theory.
Information entropy, proposed by Shannon [31], is a significant tool for estimating uncertainty as well. Some researchers utilized information entropy to measure the uncertainty of an IS or rough sets. For instance, D
A comparison of this paper with some of the research results cited above is shown in Table 1.
The comparison of this paper with some recent research results
Real-valued data are common in real-world applications, and labeling them requires considerable human effort. In practical scenarios, such data typically contain only a limited number of labeled objects. Given the expense of determining class information, only a small portion of real-valued data can be labeled, while the majority remains unlabeled; the latter is referred to as unlabeled real-valued data. Because labeled data are scarce, effectively utilizing unlabeled data for attribute reduction has become a prominent issue in the realm of big data. Bao et al. [3] studied partial label dimensionality reduction via confidence-based dependence maximization. Han et al. [24] proposed a semi-supervised attribute reduction algorithm. Campagner et al. [6, 7] introduced rough set-based genetic algorithms for weakly supervised feature selection and presented rough set-based feature selection for weakly labeled data. Dai et al. [12] introduced the concept of a distinguished pair and studied partially labeled categorical data by means of distinguished pairs. They also provided an importance for each attribute subset based on distinguished pairs and presented an attribute reduction method utilizing this importance. However, this importance did not take the missing rate of labels into account, and only a single importance was considered.
This paper investigates MU for partially labeled real-valued data based on a discernibility relation. The major contributions are summarized as follows.
(1) In view of the labeled and unlabeled data of a decision information system for partially labeled real-valued data (p-RVDIS), the missing rate of labels in a p-RVDIS is defined. A p-RVDIS induces two decision information systems: one is the l-RVDIS, and the other is the u-RVDIS. The u-RVDIS is regarded as an IS without a decision attribute.
(2) Based on a discernibility relation, distinguishable relation, dependency function, conditional information entropy and conditional information amount, four types of importance of each attribute subset are proposed. Each is the weighted sum of the importance of the corresponding subsystem of an l-RVDIS and the corresponding subsystem of a u-RVDIS, with weights determined by the missing rate of labels, and can be regarded as an MU of the corresponding subsystem of a p-RVDIS.
(3) From the perspective of statistical analysis, numerical analysis, dispersion analysis, correlation analysis, the Friedman test and the Nemenyi test are carried out to examine the advantages and disadvantages of the four MUs.
The remaining portion of this paper is structured as follows. In Section 2, a p-RVDIS is defined. In Section 3, MU in a p-RVDIS is investigated. In Section 4, numerical analysis and statistical analysis are conducted. In Section 5, a summary of this paper is presented.
Preliminaries
W = {w_1, ⋯ , w_n}, 2^W and |Z| represent a finite set, the power set of W and the cardinality of Z ∈ 2^W, respectively. Put
Suppose that (W, A) is an information system (IS) [30]. For B ⊆ A, put
Obviously,
We refer to (W, A, d) as a decision information system if (W, A) is an IS and d is a decision attribute.
For B ⊆ A, put dis_d(B) = {(w, w′) ∈ W × W : ∃ a ∈ B, a(w) ≠ a(w′) and d(w) ≠ d(w′)}. Then dis_d(B) is known as the discernibility relation of B on W with respect to d.
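The definition of the discernibility relation can be illustrated with a minimal Python sketch. The toy table, the object names and the function name `discernibility_relation` are hypothetical and chosen only for illustration; attribute values are compared with exact inequality, as in the definition above.

```python
from itertools import product


def discernibility_relation(table, decision, attrs):
    """Return dis_d(B): ordered pairs (w, w') that differ on at least one
    attribute in `attrs` and also differ on the decision attribute."""
    objects = list(table)
    return {
        (w, v)
        for w, v in product(objects, repeat=2)
        if decision[w] != decision[v]
        and any(table[w][a] != table[v][a] for a in attrs)
    }


# Hypothetical toy system: three objects, two real-valued attributes.
table = {
    "w1": {"a1": 0.2, "a2": 1.0},
    "w2": {"a1": 0.2, "a2": 3.5},
    "w3": {"a1": 0.9, "a2": 1.0},
}
decision = {"w1": "yes", "w2": "no", "w3": "yes"}

dis_a1 = discernibility_relation(table, decision, ["a1"])
dis_all = discernibility_relation(table, decision, ["a1", "a2"])
```

In this toy example, a1 alone cannot separate w1 from w2, while {a1, a2} can, so `dis_a1` is a proper subset of `dis_all`. This reflects the general fact that enlarging B can only enlarge dis_d(B).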
Let (W, A, d) be a decision information system. If ∀ a ∈ A and w ∈ W, a(w) is a real number, then (W, A, d) is referred to as a real-valued decision information system (RVDIS).
(1) (W, A, d) is known as a decision information system for labeled real-valued data (l-RVDIS), if ∀ w ∈ W, d (w)≠ *.
(2) (W, A, d) is known as a decision information system for partially labeled real-valued data (p-RVDIS), if
(3) (W, A, d) is known as a decision information system for unlabeled real-valued data (u-RVDIS), if ∀ w ∈ W, d (w) =*.
Since no object in a u-RVDIS (W, A, d) has a label, (W, A, d) can be regarded as the IS (W, A).
(W, A, d) can be interpreted as the outcome of information fusion of (W_l, A, d) and (W_u, A, d).
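The separation of a p-RVDIS into its labeled and unlabeled parts can be sketched as follows. This is a minimal illustration, not the paper's formal construction: the function names are hypothetical, and the missing rate is assumed to be the fraction |W_u| / |W| of unlabeled objects.

```python
MISSING = "*"  # marker for an unlabeled object, as in d(w) = *


def split_by_label(decision):
    """Split the universe W into labeled objects W_l and unlabeled objects W_u."""
    labeled = [w for w, label in decision.items() if label != MISSING]
    unlabeled = [w for w, label in decision.items() if label == MISSING]
    return labeled, unlabeled


def missing_rate(decision):
    """Fraction of objects without a label (assumed form: |W_u| / |W|)."""
    _, unlabeled = split_by_label(decision)
    return len(unlabeled) / len(decision)


# Hypothetical decision column of a p-RVDIS with five objects.
decision = {"w1": "yes", "w2": "*", "w3": "no", "w4": "*", "w5": "yes"}
W_l, W_u = split_by_label(decision)
lam = missing_rate(decision)
```

Restricting the attribute table to `W_l` gives the l-RVDIS, and restricting it to `W_u` gives the u-RVDIS, whose decision column carries no information.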
A p-RVDIS (W, A, d)
It is obvious that
(2)
(3)
(4)
(5)
(6)
In this section, we explore measures of uncertainty in a p-RVDIS via a discernibility relation.
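Each of the four importances is described as a weighted sum of a score on the l-RVDIS and a score on the u-RVDIS, with weights determined by the missing rate. A minimal sketch of such a combination is given below. The exact weighting is an assumption here: the labeled part is weighted by 1 − λ and the unlabeled part by λ, and the function name is hypothetical.

```python
def combined_importance(imp_labeled, imp_unlabeled, missing_rate):
    """Weighted sum of the two subsystem scores.

    Assumption: the l-RVDIS score is weighted by (1 - missing_rate) and the
    u-RVDIS score by missing_rate; the paper's definitions may differ.
    """
    if not 0.0 <= missing_rate <= 1.0:
        raise ValueError("missing_rate must lie in [0, 1]")
    return (1.0 - missing_rate) * imp_labeled + missing_rate * imp_unlabeled


# Hypothetical subsystem scores with a 20% missing rate.
score = combined_importance(imp_labeled=0.8, imp_unlabeled=0.5, missing_rate=0.2)
```

Under this assumed weighting, a fully labeled system (missing rate 0) reduces to the l-RVDIS score and a fully unlabeled one (missing rate 1) to the u-RVDIS score.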
The type 1 importance of a subsystem in a p-RVDIS
(1)
(2)
(3) If B ⊂ C ⊆ A, then ∀ θ
(4) If 0 ≤ θ1 < θ2 ≤ 1, then ∀ B
Since
Thus
(2) This holds by (1).
(3) Suppose B ⊆ C ⊆ A. Then ∀ w ∈ W,
This suggests that
By (1),
(4) Suppose 0 ≤ θ1 < θ2 ≤ 1. Then ∀ w ∈ W,
This suggests that
By (1),
(1)
(2)
(3) B ⊆ C ⊆ A implies
(4)
(3) B ⊆ C ⊆ A implies
Then
Thus
Hence
(4) “⇐” is clear. Next, we provide a proof for the implication “⇒”.
Suppose
This suggests that
Note that
The type 2 importance of a subsystem in a p-RVDIS
(1)
(2)
(3) B ⊆ C ⊆ A implies
(4)
(3) B ⊆ C ⊆ A implies
Then
Thus
Hence
(4) “⇐” is clear. Next, we provide a proof for the implication “⇒”.
Suppose
This suggests that
Note that
The type 3 importance of a subsystem in a p-RVDIS
Then
By Definition 3.4,
If
If
Then
Obviously, ∀ i,
Then
Put
Since
Thus
Denote
Then
By Definition 3.3,
∀ i, j,
Then
By Proposition 3.3,
Hence
(1)
(2)
(3) B ⊆ C ⊆ A implies
(4)
(3) B ⊆ C ⊆ A implies
Then
Thus
Hence
(4) “⇐” is clear. Next, we provide a proof for the implication “⇒”.
Suppose
This suggests that
Note that
The type 4 importance of a subsystem in a p-RVDIS
Obviously,
Thus
If
If
This suggests that ∀ i, j,
Thus
Obviously,
(1)
(2)
(3) B ⊆ C ⊆ A implies
(4)
(3) B ⊆ C ⊆ A implies
Then
Thus
Hence
(4) “⇐” is clear. Next, we provide a proof for the implication “⇒”.
Suppose
This suggests that
Note that
Experimental analysis
This section designs experiments and performs effectiveness analysis on the proposed measures.
Datasets and experimental components
Eight datasets from UCI are selected (see Table 3). All of them are real-valued and, in their original form, fully labeled. Since our study focuses on partially labeled real-valued data, we randomly select objects in each original dataset and remove their labels to create partially labeled datasets. The removed labels are randomly distributed over the decision attribute, with missing rate λ = 20%. Here we take θ = 0.4.
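The label-removal step can be sketched in Python as below. This is an illustration under stated assumptions rather than the authors' script: the function name `mask_labels`, the fixed seed and the toy label column are hypothetical, and the number of removed labels is taken as the missing rate times the number of objects, rounded to the nearest integer.

```python
import random


def mask_labels(labels, missing_rate, seed=0):
    """Replace a random fraction of labels with '*' to simulate partial labeling."""
    rng = random.Random(seed)  # fixed seed so the masking is reproducible
    n_missing = round(missing_rate * len(labels))
    hidden = set(rng.sample(range(len(labels)), n_missing))
    return ["*" if i in hidden else label for i, label in enumerate(labels)]


# Hypothetical label column with ten objects and two classes.
labels = ["setosa"] * 5 + ["versicolor"] * 5
partial = mask_labels(labels, missing_rate=0.2)
```

Sampling positions without replacement keeps the realized missing rate equal to the target, up to rounding, which a per-object coin flip would not guarantee.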
Eight datasets from UCI
Regarding the dataset Ir, it is recorded as
Experimental results
The experimental results are shown in Figure 1.

Values of MU on eight datasets.
From Figure 1, the following conclusions are obtained:
Standard deviation is primarily employed for gauging the extent of dispersion in numerical data. A larger standard deviation signifies higher data dispersion, whereas a smaller value indicates lower data dispersion.
Suppose U = {u_1, ⋯ , u_n} is a dataset. The arithmetic average value, standard deviation and standard deviation coefficient of U are denoted as σ (U),
Continuing the aforementioned experiment, the CV-values (coefficients of variation) of the four measurement sets are compared. The outcomes are illustrated in Figure 2.
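The dispersion quantities used here can be computed as in the following sketch. It assumes the usual definition of the coefficient of variation as standard deviation divided by arithmetic mean, and it assumes the population form of the standard deviation (division by n); the function names are hypothetical.

```python
import math


def mean(values):
    """Arithmetic average of a non-empty sequence."""
    return sum(values) / len(values)


def std(values):
    """Population standard deviation (divides by n; an assumption)."""
    m = mean(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))


def coefficient_of_variation(values):
    """Standard deviation relative to the mean; undefined when the mean is 0."""
    return std(values) / mean(values)


# Hypothetical measurement set: mean 5, standard deviation 2.
cv = coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9])
```

Because the CV-value is scale-free, it allows the dispersion of measures with different ranges to be compared directly.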

Values of MU on eight datasets.
From Figure 2, we observe that the CV-values of
Correlation analysis is a statistical analysis, and the Pearson correlation coefficient is a linear correlation measure that quantifies the degree and direction of the linear relationship between two datasets. Assume that U = {u_1, ⋯ , u_n} and V = {v_1, ⋯ , v_n} are two datasets. The Pearson correlation coefficient between U and V, denoted as r(U, V), is defined as
Obviously,
The correlation between U and V can be derived using Table 4.
The corresponding correlation between U and V
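For reference, the Pearson correlation coefficient in its standard textbook form (the covariance of the centered values divided by the product of their root sums of squares) can be computed as in the sketch below. The function name and the sample data are hypothetical.

```python
import math


def pearson_r(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(u) != len(v) or len(u) < 2:
        raise ValueError("u and v must have the same length, at least 2")
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    # Sum of products of deviations (numerator of r).
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    # Root sums of squared deviations (the two factors of the denominator).
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)


# Hypothetical pair of measurement sets.
r = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

The value always lies in [−1, 1]; the sign gives the direction of the linear relationship and the magnitude its strength. The function raises `ZeroDivisionError` if either sequence is constant, since the coefficient is undefined in that case.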
In continuation of the previous experiment, r-values between any two of four measurement sets are compared for each of the eight datasets. The outcomes are presented in Tables 5-12.
By referring to Tables 5-12, the correlation levels between the four measurement metrics across the eight datasets are determined. The evidence from Tables 13-20 indicates that the correlation levels across the eight datasets are consistent. This confirms the stability of the four newly proposed measures.
To obtain a more comprehensive evaluation of the performance of the proposed measures, we conduct the Friedman test and the Nemenyi test in this part.
The Friedman test is a statistical test based on ranking algorithms. The Friedman statistic is defined by the equation:
If F_F surpasses the critical value F_α(k − 1, (k − 1)(N − 1)), the null hypothesis of the Friedman test is rejected. Afterwards, the Nemenyi test with critical distance CD_α can be employed to further investigate which algorithm exhibits superior statistical performance. CD_α is defined as
Here, we consider the four MUs as separate algorithms and assess their statistical significance using both the Friedman test and the Nemenyi test.
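The two tests can be sketched in Python using the forms commonly given in the literature, which are assumed here to match the paper's equations: the Friedman chi-square χ² = 12N / (k(k + 1)) · (Σ R_i² − k(k + 1)² / 4), where R_i is the average rank of algorithm i, the statistic F_F = (N − 1)χ² / (N(k − 1) − χ²), and the critical distance CD_α = q_α · sqrt(k(k + 1) / (6N)). Function names and the small rank matrix are hypothetical.

```python
import math


def average_ranks(rank_matrix):
    """Column-wise mean rank: one row per dataset, one column per algorithm."""
    n = len(rank_matrix)
    k = len(rank_matrix[0])
    return [sum(row[j] for row in rank_matrix) / n for j in range(k)]


def friedman_ff(rank_matrix):
    """F-distributed Friedman statistic F_F with (k-1, (k-1)(N-1)) degrees of freedom."""
    n = len(rank_matrix)      # number of datasets N
    k = len(rank_matrix[0])   # number of algorithms k
    avg = average_ranks(rank_matrix)
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg) - k * (k + 1) ** 2 / 4)
    return (n - 1) * chi2 / (n * (k - 1) - chi2)


def nemenyi_cd(q_alpha, k, n):
    """Critical distance of the Nemenyi post-hoc test."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


# Hypothetical ranks of three algorithms on four datasets (not the paper's data).
ranks = [[1, 2, 3], [1, 3, 2], [2, 1, 3], [1, 2, 3]]
ff = friedman_ff(ranks)

# Setting of this paper: k = 4 measures, N = 8 datasets, q_alpha = 2.5690.
cd = nemenyi_cd(q_alpha=2.569, k=4, n=8)  # about 1.658
```

Two algorithms are judged significantly different when their average ranks differ by more than `cd`. Note that `friedman_ff` divides by zero when every dataset ranks the algorithms identically, since the chi-square value then equals N(k − 1).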
r-values of sixteen pairs of four measurement metrics on Ir
r-values of sixteen pairs of four measurement metrics on Ec
r-values of sixteen pairs of four measurement metrics on Pa
r-values of sixteen pairs of four measurement metrics on Se
r-values of sixteen pairs of four measurement metrics on So
r-values of sixteen pairs of four measurement metrics on Wd
r-values of sixteen pairs of four measurement metrics on Wi
r-values of sixteen pairs of four measurement metrics on Br
The correlation between two measurement metrics on Ir
The correlation between two measurement metrics on Ec
The correlation between two measurement metrics on Pa
The correlation between two measurement metrics on Se
The correlation between two measurement metrics on So
The correlation between two measurement metrics on Wd
The correlation between two measurement metrics on Wi
The correlation between two measurement metrics on Br
(1) The rankings of the CV-values of the four measurement metrics across the eight datasets are presented in Table 21.
The type i importance (i = 1,2,3,4)
(2) The Friedman test is used to examine whether there are significant differences in the performances of the four measures. With four measures and eight datasets, F_F follows an F-distribution with 3 and 21 degrees of freedom. The critical value F_0.05(3, 21) is 3.07. Since F_F = 73.00 is considerably larger than 3.07, the null hypothesis is rejected at the significance level α = 0.05. This implies that the performances of the four measures differ significantly.
(3) To further illustrate the significant differences among the four measures, the Nemenyi test is employed. Considering a significance level of α = 0.05, we can calculate the critical values as q_α = 2.5690 and
Based on the observation results in Figure 3, the following outcomes are derived:
a) The performance of

Nemenyi test
b) No significant difference is found between the performance of
In this paper, a p-RVDIS has been defined and divided into two DISs: an l-RVDIS and a u-RVDIS. Based on these two DISs, four degrees of importance of an attribute subset in a p-RVDIS have been presented. Each is the weighted sum of the corresponding importance in the l-RVDIS and the u-RVDIS, with weights determined by the missing rate, and may be regarded as an MU for a p-RVDIS. To evaluate the performance of the presented MUs, numerical experiments and statistical tests on eight datasets have been carried out. These findings help in understanding the nature of uncertainty in a p-RVDIS. The limitations of this study are that the experimental sample is small and that no parameter experiments have been conducted. In future work, we will apply the proposed measures of uncertainty to attribute reduction in a p-RVDIS and study partially labeled gene data.
Acknowledgment
The authors would like to thank the editors and the anonymous reviewers for their valuable comments and suggestions, which have helped immensely in improving the quality of the paper. This work is supported by Natural Science Research Project of Colleges and Universities in Anhui Province (2023AH040386).
