Abstract
Uncertainty in information plays an important role in practical applications, and uncertainty measurement (UM) helps disclose the substantive characteristics of information. Probabilistic set-valued data is an important class of data in machine learning, so UM for such data is worth studying. This paper measures the uncertainty of a probability set-valued information system (PSVIS) by means of its information structures, which are built with the Gaussian kernel method. First, the distance between objects in each subsystem of a PSVIS is constructed from the Bhattacharyya distance. Then, the fuzzy T_cos-equivalence relations in a PSVIS are obtained by the Gaussian kernel method. Next, information structures in a PSVIS are defined, and the dependence between information structures is investigated by means of the inclusion degree. As an application of the information structures, UM in a PSVIS is investigated. Finally, the effectiveness of the investigated measures is evaluated through dispersion analysis, correlation analysis, analysis of variance and a post-hoc test.
Keywords
Introduction
Zadeh [1, 2] proposed the idea of granular computing (GrC), which takes an information granule as the basic unit of calculation. The purpose of GrC is to explore an approximation scheme and to build an effective computing model for dealing with large-scale complex data, which allows us to view a phenomenon at different levels of granularity. Later, Lin [3] and Yao [4] explained the importance of GrC. Since then, GrC has played an increasingly important role in soft computing, knowledge discovery and data mining.
Rough set theory (RST) is a basic research method in GrC. Its main idea is to use an explicit knowledge base to describe imprecise or uncertain knowledge. RST can deal with imprecision, fuzziness and uncertainty, and does not need any prior information beyond the data set to be processed. RST has been successfully applied to machine learning, knowledge discovery, intelligent systems, expert systems, inductive reasoning, decision analysis, pattern recognition, mereology and signal analysis [5].
Information system (IS) based on RST was presented by Pawlak [6]. Most applications of RST, such as uncertainty modeling [7], reasoning with uncertainty [8], rule extraction [9], classification and feature selection [10] are related to an IS.
A granular structure is a mathematical structure consisting of a family of information granules from a data set. It is one of the main research topics of GrC. Information structures in an IS are granular structures in the sense of GrC, so the study of information structures is a research topic about ISs. Zhang et al. [11] introduced information structures in a fully fuzzy IS. Li et al. [12] looked into information structures in a covering IS.
Uncertainty, such as vagueness, incompleteness, inconsistency, fuzziness and randomness, exists almost everywhere in the real world and plays an important role in practical applications. Thus, how to measure uncertainty has attracted increasing attention. Uncertainty measurement (UM) can supply novel viewpoints for analyzing data and help us disclose the substantive characteristics of data sets. UM has been a research goal in various fields, such as pattern recognition [13], image processing [14], medical diagnosis [15, 16] and data mining [17]. Pawlak et al. [6] proposed two numerical measures, accuracy and roughness, with which the uncertainty of a rough set can be evaluated. This was the initial attempt to measure the uncertainty of a rough set.
Information entropy, proposed by Shannon [18], is an effective tool to measure the uncertainty of information. The extension of entropy and its variants can be applied to an IS or RST. Many scholars have done a lot of research work on this aspect. For example, Yao [8] studied several types of information-theoretical measures for attribute importance in RST; Bianucci et al. [19] explored entropy and co-entropy for UM of coverings; Beaubouef et al. [20] measured the uncertainty of rough sets; Liang et al. [21] researched the information entropy in an IS; Li et al. [22] discussed UM in a fuzzy relation IS; Xie et al. [23] proposed the uncertainty of an interval-valued IS; Tan et al. [24] considered information entropy of intuitionistic fuzzy information; Zhang et al. [25] presented new UM for categorical data based on fuzzy information structures; Dai et al. [26] gave UM for an incomplete decision IS; Huang et al. [27] presented discernibility measures for a fuzzy β-covering; Li et al. [28] investigated UM for fuzzy probabilistic approximation spaces and discussed the relationships between fuzzy probabilistic approximation spaces.
Information granularity based on GrC is also an effective tool to measure the uncertainty of information. Some scholars have achieved good results in this field. For example, Yao et al. [29] presented a granularity measure from the angle of granulation; Qian et al. [30] gave the fuzzy information granularity in fuzzy relations; Liang et al. [21] put forward information granulation in complete and incomplete ISs; Tan et al. [24] studied information granularity of intuitionistic fuzzy information; Xu et al. [31] considered knowledge granulation in an ordered IS; Li et al. [32] proposed fuzzy information granularity in fully fuzzy relation ISs based on the Gaussian kernel.
In practical situations, set-valued attributes may be more appropriate to describe the uncertain and missing information of some objects in an IS [33]. Orlowska et al. [34] researched a set-valued information system (SVIS) considering non-deterministic information. Yao [4] used the notion of a set-based IS. Dai et al. [35] studied entropy and granularity measures for an SVIS. Qian et al. [36] considered a set-valued ordered IS.
An SVIS is a significant IS. However, in some practical situations, a set is described by a probability distribution, which forms probability set-valued data. Owing to its rich semantic explanations, randomness and flexibility, probability set-valued data has attracted the attention of some scholars. Huang et al. [37] introduced an IS based on probability set-valued data, i.e., a probability set-valued information system (PSVIS).
However, Huang et al. [37] only gave a dynamic variable precision rough set approach by means of the Bhattacharyya distance; they did not consider information structures or UM in a PSVIS. This paper is devoted to studying UM in a PSVIS by means of its information structures. In other words, this paper first investigates information structures in a PSVIS and then applies them to UM. This is one of our research motivations. On the other hand, the Gaussian kernel is a significant technique in machine learning that can simplify classification tasks and make data linearly separable. Li et al. [38] used the fuzzy T_cos-equivalence relation based on the Gaussian kernel to construct rough sets, but their purpose was to put forward a multi-granulation decision-theoretic rough set method for a fuzzy condition decision IS. This paper attempts to extract the fuzzy T_cos-equivalence relation on the object set of a PSVIS by using the Gaussian kernel, and to study UM in a PSVIS based on the information granules generated by this relation. This is another motivation of our research. The main advantage of the proposed information structures is that they are based on the fuzzy T_cos-similarity. Meanwhile, compared with UM on general information structures, the proposed UM also takes fuzziness into account, so it can better reflect the essence of uncertainty and may be more suitable for a PSVIS.
There are three main contributions in this paper: (1) The fuzzy T_cos-equivalence relation is extracted based on the Gaussian kernel for a PSVIS. In view of this relation, fuzzy information granules are constructed, and fuzzy information structures based on these granules are established. (2) According to the established information structures, UM in a PSVIS is investigated. Besides, some important properties are given, and relationships among these measures are discussed. (3) Numerical experiments and statistical analysis of the proposed measures are conducted to verify their effectiveness.
The remaining part of this paper is organized as follows. In Section 2, we recall elementary notions related to fuzzy relations and PSVISs. In Section 3, we establish the fuzzy T_cos-equivalence relations in a PSVIS by using the Gaussian kernel method. In Section 4, we study information structures in a PSVIS. In Section 5, we investigate four tools for measuring the uncertainty of a PSVIS and analyze their effectiveness. In Section 6, we present several methodological comparisons. In Section 7, we summarize this paper.
Preliminaries
We first recall some fundamental notions related to fuzzy relations and PSVISs.
Throughout this paper, U represents a finite set and I signifies [0, 1]. Put
Fuzzy sets are extensions of ordinary sets. In this paper, F (U) indicates the set of all fuzzy sets in U. The cardinality of P ∈ F (U) can be calculated with
If L is a fuzzy set in U × U, then L will be a fuzzy relation on U. In this paper, F (U × U) typifies the set of all fuzzy relations on U.
Let L ∈ F (U × U). Then L may be represented by
If
Let L ∈ F (U × U). For each u ∈ U, we define a fuzzy set:
(1) Commutativity: π (m, n) = π (n, m) ;
(2) Associativity: π (π (m, n) , l) = π (m, π (n, l)) ;
(3) Monotonicity: m≤ l, n ≤ t ⇒ π (m, n) ≤ π (l, t) ;
(4) Boundary condition: π (m, 1) = m .
(1) Reflexivity: L (u, u) =1 ;
(2) Symmetry: L (u, v) = L (v, u) ;
(3) π-transitivity: π (L (u, v) , L (v, w)) ≤ L (u, w) .
Suppose that U and AT are finite object and attribute sets, respectively. If ∀ a ∈ AT determines an information function a : U → V a , where V a = {a (u) : u ∈ U}, then (U, AT) is referred to as an IS.
Put P (x i ) = p i (i = 1, 2, ⋯ , n). Then
(1) If n = m and ∀ i, x i = y i , p i = q i , then P and Q are said to be equal. Denote P = Q.
(2) If n = m and ∀ i, x i = y i , then P and Q are said to be approximately equal. Denote P ≃ Q.
Obviously, P = Q ⇒ P ≃ Q.
In statistics, Bhattacharyya distance is used to measure the similarity between two probability distributions. The definition of Bhattacharyya distance is given as follows.
More specifically, the above definition can be essentially described by the following definition.
Actually, Definition 2.11 is more widely acceptable than Definition 2.10.
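As a minimal sketch of the idea, the following Python snippet computes the textbook Bhattacharyya distance between two discrete distributions defined on the same support. The form -ln(Σ √(p_i q_i)) is the standard statistical definition and is an assumption here, since the exact formulas of Definitions 2.10 and 2.11 are not reproduced in this text; the function names and the sample distributions are hypothetical.

```python
import math


def bhattacharyya_coefficient(p, q):
    """Sum of sqrt(p_i * q_i) over a shared support; close to 1 when p equals q."""
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same support")
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def bhattacharyya_distance(p, q):
    """ASSUMPTION: textbook form -ln(BC); infinite when the supports do not overlap."""
    # Clamp at 1.0 so that rounding error cannot produce a negative distance.
    bc = min(bhattacharyya_coefficient(p, q), 1.0)
    return math.inf if bc == 0.0 else -math.log(bc)


# Hypothetical probability vectors over the same three values.
p = [0.2, 0.5, 0.3]
q = [0.1, 0.6, 0.3]
print(bhattacharyya_distance(p, q))  # small positive number
print(bhattacharyya_distance(p, p))  # approximately zero
```

The distance is symmetric and vanishes for identical distributions, which is what makes it usable as a building block for a distance between objects.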
A PSVIS
where {6} means
In this section, we establish the fuzzy T_cos-equivalence relations in a PSVIS by using the Gaussian kernel method.
The distance between two objects in a PSVIS
In this paper, denote
By Definition 3.5, we have
d (a1 (u2) , a1 (u4)) ≈ 0.5170;
d (a2 (u2) , a2 (u4)) ≈ 0.2989;
d (a3 (u2) , a3 (u4)) ≈ 0.4517;
d (a5 (u2) , a5 (u4)) ≈ 0.2935.
Then d A 5 (u2, u4) ≈ 0.8042.
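The numbers in this example can be checked with a few lines of Python. The sketch below assumes that the distance on an attribute subset is the root of the sum of squared per-attribute distances; this aggregation rule is inferred from the reported values and is not quoted from Definition 3.5. No value for a4 is listed in the example, and the four listed values alone reproduce the reported result under this assumption.

```python
import math

# Per-attribute distances between u2 and u4, copied from the example above.
per_attribute = {"a1": 0.5170, "a2": 0.2989, "a3": 0.4517, "a5": 0.2935}

# ASSUMPTION: root-sum-of-squares aggregation, inferred from the numbers only.
aggregate = math.sqrt(sum(d * d for d in per_attribute.values()))
print(round(aggregate, 4))  # 0.8042
```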
The Gaussian kernel method is an important method in machine learning. In this subsection, we use the Gaussian kernel to extract a fuzzy T_cos-equivalence relation on the object set of a PSVIS.
Gaussian kernel
Given A ⊆ AT, an algorithm for generating the fuzzy T_cos-equivalence relation
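A minimal sketch of such a procedure is given below, under stated assumptions. It takes a matrix of pairwise distances between objects and returns the relation R(u, v) = exp(-d(u, v)² / (2λ²)), which is the usual Gaussian kernel form and is assumed here because the exact formula is not reproduced in this text. The t-norm T_cos(a, b) = max(ab - √(1 - a²) √(1 - b²), 0) is the form commonly used in the literature. The names `gaussian_kernel_relation`, `t_cos`, `is_t_cos_equivalence` and `lam`, as well as the toy positions, are hypothetical.

```python
import math


def gaussian_kernel_relation(dist, lam):
    """R[i][j] = exp(-dist[i][j]**2 / (2 * lam**2)) for a square distance matrix."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return [[math.exp(-(d * d) / (2.0 * lam * lam)) for d in row] for row in dist]


def t_cos(a, b):
    """T_cos(a, b) = max(a*b - sqrt(1 - a^2) * sqrt(1 - b^2), 0)."""
    root_a = math.sqrt(max(1.0 - a * a, 0.0))
    root_b = math.sqrt(max(1.0 - b * b, 0.0))
    return max(a * b - root_a * root_b, 0.0)


def is_t_cos_equivalence(rel, tol=1e-12):
    """Check reflexivity, symmetry and T_cos-transitivity of a fuzzy relation."""
    n = len(rel)
    idx = range(n)
    reflexive = all(abs(rel[i][i] - 1.0) <= tol for i in idx)
    symmetric = all(abs(rel[i][j] - rel[j][i]) <= tol for i in idx for j in idx)
    transitive = all(
        t_cos(rel[i][j], rel[j][k]) <= rel[i][k] + tol
        for i in idx for j in idx for k in idx
    )
    return reflexive and symmetric and transitive


# Toy objects placed on a line, so the pairwise distances are Euclidean.
positions = [0.0, 0.3, 0.8, 1.5]
dist = [[abs(x - y) for y in positions] for x in positions]
relation = gaussian_kernel_relation(dist, lam=0.5)
print(is_t_cos_equivalence(relation))
```

With this form of the kernel, a larger λ gives larger membership degrees and hence coarser granules.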
In this section, we investigate information structures in a PSVIS.
Some concepts of information structures in a PSVIS.
(1) S λ 2 (B) is said to be dependent on S λ 1 (A), if for each i,
(2) S λ 2 (B) is said to be dependent partially on S λ 1 (A), if there exists i,
(3) S λ 2 (B) is said to be independent of S λ 1 (A), if for each i,
Obviously,
S λ 1 (A) = S λ 2 (B) ⇔ S λ 1 (A) ⪯ S λ 2 (B) and S λ 2 (B) ⪯ S λ 1 (A) ,
S λ 1 (A) ⪯ S λ 2 (B) ⇒ S λ 1 (A) ⊑ S λ 2 (B) ,
S λ 1 (A) ≺ S λ 2 (B) ⇒ S λ 1 (A) ⊏ S λ 2 (B) .
In this part, we explore several properties of information structures in a PSVIS.
(1) If 0 < λ1 ≤ λ2 ≤ 1, then ∀ A ⊆ AT, S λ 1 (A) ⪯ S λ 2 (A);
(2) If A ⊆ B ⊆ AT, then ∀ λ ∈ (0, 1], S λ (B) ⪯ S λ (A).
Then
So
(2) By Definition 3.8,
Then
(1) 0 ≤ D (S λ (B)/S λ (A)) ≤1;
(2) S λ (A) ⪯ S λ (B) implies D (S λ (B)/S λ (A)) =1;
(3) S λ (A) ⊑ S λ (B) ⊑ S λ (L) implies D (S λ (B)/S λ (A))
D (S λ (A2)/S λ (A5))
= 0. D (S λ (A5)/S λ (A2))
This example illustrates that
(1) S λ (A) ⪯ S λ (B) ⇔ D (S λ (B)/S λ (A)) =1 .
(2) S λ (A) ⋈ S λ (B) ⇔ D (S λ (B)/S λ (A)) =0 .
(3) S λ (A) ⊑ S λ (B) ⇔0 < D (S λ (B)/S λ (A)) ≤1 .
Hence S λ (A) ⪯ S λ (B).
(2) “⇒”. Since S λ (A) ⋈ S λ (B), ∀ l, we have
Then ∀ l,
Thus D (S λ (B)/S λ (A)) =0.
“⇐”. Since D (S λ (B)/S λ (A)) = 0, ∀ l, we obtain
Then ∀ l,
Thus S λ (A) ⋈ S λ (B).
(3) This follows from (1) and (2). □
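To make the three cases of this theorem concrete, the following sketch computes an inclusion degree between two information structures given as fuzzy relation matrices. It assumes that the inclusion degree is the fraction of objects whose fuzzy granule under A is contained in the corresponding granule under B; the definition of D is not reproduced in this text, so this is only one reading that is consistent with the stated properties. The matrices are toy data.

```python
def fuzzy_subset(a, b, tol=1e-12):
    """A fuzzy set a is contained in b when every membership degree of a is at most that of b."""
    return all(x <= y + tol for x, y in zip(a, b))


def inclusion_degree(rel_a, rel_b):
    """ASSUMPTION: share of objects u_i whose granule under A lies inside its granule under B."""
    n = len(rel_a)
    return sum(fuzzy_subset(rel_a[i], rel_b[i]) for i in range(n)) / n


# Toy relations on three objects; row i is the fuzzy granule of object u_i.
fine = [[1.0, 0.2, 0.1], [0.2, 1.0, 0.3], [0.1, 0.3, 1.0]]
coarse = [[1.0, 0.5, 0.4], [0.5, 1.0, 0.6], [0.4, 0.6, 1.0]]
mixed = [[1.0, 0.5, 0.05], [0.5, 1.0, 0.3], [0.05, 0.3, 1.0]]

print(inclusion_degree(fine, coarse))  # 1.0: every granule is included
print(inclusion_degree(coarse, fine))  # 0.0: no granule is included
print(inclusion_degree(fine, mixed))   # strictly between 0 and 1
```

Under this reading, the value 1 corresponds to dependence, the value 0 to independence, and any value in between to partial dependence.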
Measuring uncertainty of a PSVIS
As an application for information structures, some tools for measuring the uncertainty of a PSVIS are proposed in this part.
Granulation measurement for a PSVIS
Similar to Definition 5 in [39], the λ-information granulation of a PSVIS is given in the following definition.
If
If
□
(1) If S λ 1 (A) ⪯ S λ 2 (B), then G λ 1 (A) ≤ G λ 2 (B);
(2) If S λ 1 (A) ≺ S λ 2 (B), then G λ 1 (A) < G λ 2 (B).
(2) Since S λ 1 (A) ≺ S λ 2 (B), we have S λ 1 (A) ⪯ S λ 2 (B) and S λ 1 (A) ≠ S λ 2 (B).
Then, ∀ i,
So, ∀ i,
Hence G λ 1 (A) < G λ 2 (B).
□
This proposition illustrates that λ-information granulation increases when the available information becomes coarser, and it decreases when the available information becomes finer. Thus, λ-information granulation can be applied to measure uncertainty of a PSVIS.
(1) If 0 < λ1 ≤ λ2 ≤ 1, then ∀ A ⊆ AT, G λ 1 (A) ≤ G λ 2 (A).
(2) If A ⊆ B ⊆ AT, then ∀ λ ∈ (0, 1], G λ (B) ≤ G λ (A).
(1) If S λ 1 (A) ⪯ S λ 2 (B), then H λ 2 (B) ≤ H λ 1 (A);
(2) If S λ 1 (A) ≺ S λ 2 (B), then H λ 2 (B) < H λ 1 (A).
(2) Since S λ 1 (A) ≺ S λ 2 (B), similar to the proof of Proposition 5.3, we obtain that ∀ i,
Then ∀ i,
Hence H λ 2 (B) < H λ 1 (A). □
This theorem shows that λ-information entropy increases when λ-information structure becomes finer, and it decreases when λ-information structure becomes coarser.
(1) If 0 < λ1 ≤ λ2 ≤ 1, then ∀ A ⊆ AT, H λ 2 (A) ≤ H λ 1 (A);
(2) If A ⊆ B ⊆ AT, then ∀ λ ∈ (0, 1], H λ (A) ≤ H λ (B).
Rough entropy, introduced by Yao [8], is applied to measure granularity of a given partition. Similarly, λ-rough entropy of a given PSVIS is proposed in the following definition.
So ∀ i,
Then
By Definition 5.8,
If
If
(1) If S λ 1 (A) ⪯ S λ 2 (B), then (E r ) λ 1 (A) ≤ (E r ) λ 2 (B);
(2) If S λ 1 (A) ≺ S λ 2 (B), then (E r ) λ 1 (A) < (E r ) λ 2 (B).
(2) Since S λ 1 (A) ≺ S λ 2 (B), similar to the proof of Theorem 5.3 (2), we obtain that ∀ i,
Then ∀ i,
and ∃ j,
Hence (E r ) λ 1 (A) < (E r ) λ 2 (B). □
(1) If 0 < λ1 ≤ λ2 ≤ 1, then ∀ A ⊆ AT, (E r ) λ 1 (A) ≤ (E r ) λ 2 (A);
(2) If A ⊆ B ⊆ AT, then ∀ λ ∈ (0, 1], (E r ) λ (B) ≤ (E r ) λ (A).
From Theorem 5.10 and Proposition 5.11, we come to the conclusion that λ-rough entropy can be applied to measure the uncertainty of a PSVIS.
= log_2 n.
□
By Theorem 5.12,
Thus 0 ≤ H λ (A) ≤ log_2 n. □
(1) If S λ 1 (A) ⪯ S λ 2 (B), then E λ 2 (B) ≤ E λ 1 (A);
(2) If S λ 1 (A) ≺ S λ 2 (B), then E λ 2 (B) < E λ 1 (A).
(2) Since S λ 1 (A) ≺ S λ 2 (B), similar to the proof of Proposition 5.3 (2), we obtain that ∀ i,
Hence E λ 2 (B) < E λ 1 (A). □
(1) If 0 < λ1 ≤ λ2 ≤ 1, then ∀ A ⊆ AT, E λ 2 (A) ≤ E λ 1 (A);
(2) If A ⊆ B ⊆ AT, then ∀ λ ∈ (0, 1], E λ (A) ≤ E λ (B).
From Theorem 5.15 and Proposition 5.16, we come to the conclusion that λ-information amount can be applied to measure uncertainty of a PSVIS.
= 1 . □
By Theorem 5.17, E λ (A) =1 - G λ (A).
Thus
The discrete variable A and the continuous variable λ are involved in each of G λ (A), H λ (A), Er λ (A) and E λ (A). Note that A = A i , i = 1, 2, ⋯ , 5, and λ can be discretized as λ2 = 0.1, 0.2, ⋯ , 0.9.
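The dependence of the four measures on λ can be illustrated with a short sketch. The formulas below are the forms commonly used for fuzzy information structures and are assumptions, since the definitions are not reproduced in this text: with |S(u_i)| the sum of the memberships in the granule of u_i, G is the mean of |S(u_i)|/n, H is the mean of -log_2(|S(u_i)|/n), E_r is the mean of -log_2(1/|S(u_i)|), and E is the mean of 1 - |S(u_i)|/n. These forms satisfy E = 1 - G and H + E_r = log_2 n, in agreement with the relationships stated above. The Gaussian relation and the toy positions are likewise hypothetical.

```python
import math


def uncertainty_measures(rel):
    """Return G, H, Er and E for a fuzzy relation matrix (assumed standard forms)."""
    n = len(rel)
    cards = [sum(row) for row in rel]  # |S(u_i)|, at least 1 for a reflexive relation
    g = sum(c / n for c in cards) / n                   # information granulation
    h = -sum(math.log2(c / n) for c in cards) / n       # information entropy
    e_r = -sum(math.log2(1.0 / c) for c in cards) / n   # rough entropy
    e = sum(1.0 - c / n for c in cards) / n             # information amount
    return {"G": g, "H": h, "Er": e_r, "E": e}


def gaussian_relation(dist, lam):
    """ASSUMPTION: R[i][j] = exp(-dist[i][j]**2 / (2 * lam**2))."""
    return [[math.exp(-(d * d) / (2.0 * lam * lam)) for d in row] for row in dist]


# Toy data, not taken from the paper: four objects placed on a line.
positions = [0.0, 0.3, 0.8, 1.5]
dist = [[abs(x - y) for y in positions] for x in positions]

for lam in (0.1, 0.5, 0.9):
    measures = uncertainty_measures(gaussian_relation(dist, lam))
    print(lam, {name: round(value, 4) for name, value in measures.items()})
```

On this toy input, G and E_r grow with λ while H and E shrink, which matches the monotonicity properties stated in the propositions above.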
UM for the subsystem (U, A1)
UM for the subsystem (U, A2)
UM for the subsystem (U, A3)
UM for the subsystem (U, A4)
UM for the subsystem (U, A5)
Given A ⊆ AT, an algorithm for generating UM for the subsystem is given as follows (see Algorithm 2).
(1) If we only consider the monotonicity, then λ-information granulation and λ-rough entropy are both monotonically increasing with the growth of the value λ, which means the uncertainty of each subsystem increases as the value λ increases. Meanwhile, λ-information amount and λ-information entropy are both monotonically decreasing with the growth of the value λ, which means the uncertainty of each subsystem decreases as the value λ increases (see Figures 1-5).
(2) Consider λ-information granulation and λ-rough entropy. If we pick λ2 = 0.8, then G λ (A5) < G λ (A4) = G λ (A3) < G λ (A2) < G λ (A1) and (Er) λ (A5) < (Er) λ (A4) = (Er) λ (A3) < (Er) λ (A2) < (Er) λ (A1). This shows that the larger the subsystem, the smaller the measured value. For λ-information amount and λ-information entropy, we have E λ (A1) < E λ (A2) < E λ (A3) = E λ (A4) < E λ (A5) and H λ (A1) < H λ (A2) < H λ (A3) = H λ (A4) < H λ (A5). This shows that the larger the subsystem, the larger the measured value (see Fig. 6).

Uncertainty measures of (U, A1) with different λ.

Uncertainty measures of (U, A2) with different λ.

Uncertainty measures of (U, A3) with different λ.

Uncertainty measures of (U, A4) with different λ.

Uncertainty measures of (U, A5) with different λ.

Uncertainty measures of subsystems with the changeless
In this subsection, we give effectiveness analysis from two aspects of dispersion and correlation in statistics.
Dispersion analysis
In this paper, we apply the standard deviation coefficient to do effectiveness analysis of the proposed measures.
Given a data set X = {x1, ⋯ , x n }. Then its arithmetic average value
For i = 1, 2, 3, 4, 5, denote
CV-values
From Figures 1-6 and Table 13, we obtain the following results:
(1) If we only need monotonicity, then G λ , E r λ , H λ and E λ all perform well for measuring uncertainty of a PSVIS;
(2) If we only consider dispersion degree, then E r λ has better performance for measuring uncertainty of a PSVIS;
(3) If we need both monotonicity and dispersion degree, then E r λ has better performance for measuring uncertainty of a PSVIS.
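The standard deviation coefficient used in this dispersion analysis can be sketched as follows. The snippet assumes the population form of the standard deviation (division by n); the text does not say whether n or n - 1 is used, so this choice is an assumption. The function name and the sample values are hypothetical.

```python
import math


def coefficient_of_variation(xs):
    """Standard deviation divided by the arithmetic mean (the mean must be non-zero)."""
    n = len(xs)
    mean = sum(xs) / n
    # ASSUMPTION: population standard deviation, i.e. division by n.
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return std / mean


# Toy values: mean 5 and standard deviation 2.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(coefficient_of_variation(values))  # 0.4
```

Because the coefficient is scale-free, it allows the spread of measures with different ranges to be compared directly.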
In statistics, Pearson correlation coefficient is a measure of the strength of a linear correlation between two variables or two data sets.
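Given two data sets X = {x1, ⋯ , x n } and Y = {y1, ⋯ , y n }, the Pearson correlation coefficient between X and Y, denoted by r XY , is defined as follows:
\[ r_{XY} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}, \]
where \bar{x} and \bar{y} denote the arithmetic average values of X and Y, respectively.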
For i = 1, 2, 3, 4, 5, denote
Then the following results are obtained (see Tables 7-11).
The corresponding correlation between X and Y
Correlation analysis in the subsystem (U, A5)
Correlation analysis in the subsystem (U, A1)
Correlation analysis in the subsystem (U, A2)
From Table 8, the following conclusion is given (see Table 13).
Correlation analysis in the subsystem (U, A3)
Correlation analysis in the subsystem (U, A4)
From Tables 9-12, we come to the same conclusion as Table 13.
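For illustration, the Pearson correlation coefficient can be computed directly from its textbook definition. The sketch below uses hypothetical values of G and sets E = 1 - G, the relationship stated earlier between λ-information amount and λ-information granulation; an exact affine relationship with negative slope gives a coefficient of -1. The numbers are toy data and not results from the tables.

```python
import math


def pearson(xs, ys):
    """Textbook Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical G values for increasing lambda, with E derived from E = 1 - G.
g_values = [0.20, 0.35, 0.50, 0.70]
e_values = [1.0 - g for g in g_values]
print(pearson(g_values, e_values))  # close to -1
```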
To further explore whether the performances of the four UMs are significantly different, analysis of variance and post-hoc pairwise comparison are given in this part.
Analysis of variance (ANOVA), introduced by Fisher, is used to determine whether there is a statistically significant difference among the means of three or more independent samples. If only one factor is varied during the test, the procedure is referred to as one-way analysis of variance; only this case is considered here. The data obtained from the test fluctuate for two reasons: uncontrollable random factors, and the controllable factor exerted on the results. Suppose that SST denotes the total sum of squares, SSA the sum of squares for factor A among groups, and SSE the sum of squares for the uncontrollable random factors. Then, for k groups and n observations in total,
\[ SST = SSA + SSE, \qquad F = \frac{SSA/(k-1)}{SSE/(n-k)}. \]
Under the null hypothesis, the statistic F follows the F-distribution (Fisher distribution) with k - 1 and n - k degrees of freedom. If the value of F is larger than the critical value F α (k - 1, n - k), the null hypothesis is rejected. However, this does not tell which samples differ from each other; it only tells that not all of the sample means are equal. A post-hoc test can then be conducted to further explore the differences among sample means. Post-hoc tests include Tukey’s test, Holm’s method, Dunnett’s correction, the least significant difference, and so on.
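The decomposition of the sums of squares and the resulting F statistic can be sketched in plain Python as follows. The groups are toy data, not the CV-values of this paper, and the function name is hypothetical. Comparing F with the critical value requires an F-distribution table or a statistics library, which is not shown here.

```python
def one_way_anova(groups):
    """Return SST, SSA, SSE, the F statistic and its degrees of freedom."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group variation: how far each group mean lies from the grand mean.
    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: how far each observation lies from its group mean.
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    sst = sum((x - grand_mean) ** 2 for g in groups for x in g)
    f_stat = (ssa / (k - 1)) / (sse / (n - k))
    return {"SST": sst, "SSA": ssa, "SSE": sse, "F": f_stat, "df": (k - 1, n - k)}


# Toy groups with means 2, 3 and 6.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
result = one_way_anova(groups)
print(result)
```

For these toy groups the between-group sum of squares is 26 and the within-group sum of squares is 6, so F is about 13 with (2, 6) degrees of freedom.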
Below, we view CV-values of the four UMs as four independent samples, and demonstrate the statistically significant difference by using ANOVA test and post-hoc test. We choose Tukey’s test to perform a post-hoc test. We pick 0.05 as the significance level, and the test results are obtained (see Tables 14 and 15, and Fig. 7).
The correlation between two λ-measures
ANOVA
where “df”, “sum sq” and “mean sq” express “degree of freedom”, “the sum of squares” and “the mean of squares”, respectively.

Tukey’s test.
From Table 14, we see that the F-statistic is 21.42 and the corresponding p-value is 7.56 × 10^-6, which is extremely small. This means we have sufficient evidence to reject the null hypothesis that all of the group means are equal. Next, we can use a post-hoc test (here we choose Tukey’s test) to find which group means are different from each other.
Tukey’s test gives two metrics to compare each pairwise difference: one is the confidence interval for the mean difference (given by the values of lwr and upr), and the other is the adjusted p-value for the mean difference. Both the confidence interval and the p-value lead to the same conclusion. From Table 15, the confidence interval for the mean difference between the measures G and H is (-0.4462, -0.17005). Since this interval does not contain zero, the difference between these two sample means is statistically significant. Likewise, the p-value for the mean difference between G and H is 4.87 × 10^-5, which is less than our significance level of 0.05, so this also indicates that the difference between these two sample means is statistically significant. In particular, the difference is negative, which means that H has smaller CV-values.
In Fig. 7, if an interval contains zero, then the difference in sample means is not statistically significant. Therefore the differences for the pairwise comparisons 2-1, 3-2 and 4-2 are statistically significant, but the differences for the other three pairwise comparisons are not.
Based on the above analysis, we can come to the following conclusions:
a) The differences between G and H are statistically significant, and the performance of G is better than that of H;
b) The differences between H and E r are statistically significant, and the performance of E r is better than that of H;
c) The differences between H and E are statistically significant, and the performance of E is better than that of H;
d) There are no significant differences between G and E r;
e) There are no significant differences between G and E;
f) There are no significant differences between E r and E.
In order to see the novelty of this study more clearly, we do comparison and discussion in this part.
1) Lang et al. [46] proposed three relations for attribute selection of a SVDIS. Then, they designed an incremental algorithm to compress a dynamic SVDIS. Concretely, they mainly addressed the compression updating from three aspects: variations of attribute set, immigration and emigration of objects and alterations of attribute values.
2) Dai et al. [47] gave the relative bound difference similarity degree between two objects and presented an α-tolerance relation by using this similarity degree. Moreover, they brought up some information theory concepts, including entropy, conditional entropy, and joint entropy. Based on these concepts, they provided an information theory view for attribute selection in ISDISs.
3) Wang et al. [48] dealt with attribute selection of a SVDIS based on α-level tolerance relation. They introduced the concepts of distribution reduct based on a SVDIS and examined the judgement theorems and discernibility matrices associated with the α-level tolerance relation.
4) Liu et al. [49] defined a dominance relation in a SVDIS based on similarity degree of attribute value sets. They constructed a discernibility matrix from this dominance relation and studied the problem of attribute selection and the judgment in a SVDIS.
5) Singh et al. [50] introduced a new similarity degree between two set-valued attribute values and then proposed a fuzzy similarity-based rough set approach based on this fuzzy tolerance relation. In addition, attribute selection of a SVDIS based on degree of dependency is postulated.
6) This article aims to explore UM in a probability set-valued information system (PSVIS) based on the Gaussian kernel method by means of its information structures. Given a PSVIS, we build the distance between objects in each subsystem according to the Bhattacharyya distance and obtain the fuzzy T_cos-equivalence relations by using the Gaussian kernel method. Then, we define the information structures in a PSVIS. As an application of the information structures, we investigate UM for a PSVIS. In order to evaluate the performance of the investigated measures, we perform effectiveness analysis from the angle of statistics.
The comparison between this paper and several representative studies is shown in Table 17.
Tukey’s test
where “diff”, “lwr” and “upr” indicate “difference value”, “the lower bound of the confidence interval” and “the upper bound of the confidence interval”, respectively.
The comparison of this paper with other literatures
In this paper, information structures in a PSVIS have been studied. By using these information structures, a granularity measure and entropy measures for a PSVIS have been investigated, and the information amount in a PSVIS has also been considered. To evaluate the performance of the proposed measures, effectiveness analysis has been performed from the angle of statistics; the comparison criteria include the standard deviation coefficient, the Pearson correlation coefficient, etc. A limitation of the proposed method is that the information values of two objects on the same attribute are required to obey the same probability distribution. In future work, we will study the application of the obtained results to attribute reduction in a PSVIS.
For convenience, we give the appendix of these symbols as follows (see Table 18).
The appendix of symbols
Acknowledgements
The authors would like to thank the editors and the anonymous reviewers for their valuable comments and suggestions, which have helped immensely in improving the quality of the paper.
