Measures of uncertainty for a fuzzy probabilistic approximation space

Abstract

An approximation space (A-space) is the basis of rough set theory, and a fuzzy approximation space (FA-space) can be seen as an A-space in a fuzzy environment. A fuzzy probabilistic approximation space (FPA-space) is obtained by introducing a probability distribution into an FA-space. In this way, it combines three types of uncertainty (i.e., fuzziness, probability and roughness). This article is devoted to measuring the uncertainty of an FPA-space. First, a new fuzzy relation matrix is proposed by introducing the probability distribution into a given fuzzy relation matrix; on this basis, an FPA-space induces an FA-space. Then, granularity measurement for an FPA-space is investigated. Next, information entropy measurement and rough entropy measurement for an FPA-space are proposed. Moreover, the amount of information in an FPA-space is considered. Finally, a numerical example is given to verify the feasibility of the proposed measures, and an effectiveness analysis is carried out from a statistical point of view. Since three important theories (i.e., fuzzy set theory, probability theory and rough set theory) are combined in an FPA-space, the obtained results may be useful for dealing with practical problems involving these kinds of uncertainty.

Keywords

fuzzy relation; uncertainty measure; information granulation; entropy; effectiveness

1 Introduction

1.1 Research background and related works

Uncertainty mainly includes fuzziness, randomness and incompleteness; it is ubiquitous in daily life and results from the imprecision or imperfection of knowledge. Handling uncertainty matters in many areas, for example medical diagnosis, information retrieval and data mining.

Rough set theory, proposed by Pawlak [27], is an important tool for dealing with uncertainty and has been widely used [10, 33].

An approximation space (A-space), which consists of a universe and an equivalence relation on it, is a fundamental concept in this theory.

Fuzziness is an important aspect of uncertain information, but it is not considered in Pawlak’s rough set model. To extend this model, some scholars have studied the combination of rough set theory and fuzzy set theory [8]. Mieszkowicz-Rolka et al. [25] obtained a fuzzy rough set model with variable precision. Beaubouef et al. [2] looked into fuzzy rough set techniques for uncertainty processing in a relational database. Radzikowska et al. [31] investigated a more general method to fuzzify rough sets. Chen et al. [5] put forward the roughness of fuzzy rough sets. Liu [19] provided axiomatic systems for the unity of rough sets and fuzzy rough sets. Chen et al. [4] gave a feature selection mechanism for fuzzy rough sets.

As an A-space in a fuzzy environment, a fuzzy approximation space (FA-space) is the basis of fuzzy rough set theory. With the application of rough approximations in fuzzy environments, more and more research on fuzzy generalizations of rough approximations has been carried out [35, 36].

Uncertainty measurement is a significant research topic with extensive applications in many areas, such as data mining [9], medical diagnosis [15], feature selection [39], pattern recognition [3, 14] and machine learning [41]. Many scholars have explored uncertainty measurement in depth and obtained important results. Shannon [32] introduced the notion of entropy to measure the uncertainty of random variables. Since then, entropy theory has gradually become the main method for measuring the uncertainty of rough sets and information systems, and many generalizations of Shannon’s entropy have been presented. For example, Yao [44] provided a granularity measure from the point of view of granulation. Wierman [37] considered the measure of granularity in rough set theory. Dai et al. [7] investigated entropy measures and granularity measures for set-valued information systems. Liang et al. [22] researched information granules and entropy theory in an information system. Dai et al. [6] inquired into uncertainty measures such as entropy and granularity for covering rough set models. Li et al. [21] gave an entropy measure based on the Gaussian kernel for a fully fuzzy information system. Xie et al. [40] studied some new measures of uncertainty for an interval-valued information system. Li et al. [24] researched information entropy in a fuzzy relation information system.

1.2 Motivation and contributions

Although many scholars have studied FA-spaces, most of these studies implicitly assume that the objects in the universe are uniformly distributed. In practice, however, objects in the universe generally occur with different probabilities [13], so such studies cannot fully reflect the nature of the data. To address this problem, we study the combination of a probability distribution and an FA-space.

To integrate the probability distribution into an FA-space, we represent the fuzzy equivalence relation generated by an attribute subset with a relation matrix. By putting the probability distribution into this matrix, we obtain a new fuzzy relation matrix that includes not only the similarity between objects but also the probability distribution of objects. We thus obtain a fuzzy probabilistic approximation space (FPA-space), which can also be regarded as a special FA-space containing three kinds of uncertainty: fuzziness, randomness and roughness. Then, we study the uncertainty of FPA-spaces and propose new ways to express fuzzy information granulation, fuzzy information amount, fuzzy rough entropy and fuzzy information entropy using the relation matrix and the probability distribution. In addition, we prove that when the probability distribution of objects is uniform, these four measures reduce to the corresponding measures of general FA-spaces, which supports their rationality.

At present, no real-world data sets for FPA-spaces are available. We therefore analyze these four measures experimentally through an example. In this example, we use MATLAB to randomly generate 8 different fuzzy relation matrices in which the similarity between objects increases, and we treat the probability distribution as a parameter in order to analyze the effectiveness of the measures under different probability distributions. This experiment reveals the influence of both the similarity between objects and the probability distribution on these measures. Although real-data tests are lacking, these experiments show the advantages and disadvantages of the obtained measures more comprehensively. The four measures proposed in this paper can be used to construct heuristic functions in feature selection and can also be applied to weight selection in multi-classifier systems. Therefore, it is of significance to study uncertainty measures of FPA-spaces.

1.3 Comparison and discussion

In this part, we compare our work with related studies in order to clarify the innovations of this paper.

(1) Aggarwal [1] considered not only the probability of objects but also the randomness of membership grades. On this basis, Aggarwal proposed a probabilistic fuzzy domain information system (PFIS) and studied its entropy measure. Finally, Aggarwal studied a probabilistic variable precision rough set model. A main advantage of that work is that it considers the probability related to fuzzy membership.

(2) Hu et al. [17] introduced probability into an FA-space and obtained the theory of FPA-spaces. They introduced Shannon entropy to measure the information quantity in Pawlak’s A-spaces and proposed a new representation of Shannon entropy in terms of relation matrices. Based on the modified formula, they obtained generalizations of information entropy in FPA-spaces and FA-spaces, respectively.

(3) Based on the reflexive L-fuzzy relations, Qiao et al. [30] introduced the relationship between L-FA-spaces and L-fuzzy pretopological spaces from a categorical viewpoint.

(4) Yu [43] explored the upper and lower fuzzy sets generated by a given fuzzy relationship. Then, based on the upper and lower fuzzy information granules, Yu obtained two measures and then presented the average of the two measures as the uncertainty of FA-spaces. Moreover, Yu analyzed the effectiveness of the measures, and considered the relationships (namely, equality, dependence, independence and difference) between FA-spaces. Finally, Yu obtained the mathematical characteristics of FA-spaces.

(5) Yu et al. [45] studied the entropy measure of a fuzzy relation with a probability distribution. This measure can not only calculate the diversity of information fusion systems but also measure the uncertainty of granulated problem spaces. In addition, the weight vector of a multi-classifier system is regarded as a probability distribution, and the uncertainty of the system is calculated by this measure. It is found that the weights of the classifiers in a multi-classifier system affect the uncertainty of the fusion system. Therefore, the obtained measure can be used to solve the weight allocation problem of multi-classifier systems.

(6) Zhang et al. [49] defined probability measure and Pythagorean fuzzy approximation operator. On this basis, they introduced plausibility and belief functions of a Pythagorean fuzzy information system. Then they also discussed the relationship between Pythagorean fuzzy belief structure and Pythagorean FA-spaces. Finally, they also proposed a reduction algorithm for Pythagorean fuzzy decision information systems.

(7) Zhang et al. [50] further explored problems in β-fuzzy covering approximation spaces (β-FCASs). First, they put forward the concepts of I-reduct and I-irreducible element, which supplement the existing concepts. Next, they studied the relationship between a fuzzy β-covering and its I-reduct, and that between the fuzzy β-minimal description and the fuzzy β-reduct. In addition, they proposed some new notions relating two β-FCASs and studied their properties. Moreover, using these concepts, they gave conditions under which two fuzzy β-coverings have the same reduct. Finally, they further studied seven derivatives of β-FCASs and proposed the corresponding lattices.

(8) In this paper, we first show how a probability distribution is combined with a fuzzy relation: the probability distribution is put into the fuzzy relation matrix, which yields a new expression of the fuzzy relation matrix. Based on this matrix, we study new ways to express fuzzy information granularity, fuzzy information amount, fuzzy rough entropy and fuzzy information entropy. Finally, the effects of similarity and probability distribution on the four measures are analyzed by numerical experiments.

1.4 Organization

In this paper, we incorporate the probability distribution of an FPA-space into its fuzzy relation, so that the FPA-space induces an FA-space. In addition, we give some tools for measuring the uncertainty of an FPA-space.

The work process of this paper is shown in Fig. 1.

Fig. 1. The work flow of this paper.

As shown in Fig. 1, an FPA-space is first obtained by combining an FA-space with a probability distribution; it can also be regarded as a special FA-space whose fuzzy relation includes the probability distribution. Then, four tools for measuring uncertainty in FPA-spaces are studied. Finally, the validity of these four measures is analyzed through an example.

The remainder of this article is organized as follows. In Section 2, we recall some basic concepts about fuzzy relations and FPA-spaces, and obtain an FA-space by defining a fuzzy relation matrix. In Section 3, we study four tools for measuring uncertainty in an FPA-space. In Section 4, we present a numerical experiment and carry out an effectiveness analysis. Section 5 summarizes this paper.

2 Preliminaries

In this section, we mainly present some notions about fuzzy sets, fuzzy relations and FPA-spaces.

In this paper, U denotes a nonempty finite set and I denotes [0, 1]. Put $U = \{x_{1}, x_{2}, \dots, x_{n}\}$.

2.1 Fuzzy relations

F is called a fuzzy set on U if F is a function F : U → I.

For a ∈ I, $\bar{a}$ denotes the constant fuzzy set on U, namely, ∀ x ∈ U, $\bar{a} (x) = a .$

Throughout this paper, I^U indicates the collection of fuzzy sets on U .

Let F ∈ I^U. Then F is denoted as $F = \frac{F (x_{1})}{x_{1}} + \frac{F (x_{2})}{x_{2}} + \dots + \frac{F (x_{n})}{x_{n}}$ and $| F | = \sum_{i = 1}^{n} F (x_{i})$ expresses the cardinality of F .

If R is a fuzzy set in U × U, then R is said to be a fuzzy relation on U .

In this article, I^{U×U} denotes the collection of all fuzzy relations on U.

Put R ∈ I^U×U . Then R can be expressed by the following matrix ([43]) $M (R) = (\begin{matrix} r_{11} & r_{12} & \dots & r_{1 n} \\ r_{21} & r_{22} & \dots & r_{2 n} \\ \dots & \dots & \dots & \dots \\ r_{n 1} & r_{n 2} & \dots & r_{nn} \end{matrix}),$ where r_ij = R (x_i, x_j) expresses the degree of similarity between x_i and x_j .

Suppose R ∈ I^U×U. Then R is called reflexive, if R (x, x) =1 for any x ∈ U; R is called symmetric, if R (x, y) = R (y, x) for any x, y ∈ U; R is called transitive, if R (x, z) ≥ R (x, y) ∧ R (y, z) for any x, y, z ∈ U.

If R is reflexive and symmetric, then R is called a tolerance relation; if R is reflexive and transitive, then R is called a similarity relation; if R is reflexive, symmetric and transitive, then R is called an equivalence relation. Moreover, if M (R) = E (here, E is the identity matrix), then R is called the fuzzy identity relation on U, denoted by R = △; if R (x_i, x_j) = 1 for any i, j, then R is called the fuzzy universal relation on U, denoted by R = ω; if R (x_i, x_j) = 0 for any i, j, then R is called the fuzzy empty relation on U, denoted by R = o.
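The properties above can be checked directly on M(R). The Python sketch below is illustrative only: relations are stored as lists of lists, the helper names (`is_reflexive`, `classify`, etc.) are hypothetical, and transitivity is tested in the max-min form given in the text.

```python
# Illustrative checks of the relation properties on M(R) (a list of lists).

def is_reflexive(m):
    # R(x, x) = 1 for every x
    return all(m[i][i] == 1 for i in range(len(m)))

def is_symmetric(m):
    # R(x, y) = R(y, x) for every x, y
    n = len(m)
    return all(m[i][j] == m[j][i] for i in range(n) for j in range(n))

def is_transitive(m):
    # max-min transitivity: R(x, z) >= min(R(x, y), R(y, z)) for all x, y, z
    n = len(m)
    return all(m[i][k] >= min(m[i][j], m[j][k])
               for i in range(n) for j in range(n) for k in range(n))

def classify(m):
    # Return the most specific of the names used in the text
    r, s, t = is_reflexive(m), is_symmetric(m), is_transitive(m)
    if r and s and t:
        return "equivalence"
    if r and s:
        return "tolerance"
    if r and t:
        return "similarity"
    return "none"

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]      # M(R) = E
universal = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]     # R = omega
tol = [[1, 0.8, 0.2], [0.8, 1, 0.6], [0.2, 0.6, 1]]
sim = [[1, 0.5], [0.3, 1]]
print(classify(identity), classify(universal), classify(tol), classify(sim))
# equivalence equivalence tolerance similarity
```

In `tol`, R(x_1, x_3) = 0.2 is smaller than min(R(x_1, x_2), R(x_2, x_3)) = 0.6, so the relation is a tolerance but not an equivalence.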

Below, we give some operations and relations on I^U×U [46].

Let R, R′ ∈ I^{U×U} with $M (R) = (r_{ij})_{n \times n}, M (R^{'}) = (r_{ij}^{'})_{n \times n}$. Then

(1) R = R′ ⇔ $\forall i, j, r_{ij} = r_{ij}^{'}$ ;

(2) R ⊆ R′ ⇔ $\forall i, j, r_{ij} \leq r_{ij}^{'}$ ;

(3) R ⊂ R′ ⇔ R ⊆ R′ and R ≠ R′;

(4) $M (\sim R) = (1 - r_{ij})_{n \times n}$, where $\sim R$ denotes the complement of R.
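A minimal Python sketch of operations (1)-(4), assuming matrices are stored as lists of lists of the same shape; the function names are our own and are not taken from [46].

```python
# Illustrative versions of (1)-(4); matrices are lists of lists of equal shape.

def equal(m1, m2):
    # (1) R = R' iff r_ij = r'_ij for all i, j
    return all(a == b for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))

def subset(m1, m2):
    # (2) R is contained in R' iff r_ij <= r'_ij for all i, j
    return all(a <= b for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))

def proper_subset(m1, m2):
    # (3) R is properly contained in R' iff it is contained and R != R'
    return subset(m1, m2) and not equal(m1, m2)

def complement(m):
    # (4) M(~R) = (1 - r_ij)
    return [[1 - a for a in row] for row in m]

R = [[1.0, 0.5], [0.25, 1.0]]
S = [[1.0, 0.75], [0.25, 1.0]]
print(subset(R, S), proper_subset(R, S), proper_subset(R, R))  # True True False
print(complement(R))  # [[0.0, 0.5], [0.75, 0.0]]
```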

Let R ∈ I^{U×U}. For any x ∈ U, two fuzzy sets on U are defined as follows: $[x]^{R} (y) = R (x, y), \forall y \in U;$ $[x]_{R} (y) = R (y, x), \forall y \in U .$ Then $[x]^{R}$ and $[x]_{R}$ are called the upper-fuzzy and lower-fuzzy sets of x with respect to R, respectively.

Obviously, $[x_{i}]^{R} = \frac{r_{i 1}}{x_{1}} + \frac{r_{i 2}}{x_{2}} + \dots + \frac{r_{in}}{x_{n}},$ $[x_{i}]_{R} = \frac{r_{1 i}}{x_{1}} + \frac{r_{2 i}}{x_{2}} + \dots + \frac{r_{ni}}{x_{n}} .$ Then $| [x_{i}]^{R} | = \sum_{j = 1}^{n} r_{ij}, | [x_{i}]_{R} | = \sum_{j = 1}^{n} r_{ji} .$

2.2 FPA-spaces

Definition 2.1. ([8]) Let U be a nonempty and finite set of objects, called the universe. Then the ordered pair (U, R) is referred to as a fuzzy approximation space (FA-space), if R ∈ I^U×U .

Definition 2.2. ([17]) Let U = {x₁, x₂, ⋯ , x_n}. Suppose that the probability of occurrence of x_i is p_i (i = 1, 2, ⋯ , n). If 0 < p_i ≤ 1 (i = 1, 2, ⋯ , n) and $\sum_{i = 1}^{n} p_{i} = 1,$ then $P = {\frac{x_{1}, x_{2}, \dots, x_{n}}{p_{1}, p_{2}, \dots, p_{n}}}$ is called a probability distribution over U.

Definition 2.3. ([17]) Let U be a nonempty and finite set of objects, called the universe. Suppose R ∈ I^U×U . Then the ordered pair (U, P, R) is referred to as a fuzzy probabilistic approximation space (FPA-space), if P is a probability distribution over U .

Let (U, P, R) be an FPA-space where $P = {\frac{x_{1}, x_{2}, \dots, x_{n}}{p_{1}, p_{2}, \dots, p_{n}}}, M (R) = (r_{ij})_{n \times n} .$

Define $M (R_{P}) = (\begin{matrix} p_{1} r_{11} & p_{2} r_{12} & \dots & p_{n} r_{1 n} \\ p_{1} r_{21} & p_{2} r_{22} & \dots & p_{n} r_{2 n} \\ \dots & \dots & \dots & \dots \\ p_{1} r_{n 1} & p_{2} r_{n 2} & \dots & p_{n} r_{nn} \end{matrix}) .$

Then R_P is called the fuzzy relation induced by (U, P, R).

The matrix expression M (R_P) of the fuzzy relation R_P contains not only the similarity between objects but also the probability distribution. Thus, R_P reflects the internal characteristics of (U, P, R).

Suppose that (U, P, R) and (U, P, R′) are two FPA-spaces where $P = {\frac{x_{1}, x_{2}, \dots, x_{n}}{p_{1}, p_{2}, \dots, p_{n}}},$ $M (R) = (r_{ij})_{n \times n}, M (R^{'}) = (r_{ij}^{'})_{n \times n} .$ Then $M (R_{P}) = (p_{j} r_{ij})_{n \times n}, M (R_{P}^{'}) = (p_{j} r_{ij}^{'})_{n \times n} .$

Obviously,

$R = R^{'} \Leftrightarrow R_{P} = R_{P}^{'},$

$R \subseteq R^{'} \Leftrightarrow R_{P} \subseteq R_{P}^{'},$

$R \subset R^{'} \Leftrightarrow R_{P} \subset R_{P}^{'} .$

Example 2.4. Given U = {x₁, x₂, ⋯ , x₇}. Suppose $P = {\frac{x_{1}, x_{2}, x_{3}, x_{4}, x_{5}, x_{6}, x_{7}}{0.17, 0.22, 0.18, 0.14, 0.16, 0.03, 0.10}} .$ Put $M (R) = (\begin{matrix} 0.57 & 0.55 & 0.63 & 0.91 & 0.45 & 0.99 & 0.65 \\ 0.01 & 0.02 & 0.11 & 0.04 & 0.15 & 0.11 & 0.11 \\ 0.45 & 0.65 & 0.76 & 0.66 & 0.55 & 0.55 & 0.75 \\ 0.06 & 0.07 & 0.11 & 0.12 & 0.23 & 0.21 & 0.14 \\ 0.01 & 0.12 & 0.23 & 0.08 & 0.08 & 0.07 & 0.08 \\ 0.55 & 0.79 & 0.55 & 0.33 & 0.33 & 0.47 & 0.85 \\ 0.36 & 0.33 & 0.43 & 0.74 & 0.48 & 0.67 & 0.66 \end{matrix}) .$ Then (U, P, R) is an FPA-space. We have $M (R_{P}) = (\begin{matrix} 0.0969 & 0.1210 & 0.1134 & 0.1274 & 0.0720 & 0.0297 & 0.0650 \\ 0.0017 & 0.0044 & 0.0198 & 0.0056 & 0.0240 & 0.0033 & 0.0110 \\ 0.0765 & 0.1430 & 0.1368 & 0.0924 & 0.0880 & 0.0165 & 0.0750 \\ 0.0102 & 0.0154 & 0.0198 & 0.0168 & 0.0368 & 0.0063 & 0.0140 \\ 0.0017 & 0.0264 & 0.0414 & 0.0112 & 0.0128 & 0.0021 & 0.0080 \\ 0.0935 & 0.1738 & 0.0990 & 0.0462 & 0.0528 & 0.0141 & 0.0850 \\ 0.0612 & 0.0726 & 0.0774 & 0.1036 & 0.0768 & 0.0201 & 0.0660 \end{matrix}) .$
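The construction of M(R_P) scales column j of M(R) by p_j. The following Python sketch (the function name `induced_relation` is hypothetical) reproduces the matrix of Example 2.4 to four decimals.

```python
def induced_relation(p, m):
    """Return M(R_P), whose (i, j) entry is p_j * r_ij."""
    n = len(p)
    if abs(sum(p) - 1) > 1e-9 or any(pi <= 0 for pi in p):
        raise ValueError("P must be a probability distribution with p_i > 0")
    return [[p[j] * m[i][j] for j in range(n)] for i in range(n)]

# Data of Example 2.4
P = [0.17, 0.22, 0.18, 0.14, 0.16, 0.03, 0.10]
M = [
    [0.57, 0.55, 0.63, 0.91, 0.45, 0.99, 0.65],
    [0.01, 0.02, 0.11, 0.04, 0.15, 0.11, 0.11],
    [0.45, 0.65, 0.76, 0.66, 0.55, 0.55, 0.75],
    [0.06, 0.07, 0.11, 0.12, 0.23, 0.21, 0.14],
    [0.01, 0.12, 0.23, 0.08, 0.08, 0.07, 0.08],
    [0.55, 0.79, 0.55, 0.33, 0.33, 0.47, 0.85],
    [0.36, 0.33, 0.43, 0.74, 0.48, 0.67, 0.66],
]
MP = induced_relation(P, M)
for row in MP:
    print(" ".join(f"{v:.4f}" for v in row))
# first printed row: 0.0969 0.1210 0.1134 0.1274 0.0720 0.0297 0.0650
```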

3 Uncertainty measurement for an FPA-space

In this section, we study four tools to measure the uncertainty of an FPA-space.

3.1 Granulation measurement for an FPA-space

Definition 3.1. ([43]) Let (U, R) be an FA-space. Then the upper information granulation, lower information granulation and information granulation of (U, R) are defined as follows: $G_{u} (R) = \frac{1}{n^{2}} \sum_{i = 1}^{n} | [x_{i}]^{R} |, G_{l} (R) = \frac{1}{n^{2}} \sum_{i = 1}^{n} | [x_{i}]_{R} | .$ $G (R) = \frac{1}{2} (G_{u} (R) + G_{l} (R)) .$

Definition 3.2. Let (U, P, R) be an FPA-space.

(1) Upper-information granulation of (U, P, R) is defined as $G_{u} (P, R) = \sum_{i = 1}^{n} p_{i} | [x_{i}]^{R_{P}} | .$

(2) Lower-information granulation of (U, P, R) is defined as $G_{l} (P, R) = \sum_{i = 1}^{n} p_{i} | [x_{i}]_{R_{P}} | .$

(3) Information granulation of (U, P, R) is defined as $G (P, R) = \frac{1}{2} (G_{u} (P, R) + G_{l} (P, R)) .$

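Clearly, $G_{u} (P, R) + G_{u} (P, \sim R) = 1$ and $G_{l} (P, R) + G_{l} (P, \sim R) = \sum_{i = 1}^{n} {np}_{i}^{2}$, where $\sim R$ is the complement of R; the latter sum equals 1 only when P is uniform. Moreover, if P is uniform, then $G_{u} (P, R) = G_{l} (P, R^{- 1})$, where $R^{- 1} (x, y) = R (y, x)$.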
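These identities can be checked numerically with the expanded formulas for G_u(P, R) and G_l(P, R). The Python sketch below uses randomly generated, purely hypothetical data; it confirms the upper identity and shows that, for the lower granulation, the sum equals n Σ p_i², which is 1 only when P is uniform. It also checks G_u(P, R) = G_l(P, R^{-1}) in the uniform case, with R^{-1} taken as the transpose of M(R).

```python
import random

def g_upper(p, m):
    # G_u(P, R) = sum_i p_i * sum_j p_j * r_ij
    n = len(p)
    return sum(p[i] * p[j] * m[i][j] for i in range(n) for j in range(n))

def g_lower(p, m):
    # G_l(P, R) = sum_i p_i * sum_j p_i * r_ji
    n = len(p)
    return sum(p[i] * p[i] * m[j][i] for i in range(n) for j in range(n))

def complement(m):
    # M(~R) = (1 - r_ij)
    return [[1 - a for a in row] for row in m]

def inverse(m):
    # R^{-1}(x, y) = R(y, x): the transpose of M(R)
    return [list(col) for col in zip(*m)]

random.seed(0)  # hypothetical random data, for illustration only
n = 5
w = [random.random() + 0.1 for _ in range(n)]
total = sum(w)
P = [wi / total for wi in w]  # a non-uniform distribution
M = [[random.random() for _ in range(n)] for _ in range(n)]

upper_sum = g_upper(P, M) + g_upper(P, complement(M))
lower_sum = g_lower(P, M) + g_lower(P, complement(M))
n_sum_p2 = n * sum(pi * pi for pi in P)
print(upper_sum, lower_sum, n_sum_p2)  # ~1, then two equal values > 1

uniform = [1 / n] * n
print(g_upper(uniform, M), g_lower(uniform, inverse(M)))  # equal up to rounding
```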

Given $U = \{x_{1}, x_{2}, \dots, x_{n}\}$, $P = {\frac{x_{1}, x_{2}, \dots, x_{n}}{p_{1}, p_{2}, \dots, p_{n}}}$ and $M (R) = (r_{ij})_{n \times n}$.

Then $\begin{matrix} G_{u} (P, R) = \sum_{i = 1}^{n} p_{i} \sum_{j = 1}^{n} R_{P} (x_{i}, x_{j}) = \sum_{i = 1}^{n} p_{i} \sum_{j = 1}^{n} p_{j} r_{ij}, \\ G_{l} (P, R) = \sum_{i = 1}^{n} p_{i} \sum_{j = 1}^{n} R_{P} (x_{j}, x_{i}) = \sum_{i = 1}^{n} p_{i} \sum_{j = 1}^{n} p_{i} r_{ji} . \end{matrix}$

If $p_{1} = p_{2} = \dots = p_{n} = \frac{1}{n}$ , then $\begin{matrix} G_{u} (P, R) = \frac{1}{n^{2}} \sum_{i = 1}^{n} | [x_{i}]^{R} | = G_{u} (R) \\ G_{l} (P, R) = \frac{1}{n^{2}} \sum_{i = 1}^{n} | [x_{i}]_{R} | = G_{l} (R) \\ G (P, R) = \frac{1}{2} (G_{u} (P, R) + G_{l} (P, R)) \\ = \frac{1}{2} (G_{u} (R) + G_{l} (R)) = G (R) . \end{matrix}$
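As a sanity check of this reduction, the following Python sketch (helper names hypothetical, example matrix made up) evaluates Definition 3.2 with p_i = 1/n and Definition 3.1 on the same relation; the two triples should coincide up to floating-point error.

```python
def granulation_fpa(p, m):
    # Definition 3.2 via the expanded formulas:
    # G_u = sum_i p_i sum_j p_j r_ij,  G_l = sum_i p_i sum_j p_i r_ji
    n = len(p)
    gu = sum(p[i] * sum(p[j] * m[i][j] for j in range(n)) for i in range(n))
    gl = sum(p[i] * sum(p[i] * m[j][i] for j in range(n)) for i in range(n))
    return gu, gl, (gu + gl) / 2

def granulation_fa(m):
    # Definition 3.1: sums of |[x_i]^R| and of |[x_i]_R|, divided by n^2
    n = len(m)
    gu = sum(sum(m[i][j] for j in range(n)) for i in range(n)) / n**2
    gl = sum(sum(m[j][i] for j in range(n)) for i in range(n)) / n**2
    return gu, gl, (gu + gl) / 2

M = [[1.0, 0.3, 0.6],
     [0.2, 1.0, 0.9],
     [0.1, 0.4, 1.0]]
fpa = granulation_fpa([1 / 3] * 3, M)
fa = granulation_fa(M)
print(fpa)
print(fa)  # same triple as the line above, up to rounding error
```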

Example 3.3 (Continue to Example 2.4) $\begin{matrix} G_{u} (P, R) = \sum_{i = 1}^{7} p_{i} | [x_{i}]^{R_{P}} | \approx 0.3327, \\ G_{l} (P, R) = \sum_{i = 1}^{7} p_{i} | [x_{i}]_{R_{P}} | \approx 0.4216, \\ G (P, R) = \frac{1}{2} (G_{u} (P, R) + G_{l} (P, R)) \approx 0.3772 . \end{matrix}$

This example illustrates that $G_{u} (P, R) \neq G_{l} (P, R)$ in general. Thus, Definition 3.2 is reasonable.
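The values in Example 3.3 can be reproduced with a few lines of Python. This is a sketch using the expanded formulas for G_u(P, R) and G_l(P, R), with the data of Example 2.4 entered by hand.

```python
# Data of Example 2.4
P = [0.17, 0.22, 0.18, 0.14, 0.16, 0.03, 0.10]
M = [
    [0.57, 0.55, 0.63, 0.91, 0.45, 0.99, 0.65],
    [0.01, 0.02, 0.11, 0.04, 0.15, 0.11, 0.11],
    [0.45, 0.65, 0.76, 0.66, 0.55, 0.55, 0.75],
    [0.06, 0.07, 0.11, 0.12, 0.23, 0.21, 0.14],
    [0.01, 0.12, 0.23, 0.08, 0.08, 0.07, 0.08],
    [0.55, 0.79, 0.55, 0.33, 0.33, 0.47, 0.85],
    [0.36, 0.33, 0.43, 0.74, 0.48, 0.67, 0.66],
]
n = len(P)
# |[x_i]^{R_P}| = sum_j p_j r_ij  and  |[x_i]_{R_P}| = sum_j p_i r_ji
upper_card = [sum(P[j] * M[i][j] for j in range(n)) for i in range(n)]
lower_card = [sum(P[i] * M[j][i] for j in range(n)) for i in range(n)]
g_u = sum(P[i] * upper_card[i] for i in range(n))
g_l = sum(P[i] * lower_card[i] for i in range(n))
g = (g_u + g_l) / 2
print(f"{g_u:.4f} {g_l:.4f} {g:.4f}")  # 0.3327 0.4216 0.3772
```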

Proposition 3.4. Let (U, P, R) be an FPA-space. Then $0 \leq G_{u} (P, R) \leq 1 .$ Moreover, if R = o, then G_u (P, R) reaches the minimum value 0; if R = ω, then G_u (P, R) reaches the maximum value 1.

Proof. (1) By Definition 3.2,

$G_{u} (P, R) = \sum_{i = 1}^{n} p_{i} \sum_{j = 1}^{n} p_{j} r_{ij} .$

Obviously, ∀ i, j, 0 ≤ r_ij ≤ 1 .

This implies that ∀ i, $0 \leq \sum_{j = 1}^{n} p_{j} r_{ij} \leq \sum_{j = 1}^{n} p_{j} = 1 .$

Then $0 \leq \sum_{i = 1}^{n} p_{i} \sum_{j = 1}^{n} p_{j} r_{ij} \leq \sum_{i = 1}^{n} p_{i} = 1 .$

Thus $0 \leq G_{u} (P, R) \leq 1 .$

(2) Suppose R = o. Then ∀ i, j, r_ij = 0, so

$G_{u} (P, R) = \sum_{i = 1}^{n} p_{i} \sum_{j = 1}^{n} p_{j} r_{ij} = 0$ .

(3) Suppose R = ω. Then ∀ i, j, r_ij = 1. Thus

$G_{u} (P, R) = \sum_{i = 1}^{n} p_{i} \sum_{j = 1}^{n} p_{j} = \sum_{i = 1}^{n} p_{i} = 1 .$

Hence G_u (P, R) =1.□

Proposition 3.5. Let (U, P, R) be an FPA-space. Then $0 \leq G_{l} (P, R) \leq \sum_{i = 1}^{n} {np}_{i}^{2} .$ Moreover, if R = o, then G_l (P, R) reaches the minimum value 0; if R = ω, then G_l (P, R) reaches the maximum value $\sum_{i = 1}^{n} {np}_{i}^{2}$ .

Proof. (1) By Definition 3.2,

$G_{l} (P, R) = \sum_{i = 1}^{n} p_{i} \sum_{j = 1}^{n} p_{i} r_{ji} .$

Obviously, ∀ i, j, 0 ≤ r_ji ≤ 1 .

This implies that ∀ i, $0 \leq \sum_{j = 1}^{n} p_{i} r_{ji} \leq \sum_{j = 1}^{n} p_{i} = {np}_{i} .$

Then $0 \leq \sum_{i = 1}^{n} p_{i} \sum_{j = 1}^{n} p_{i} r_{ji} \leq \sum_{i = 1}^{n} {np}_{i}^{2} .$

Thus $0 \leq G_{l} (P, R) \leq \sum_{i = 1}^{n} {np}_{i}^{2} .$

(2) Suppose R = o. Then ∀ i, j, r_ji = 0. So

$G_{l} (P, R) = \sum_{i = 1}^{n} p_{i} \sum_{j = 1}^{n} p_{i} r_{ji} = 0$ .

Hence, G_l (P, R) achieves the minimum value 0 when R = o.

(3) Suppose R = ω. Then ∀ i, j, r_ji = 1. Thus

$G_{l} (P, R) = \sum_{i = 1}^{n} p_{i} \sum_{j = 1}^{n} p_{i} = \sum_{i = 1}^{n} {np}_{i}^{2} .$

This implies that $G_{l} (P, R) = \sum_{i = 1}^{n} {np}_{i}^{2}$ .

Hence, G_l (P, R) achieves the maximum value $\sum_{i = 1}^{n} {np}_{i}^{2}$ when R = ω.□

Proposition 3.6. Let (U, P, R) be an FPA-space. Then $0 \leq G (P, R) \leq \frac{1}{2} + \frac{1}{2} \sum_{i = 1}^{n} {np}_{i}^{2} .$ Moreover, if R = o, then G (P, R) reaches the minimum value 0; if R = ω, then G (P, R) reaches the maximum value $\frac{1}{2} + \frac{1}{2} \sum_{i = 1}^{n} {np}_{i}^{2} .$

Proof. It can be proved by Propositions 3.4 and 3.5. □
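A quick numerical illustration of Propositions 3.4-3.6, under a hypothetical distribution P = (0.1, 0.2, 0.3, 0.4): the empty relation o gives 0, the universal relation ω gives the upper bounds, and randomly generated relations stay inside the bounds. This is a check, not a proof.

```python
import random

def granulation_fpa(p, m):
    # Expanded formulas for G_u(P, R), G_l(P, R) and G(P, R)
    n = len(p)
    gu = sum(p[i] * p[j] * m[i][j] for i in range(n) for j in range(n))
    gl = sum(p[i] * p[i] * m[j][i] for i in range(n) for j in range(n))
    return gu, gl, (gu + gl) / 2

P = [0.1, 0.2, 0.3, 0.4]                   # hypothetical distribution
n = len(P)
bound_l = n * sum(pi * pi for pi in P)     # sum_i n p_i^2 (= 1.2 here)
empty = [[0.0] * n for _ in range(n)]      # R = o
universal = [[1.0] * n for _ in range(n)]  # R = omega

print(granulation_fpa(P, empty))      # (0.0, 0.0, 0.0)
print(granulation_fpa(P, universal))  # approximately (1.0, 1.2, 1.1)

# Random relations stay within the bounds of Propositions 3.4-3.6.
random.seed(3)
for _ in range(100):
    m = [[random.random() for _ in range(n)] for _ in range(n)]
    gu, gl, g = granulation_fpa(P, m)
    assert 0 <= gu <= 1 and 0 <= gl <= bound_l and 0 <= g <= 0.5 + 0.5 * bound_l
```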

Theorem 3.7. Let (U, P, T) and (U, P, Q) be two FPA-spaces.

(1) If T ⊂ Q, then G (P, T)< G (P, Q) ;

(2) If T ⊆ Q, then G (P, T) ≤ G (P, Q) .

Proof. (1) Since T ⊂ Q, we have T_P ⊂ Q_P. Then ∀ i, $[x_{i}]^{T_{P}} \subseteq [x_{i}]^{Q_{P}}$ and ∃ i′, $[x_{i'}]^{T_{P}} \subsetneq [x_{i'}]^{Q_{P}}$.

Thus ∀ i, j, T_P (x_i, x_j) ≤ Q_P (x_i, x_j) and ∃ i′, j′, T_P (x_i′, x_j′) < Q_P (x_i′, x_j′).

By Definition 3.2, $\begin{matrix} G_{u} (P, T) = \sum_{i = 1}^{n} p_{i} \sum_{j = 1}^{n} T_{P} (x_{i}, x_{j}), \\ G_{u} (P, Q) = \sum_{i = 1}^{n} p_{i} \sum_{j = 1}^{n} Q_{P} (x_{i}, x_{j}) . \end{matrix}$

Since p_i′ > 0, it follows that G_u (P, T) < G_u (P, Q).

In the same manner, we can prove that G_l (P, T) < G_l (P, Q).

Hence, G (P, T) < G (P, Q).

(2) The proof is similar to (1) .□

The above theorem indicates that the information granulation increases when the FPA-space becomes coarser and decreases when the FPA-space becomes finer.
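The strict monotonicity in Theorem 3.7 can be illustrated on a small hypothetical example in which T is obtained from Q by lowering a single entry, so that T ⊂ Q. The Python sketch below is only a check and is not part of the original analysis.

```python
def granulation(p, m):
    # G(P, R) = (G_u(P, R) + G_l(P, R)) / 2 via the expanded formulas
    n = len(p)
    gu = sum(p[i] * p[j] * m[i][j] for i in range(n) for j in range(n))
    gl = sum(p[i] * p[i] * m[j][i] for i in range(n) for j in range(n))
    return (gu + gl) / 2

P = [0.5, 0.3, 0.2]  # hypothetical distribution
Q = [[1.0, 0.6, 0.3],
     [0.6, 1.0, 0.8],
     [0.3, 0.8, 1.0]]
T = [row[:] for row in Q]
T[0][1] = 0.4        # lower one entry, so T is a proper subset of Q
print(granulation(P, T) < granulation(P, Q))  # True
```

Here the gap is (p_1 p_2 · 0.2 + p_2² · 0.2) / 2 = 0.024, coming from the single lowered entry.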

3.2 Entropy measurement for an FPA-space

3.2.1 Rough entropy measurement for an FPA-space

Definition 3.8. ([43]) Let (U, R) be an FA-space. Then the upper rough entropy, lower rough entropy and rough entropy of (U, R) are defined as follows: $\begin{matrix} (E_{r})_{u} (R) = - \sum_{i = 1}^{n} \frac{1}{n} {log}_{2} \frac{1}{| [x_{i}]^{R} |}, \\ (E_{r})_{l} (R) = - \sum_{i = 1}^{n} \frac{1}{n} {log}_{2} \frac{1}{| [x_{i}]_{R} |}, \\ E_{r} (R) = \frac{1}{2} ((E_{r})_{u} (R) + (E_{r})_{l} (R)) . \end{matrix}$

Definition 3.9. Let (U, P, R) be an FPA-space.

(1) Upper-rough entropy of (U, P, R) is defined as $(E_{r})_{u} (P, R) = - \sum_{i = 1}^{n} p_{i} {log}_{2} \frac{1}{| [x_{i}]^{R} |} .$

(2) Lower-rough entropy of (U, P, R) is defined as $(E_{r})_{l} (P, R) = - \sum_{i = 1}^{n} p_{i} {log}_{2} \frac{1}{| [x_{i}]_{R} |} .$

(3) Rough entropy of (U, P, R) is defined as $E_{r} (P, R) = \frac{1}{2} ((E_{r})_{u} (P, R) + (E_{r})_{l} (P, R)) .$

Given $U = \{x_{1}, x_{2}, \dots, x_{n}\}$, $P = {\frac{x_{1}, x_{2}, \dots, x_{n}}{p_{1}, p_{2}, \dots, p_{n}}}$ and $M (R) = (r_{ij})_{n \times n}$.

Then $\begin{matrix} (E_{r})_{u} (P, R) = - \sum_{i = 1}^{n} p_{i} {log}_{2} \frac{1}{| [x_{i}]^{R} |} \\ = \sum_{i = 1}^{n} p_{i} {log}_{2} \sum_{j = 1}^{n} r_{ij}, \\ (E_{r})_{l} (P, R) = - \sum_{i = 1}^{n} p_{i} {log}_{2} \frac{1}{| [x_{i}]_{R} |} \\ = \sum_{i = 1}^{n} p_{i} {log}_{2} \sum_{j = 1}^{n} r_{ji} . \end{matrix}$

If $p_{1} = p_{2} = \dots = p_{n} = \frac{1}{n}$ , then $\begin{matrix} (E_{r})_{u} (P, R) = - \sum_{i = 1}^{n} \frac{1}{n} {log}_{2} \frac{1}{| [x_{i}]^{R} |} = (E_{r})_{u} (R), \\ (E_{r})_{l} (P, R) = - \sum_{i = 1}^{n} \frac{1}{n} {log}_{2} \frac{1}{| [x_{i}]_{R} |} = (E_{r})_{l} (R), \\ E_{r} (P, R) = \frac{1}{2} ((E_{r})_{u} (P, R) + (E_{r})_{l} (P, R)) \\ = \frac{1}{2} ((E_{r})_{u} (R) + (E_{r})_{l} (R)) = E_{r} (R) . \end{matrix}$
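The following Python sketch (helper names hypothetical, example matrix made up) evaluates Definition 3.9 with p_i = 1/n and Definition 3.8 on the same relation, illustrating the reduction above. As in Definition 3.9, the cardinalities are taken with respect to R rather than R_P. The code assumes all row and column sums of M(R) are positive, since `math.log2` raises `ValueError` at 0.

```python
from math import log2

def rough_entropy_fpa(p, m):
    # Definition 3.9: weights p_i, cardinalities taken from R itself (not R_P).
    # Assumes every row and column sum of M(R) is positive.
    n = len(p)
    eu = sum(p[i] * log2(sum(m[i][j] for j in range(n))) for i in range(n))
    el = sum(p[i] * log2(sum(m[j][i] for j in range(n))) for i in range(n))
    return eu, el, (eu + el) / 2

def rough_entropy_fa(m):
    # Definition 3.8, written literally as -(1/n) * log2(1 / |[x_i]|)
    n = len(m)
    eu = -sum(log2(1 / sum(m[i][j] for j in range(n))) / n for i in range(n))
    el = -sum(log2(1 / sum(m[j][i] for j in range(n))) / n for i in range(n))
    return eu, el, (eu + el) / 2

M = [[1.0, 0.3, 0.6],
     [0.2, 1.0, 0.9],
     [0.1, 0.4, 1.0]]
fpa = rough_entropy_fpa([1 / 3] * 3, M)
fa = rough_entropy_fa(M)
print(fpa)
print(fa)  # same triple as the line above, up to rounding error
```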

Example 3.10. (Continue to Example 2.4) $\begin{matrix} (E_{r})_{u} (P, R) = - \sum_{i = 1}^{7} p_{i} {log}_{2} \frac{1}{| [x_{i}]^{R} |} \approx 0.7166, \\ (E_{r})_{l} (P, R) = - \sum_{i = 1}^{7} p_{i} {log}_{2} \frac{1}{| [x_{i}]_{R} |} \approx 1.3561, \\ E_{r} (P, R) = \frac{1}{2} ((E_{r})_{u} (P, R) + (E_{r})_{l} (P, R)) \approx 1.0363 . \end{matrix}$

This example illustrates that $(E_{r})_{u} (P, R) \neq (E_{r})_{l} (P, R)$ in general. Thus, Definition 3.9 is reasonable.
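The values in Example 3.10 can be reproduced by the following Python sketch, which weights log₂ of the row sums |[x_i]^R| and column sums |[x_i]_R| of M(R) from Example 2.4 by P.

```python
from math import log2

# Data of Example 2.4
P = [0.17, 0.22, 0.18, 0.14, 0.16, 0.03, 0.10]
M = [
    [0.57, 0.55, 0.63, 0.91, 0.45, 0.99, 0.65],
    [0.01, 0.02, 0.11, 0.04, 0.15, 0.11, 0.11],
    [0.45, 0.65, 0.76, 0.66, 0.55, 0.55, 0.75],
    [0.06, 0.07, 0.11, 0.12, 0.23, 0.21, 0.14],
    [0.01, 0.12, 0.23, 0.08, 0.08, 0.07, 0.08],
    [0.55, 0.79, 0.55, 0.33, 0.33, 0.47, 0.85],
    [0.36, 0.33, 0.43, 0.74, 0.48, 0.67, 0.66],
]
n = len(P)
row = [sum(M[i]) for i in range(n)]                       # |[x_i]^R|
col = [sum(M[j][i] for j in range(n)) for i in range(n)]  # |[x_i]_R|
e_u = sum(P[i] * log2(row[i]) for i in range(n))
e_l = sum(P[i] * log2(col[i]) for i in range(n))
e_r = (e_u + e_l) / 2
print(f"{e_u:.4f} {e_l:.4f} {e_r:.4f}")  # 0.7166 1.3561 1.0363
```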

Proposition 3.11. Let (U, P, R) be an FPA-space. Then $- \infty < E_{r} (P, R) \leq \sum_{i = 1}^{n} p_{i} \log_{2} n .$ If R is reflexive, then $0 \leq E_{r} (P, R) \leq \sum_{i = 1}^{n} p_{i} \log_{2} n .$ Furthermore, if R = ω, then E_r (P, R) reaches the maximum value $\sum_{i = 1}^{n} p_{i} \log_{2} n$ .

Proof. (1) By Definition 3.9, $(E_{r})_{u} (P, R) = \sum_{i = 1}^{n} p_{i} {log}_{2} \sum_{j = 1}^{n} r_{ij} .$

∀ i, j, $0 \leq r_{ij} \leq 1 .$

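So, provided that every row of M(R) contains a positive entry (otherwise the logarithm below is undefined), $\forall i, 0 < \sum_{j = 1}^{n} r_{ij} \leq \sum_{j = 1}^{n} 1 = n .$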

This means that $- \infty < {log}_{2} \sum_{j = 1}^{n} r_{ij} \leq \log_{2} n .$

Thus $- \infty < (E_{r})_{u} (P, R) \leq \sum_{i = 1}^{n} p_{i} \log_{2} n .$

Similarly, we can prove that $- \infty < (E_{r})_{l} (P, R) \leq \sum_{i = 1}^{n} p_{i} \log_{2} n .$

Hence $- \infty < E_{r} (P, R) \leq \sum_{i = 1}^{n} p_{i} \log_{2} n .$

(2) Suppose that R is reflexive. Then ∀ i, r_ii = 1.

So $\forall i, 1 \leq \sum_{j = 1}^{n} r_{ij} \leq n .$

Thus $\forall i, 0 \leq {log}_{2} (\sum_{j = 1}^{n} r_{ij}) \leq {log}_{2} n .$

This shows that $0 \leq (E_{r})_{u} (P, R) \leq \sum_{i = 1}^{n} p_{i} \log_{2} n .$

Similarly, we can prove that $0 \leq (E_{r})_{l} (P, R) \leq \sum_{i = 1}^{n} p_{i} \log_{2} n .$

Hence $0 \leq E_{r} (P, R) \leq \sum_{i = 1}^{n} p_{i} \log_{2} n .$

(3) Suppose R = ω. Then ∀ i, j, r_ij = 1. So $(E_{r})_{u} (P, R) = (E_{r})_{l} (P, R) = \sum_{i = 1}^{n} p_{i} \log_{2} n .$

Thus $E_{r} (P, R) = \sum_{i = 1}^{n} p_{i} \log_{2} n .$ □
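A small numerical illustration of Proposition 3.11 under a hypothetical distribution P = (0.1, 0.2, 0.3, 0.4). Since Σ p_i = 1, the bound Σ p_i log₂ n is simply log₂ n. The universal relation ω attains it, the fuzzy identity relation gives 0 (each |[x_i]| equals 1), and randomly generated reflexive relations stay within [0, log₂ n].

```python
import random
from math import log2

def rough_entropy(p, m):
    # E_r(P, R) per Definition 3.9 (cardinalities taken from R)
    n = len(p)
    eu = sum(p[i] * log2(sum(m[i])) for i in range(n))
    el = sum(p[i] * log2(sum(m[j][i] for j in range(n))) for i in range(n))
    return (eu + el) / 2

P = [0.1, 0.2, 0.3, 0.4]  # hypothetical distribution
n = len(P)
universal = [[1.0] * n for _ in range(n)]
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
print(rough_entropy(P, universal), log2(n))  # both 2.0, up to rounding
print(rough_entropy(P, identity))            # 0.0

random.seed(4)
for _ in range(100):
    m = [[random.random() for _ in range(n)] for _ in range(n)]
    for i in range(n):
        m[i][i] = 1.0  # make R reflexive
    assert 0 <= rough_entropy(P, m) <= log2(n) + 1e-12
```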

Proposition 3.12. Let (U, P, T) and (U, P, Q) be two FPA-spaces.

(1) If T ⊂ Q, then E_r (P, T) < E_r (P, Q).

(2) If T ⊆ Q, then E_r (P, T) ≤ E_r (P, Q).

Proof. (1) Since T ⊂ Q, we have ∀ i, j, T (x_i, x_j) ≤ Q (x_i, x_j) and ∃ i′, j′, T (x_i′, x_j′) < Q (x_i′, x_j′). Hence ∀ i, $| [x_{i}]^{T} | \leq | [x_{i}]^{Q} |$ and $| [x_{i'}]^{T} | < | [x_{i'}]^{Q} |$.

By Definition 3.9, $\begin{matrix} (E_{r})_{u} (P, T) = - \sum_{i = 1}^{n} p_{i} {log}_{2} \frac{1}{| [x_{i}]^{T} |} \\ = \sum_{i = 1}^{n} p_{i} {log}_{2} \sum_{j = 1}^{n} T (x_{i}, x_{j}), \\ (E_{r})_{u} (P, Q) = - \sum_{i = 1}^{n} p_{i} {log}_{2} \frac{1}{| [x_{i}]^{Q} |} \\ = \sum_{i = 1}^{n} p_{i} {log}_{2} \sum_{j = 1}^{n} Q (x_{i}, x_{j}) . \end{matrix}$

Since log₂ is strictly increasing and p_i′ > 0, we obtain $(E_{r})_{u} (P, T) < (E_{r})_{u} (P, Q) .$

Similarly, we can prove that (E_r) _l (P, T) < (E_r) _l (P, Q) .

Thus $E_{r} (P, T) < E_{r} (P, Q) .$

(2) The proof is similar to (1).□

The above proposition indicates that the rough entropy increases when the FPA-space becomes coarser and decreases when the FPA-space becomes finer.
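As with Theorem 3.7, the strict inequality of Proposition 3.12 can be illustrated on a hypothetical pair T ⊂ Q that differs in a single entry. This Python sketch is only a check, not a proof.

```python
from math import log2

def rough_entropy(p, m):
    # E_r(P, R) per Definition 3.9 (cardinalities taken from R)
    n = len(p)
    eu = sum(p[i] * log2(sum(m[i])) for i in range(n))
    el = sum(p[i] * log2(sum(m[j][i] for j in range(n))) for i in range(n))
    return (eu + el) / 2

P = [0.5, 0.3, 0.2]  # hypothetical distribution
Q = [[1.0, 0.6, 0.3],
     [0.6, 1.0, 0.8],
     [0.3, 0.8, 1.0]]
T = [row[:] for row in Q]
T[0][1] = 0.4        # T is now a proper subset of Q
print(rough_entropy(P, T) < rough_entropy(P, Q))  # True
```

Lowering r_12 shrinks both |[x_1]^T| and |[x_2]_T|, so both the upper and the lower rough entropy decrease.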

3.2.2 Information entropy measurement for an FPA-space

Definition 3.13. ([43]) Let (U, R) be an FA-space. Then the upper information entropy, lower information entropy and information entropy of (U, R) are defined as follows: $\begin{matrix} H_{u} (R) = - \sum_{i = 1}^{n} \frac{1}{n} {log}_{2} \frac{| [x_{i}]^{R} |}{n}, \\ H_{l} (R) = - \sum_{i = 1}^{n} \frac{1}{n} {log}_{2} \frac{| [x_{i}]_{R} |}{n}, \\ H (R) = \frac{1}{2} (H_{u} (R) + H_{l} (R)) . \end{matrix}$

Definition 3.14. Let (U, P, R) be an FPA-space.

(1) Upper-information entropy of (U, P, R) is defined as $H_{u} (P, R) = - \sum_{i = 1}^{n} p_{i} {log}_{2} | [x_{i}]^{R_{P}} | .$

(2) Lower-information entropy of (U, P, R) is defined as $H_{l} (P, R) = - \sum_{i = 1}^{n} p_{i} {log}_{2} | [x_{i}]_{R_{P}} | .$

(3) Information entropy of (U, P, R) is defined as

$H (P, R) = \frac{1}{2} (H_{u} (P, R) + H_{l} (P, R)) .$

Given $U = \{x_{1}, x_{2}, \dots, x_{n}\}$, $P = {\frac{x_{1}, x_{2}, \dots, x_{n}}{p_{1}, p_{2}, \dots, p_{n}}}$ and $M (R) = (r_{ij})_{n \times n}$.

Then $\begin{matrix} H_{u} (P, R) = - \sum_{i = 1}^{n} p_{i} {log}_{2} | [x_{i}]^{R_{P}} | \\ = - \sum_{i = 1}^{n} p_{i} {log}_{2} (\sum_{j = 1}^{n} p_{j} r_{ij}), \\ H_{l} (P, R) = - \sum_{i = 1}^{n} p_{i} {log}_{2} | [x_{i}]_{R_{P}} | \\ = - \sum_{i = 1}^{n} p_{i} {log}_{2} (\sum_{j = 1}^{n} p_{i} r_{ji}) . \end{matrix}$

If $p_{1} = p_{2} = \dots = p_{n} = \frac{1}{n}$ , then $\begin{matrix} H_{u} (P, R) = - \sum_{i = 1}^{n} \frac{1}{n} {log}_{2} \frac{| [x_{i}]^{R} |}{n} = H_{u} (R), \\ H_{l} (P, R) = - \sum_{i = 1}^{n} \frac{1}{n} {log}_{2} \frac{| [x_{i}]_{R} |}{n} = H_{l} (R), \\ H (P, R) = \frac{1}{2} (H_{u} (P, R) + H_{l} (P, R)) \\ = \frac{1}{2} (H_{u} (R) + H_{l} (R)) = H (R) . \end{matrix}$
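The sketch below (helper names hypothetical, example matrix made up) evaluates Definition 3.14 with p_i = 1/n and Definition 3.13 on the same relation, illustrating the reduction to H(R). It assumes every row and column of M(R) has a positive entry so that the logarithms are defined.

```python
from math import log2

def info_entropy_fpa(p, m):
    # Definition 3.14: H_u = -sum_i p_i log2(sum_j p_j r_ij),
    #                  H_l = -sum_i p_i log2(sum_j p_i r_ji)
    n = len(p)
    hu = -sum(p[i] * log2(sum(p[j] * m[i][j] for j in range(n))) for i in range(n))
    hl = -sum(p[i] * log2(sum(p[i] * m[j][i] for j in range(n))) for i in range(n))
    return hu, hl, (hu + hl) / 2

def info_entropy_fa(m):
    # Definition 3.13: -(1/n) * sum_i log2(|[x_i]| / n)
    n = len(m)
    hu = -sum(log2(sum(m[i][j] for j in range(n)) / n) / n for i in range(n))
    hl = -sum(log2(sum(m[j][i] for j in range(n)) / n) / n for i in range(n))
    return hu, hl, (hu + hl) / 2

M = [[1.0, 0.3, 0.6],
     [0.2, 1.0, 0.9],
     [0.1, 0.4, 1.0]]
fpa = info_entropy_fpa([1 / 3] * 3, M)
fa = info_entropy_fa(M)
print(fpa)
print(fa)  # same triple as the line above, up to rounding error
```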

Example 3.15. (Continue to Example 2.4) $\begin{matrix} H_{u} (P, R) = - \sum_{i = 1}^{n} p_{i} {log}_{2} | [x_{i}]^{R_{P}} | \approx 2.1649, \\ H_{l} (P, R) = - \sum_{i = 1}^{n} p_{i} {log}_{2} | [x_{i}]_{R_{P}} | \approx 1.3085, \\ H (P, R) = \frac{1}{2} (H_{u} (P, R) + H_{l} (P, R)) \approx 1.7367 . \end{matrix}$

This example illustrates that H_u (P, R)≠H_l (P, R) . Thus, Definition 3.14 is reasonable.
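The values of Example 3.15 can be reproduced with the following Python sketch, using the data of Example 2.4 and the expanded formulas for H_u(P, R) and H_l(P, R).

```python
from math import log2

# Data of Example 2.4
P = [0.17, 0.22, 0.18, 0.14, 0.16, 0.03, 0.10]
M = [
    [0.57, 0.55, 0.63, 0.91, 0.45, 0.99, 0.65],
    [0.01, 0.02, 0.11, 0.04, 0.15, 0.11, 0.11],
    [0.45, 0.65, 0.76, 0.66, 0.55, 0.55, 0.75],
    [0.06, 0.07, 0.11, 0.12, 0.23, 0.21, 0.14],
    [0.01, 0.12, 0.23, 0.08, 0.08, 0.07, 0.08],
    [0.55, 0.79, 0.55, 0.33, 0.33, 0.47, 0.85],
    [0.36, 0.33, 0.43, 0.74, 0.48, 0.67, 0.66],
]
n = len(P)
# |[x_i]^{R_P}| = sum_j p_j r_ij  and  |[x_i]_{R_P}| = p_i * sum_j r_ji
h_u = -sum(P[i] * log2(sum(P[j] * M[i][j] for j in range(n))) for i in range(n))
h_l = -sum(P[i] * log2(P[i] * sum(M[j][i] for j in range(n))) for i in range(n))
h = (h_u + h_l) / 2
print(f"{h_u:.4f} {h_l:.4f} {h:.4f}")  # 2.1649 1.3085 1.7367
```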

Proposition 3.16. Let (U, P, R) be an FPA-space. Then $0 \leq H_{u} (P, R) < + \infty .$ Moreover, if R is reflexive, then $0 \leq H_{u} (P, R) \leq - \sum_{i = 1}^{n} p_{i} {log}_{2} p_{i};$ if R = ω, then H_u (P, R) achieves the minimum value 0.

Proof. (1) By Definition 3.14, $H_{u} (P, R) = - \sum_{i = 1}^{n} p_{i} {log}_{2} (\sum_{j = 1}^{n} p_{j} r_{ij}) .$

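It should be noted that ∀ i, j, 0 ≤ r_ij ≤ 1. Then $\forall i, 0 \leq \sum_{j = 1}^{n} p_{j} r_{ij} \leq \sum_{j = 1}^{n} p_{j} = 1 .$ Here we assume, as Definition 3.14 implicitly requires, that every row of M(R) contains a positive entry, so that $\sum_{j = 1}^{n} p_{j} r_{ij} > 0$ and the logarithm is defined.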

Thus ∀ i, $- \infty < {log}_{2} (\sum_{j = 1}^{n} p_{j} r_{ij}) \leq 0 .$

This implies that $\forall i, 0 \leq - p_{i} {log}_{2} (\sum_{j = 1}^{n} p_{j} r_{ij}) < + \infty .$

Hence $0 \leq H_{u} (P, R) < + \infty .$

(2) Suppose that R is reflexive. Then ∀ i, r_ii = 1 . Thus $\forall i, p_{i} = p_{i} r_{ii} \leq \sum_{j = 1}^{n} p_{j} r_{ij} .$

Thus $\forall i, p_{i} {log}_{2} p_{i} \leq p_{i} {log}_{2} (\sum_{j = 1}^{n} p_{j} r_{ij}) .$

Hence $H_{u} (P, R) \leq - \sum_{i = 1}^{n} p_{i} {log}_{2} p_{i} .$

By (1), $0 \leq H_{u} (P, R) \leq - \sum_{i = 1}^{n} p_{i} {log}_{2} p_{i} .$

(3) Suppose R = ω . Then ∀ i, j, r_ij = 1 . Thus $\forall i, {log}_{2} (\sum_{j = 1}^{n} p_{j} r_{ij}) = {log}_{2} (\sum_{j = 1}^{n} p_{j}) = {log}_{2} 1 = 0 .$

This implies that H_u (P, R) = 0. Hence, H_u (P, R) achieves the minimum value 0 when R = ω.□

Proposition 3.17. Let (U, P, R) be an FPA-space. Then $- \sum_{i = 1}^{n} p_{i} {log}_{2} ({np}_{i}) \leq H_{l} (P, R) < + \infty .$ Moreover, if R is reflexive, then $- \sum_{i = 1}^{n} p_{i} {log}_{2} ({np}_{i}) \leq H_{l} (P, R) \leq - \sum_{i = 1}^{n} p_{i} {log}_{2} p_{i};$ if R = ω, then H_l (P, R) achieves the minimum value $- \sum_{i = 1}^{n} p_{i} {log}_{2} ({np}_{i}) .$

Proof. (1) By Definition 3.14, $H_{l}(P, R) = -\sum_{i=1}^{n} p_{i} \log_{2} \left(\sum_{j=1}^{n} p_{i} r_{ji}\right).$

Note that $0 \leq r_{ji} \leq 1$ for all i, j. Then $0 \leq \sum_{j=1}^{n} p_{i} r_{ji} \leq \sum_{j=1}^{n} p_{i} = n p_{i}$ for all i. Since this sum appears inside a logarithm in Definition 3.14, it is taken to be positive.

Thus $-\infty < \log_{2} \left(\sum_{j=1}^{n} p_{i} r_{ji}\right) \leq \log_{2} (n p_{i})$ for all i.

This implies that $-p_{i} \log_{2} (n p_{i}) \leq -p_{i} \log_{2} \left(\sum_{j=1}^{n} p_{i} r_{ji}\right) < +\infty$ for all i.

Hence $-\sum_{i=1}^{n} p_{i} \log_{2} (n p_{i}) \leq H_{l}(P, R) < +\infty.$

(2) Suppose that R is reflexive. Then $r_{ii} = 1$ for all i. This implies that $p_{i} = p_{i} r_{ii} \leq \sum_{j=1}^{n} p_{i} r_{ji}$ for all i.

Thus $p_{i} \log_{2} p_{i} \leq p_{i} \log_{2} \left(\sum_{j=1}^{n} p_{i} r_{ji}\right)$ for all i.

Hence $H_{l}(P, R) \leq -\sum_{i=1}^{n} p_{i} \log_{2} p_{i}.$

By (1), $-\sum_{i=1}^{n} p_{i} \log_{2} (n p_{i}) \leq H_{l}(P, R) \leq -\sum_{i=1}^{n} p_{i} \log_{2} p_{i}.$

(3) Suppose R = ω. Then $r_{ij} = 1$ for all i, j. Thus $\log_{2} \left(\sum_{j=1}^{n} p_{i} r_{ji}\right) = \log_{2} \left(\sum_{j=1}^{n} p_{i}\right) = \log_{2} (n p_{i})$ for all i.

This implies that $H_{l}(P, R) = -\sum_{i=1}^{n} p_{i} \log_{2} (n p_{i})$.

Hence, by (1), H_l(P, R) attains its minimum value $-\sum_{i=1}^{n} p_{i} \log_{2} (n p_{i})$ when R = ω. □

Theorem 3.18. Let (U, P, R) be an FPA-space. Then $-\frac{1}{2} \sum_{i=1}^{n} p_{i} \log_{2} (n p_{i}) \leq H(P, R) < +\infty$. Moreover, if R is reflexive, then $-\frac{1}{2} \sum_{i=1}^{n} p_{i} \log_{2} (n p_{i}) \leq H(P, R) \leq -\sum_{i=1}^{n} p_{i} \log_{2} p_{i}$; if R = ω, then H(P, R) attains its minimum value $-\frac{1}{2} \sum_{i=1}^{n} p_{i} \log_{2} (n p_{i})$.

Proof. This follows from Propositions 3.16 and 3.17, since $H(P, R) = \frac{1}{2}\left(H_{u}(P, R) + H_{l}(P, R)\right)$. □
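The bounds in Theorem 3.18 can be spot-checked numerically. The Python sketch below is illustrative only: it draws a random distribution with every p_i > 0 (needed for the logarithms) and a random reflexive fuzzy relation, and compares H(P, R) with the bounds of the theorem; the seed and the size n = 5 are arbitrary choices, not taken from the paper.

```python
import math
import random

def H(p, R):
    """H(P, R) = (H_u + H_l) / 2 from Definition 3.14; assumes every p_i > 0."""
    n = len(p)
    h_u = -sum(p[i] * math.log2(sum(p[j] * R[i][j] for j in range(n))) for i in range(n))
    h_l = -sum(p[i] * math.log2(p[i] * sum(R[j][i] for j in range(n))) for i in range(n))
    return (h_u + h_l) / 2

random.seed(0)
n = 5
weights = [random.random() + 0.1 for _ in range(n)]
total = sum(weights)
p = [w / total for w in weights]                     # random P with all p_i > 0
R = [[1.0 if i == j else random.random() for j in range(n)]
     for i in range(n)]                              # random reflexive relation
omega = [[1.0] * n for _ in range(n)]                # universal relation R = ω

low = -0.5 * sum(pi * math.log2(n * pi) for pi in p)    # lower bound of Theorem 3.18
high = -sum(pi * math.log2(pi) for pi in p)             # upper bound for reflexive R
h_random, h_omega = H(p, R), H(p, omega)
```

Note that the lower bound is never positive, since Σ p_i log2(n p_i) = log2 n + Σ p_i log2 p_i ≥ 0.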

Theorem 3.19. Let (U, P, T) and (U, P, Q) be two FPA-spaces.

(1) If T ⊂ Q, then H (P, Q) < H (P, T).

(2) If T ⊆ Q, then H (P, Q) ≤ H (P, T).

Proof. (1) Since T ⊂ Q, we have T_P ⊂ Q_P. Then $[x_{i}]^{T_{P}} \subseteq [x_{i}]^{Q_{P}}$ for all i, and $[x_{i'}]^{T_{P}} \subsetneq [x_{i'}]^{Q_{P}}$ for some i′.

So T_P(x_i, x_j) ≤ Q_P(x_i, x_j) for all i, j, and T_P(x_i′, x_j′) < Q_P(x_i′, x_j′) for some i′, j′.

By Definition 3.14, $H_{u}(P, T) = -\sum_{i=1}^{n} p_{i} \log_{2} \sum_{j=1}^{n} [x_{i}]^{T_{P}}(x_{j})$, that is,

$H_{u}(P, T) = -\sum_{i=1}^{n} p_{i} \log_{2} \left(\sum_{j=1}^{n} T_{P}(x_{i}, x_{j})\right).$

Similarly, $H_{u}(P, Q) = -\sum_{i=1}^{n} p_{i} \log_{2} \left(\sum_{j=1}^{n} Q_{P}(x_{i}, x_{j})\right).$

Since $\sum_{j=1}^{n} T_{P}(x_{i}, x_{j}) \leq \sum_{j=1}^{n} Q_{P}(x_{i}, x_{j})$ for every i, with strict inequality for i = i′, and $\log_{2}$ is strictly increasing, we obtain H_u(P, Q) < H_u(P, T) (provided p_{i′} > 0).

In the same manner, we can prove that H_l (P, Q) < H_l (P, T) .

Hence, H (P, Q) < H (P, T) .

(2) The proof is similar to (1) .□

The above theorem indicates that the information entropy increases as the FPA-space becomes finer; conversely, it decreases as the FPA-space becomes coarser.
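Theorem 3.19 can be illustrated on a small hypothetical example in which some memberships of a fuzzy relation are enlarged, making the FPA-space coarser. In the Python sketch below, T ⊂ Q is read entrywise (T ≤ Q everywhere, with strict inequality somewhere), all p_i are positive, and the matrices are made up for illustration.

```python
import math

def H(p, R):
    """H(P, R) = (H_u + H_l) / 2 (Definition 3.14); assumes every p_i > 0."""
    n = len(p)
    h_u = -sum(p[i] * math.log2(sum(p[j] * R[i][j] for j in range(n))) for i in range(n))
    h_l = -sum(p[i] * math.log2(p[i] * sum(R[j][i] for j in range(n))) for i in range(n))
    return (h_u + h_l) / 2

def strictly_contained(T, Q):
    """T ⊂ Q read entrywise: T <= Q everywhere and T < Q somewhere."""
    pairs = [(t, q) for row_t, row_q in zip(T, Q) for t, q in zip(row_t, row_q)]
    return all(t <= q for t, q in pairs) and any(t < q for t, q in pairs)

p = [0.5, 0.3, 0.2]
T = [[1.0, 0.2, 0.0],
     [0.1, 1.0, 0.3],
     [0.0, 0.4, 1.0]]
Q = [[1.0, 0.6, 0.0],     # T(x_1, x_2) raised from 0.2 to 0.6
     [0.1, 1.0, 0.3],
     [0.5, 0.4, 1.0]]     # T(x_3, x_1) raised from 0.0 to 0.5
```

Here H(P, Q) < H(P, T): the coarser space has the smaller entropy.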

Other types of information entropy, such as joint entropy, conditional entropy, cross entropy and relative entropy, can be defined in the same way. Owing to space limitations, we only give their definitions here and do not discuss them in depth.

Definition 3.20. Let (U, P, T) and (U, P, Q) be two FPA-spaces.

(1) The upper-joint entropy of Q and T is defined as $H_{u}(P, Q \cup T) = -\sum_{i=1}^{n} p_{i} \log_{2} |[x_{i}]^{Q_{P} \cap T_{P}}|.$

(2) The lower-joint entropy of Q and T is defined as $H_{l}(P, Q \cup T) = -\sum_{i=1}^{n} p_{i} \log_{2} |[x_{i}]_{Q_{P} \cap T_{P}}|.$

(3) The joint entropy of Q and T is defined as

$H (P, Q \cup T) = \frac{1}{2} (H_{u} (P, Q \cup T) + H_{l} (P, Q \cup T)) .$

Definition 3.21. Let (U, P, T) and (U, P, Q) be two FPA-spaces.

(1) The upper-conditional entropy of Q given T is defined as $H_{u}(P, Q | T) = -\sum_{i=1}^{n} p_{i} \log_{2} \frac{|[x_{i}]^{Q_{P} \cap T_{P}}|}{|[x_{i}]^{T_{P}}|}.$

(2) The lower-conditional entropy of Q given T is defined as $H_{l}(P, Q | T) = -\sum_{i=1}^{n} p_{i} \log_{2} \frac{|[x_{i}]_{Q_{P} \cap T_{P}}|}{|[x_{i}]_{T_{P}}|}.$

(3) The conditional entropy of Q given T is defined as

$H (P, Q | T) = \frac{1}{2} (H_{u} (P, Q | T) + H_{l} (P, Q | T)) .$
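Splitting the logarithm of the quotient gives the chain rule H_u(P, Q|T) = H_u(P, Q ∪ T) - H_u(P, T). The Python sketch below checks it numerically for the upper parts (the lower parts are analogous). It assumes, as is usual for fuzzy relations, that Q_P ∩ T_P is the entrywise minimum, so that |[x_i]^{Q_P ∩ T_P}| = Σ_j p_j min(q_ij, t_ij); the relations and function names are hypothetical.

```python
import math

def upper_card(p, R, i):
    """|[x_i]^{R_P}| = sum_j p_j r_ij."""
    return sum(pj * rij for pj, rij in zip(p, R[i]))

def meet(Q, T):
    """Fuzzy intersection Q ∩ T, taken here as the entrywise minimum (assumption)."""
    return [[min(q, t) for q, t in zip(row_q, row_t)] for row_q, row_t in zip(Q, T)]

def upper_entropy(p, R):
    """H_u(P, R) from Definition 3.14."""
    return -sum(pi * math.log2(upper_card(p, R, i)) for i, pi in enumerate(p))

def upper_joint_entropy(p, Q, T):
    """H_u(P, Q ∪ T) from Definition 3.20(1)."""
    return upper_entropy(p, meet(Q, T))

def upper_conditional_entropy(p, Q, T):
    """H_u(P, Q | T) from Definition 3.21(1)."""
    QT = meet(Q, T)
    return -sum(pi * math.log2(upper_card(p, QT, i) / upper_card(p, T, i))
                for i, pi in enumerate(p))

p = [0.4, 0.35, 0.25]
Q = [[1.0, 0.3, 0.7], [0.5, 1.0, 0.2], [0.6, 0.1, 1.0]]   # hypothetical reflexive relations
T = [[1.0, 0.8, 0.2], [0.4, 1.0, 0.9], [0.3, 0.5, 1.0]]
```

Because Q_P ∩ T_P ⊆ T_P, each quotient is at most 1, so the conditional entropy is non-negative.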

It follows directly from Definitions 3.20 and 3.21 that H_u(P, Q|T) = H_u(P, Q ∪ T) - H_u(P, T), H_l(P, Q|T) = H_l(P, Q ∪ T) - H_l(P, T) and H(P, Q|T) = H(P, Q ∪ T) - H(P, T).

Definition 3.22. Let (U, P, T) and (U, P, Q) be two FPA-spaces.

(1) The upper-cross entropy of Q and T is defined as $H_{u}(P, Q, T) = -\sum_{i=1}^{n} p_{i} |[x_{i}]^{Q_{P}}| \log_{2} |[x_{i}]^{T_{P}}|.$

(2) The lower-cross entropy of Q and T is defined as $H_{l}(P, Q, T) = -\sum_{i=1}^{n} p_{i} |[x_{i}]_{Q_{P}}| \log_{2} |[x_{i}]_{T_{P}}|.$

(3) The cross entropy of Q and T is defined as

$H(P, Q, T) = \frac{1}{2}\left(H_{u}(P, Q, T) + H_{l}(P, Q, T)\right).$

Definition 3.23. Let (U, P, T) and (U, P, Q) be two FPA-spaces.

(1) The upper-relative entropy of Q and T is defined as $H_{u}(P, Q || T) = \sum_{i=1}^{n} p_{i} |[x_{i}]^{Q_{P}}| \log_{2} \frac{|[x_{i}]^{Q_{P}}|}{|[x_{i}]^{T_{P}}|}.$

(2) The lower-relative entropy of Q and T is defined as $H_{l}(P, Q || T) = \sum_{i=1}^{n} p_{i} |[x_{i}]_{Q_{P}}| \log_{2} \frac{|[x_{i}]_{Q_{P}}|}{|[x_{i}]_{T_{P}}|}.$

(3) The relative entropy of Q and T is defined as

$H (P, Q | | T) = \frac{1}{2} (H_{u} (P, Q | | T) + H_{l} (P, Q | | T)) .$

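Expanding the logarithm of the quotient shows that H_u(P, Q||T) = H_u(P, Q, T) - H_u(P, Q, Q), H_l(P, Q||T) = H_l(P, Q, T) - H_l(P, Q, Q) and H(P, Q||T) = H(P, Q, T) - H(P, Q, Q), where H_u(P, Q, Q), H_l(P, Q, Q) and H(P, Q, Q) are the cross entropies of Q with itself.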
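Expanding the logarithm in Definition 3.23(1) gives H_u(P, Q||T) = H_u(P, Q, T) - H_u(P, Q, Q), where H_u(P, Q, Q) is the upper-cross entropy of Q with itself. The Python sketch below checks this identity numerically for the upper parts; it assumes |[x_i]^{R_P}| = Σ_j p_j r_ij as in the proof of Proposition 3.16, and the relations and function names are hypothetical.

```python
import math

def upper_card(p, R, i):
    """|[x_i]^{R_P}| = sum_j p_j r_ij."""
    return sum(pj * rij for pj, rij in zip(p, R[i]))

def upper_cross_entropy(p, Q, T):
    """H_u(P, Q, T) from Definition 3.22(1)."""
    return -sum(pi * upper_card(p, Q, i) * math.log2(upper_card(p, T, i))
                for i, pi in enumerate(p))

def upper_relative_entropy(p, Q, T):
    """H_u(P, Q || T) from Definition 3.23(1)."""
    return sum(pi * upper_card(p, Q, i)
               * math.log2(upper_card(p, Q, i) / upper_card(p, T, i))
               for i, pi in enumerate(p))

p = [0.4, 0.35, 0.25]
Q = [[1.0, 0.3, 0.7], [0.5, 1.0, 0.2], [0.6, 0.1, 1.0]]   # hypothetical reflexive relations
T = [[1.0, 0.8, 0.2], [0.4, 1.0, 0.9], [0.3, 0.5, 1.0]]
```

As with the classical Kullback-Leibler divergence, the relative entropy of Q with respect to itself is 0.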

3.3 Information amount of an FPA-space

Definition 3.24. ([43]) Let (U, R) be an FA-space. Then the upper information, lower information and information amount of (U, R) are defined as follows: $\begin{aligned} E_{u}(R) &= \sum_{i=1}^{n} \frac{1}{n} \left(1 - \frac{|[x_{i}]^{R}|}{n}\right), \\ E_{l}(R) &= \sum_{i=1}^{n} \frac{1}{n} \left(1 - \frac{|[x_{i}]_{R}|}{n}\right), \\ E(R) &= \frac{1}{2}\left(E_{u}(R) + E_{l}(R)\right). \end{aligned}$

Definition 3.25. Let (U, P, R) be an FPA-space.

(1) Upper-information amount of (U, P, R) is defined as $E_{u} (P, R) = \sum_{i = 1}^{n} p_{i} (1 - | [x_{i}]^{R_{P}} |) .$

(2) Lower-information amount of (U, P, R) is defined as

$E_{l} (P, R) = \sum_{i = 1}^{n} p_{i} (1 - | [x_{i}]_{R_{P}} |) .$

(3) Information amount of (U, P, R) is defined as

$E (P, R) = \frac{1}{2} (E_{u} (P, R) + E_{l} (P, R)) .$

Given $U = \{x_{1}, x_{2}, \dots, x_{n}\}$, $P = \left\{\frac{x_{1}, x_{2}, \dots, x_{n}}{p_{1}, p_{2}, \dots, p_{n}}\right\}$ and $M(R) = (r_{ij})_{n \times n}$.

Then $\begin{aligned} E_{u}(P, R) &= \sum_{i=1}^{n} p_{i} \left(1 - \sum_{j=1}^{n} p_{j} r_{ij}\right), \\ E_{l}(P, R) &= \sum_{i=1}^{n} p_{i} \left(1 - p_{i} \sum_{j=1}^{n} r_{ji}\right). \end{aligned}$

If $p_{1} = p_{2} = \dots = p_{n} = \frac{1}{n}$, then $\begin{aligned} E_{u}(P, R) &= \sum_{i=1}^{n} \frac{1}{n} \left(1 - \frac{|[x_{i}]^{R}|}{n}\right) = E_{u}(R), \\ E_{l}(P, R) &= \sum_{i=1}^{n} \frac{1}{n} \left(1 - \frac{|[x_{i}]_{R}|}{n}\right) = E_{l}(R), \\ E(P, R) &= \frac{1}{2}\left(E_{u}(P, R) + E_{l}(P, R)\right) = \frac{1}{2}\left(E_{u}(R) + E_{l}(R)\right) = E(R). \end{aligned}$
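The reduction to an FA-space under the uniform distribution is easy to confirm numerically. The Python sketch below compares the matrix forms of E_u(P, R) and E_l(P, R) with Definition 3.24; it reads |[x_i]^R| = Σ_j r_ij and |[x_i]_R| = Σ_j r_ji off the reduction formulas above (an inference, since the FA-space definitions are not repeated here), and the relation is hypothetical.

```python
def information_amount_fpa(p, R):
    """(E_u, E_l) of (U, P, R) via the matrix form below Definition 3.25."""
    n = len(p)
    e_u = sum(p[i] * (1 - sum(p[j] * R[i][j] for j in range(n))) for i in range(n))
    e_l = sum(p[i] * (1 - p[i] * sum(R[j][i] for j in range(n))) for i in range(n))
    return e_u, e_l

def information_amount_fa(R):
    """(E_u, E_l) of (U, R) from Definition 3.24, assuming |[x_i]^R| = sum_j r_ij
    and |[x_i]_R| = sum_j r_ji."""
    n = len(R)
    e_u = sum((1 - sum(R[i]) / n) / n for i in range(n))
    e_l = sum((1 - sum(R[j][i] for j in range(n)) / n) / n for i in range(n))
    return e_u, e_l

R = [[1.0, 0.2, 0.7, 0.0],
     [0.3, 1.0, 0.5, 0.4],
     [0.7, 0.1, 1.0, 0.9],
     [0.0, 0.6, 0.2, 1.0]]
uniform = [0.25] * 4
```

For this R, both functions return the same pair up to rounding, as the formulas above predict.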

Example 3.26. (Continued from Example 2.4) $\begin{aligned} E_{u}(P, R) &= \sum_{i=1}^{n} p_{i} \left(1 - |[x_{i}]^{R_{P}}|\right) \approx 0.6673, \\ E_{l}(P, R) &= \sum_{i=1}^{n} p_{i} \left(1 - |[x_{i}]_{R_{P}}|\right) \approx 0.5784, \\ E(P, R) &= \frac{1}{2}\left(E_{u}(P, R) + E_{l}(P, R)\right) \approx 0.6228. \end{aligned}$

This example shows that E_u(P, R) and E_l(P, R) can differ, which justifies defining both in Definition 3.25.

Theorem 3.27. Let (U, P, T) and (U, P, Q) be two FPA-spaces.

(1) If T ⊂ Q, then E (P, Q) < E (P, T).

(2) If T ⊆ Q, then E (P, Q) ≤ E (P, T).

Proof. (1) Since T ⊂ Q, we have T_P ⊂ Q_P. Then $[x_{i}]^{T_{P}} \subseteq [x_{i}]^{Q_{P}}$ for all i, and $[x_{i'}]^{T_{P}} \subsetneq [x_{i'}]^{Q_{P}}$ for some i′.

Thus ∀ i, j, T_P (x_i, x_j) ≤ Q_P (x_i, x_j) and ∃ i′, j′, T_P (x_i′, x_j′) < Q_P (x_i′, x_j′) .

By Definition 3.25, $\begin{aligned} E_{u}(P, T) &= \sum_{i=1}^{n} p_{i} \left(1 - \sum_{j=1}^{n} T_{P}(x_{i}, x_{j})\right), \\ E_{u}(P, Q) &= \sum_{i=1}^{n} p_{i} \left(1 - \sum_{j=1}^{n} Q_{P}(x_{i}, x_{j})\right). \end{aligned}$

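Since $\sum_{j=1}^{n} T_{P}(x_{i}, x_{j}) \leq \sum_{j=1}^{n} Q_{P}(x_{i}, x_{j})$ for every i, with strict inequality for i = i′, we obtain E_u(P, Q) < E_u(P, T) (provided p_{i′} > 0).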

In the same manner, we can prove that E_l (P, Q) < E_l (P, T).

Hence E (P, Q) < E (P, T).

(2) The proof is similar to (1).□

The above theorem indicates that the fuzzy information amount increases as the FPA-space becomes finer; conversely, it decreases as the FPA-space becomes coarser.
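The behavior described by Theorem 3.27 can be traced along a short chain of relations, much as in the experiments of Section 4. In the Python sketch below, each relation is obtained from the previous one by raising a single membership degree, so R1 ⊂ R2 ⊂ R3 entrywise; the distribution and matrices are hypothetical, and `enlarge` is a helper of our own.

```python
def information_amount(p, R):
    """E(P, R) = (E_u + E_l) / 2 via the matrix forms below Definition 3.25."""
    n = len(p)
    e_u = sum(p[i] * (1 - sum(p[j] * R[i][j] for j in range(n))) for i in range(n))
    e_l = sum(p[i] * (1 - p[i] * sum(R[j][i] for j in range(n))) for i in range(n))
    return (e_u + e_l) / 2

def enlarge(R, i, j, value):
    """Copy of R with R[i][j] raised to `value`; refuses to lower a membership."""
    if value < R[i][j]:
        raise ValueError("enlarging must not decrease a membership degree")
    S = [row[:] for row in R]
    S[i][j] = value
    return S

p = [0.3, 0.3, 0.4]
R1 = [[1.0, 0.1, 0.0], [0.2, 1.0, 0.1], [0.0, 0.3, 1.0]]
R2 = enlarge(R1, 0, 2, 0.5)          # R1 ⊂ R2
R3 = enlarge(R2, 2, 1, 0.9)          # R2 ⊂ R3
values = [information_amount(p, R) for R in (R1, R2, R3)]
```

The resulting values decrease strictly along the chain, since every p_i is positive.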

Theorem 3.28. Let (U, P, R) be an FPA-space. Then $G (P, R) + E (P, R) = 1 .$

Proof. By Definitions 3.2 and 3.25, $\begin{matrix} G_{u} (P, R) = \sum_{i = 1}^{n} p_{i} | [x_{i}]^{R_{P}} | \\ E_{u} (P, R) = \sum_{i = 1}^{n} p_{i} (1 - | [x_{i}]^{R_{P}} |) . \end{matrix}$

Then $\begin{matrix} G_{u} (P, R) + E_{u} (P, R) \\ = \sum_{i = 1}^{n} p_{i} | [x_{i}]^{R_{P}} | + \sum_{i = 1}^{n} p_{i} (1 - | [x_{i}]^{R_{P}} |) \\ = \sum_{i = 1}^{n} p_{i} | [x_{i}]^{R_{P}} | + \sum_{i = 1}^{n} (p_{i} - p_{i} | [x_{i}]^{R_{P}} |) \\ = \sum_{i = 1}^{n} p_{i} \\ = 1 . \end{matrix}$

So $G_{u} (P, R) + E_{u} (P, R) = 1 .$

Similarly, we can prove that G_l (P, R) + E_l (P, R) =1.

Thus $G (P, R) + E (P, R) = 1 .$ □
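The proof of Theorem 3.28 uses only Σ p_i = 1, so the identity holds for any fuzzy relation, reflexive or not. The Python sketch below checks it for a random relation. It assumes, by analogy with the upper part shown in the proof, that G_l(P, R) = Σ_i p_i |[x_i]_{R_P}| and G(P, R) = (G_u(P, R) + G_l(P, R))/2, since Definition 3.2 is not reproduced in this section; the seed and size are arbitrary.

```python
import random

def granularity_and_information(p, R):
    """Return (G, E) for (U, P, R), assuming G = (G_u + G_l) / 2 with
    G_u = sum_i p_i |[x_i]^{R_P}| and G_l = sum_i p_i |[x_i]_{R_P}|."""
    n = len(p)
    up = [sum(p[j] * R[i][j] for j in range(n)) for i in range(n)]
    low = [p[i] * sum(R[j][i] for j in range(n)) for i in range(n)]
    g = (sum(pi * u for pi, u in zip(p, up)) + sum(pi * l for pi, l in zip(p, low))) / 2
    e = (sum(pi * (1 - u) for pi, u in zip(p, up))
         + sum(pi * (1 - l) for pi, l in zip(p, low))) / 2
    return g, e

random.seed(1)
n = 6
weights = [random.random() for _ in range(n)]
p = [w / sum(weights) for w in weights]
R = [[random.random() for _ in range(n)] for _ in range(n)]   # arbitrary fuzzy relation
g, e = granularity_and_information(p, R)
```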

Proposition 3.29. Let (U, P, R) be an FPA-space. Then $0 \leq E_{u} (P, R) \leq 1 .$ Moreover, if R = o, then E_u (P, R) reaches the maximum value 1; if R = ω, then E_u (P, R) reaches the minimum value 0.

Proof. The proof is straightforward from Proposition 3.4 and Theorem 3.28.□

Proposition 3.30. Let (U, P, R) be an FPA-space. Then $1 - \sum_{i=1}^{n} n p_{i}^{2} \leq E_{l}(P, R) \leq 1$. Moreover, if R = o, then E_l(P, R) reaches the maximum value 1; if R = ω, then E_l(P, R) reaches the minimum value $1 - \sum_{i=1}^{n} n p_{i}^{2}$.

Proof. The proof is straightforward from Proposition 3.5 and Theorem 3.28.□

Proposition 3.31. Let (U, P, R) be an FPA-space. Then $\frac{1}{2} - \frac{1}{2} \sum_{i=1}^{n} n p_{i}^{2} \leq E(P, R) \leq 1$. Moreover, if R = o, then E(P, R) reaches the maximum value 1; if R = ω, then E(P, R) reaches the minimum value $\frac{1}{2} - \frac{1}{2} \sum_{i=1}^{n} n p_{i}^{2}$.

Proof. The proof is straightforward from Propositions 3.29 and 3.30.□
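The extreme values in Proposition 3.31 are easy to reproduce. In the Python sketch below, o is taken to be the all-zero relation and ω the all-one relation, and P is a hypothetical, strongly concentrated distribution. For such a P the minimum value is negative, which is consistent with the remark in Section 4.2 that E may be less than 0.

```python
def information_amount(p, R):
    """E(P, R) = (E_u + E_l) / 2 via the matrix forms below Definition 3.25."""
    n = len(p)
    e_u = sum(p[i] * (1 - sum(p[j] * R[i][j] for j in range(n))) for i in range(n))
    e_l = sum(p[i] * (1 - p[i] * sum(R[j][i] for j in range(n))) for i in range(n))
    return (e_u + e_l) / 2

p = [0.7, 0.1, 0.1, 0.1]                 # a concentrated distribution
n = len(p)
empty = [[0.0] * n for _ in range(n)]    # R = o
omega = [[1.0] * n for _ in range(n)]    # R = ω
lower_bound = 0.5 - 0.5 * sum(n * pi ** 2 for pi in p)   # Proposition 3.31
e_max, e_min = information_amount(p, empty), information_amount(p, omega)
```

Here n Σ p_i² = 2.08, so the lower bound is 0.5 - 1.04 = -0.54, and it is attained at R = ω.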

4 Examples and effectiveness analysis

In this section, we analyze the effectiveness of the proposed measures through examples, from three statistical perspectives: dispersion analysis, the Friedman and Nemenyi tests, and association analysis.

4.1 Examples

In this subsection, several fuzzy relation matrices were randomly generated in MATLAB. To assess the influence of the probability distribution on the proposed measures, we let only one object have a probability different from that of the others. All other objects have equal probability, and this common probability increases from 0.02 to 0.14 in steps of 0.02.

Example 4.1. Suppose U = {x₁, x₂, x₃, x₄, x₅, x₆, x₇}. To analyze the monotonicity of the proposed measures, eight fuzzy relation matrices were randomly generated in MATLAB, as shown below:

$M(R_{1}) = \begin{pmatrix} 0.1115 & 0.0166 & 0.0418 & 0.0114 & 0.0376 & 0.0413 & 0.0006 \\ 0.0526 & 0.0952 & 0.1041 & 0.0901 & 0.0107 & 0.0456 & 0.1063 \\ 0.0500 & 0.0814 & 0.0292 & 0.0428 & 0.0792 & 0.0740 & 0.0869 \\ 0.0692 & 0.0439 & 0.0087 & 0.0708 & 0.0333 & 0.1131 & 0.0703 \\ 0.0745 & 0.0391 & 0.0618 & 0.0818 & 0.1118 & 0.0331 & 0.0422 \\ 0.0633 & 0.0343 & 0.0153 & 0.0816 & 0.0375 & 0.1187 & 0.0416 \\ 0.0734 & 0.0187 & 0.0503 & 0.0876 & 0.0894 & 0.0923 & 0.0097 \end{pmatrix},$

$M(R_{2}) = \begin{pmatrix} 0.1115 & 0.0166 & 0.1145 & 0.0977 & 0.1350 & 0.0413 & 0.0551 \\ 0.0939 & 0.0952 & 0.1041 & 0.0901 & 0.0258 & 0.0713 & 0.1063 \\ 0.0817 & 0.1001 & 0.0806 & 0.0717 & 0.0792 & 0.0740 & 0.0869 \\ 0.0692 & 0.1004 & 0.1222 & 0.0708 & 0.0333 & 0.1131 & 0.0703 \\ 0.1335 & 0.0729 & 0.0842 & 0.0818 & 0.1118 & 0.0798 & 0.0422 \\ 0.0633 & 0.0875 & 0.0726 & 0.1296 & 0.0618 & 0.1187 & 0.0416 \\ 0.1354 & 0.1066 & 0.1219 & 0.0876 & 0.0894 & 0.0923 & 0.1076 \end{pmatrix},$

$M(R_{3}) = \begin{pmatrix} 0.1115 & 0.1170 & 0.1145 & 0.0977 & 0.1350 & 0.0413 & 0.0551 \\ 0.1204 & 0.1629 & 0.1041 & 0.1077 & 0.0622 & 0.0959 & 0.1285 \\ 0.1269 & 0.1001 & 0.0806 & 0.0717 & 0.0792 & 0.0859 & 0.0869 \\ 0.0692 & 0.1004 & 0.1275 & 0.0937 & 0.0835 & 0.1131 & 0.0875 \\ 0.1335 & 0.0970 & 0.1531 & 0.0818 & 0.1362 & 0.1439 & 0.0885 \\ 0.0633 & 0.0875 & 0.1154 & 0.1296 & 0.0618 & 0.1187 & 0.1456 \\ 0.1354 & 0.1066 & 0.1618 & 0.0876 & 0.0894 & 0.0923 & 0.1563 \end{pmatrix},$

$M(R_{4}) = \begin{pmatrix} 0.1115 & 0.1516 & 0.1632 & 0.0977 & 0.1455 & 0.0572 & 0.1300 \\ 0.1958 & 0.1832 & 0.1041 & 0.1896 & 0.0622 & 0.1758 & 0.1285 \\ 0.1269 & 0.1259 & 0.1521 & 0.0792 & 0.1111 & 0.1873 & 0.1714 \\ 0.0692 & 0.1912 & 0.1275 & 0.1065 & 0.0910 & 0.1612 & 0.1886 \\ 0.1335 & 0.1510 & 0.1531 & 0.1807 & 0.1362 & 0.1938 & 0.0885 \\ 0.0720 & 0.1052 & 0.1683 & 0.1296 & 0.1088 & 0.1187 & 0.1456 \\ 0.1354 & 0.1700 & 0.1618 & 0.1328 & 0.1037 & 0.0923 & 0.1598 \end{pmatrix},$

$M(R_{5}) = \begin{pmatrix} 0.1115 & 0.1968 & 0.1884 & 0.2112 & 0.1455 & 0.0572 & 0.1806 \\ 0.1958 & 0.1832 & 0.1041 & 0.1896 & 0.1049 & 0.1973 & 0.1285 \\ 0.1269 & 0.1327 & 0.2153 & 0.1758 & 0.1665 & 0.1873 & 0.1714 \\ 0.1432 & 0.2310 & 0.1645 & 0.1065 & 0.2433 & 0.2209 & 0.1886 \\ 0.1431 & 0.2465 & 0.1531 & 0.1807 & 0.1362 & 0.1938 & 0.0885 \\ 0.1854 & 0.2438 & 0.1683 & 0.1296 & 0.1328 & 0.2435 & 0.1838 \\ 0.1354 & 0.1700 & 0.2409 & 0.1328 & 0.2327 & 0.1890 & 0.2057 \end{pmatrix},$

$M(R_{6}) = \begin{pmatrix} 0.1145 & 0.2261 & 0.1884 & 0.2112 & 0.2855 & 0.0966 & 0.2975 \\ 0.1958 & 0.1832 & 0.2463 & 0.3293 & 0.2008 & 0.3105 & 0.1285 \\ 0.2220 & 0.3260 & 0.2153 & 0.1758 & 0.1665 & 0.1873 & 0.1714 \\ 0.2768 & 0.2310 & 0.1645 & 0.3144 & 0.2433 & 0.2918 & 0.2339 \\ 0.2245 & 0.2890 & 0.1531 & 0.1807 & 0.2182 & 0.1938 & 0.1205 \\ 0.1854 & 0.2438 & 0.1683 & 0.3083 & 0.1328 & 0.2948 & 0.1838 \\ 0.1354 & 0.1700 & 0.2409 & 0.2514 & 0.2327 & 0.1890 & 0.2057 \end{pmatrix},$

$M(R_{7}) = \begin{pmatrix} 0.1601 & 0.4665 & 0.2110 & 0.2492 & 0.2855 & 0.1623 & 0.4697 \\ 0.1967 & 0.1832 & 0.2463 & 0.3293 & 0.2008 & 0.3105 & 0.3340 \\ 0.2220 & 0.3260 & 0.3371 & 0.1758 & 0.3382 & 0.4325 & 0.2406 \\ 0.2768 & 0.2541 & 0.1645 & 0.3144 & 0.2848 & 0.4971 & 0.2339 \\ 0.2969 & 0.3796 & 0.3900 & 0.1807 & 0.2758 & 0.4192 & 0.2976 \\ 0.2972 & 0.2438 & 0.4814 & 0.3083 & 0.4837 & 0.2948 & 0.1838 \\ 0.1354 & 0.4655 & 0.3441 & 0.2948 & 0.3142 & 0.3747 & 0.4798 \end{pmatrix},$

$M(R_{8}) = \begin{pmatrix} 0.4421 & 0.4665 & 0.2598 & 0.2492 & 0.5253 & 0.2728 & 0.6919 \\ 0.9620 & 0.8304 & 0.3540 & 0.3293 & 0.2832 & 0.6356 & 0.3340 \\ 0.6764 & 0.4641 & 0.3371 & 0.8367 & 0.7149 & 0.8062 & 0.4987 \\ 0.7061 & 0.2987 & 0.3507 & 0.3144 & 0.7344 & 0.6679 & 0.9860 \\ 0.9577 & 0.5233 & 0.3900 & 0.7813 & 0.4496 & 0.4192 & 0.9049 \\ 0.9399 & 0.8316 & 0.8357 & 0.5315 & 0.4837 & 0.2948 & 0.5752 \\ 0.8338 & 0.5391 & 0.3842 & 0.2948 & 0.3142 & 0.3747 & 0.7665 \end{pmatrix}.$

It can be checked entrywise that $R_{1} \subset R_{2} \subset R_{3} \subset R_{4} \subset R_{5} \subset R_{6} \subset R_{7} \subset R_{8}$.

For k = 1, ⋯, 7, let

$\begin{aligned} P^{1}(k) &= \left\{\frac{x_{1}, \dots, x_{k}, \dots, x_{7}}{0.02, \dots, 0.88, \dots, 0.02}\right\}, \\ P^{2}(k) &= \left\{\frac{x_{1}, \dots, x_{k}, \dots, x_{7}}{0.04, \dots, 0.76, \dots, 0.04}\right\}, \\ P^{3}(k) &= \left\{\frac{x_{1}, \dots, x_{k}, \dots, x_{7}}{0.06, \dots, 0.64, \dots, 0.06}\right\}, \\ P^{4}(k) &= \left\{\frac{x_{1}, \dots, x_{k}, \dots, x_{7}}{0.08, \dots, 0.52, \dots, 0.08}\right\}, \\ P^{5}(k) &= \left\{\frac{x_{1}, \dots, x_{k}, \dots, x_{7}}{0.10, \dots, 0.40, \dots, 0.10}\right\}, \\ P^{6}(k) &= \left\{\frac{x_{1}, \dots, x_{k}, \dots, x_{7}}{0.12, \dots, 0.28, \dots, 0.12}\right\}, \\ P^{7}(k) &= \left\{\frac{x_{1}, \dots, x_{k}, \dots, x_{7}}{0.14, \dots, 0.16, \dots, 0.14}\right\}. \end{aligned}$

Put $P^{j} = \{P^{j}(1), \dots, P^{j}(7)\}$ for j = 1, ⋯, 7.

Clearly, for all j and k, P^j(k) is a probability distribution in which only x_k has a probability different from that of the other elements, and each P^j is a set of seven such distributions. The distributions in P¹ are the most concentrated, and the concentration decreases as j increases. Therefore, for j, k = 1, …, 7 and i = 1, …, 8, (U, P^j(k), R_i) is an FPA-space. In the following experiment, we compare four measures of uncertainty for an FPA-space: G, Er, H and E.
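The distributions P^j(k) and the entrywise inclusions between the relation matrices can be generated and checked mechanically. The Python sketch below is an illustration, not the authors' MATLAB code. It builds all 49 distributions with hypothetical helper names and checks R_7 ⊂ R_8 using the matrices of Example 4.1; the other links of the chain can be checked the same way.

```python
def concentrated_distribution(j, k, n=7, step=0.02):
    """P^j(k): every object gets j*step except x_k, which gets 1 - (n-1)*j*step.

    j and k are 1-based, as in the text (j, k = 1..7).
    """
    base = j * step
    p = [base] * n
    p[k - 1] = 1 - (n - 1) * base
    return p

def strictly_contained(T, Q):
    """T ⊂ Q entrywise: T <= Q everywhere and T < Q somewhere."""
    pairs = [(t, q) for row_t, row_q in zip(T, Q) for t, q in zip(row_t, row_q)]
    return all(t <= q for t, q in pairs) and any(t < q for t, q in pairs)

# M(R_7) and M(R_8) from Example 4.1.
R7 = [[0.1601, 0.4665, 0.2110, 0.2492, 0.2855, 0.1623, 0.4697],
      [0.1967, 0.1832, 0.2463, 0.3293, 0.2008, 0.3105, 0.3340],
      [0.2220, 0.3260, 0.3371, 0.1758, 0.3382, 0.4325, 0.2406],
      [0.2768, 0.2541, 0.1645, 0.3144, 0.2848, 0.4971, 0.2339],
      [0.2969, 0.3796, 0.3900, 0.1807, 0.2758, 0.4192, 0.2976],
      [0.2972, 0.2438, 0.4814, 0.3083, 0.4837, 0.2948, 0.1838],
      [0.1354, 0.4655, 0.3441, 0.2948, 0.3142, 0.3747, 0.4798]]
R8 = [[0.4421, 0.4665, 0.2598, 0.2492, 0.5253, 0.2728, 0.6919],
      [0.9620, 0.8304, 0.3540, 0.3293, 0.2832, 0.6356, 0.3340],
      [0.6764, 0.4641, 0.3371, 0.8367, 0.7149, 0.8062, 0.4987],
      [0.7061, 0.2987, 0.3507, 0.3144, 0.7344, 0.6679, 0.9860],
      [0.9577, 0.5233, 0.3900, 0.7813, 0.4496, 0.4192, 0.9049],
      [0.9399, 0.8316, 0.8357, 0.5315, 0.4837, 0.2948, 0.5752],
      [0.8338, 0.5391, 0.3842, 0.2948, 0.3142, 0.3747, 0.7665]]

# P[j][k - 1] is the distribution P^j(k).
P = {j: [concentrated_distribution(j, k) for k in range(1, 8)] for j in range(1, 8)}
```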

The uncertainty measures for an FPA-space behave regularly with respect to both the fuzzy relation and the probability distribution, as reflected in the following observations (see Figs. 2-8):

Fig. 2. Values of the uncertainty measures on (U, P¹(k), R_i).

Fig. 3. Values of the uncertainty measures on (U, P²(k), R_i).

Fig. 4. Values of the uncertainty measures on (U, P³(k), R_i).

Fig. 5. Values of the uncertainty measures on (U, P⁴(k), R_i).

Fig. 6. Values of the uncertainty measures on (U, P⁵(k), R_i).

Fig. 7. Values of the uncertainty measures on (U, P⁶(k), R_i).

Fig. 8. Values of the uncertainty measures on (U, P⁷(k), R_i).

(1) When the probability distribution is fixed, E and H decrease monotonically as the similarity between objects increases, whereas G and Er increase;

(2) When the probability distribution is fixed and the similarity between objects is high, Er is in most cases larger than G, H and E;

(3) When the probability distribution is fixed and the similarity between objects is low, H is in most cases larger than G, Er and E;

(4) When the fuzzy relation is fixed, the more concentrated the probability distribution, the greater its effect on G, Er, H and E; moreover, G and E are more sensitive to it than Er and H;

(5) When the fuzzy relation is fixed, the more concentrated the probability distribution, the larger G and the smaller E.

4.2 Dispersion analysis

In statistics, the dispersion of numerical data is often measured by the standard deviation coefficient (coefficient of variation). Since H and E may be negative, this paper measures the dispersion of these measures by a modified standard deviation coefficient. The larger the modified standard deviation coefficient, the more dispersed the data; the smaller it is, the less dispersed the data.

Suppose that X = {x₁, ⋯, x_n} is a data set. Its arithmetic mean and standard deviation are $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_{i}$ and $\sigma(X) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_{i} - \bar{x})^{2}}$, respectively. Its modified standard deviation coefficient is then defined as

$CV(X) = \frac{\sigma(X)}{|\bar{x}|}. \quad (4.1)$
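Formula (4.1) is the usual coefficient of variation with the mean replaced by its absolute value, so it stays non-negative for negative data such as H and E. A minimal Python sketch follows. It uses `statistics.pstdev`, the population standard deviation, to match the 1/n normalization in σ(X). Raising an error for a zero mean is our own choice, since (4.1) is undefined there.

```python
import statistics

def modified_cv(xs):
    """Modified standard deviation coefficient (4.1): population std / |mean|."""
    mean = statistics.fmean(xs)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return statistics.pstdev(xs) / abs(mean)
```

For example, for X = {-1, -2, -3} the mean is -2 and σ(X) = √(2/3), so CV(X) = √(2/3)/2 ≈ 0.408.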

Example 4.2. (Continued from Example 4.1) Denote

$\begin{aligned} X_{G}^{P^{2}(k)} &= \{G(P^{2}(k), R_{1}), \dots, G(P^{2}(k), R_{8})\}, \\ X_{E_{r}}^{P^{2}(k)} &= \{E_{r}(P^{2}(k), R_{1}), \dots, E_{r}(P^{2}(k), R_{8})\}, \\ X_{H}^{P^{2}(k)} &= \{H(P^{2}(k), R_{1}), \dots, H(P^{2}(k), R_{8})\}, \\ X_{E}^{P^{2}(k)} &= \{E(P^{2}(k), R_{1}), \dots, E(P^{2}(k), R_{8})\}, \end{aligned}$

$\begin{aligned} X_{G}^{P^{j}(2)} &= \{G(P^{j}(2), R_{1}), \dots, G(P^{j}(2), R_{8})\}, \\ X_{E_{r}}^{P^{j}(2)} &= \{E_{r}(P^{j}(2), R_{1}), \dots, E_{r}(P^{j}(2), R_{8})\}, \\ X_{H}^{P^{j}(2)} &= \{H(P^{j}(2), R_{1}), \dots, H(P^{j}(2), R_{8})\}, \\ X_{E}^{P^{j}(2)} &= \{E(P^{j}(2), R_{1}), \dots, E(P^{j}(2), R_{8})\}, \end{aligned}$

$\begin{aligned} X_{G}^{R_{i}}(2) &= \{G(P^{2}(1), R_{i}), \dots, G(P^{2}(7), R_{i})\}, \\ X_{E_{r}}^{R_{i}}(2) &= \{E_{r}(P^{2}(1), R_{i}), \dots, E_{r}(P^{2}(7), R_{i})\}, \\ X_{H}^{R_{i}}(2) &= \{H(P^{2}(1), R_{i}), \dots, H(P^{2}(7), R_{i})\}, \\ X_{E}^{R_{i}}(2) &= \{E(P^{2}(1), R_{i}), \dots, E(P^{2}(7), R_{i})\}. \end{aligned}$

Using Formula (4.1), we obtain the CV values of the above measure sets (see Figs. 9-11).