Information structures and uncertainty in an image information system

Abstract

An information system, viewed as a database that represents relationships between objects and attributes, is an important mathematical model. An image information system is an information system in which every information value is an image, and its information structures embody the internal features of this type of system. Uncertainty measurement is an effective tool for evaluation. This paper explores measures of uncertainty for an image information system by using the proposed information structures. The distance between two objects in an image information system is first given. The fuzzy T_cos-equivalence relation induced by this system is then obtained by the Gaussian kernel method, where the Gaussian kernel is based on this distance. Next, information structures of this system are described by set vectors, dependence between information structures is studied, and properties of information structures are given by using the inclusion degree. Uncertainty measures of an image information system are then investigated by means of the information structures. Moreover, effectiveness analysis is carried out to show the feasibility of the proposed measures from the angle of statistics. Finally, an application of the proposed measures to attribute reduction is given. These results will be helpful for understanding the essence of uncertainty in an image information system.

Keywords

Granular computing; image information system; distance; information structure; dependence; inclusion degree; uncertainty measure

1 Introduction

Granular computing, presented by Zadeh [43, 44], is a basic issue in knowledge representation and data mining. Its purpose is to find an approximation scheme that allows us to view a phenomenon at different levels of granularity and thereby solve complex problems effectively. Information granulation, organization and causation are basic concepts of granular computing. An information granule is a family of objects described by some constraints, such as indistinguishability, similarity or functionality. The process of constructing information granules is called information granulation. It granulates a universe into a family of disjoint or overlapping information granules. A granular structure is the family of information granules in which the internal structure of each information granule is visible as a sub-structure. Naturally, a granular structure can be depicted as a vector consisting of information granules. Lin [14, 15] and Yao [38, 39] explained the importance of granular computing, which drew attention to the field. Until now, research on granular computing has mainly followed four approaches, i.e., rough set theory [23], fuzzy set theory [42], concept lattices [21] and quotient space theory [48].

Rough set theory, proposed by Pawlak [23, 24] as an important approach for managing uncertainty, is developed around the concept of an information system. Most applications of rough sets, such as uncertainty modeling [5, 31], reasoning with uncertainty [38], rule extraction [4, 33], classification and feature selection [13, 32], are related to information systems.

In an information system, the study of information structures is an important research topic. An equivalence relation is a special kind of similarity between two objects from a data set. Given an information system, each attribute subset determines an equivalence relation on the object set of this system. This equivalence relation partitions the object set into disjoint classes, which are called equivalence classes. If two objects belong to the same equivalence class, then they cannot be distinguished under this equivalence relation. Thus, each equivalence class is seen as an information granule consisting of indistinguishable objects [4]. The family of all these information granules constitutes a vector, which is called the information structure of the given information system induced by this attribute subset. Information structures in an information system are thus granular structures in the sense of granular computing.

Uncertainty, including randomness, fuzziness, vagueness, incompleteness and inconsistency, exists almost everywhere in the real world. Uncertainty measurement is a basic problem in many fields, such as machine learning [35], pattern recognition [6], image processing [22], medical diagnosis [10], information retrieval [34] and data mining [8]. A number of scholars have explored this area and made many excellent contributions. For example, Yao et al. [39] gave a granularity measure from the angle of granulation; Wierman [33] presented measures of uncertainty and granularity in rough set theory; Bianucci et al. [1] explored entropy and co-entropy approaches for uncertainty measurements of coverings; Beaubouef et al. [3] proposed a method for measuring the uncertainty of rough sets; Liang et al. [18] investigated information granulation in complete and incomplete information systems; Dai et al. [9] researched entropy measures and granularity measures for set-valued information systems; Qian et al. [26, 27] presented the axiomatic definition of information granulation in a knowledge base and studied information granularity of a fuzzy relation by using its fuzzy granular structure; Yao [37] studied several types of information-theoretical measures for attribute importance in rough set theory; Xu et al. [36] considered knowledge granulation in ordered information systems.

With the arrival of the information age, information acquisition, analysis and processing have become hot research topics in the field of information technology. Information uncertainty analysis, information fusion, attribute reduction and classification are becoming more and more important. Multi-source information includes the following types: different sources of the same type of information, different sources of different types of information, and multilingual information. Multi-source information fusion helps to further excavate the value of data and enhance the function of information analysis; through the cross validation of multi-source information, information errors and omissions can be reduced, and decision-making errors can be prevented. Multi-source information or data come from many kinds of attributes. If an information system has several kinds of attributes or data, such as boolean attributes, categorical attributes, real-valued attributes, set-valued attributes, interval-valued attributes, missing values, text, image, video, audio, sensor signal data and so on, then this information system can be called a hybrid information system. Zeng et al. [45, 46] studied a hybrid information system that has boolean attributes, categorical attributes, real-valued attributes, set-valued attributes, interval-valued attributes, image attributes and missing values, and gave a fuzzy rough set approach for incremental feature selection on this system.

The purpose of this paper is to study information structures and uncertainty in an image information system, where each information value is an image. It is worth mentioning that this paper only measures uncertainty and does not carry out image processing; thus, the obtained results cannot be applied to image processing. To our knowledge, no scholar has studied an image information system to date, so it is not convenient for us to compare methods.

The remaining part of this paper is organized as follows. In Section 2, we recall some basic concepts about fuzzy sets, fuzzy relations and image information systems. In Section 3, we introduce the distances in an image information system. In Section 4, we give the fuzzy T_cos-equivalence relation induced by an image information system by using the Gaussian kernel method. In Section 5, we investigate information structures in an image information system. In Section 6, we give some tools for measuring uncertainty of an image information system. In Section 7, to evaluate the performance of the proposed measures in an image information system, we conduct numerical experiments and carry out effectiveness analysis from the angle of statistics. In Section 8, we give an application of the proposed measures to attribute reduction. In Section 9, we make comparison and discussion. Section 10 summarizes this paper.

2 Preliminaries

We first review some basic concepts about fuzzy sets and fuzzy relations, and then propose the concept of image information systems.

Throughout this paper, U denotes a finite set called the universe, 2^U denotes the family of all subsets of U. I denotes the unit interval [0, 1].

Put $U = \{x_{1}, x_{2}, \dots, x_{n}\} .$

2.1 Fuzzy sets and fuzzy relations

Fuzzy sets are extensions of ordinary sets [42]. A fuzzy set P in U is defined as a function assigning to each element x of U a value P (x) ∈ I and P (x) is called the membership degree of x to the fuzzy set P.

In this paper, I^U denotes the set of all fuzzy sets in U. The cardinality of P ∈ I^U can be calculated with $| P | = \sum_{i = 1}^{n} P (x_{i}) .$

If R is a fuzzy set in U × U, then R is called a fuzzy relation on U. In this paper, I^U×U denotes the set of all fuzzy relations on U.

Let R ∈ I^U×U. Then R may be represented by $M (R) = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1 n} \\ r_{21} & r_{22} & \cdots & r_{2 n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n 1} & r_{n 2} & \cdots & r_{nn} \end{pmatrix},$ where r_ij = R (x_i, x_j) ∈ I means the degree of similarity between two objects x_i and x_j.

If M (R) = E (E is an identity matrix), then R is said to be a fuzzy identity relation, and we write as R =▵; if r_ij = 1, i, j ≤ n, then R is said to be a fuzzy universal relation, and we write as R = ω.

Let R ∈ I^U×U. For each x ∈ U, we define a fuzzy set S_R (x): $S_{R} (x) (y) = R (x, y) .$ Then S_R (x) can be viewed as the fuzzy neighborhood or the information granule of the point x (see [27]).

Definition 2.1. [20] A function T : I² → I is called a t-norm, if it satisfies the following conditions:

(1) Commutativity: T (a, b) = T (b, a) ,

(2) Associativity: T (T (a, b) , c) = T (a, T (b, c)) ,

(3) Monotonicity: a ⩽ c, b ⩽ d ⇒ T (a, b) ⩽ T (c, d) ,

(4) Boundary condition: T (a, 1) = a .

Example 2.2. For any x, y ∈ I, define $T_{\cos} (x, y) = (x \cdot y - \sqrt{1 - x^{2}} \cdot \sqrt{1 - y^{2}}) \lor 0 .$ Then T_cos is a t-norm.
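As a minimal sketch of Example 2.2, T_cos can be evaluated numerically and two of the t-norm axioms of Definition 2.1 spot-checked on a grid. The function name `t_cos` and the grid are illustrative choices, not part of the paper, and a grid check is not a proof.

```python
import math

def t_cos(a: float, b: float) -> float:
    """T_cos on [0, 1]: max(a*b - sqrt(1 - a^2) * sqrt(1 - b^2), 0)."""
    return max(a * b - math.sqrt(1.0 - a * a) * math.sqrt(1.0 - b * b), 0.0)

# Spot-check the boundary condition T(a, 1) = a and commutativity on a grid.
grid = [i / 10 for i in range(11)]
boundary_ok = all(math.isclose(t_cos(a, 1.0), a, abs_tol=1e-12) for a in grid)
commutative_ok = all(t_cos(a, b) == t_cos(b, a) for a in grid for b in grid)
```

Note that T_cos is zero whenever both arguments are small, for instance `t_cos(0.5, 0.5)` is 0, which is why T_cos-transitivity is a comparatively weak requirement.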

Definition 2.3. [46] Let T be a t-norm. Suppose R ∈ I^U×U. Then R is a T-fuzzy equivalence relation on U if it satisfies the following conditions:

(1) Reflexivity: R (x, x) =1,

(2) Symmetry: R (x, y) = R (y, x) ,

(3) T-transitivity: T (R (x, y) , R (y, z)) ⩽ R (x, z) .

Proposition 2.4. [19] Suppose that f : U × U → I satisfies f (x, x) =1 for all x ∈ U . Then for any x, y, z ∈ U, $T_{\cos} (f (x, y), f (y, z)) \leq f (x, z) .$

Corollary 2.5. Given R ∈ I^U×U. If R is reflexive, then R is T_cos-transitive.

2.2 Image information systems

Definition 2.6. [23] Let U be an object set and A an attribute set. Suppose that U and A are finite sets. Then the pair (U, A) is called an information system, if each attribute a ∈ A determines an information function a : U → V_a, where V_a is the set of information function values of the attribute a.

If P ⊆ A, then (U, P) is called a subsystem of (U, A).

Definition 2.7. Let (U, A) be an information system. If any a ∈ A and x ∈ U, a (x) is an image, then (U, A) is called an image information system.

Example 2.8. Table 1 is a (CT)-image information system with the attributes “Ankle”, “Head”, “Hip”, “Knee”, “Pelvis” and “Shoulder” (denoted as a₁, a₂, a₃, a₄, a₅, a₆, respectively).

Table 1
An image information system

3 The distance between two objects in an image information system

In this paper, the getGaborKernel function of OpenCV is used to construct the convolution kernel, and the filter2D function is applied to obtain the feature vector of an image. In this way, each image, as an information value, is transformed into a feature vector. The relationship between two feature vectors can then be described by a distance.

Definition 3.1. Let (U, A) be an image information system. Given x, y ∈ U and a ∈ A. Then the distance between a (x) and a (y) is defined as $d (a (x), a (y)) = \frac{\| a (x) - a (y) \|}{\| a_{\max} - a_{\min} \|},$ where ‖·‖ is the Euclidean norm, a (x) and a (y) are identified with their feature vectors, and a_max and a_min denote the maximum value and minimum value of V_a, respectively.

Example 3.2. In Table 1, a₁ (x₁) and a₁ (x₃) are two computed tomography (CT) images. Using Gabor wavelets (GWs), a five-level, eight-direction decomposition is performed, and 40 GW sub-bands are generated for both a₁ (x₁) and a₁ (x₃). The mean and standard deviation of each sub-band are extracted and concatenated into an 80-dimensional feature vector [16]. To be exact,

X=<4.508, 14.245, 2.137, 7.206, 1.972, 6.77, 1.91, 6.512, 1.911, 6.517, 4.508, 14.245, 10.387, 32.338, 7.615, 24.713, 7.585, 24.622, 7.59, 24.642, 4.508, 14.245, 18.994, 50.446, 49.511, 80.783, 60.907, 91.703, 60.689, 91.513, 4.508, 14.245, 39.229, 65.569, 29.673, 58.588, 29.274, 58.319, 29.346, 58.371, 4.508, 14.245, 17.978, 47.001, 20.683, 50.435, 19.272, 48.557, 19.325, 48.614, 4.508, 14.245, 39.229, 65.569, 29.673, 58.588, 29.274, 58.319, 29.346, 58.371, 4.508, 14.245, 18.994, 50.446, 49.511, 80.783, 60.91, 91.705, 60.69, 91.515, 4.508, 14.245, 10.387, 32.338, 7.615, 24.713, 7.585, 24.622, 7.59, 24.642 >,

Y=<4.619, 14.747, 2.18, 7.262, 1.973, 6.752, 1.928, 6.529, 1.928, 6.532, 4.619, 14.747, 9.889, 30.914, 7.474, 24.214, 7.455, 24.141, 7.458, 24.158, 4.619, 14.747, 19.321, 53.147, 46.507, 78.065, 57.515, 88.639, 57.306, 88.453, 4.619, 14.747, 36.654, 65.957, 28.673, 61.002, 28.356, 60.812, 28.409, 60.847, 4.619, 14.747, 18.528, 51.164, 20.817, 53.632, 19.625, 52.264, 19.671, 52.305, 4.619, 14.747, 36.654, 65.957, 28.673, 61.002, 28.356, 60.812, 28.409, 60.847, 4.619, 14.747, 19.321, 53.147, 46.507, 78.065, 57.517, 88.642, 57.309, 88.456, 4.619, 14.747, 9.889, 30.914, 7.474, 24.214, 7.455, 24.142, 7.458, 24.158>

For images, let a_max =<255, ⋯ , 255 > and a_min = <0, ⋯ , 0 > .

By Definition 3.1, with n = 80, $\begin{aligned} d (a_{1} (x_{1}), a_{1} (x_{3})) &= \sqrt{\sum_{i = 1}^{n} (X_{i} - Y_{i})^{2}} \Big/ \sqrt{\sum_{i = 1}^{n} (255 - 0)^{2}} \\ &\approx 0.0069 . \end{aligned}$
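The computation in Example 3.2 can be sketched as a small function. This is only an illustration under the assumption, taken from the example, that a_max and a_min are constant vectors (255 and 0 in every coordinate); the function name `normalized_distance` and the toy input are hypothetical.

```python
import math

def normalized_distance(u, v, a_max=255.0, a_min=0.0):
    """Definition 3.1: ||u - v|| / ||a_max - a_min|| with constant bound vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    numerator = math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))
    # ||a_max - a_min|| for constant vectors of the same length as u.
    denominator = math.sqrt(len(u) * (a_max - a_min) ** 2)
    return numerator / denominator

# Toy 2-dimensional input: ||(3, 4) - (0, 0)|| = 5, so d = 5 / (255 * sqrt(2)).
d_toy = normalized_distance([3.0, 4.0], [0.0, 0.0])
```

Passing the two 80-dimensional vectors X and Y listed above as `u` and `v` applies the same formula to the data of Example 3.2.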

Based on the above definition and OpenCV, an algorithm for obtaining the feature vectors of images can be designed as below.

Algorithm 1 Feature extraction

Require: An image information system (U, A), parameters ksize, sigma, theta, lambd, gamma, phi and ktype. ksize is the size of the filter returned, sigma is the standard deviation of the Gaussian envelope, theta is the orientation of the normal to the parallel stripes of a Gabor function, lambd is the wavelength of the sinusoidal factor, gamma is the spatial aspect ratio, phi is the phase offset, and ktype is the type and range of values that each pixel in the Gabor kernel can hold.

Ensure: A collection of 80-dimensional feature vectors

Initialize:

ksizes ← {(1 × 1) , (3 × 3) , (5 × 5) , (7 × 7) , (9 × 9)}

sigma ← 1.0

lambd ← π/2

gamma ← 1

phi ← 0

ktype ← cv2.CV_32F

for each a ∈ A, x ∈ U do

features ← empty list

theta ← 0

while theta < π do

for each ksize in ksizes do

kernel ← cv2.getGaborKernel (ksize, sigma, theta, lambd, gamma, phi, ktype)

fimg ← cv2.filter2D (a (x), -1, kernel)

Append the mean and standard deviation of fimg to features

end for

theta ← theta + π/8

end while

Put the 80-dimensional feature vector features into the collection

end for

return A collection of 80-dimensional feature vectors

In Algorithm 1, the complexity of convolution depends directly on the size of the Gabor filter. The complexity of calculating the filter response for one point is O (M²), where M is the width and height of the Gabor filter. If the filtering is done on an entire image of size N × N, the complexity becomes O (M²N²). For an image information system, the time complexity is O (|U||A|M²N²).

Definition 3.3. Let (U, A) be an image information system. Given P ⊆ A. ∀ x, y ∈ U, the distance between x and y in the subsystem (U, P) is defined as $d_{P} (x, y) = \sqrt{\sum_{a \in P} d^{2} (a (x), a (y))} .$

Proposition 3.4. Let (U, A) be an image information system. Given P ⊆ A. Then ∀ x, y ∈ U, $0 \leq d_{P} (x, y) \leq \sqrt{| P |} .$

Proof. By Definition 3.1, $\forall a \in P, \forall x, y \in U, 0 \leq d (a (x), a (y)) \leq 1 .$

Then $\forall x, y \in U, 0 \leq \sum_{a \in P} d^{2} (a (x), a (y)) \leq | P | .$

Thus $\forall x, y \in U, 0 \leq d_{P} (x, y) \leq \sqrt{| P |} .$

□

Example 3.5. (Continued from Example 2.8) By Definition 3.1 and Example 3.2, we have $d_{A} (x_{2}, x_{4}) = \sqrt{\sum_{a \in A} d^{2} (a (x_{2}), a (x_{4}))} \approx 0.1724 .$

Example 3.6. (Continued from Example 2.8)

Put P_i = {a₁, ⋯ , a_i} (i = 1, 2, 3, 4, 5, 6).

Then P₆ = A, P₁ ⊆ P₂ ⊆ P₃ ⊆ P₄ ⊆ P₅ ⊆ P₆.

Thus (U, P₁), (U, P₂), (U, P₃), (U, P₄), (U, P₅) and (U, P₆) are six subsystems of (U, A).

By Definition 3.1 and Example 3.5,

$\begin{aligned} d_{P_{5}} (x_{2}, x_{4}) &= \sqrt{\sum_{a \in P_{5}} d^{2} (a (x_{2}), a (x_{4}))} \\ &= \sqrt{0.0007^{2} + 0.0263^{2} + 0.0129^{2} + 0.0848^{2} + 0.0056^{2}} \\ &\approx 0.0899 . \end{aligned}$
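The arithmetic of Example 3.6 can be reproduced with a short sketch of Definition 3.3. The function name `subsystem_distance` is an illustrative choice; the five input values are the per-attribute distances quoted in the example.

```python
import math

def subsystem_distance(per_attribute_distances):
    """Definition 3.3: d_P(x, y) = sqrt(sum over a in P of d(a(x), a(y))^2)."""
    return math.sqrt(sum(d * d for d in per_attribute_distances))

# Per-attribute distances d(a_k(x_2), a_k(x_4)), k = 1..5, as quoted in Example 3.6.
d_p5 = subsystem_distance([0.0007, 0.0263, 0.0129, 0.0848, 0.0056])
print(round(d_p5, 4))  # 0.0899
```

Because each per-attribute distance lies in [0, 1], the result never exceeds the square root of the number of attributes, in line with Proposition 3.4.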

Definition 3.7. Let (U, A) be an image information system. Given P ⊆ A. Then $d_{P} = (d_{P} (x_{i}, x_{j}))_{n \times n}$ is called the distance matrix of the subsystem (U, P).

Based on the above definitions, an algorithm for computing the distance between two images in an image information system can be designed as below.

Algorithm 2 The distance between two objects in the subsystem (U, P)

Require: The feature vectors a (x) (a ∈ P, x ∈ U) produced by Algorithm 1.

Ensure: Distance matrix of (U, P).

for each x_i ∈ U do

for each x_j ∈ U do

matrix [i] [j] ← sqrt (Σ_{a ∈ P} d² (a (x_i) , a (x_j)))

end for

end for

return matrix

Algorithm 2 describes the calculation of the distance matrix for the subsystem (U, P). Since the matrix has |U|² entries and each entry sums over the |P| attributes, the computational complexity of this approach is O (|U|²|P|). When P = A, the complexity is O (|U|²|A|).
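A minimal sketch of the distance-matrix computation follows, assuming the feature vectors are stored in a dictionary keyed by attribute name and that a_max and a_min are the constant vectors 255 and 0 used in Example 3.2. The layout `features[a][i]`, the function name and the toy data are hypothetical.

```python
import math

def distance_matrix(features, a_max=255.0, a_min=0.0):
    """features[a][i] is the feature vector of object x_i under attribute a.
    Returns the n x n matrix of d_P(x_i, x_j) from Definitions 3.1 and 3.3."""
    attrs = list(features)
    n = len(features[attrs[0]])

    def d(u, v):
        # Definition 3.1 with constant bound vectors a_max and a_min.
        scale = math.sqrt(len(u) * (a_max - a_min) ** 2)
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v))) / scale

    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # symmetric, so fill both halves at once
            total = sum(d(features[a][i], features[a][j]) ** 2 for a in attrs)
            matrix[i][j] = matrix[j][i] = math.sqrt(total)
    return matrix

# Toy system: three objects, two attributes, 2-dimensional feature vectors.
toy = {
    "a1": [[0.0, 0.0], [255.0, 255.0], [0.0, 255.0]],
    "a2": [[10.0, 10.0], [10.0, 10.0], [10.0, 10.0]],
}
D = distance_matrix(toy)
```

Exploiting symmetry halves the work but does not change the quadratic growth in the number of objects.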

Example 3.8. (Continued from Examples 3.5 and 3.6) We have

4 The fuzzy T_cos-equivalence relation induced by an image information system

In this section, we give the fuzzy T_cos-equivalence relation induced by an image information system by means of the Gaussian kernel method.

Gaussian kernel method is an important methodology in machine learning and pattern recognition. For making data linear and simplifying classification tasks, it maps data into a higher dimensional feature space [29, 41]. Hu et al. [11, 12] found that there are some relationships between rough sets and Gaussian kernel method, so Gaussian kernel is used to obtain fuzzy relations.

In a typical kernel learning algorithm, the nonlinear mapping realized by Gaussian kernel function transforms the original data matrix into a Gaussian kernel matrix, which represents the structure and describes relationships between samples. Kernel matrix plays an important role in a kernel learning algorithm because it contains all available information for further learning. The learning algorithm depends on the training data information obtained by kernel matrix.

A Gaussian kernel matrix can be regarded as a relation matrix, and a relation matrix can be regarded as a Gaussian kernel matrix. It can be found that there is a high similarity between the Gaussian kernel method and rough sets. Most relation matrices used in existing rough set models satisfy the conditions of a Gaussian kernel function, namely positive semidefiniteness and symmetry. Simultaneously, Gaussian kernel matrices are symmetric and reflexive. This means that Gaussian kernel matrices can be used as fuzzy relation matrices of fuzzy rough sets.

In the last section, the image as an information value is transformed into feature vector as an information value, and the distance between two images is transformed into the distance between feature vectors. Gaussian kernel is based on the distance. Thus, the images are processed by means of Gaussian kernel.

Below, we use Gaussian kernel to extract a fuzzy T_cos-equivalence relation on the object set of a given image information system.

The Gaussian kernel $G (x, y) = \exp (- \frac{\| x - y \|^{2}}{2 δ^{2}})$ is used to compute the similarity between two objects x and y, where ‖x - y‖ is the Euclidean distance between x and y, and δ is a threshold. In this paper, we pick δ ∈ (0, 1].

Obviously, G (x, y) satisfies:

(1) G (x, y) ∈ [0, 1];

(2) G (x, y) = G (y, x);

(3) G (x, x) =1.

The shortcoming of Gaussian kernel method is that it heavily depends on the threshold δ. Sometimes, differences of δ lead to large differences of the extracted fuzzy relations.

Definition 4.1. Let (U, A) be an image information system. Given P ⊆ A and δ ∈ (0, 1], denote $R_{P}^{G} (δ) (x_{i}, x_{j}) = \exp (- \frac{d_{P}^{2} (x_{i}, x_{j})}{2 δ^{2}}),$ $M (R_{P}^{G} (δ)) = (R_{P}^{G} (δ) (x_{i}, x_{j}))_{n \times n} .$ Then $M (R_{P}^{G} (δ))$ is called the Gaussian kernel matrix of the subsystem (U, P) with respect to δ.
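Definition 4.1 can be sketched as a map from a distance matrix to a kernel matrix. The function name `gaussian_kernel_matrix` is illustrative; the two-object input reuses the distance 0.1724 quoted in Example 3.5 purely as sample data.

```python
import math

def gaussian_kernel_matrix(dist, delta):
    """Definition 4.1: entry (i, j) is exp(-d_P(x_i, x_j)^2 / (2 * delta^2))."""
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta is taken from (0, 1]")
    return [[math.exp(-(d * d) / (2.0 * delta * delta)) for d in row] for row in dist]

# Two objects at distance 0.1724, with delta = sqrt(0.8).
R = gaussian_kernel_matrix([[0.0, 0.1724], [0.1724, 0.0]], math.sqrt(0.8))
```

Here the off-diagonal entry is roughly 0.98, and the diagonal is exactly 1, which matches properties (1) to (3) listed for G above.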

Theorem 4.2. Let (U, A) be an image information system. Given P ⊆ A and δ ∈ (0, 1]. Then $R_{P}^{G} (δ)$ is a fuzzy T_cos-equivalence relation on U.

Proof. This holds by Corollary 2.5.□

Definition 4.3. Let (U, A) be an image information system. Given P ⊆ A and δ ∈ (0, 1]. Then $R_{P}^{G} (δ)$ is called the fuzzy T_cos-equivalence relation induced by the subsystem (U, P) with respect to δ.

Example 4.4. (Continued from Example 2.8) Pick $δ = \sqrt{0.8}$ . Then

5 Information structures in an image information system

In this section, we investigate information structures in an image information system.

5.1 Some concepts of information structures in an image information system

Given R ∈ I^U×U. Then for each i, S_R (x_i) can be viewed as the fuzzy neighborhood or the information granule of the point x_i [27]. According to this view, Qian et al. [27] defined the fuzzy granular structure of R as follows: $S (R) = (S_{R} (x_{1}), S_{R} (x_{2}), \dots, S_{R} (x_{n})) .$

Let (U, A) be an image information system. Given P ⊆ A and δ ∈ (0, 1]. Then, by Theorem 4.2, $R_{P}^{G} (δ)$ is a fuzzy T_cos-equivalence relation on the object set U. For each i, $S_{R_{P}^{G} (δ)} (x_{i})$ can be viewed as the fuzzy neighborhood or the information granule of the point x_i. Based on Qian’s idea, $S (R_{P}^{G} (δ)) = (S_{R_{P}^{G} (δ)} (x_{1}), S_{R_{P}^{G} (δ)} (x_{2}), \dots, S_{R_{P}^{G} (δ)} (x_{n}))$ can be viewed as the fuzzy granular structure of $R_{P}^{G} (δ)$ . Thus, $S (R_{P}^{G} (δ))$ can be seen as the information structure of the subsystem (U, P) with respect to δ. This leads to the concept of information structures given in the following definition.

Definition 5.1. Let (U, A) be an image information system. For any P ⊆ A and δ ∈ (0, 1], denote $S^{δ} (P) = (S_{R_{P}^{G} (δ)} (x_{1}), S_{R_{P}^{G} (δ)} (x_{2}), \dots, S_{R_{P}^{G} (δ)} (x_{n})) .$ Then S^δ (P) is called the information structure of the subsystem (U, P) with respect to δ, or the δ-information structure of the subsystem (U, P).

Example 5.2. (Continued from Example 4.4) $S^{\sqrt{0.8}} (A) = (S_{R_{A}^{G} (\sqrt{0.8})} (x_{1}), S_{R_{A}^{G} (\sqrt{0.8})} (x_{2}), \dots, S_{R_{A}^{G} (\sqrt{0.8})} (x_{12}))$ is the $\sqrt{0.8}$-information structure of (U, A).

Definition 5.3. Let (U, A) be an image information system. Given δ ∈ (0, 1]. Put $S^{δ} (U, A) = \{S^{δ} (P) : P \subseteq A\} .$ Then S^δ (U, A) is called the δ-information structure base of (U, A).

Definition 5.4. Let (U, A) be an image information system. Given δ₁, δ₂ ∈ (0, 1] and P, Q ⊆ A. If for each i, $S_{R_{P}^{G} (δ_{1})} (x_{i}) = S_{R_{Q}^{G} (δ_{2})} (x_{i})$ , then $S^{δ_{1}} (P)$ and $S^{δ_{2}} (Q)$ are said to be the same, and we write $S^{δ_{1}} (P) = S^{δ_{2}} (Q)$ .

Below, we propose dependence between information structures.

Definition 5.5. Let (U, A) be an image information system. Given δ₁, δ₂ ∈ (0, 1] and P, Q ⊆ A.

(1) $S^{δ_{2}} (Q)$ is said to depend on $S^{δ_{1}} (P)$ if for each i, $S_{R_{P}^{G} (δ_{1})} (x_{i}) \subseteq S_{R_{Q}^{G} (δ_{2})} (x_{i})$ ; we write $S^{δ_{1}} (P) ⪯ S^{δ_{2}} (Q)$ . $S^{δ_{2}} (Q)$ is said to depend strictly on $S^{δ_{1}} (P)$ if $S^{δ_{1}} (P) ⪯ S^{δ_{2}} (Q)$ and $S^{δ_{1}} (P) \neq S^{δ_{2}} (Q)$ ; we write $S^{δ_{1}} (P) ≺ S^{δ_{2}} (Q)$ .

(2) $S^{δ_{2}} (Q)$ is said to depend partially on $S^{δ_{1}} (P)$ if there exists i such that $S_{R_{P}^{G} (δ_{1})} (x_{i}) \subseteq S_{R_{Q}^{G} (δ_{2})} (x_{i})$ ; we write $S^{δ_{1}} (P) ⊑ S^{δ_{2}} (Q)$ . $S^{δ_{2}} (Q)$ is said to depend partially strictly on $S^{δ_{1}} (P)$ if $S^{δ_{1}} (P) ⊑ S^{δ_{2}} (Q)$ and $S^{δ_{1}} (P) \neq S^{δ_{2}} (Q)$ ; we write $S^{δ_{1}} (P) ⊏ S^{δ_{2}} (Q)$ .

(3) $S^{δ_{2}} (Q)$ is said to be independent of $S^{δ_{1}} (P)$ if for each i, $S_{R_{P}^{G} (δ_{1})} (x_{i}) ⊈ S_{R_{Q}^{G} (δ_{2})} (x_{i})$ ; we write $S^{δ_{1}} (P) ⋈ S^{δ_{2}} (Q)$ .

Obviously, $S^{δ_{1}} (P) = S^{δ_{2}} (Q) \Leftrightarrow S^{δ_{1}} (P) ⪯ S^{δ_{2}} (Q)$ and $S^{δ_{2}} (Q) ⪯ S^{δ_{1}} (P)$ ; $S^{δ_{1}} (P) ⪯ S^{δ_{2}} (Q) \Rightarrow S^{δ_{1}} (P) ⊑ S^{δ_{2}} (Q)$ ; $S^{δ_{1}} (P) ≺ S^{δ_{2}} (Q) \Rightarrow S^{δ_{1}} (P) ⊏ S^{δ_{2}} (Q)$ .
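The dependence relations of Definition 5.5 can be sketched as row-wise comparisons of two relation matrices. This assumes the reading in which inclusion of fuzzy sets is the pointwise order on membership degrees, dependence requires inclusion for every object, partial dependence for at least one object, and independence for none. The function names and toy matrices are hypothetical.

```python
def depends(R_P, R_Q):
    """S(P) ⪯ S(Q): every granule of P is contained in the matching granule of Q."""
    return all(p <= q for rp, rq in zip(R_P, R_Q) for p, q in zip(rp, rq))

def partially_depends(R_P, R_Q):
    """S(P) ⊑ S(Q): at least one granule of P is contained in that of Q."""
    return any(all(p <= q for p, q in zip(rp, rq)) for rp, rq in zip(R_P, R_Q))

def independent(R_P, R_Q):
    """S(P) ⋈ S(Q): no granule of P is contained in the matching granule of Q."""
    return not partially_depends(R_P, R_Q)

# Toy relation matrices: row i is the granule of x_i.
R_small = [[1.0, 0.2], [0.2, 1.0]]
R_big = [[1.0, 0.6], [0.6, 1.0]]
```

With these toy matrices, `depends(R_small, R_big)` holds while `independent(R_big, R_small)` holds in the other direction, since every row of `R_big` has an entry exceeding the corresponding entry of `R_small`.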

5.2 Properties of information structures in an image information system

In this subsection, we give properties of information structures in an image information system.

Theorem 5.6. Let (U, A) be an image information system. Given δ₁, δ₂ ∈ (0, 1] and P, Q ⊆ A. Then $S^{δ_{1}} (P) = S^{δ_{2}} (Q) \Leftrightarrow R_{P}^{G} (δ_{1}) = R_{Q}^{G} (δ_{2}) .$ Proof. Obviously.□

Theorem 5.7. Let (U, A) be an image information system. Given δ₁, δ₂ ∈ (0, 1] and P, Q ⊆ A. Then $S^{δ_{1}} (P) ⪯ S^{δ_{2}} (Q) \Leftrightarrow R_{P}^{G} (δ_{1}) \subseteq R_{Q}^{G} (δ_{2}) .$ Proof. Clearly.□

Corollary 5.8. Let (U, A) be an image information system. Given δ₁, δ₂ ∈ (0, 1] and P, Q ⊆ A. Then $S^{δ_{1}} (P) ≺ S^{δ_{2}} (Q) \Leftrightarrow R_{P}^{G} (δ_{1}) \subset R_{Q}^{G} (δ_{2}) .$ Proof. This follows from Theorems 5.6 and 5.7.□

Theorem 5.9. Let (U, A) be an image information system.

(1) If 0 < δ₁ ≤ δ₂ ≤ 1, then for any P ⊆ A, $S^{δ_{1}} (P) ⪯ S^{δ_{2}} (P)$ .

(2) If P ⊆ Q ⊆ A, then for any δ ∈ (0, 1], S^δ (Q) ⪯ S^δ (P).

Proof. (1) Since 0 < δ₁ ≤ δ₂, we have $\frac{1}{2 δ_{1}^{2}} \geq \frac{1}{2 δ_{2}^{2}}$ , so for any i, j, $\exp (- \frac{d_{P}^{2} (x_{i}, x_{j})}{2 δ_{1}^{2}}) \leq \exp (- \frac{d_{P}^{2} (x_{i}, x_{j})}{2 δ_{2}^{2}}) .$

Then $R_{P}^{G} (δ_{1}) (x_{i}, x_{j}) \leq R_{P}^{G} (δ_{2}) (x_{i}, x_{j}) .$

So $R_{P}^{G} (δ_{1}) \subseteq R_{P}^{G} (δ_{2}) .$

By Theorem 5.7, $S^{δ_{1}} (P) ⪯ S^{δ_{2}} (P) .$

(2) By Definition 4.1, $R_{P}^{G} (δ) (x_{i}, x_{j}) = \exp (- \frac{d_{P}^{2} (x_{i}, x_{j})}{2 δ^{2}})$ and $R_{Q}^{G} (δ) (x_{i}, x_{j}) = \exp (- \frac{d_{Q}^{2} (x_{i}, x_{j})}{2 δ^{2}}) .$

Since P ⊆ Q, Definition 3.3 gives $d_{P}^{2} (x_{i}, x_{j}) \leq d_{Q}^{2} (x_{i}, x_{j})$ . Then $R_{Q}^{G} (δ) (x_{i}, x_{j}) \leq R_{P}^{G} (δ) (x_{i}, x_{j}) (1 \leq i, j \leq n) .$

So $R_{Q}^{G} (δ) \subseteq R_{P}^{G} (δ) .$

Thus, by Theorem 5.7, $S^{δ} (Q) ⪯ S^{δ} (P) .$

□

Corollary 5.10. Let (U, A) be an image information system. Given 0 < δ₁ ≤ δ₂ ≤ 1 and P ⊆ Q ⊆ A. Then $S^{δ_{1}} (Q) ⪯ S^{δ_{2}} (Q) ⪯ S^{δ_{2}} (P)$ and $S^{δ_{1}} (Q) ⪯ S^{δ_{1}} (P) ⪯ S^{δ_{2}} (P)$ . Proof. This holds by Theorem 5.9.□

Definition 5.11. [47] Let (U, A) be an image information system. Given that S^δ (U, A) is the fuzzy information structure base of (U, A). Then a mapping D : S^δ (U, A) × S^δ (U, A) → [0, 1] is called the inclusion degree on S^δ (U, A), if the following conditions hold:

(1) 0 ≤ D (S^δ (Q)/S^δ (P)) ≤ 1;

(2) S^δ (P) ⪯ S^δ (Q) implies D (S^δ (Q)/S^δ (P)) = 1;

(3) $S^{δ} (P) ⊑ S^{δ} (Q) ⊑ S^{δ} (L)$ implies $D (S^{δ} (P) / S^{δ} (L)) \leq D (S^{δ} (P) / S^{δ} (Q))$ .

Definition 5.12. Let (U, A) be an image information system. For any P, Q ⊆ A, define

$D (S^{δ} (Q) / S^{δ} (P)) = \sum_{l = 1}^{n} \frac{| S_{R_{Q}^{G} (δ)} (x_{l}) |}{\sum_{i = 1}^{n} | S_{R_{Q}^{G} (δ)} (x_{i}) |} χ_{S_{R_{Q}^{G} (δ)} (x_{l})} (S_{R_{P}^{G} (δ)} (x_{l})),$

where $χ_{S_{R_{Q}^{G} (δ)} (x_{l})} (S_{R_{P}^{G} (δ)} (x_{l})) = \begin{cases} 1, & \text{if } S_{R_{P}^{G} (δ)} (x_{l}) \subseteq S_{R_{Q}^{G} (δ)} (x_{l}), \\ 0, & \text{if } S_{R_{P}^{G} (δ)} (x_{l}) ⊈ S_{R_{Q}^{G} (δ)} (x_{l}) . \end{cases}$
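A minimal sketch of the inclusion degree of Definition 5.12 follows, assuming both relations are given as matrices whose rows are the granules and that fuzzy-set inclusion is the pointwise order. The function name `inclusion_degree` and the toy matrices are illustrative only.

```python
def inclusion_degree(R_Q, R_P):
    """D(S(Q)/S(P)) from Definition 5.12, with granules given as matrix rows."""
    sizes = [sum(row) for row in R_Q]  # cardinalities |S_{R_Q}(x_l)|
    total = sum(sizes)
    # chi is 1 exactly when the granule of x_l under P is contained in that under Q.
    contained = [all(p <= q for p, q in zip(rp, rq)) for rp, rq in zip(R_P, R_Q)]
    return sum(s for s, ok in zip(sizes, contained) if ok) / total

# Toy relation matrices with R_P contained in R_Q entry by entry.
R_P = [[1.0, 0.2], [0.2, 1.0]]
R_Q = [[1.0, 0.6], [0.6, 1.0]]
```

For these matrices `inclusion_degree(R_Q, R_P)` is 1 and `inclusion_degree(R_P, R_Q)` is 0, the two extreme cases of the measure.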

Proposition 5.13. D in Definition 5.12 is the inclusion degree under Definition 5.11.

Proof. Obviously.□

Example 5.14. (Continued from Example 4.4). Let (U, A) be an image information system. Given P₁, P₂ ⊆ A. Then $\begin{aligned} D (S^{δ} (P_{1}) / S^{δ} (P_{2})) &= \sum_{l = 1}^{9} \frac{| S_{R_{P_{1}}^{G} (δ)} (x_{l}) |}{\sum_{i = 1}^{9} | S_{R_{P_{1}}^{G} (δ)} (x_{i}) |} χ_{S_{R_{P_{1}}^{G} (δ)} (x_{l})} (S_{R_{P_{2}}^{G} (δ)} (x_{l})) \\ &= 1 . \end{aligned}$

$\begin{aligned} D (S^{δ} (P_{2}) / S^{δ} (P_{1})) &= \sum_{l = 1}^{9} \frac{| S_{R_{P_{2}}^{G} (δ)} (x_{l}) |}{\sum_{i = 1}^{9} | S_{R_{P_{2}}^{G} (δ)} (x_{i}) |} χ_{S_{R_{P_{2}}^{G} (δ)} (x_{l})} (S_{R_{P_{1}}^{G} (δ)} (x_{l})) \\ &= 0 . \end{aligned}$

Thus $D (S^{δ} (P_{1}) / S^{δ} (P_{2})) + D (S^{δ} (P_{2}) / S^{δ} (P_{1})) = 1 .$

The following theorem shows the fact that relationships between information structures in an image information system can be quantitatively described by the inclusion degree.

Theorem 5.15. Let (U, A) be an image information system. Given P, Q ⊆ A.

(1) S^δ (P) ⪯ S^δ (Q) ⇔ D (S^δ (Q)/S^δ (P)) =1 .

(2) S^δ (P) ⋈ S^δ (Q) ⇔ D (S^δ (Q)/S^δ (P)) =0 .

(3) S^δ (P) ⊑ S^δ (Q) ⇔0 < D (S^δ (Q)/S^δ (P)) ≤1 .

Proof. (1) “⇒” is obvious. We prove “⇐”. Put $| S_{R_{Q}^{G} (δ)} (x_{l}) | = q_{l}, \sum_{l = 1}^{n} | S_{R_{Q}^{G} (δ)} (x_{l}) | = q .$ Then $q = \sum_{l = 1}^{n} q_{l}$ . Since D (S^δ (Q)/S^δ (P)) =1, we have $\sum_{l = 1}^{n} q_{l} χ_{S_{R_{Q}^{G} (δ)} (x_{l})} (S_{R_{P}^{G} (δ)} (x_{l})) = q = \sum_{l = 1}^{n} q_{l} .$ Then $\sum_{l = 1}^{n} q_{l} (1 - χ_{S_{R_{Q}^{G} (δ)} (x_{l})} (S_{R_{P}^{G} (δ)} (x_{l}))) = 0 .$ Thus ∀ l, $1 - χ_{S_{R_{Q}^{G} (δ)} (x_{l})} (S_{R_{P}^{G} (δ)} (x_{l})) = 0 .$

It follows that ∀ l, $S_{R_{P}^{G} (δ)} (x_{l}) \subseteq S_{R_{Q}^{G} (δ)} (x_{l})$ .

Hence S^δ (P) ⪯ S^δ (Q).

(2) “⇒”. Since S^δ (P) ⋈ S^δ (Q), we have $S_{R_{P}^{G} (δ)} (x_{l}) ⊈ S_{R_{Q}^{G} (δ)} (x_{l}) (\forall l)$ . Then ∀ l, $χ_{S_{R_{Q}^{G} (δ)} (x_{l})} (S_{R_{P}^{G} (δ)} (x_{l})) = 0 .$

Thus D (S^δ (Q)/S^δ (P)) =0.

“⇐”. Since D (S^δ (Q)/S^δ (P)) =0 and every weight in the sum is positive, we obtain that ∀ l, $χ_{S_{R_{Q}^{G} (δ)} (x_{l})} (S_{R_{P}^{G} (δ)} (x_{l})) = 0 .$

Then ∀ l, $S_{R_{P}^{G} (δ)} (x_{l}) ⊈ S_{R_{Q}^{G} (δ)} (x_{l})$ . Thus S^δ (P) ⋈ S^δ (Q).

(3) This follows from (1) and (2).□

6 Measuring uncertainty of an image information system

Uncertainty measurement for an information system was investigated and relationships between these measures were discussed [18]. These measures include granulation measure, information entropy, rough entropy, and knowledge granulation. They have become an effective mechanism for evaluating the uncertainty of an information system. In this section, we propose some tools for measuring uncertainty of an image information system.

6.1 Granulation measurement for an image information system

We first give the axiomatic definition of information granulation in an image information system.

Definition 6.1. Let (U, A) be an image information system. Suppose that G^δ : 2^A → (- ∞ , + ∞) is a function. Given δ ∈ (0, 1]. Then G^δ is called an information granulation function in (U, A) with respect to δ, if G^δ satisfies the following conditions:

(1) Non-negativity: ∀ P ⊆ A, G^δ (P) ≥0;

(2) Invariability: ∀ P, Q ⊆ A, if S^δ (P) = S^δ (Q), then G^δ (P) = G^δ (Q);

(3) Monotonicity: ∀ P, Q ⊆ A, if S^δ (P) ≺ S^δ (Q), then G^δ (P) < G^δ (Q).

Here, G^δ (P) is called δ-information granulation of the subsystem (U, P).

Similar to Definition 5 in [27], δ-information granulation of an image information system is given in the following definition.

Definition 6.2. Suppose that (U, A) is an image information system. Given δ ∈ (0, 1] and P ⊆ A. Then δ-information granulation of the subsystem (U, P) is defined as $G^{δ} (P) = \frac{1}{n} \sum_{i = 1}^{n} \frac{1}{n} | S_{R_{P}^{G} (δ)} (x_{i}) | .$

Example 6.3. (Continued from Example 4.4) $\begin{matrix} G^{\sqrt{0.8}} (P_{1}) = \frac{1}{12} \sum_{i = 1}^{12} \frac{1}{12} | S_{R_{P_{1}}^{G} (\sqrt{0.8})} (x_{i}) | = \frac{143.9856}{144} \approx 0.9999, \\ G^{\sqrt{0.8}} (P_{2}) = \frac{1}{12} \sum_{i = 1}^{12} \frac{1}{12} | S_{R_{P_{2}}^{G} (\sqrt{0.8})} (x_{i}) | = \frac{143.7408}{144} \approx 0.9982, \\ G^{\sqrt{0.8}} (P_{3}) = \frac{1}{12} \sum_{i = 1}^{12} \frac{1}{12} | S_{R_{P_{3}}^{G} (\sqrt{0.8})} (x_{i}) | = \frac{143.1936}{144} \approx 0.9944, \\ G^{\sqrt{0.8}} (P_{4}) = \frac{1}{12} \sum_{i = 1}^{12} \frac{1}{12} | S_{R_{P_{4}}^{G} (\sqrt{0.8})} (x_{i}) | = \frac{142.2864}{144} \approx 0.9881, \\ G^{\sqrt{0.8}} (P_{5}) = \frac{1}{12} \sum_{i = 1}^{12} \frac{1}{12} | S_{R_{P_{5}}^{G} (\sqrt{0.8})} (x_{i}) | = \frac{142.2144}{144} \approx 0.9876, \\ G^{\sqrt{0.8}} (A) = \frac{1}{12} \sum_{i = 1}^{12} \frac{1}{12} | S_{R_{A}^{G} (\sqrt{0.8})} (x_{i}) | = \frac{142.1568}{144} \approx 0.9872 . \end{matrix}$

Proposition 6.4. Let (U, A) be an image information system. Then for any P ⊆ A and δ ∈ (0, 1], $\frac{1}{n} \leq G^{δ} (P) \leq 1 .$ Moreover, if $R_{P}^{G} (δ)$ is an identity relation on U, then G^δ achieves the minimum value $\frac{1}{n}$ ; if $R_{P}^{G} (δ)$ is a universal relation on U, then G^δ achieves the maximum value 1.

Proof.

Since ∀ i, $1 \leq | R_{P}^{G} (δ) (x_{i}) | \leq n$ , $n \leq \sum_{i = 1}^{n} | R_{P}^{G} (δ) (x_{i}) | \leq n^{2}$ . By Definition 6.2, $\frac{1}{n} \leq G^{δ} (P) \leq 1 .$

If $R_{P}^{G} (δ)$ is an identity relation on U, for any i, $| R_{P}^{G} (δ) (x_{i}) | = 1$ . So $G^{δ} (P) = \frac{1}{n}$ .

If $R_{P}^{G} (δ)$ is a universal relation on U, for any i, $| R_{P}^{G} (δ) (x_{i}) | = n$ . So G^δ (P) =1.□
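The two extreme cases in the proof above can be checked with a minimal Python sketch. It assumes the fuzzy relation is stored as an n × n matrix and that the cardinality of a granule is the sum of the corresponding row, as used later in the proof of Lemma 8.5. The function names and the choice n = 4 are hypothetical.

```python
def granule_sizes(relation):
    """|S_R(x_i)| for each object: the sum of row i of the fuzzy relation matrix."""
    return [sum(row) for row in relation]

def information_granulation(relation):
    """G = (1/n) * sum_i (|S_R(x_i)| / n), following Definition 6.2."""
    n = len(relation)
    return sum(granule_sizes(relation)) / (n * n)

n = 4
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
universal = [[1.0] * n for _ in range(n)]

g_identity = information_granulation(identity)    # every granule has cardinality 1
g_universal = information_granulation(universal)  # every granule has cardinality n
print(g_identity, g_universal)  # 0.25 1.0
```

Here 0.25 equals 1/n for n = 4, the lower bound, and 1.0 is the upper bound.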

Proposition 6.5. Let (U, A) be an image information system. Given δ₁, δ₂ ∈ (0, 1] and P, Q ⊆ A. Then

(1) If S^{δ₁} (P) ⪯ S^{δ₂} (Q), then G^{δ₁} (P) ≤ G^{δ₂} (Q);

(2) If S^{δ₁} (P) ≺ S^{δ₂} (Q), then G^{δ₁} (P) < G^{δ₂} (Q).

Proof. (1) Since S^{δ₁} (P) ⪯ S^{δ₂} (Q), ∀ i, we have $S_{R_{P}^{G} (δ_{1})} (x_{i}) \subseteq S_{R_{Q}^{G} (δ_{2})} (x_{i})$ . Then $| S_{R_{P}^{G} (δ_{1})} (x_{i}) | \leq | S_{R_{Q}^{G} (δ_{2})} (x_{i}) |$ . By Definition 6.2, $G^{δ_{1}} (P) = \frac{1}{n} \sum_{i = 1}^{n} \frac{1}{n} | S_{R_{P}^{G} (δ_{1})} (x_{i}) |,$ $G^{δ_{2}} (Q) = \frac{1}{n} \sum_{i = 1}^{n} \frac{1}{n} | S_{R_{Q}^{G} (δ_{2})} (x_{i}) | .$ Thus $G^{δ_{1}} (P) \leq G^{δ_{2}} (Q) .$

(2) Since S^{δ₁} (P) ≺ S^{δ₂} (Q), we have S^{δ₁} (P) ⪯ S^{δ₂} (Q) and S^{δ₁} (P) ≠ S^{δ₂} (Q).

Then, ∀ i, $S_{R_{P}^{G} (δ_{1})} (x_{i}) \subseteq S_{R_{Q}^{G} (δ_{2})} (x_{i})$ and ∃ j, $S_{R_{P}^{G} (δ_{1})} (x_{j}) ⊊ S_{R_{Q}^{G} (δ_{2})} (x_{j})$ .

So, ∀ i, $| S_{R_{P}^{G} (δ_{1})} (x_{i}) | \leq | S_{R_{Q}^{G} (δ_{2})} (x_{i}) |$ and ∃ j, $| S_{R_{P}^{G} (δ_{1})} (x_{j}) | < | S_{R_{Q}^{G} (δ_{2})} (x_{j}) |$ .

Hence G^{δ₁} (P) < G^{δ₂} (Q).□

This proposition illustrates the fact that δ-information granulation increases when the available information becomes coarser, and it decreases when the available information becomes finer. In other words, the more uncertain the available information is, the bigger δ-information granulation value becomes. Thus, we can conclude that δ-information granulation introduced in Definition 6.2 can be used to evaluate the uncertainty of an image information system.

Proposition 6.6. Let (U, A) be an image information system.

(1) If 0 < δ₁ ≤ δ₂ ≤ 1, then for any P ⊆ A, G^{δ₁} (P) ≤ G^{δ₂} (P).

(2) If P ⊆ Q ⊆ A, then for any δ ∈ (0, 1], G^δ (Q) ≤ G^δ (P).

Proof. This holds by Theorem 5.9 and Proposition 6.5(1).□

Example 6.7. Pick $δ_{1} = \sqrt{0.6}$ , $δ_{2} = \sqrt{0.8}$ . Then we have $G^{δ_{1}} (P_{5}) = \frac{141.6384}{144} \approx 0.9836,$ $G^{δ_{2}} (P_{5}) = \frac{142.2144}{144} \approx 0.9876 .$ Thus $G^{δ_{1}} (P_{5}) < G^{δ_{2}} (P_{5}) .$

Example 6.8. (Continued from Examples 3.6 and 4.4)

Since P₁ ⊆ P₅ ⊆ A, we have $G^{\sqrt{0.8}} (P_{1}) \approx 0.9999,$ $G^{\sqrt{0.8}} (P_{5}) \approx 0.9876,$ $G^{\sqrt{0.8}} (A) \approx 0.9872 .$

Thus $G^{\sqrt{0.8}} (A) < G^{\sqrt{0.8}} (P_{5}) < G^{\sqrt{0.8}} (P_{1}) .$

Theorem 6.9. G^δ in Definition 6.2 is an information granulation function under Definition 6.1.

Proof.

(1) Obviously, “Non-negativity” holds.

(2) Given δ ∈ (0, 1] and P, Q ⊆ A. If S^δ (P) = S^δ (Q), then ∀ i, $S_{R_{P}^{G} (δ)} (x_{i}) = S_{R_{Q}^{G} (δ)} (x_{i})$ .

By Definition 6.2, G^δ (P) = G^δ (Q).

(3) “Monotonicity” follows from Proposition 6.5 (2).□

6.2 Entropy measurement for an image information system

In physics, entropy is often used to measure the degree of disorder of a system: the bigger the entropy value, the more disordered the system. Shannon [28] carried the concept of entropy over from physics to information theory in order to measure the uncertainty of a system.

Similar to Definition 8 in [17], δ-information entropy of a given image information system is defined as follows.

Definition 6.10. Suppose that (U, A) is an image information system. Given δ ∈ (0, 1] and P ⊆ A. Then δ-information entropy of the subsystem (U, P) is defined as $H^{δ} (P) = - \sum_{i = 1}^{n} (\frac{1}{n} {log}_{2} \frac{| S_{R_{P}^{G} (δ)} (x_{i}) |}{n}) .$

Example 6.11. (Continued from Example 4.4) $\begin{matrix} H^{\sqrt{0.8}} (P_{1}) = - \sum_{i = 1}^{12} (\frac{1}{12} {log}_{2} \frac{| S_{R_{P_{1}}^{G} (\sqrt{0.8})} (x_{i}) |}{12}) = \frac{0.0024}{12} \approx 0.0002, \\ H^{\sqrt{0.8}} (P_{2}) = - \sum_{i = 1}^{12} (\frac{1}{12} {log}_{2} \frac{| S_{R_{P_{2}}^{G} (\sqrt{0.8})} (x_{i}) |}{12}) = \frac{0.0312}{12} \approx 0.0026, \\ H^{\sqrt{0.8}} (P_{3}) = - \sum_{i = 1}^{12} (\frac{1}{12} {log}_{2} \frac{| S_{R_{P_{3}}^{G} (\sqrt{0.8})} (x_{i}) |}{12}) = \frac{0.0960}{12} \approx 0.0080, \\ H^{\sqrt{0.8}} (P_{4}) = - \sum_{i = 1}^{12} (\frac{1}{12} {log}_{2} \frac{| S_{R_{P_{4}}^{G} (\sqrt{0.8})} (x_{i}) |}{12}) = \frac{0.2076}{12} \approx 0.0173, \\ H^{\sqrt{0.8}} (P_{5}) = - \sum_{i = 1}^{12} (\frac{1}{12} {log}_{2} \frac{| S_{R_{P_{5}}^{G} (\sqrt{0.8})} (x_{i}) |}{12}) = \frac{0.2148}{12} \approx 0.0179, \\ H^{\sqrt{0.8}} (A) = - \sum_{i = 1}^{12} (\frac{1}{12} {log}_{2} \frac{| S_{R_{A}^{G} (\sqrt{0.8})} (x_{i}) |}{12}) = \frac{0.2232}{12} \approx 0.0186 . \end{matrix}$

Theorem 6.12. Let (U, A) be an image information system. Given δ₁, δ₂ ∈ (0, 1] and P, Q ⊆ A. Then

(1) If S^{δ₁} (P) ⪯ S^{δ₂} (Q), then H^{δ₂} (Q) ≤ H^{δ₁} (P);

(2) If S^{δ₁} (P) ≺ S^{δ₂} (Q), then H^{δ₂} (Q) < H^{δ₁} (P).

Proof. (1) Obviously.

(2) Since S^{δ₁} (P) ≺ S^{δ₂} (Q), similar to the proof of Proposition 6.5 (2), we obtain that ∀ i, $1 \leq | S_{R_{P}^{G} (δ_{1})} (x_{i}) | \leq | S_{R_{Q}^{G} (δ_{2})} (x_{i}) |$ and ∃ j, $1 \leq | S_{R_{P}^{G} (δ_{1})} (x_{j}) | < | S_{R_{Q}^{G} (δ_{2})} (x_{j}) | .$

Then ∀ i, $\begin{matrix} - {log}_{2} \frac{| S_{R_{P}^{G} (δ_{1})} (x_{i}) |}{n} \\ = {log}_{2} \frac{n}{| S_{R_{P}^{G} (δ_{1})} (x_{i}) |} \geq {log}_{2} \frac{n}{| S_{R_{Q}^{G} (δ_{2})} (x_{i}) |} \\ = - {log}_{2} \frac{| S_{R_{Q}^{G} (δ_{2})} (x_{i}) |}{n}, \end{matrix}$

and ∃ j, $\begin{matrix} - {log}_{2} \frac{| S_{R_{P}^{G} (δ_{1})} (x_{j}) |}{n} \\ = {log}_{2} \frac{n}{| S_{R_{P}^{G} (δ_{1})} (x_{j}) |} > {log}_{2} \frac{n}{| S_{R_{Q}^{G} (δ_{2})} (x_{j}) |} \\ = - {log}_{2} \frac{| S_{R_{Q}^{G} (δ_{2})} (x_{j}) |}{n} . \end{matrix}$

Hence H^{δ₂} (Q) < H^{δ₁} (P).□

This theorem shows that δ-information entropy increases when δ-information structure becomes finer, and it decreases when δ-information structure becomes coarser.

Proposition 6.13. Let (U, A) be an image information system.

(1) If 0 < δ₁ ≤ δ₂ ≤ 1, then for any P ⊆ A, H^{δ₂} (P) ≤ H^{δ₁} (P).

(2) If P ⊆ Q ⊆ A, then for any δ ∈ (0, 1], H^δ (P) ≤ H^δ (Q).

Proof. This holds by Theorems 5.9 and 6.12(1).□

Example 6.14. (Continued from Example 6.7 and 6.11) We have $H^{δ_{1}} (P_{5}) = \frac{0.3444}{12} \approx 0.0287,$ $H^{\sqrt{0.8}} (P_{5}) = \frac{0.2148}{12} \approx 0.0179 .$

Thus $H^{\sqrt{0.8}} (P_{5}) < H^{δ_{1}} (P_{5}) .$

Example 6.15. (Continued from Examples 3.6 and 6.11)

Since P₁ ⊆ P₅ ⊆ A, we have $H^{\sqrt{0.8}} (A) \approx 0.0186,$ $H^{\sqrt{0.8}} (P_{1}) \approx 0.0002,$ $H^{\sqrt{0.8}} (P_{5}) \approx 0.0179 .$

Thus $H^{\sqrt{0.8}} (P_{1}) < H^{\sqrt{0.8}} (P_{5}) < H^{\sqrt{0.8}} (A) .$

Rough entropy, introduced by Yao [37], is used to measure granularity of a given partition. It is also called co-entropy by some scholars [2]. Similar to Definition 10 in [17], δ-rough entropy of a given image information system is proposed in the following definition.

Definition 6.16. Let (U, A) be an image information system. Given δ ∈ (0, 1] and P ⊆ A. Then δ-rough entropy of the subsystem (U, P) is defined as $(E_{r})^{δ} (P) = - \sum_{i = 1}^{n} \frac{1}{n} {log}_{2} \frac{1}{| S_{R_{P}^{G} (δ)} (x_{i}) |} .$

Example 6.17. (Continued from Example 4.4)

$\begin{matrix} (E_{r})^{\sqrt{0.8}} (P_{1}) & = - \sum_{i = 1}^{12} \frac{1}{12} {log}_{2} \frac{1}{| S_{R_{P_{1}}^{G} (\sqrt{0.8})} (x_{i}) |} \\ = \frac{43.0176}{12} \approx 3.5848, \end{matrix}$

$\begin{matrix} (E_{r})^{\sqrt{0.8}} (P_{2}) & = - \sum_{i = 1}^{12} \frac{1}{12} {log}_{2} \frac{1}{| S_{R_{P_{2}}^{G} (\sqrt{0.8})} (x_{i}) |} \\ = \frac{42.9888}{12} \approx 3.5824, \end{matrix}$

$\begin{matrix} (E_{r})^{\sqrt{0.8}} (P_{3}) & = - \sum_{i = 1}^{12} \frac{1}{12} {log}_{2} \frac{1}{| S_{R_{P_{3}}^{G} (\sqrt{0.8})} (x_{i}) |} \\ = \frac{42.9228}{12} \approx 3.5769, \end{matrix}$ $\begin{matrix} (E_{r})^{\sqrt{0.8}} (P_{4}) & = - \sum_{i = 1}^{12} \frac{1}{12} {log}_{2} \frac{1}{| S_{R_{P_{4}}^{G} (\sqrt{0.8})} (x_{i}) |} \\ = \frac{42.8112}{12} \approx 3.5676, \end{matrix}$ $\begin{matrix} (E_{r})^{\sqrt{0.8}} (P_{5}) & = - \sum_{i = 1}^{12} \frac{1}{12} {log}_{2} \frac{1}{| S_{R_{P_{5}}^{G} (\sqrt{0.8})} (x_{i}) |} \\ = \frac{42.804}{12} \approx 3.5670, \end{matrix}$ $\begin{matrix} (E_{r})^{\sqrt{0.8}} (A) & = - \sum_{i = 1}^{12} \frac{1}{12} {log}_{2} \frac{1}{| S_{R_{A}^{G} (\sqrt{0.8})} (x_{i}) |} \\ = \frac{42.7968}{12} \approx 3.5664 . \end{matrix}$

Proposition 6.18. Let (U, A) be an image information system. Given δ ∈ (0, 1] and P ⊆ A. Then $0 \leq (E_{r})^{δ} (P) \leq {log}_{2} n .$ Moreover, if $R_{P}^{G} (δ)$ is an identity relation on U, then (E_r) ^δ achieves the minimum value 0; if $R_{P}^{G} (δ)$ is a universal relation on U, then (E_r) ^δ achieves the maximum value log ₂n.

Proof. Note that $R_{P}^{G} (δ)$ is a fuzzy equivalence relation on U. Then ∀ i, $R_{P}^{G} (δ) (x_{i}, x_{i}) = 1 .$

So ∀ i, $1 \leq | S_{R_{P}^{G} (δ)} (x_{i}) | \leq n$ .

This implies that $0 \leq - {log}_{2} \frac{1}{| S_{R_{P}^{G} (δ)} (x_{i}) |} = {log}_{2} | S_{R_{P}^{G} (δ)} (x_{i}) | \leq {log}_{2} n .$

Then $0 \leq - \sum_{i = 1}^{n} {log}_{2} \frac{1}{| S_{R_{P}^{G} (δ)} (x_{i}) |} \leq n {log}_{2} n$ .

By Definition 6.16, $0 \leq (E_{r})^{δ} (P) \leq {log}_{2} n .$

If $R_{P}^{G} (δ)$ is an identity relation on U, then ∀ i, $| S_{R_{P}^{G} (δ)} (x_{i}) | = 1$ . So (E_r) ^δ (P) =0.

If $R_{P}^{G} (δ)$ is a universal relation on U, then ∀ i, $| S_{R_{P}^{G} (δ)} (x_{i}) | = n$ . So (E_r) ^δ (P) = log ₂n.□

Proposition 6.19. Let (U, A) be an image information system. Given δ₁, δ₂ ∈ (0, 1] and P, Q ⊆ A.

(1) If S^{δ₁} (P) ⪯ S^{δ₂} (Q), then (E_r) ^{δ₁} (P) ≤ (E_r) ^{δ₂} (Q).

(2) If S^{δ₁} (P) ≺ S^{δ₂} (Q), then (E_r) ^{δ₁} (P) < (E_r) ^{δ₂} (Q).

Proof. (1) Obviously.

(2) Since S^{δ₁} (P) ≺ S^{δ₂} (Q), similar to the proof of Proposition 6.5 (2), we obtain that

∀ i, $1 \leq | S_{R_{P}^{G} (δ_{1})} (x_{i}) | \leq | S_{R_{Q}^{G} (δ_{2})} (x_{i}) |$ and

∃ j, $1 \leq | S_{R_{P}^{G} (δ_{1})} (x_{j}) | < | S_{R_{Q}^{G} (δ_{2})} (x_{j}) | .$

Then ∀ i, $\begin{matrix} - {log}_{2} \frac{1}{| S_{R_{P}^{G} (δ_{1})} (x_{i}) |} \\ = {log}_{2} | S_{R_{P}^{G} (δ_{1})} (x_{i}) | \leq {log}_{2} | S_{R_{Q}^{G} (δ_{2})} (x_{i}) | \\ = - {log}_{2} \frac{1}{| S_{R_{Q}^{G} (δ_{2})} (x_{i}) |} \end{matrix}$

and ∃ j, $\begin{matrix} - {log}_{2} \frac{1}{| S_{R_{P}^{G} (δ_{1})} (x_{j}) |} \\ = \log_{2} | S_{R_{P}^{G} (δ_{1})} (x_{j}) | < {log}_{2} | S_{R_{Q}^{G} (δ_{2})} (x_{j}) | \\ = - {log}_{2} \frac{1}{| S_{R_{Q}^{G} (δ_{2})} (x_{j}) |} . \end{matrix}$

Hence (E_r) ^{δ₁} (P) < (E_r) ^{δ₂} (Q).□

This proposition illustrates the fact that the more uncertain the available information is, the bigger δ-rough entropy value becomes. Thus, we can conclude that δ-rough entropy proposed in Definition 6.16 can be used to evaluate the uncertainty of an image information system.

Proposition 6.20. Let (U, A) be an image information system.

(1) If 0 < δ₁ ≤ δ₂ ≤ 1, then for any P ⊆ A, (E_r) ^{δ₁} (P) ≤ (E_r) ^{δ₂} (P).

(2) If P ⊆ Q ⊆ A, then for any δ ∈ (0, 1], (E_r) ^δ (Q) ≤ (E_r) ^δ (P).

Proof. This holds by Theorem 5.9 and Proposition 6.19 (1).□

This proposition shows that δ-rough entropy increases when δ becomes bigger, and decreases when δ becomes smaller; it also shows that δ-rough entropy increases when the attribute subset becomes smaller, and decreases when the attribute subset becomes bigger.

Example 6.21. (Continued from Examples 6.7 and 6.17) We have $(E_{r})^{δ_{1}} (P_{5}) = \frac{42.7332}{12} \approx 3.5611,$ $(E_{r})^{δ_{2}} (P_{5}) = \frac{42.8040}{12} \approx 3.5670 .$ Thus $(E_{r})^{δ_{1}} (P_{5}) < (E_{r})^{δ_{2}} (P_{5}) .$

Example 6.22. (Continued from Examples 3.6 and 6.17)

Since P₁ ⊆ P₅ ⊆ A, we have $(E_{r})^{\sqrt{0.8}} (P_{1}) \approx 3.5848,$ $(E_{r})^{\sqrt{0.8}} (P_{5}) \approx 3.5670,$ $(E_{r})^{\sqrt{0.8}} (A) \approx 3.5664 .$

Thus $(E_{r})^{\sqrt{0.8}} (A) < (E_{r})^{\sqrt{0.8}} (P_{5}) < (E_{r})^{\sqrt{0.8}} (P_{1}) .$

From Propositions 6.19 and 6.20, we come to the conclusion that δ-rough entropy introduced in Definition 6.16 can be used to evaluate the uncertainty of an image information system. That is to say, the more certain δ-information structure is, the smaller δ-rough entropy value becomes.

Theorem 6.23. (E_r) ^δ in Definition 6.16 is an information granulation function under Definition 6.1.

Proof.

(1) Obviously, “Non-negativity” holds.

(2) Given δ ∈ (0, 1] and P, Q ⊆ A. If S^δ (P) = S^δ (Q), then ∀ i, $S_{R_{P}^{G} (δ)} (x_{i}) = S_{R_{Q}^{G} (δ)} (x_{i})$ .

By Definition 6.16, (E_r) ^δ (P) = (E_r) ^δ (Q).

(3) “Monotonicity” follows from Proposition 6.19 (2).

□

Theorem 6.24. Let (U, A) be an image information system. Given δ ∈ (0, 1] and P ⊆ A. Then $(E_{r})^{δ} (P) + H^{δ} (P) = \log_{2} n .$

Proof. $\begin{matrix} (E_{r})^{δ} (P) + H^{δ} (P) \\ = - \frac{1}{n} \sum_{i = 1}^{n} ({log}_{2} \frac{1}{| S_{R_{P}^{G} (δ)} (x_{i}) |} + {log}_{2} \frac{| S_{R_{P}^{G} (δ)} (x_{i}) |}{n}) \\ = - \frac{1}{n} \sum_{i = 1}^{n} {log}_{2} \frac{1}{n} \\ = {log}_{2} n . \end{matrix}$

□
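The complement relation of Theorem 6.24 can be checked numerically. The following minimal Python sketch assumes a fuzzy relation stored as a matrix whose row sums give the granule cardinalities; the 3 × 3 matrix is a hypothetical reflexive, symmetric example, not data from the running example, and the function names are illustrative.

```python
import math

def granule_sizes(relation):
    """|S_R(x_i)| for each object: the sum of row i of the fuzzy relation matrix."""
    return [sum(row) for row in relation]

def information_entropy(relation):
    """H = -sum_i (1/n) * log2(|S_R(x_i)| / n), following Definition 6.10."""
    n = len(relation)
    return -sum(math.log2(s / n) for s in granule_sizes(relation)) / n

def rough_entropy(relation):
    """E_r = -sum_i (1/n) * log2(1 / |S_R(x_i)|), following Definition 6.16."""
    n = len(relation)
    return -sum(math.log2(1.0 / s) for s in granule_sizes(relation)) / n

# Hypothetical fuzzy relation on three objects (reflexive and symmetric).
relation = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.5],
    [0.3, 0.5, 1.0],
]

h = information_entropy(relation)
er = rough_entropy(relation)
total = h + er  # agrees with log2(3) up to rounding error
print(h, er, total, math.log2(3))
```

Because the identity only involves the row sums, it holds for any matrix with positive row sums, whatever the kernel used to build the relation.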

Corollary 6.25. Let (U, A) be an image information system. Given δ ∈ (0, 1] and P ⊆ A. Then $0 \leq H^{δ} (P) \leq \log_{2} n .$

Proof. By Proposition 6.18, 0 ≤ (E_r) ^δ (P) ≤ log ₂n.

By Theorem 6.24, H^δ (P) = log₂ n - (E_r) ^δ (P).

Thus 0 ≤ H^δ (P) ≤ log₂ n.□

6.3 Information amounts in an image information system

Similar to Definition 10 in [17], information amount in a given image information system is presented in the following definition.

Definition 6.26. Let (U, A) be an image information system. Given δ ∈ (0, 1] and P ⊆ A. Then δ-information amount of the subsystem (U, P) is defined as $E^{δ} (P) = \sum_{i = 1}^{n} \frac{1}{n} (1 - \frac{| S_{R_{P}^{G} (δ)} (x_{i}) |}{n}) .$

Example 6.27. (Continued from Example 4.4) $\begin{matrix} E^{\sqrt{0.8}} (P_{1}) & = \sum_{i = 1}^{12} \frac{1}{12} (1 - \frac{| S_{R_{P_{1}}^{G} (\sqrt{0.8})} (x_{i}) |}{12}) \\ = \frac{0.0012}{12} \approx 0.0001, \end{matrix}$ $\begin{matrix} E^{\sqrt{0.8}} (P_{2}) & = \sum_{i = 1}^{12} \frac{1}{12} (1 - \frac{| S_{R_{P_{2}}^{G} (\sqrt{0.8})} (x_{i}) |}{12}) \\ = \frac{0.0216}{12} \approx 0.0018, \end{matrix}$ $\begin{matrix} E^{\sqrt{0.8}} (P_{3}) & = \sum_{i = 1}^{12} \frac{1}{12} (1 - \frac{| S_{R_{P_{3}}^{G} (\sqrt{0.8})} (x_{i}) |}{12}) \\ = \frac{0.0672}{12} \approx 0.0056, \end{matrix}$ $\begin{matrix} E^{\sqrt{0.8}} (P_{4}) & = \sum_{i = 1}^{12} \frac{1}{12} (1 - \frac{| S_{R_{P_{4}}^{G} (\sqrt{0.8})} (x_{i}) |}{12}) \\ = \frac{0.1428}{12} \approx 0.0119, \end{matrix}$ $\begin{matrix} E^{\sqrt{0.8}} (P_{5}) & = \sum_{i = 1}^{12} \frac{1}{12} (1 - \frac{| S_{R_{P_{5}}^{G} (\sqrt{0.8})} (x_{i}) |}{12}) \\ = \frac{0.1488}{12} \approx 0.0124, \end{matrix}$ $\begin{matrix} E^{\sqrt{0.8}} (A) & = \sum_{i = 1}^{12} \frac{1}{12} (1 - \frac{| S_{R_{A}^{G} (\sqrt{0.8})} (x_{i}) |}{12}) \\ = \frac{0.1536}{12} \approx 0.0128 . \end{matrix}$

Theorem 6.28. Let (U, A) be an image information system. Given δ₁, δ₂ ∈ (0, 1] and P, Q ⊆ A.

(1) If S^{δ₁} (P) ⪯ S^{δ₂} (Q), then E^{δ₂} (Q) ≤ E^{δ₁} (P).

(2) If S^{δ₁} (P) ≺ S^{δ₂} (Q), then E^{δ₂} (Q) < E^{δ₁} (P).

Proof. (1) Obviously.

(2) Since S^{δ₁} (P) ≺ S^{δ₂} (Q), similar to the proof of Proposition 6.5 (2), we obtain that ∀ i, $1 \leq | S_{R_{P}^{G} (δ_{1})} (x_{i}) | \leq | S_{R_{Q}^{G} (δ_{2})} (x_{i}) |$ and ∃ j, $1 \leq | S_{R_{P}^{G} (δ_{1})} (x_{j}) | < | S_{R_{Q}^{G} (δ_{2})} (x_{j}) | .$

Then ∀ i, $1 - \frac{| S_{R_{Q}^{G} (δ_{2})} (x_{i}) |}{n} \leq 1 - \frac{| S_{R_{P}^{G} (δ_{1})} (x_{i}) |}{n}$ , and the inequality is strict for i = j.

Hence E^{δ₂} (Q) < E^{δ₁} (P).□

This theorem illustrates that δ-information amount increases when δ-information structure becomes finer, and it decreases when δ-information structure becomes coarser.

Proposition 6.29. Let (U, A) be an image information system.

(1) If 0 < δ₁ ≤ δ₂ ≤ 1, then for any P ⊆ A, E^{δ₂} (P) ≤ E^{δ₁} (P).

(2) If P ⊆ Q ⊆ A, then for any δ ∈ (0, 1], E^δ (P) ≤ E^δ (Q).

This proposition shows that δ-information amount increases when δ becomes smaller, and decreases when δ becomes bigger; it also shows that δ-information amount increases when the attribute subset becomes bigger, and decreases when the attribute subset becomes smaller.

Example 6.30. (Continued from Examples 6.7 and 6.27) We have

$E^{δ_{1}} (P_{5}) = \frac{0.1968}{12} \approx 0.0164,$ $E^{δ_{2}} (P_{5}) = \frac{0.1488}{12} \approx 0.0124 .$

Thus $E^{δ_{2}} (P_{5}) < E^{δ_{1}} (P_{5}) .$

Example 6.31. (Continued from Examples 3.6 and 6.27)

Since P₁ ⊆ P₅ ⊆ A, we have E^{δ₂} (P₁) ≈ 0.0001, E^{δ₂} (P₅) ≈ 0.0124, E^{δ₂} (A) ≈ 0.0128.

Thus $E^{δ_{2}} (P_{1}) < E^{δ_{2}} (P_{5}) < E^{δ_{2}} (A) .$

From Theorem 6.28 and Proposition 6.29, we come to the conclusion that δ-information amount introduced in Definition 6.26 can be used to evaluate the uncertainty of an image information system. In other words, the more certain δ-information structure is, the bigger δ-information amount value becomes.

Theorem 6.32. Let (U, A) be an image information system. Given δ ∈ (0, 1] and P ⊆ A. Then $G^{δ} (P) + E^{δ} (P) = 1 .$

Proof. $\begin{matrix} G^{δ} (P) + E^{δ} (P) \\ = \frac{1}{n^{2}} \sum_{i = 1}^{n} [| S_{R_{P}^{G} (δ)} (x_{i}) | + (n - | S_{R_{P}^{G} (δ)} (x_{i}) |)] \\ = \frac{1}{n^{2}} \sum_{i = 1}^{n} n \\ = 1 . \end{matrix}$

□

Corollary 6.33. Let (U, A) be an image information system. Given δ ∈ (0, 1] and P ⊆ A. Then $0 \leq E^{δ} (P) \leq 1 - \frac{1}{n} .$

Proof. By Proposition 6.4, $\frac{1}{n} \leq G^{δ} (P) \leq 1$ . By Theorem 6.32, E^δ (P) =1 - G^δ (P). Thus $0 \leq E^{δ} (P) \leq 1 - \frac{1}{n}$ .□
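Theorem 6.32 and Corollary 6.33 can be illustrated with a minimal Python sketch. It assumes the fuzzy relation is given as a matrix whose row sums are the granule cardinalities; the 3 × 3 matrix is hypothetical and the function names are illustrative.

```python
def granule_sizes(relation):
    """|S_R(x_i)| for each object: the sum of row i of the fuzzy relation matrix."""
    return [sum(row) for row in relation]

def information_granulation(relation):
    """G = (1/n) * sum_i (|S_R(x_i)| / n), following Definition 6.2."""
    n = len(relation)
    return sum(granule_sizes(relation)) / (n * n)

def information_amount(relation):
    """E = sum_i (1/n) * (1 - |S_R(x_i)| / n), following Definition 6.26."""
    n = len(relation)
    return sum(1.0 - s / n for s in granule_sizes(relation)) / n

# Hypothetical fuzzy relation on three objects (reflexive and symmetric).
relation = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.5],
    [0.3, 0.5, 1.0],
]

g = information_granulation(relation)
e = information_amount(relation)
print(g, e, g + e)  # g + e agrees with 1 up to rounding error
```

For this matrix the row sums are 2.1, 2.3 and 1.8, so G = 6.2/9 and E = 2.8/9, which also lies within the bounds of Corollary 6.33 for n = 3.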

Example 6.34. (Continued from Example 3.6)

Pick δ² = 0.1, ⋯ , 0.9 . We obtain the following results:

(1) If we only consider monotonicity, then δ-information granulation and δ-rough entropy are both monotonically increasing in δ; that is, the uncertainty of each subsystem, as measured by them, increases as the δ value increases. Meanwhile, δ-information amount and δ-information entropy are both monotonically decreasing in δ; that is, the uncertainty of each subsystem, as measured by them, decreases as the δ value increases (see Figures 1–6).

Fig. 1

Uncertainty measures of (U, P₁) with different δ.

Fig. 2

Uncertainty measures of (U, P₂) with different δ.

Fig. 3

Uncertainty measures of (U, P₃) with different δ.

Fig. 4

Uncertainty measures of (U, P₄) with different δ.

Fig. 5

Uncertainty measures of (U, P₅) with different δ.

Fig. 6

Uncertainty measures of (U, A) with different δ.

(2) If we pick $δ = \sqrt{0.8}$ and consider δ-information granulation and δ-rough entropy, we have G^δ (A) < G^δ (P₅) < G^δ (P₄) < G^δ (P₃) < G^δ (P₂) < G^δ (P₁) , and (E_r) ^δ (A) < (E_r) ^δ (P₅) < (E_r) ^δ (P₄) < (E_r) ^δ (P₃) < (E_r) ^δ (P₂) < (E_r) ^δ (P₁) . This shows that the larger the attribute subset, the smaller the measured value. For δ-information amount and δ-information entropy, we have E^δ (P₁) < E^δ (P₂) < E^δ (P₃) < E^δ (P₄) < E^δ (P₅) < E^δ (A) , and H^δ (P₁) < H^δ (P₂) < H^δ (P₃) < H^δ (P₄) < H^δ (P₅) < H^δ (A) . This shows that the larger the attribute subset, the larger the measured value (see Figure 7).

Fig. 7

Uncertainty measures of subsystems with the fixed value $(δ = \sqrt{0.8})$ .

7 Effectiveness analysis

To evaluate the performance of measuring uncertainty in an image information system, this section analyzes the effectiveness of the proposed measures from the angle of statistics.

7.1 Dispersion analysis

In actual statistical work, we often study the dispersion degree of a data set. A quantity used to measure the dispersion degree of a data set is called a difference measure. Common difference measures include the range, quartile deviation, mean deviation, standard deviation, standard deviation coefficient, and so on.

In this paper, we apply the standard deviation coefficient to do effectiveness analysis of the proposed measures.

Given a data set X = {x₁, ⋯ , x_n}. Then its arithmetic average value $\bar{x} = \frac{1}{n} \sum_{i = 1}^{n} x_{i}$ , its standard deviation $σ (X) = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (x_{i} - \bar{x})^{2}}$ , and its standard deviation coefficient $CV (X) = \frac{σ (X)}{\bar{x}} .$
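The three formulas above translate directly into code. The following is a minimal Python sketch using the population form of the standard deviation (division by n), as in the definition; the function names and the eight-element data set are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math

def mean(xs):
    """Arithmetic average of the data set."""
    return sum(xs) / len(xs)

def std_population(xs):
    """Standard deviation with divisor n, matching the formula for sigma(X)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def cv(xs):
    """Standard deviation coefficient CV(X) = sigma(X) / mean(X)."""
    return std_population(xs) / mean(xs)

# Hypothetical data: mean 5, squared deviations sum to 32, so sigma = sqrt(32 / 8) = 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(data), std_population(data), cv(data))  # 5.0 2.0 0.4
```

Because CV divides by the mean, it is only meaningful when the mean is nonzero, which holds for the positive measure values considered here.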

Example 7.1. Denote $\begin{matrix} X_{G}^{0.8} = \{ G^{\sqrt{0.8}} (P_{1}), G^{\sqrt{0.8}} (P_{2}), G^{\sqrt{0.8}} (P_{3}), G^{\sqrt{0.8}} (P_{4}), G^{\sqrt{0.8}} (P_{5}), G^{\sqrt{0.8}} (A) \}, \\ X_{E_{r}}^{0.8} = \{ (E_{r})^{\sqrt{0.8}} (P_{1}), (E_{r})^{\sqrt{0.8}} (P_{2}), (E_{r})^{\sqrt{0.8}} (P_{3}), (E_{r})^{\sqrt{0.8}} (P_{4}), (E_{r})^{\sqrt{0.8}} (P_{5}), (E_{r})^{\sqrt{0.8}} (A) \}, \\ X_{H}^{0.8} = \{ H^{\sqrt{0.8}} (P_{1}), H^{\sqrt{0.8}} (P_{2}), H^{\sqrt{0.8}} (P_{3}), H^{\sqrt{0.8}} (P_{4}), H^{\sqrt{0.8}} (P_{5}), H^{\sqrt{0.8}} (A) \}, \\ X_{E}^{0.8} = \{ E^{\sqrt{0.8}} (P_{1}), E^{\sqrt{0.8}} (P_{2}), E^{\sqrt{0.8}} (P_{3}), E^{\sqrt{0.8}} (P_{4}), E^{\sqrt{0.8}} (P_{5}), E^{\sqrt{0.8}} (A) \} . \end{matrix}$

Then

$\begin{matrix} X_{G}^{0.8} = \{ 0.9872, 0.9999, 0.9982, 0.9944, 0.9881, 0.9876 \}, \\ X_{H}^{0.8} = \{ 0.0186, 0.0002, 0.0026, 0.0080, 0.0173, 0.0179 \}, \\ X_{E_{r}}^{0.8} = \{ 3.5664, 3.5848, 3.5824, 3.5769, 3.5676, 3.5670 \}, \\ X_{E}^{0.8} = \{ 0.0128, 0.0001, 0.0018, 0.0056, 0.0119, 0.0124 \} . \end{matrix}$

So $\begin{matrix} CV (X_{G}^{0.8}) \approx 0.147934, \\ CV (X_{H}^{0.8}) \approx 1.549847, \\ CV (X_{E_{r}}^{0.8}) \approx 0.078870, \\ CV (X_{E}^{0.8}) \approx 1.336872 \end{matrix}$ (see Table 2).

Thus $CV (X_{E_{r}}^{δ}) < CV (X_{G}^{δ}) < CV (X_{E}^{δ}) < CV (X_{H}^{δ}) .$ This means the dispersion degree of E_r^δ is minimum.

From Figures 1 –6 and Table 2, we obtain the following results:

Table 2
Dispersion analysis of subsystem of CV with different δ


δ²-values	$CV (X_{G}^{δ})$	$CV (X_{H}^{δ})$	$CV (X_{E_{r}}^{δ})$	$CV (X_{E}^{δ})$
0.1	0.130421	1.920703	0.073124	1.602787
0.2	0.142560	1.843925	0.078810	1.572594
0.3	0.147524	1.763606	0.080502	1.519729
0.4	0.149481	1.699093	0.080850	1.470136
0.5	0.149967	1.648351	0.080633	1.427614
0.6	0.149666	1.608136	0.080151	1.391891
0.7	0.148926	1.575914	0.079541	1.361974
0.8	0.147934	1.549847	0.078870	1.336872
0.9	0.146802	1.528612	0.078174	1.315749

(1) If we only need monotonicity, then G^δ, E_r^δ, H^δ and E^δ all perform well for measuring uncertainty of an image information system.

(2) If we only consider the dispersion degree, then E_r^δ has better performance for measuring uncertainty of an image information system.

7.2 Association analysis

In statistics, Pearson correlation coefficient is a measure of the strength of a linear correlation between two variables or two data sets.

Given two data sets X = {x₁, ⋯ , x_n} and Y = {y₁, ⋯ , y_n}. Pearson correlation coefficient between X and Y, denoted by r (X, Y) or r_XY, is defined as follows: $r (X, Y) = r_{XY} = \frac{\sum_{i = 1}^{n} (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sqrt{\sum_{i = 1}^{n} (x_{i} - \bar{x})^{2}} \sqrt{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}},$ where $\bar{x} = \frac{1}{n} \sum_{i = 1}^{n} x_{i}$ , $\bar{y} = \frac{1}{n} \sum_{i = 1}^{n} y_{i}$ .

Obviously, $- 1 \leq r (X, Y) \leq 1 .$

If r (X, Y) =0, then there is no correlation between X and Y; if r (X, Y) >0, then the correlation between X and Y is positive; if r (X, Y) <0, then the correlation between X and Y is negative. Particularly, r (X, Y) =1 indicates completely positive correlation between X and Y, and r (X, Y) = -1 means completely negative correlation between X and Y.

The closer the absolute value of Pearson correlation coefficient r is to 0, the smaller the degree of correlation between variables; conversely, the closer the absolute value of Pearson correlation coefficient r is to 1, the greater the degree of correlation between variables. Generally speaking, the degree of correlation can be classified as follows: when |r| = 1, the correlation is called complete correlation; when 0.7 ≤ |r| < 1, it is called high correlation; when 0.4 ≤ |r| < 0.7, it is called moderate correlation; when 0 < |r| < 0.4, it is called low correlation; when r = 0, there is no correlation.
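The coefficient and the classification above can be sketched in a few lines of Python. The function names, the sample data, and the tolerance `tol` used to absorb floating-point rounding at the boundaries |r| = 1 and r = 0 are assumptions made for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient r(X, Y) of two equally long data sets."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_level(r, tol=1e-9):
    """Classify |r| with the thresholds given in the text; tol is an assumed rounding margin."""
    a = abs(r)
    if a >= 1.0 - tol:
        return "complete"
    if a >= 0.7:
        return "high"
    if a >= 0.4:
        return "moderate"
    if a > tol:
        return "low"
    return "none"

# Hypothetical data: ys is a decreasing linear function of xs, so r is -1 up to rounding.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0 - x / 10 for x in xs]
r = pearson(xs, ys)
print(r, correlation_level(r))
```

The coefficient is undefined when either data set is constant, since the denominator is then zero; the sketch does not guard against that case.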

Example 7.2.

Pick $δ_{1} = \sqrt{0.1}$ , $δ_{2} = \sqrt{0.2}$ , ⋯, $δ_{9} = \sqrt{0.9}$ . Denote $\begin{matrix} X_{G}^{A} = \{ G^{δ_{1}} (A), G^{δ_{2}} (A), \dots, G^{δ_{9}} (A) \}, \\ X_{E}^{A} = \{ E^{δ_{1}} (A), E^{δ_{2}} (A), \dots, E^{δ_{9}} (A) \}, \\ X_{E_{r}}^{A} = \{ (E_{r})^{δ_{1}} (A), (E_{r})^{δ_{2}} (A), \dots, (E_{r})^{δ_{9}} (A) \}, \\ X_{H}^{A} = \{ H^{δ_{1}} (A), H^{δ_{2}} (A), \dots, H^{δ_{9}} (A) \} . \end{matrix}$

Then $\begin{matrix} r (X_{G}^{A}, X_{G}^{A}) & = r (X_{E_{r}}^{A}, X_{E_{r}}^{A}) = r (X_{H}^{A}, X_{H}^{A}) \\ = r (X_{E}^{A}, X_{E}^{A}) = 1, \end{matrix}$

$\begin{matrix} r (X_{G}^{A}, X_{E_{r}}^{A}) = r (X_{E_{r}}^{A}, X_{G}^{A}) = 0.999918, \\ r (X_{G}^{A}, X_{H}^{A}) = r (X_{H}^{A}, X_{G}^{A}) = - 0.999918, \\ r (X_{G}^{A}, X_{E}^{A}) = r (X_{E}^{A}, X_{G}^{A}) = - 1, \\ r (X_{E_{r}}^{A}, X_{H}^{A}) = r (X_{H}^{A}, X_{E_{r}}^{A}) = - 1, \\ r (X_{E_{r}}^{A}, X_{E}^{A}) = r (X_{E}^{A}, X_{E_{r}}^{A}) = - 0.999918, \\ r (X_{E}^{A}, X_{H}^{A}) = r (X_{H}^{A}, X_{E}^{A}) = 0.999918 . \end{matrix}$

The results are shown in Table 3.

Table 3
r-values of sixteen pairs of measure-value sets for measuring uncertainty of (U, A) with different δ-values


r	$X_{G}^{A}$	$X_{H}^{A}$	$X_{E_{r}}^{A}$	$X_{E}^{A}$
$X_{G}^{A}$	1.0000	-0.9999	0.9999	-1.0000
$X_{H}^{A}$	-0.9999	1.0000	-1.0000	0.9999
$X_{E_{r}}^{A}$	0.9999	-1.0000	1.0000	-0.9999
$X_{E}^{A}$	-1.0000	0.9999	-0.9999	1.0000

“ $r (X_{G}^{A}, X_{E_{r}}^{A}), r (X_{E}^{A}, X_{H}^{A}) > 0.7$ ” means that $X_{G}^{A}$ and $X_{E_{r}}^{A}$ , as well as $X_{E}^{A}$ and $X_{H}^{A}$ , are highly positively correlated. Thus G^δ and E_r^δ, as well as E^δ and H^δ, are highly positively correlated.

“ $r (X_{G}^{A}, X_{E}^{A}) = r (X_{H}^{A}, X_{E_{r}}^{A}) = - 1$ ” means that $X_{G}^{A}$ and $X_{E}^{A}$ , as well as $X_{H}^{A}$ and $X_{E_{r}}^{A}$ , are completely negatively correlated. Thus G^δ and E^δ, as well as H^δ and E_r^δ, are completely negatively correlated.
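The two values r = -1 are consistent with Theorems 6.24 and 6.32, since each of these pairs of measures sums to a constant. A short derivation, written with auxiliary symbols c, m, x_k and y_k that are introduced here only for illustration:

```latex
% Suppose y_k = c - x_k for k = 1, ..., m, and the x_k are not all equal.
\begin{align*}
\bar{y} &= c - \bar{x}, \qquad y_k - \bar{y} = -(x_k - \bar{x}), \\
r(X, Y) &= \frac{\sum_{k=1}^{m} (x_k - \bar{x})(y_k - \bar{y})}
               {\sqrt{\sum_{k=1}^{m} (x_k - \bar{x})^2}\,\sqrt{\sum_{k=1}^{m} (y_k - \bar{y})^2}}
         = \frac{-\sum_{k=1}^{m} (x_k - \bar{x})^2}{\sum_{k=1}^{m} (x_k - \bar{x})^2} = -1 .
\end{align*}
% Theorem 6.32: x_k = G^{\delta_k}(A), y_k = E^{\delta_k}(A), c = 1.
% Theorem 6.24: x_k = (E_r)^{\delta_k}(A), y_k = H^{\delta_k}(A), c = \log_2 n.
```

The same substitution gives r (X_G^A, X_H^A) = - r (X_G^A, X_{E_r}^A) and r (X_E^A, X_H^A) = r (X_G^A, X_{E_r}^A), which matches the symmetric pattern of the entries in Table 3.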

8 An application

In this section, we give an application of the proposed measurement for attribute reduction.

Definition 8.1. Let (U, A) be an image information system. Given P ⊆ A and δ ∈ (0, 1]. Then P is called a δ-consistent subset of A, if $R_{P}^{G} (δ) = R_{A}^{G} (δ)$ .

Definition 8.2. Let (U, A) be an image information system. Given a ∈ P ⊆ A and δ ∈ (0, 1]. Then a is called δ-independent in P, if $R_{P}^{G} (δ) \neq R_{P - {a}}^{G} (δ)$ .

Definition 8.3. Let (U, A) be an image information system. Given P ⊆ A and δ ∈ (0, 1]. Then P is called a δ-independent subset of A, if for any a ∈ P, a is δ-independent in P.

Definition 8.4. Let (U, A) be an image information system. Given P ⊆ A and δ ∈ (0, 1]. Then P is called a δ-reduct of A, if P is both δ-consistent and δ-independent.

In this paper, the set of all δ-consistent subsets (resp., all δ-reducts) of A is denoted by co^δ (A) (resp., red^δ (A)).

Obviously,

$P \in {red}^{δ} (A) \Leftrightarrow P \in {co}^{δ} (A) \text{ and } \forall Q \subset P, Q \notin {co}^{δ} (A) .$

Lemma 8.5. Given P, Q ∈ I^U×U. Suppose P ⊆ Q. If for any i, |S_P (x_i) | = |S_Q (x_i) |, then P = Q.

Proof. Denote $M (P) = (p_{ij})_{n \times n}, M (Q) = (q_{ij})_{n \times n},$ where p_ij = P (x_i, x_j) , q_ij = Q (x_i, x_j).

Since P ⊆ Q, we have $\forall i, j, q_{ij} - p_{ij} \geq 0 .$

Note that |S_P (x_i) | = p_i1 + p_i2 + ⋯ + p_in and |S_Q (x_i) | = q_i1 + q_i2 + ⋯ + q_in . Then by |S_P (x_i) | = |S_Q (x_i) |, we obtain that $(q_{i 1} - p_{i 1}) + (q_{i 2} - p_{i 2}) + \dots + (q_{in} - p_{in}) = 0 .$

Thus ∀ i, q_i1 - p_i1 = q_i2 - p_i2 = ⋯ = q_in - p_in = 0 .

Hence P = Q.□

Theorem 8.6. Let (U, A) be an image information system. Given P ⊆ A and δ ∈ (0, 1]. Then $P \in {co}^{δ} (A) \Leftrightarrow H^{δ} (P) = H^{δ} (A) .$

Proof. “⇒”. This is obvious.

“⇐”. Suppose H^δ (P) = H^δ (A). Then, we have

$- \sum_{i = 1}^{n} \frac{1}{n} {log}_{2} \frac{| S_{R_{P}^{G} (δ)} (x_{i}) |}{n} = - \sum_{i = 1}^{n} \frac{1}{n} {log}_{2} \frac{| S_{R_{A}^{G} (δ)} (x_{i}) |}{n} .$

So $\sum_{i = 1}^{n} {log}_{2} \frac{| S_{R_{P}^{G} (δ)} (x_{i}) |}{| S_{R_{A}^{G} (δ)} (x_{i}) |} = 0 .$

Note that $R_{A}^{G} (δ) \subseteq R_{P}^{G} (δ)$ . Then ∀ i, $S_{R_{A}^{G} (δ)} (x_{i}) \subseteq S_{R_{P}^{G} (δ)} (x_{i})$ . This implies that $\forall i, {log}_{2} \frac{| S_{R_{P}^{G} (δ)} (x_{i}) |}{| S_{R_{A}^{G} (δ)} (x_{i}) |} \geq 0 .$

So ∀ i, ${log}_{2} \frac{| S_{R_{P}^{G} (δ)} (x_{i}) |}{| S_{R_{A}^{G} (δ)} (x_{i}) |} = 0 ,$ that is, $| S_{R_{P}^{G} (δ)} (x_{i}) | = | S_{R_{A}^{G} (δ)} (x_{i}) | .$

By Lemma 8.5, $R_{P}^{G} (δ) = R_{A}^{G} (δ) .$

Hence $P \in {co}^{δ} (A) .$ □

Theorem 8.7. Let (U, A) be an image information system. Given P ⊆ A and δ ∈ (0, 1]. Then

P ∈ red^δ (A) ⇔ H^δ (P) = H^δ (A) and ∀ a ∈ P, H^δ (P - {a}) ≠ H^δ (A) .

Proof. It can be proved by Theorem 8.6.□

Below, we give reduction algorithm in an image information system based on δ-information entropy.

Algorithm 3 uses H^δ to select the feature that is added to the currently selected subset in each loop. The algorithm terminates when the addition of any remaining feature does not decrease the evaluation function. For a dimensionality of |A|, the time complexity of computing δ-information entropy is |A|, and in the worst case the search for a reduct requires |A| (|A|+1)/2 evaluations of the evaluation function. The overall time complexity of Algorithm 3 is O (|A|²).
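A greedy search in the spirit of this description can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' exact Algorithm 3: it assumes one precomputed fuzzy relation matrix per attribute, assumes that the relation of an attribute subset is the elementwise minimum of the matrices of its members, and adds a backward pruning pass so that the result meets the condition of Theorem 8.7. All names and the toy matrices are hypothetical.

```python
import math

def combine(relations, attrs):
    """Assumed R_P: elementwise minimum over the attributes in attrs (all ones if attrs is empty)."""
    n = len(next(iter(relations.values())))
    return [[min((relations[a][i][j] for a in attrs), default=1.0) for j in range(n)]
            for i in range(n)]

def entropy(relation):
    """H = -sum_i (1/n) * log2(|S_R(x_i)| / n), following Definition 6.10."""
    n = len(relation)
    return -sum(math.log2(sum(row) / n) for row in relation) / n

def greedy_reduct(relations, tol=1e-12):
    """Forward selection until H(P) = H(A) (Theorem 8.6), then pruning (Theorem 8.7)."""
    attrs = sorted(relations)

    def h(subset):
        return entropy(combine(relations, subset))

    target = h(attrs)
    selected = []
    # Forward stage: add the attribute that raises the entropy the most.
    while abs(h(selected) - target) > tol:
        remaining = [a for a in attrs if a not in selected]
        selected.append(max(remaining, key=lambda a: h(selected + [a])))
    # Backward stage: drop any attribute whose removal keeps H(P) = H(A).
    for a in list(selected):
        rest = [b for b in selected if b != a]
        if abs(h(rest) - target) <= tol:
            selected = rest
    return sorted(selected)

# Hypothetical example: a3 duplicates a1, so it is redundant.
r1 = [[1.0, 0.9, 0.2], [0.9, 1.0, 0.4], [0.2, 0.4, 1.0]]
r2 = [[1.0, 0.3, 0.8], [0.3, 1.0, 0.6], [0.8, 0.6, 1.0]]
relations = {"a1": r1, "a2": r2, "a3": [row[:] for row in r1]}
print(greedy_reduct(relations))  # ['a1', 'a2']
```

The forward stage performs at most |A| + (|A| - 1) + ⋯ + 1 = |A| (|A|+1)/2 candidate evaluations, in line with the count given above; the pruning pass adds at most |A| more.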

9 Comparison and discussion

In this section, we compare this paper with references [40, 45] so as to show its innovation more clearly.

(1) All three articles consider images as information values and use the Gaussian kernel.

(2) This paper and reference [40] are both based on granular computing. Thus, the research ideas of the two articles are the same and the obtained results are similar.

(3) The differences among the three articles are as follows.

a) The studied information systems are different: this article considers an image information system, whereas references [40, 45] study a hybrid information system with images.

b) The constructed information granules are different: this article only constructs information granules formed by images as information values, whereas references [40, 45] construct information granules formed from various information values. It can be said that dealing with image information values is more difficult.

c) The obtained results are different: this article gives information structures and uncertainty measures in an image information system, whereas reference [45] obtains dynamically updated fuzzy rough approximations under the variation of attribute values.

d) This paper not only gives uncertainty measurement for an image information system, but also analyzes the effectiveness of the given measures, whereas reference [40] only proposes uncertainty measurement for a hybrid information system with images.

10 Conclusions

An image information system is an information system where each of its information values is an image. Based on the idea of granular computing, we construct information granules as shown in Figure 8.

Fig. 8. Initialization of information granules in an image information system.

The Gaussian kernel has been used to extract a fuzzy T_cos-equivalence relation on the object set of a given image information system. Dependence between two information structures has been depicted. By using information structures, uncertainty measurement for an image information system has been investigated. An effectiveness analysis of the proposed measures has been carried out. The theoretical and effectiveness analyses show that granulation measures and entropy measures can be applied to measuring the uncertainty of an image information system. An application of the proposed measurement to attribute reduction has been given. In future work, we will study three-way decisions in an image information system.

Acknowledgments

The authors would like to thank the editors and the anonymous reviewers for their valuable comments and suggestions, which have helped immensely in improving the quality of the paper. This work is supported by National Natural Science Foundation of China (11971420), Natural Science Foundation of Guangxi (2018GXNSFDA294003, 2018GXNSFDA281028, 2018GXNSFAA294134), Guangxi Science and Technology Program (2017AD23056), Guangxi Higher Education Institutions of China (Document No. [2018] 35 and [2019] 52), Special Scientific Research Project of Young Innovative Talents in Guangxi (2019AC20052), Key Laboratory of Software Engineering in Guangxi University for Nationalities (2020-18XJSY-03), Research Project of Institute of Big Data in Yulin (YJKY03), Engineering Project of Undergraduate Teaching Reform of Higher Education in Guangxi (2017JGA179) and the project of improving basic scientific research ability of young and middle-aged teachers in Guangxi Universities (2020KY14013).

References

1. Bianucci, Cattaneo and Ciucci, Entropies and co-entropies of coverings with application to incomplete information systems, Fundamenta Informaticae 75 (2007), 77–105.

2. Beaubouef and Petry F.E., Fuzzy rough set techniques for uncertainty processing in a relational database, International Journal of Intelligent Systems 15 (2000), 389–424.

3. Beaubouef, Petry F.E. and Arora, Information-theoretic measures of uncertainty for rough sets and rough relational databases, Information Sciences 109 (1998), 185–195.

4. Blaszczynski, Slowinski and Szelag, Sequential covering rule induction algorithm for variable consistency rough set approaches, Information Sciences 181(5) (2011), 987–1002.

5. Cornelis, Jensen, Martin G.H. and Slezak, Attribute selection with fuzzy decision reducts, Information Sciences 180 (2010), 209–224.

6. Cament L.A., Castillo L.E., Perez J.P., Galdames F.J. and Perez C.A., Fusion of local normalization and Gabor entropy weighted features for face identification, Pattern Recognition 47(2) (2014), 568–577.

7. Duntsch and Gediga, Uncertainty measures of rough set prediction, Artificial Intelligence 106 (1998), 109–137.

8. Delgado and Romero, Environmental conflict analysis using an integrated grey clustering and entropy-weight method: a case study of a mining project in Peru, Environmental Modelling & Software 77 (2016), 108–121.

9. Dai J.H. and Tian H.W., Entropy measures and granularity measures for set-valued information systems, Information Sciences 240 (2013), 72–82.

10. Hempelmann C.F., Sakoglu, Gurupur V.P. and Jampana, An entropy-based evaluation method for knowledge bases of medical information systems, Expert Systems with Applications 46 (2016), 262–273.

11. … Q.H., Xie Z.X. and Yu D.R., Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation, Pattern Recognition 40 (2007), 3509–3521.

12. … Q.H., Zhang, Chen D.G., Pedrycz and Yu D.R., Gaussian kernel based fuzzy rough sets: model, uncertainty measures and applications, International Journal of Approximate Reasoning 51 (2010), 453–471.

13. Jensen and Shen, Semantics-preserving dimensionality reduction: rough and fuzzy rough based approaches, IEEE Transactions on Knowledge and Data Engineering 16 (2004), 1457–1471.

14. Lin T.Y., Granular computing on binary relations I: data mining and neighborhood systems, In: Rough Sets In Knowledge Discovery, A. Skowron and L. Polkowski (eds), Physica-Verlag (1998), 107–121.

15. Lin T.Y., Granular computing on binary relations II: rough set representations and belief functions, In: Rough Sets In Knowledge Discovery, A. Skowron and L. Polkowski (eds), Physica-Verlag (1998), 121–140.

16. … C.R., Duan G.D. and Zhong F.J., Rotation invariant texture retrieval considering the scale dependence of Gabor wavelet, IEEE Transactions on Image Processing 24(8) (2015), 2344–2354.

17. Liang J.Y. and Qu K.S., Information measures of roughness of knowledge and rough sets for information systems, Journal of Systems Science and Systems Engineering 10 (2002), 95–103.

18. Liang J.Y., Shi Z.Z., Li D.Y. and Wierman M.J., The information entropy, rough entropy and knowledge granulation in incomplete information systems, International Journal of General Systems 35 (2006), 641–654.

19. Moser, On the T-transitivity of kernels, Fuzzy Sets and Systems 157 (2006), 1787–1796.

20. Moser, On representing and generating kernels by fuzzy equivalence relations, Journal of Machine Learning Research 7 (2006), 2603–2630.

21. …, Zhang, Leung and Song, Granular computing and dual Galois connection, Information Sciences 177 (2007), 5365–5377.

22. Navarrete, Viejo and Cazorla, Color smoothing for RGBD data using entropy information, Applied Soft Computing 46 (2016), 361–380.

23. Pawlak, Rough sets, International Journal of Computer Information Science 11 (1982), 341–356.

24. Pawlak, Rough Sets: Theoretical aspects of reasoning about data, Kluwer Academic Publishers, Dordrecht, 1991.

25. Qian Y.H., Liang J.Y., Pedrycz and Dang C.Y., An accelerator for attribute reduction in rough set theory, Artificial Intelligence 174 (2010), 597–618.

26. Qian Y.H., Liang J.Y., Wu W.Z. and Dang C.Y., Knowledge structure, knowledge granulation and knowledge distance in a knowledge base, International Journal of Approximate Reasoning 50 (2009), 174–188.

27. Qian Y.H., Liang J.Y., Wu W.Z. and Dang C.Y., Information granularity in fuzzy binary GrC model, IEEE Transactions on Fuzzy Systems 19(2) (2011), 253–264.

28. Shannon C.E., A mathematical theory of communication, The Bell System Technical Journal 27 (1948), 379–423.

29. Shawe-Taylor and Cristianini, Kernel methods for pattern analysis, Cambridge University Press, 2004.

30. Swiniarski R.W. and Skowron, Rough set methods in feature selection and recognition, Pattern Recognition Letters 24 (2003), 833–849.

31. Slowinski and Vanderpooten, A generalized definition of rough approximations based on similarity, IEEE Transactions on Knowledge and Data Engineering 12 (2000), 331–336.

32. Thangavel and Pethalakshmi, Dimensionality reduction based on rough set theory: A review, Applied Soft Computing 9 (2009), 1–12.

33. Wierman M.J., Measuring uncertainty in rough set theory, International Journal of General Systems 28 (1999), 283–297.

34. Wang and Yue H.Bo., Entropy measures and granularity measures for interval and set-valued information systems, Soft Computing 20 (2016), 3489–3495.

35. Xie S.D. and Wang Y.X., Construction of tree network with limited delivery latency in homogeneous wireless sensor networks, Wireless Personal Communications 78(1) (2014), 231–246.

36. … W.H., Zhang X.Y. and Zhang W.X., Knowledge granulation, knowledge entropy and knowledge uncertainty measure in ordered information systems, Applied Soft Computing 9 (2009), 1244–1251.

37. Yao Y.Y., Relational interpretations of neighborhood operators and rough set approximation operators, Information Sciences 111 (1998), 239–259.

38. Yao Y.Y., Information granulation and rough set approximation, International Journal of Intelligent Systems 16 (2001), 87–104.

39. Yao Y.Y., Probabilistic approaches to rough sets, Expert Systems 20 (2003), 287–297.

40. … G.J., Information structures and uncertainty measures in a hybrid information system with images, Soft Computing 23 (2019), 12961–12979.

41. Yang, Yan, Zhang and Tang, Bilinear analysis for kernel selection and nonlinear feature extraction, IEEE Transactions on Neural Networks 8 (2007), 1442–1452.

42. Zadeh L.A., Fuzzy sets, Information and Control 8 (1965), 338–353.

43. Zadeh L.A., Fuzzy logic equals computing with words, IEEE Transactions on Fuzzy Systems 4(2) (1996), 103–111.

44. Zadeh L.A., Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic, Fuzzy Sets and Systems 90 (1997), 111–127.

45. Zeng A.P., Li T.R., Hu, Chen H.M. and Luo, Dynamical updating fuzzy rough approximations for hybrid data under the variation of attribute values, Information Sciences 378 (2017), 363–388.

46. Zeng A.P., Li T.R., Liu, Zhang J.B. and Chen H.M., A fuzzy rough set approach for incremental feature selection on hybrid information systems, Fuzzy Sets and Systems 258 (2015), 39–60.

47. Zhang W.X. and Qiu G.F., Uncertain decision making based on rough sets, Tsinghua University Publishers, Beijing, 2005.

48. Zhang and Zhang, Theory and application of problem solving-theory and application of granular computing in quotient spaces, Tsinghua University Publishers, Beijing, 2007.


