Transformations and information granularity of knowledge structures in set-based granular computing

Abstract

Granular computing is a relatively new platform for constructing, describing and processing information or knowledge. For crisp information granulation, the universe is decomposed into granules by binary relations on the universe, say, preorder, tolerance and equivalence relations. A knowledge structure is composed of all information granules induced by a relation that corresponds to the granulation. This paper establishes a novel theoretical framework for the measurement of information granularity of knowledge structures. First, two new relations between knowledge structures are introduced through the use of their respective Boolean relation matrices, where the granular equality relation is defined based on an orthogonal transformation with the transformation matrix being a permutation matrix, and the granularly finer relation is presented by combining the classical finer relation and the orthogonal transformation. Then, it is demonstrated that the simplified knowledge structure base with the granularly finer relation is a partially ordered set, which can be represented by a Hasse diagram. Subsequently, an axiomatic definition of information granularity is proposed to satisfy the constraints regarding these two relations. Moreover, a general form of the information granularity is given, and some existing measures are proved to be its special cases. Finally, as an application of the proposed measure, the attribute significance measure is developed based on the information granularity.

Keywords

Granular computing; information granularity; knowledge structure; granular equality relation; granularly finer relation

1 Introduction

Granular computing, a term first proposed by Zadeh in 1997 [1], has emerged as one of the information processing paradigms in the domain of computational intelligence and human-centric systems [2, 3]. According to Zadeh [1], granulation is one of the three basic concepts that underlie human cognition, the other two being organization and causation. More concretely, granulation involves the decomposition of a whole into parts, organization involves the integration of parts into a whole, and causation relates to the association of causes with effects. Therefore, granular computing, a well-defined theory built on a solid foundation, provides an efficient tool for dealing with imprecision, uncertainty and partial truth. Nowadays, granular computing is usually loosely regarded as an umbrella term covering theories, methodologies, techniques, and tools that make use of granules in complex problem solving [4–12].

The basic components of granular computing are granules, such as subsets, groups, classes and clusters of a universe. Each object is associated with a granule, which is informally a clump of objects drawn together by indistinguishability, similarity, proximity or functionality [1, 13–15]. In granular computing, the objects within a granule have to be dealt with as a whole rather than individually. As pointed out by Lin [16, 17], information granulation is a collection of granules constructed by a given granulation strategy, which is mainly based on binary relations. According to the absence or presence of fuzziness in the granules, information granulation can be generally categorized into two groups: crisp information granulation and fuzzy information granulation. In crisp information granulation, granules are crisp, i.e., they have sharply defined boundaries. Basic ideas of crisp information granulation have appeared in many related fields, such as interval analysis [18], rough set theory [19] and Dempster–Shafer theory [20]. In almost all of human reasoning and concept formation, however, granules are fuzzy rather than crisp, which is reflected in the theory of fuzzy information granulation [1, 21–27]. Fuzzy information granulation underlies the basic concepts of linguistic variable, fuzzy if-then rule and fuzzy graph and also underlies the remarkable human ability to make rational decisions in conditions of imprecision, partial knowledge, partial certainty and partial truth. Li et al. [22] investigated uncertainty measurement for fuzzy relation information systems by using fuzzy information structures. Song et al. [25] employed knowledge distances to construct algebraic lattices, which are useful in characterizing the hierarchies on fuzzy information granulations. Yang et al. [26] introduced a fuzzy knowledge distance measure to distinguish hierarchical rough approximation spaces of a fuzzy concept.

Lin [16, 17] suggested that a granular structure is a mathematical structure of the collection of granules, in which the inner structure of each granule is visible, and the interactions among granules are detected by the visible structures. Furthermore, a granular structure base is a family of all granular structures on the same universe. In granular computing, one often needs to measure the uncertainty of a granular structure for a given data set, which is called the information granularity. Actually, this term, which first appeared in 1979 [27], comes much earlier than the term granular computing. Granularity, together with granule and granulation, is a basic concept involved in human intelligence. In a broad sense, information granularity denotes an average granulation degree of information granules hierarchically. In Ref. [12], it is ascertained that (i) the level of granularity of information granules becomes crucial to the problem description, but (ii) there is no universal level of granularity of information. The essence of information granularity is revealed by its axiomatic definition constrained with a partial order relation among granular structures [23, 28]. This paper is an attempt to unify previous works on measuring the information granularity of knowledge structures.

Modes of crisp information granulation play important roles in a wide variety of methods, approaches and techniques. In these modes, granules are crisp, and thus in this paper, they are uniformly called (crisp) set-based granular computing. Generally, crisp information granules are captured by an arbitrary binary relation on the universe of discourse. To guarantee that each granule, usually the successor neighborhood [29–34], is nonempty, we always assume that the relation is at least reflexive. Although a serial relation already fulfils this condition, reflexivity is commonly assumed according to practical demands. Such a granular structure is called a binary neighborhood system [16, 35]. In this paper, our attention is paid to the issue of information granularity in a neighborhood system space. This problem has already been addressed in the literature. Several forms of the information granularity were proposed according to various views and targets, and were effectively applied in feature selection, rule extraction, decision tree construction, decision making, etc. [36–42]. For example, Liang et al. [43, 44] proposed the knowledge granulation and rough entropy (also called the co-entropy [45, 46]) in both complete and incomplete data sets. Similar measures were also introduced in ordered data sets [47, 48]. In these three types of data sets, three kinds of granules are utilized in equivalence, tolerance and preorder granular structures, respectively. Moreover, Qian et al. [49, 50] presented the combination granulation with an intuitive knowledge content nature to measure the size of information granulation. In all these aforementioned forms of the information granularity, a certain measure of cardinality, which counts the number of elements in the information granule, establishes a sound descriptor of information granularity. We follow this convention in the present paper.

Having noticed that the partial order relation between granular structures is of great significance in characterizing the monotonicity of information granularity, Qian et al. [28, 51] proposed an axiomatic definition of information granularity in a neighborhood system space. In their axiomatic definition, the granulation partial order relation, an extension of the traditional rough one, plays a critical role. There are two points we want to restate here about the granulation partial order relation: (i) arrangement of granules and (ii) cardinality of a granule. More specifically, for two granular structures, if the cardinalities of granules for one are all less than or equal to the cardinalities of a sequence of the other, then the former is granulation finer than the latter. In this way, many granular structures which are incomparable by the rough partial order relation might be compared. Thus their information granularity should be subject to the constraint that a finer knowledge structure implies a smaller information granularity. Along this line of research, Zhu [52] proposed an improved axiomatic definition of information granularity based on a kind of special partial order relation. The granularity of a partition was defined by Yao and Zhao [53] as the expected granularity of all blocks of the partition with respect to the probability distribution.

In this paper, we intend to generalize the original partial order relation in a more cautious manner. Two granular structures are perceived to be granularly equal only if one could be transformed to the other by an arrangement of elements. This linear transformation is, in essence, a special case of orthogonal transformation, where the transformation matrix is a permutation matrix. Furthermore, a granular structure is accepted to be granularly finer than another one if there exists a granular structure granularly equal to the former such that it is finer than the latter in the classical sense. We also show that the simplified knowledge structure base with the granularly finer relation is a partially ordered set (poset for short) but not always a lattice. Based on these two relations, a revised axiomatic definition of information granularity is introduced. A general form of the information granularity is proposed, and some commonly used terms are proved to be special instances of the proposed measure. The application to attribute significance measure is presented to measure the significance of attributes by using information granularity. In summary, the main contributions of this paper are as follows:

We put forward the granular equality relation and the granularly finer relation, both originating from arrangements of objects, which offers a novel research perspective in granular computing.

We develop a general form of information granularity constructed by a monotonic function, starting from its axiomatic definition.

We provide a practical application of information granularity to measure the attribute significance and identify the most important attribute.

An outline of this paper is as follows. Section 2 reviews some basic concepts relating to binary relations and characterizations of their matrix representations. In Section 3, through a nondegenerate linear transformation, the granular equality relation and granularly finer relation between knowledge structures are defined. It is proved that the granularly finer relation is a partial order relation on the simplified knowledge structure base whose Hasse diagram is visualized. Section 4 introduces an axiomatic definition of the information granularity of knowledge structures, whose general form is presented. In Section 5, an application of information granularity in measuring the significance of attributes is presented. Section 6 concludes this paper and indicates further research directions.

2 Binary relation and its matrix representation

In this section, we introduce the matrix of a relation and its usage in characterizing some specific relations.

Let U be a finite and nonempty set called the universe of discourse, and R ⊆ U × U be a binary relation on U. For x, y ∈ U, we shall say that x and y are R-related whenever the ordered pair (x, y) ∈ R, which is often written in the equivalent form x R y. If (x, y) ∈ R holds, then x is a predecessor of y, and y is a successor of x. Denote by R_s (x) the successor neighborhood of x with respect to R [54, 55], which consists of all and only the successors of x, i.e., $R_{s} (x) = \{y \in U : (x, y) \in R\} .$ (1) In granular computing, the two-tuple (U, R) is called a granular structure (also termed an approximation space by the rough set community), and the set R_s (x) is an information granule induced by R. A collection of all granular structures from the same universe is referred to as a granular structure base and is denoted by (U, R ).
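As a minimal sketch of Eq. (1), the successor neighborhoods can be computed directly from a relation stored as a set of ordered pairs. The function name `successor_neighborhood` and the sample relation are illustrative assumptions, not notation taken from the paper.

```python
def successor_neighborhood(relation, x):
    """Return R_s(x) = {y : (x, y) in R} for a relation given as a set of pairs."""
    return frozenset(y for (a, y) in relation if a == x)


# Hypothetical universe and reflexive relation, chosen only for illustration.
U = ["x1", "x2", "x3"]
R = {
    ("x1", "x1"), ("x2", "x2"), ("x3", "x3"),  # reflexive pairs
    ("x2", "x1"), ("x3", "x1"),                # x1 is a successor of x2 and of x3
}

# One information granule per object of the universe.
granules = {x: successor_neighborhood(R, x) for x in U}
for x in U:
    print(x, sorted(granules[x]))
```

Storing each granule as a `frozenset` keeps it hashable, so granules can themselves be collected into sets or used as dictionary keys.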

There are several important properties that a relation on a set may or may not have. Suppose that R is a relation on U. Then it is said to be [56, 57]

reflexive iff (x, x) ∈ R for all x ∈ U;

symmetric iff (x, y) ∈ R implies (y, x) ∈ R for all x, y ∈ U;

transitive iff (x, y) ∈ R and (y, z) ∈ R together imply (x, z) ∈ R for all x, y, z ∈ U.

Given two relations Q and R on U, the composition of them, denoted by Q ∘ R, is given by

$Q \circ R = \{(x, z) \in U^{2} : \exists y \in U \text{ such that } (x, y) \in Q \text{ and } (y, z) \in R\} .$

By the definition of the composition of relations, we know that R is transitive iff R ∘ R ⊆ R.

In reality, since any object is trivially indiscernible with itself, it is, a fortiori, similar to itself [58, 59]. The reflexivity seems quite necessary to express other types of relations. Furthermore, combined with the other two properties, R is called [57]

a preorder relation if R is reflexive and transitive;

a tolerance relation if R is reflexive and symmetric;

an equivalence relation if R is reflexive, symmetric and transitive.

Another commonly used relation is the so-called partial order relation which is reflexive, antisymmetric and transitive.

The identity relation I = {(x, x) : x ∈ U} and the universal relation E = {(x, y) : x, y ∈ U} are two special cases of equivalence relations. Note that the identity relation relates every element to itself and only itself while the universal relation relates every element to all items in U. Clearly, for any reflexive relation R, we always have:

I ⊆ R ⊆ E,

x ∈ R_s (x) for all x ∈ U, and

{R_s (x) : x ∈ U} constitutes a covering of U. That is, R_s (x) ≠ ∅ for all x ∈ U and ⋃_{x ∈ U} R_s (x) = U.

If R is a reflexive (preorder, tolerance) or an equivalence relation, then (U, R) is called a reflexive (preorder, tolerance) or an equivalence granular structure. Furthermore, if all relations in R are reflexive, preorder, tolerance or equivalence relations, then (U, R ) is called a reflexive, preorder, tolerance or an equivalence granular structure base.

The relation R can also be conveniently represented by a Boolean matrix M_R with each entry r_ij being set to 1 if (x_i, x_j) ∈ R and to 0 otherwise [60]. Matrix M_R is called the matrix of the relation R relative to the same ordering for its rows and columns. If the cardinality of U, denoted by |U|, is n, then M_R is a square matrix of order n. For Q, R ∈ R , we have Q ⊆ R iff M_Q ≤ M_R, which means q_ij ≤ r_ij for all 1 ≤ i, j ≤ n. The matrix of the composition of Q and R, denoted by M_{Q∘R} ≜ M_Q • M_R = (w_ij), is determined by $w_{ij} = \bigvee_{k = 1}^{n} (q_{ik} \land r_{kj}), \forall 1 \leq i, j \leq n,$ (2) where ∨ and ∧ are the Boolean operators "or" and "and", respectively.
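The following sketch illustrates how the matrix of a relation and the Boolean product of Eq. (2) fit together: the Boolean product of M_Q and M_R is compared with the matrix of Q ∘ R computed directly from the pairs. The helper names and the two sample relations are assumptions made for this illustration only.

```python
def relation_matrix(relation, universe):
    """Boolean matrix M_R with r_ij = 1 iff (x_i, x_j) is in R."""
    return [[1 if (xi, xj) in relation else 0 for xj in universe]
            for xi in universe]


def boolean_product(A, B):
    """Boolean product of Eq. (2): w_ij = OR over k of (a_ik AND b_kj)."""
    n = len(A)
    return [[int(any(A[i][k] and B[k][j] for k in range(n))) for j in range(n)]
            for i in range(n)]


def compose(Q, R):
    """Q ∘ R = {(x, z) : (x, y) in Q and (y, z) in R for some y}."""
    return {(x, z) for (x, y1) in Q for (y2, z) in R if y1 == y2}


# Hypothetical relations on a three-element universe, for illustration only.
U = ["x1", "x2", "x3"]
Q = {("x1", "x2"), ("x2", "x2")}
R = {("x2", "x3"), ("x3", "x1")}

M_Q, M_R = relation_matrix(Q, U), relation_matrix(R, U)
M_QR = boolean_product(M_Q, M_R)

# Both routes to the composition should give the same matrix.
print(M_QR == relation_matrix(compose(Q, R), U))
```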

Using these terminologies and notations, we can equivalently express the properties of R in terms of that of M_R. For example, R is

reflexive iff M_I ≤ M_R;

symmetric iff $M_{R}^{T} = M_{R}$ , where A ^T denotes the transpose of matrix A;

transitive iff M_R • M_R ≤ M_R.
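The three matrix characterizations above translate almost literally into code. The sketch below is an illustration under assumed helper names (`relation_type` in particular is not a term from the paper); it classifies a Boolean matrix as a reflexive, preorder, tolerance or equivalence relation.

```python
def boolean_product(A, B):
    """Boolean product: w_ij = OR over k of (a_ik AND b_kj)."""
    n = len(A)
    return [[int(any(A[i][k] and B[k][j] for k in range(n))) for j in range(n)]
            for i in range(n)]


def leq(A, B):
    """Entrywise comparison A ≤ B."""
    n = len(A)
    return all(A[i][j] <= B[i][j] for i in range(n) for j in range(n))


def is_reflexive(M):
    """M_I ≤ M_R, i.e. the diagonal consists of ones."""
    return all(M[i][i] == 1 for i in range(len(M)))


def is_symmetric(M):
    """M_R^T = M_R."""
    return [list(col) for col in zip(*M)] == [list(row) for row in M]


def is_transitive(M):
    """M_R • M_R ≤ M_R with the Boolean product."""
    return leq(boolean_product(M, M), M)


def relation_type(M):
    """Name the most specific type among those listed in Section 2."""
    if not is_reflexive(M):
        return "not reflexive"
    symmetric, transitive = is_symmetric(M), is_transitive(M)
    if symmetric and transitive:
        return "equivalence"
    if transitive:
        return "preorder"
    if symmetric:
        return "tolerance"
    return "reflexive"


# Hypothetical 3x3 examples, chosen only for illustration.
preorder_example = [[1, 0, 0], [1, 1, 0], [1, 0, 1]]
tolerance_example = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
equivalence_example = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
for M in (preorder_example, tolerance_example, equivalence_example):
    print(relation_type(M))
```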

It is worth mentioning that the operator • is developed to compute the composition of two relations (in the sense of this paper), which works based on Boolean operators. However, the usual manipulation of numbers or matrices is performed based on ordinary (not Boolean) arithmetic. Therefore, for computational convenience, a novel characterization of the transitivity of a relation using the common multiplication · of matrices is revealed by the lemma below.

Lemma 2.1. Let U be the universe of discourse, |U| = n and R be a relation on U. Then, R is transitive iff M_R · M_R ≤ nM_R.

Proof. Denote $M_{R} \cdot M_{R} = (r_{ij}^{(2)})$ , then $r_{ij}^{(2)} = \sum_{k = 1}^{n} r_{ik} \times r_{kj}$ .

Suppose first that R is transitive. The proof that M_R · M_R ≤ nM_R breaks down into two cases.

If r_ij = 1, it is trivial to show that $r_{ij}^{(2)} \leq n$ .

If r_ij = 0, then by the transitivity of R, there is no k ∈ {1, 2, …, n} such that r_ik × r_kj = 1 (or else it would contradict the transitivity of R). Hence, we have $r_{ij}^{(2)} = 0$ .

Summarizing both cases, we obtain M_R · M_R ≤ nM_R.

Conversely, suppose that M_R · M_R ≤ nM_R. If r_ik = r_kj = 1 for some i, j, k, then $r_{ij}^{(2)} = \sum_{h = 1}^{n} r_{ih} \times r_{hj} \geq 1$ . By assumption, we have n × r_ij ≥ 1, which implies r_ij = 1. Thus, R is transitive. □
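Lemma 2.1 can be sanity-checked by brute force on a small universe. The sketch below, which is only an illustrative check and not part of the proof, enumerates all Boolean matrices of order 3 and compares transitivity tested from the definition with the criterion M_R · M_R ≤ nM_R using ordinary arithmetic. All function names are assumptions.

```python
from itertools import product


def is_transitive_by_definition(M):
    """(x_i, x_k) in R and (x_k, x_j) in R together imply (x_i, x_j) in R."""
    n = len(M)
    return all(M[i][j] for i in range(n) for k in range(n) for j in range(n)
               if M[i][k] and M[k][j])


def is_transitive_by_lemma(M):
    """Criterion of Lemma 2.1: M · M ≤ n M with ordinary sums and products."""
    n = len(M)
    return all(sum(M[i][k] * M[k][j] for k in range(n)) <= n * M[i][j]
               for i in range(n) for j in range(n))


def all_boolean_matrices(n):
    """Yield every n x n matrix with entries in {0, 1}."""
    for bits in product((0, 1), repeat=n * n):
        yield [list(bits[r * n:(r + 1) * n]) for r in range(n)]


n = 3
mismatches = [M for M in all_boolean_matrices(n)
              if is_transitive_by_definition(M) != is_transitive_by_lemma(M)]
transitive_count = sum(is_transitive_by_lemma(M) for M in all_boolean_matrices(n))
print(len(mismatches), transitive_count)
```

An empty `mismatches` list means the two tests agree on every relation over a three-element universe; the check says nothing about larger n, for which the lemma's proof is needed.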

3 Granular equality relation and granularly finer relation between knowledge structures

In this section, after introducing the concept of knowledge structures, we define two novel kinds of relations between them, the granular equality and granularly finer relations, which are fundamental concepts in this paper.

In the following discussion, we always suppose that R is at least reflexive.

3.1 Granular equality relation between knowledge structures

From a general relation R on U, we can derive a knowledge structure K (R) = (R_s (x₁) , R_s (x₂) , …, R_s (x_n)) with each of its components being an information granule. Thus, there is a one-to-one correspondence between the knowledge structure K (R) on U and the granular structure (U, R). Similarly, corresponding to (U, R ), the knowledge structure base, denoted by K ( R ), is a collection of all knowledge structures on U. For example, K (I) = ({x₁} , {x₂} , …, {x_n}) and $K (E) = (\underbrace{U, U, \dots, U}_{n \text{ times}})$ are two equivalence knowledge structures.

Before presenting the notion of the granular equality relation, let us return to the matrix of a relation. In determining M_R, we only require that the rows and columns are arranged in the same ordering. For example, given K (R) = ({x₁} , {x₁, x₂} , {x₁, x₃}), a preorder knowledge structure on U = {x₁, x₂, x₃}, we have, relative to the ordering x₁, x₂, x₃, $M_{R} = (\begin{matrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{matrix}) .$ However, if we rearrange the ordering of U to x₂, x₃, x₁, then $M_{R}^{*} = (\begin{matrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{matrix}) .$ In fact, M_R and $M_{R}^{*}$ both reflect the granular structure of R on U, although they differ in form. What is the relation between M_R and $M_{R}^{*}$ ? To solve this problem, we must first build up the connection between x₁, x₂, x₃ and x₂, x₃, x₁. There exists a nondegenerate linear transformation (from the following analysis, also an orthogonal transformation) between them: $(\begin{matrix} x_{1} \\ x_{2} \\ x_{3} \end{matrix}) = (\begin{matrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{matrix}) (\begin{matrix} x_{2} \\ x_{3} \\ x_{1} \end{matrix}) ≜ M_{f} (\begin{matrix} x_{2} \\ x_{3} \\ x_{1} \end{matrix}),$ where M_f also corresponds to the bijection f : U → U whose matrix has (i, j) entry 1 iff f (x_i) = x_j, that is, f (x₁) = x₃, f (x₂) = x₁ and f (x₃) = x₂.

Then, we can confirm that $M_{R}^{*} = M_{f}^{T} \cdot M_{R} \cdot M_{f} = M_{f}^{- 1} \cdot M_{R} \cdot M_{f}$ . That is to say, matrices M_R and $M_{R}^{*}$ are, in fact, both congruent and similar. A permutation of the set U is a bijective function from U to itself, and every such bijection corresponds to a Boolean matrix P whose column sums and row sums are all 1 (therefore, P is an orthogonal matrix whose inverse equals its transpose). Throughout this paper, we denote by P the set of all such permutation matrices. As is well known, there are n! permutations of n items, and hence P is finite. Actually, P is a group under multiplication ·, and it is isomorphic to the symmetric group S_n, consisting of all permutations of U. For example, if |U| = 3, we have $\begin{matrix} P = \{ & (\begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{matrix}), (\begin{matrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{matrix}), (\begin{matrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{matrix}), \\ & (\begin{matrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{matrix}), (\begin{matrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{matrix}), (\begin{matrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{matrix}) \}, \end{matrix}$ which is isomorphic to S₃ represented by {(1) , (1, 2) , (1, 3) , (2, 3) , (1, 3, 2) , (1, 2, 3)} in cycle notation.
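The identity relating the two matrices of the running example can be checked numerically. The sketch below uses plain nested lists and two small helpers (`transpose` and `matmul`, both names assumed for this illustration) to recompute the transformed matrix from M_R and M_f.

```python
def transpose(A):
    """Transpose of a square matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Ordinary (not Boolean) matrix product."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Matrices of the running example, relative to the ordering x1, x2, x3.
M_R = [[1, 0, 0], [1, 1, 0], [1, 0, 1]]
M_f = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# A permutation matrix is orthogonal: its transpose is its inverse.
print(matmul(transpose(M_f), M_f) == identity)

# Matrix of the same relation relative to the ordering x2, x3, x1.
M_R_star = matmul(matmul(transpose(M_f), M_R), M_f)
print(M_R_star)
```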

The foregoing discussion provides us with the transformation method among knowledge structures. We say two knowledge structures are granularly equal if one could be changed to the other through such a transformation. More formally, we have the following definition.

Definition 3.1. Let K ( R ) be a knowledge structure base on U, and K (Q) , K (R) ∈ K ( R ). Define a relation ≅ on K ( R ) as follows: K (Q) ≅ K (R) iff there exists P ∈ P such that M_Q = P ^T · M_R · P. We say that K (Q) is granularly equal to K (R) if K (Q) ≅ K (R).

If K (Q) ≅ K (R), then the multisets [61] {|Q_s (x) | : x ∈ U} and {|R_s (x) | : x ∈ U} are equal (notice that here we do not mean that |Q_s (x) | = |R_s (x) |, ∀ x ∈ U). However, the converse does not hold in general, as illustrated by a counterexample in Remark 3.1.

In the aforementioned example, $M_{R}^{*}$ can induce the knowledge structure K (Q) = ({x₁, x₃} , {x₂, x₃} , {x₃}). According to Definition 3.1, we have K (Q) ≅ K (R). Given K (S) = ({x₁, x₂} , {x₂} , {x₂, x₃}), after some tedious manipulation, we can get K (S) ≅ K (R). It can be verified that no other knowledge structure is granularly equal to K (R) except K (Q), K (R) and K (S). The granular equality of K (S) and K (Q) is guaranteed by the following theorem.
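For small universes, the "tedious manipulation" can be delegated to a brute-force search over all permutations. The sketch below is an illustration under assumed names (`conjugate`, `granularly_equal`); it tests Definition 3.1 directly and collects every matrix of the form P^T · M_R · P for the running example.

```python
from itertools import permutations


def conjugate(M, p):
    """Return P^T · M · P for the permutation matrix P with P[k][p[k]] = 1."""
    n = len(M)
    N = [[0] * n for _ in range(n)]
    for k in range(n):
        for l in range(n):
            N[p[k]][p[l]] = M[k][l]
    return N


def granularly_equal(M_Q, M_R):
    """Definition 3.1: M_Q = P^T · M_R · P for some permutation matrix P."""
    return any(conjugate(M_R, p) == M_Q for p in permutations(range(len(M_R))))


M_R = [[1, 0, 0], [1, 1, 0], [1, 0, 1]]  # K(R) = ({x1}, {x1,x2}, {x1,x3})
M_Q = [[1, 0, 1], [0, 1, 1], [0, 0, 1]]  # K(Q) = ({x1,x3}, {x2,x3}, {x3})
M_S = [[1, 1, 0], [0, 1, 0], [0, 1, 1]]  # K(S) = ({x1,x2}, {x2}, {x2,x3})

print(granularly_equal(M_Q, M_R), granularly_equal(M_S, M_R))

# All distinct matrices granularly equal to M_R.
orbit = {tuple(map(tuple, conjugate(M_R, p))) for p in permutations(range(3))}
print(len(orbit))
```

The search costs n! matrix rearrangements, so it is only practical for very small n; it is meant as a check of the example, not as an efficient algorithm.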

Theorem 3.1. The relation ≅ between knowledge structures is an equivalence relation.

Proof. It is immediate from Definition 3.1.□

By Theorem 3.1, the quotient set K ( R )/≅ is a partition of K ( R ). Select an arbitrary knowledge structure from every equivalence class as its representative, and denote the set of all these representatives by K(U). Obviously, we have K(U) ⊆ K ( R ).

Theorem 3.2. Let K (Q) , K (R) be two knowledge structures on U. If K (Q) ≅ K (R), then Q preserves the properties of reflexivity, symmetry and transitivity, if any, of R.

Proof. Since K (Q) ≅ K (R), there exists P ∈ P such that M_Q = P ^T · M_R · P.

Assume that R is reflexive, then we have M_R ≥ M_I. Furthermore, M_Q = P ^T · M_R · P ≥ P ^T · M_I · P = P ^T · P = M_I, i.e., Q is reflexive.

Assume that R is symmetric, then we have $M_{R}^{T} = M_{R}$ . Therefore, $M_{Q}^{T} = P^{T} \cdot M_{R}^{T} \cdot P = M_{Q}$ , i.e., Q is symmetric.

Assume that R is transitive, then by Lemma 2.1, we have M_R · M_R ≤ nM_R. Thus, $\begin{matrix} M_{Q} \cdot M_{Q} & = P^{T} \cdot M_{R} \cdot P \cdot P^{T} \cdot M_{R} \cdot P \\ & = P^{T} \cdot M_{R} \cdot M_{R} \cdot P \\ & \leq n P^{T} \cdot M_{R} \cdot P \\ & = n M_{Q} . \end{matrix}$ Hence, again by Lemma 2.1, Q is transitive. □

Corollary 3.1. Let K (Q) , K (R) be two knowledge structures on U and K (Q) ≅ K (R). Then, K (Q) is a reflexive (preorder, tolerance) or an equivalence knowledge structure iff K (R) is a reflexive (preorder, tolerance) or an equivalence knowledge structure.

If two knowledge structures are granularly equal, then by Theorem 3.2 and Corollary 3.1, they are of the same type. In other words, through the orthogonal transformation proposed in this paper, the newly generated knowledge structure maintains the properties of the original one. Furthermore, in granular computing, we do not distinguish these two knowledge structures because they are of the same type and their information granularities (to be introduced in Section 4) are identical. In this sense, K(U) is a simplification of K ( R ), and is therefore called the simplified knowledge structure base on U. That is, from the viewpoint of information granularity, we need only concentrate on K(U), not necessarily on K ( R ).

3.2 Granularly finer relation between knowledge structures

First, let us recall the definition of the classic finer relation between knowledge structures.

Definition 3.2. [28, 51] Let (U, R ) be a granular structure base, Q, R ∈ R , and K (Q) , K (R) be their corresponding knowledge structures. Define a partial order relation ⪯ as follows: K (Q) ⪯ K (R) iff Q_s (x) ⊆ R_s (x) for all x ∈ U iff M_Q ≤ M_R. We say that K (Q) is finer than K (R) or K (R) is coarser than K (Q) if K (Q) ⪯ K (R). Additionally, we say that K (Q) is strictly finer than K (R) or K (R) is strictly coarser than K (Q), denoted by K (Q) ≺ K (R), if K (Q) ⪯ K (R) and K (Q) ≠ K (R). Clearly, K (I) and K (E) are the finest and the coarsest knowledge structures in the knowledge structure base K ( R ).

Remark 3.1. In [28, 51], an arrangement of the information granules in K (R) is used as the first step to better characterize the degree of fineness or coarseness between knowledge structures. Continuing the preceding example, if we take K (O) = ({x₁} , {x₁, x₃} , {x₁, x₂}), then obviously K (O) is a sequence of K (R). However, the relation associated with K (O) is not reflexive. The intention is to exchange the positions of x₂ and x₃, but in K (O) the exchange is applied only to the ordering of the rows and not to that of the columns. Another strategy utilizes the set-size nature of information granules. If we take K (T) = ({x₁, x₂} , {x₁, x₂} , {x₃}), an equivalence knowledge structure, then clearly, the information granules of K (R) and K (T) have the same sizes up to a rearrangement. However, these two knowledge structures are not of the same type and are therefore not granularly equal. For the sake of caution, we do not require their information granularities to be equal, although this does not prevent them from being equal under a particular measure.

Definition 3.3. Let K ( R ) be a knowledge structure base on U, and K (Q) , K (R) ∈ K ( R ). Define a relation ≾ as follows: K (Q) ≾ K (R) iff there exists P ∈ P such that M_Q ≤ P ^T · M_R · P. We say that K (Q) is granularly finer than K (R) or K (R) is granularly coarser than K (Q) if K (Q) ≾ K (R). Additionally, we say that K (Q) is strictly granularly finer than K (R) or K (R) is strictly granularly coarser than K (Q), denoted by $K (Q) \underset{⋨}{≺} K (R)$ , if K (Q) ≾ K (R) and K (Q) ≇ K (R).

From Definition 3.3, we can get the following easily.

(1) If K (Q) ⪯ K (R), then K (Q) ≾ K (R). Namely, the relation ⪯ is a subset of the relation ≾.

(2) If K (Q) ≾ K (R), then there exists a knowledge structure K (S) such that K (Q) ⪯ K (S), where K (S) is granularly equal to K (R).

(3) If K (Q) ≅ K (R), K (R) ≾ K (S), and K (S) ≅ K (T), then K (Q) ≾ K (T).

Items (1) and (2) indicate the relations between ⪯ and ≾, and Item (3) suggests that relation ≾ is compatible with ≅.
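To make the difference between ⪯ and ≾ concrete, the sketch below tests both relations by brute force on Boolean matrices. The knowledge structure K (V) = ({x₁} , {x₂} , {x₂, x₃}) is a hypothetical example introduced only for this illustration, as are all function names.

```python
from itertools import permutations


def conjugate(M, p):
    """Return P^T · M · P for the permutation matrix P with P[k][p[k]] = 1."""
    n = len(M)
    N = [[0] * n for _ in range(n)]
    for k in range(n):
        for l in range(n):
            N[p[k]][p[l]] = M[k][l]
    return N


def leq(A, B):
    """Entrywise comparison A ≤ B, i.e. the classical finer relation."""
    n = len(A)
    return all(A[i][j] <= B[i][j] for i in range(n) for j in range(n))


def granularly_finer(M_Q, M_R):
    """Definition 3.3: M_Q ≤ P^T · M_R · P for some permutation matrix P."""
    return any(leq(M_Q, conjugate(M_R, p))
               for p in permutations(range(len(M_R))))


def granularly_equal(M_Q, M_R):
    """Definition 3.1: M_Q = P^T · M_R · P for some permutation matrix P."""
    return any(M_Q == conjugate(M_R, p) for p in permutations(range(len(M_R))))


M_I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # K(I)
M_R = [[1, 0, 0], [1, 1, 0], [1, 0, 1]]  # K(R) = ({x1}, {x1,x2}, {x1,x3})
M_V = [[1, 0, 0], [0, 1, 0], [0, 1, 1]]  # hypothetical K(V) = ({x1}, {x2}, {x2,x3})

print(leq(M_I, M_R), granularly_finer(M_I, M_R))   # classical finer implies granularly finer
print(leq(M_V, M_R), granularly_finer(M_V, M_R))   # incomparable by ⪯, comparable by ≾
strictly_finer = granularly_finer(M_V, M_R) and not granularly_equal(M_V, M_R)
print(strictly_finer)
```

Here K (V) is not finer than K (R) in the classical sense, yet it is finer than the granularly equal structure K (S) = ({x₁, x₂} , {x₂} , {x₂, x₃}), which is exactly the situation described in Item (2).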

Theorem 3.3. The relation ≾ is a partial order relation on K(U).

Proof. We prove that ≾ is reflexive, antisymmetric and transitive on K(U) as follows.

Since M_I ∈ P , the reflexivity of ≾ is obvious (choose M_I for P).

If K (Q) ≾ K (R) and K (R) ≾ K (Q), then there exist P₁, P₂ ∈ P such that $M_{Q} \leq P_{1}^{T} \cdot M_{R} \cdot P_{1}$ and $M_{R} \leq P_{2}^{T} \cdot M_{Q} \cdot P_{2}$ . Taking P = P₂ · P₁ ∈ P , we obtain $(q_{ij}) ≜ M_{Q} \leq P_{1}^{T} \cdot M_{R} \cdot P_{1} \leq P^{T} \cdot M_{Q} \cdot P ≜ (q_{ij}^{'}) .$ Thus, $\sum_{i, j} q_{ij} \leq \sum_{i, j} q_{ij}^{'}$ . On the other hand, the multiset of all entries $\{q_{ij}^{'}\}$ is a rearrangement of the multiset $\{q_{ij}\}$ , so the two sums are equal. Since $q_{ij} \leq q_{ij}^{'}$ for all i, j, this forces M_Q = P ^T · M_Q · P, and hence both inequalities in the chain above are equalities. In particular, $M_{Q} = P_{1}^{T} \cdot M_{R} \cdot P_{1}$ , i.e., K (Q) ≅ K (R). Because K(U) contains exactly one knowledge structure from each equivalence class of ≅, we conclude that K (Q) = K (R).

If K (Q) ≾ K (R) and K (R) ≾ K (S), then there exist P₁, P₂ ∈ P such that $M_{Q} \leq P_{1}^{T} \cdot M_{R} \cdot P_{1}$ and $M_{R} \leq P_{2}^{T} \cdot M_{S} \cdot P_{2}$ . Taking P = P₂ · P₁ ∈ P , it holds that M_Q ≤ P ^T · M_S · P, namely, K (Q) ≾ K (S).

Thus the relation ≾ is a partial order relation on K(U). □

In light of Theorem 3.3, the pair ( K (U) , ≾) is a poset. However, (K ( R ) , ≾) is not necessarily a poset because ≾ is not always antisymmetric on K ( R ): from K (Q) ≾ K (R) and K (R) ≾ K (Q), we can only conclude K (Q) ≅ K (R), not K (Q) = K (R).

Remark 3.2. For K (Q) , K (R) ∈ K ( R ), if K (Q) ≾ K (R), then there exists a sequence K′ (R) of K (R), where $K^{'} (R) = (R_{s} (x_{1}^{'}), R_{s} (x_{2}^{'}), \dots, R_{s} (x_{n}^{'}))$ and $\{x_{1}, x_{2}, \dots, x_{n}\} = \{x_{1}^{'}, x_{2}^{'}, \dots, x_{n}^{'}\}$ such that $| Q_{s} (x_{i}) | \leq | R_{s} (x_{i}^{'}) |, \forall 1 \leq i \leq n$ . That is to say, the granularly finer relation ≾ is a subset of the granulation finer relation defined in [28, 51]. If we further assume that $K (Q) \underset{⋨}{≺} K (R)$ , then there additionally exists j ∈ {1, 2, …, n} such that $| Q_{s} (x_{j}) | < | R_{s} (x_{j}^{'}) |$ .

Remark 3.3. For K (Q) , K (R) ∈ K ( R ), if K (Q) ≾ K (R), then there exists a mapping φ from R to ${\bar{n}}^{n}$ , where $\bar{n} = \{1, 2, \dots, n\}$ and φ (R) = (|R_s (x₁) |, |R_s (x₂) |, …, |R_s (x_n) |)_↘ (here x_↘ denotes the vector obtained from x by arranging its components in nonincreasing order) such that φ (Q) ≤ φ (R). That is to say, the granularly finer relation ≾ is a subset of the finer relation defined in [52]. If we further assume that $K (Q) \underset{⋨}{≺} K (R)$ , then φ (Q) < φ (R).
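The mapping φ of Remark 3.3 is simply the vector of granule sizes sorted in nonincreasing order. The sketch below computes it from a Boolean matrix and illustrates that equal values of φ do not imply granular equality, using K (R) and K (T) from Remark 3.1. The structure K (V) and all function names are assumptions made for the illustration.

```python
def phi(M):
    """Granule sizes |R_s(x_i)| (row sums of M_R) in nonincreasing order."""
    return tuple(sorted((sum(row) for row in M), reverse=True))


def componentwise_leq(u, v):
    """u ≤ v for vectors of equal length."""
    return all(a <= b for a, b in zip(u, v))


def is_symmetric(M):
    n = len(M)
    return all(M[i][j] == M[j][i] for i in range(n) for j in range(n))


M_R = [[1, 0, 0], [1, 1, 0], [1, 0, 1]]  # K(R) = ({x1}, {x1,x2}, {x1,x3})
M_T = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]  # K(T) = ({x1,x2}, {x1,x2}, {x3})
M_V = [[1, 0, 0], [0, 1, 0], [0, 1, 1]]  # hypothetical K(V) = ({x1}, {x2}, {x2,x3})

print(phi(M_V), phi(M_R), componentwise_leq(phi(M_V), phi(M_R)))

# phi cannot separate K(R) from K(T), although only one of them is symmetric,
# so by Theorem 3.2 they cannot be granularly equal.
print(phi(M_R) == phi(M_T), is_symmetric(M_R), is_symmetric(M_T))
```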

3.3 Hasse diagram of simplified knowledge structure base

As stated in Section 3.2, ( K (U) , ≾) is a finite poset. In this section, it is visualized using a Hasse diagram, which displays the order relations through a hierarchical structure. In the subsequent discussion, we assume that the reader is familiar with the basic notions on ordered structures (for more information, one can refer to, e.g., [56, 62]).

In ( K (U) , ≾), we say that K (Q) is covered by K (R), or equivalently, K (R) covers K (Q) if $K (Q) \underset{⋨}{≺} K (R)$ and there is no K (S) ∈ K (U) such that $K (Q) \underset{⋨}{≺} K (S) \underset{⋨}{≺} K (R)$ . For a finite poset, the cover relation determines the partial order relation, and vice versa.

Every knowledge structure in K(U) is depicted as a node in a Hasse diagram. If K (Q) is covered by K (R), then we connect the nodes representing K (Q) and K (R) by an increasing line segment, i.e., we join both nodes and put the node of K (Q) below the node of K (R). If $K (Q) \underset{⋨}{≺} K (R)$ , then there exists at least one path connecting the two nodes corresponding to K (Q) and K (R) laid in the lower and upper layers, respectively. By the fact that ( K (U) , ≾) is bounded with the minimum element K (I) and the maximum element K (E), K (I) and K (E) are located at the bottom and top of the Hasse diagram, respectively.

If U = {x₁, x₂}, there are four reflexive knowledge structures on U, which are K (Q) = ({x₁} , {x₂}), K (R) = ({x₁, x₂} , {x₂}), K (S) = ({x₁} , {x₁, x₂}) and K (T) = ({x₁, x₂} , {x₁, x₂}). The relations between them are presented in Fig. 1(a), meaning that K (Q) ≾ K (R) ≾ K (T), K (Q) ≾ K (S) ≾ K (T) and K (R) ≅ K (S). For brevity, we abbreviate the label of each node; for example, ({x₁, x₂} , {x₂}) is written as (12, 2). The same abbreviations are employed in the discussion below.

Fig. 1. Hasse diagrams of simplified knowledge structure bases of all types on U = {x₁, x₂}.

Because K (R) ≅ K (S), according to the above analysis, we can choose either of them as the representative in the simplified knowledge structure base. The Hasse diagram of the simplified reflexive or preorder knowledge structure base on U is shown in Fig. 1(b), while that of the simplified tolerance or equivalence knowledge structure base on U is given in Fig. 1(c).

As can be seen from Fig. 1(b) and (c), both of them are chains, and therefore they are lattices, obviously. Then, a natural question arises: can all the simplified knowledge structure bases with the granularly finer relation form a lattice? A negative answer to this question is given by the following examples.

Example 3.1. Given U = {x₁, x₂, x₃}, draw the Hasse diagrams for ( K (U) , ≾) of all types. Then determine whether each of them is a lattice or not.

The ordered structures of simplified reflexive, preorder, tolerance and equivalence knowledge structure base ( K (U) , ≾) are illustrated by Hasse diagrams shown in the subfigures of Fig. 2, respectively.

Fig. 2

Hasse diagrams of simplified knowledge structure bases of all types on U = {x₁, x₂, x₃}.

Clearly, the ordered sets in Fig. 2(c) and (d) are chains of lengths 3 and 2, respectively, and therefore they are lattices. Now let us turn our attention to the other two posets. Denote by K (Q), K (R), K (S) and K (T) the knowledge structures that are marked in Fig. 2(b). The infimum of K (S) and K (T) does not exist, nor does the supremum of K (Q) and K (R). Therefore, the simplified preorder knowledge structure base is neither a meet semilattice nor a join semilattice, let alone a lattice. Likewise, the ordered structure given in Fig. 2(a) is not a lattice either.

Example 3.2. Given U = {x₁, x₂, x₃, x₄, x₅}, draw the Hasse diagram for ( K (U) , ≾) derived from the equivalence knowledge structure base. Then determine whether it is a lattice or not.

The Hasse diagram of the simplified equivalence knowledge structure base on U is depicted in Fig. 3. It can be easily verified that ( K (U) , ≾) is not a lattice.

In fact, there are originally 52 equivalence knowledge structures on U. After simplification, only 7 classes of equivalence knowledge structures remain (the numbers of granularly equal knowledge structures in each class are listed in the right column of Fig. 3). That is, we only need to investigate these 7 knowledge structures, which significantly decreases the complexity. The operators ∪, ∩, - and ≀ between knowledge structures were introduced in [28, 51]; they can effectively achieve composition, decomposition and transformation of knowledge structures. It is proved there that (K ( R ) , ∪ , ∩) is a distributive lattice and (K ( R ) , ∪ , ∩ , ≀) is a complemented lattice, which indicates that the poset (K ( R ) , ⪯) is a lattice from the algebraic point of view. However, as illustrated above, the poset ( K (U) , ≾) is not necessarily a lattice.
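The counts 52 and 7, as well as the failure of the lattice property, can be reproduced by enumeration. The sketch below rests on one assumption that follows from the permutation-based definitions: two equivalence knowledge structures are granularly equal exactly when their partitions have the same multiset of block sizes, and one is granularly finer than another exactly when its block sizes can be merged into those of the other. The helper names are hypothetical.

```python
from collections import Counter

def set_partitions(elems):
    """Yield every partition of the list elems as a list of blocks."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in set_partitions(rest):
        # put `first` into each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or into a new singleton block
        yield [[first]] + part

def shape(part):
    """Block sizes in decreasing order; identifies the granular-equality class."""
    return tuple(sorted((len(block) for block in part), reverse=True))

def merges_into(lam, mu):
    """True if blocks of sizes lam can be merged into blocks of sizes mu."""
    if not lam:
        return all(m == 0 for m in mu)
    head, tail = lam[0], lam[1:]
    return any(
        merges_into(tail, mu[:i] + (m - head,) + mu[i + 1:])
        for i, m in enumerate(mu) if m >= head
    )

def is_lattice(elems, leq):
    """Check that every pair has a least upper and a greatest lower bound."""
    def has_least(cands):
        return any(all(leq(c, d) for d in cands) for c in cands)
    def has_greatest(cands):
        return any(all(leq(d, c) for d in cands) for c in cands)
    for a in elems:
        for b in elems:
            uppers = [c for c in elems if leq(a, c) and leq(b, c)]
            lowers = [c for c in elems if leq(c, a) and leq(c, b)]
            if not has_least(uppers) or not has_greatest(lowers):
                return False
    return True

def shape_counts(n):
    """Number of equivalence knowledge structures in each class for |U| = n."""
    return Counter(shape(p) for p in set_partitions(list(range(n))))

counts = shape_counts(5)
print(sum(counts.values()), len(counts))
print(is_lattice(list(counts), merges_into))
```

Under this assumption, the classes with block sizes (2, 2, 1) and (3, 1, 1) both lie below (3, 2) and (4, 1), which are incomparable, so the pair has no supremum.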

Fig. 3

Hasse diagram of the simplified equivalence knowledge structure base on U = {x₁, x₂, x₃, x₄, x₅}.

4 Information granularity of knowledge structures

In this section, an axiomatic definition of the information granularity of knowledge structures is proposed based on the concepts of granular equality relation and granularly finer relation.

Definition 4.1. Let K ( R ) be a knowledge structure base on U and G be a mapping from K ( R ) to $ℝ$ defined by $K (R) \overset{G}{⟼} G (R), \forall R \in R$ . For all K (R) ∈ K ( R ), G (R) is called an information granularity of K (R) if it satisfies the following properties:

(1) (Nonnegativity) G (R) ≥ 0;

(2) (Invariability) ∀K (Q) ∈ K ( R ), if K (Q) ≅ K (R), then G (Q) = G (R);

(3) (Monotonicity) ∀K (Q) ∈ K ( R ), if K (Q) ≾ K (R), then G (Q) ≤ G (R).

Moreover, G (R) is called a strict information granularity of K (R) if it satisfies (1), (2) and (3’):

(3’) (Strict monotonicity) ∀K (Q) ∈ K ( R ), if $K (Q) \underset{⋨}{≺} K (R)$ , then G (Q) < G (R).

The first property requires the information granularity G (R) to be nonnegative for an arbitrary knowledge structure K (R) ∈ K ( R ). The second condition says that if two knowledge structures are granularly equal, then they must have the same information granularity. The third constraint expresses that the information granularity is monotone nondecreasing with respect to the granularly finer relation, and that the strict information granularity is strictly increasing with respect to the strictly granularly finer relation. Compared with the existing axiomatic definitions of information granularity, the main differences lie in the relations defined between knowledge structures. As we have already stressed in this paper, the information granularity is approached in a more cautious way.

In a broad sense, the information granularity of K (R) indicates an average measure of information granules induced by R. Define a mapping $G' : K (U) \to ℝ$ whose object-value correspondences inherit from G. Then G′ is, in essence, an order-preserving mapping from ( K (U) , ≾) to $(ℝ, \leq)$ .

Theorem 4.1. (Extremum). Let K ( R ) be a knowledge structure base on U, K (R) ∈ K ( R ) and G (R) be the information granularity of K (R). Then, G (R) achieves its minimum value if R = I and achieves its maximum value if R = E.

Proof. For any reflexive relation R on U, we have I ⊆ R ⊆ E, i.e., K (I) ⪯ K (R) ⪯ K (E), from which it follows that K (I) ≾ K (R) ≾ K (E). By the monotonicity of G (R), we can conclude that G (I) ≤ G (R) ≤ G (E).□

If G (R) is a strict information granularity of K (R), then G (R) achieves its minimum value iff R = I and achieves its maximum value iff R = E.

Note that G (R) is bounded with G (I) ≤ G (R) ≤ G (E). If G is a nonnegative constant function, it meets all three conditions but is meaningless in practice. Thus, we can always set G (I) < G (E). The following transform can be applied if one additionally requires that G′ (I) =0 and G′ (E) =1, $G' (R) = \frac{G (R) - G (I)}{G (E) - G (I)} .$ (3)

Several different kinds of measures of knowledge structures have been given in the existing literature. In the following, we show that they are special forms of the information granularity in the sense of Definition 4.1.

Let (U, R) be a granular structure. The granulation of knowledge of K (R) is defined as [43, 44] $GK (R) = \frac{1}{| U |} \sum_{x \in U} \frac{| R_{s} (x) |}{| U |} .$ (4) By the matrix representation of relation R, GK (R) can be equivalently expressed [63] in the form of $GK (R) = \frac{1}{| U |^{2}} \sum_{i, j} r_{ij} .$ (5) In formal concept analysis, Huang et al. [64] proposed the information granulation of a subcontext (U, Q, R_Q) as $IG (Q) = \frac{1}{| U |} \sum_{x \in U} \frac{| x^{' Q' Q} |}{| U |},$ (6) where R_Q is a binary relation between the object set U and the attribute set Q, and x^′Q′Q is the extension of object concepts [10]. Taking into consideration that x^′Q′Q is object-oriented information granules of (U, Q, R_Q), we know Eq. 6 is consistent with Eq. 4.
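The agreement between Eq. 4 and Eq. 5 is easy to see in code. This is a minimal sketch, assuming that R_s (x_i) is read off row i of the Boolean relation matrix, i.e., R_s (x_i) = {x_j : r_ij = 1}; the function names are illustrative.

```python
def granules(matrix):
    """Assumed convention: R_s(x_i) collects the column indices j with r_ij = 1."""
    return [{j for j, r in enumerate(row) if r} for row in matrix]

def gk_from_granules(matrix):
    """Eq. (4): average relative size of the information granules."""
    n = len(matrix)
    return sum(len(g) / n for g in granules(matrix)) / n

def gk_from_matrix(matrix):
    """Eq. (5): number of ones in the relation matrix divided by |U|^2."""
    n = len(matrix)
    return sum(map(sum, matrix)) / n ** 2

# K(R) = ({x1, x2}, {x2}) on U = {x1, x2}
m = [[1, 1],
     [0, 1]]
print(gk_from_granules(m), gk_from_matrix(m))
```

Both functions count the same ones in the matrix, row by row in the first case and all at once in the second.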

Theorem 4.2. Knowledge granulation GK (R) is a (strict) information granularity of K (R).

Proof. Let us prove that GK (R) satisfies all properties of a (strict) information granularity.

Since r_ii = 1 for every i by reflexivity, $GK (R) = \frac{1}{| U |^{2}} \sum_{i, j} r_{ij} \geq \frac{1}{| U |^{2}} \sum_{i} r_{ii} = \frac{1}{| U |} > 0$ .

For all K (Q) ∈ K ( R ), if K (Q) ≅ K (R), then $\sum_{i, j} q_{ij} = \sum_{i, j} r_{ij}$ , and subsequently GK (Q) = GK (R).

For all K (Q) ∈ K ( R ), if K (Q) ≾ K (R), then $\sum_{i, j} q_{ij} \leq \sum_{i, j} r_{ij}$ , and subsequently GK (Q) ≤ GK (R).

For all K (Q) ∈ K ( R ), if $K (Q) \underset{⋨}{≺} K (R)$ , then $\sum_{i, j} q_{ij} < \sum_{i, j} r_{ij}$ , and subsequently GK (Q) < GK (R).

Summing up the above results, we obtain that GK (R) is a (strict) information granularity of K (R).□
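Theorem 4.2 can also be sanity-checked by brute force on a small universe. The sketch below enumerates all reflexive relations on three objects and tests the properties of Definition 4.1 for a candidate measure. It assumes that granular equality and the granularly finer relation are realized by simultaneous row and column permutations of the relation matrix; `check_strict_axioms` is a hypothetical helper, not part of the paper.

```python
from itertools import permutations, product

N = 3
PERMS = list(permutations(range(N)))
OFF_DIAG = [(i, j) for i in range(N) for j in range(N) if i != j]

def all_reflexive():
    """All reflexive Boolean relation matrices on N objects."""
    for bits in product((0, 1), repeat=len(OFF_DIAG)):
        m = [[int(i == j) for j in range(N)] for i in range(N)]
        for (i, j), b in zip(OFF_DIAG, bits):
            m[i][j] = b
        yield tuple(map(tuple, m))

def relabelings(m):
    """All matrices obtained from m by permuting the objects."""
    return {
        tuple(tuple(m[p[i]][p[j]] for j in range(N)) for i in range(N))
        for p in PERMS
    }

def entrywise_leq(a, b):
    return all(a[i][j] <= b[i][j] for i in range(N) for j in range(N))

def gk(m):
    """Knowledge granulation in the matrix form of Eq. (5)."""
    return sum(map(sum, m)) / N ** 2

def check_strict_axioms(G):
    """Test nonnegativity, invariability and (strict) monotonicity of G."""
    mats = list(all_reflexive())
    for q in mats:
        if G(q) < 0:
            return False
        for r in mats:
            images = relabelings(r)
            equal = q in images
            finer = any(entrywise_leq(q, rp) for rp in images)
            if equal and G(q) != G(r):
                return False
            if finer and G(q) > G(r):
                return False
            if finer and not equal and G(q) >= G(r):
                return False
    return True

print(check_strict_axioms(gk))
# Counting zeros instead of ones reverses the order and should fail.
print(check_strict_axioms(lambda m: N ** 2 - sum(map(sum, m))))
```

Such a check is no substitute for the proof, but it is a quick way to screen other candidate measures before attempting one.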

Inspired by this formula, we can define a general form of an information granularity based on the set-size character of information granules.

Definition 4.2. Let (U, R) be a granular structure. For knowledge structure K (R), IG (R) is defined by $IG (R) = λ \sum_{x \in U} f (| R_{s} (x) |),$ (7) where λ > 0 and f (·) is a nonnegative (strictly) monotone nondecreasing function on [1, |U|].

Theorem 4.3. IG (R) is a (strict) information granularity of K (R).

Proof. Similar to the proof of Theorem 4.2.□ In particular, we have the following.

If $λ = \frac{1}{| U |}$ , $f (t) = \frac{t}{| U |}$ , then IG (R) = GK (R).

If $λ = \frac{1}{| U |}$ , $f (t) = \frac{t (t - 1)}{| U | (| U | - 1)}$ , then IG (R) = CG (R), the combination granulation of the knowledge structure K (R) [49, 50]: $CG (R) = \frac{1}{| U |} \sum_{x \in U} \frac{C_{| R_{s} (x) |}^{2}}{C_{| U |}^{2}} .$ (8)

If $λ = \frac{1}{| U |}$ , f (t) = log t, then IG (R) = E_r (R), the rough entropy of the knowledge structure K (R) [43 , 48]: $E_{r} (R) = \frac{1}{| U |} \sum_{x \in U} log | R_{s} (x) | .$ (9)

Example 4.1. Given U = {x₁, x₂, x₃, x₄}, the Hasse diagram for the simplified equivalence knowledge structure base ( K (U) , ≾) is presented in Fig. 4. Thus, ( K (U) , ≾) is a (distributive) lattice.

Fig. 4

Information granularity of knowledge structures in K(U) induced from f₃.

For simplicity, we take $λ = \frac{1}{| U |}$ in all cases. Then, different choices of f yield different information granularities of the same knowledge structure. The results for some typical functions are summarized in Table 1. Furthermore, the information granularity of knowledge structures in K(U) induced from f₃, the rough entropy (measured in bits), is also illustrated in Fig. 4 to give a visual explanation.

Table 1

Information granularity of knowledge structures in K(U) induced from different functions

K(U)	$f_{1} (t) = \frac{t}{| U |}$	$f_{2} (t) = \frac{t (t - 1)}{| U | (| U | - 1)}$	$f_{3} (t) = \log t$	$f_{4} (t) = \exp (\frac{t}{| U |} - 1)$	$f_{5} (t) = \frac{4}{π} \arctan \frac{t}{| U |}$	$f_{6} (t) = 1 - \frac{1}{t}$
①	0.2500	0.0000	0.0000	0.4724	0.3119	0.0000
②	0.3750	0.0833	0.5000	0.5394	0.4511	0.2500
③ ≜K (Q)	0.5000	0.1667	1.0000	0.6065	0.5903	0.5000
④ ≜K (R)	0.6250	0.3750	1.1887	0.7022	0.6925	0.5000
⑤	1.0000	1.0000	2.0000	1.0000	1.0000	0.7500

From Table 1, one can observe that the granularly finer knowledge structure has a smaller information granularity, but not the other way around. For example, K (Q) ≾ K (R) does not hold, but IG (Q) ≤ IG (R) is true in all these cases. In fact, if the poset ( K (U) , ≾) is not a chain and | K (U) | = | IG (U) |, then ( K (U) , ≾) cannot be isomorphic to ( IG (U) , ≤), where IG(U) denotes the collection of all information granularity of knowledge structures in K(U).
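The entries of Table 1 can be recomputed from Eq. 7. The sketch below assumes that the five simplified equivalence knowledge structures ① to ⑤ correspond to the block sizes (1, 1, 1, 1), (2, 1, 1), (2, 2), (3, 1) and (4), which is inferred from the tabulated values rather than stated in the text, and that log denotes the base-2 logarithm since the unit is the bit.

```python
import math

U = 4  # |U|

# Assumed block sizes of the structures labeled 1..5 in Table 1
# (3 is K(Q), 4 is K(R)).
SHAPES = {1: (1, 1, 1, 1), 2: (2, 1, 1), 3: (2, 2), 4: (3, 1), 5: (4,)}

FUNCS = {
    "f1": lambda t: t / U,
    "f2": lambda t: t * (t - 1) / (U * (U - 1)),
    "f3": lambda t: math.log2(t),
    "f4": lambda t: math.exp(t / U - 1),
    "f5": lambda t: 4 / math.pi * math.atan(t / U),
    "f6": lambda t: 1 - 1 / t,
}

def ig(block_sizes, f, lam=1 / U):
    """Eq. (7): each of the t objects in a block of size t has a granule of size t."""
    return lam * sum(t * f(t) for t in block_sizes)

for label, sizes in SHAPES.items():
    row = [round(ig(sizes, f), 4) for f in FUNCS.values()]
    print(label, row)
```

The rows for K (Q) and K (R) illustrate the remark above: the two structures are incomparable under ≾, yet every listed function orders them, and f₆ even assigns them the same value.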

The result is extended to a broader class as follows.

Below let $F : ℝ^{n} \to ℝ$ be an n-ary function satisfying:

F (0, 0, …, 0) ≥0;

F is symmetric, i.e., F (t₁, t₂, …, t_n) = F (t_p(1), t_p(2), …, t_p(n)) for every (t₁, t₂, …, t_n) and every permutation (p (1) , p (2) , …, p (n)) of (1, 2, …, n);

F is monotone nondecreasing in all its arguments, i.e., F (s₁, s₂, …, s_n) ≤ F (t₁, t₂, …, t_n) if s_i ≤ t_i for all 1 ≤ i ≤ n.

Definition 4.3. Let (U, R) be a granular structure with U = {x₁, x₂, …, x_n}, and let f be as in Definition 4.2. For knowledge structure K (R), IG′ (R) is defined by $IG' (R) = F (f (| R_{s} (x_{1}) |), f (| R_{s} (x_{2}) |), \dots, f (| R_{s} (x_{n}) |)) .$

Theorem 4.4. IG′ (R) is an information granularity of K (R).

Proof. It suffices to show that IG′ (R) satisfies all items in Definition 4.1.

By the monotonicity of F and nonnegativity of f, we have $\begin{matrix} F (f (| R_{s} (x_{1}) |), f (| R_{s} (x_{2}) |), \dots, f (| R_{s} (x_{n}) |)) \\ \geq F (0, 0, \dots, 0) \geq 0 . \end{matrix}$

For all K (Q) ∈ K ( R ), if K (Q) ≅ K (R), then {|Q_s (x) | : x ∈ U} = {|R_s (x) | : x ∈ U}. By the symmetry of F, we have $\begin{matrix} F (f (| Q_{s} (x_{1}) |), f (| Q_{s} (x_{2}) |), \dots, f (| Q_{s} (x_{n}) |)) \\ = F (f (| R_{s} (x_{1}) |), f (| R_{s} (x_{2}) |), \dots, f (| R_{s} (x_{n}) |)) . \end{matrix}$

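For all K (Q) ∈ K ( R ), if K (Q) ≾ K (R), then there exists a permutation (p (1) , p (2) , …, p (n)) of (1, 2, …, n) such that |Q_s (x_i) | ≤ |R_s (x_p(i)) | for all 1 ≤ i ≤ n. By the monotonicity of f and F and the symmetry of F, we have $\begin{matrix} F (f (| Q_{s} (x_{1}) |), f (| Q_{s} (x_{2}) |), \dots, f (| Q_{s} (x_{n}) |)) \\ \leq F (f (| R_{s} (x_{p (1)}) |), f (| R_{s} (x_{p (2)}) |), \dots, f (| R_{s} (x_{p (n)}) |)) \\ = F (f (| R_{s} (x_{1}) |), f (| R_{s} (x_{2}) |), \dots, f (| R_{s} (x_{n}) |)) . \end{matrix}$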

□

Furthermore, if F is strictly increasing in each of its arguments and f is strictly increasing, then IG′ (R) is a strict information granularity of K (R).

In particular, if $F (t_{1}, t_{2}, \dots, t_{n}) = λ \sum_{i = 1}^{n} t_{i}$ , then IG′ (R) = IG (R).

Moreover, if we choose suitable F and f, we can confirm that $\frac{1}{| U |} \sqrt[n]{\prod_{x \in U} | R_{s} (x) |}$ and ${(\sum_{x \in U} \frac{1}{| R_{s} (x) |})}^{- 1}$ are both information granularities of the knowledge structure K (R), which are known as the geometric average [65] and the harmonic average of the strength of patterns induced by R, respectively.
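One possible choice of F and f behind these two averages is sketched below. The pairings are assumptions made for illustration: the geometric form uses F equal to the n-th root of the product with f (t) = t / |U|, and the harmonic form uses F equal to the reciprocal of the sum of reciprocals with f (t) = t. Both aggregators are only evaluated at positive arguments here.

```python
import math

def ig_prime(granule_sizes, F, f):
    """IG'(R) = F(f(|R_s(x_1)|), ..., f(|R_s(x_n)|)) as in Definition 4.3."""
    return F([f(t) for t in granule_sizes])

def geometric_F(values):
    """n-th root of the product; symmetric and nondecreasing on positive inputs."""
    return math.prod(values) ** (1 / len(values))

def harmonic_F(values):
    """Reciprocal of the sum of reciprocals, matching the second expression."""
    return 1 / sum(1 / v for v in values)

def linear_F(values, lam=0.25):
    """F(t_1, ..., t_n) = lam * sum(t_i), which recovers IG(R) of Eq. (7)."""
    return lam * sum(values)

# Granule sizes of an equivalence structure with blocks of sizes 3 and 1, |U| = 4.
sizes = [3, 3, 3, 1]
n = len(sizes)

print(ig_prime(sizes, geometric_F, lambda t: t / n))
print(ig_prime(sizes, harmonic_F, lambda t: t))
print(ig_prime(sizes, linear_F, lambda t: t / n))
```

Because every F used here is symmetric, reordering `sizes` leaves all three values unchanged, which is exactly what the invariability property requires.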

5 Application of information granularity to attribute significance measure

The attribute significance measure is commonly used to quantify the importance of attributes and appears in many applications. In this section, we develop a method of measuring the attribute significance based on the information granularity.

For algorithmic reasons, the information regarding the objects is supplied in the form of an information system, whose separate rows refer to distinct objects, and whose columns refer to different attributes considered, and entries of the information system are attribute-values, called descriptors.

A decision system [19] is a two-tuple (U, C ∪ D), where U is a finite and nonempty set of objects, and C and D are the sets of condition attributes and decision attributes, respectively. Let a (x) denote the value of sample x at attribute a. According to the descriptors, we can induce binary relations on U. For example, if all descriptors are numerical, then, ∀A ⊆ C ∪ D, $A_{s}^{pre} (x) = \{y \in U : a (x) \leq a (y), \forall a \in A\},$ (10) $A_{s}^{tol} (x) = \{y \in U : | a (x) - a (y) | \leq Δ, \forall a \in A\},$ (11) $A_{s}^{equ} (x) = \{y \in U : a (x) = a (y), \forall a \in A\}$ (12) are preorder, tolerance and equivalence information granules of x with respect to A, respectively, where Δ ≥ 0 is a given threshold. From the construction method, it follows that the relation R_A generated by A is uniquely determined by the attribute set A.
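Eqs. 10 to 12 translate directly into set comprehensions. The sketch below uses a small made-up numerical table (the objects, attribute names and values are hypothetical) stored as a dictionary from objects to attribute-value dictionaries.

```python
def preorder_granule(table, x, attrs):
    """Eq. (10): objects y with a(x) <= a(y) for every attribute a in attrs."""
    return {y for y in table if all(table[x][a] <= table[y][a] for a in attrs)}

def tolerance_granule(table, x, attrs, delta):
    """Eq. (11): objects y with |a(x) - a(y)| <= delta for every a in attrs."""
    return {y for y in table
            if all(abs(table[x][a] - table[y][a]) <= delta for a in attrs)}

def equivalence_granule(table, x, attrs):
    """Eq. (12): objects y with a(x) == a(y) for every a in attrs."""
    return {y for y in table if all(table[x][a] == table[y][a] for a in attrs)}

# Hypothetical descriptors, for illustration only.
table = {
    "x1": {"a": 1.0, "b": 2.0},
    "x2": {"a": 1.2, "b": 2.0},
    "x3": {"a": 3.0, "b": 0.5},
}

print(preorder_granule(table, "x1", ["a", "b"]))
print(tolerance_granule(table, "x1", ["a", "b"], delta=0.5))
print(equivalence_granule(table, "x1", ["b"]))
```

Since each granule is defined by a conjunction over the attributes in A, enlarging A can only shrink the granule of every object.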

The joint information granularity of K (R_A) and K (R_B) is $IG (A, B) = IG (R_{A} \cap R_{B}),$ (13) where the knowledge structure corresponding to the relation R_A ∩ R_B is in fact generated by the attribute set A ∪ B, i.e., (A ∪ B) _s (x) = A_s (x) ∩ B_s (x) , ∀ x ∈ U. Because K (R_A ∩ R_B) is finer than both K (R_A) and K (R_B), we have $IG (A, B) \leq \min \{IG (A), IG (B)\} .$ From this point, one can see that information granularity and information entropy [43, 44], a pair of dual concepts, behave in different ways.

Moreover, the conditional information granularity of A relative to B is $IG (B | A) = IG (A) - IG (A, B) .$ (14) From the above formula, we have the following statements.

If B =∅, then IG (B | A) =0.

If K (R_B) ⪯ K (R_A), then IG (B | A) = IG (A) - IG (B) and IG (A | B) =0.

For a decision system (U, C ∪ D), we usually consider the conditional information granularity IG (D | A), where A ⊆ C. For the three commonly used forms of information granularity, we have $GK (D | A) = \frac{1}{| U |} \sum_{x \in U} \frac{| A_{s} (x) | - | A_{s} (x) \cap D_{s} (x) |}{| U |},$ (15) $CG (D | A) = \frac{1}{| U |} \sum_{x \in U} \frac{C_{| A_{s} (x) |}^{2} - C_{| A_{s} (x) \cap D_{s} (x) |}^{2}}{C_{| U |}^{2}},$ (16) $E_{r} (D | A) = \frac{1}{| U |} \sum_{x \in U} \log \frac{| A_{s} (x) |}{| A_{s} (x) \cap D_{s} (x) |} .$ (17)
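As a brief check, Eq. 17 can be recovered from Eqs. 9, 13 and 14, using the identity (A ∪ D) _s (x) = A_s (x) ∩ D_s (x) stated above: $\begin{matrix} E_{r} (D | A) & = E_{r} (A) - E_{r} (A, D) \\ & = \frac{1}{| U |} \sum_{x \in U} \log | A_{s} (x) | - \frac{1}{| U |} \sum_{x \in U} \log | A_{s} (x) \cap D_{s} (x) | \\ & = \frac{1}{| U |} \sum_{x \in U} \log \frac{| A_{s} (x) |}{| A_{s} (x) \cap D_{s} (x) |} . \end{matrix}$ Eqs. 15 and 16 follow in the same way from Eqs. 4 and 8. Each summand is nonnegative because A_s (x) ∩ D_s (x) ⊆ A_s (x).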

Based on the notations introduced above, we give the significance of a with respect to B (a ∉ B) relative to D: $sig (a, B, D) = IG (D | B) - IG (D | B \cup \{a\}) .$ (18) The quantity sig (a, B, D) describes the change in the conditional information granularity relative to D that results from adding a to B. In Eq. 18, if B = ∅, we simply write it as sig (a, D).

A simple example about Play Golf? on Saturday mornings [66] is elaborated below to substantiate the conceptual arguments. The dataset is shown in Table 2, each instance being characterized by four different aspects of weather conditions: outlook (o), temperature (t), humidity (h) and windy (w).

Table 2

Play Golf? in response to weather conditions

Day	Outlook	Temperature	Humidity	Windy	Play Golf?
1	sunny	hot	high	false	no
2	sunny	hot	high	true	no
3	overcast	hot	high	false	yes
4	rainy	mild	high	false	yes
5	rainy	cool	normal	false	yes
6	rainy	cool	normal	true	no
7	overcast	cool	normal	true	yes
8	sunny	mild	high	false	no
9	sunny	cool	normal	false	yes
10	rainy	mild	normal	false	yes
11	sunny	mild	normal	true	yes
12	overcast	mild	high	true	yes
13	overcast	hot	normal	false	yes
14	rainy	mild	high	true	no

First, the equivalence classes in U/R_o are $\{1, 2, 8, 9, 11\}, \{3, 7, 12, 13\}, \{4, 5, 6, 10, 14\}$ , and thus the rough entropy of o is $E_{r} (o) = \frac{5 \times \log 5 + 4 \times \log 4 + 5 \times \log 5}{14} = 2.2299 bits .$ Similarly, we have $\begin{matrix} E_{r} (t) = 2.2507 bits, & E_{r} (h) = 2.8074 bits, \\ E_{r} (w) = 2.8221 bits, & E_{r} (d) = 2.8671 bits . \end{matrix}$

Secondly, the equivalence classes in U/R_{o,d} are $\{1, 2, 8\}, \{3, 7, 12, 13\}, \{4, 5, 10\}, \{6, 14\}, \{9, 11\}$ , and thus the joint rough entropy of o and d is $E_{r} (o, d) = \frac{4 \times \log 2 + 6 \times \log 3 + 4 \times \log 4}{14} = 1.5364 bits .$ Similarly, we have $\begin{matrix} E_{r} (t, d) = 1.3396 bits, & E_{r} (h, d) = 2.0189 bits, \\ E_{r} (w, d) = 1.9300 bits . \end{matrix}$

Then, the conditional rough entropy of o relative to d is $E_{r} (d | o) = E_{r} (o) - E_{r} (o, d) = 0.6935 bits .$ Similarly, we have $\begin{matrix} E_{r} (d | t) = 0.9111 bits, & E_{r} (d | h) = 0.7885 bits, \\ E_{r} (d | w) = 0.8922 bits . \end{matrix}$

Subsequently, by E_r (d | ∅) = log 14 - E_r (d) =0.9403 bits, we can calculate the significance of o relative to d: ${sig}_{E_{r}} (o, d) = E_{r} (d | \emptyset) - E_{r} (d | o) = 0.2467 bits .$ Similarly, we have $\begin{matrix} {sig}_{E_{r}} (t, d) = 0.0292 bits, \\ {sig}_{E_{r}} (h, d) = 0.1518 bits, \\ {sig}_{E_{r}} (w, d) = 0.0481 bits . \end{matrix}$

We can thus conclude that “outlook” is the most important among these four weather conditions, which is in accordance with the result of the ID3 algorithm.
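The computation above can be replayed in a few lines. The sketch below copies the rows of Table 2, uses base-2 logarithms (the unit is the bit) and treats the empty attribute set as generating the universal relation, so that its rough entropy is log 14. The function names are illustrative.

```python
import math
from collections import Counter

# (outlook, temperature, humidity, windy, play) for days 1..14, from Table 2
ROWS = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]
COLUMN = {"o": 0, "t": 1, "h": 2, "w": 3, "d": 4}

def rough_entropy(attrs):
    """Eq. (9) for the equivalence relation generated by the attributes in attrs."""
    def key(row):
        return tuple(row[COLUMN[a]] for a in attrs)
    class_size = Counter(key(row) for row in ROWS)
    return sum(math.log2(class_size[key(row)]) for row in ROWS) / len(ROWS)

def conditional(d, attrs):
    """Eq. (14): E_r(d | A) = E_r(A) - E_r(A, d)."""
    return rough_entropy(attrs) - rough_entropy(attrs + [d])

def significance(a, d="d", base=()):
    """Eq. (18): sig(a, B, d) = E_r(d | B) - E_r(d | B ∪ {a})."""
    base = list(base)
    return conditional(d, base) - conditional(d, base + [a])

for a in "othw":
    print(a, round(rough_entropy([a]), 4), round(significance(a), 4))
```

With the rough entropy, the conditional quantity of Eq. 17 coincides with the Shannon conditional entropy of the decision given the attribute, so these significances are the information gains used by ID3.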

Analogously, one can obtain the attribute significance measures by employing the granulation of knowledge and the combination granulation. The detailed results are displayed in Fig. 5.

Fig. 5

Attribute significance measures acquired by different information granularity.

From Fig. 5, we can see that the most important weather condition for playing golf outside is always “outlook”, although the attribute significance measures are different based on distinct forms of information granularity.
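The same comparison can be repeated for the granulation of knowledge and the combination granulation by swapping the function f in Eq. 7. The sketch below encodes each column of Table 2 by the first letter of its values (an encoding introduced here only for compactness) and takes λ = 1 / |U|. The resulting numbers need not coincide with those plotted in Fig. 5 if the figure uses a different scaling.

```python
from collections import Counter

# One character per day (days 1..14), first letter of each value in Table 2.
COLS = {
    "o": "ssorrrossrsoor",   # sunny / overcast / rainy
    "t": "hhhmcccmcmmmhm",   # hot / mild / cool
    "h": "hhhhnnnhnnnhnh",   # high / normal
    "w": "ftfffttfffttft",   # false / true
    "d": "nnyyynynyyyyyn",   # no / yes
}
N = 14

def granule_sizes(attrs):
    """|A_s(x)| for every day x under the equivalence relation generated by attrs."""
    keys = list(zip(*(COLS[a] for a in attrs))) if attrs else [()] * N
    counts = Counter(keys)
    return [counts[k] for k in keys]

def ig(attrs, f):
    """Eq. (7) with lambda = 1/|U|."""
    return sum(f(t) for t in granule_sizes(attrs)) / N

def sig(a, f, d="d"):
    """Eq. (18) with B empty: IG(d | {}) - IG(d | {a})."""
    def cond(attrs):
        return ig(attrs, f) - ig(attrs + [d], f)
    return cond([]) - cond([a])

def gk(t):
    """f for the granulation of knowledge."""
    return t / N

def cg(t):
    """f for the combination granulation."""
    return t * (t - 1) / (N * (N - 1))

for name, f in (("GK", gk), ("CG", cg)):
    print(name, {a: round(sig(a, f), 4) for a in "othw"})
```

In this sketch “outlook” again receives the largest significance under both measures, although the ranking of the remaining attributes differs from the one obtained with the rough entropy.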

6 Conclusions

In real-life situations, crisp information granulation is an important format of information granulation, and set-based granular computing plays a fundamental role in human reasoning and problem solving. In this setting, the universe is clustered into granules by binary relations. To examine the average granulation degree of information granules, we introduce the granular equality relation and granularly finer relation in this paper. The former is defined through an orthogonal transformation with the transformation matrix being a permutation matrix, while the latter is defined based on the traditional finer relation and the orthogonal transformation. Moreover, the former is proved to be an equivalence relation on the knowledge structure base, while the latter is a partial order relation on the simplified knowledge structure base. By the simplified knowledge structure base, we mean a simplification of the whole knowledge structure base by choosing one representative element from every equivalence class of the knowledge structure base with respect to the granular equality relation. We emphasize again that although the granularly finer relation is a subset of the finer relations presented in Refs. [28 , 52], it does not mean that our approach is inferior or superior to the others. In fact, the granularly finer relation overcomes the restriction of the classical finer relation and the looseness of the other existing finer relations. Thus, it is, from a conservative viewpoint, the most suitable relation between knowledge structures. Moreover, the (strict) information granularity of a knowledge structure is defined by a mapping that satisfies nonnegativity, invariability and (strict) monotonicity. A general form of the information granularity is given based on the set-size nature of information granules, and several existing concepts are shown to be its special cases. Finally, the attribute significance measure is proposed by using the information granularity, which can serve as an alternative measure for feature selection.

Note that this paper is built upon the assumption that all objects in the universe are equipped with the same weight. However, due to the frequency of occurrences or the preferences of decision makers, different weights are sometimes assigned to the objects. If so, only objects with the same weight can be permuted; otherwise, there would be no point in distinguishing the weights of objects. To solve this problem, we can pretreat the set of permutation matrices by deleting the elements corresponding to arrangements of objects with different weights. For example, for U = {x₁, x₂, x₃}, if x₁ and x₂ are assigned the same weight, which differs from that of x₃, then x₁ and x₂ can be permuted if needed and we can take $P = \{ (\begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{matrix}), (\begin{matrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{matrix}) \} .$ In the future, we will consider the applications of the proposed measures in feature selection and decision tree induction. On the other hand, our work is built within the framework of crisp information granulation, which has much less capability than fuzzy information granulation in reflecting human reasoning and concept formation. The research on the information granularity of fuzzy information granules is our future work.

Acknowledgments

The author is grateful to the anonymous referees for their excellent comments on the initial manuscript. This research was supported by the National Natural Science Foundation of China (Grant no. 61806182), the Scientific Research Fund for Young Teachers of Zhengzhou University (Grant no. 32220326), the Research Base Program of New Disciplines in Economics and Management of Zhengzhou University (Grant no. 101/32610168) and the Training Project for Young Backbone Teachers of Colleges and Universities of Henan Province.

References

1. Zadeh L.A., Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic, Fuzzy Sets and Systems 90(2) (1997), 111–127.
2. Bargiela and Pedrycz, Granular Computing: An Introduction, Kluwer Academic Publishers, Boston, 2003.
3. Pedrycz, Granular Computing: Analysis and Design of Intelligent Systems, CRC Press, Boca Raton, 2013.
4. Akram, Luqman and Al-Kenani A.N., Certain models of granular computing based on rough fuzzy approximations, Journal of Intelligent and Fuzzy Systems 39 (2020), 2797–2816.
5. Cabrerizo F.J., Herrera-Viedma and Pedrycz, A method based on PSO and granular computing of linguistic information to solve group decision making problems defined in heterogeneous contexts, European Journal of Operational Research 230(3) (2013), 624–633.
6. Cabrerizo F.J., Morente-Molinera J.A., Pedrycz, Taghavi and Herrera-Viedma, Granulating linguistic information in decision making under consensus and consistency, Expert Systems with Applications 99 (2018), 83–92.
7. Cabrerizo F.J., Al-Hmouz, Morfeq, Martínez M.Á., Pedrycz and Herrera-Viedma, Estimating incomplete information in group decision making: A framework of granular computing, Applied Soft Computing 86 (2020), 105930.
8. […], Mei, Xu and Qian, Concept learning via granular computing: A cognitive viewpoint, Information Sciences 298 (2015), 447–467.
9. Wang, Huang and Li, Uncertainty measurement based on information fusion of three-source datasets: A granular computing viewpoint, Journal of Intelligent and Fuzzy Systems 36(2) (2019), 1475–1490.
10. […] W.Z., Leung and Mi J.S., Granular computing and knowledge reduction in formal contexts, IEEE Transactions on Knowledge and Data Engineering 21(10) (2009), 1461–1474.
11. Yang, Zhong, Lang, Qian and Dai, Granular matrix: A new approach for granular structure reduction and redundancy evaluation, IEEE Transactions on Fuzzy Systems 28(12) (2020), 3133–3144.
12. Yao J.T., Vasilakos A.V. and Pedrycz, Granular computing: Perspectives and challenges, IEEE Transactions on Cybernetics 43(6) (2013), 1977–1989.
13. Bargiela and Pedrycz, Toward a theory of granular computing for human-centered information processing, IEEE Transactions on Fuzzy Systems 16(2) (2008), 320–330.
14. Qian, Cheng, Wang, Liang, Pedrycz and Dang, Grouping granular structures in human granulation intelligence, Information Sciences 382–383 (2017), 150–169.
15. Skowron and Stepaniuk, Information granules: Towards foundations of granular computing, International Journal of Intelligent Systems 16(1) (2001), 57–85.
16. Lin T.Y., Granular computing on binary relations I: Data mining and neighborhood systems, in: L. Polkowski and A. Skowron (Eds.), Rough Sets in Knowledge Discovery 1: Methodology and Applications, Springer-Verlag, Berlin, 1998, pp. 107–121.
17. Lin T.Y., Granular computing on binary relations II: Rough set representations and belief functions, in: L. Polkowski and A. Skowron (Eds.), Rough Sets in Knowledge Discovery 1: Methodology and Applications, Springer-Verlag, Berlin, 1998, pp. 121–140.
18. Moore R.E., Methods and Applications of Interval Analysis, SIAM, Philadelphia, 1979.
19. Pawlak, Rough sets, International Journal of Computer and Information Sciences 11(5) (1982), 341–356.
20. Shafer, A Mathematical Theory of Evidence, Princeton University Press, Princeton, 1976.
21. […], Xie and Yu, Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation, Pattern Recognition 40(12) (2007), 3509–3521.
22. […], Zhang, Ge, Xie, Zhang and Wen C.F., Uncertainty measurement for a fuzzy relation information system, IEEE Transactions on Fuzzy Systems 27(12) (2019), 2338–2352.
23. Qian, Liang, Wu W.Z. and Dang, Information granularity in fuzzy binary GrC model, IEEE Transactions on Fuzzy Systems 19(2) (2011), 253–264.
24. Qian, Li, Liang, Lin and Dang, Fuzzy granular structure distance, IEEE Transactions on Fuzzy Systems 23(6) (2015), 2245–2259.
25. Song, Yang, Song, Yu and Yang, Hierarchies on fuzzy information granulations: A knowledge distance based lattice approach, Journal of Intelligent and Fuzzy Systems 27(3) (2014), 1107–1117.
26. Yang, Wang, Zhang and Wang, Knowledge distance measure for the multi-granularity rough approximations of a fuzzy concept, IEEE Transactions on Fuzzy Systems 28(4) (2020), 706–717.
27. Zadeh L.A., Fuzzy sets and information granularity, in: M.M. Gupta, R.K. Ragade and R.R. Yager (Eds.), Advances in Fuzzy Set Theory and Applications, North-Holland, Amsterdam, 1979, pp. 3–18.
28. Qian, Zhang, Li, Hu and Liang, Set-based granular computing: A lattice model, International Journal of Approximate Reasoning 55(3) (2014), 834–852.
29. […] W.S. and Hu B.Q., Approximate distribution reducts in inconsistent interval-valued ordered decision tables, Information Sciences 271 (2014), 93–114.
30. […] W.S. and Hu B.Q., Dominance-based rough set approach to incomplete ordered information systems, Information Sciences 346–347 (2016), 106–129.
31. […], Yu, Liu and Wu, Neighborhood rough set based heterogeneous feature subset selection, Information Sciences 178(18) (2008), 3577–3594.
32. […], Li, Zhang and Xie, Knowledge structures in a knowledge base, Expert Systems 33(6) (2016), 581–591.
33. Lin, Hu, Liu, Chen and Duan, Multi-label feature selection based on neighborhood mutual information, Applied Soft Computing 38 (2016), 244–256.
34. Wang, Huang, Shao, Hu and Chen, Feature selection based on neighborhood self-information, IEEE Transactions on Cybernetics 50(9) (2020), 4031–4042.
35. Chen, Lin T.Y. and Xie, Knowledge approximations in binary relation: Granular computing approach, International Journal of Intelligent Systems 28(9) (2013), 843–864.
36. Cabrerizo F.J., Ureña, Pedrycz and Herrera-Viedma, Building consensus in group decision making with an allocation of information granularity, Fuzzy Sets and Systems 255 (2014), 115–127.
37. Dai, Xu, Wang and Tian, Conditional entropy for incomplete decision systems and its application in data mining, International Journal of General Systems 41(7) (2012), 713–728.
38. Deng, Yang and Hu, Feature selection in decision systems based on conditional knowledge granularity, International Journal of Computational Intelligence Systems 4(4) (2011), 655–671.
39. […] W.S. and Hu B.Q., Dominance-based rough fuzzy set approach and its application to rule induction, European Journal of Operational Research 261(2) (2017), 690–703.
40. Sun, Xu and Tian, Feature selection using rough entropy-based uncertainty measures in incomplete decision systems, Knowledge-Based Systems 36 (2012), 206–216.
41. Wang and Xia, Invariant characteristics of knowledge structures in a knowledge base under homomorphisms and their uncertainty measures, Journal of Intelligent and Fuzzy Systems 35(5) (2018), 5689–5705.
42. Zhang, Li, Zhang and Xie, Information structures and uncertainty in an image information system, Journal of Intelligent and Fuzzy Systems 40(1) (2021), 295–317.
43. Liang and Shi, The information entropy, rough entropy and knowledge granulation in rough set theory, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 12(1) (2004), 37–46.
44. Liang, Shi, Li and Wierman M.J., Information entropy, rough entropy and knowledge granulation in incomplete information systems, International Journal of General Systems 35(6) (2006), 641–654.
45. Bianucci, Cattaneo and Ciucci, Entropies and co–entropies of coverings with application to incomplete information systems, Fundamenta Informaticae 75(1–4) (2007), 77–105.
46. Zhu and Wen, Information-theoretic measures associated with rough set approximations, Information Sciences 212 (2012), 33–43.
47. […], Che, Zhang, Guo and Yu, Rank entropy-based decision trees for monotonic classification, IEEE Transactions on Knowledge and Data Engineering 24(11) (2012), 2052–2064.
48. […] W.H., Zhang X.Y. and Zhang W.X., Knowledge granulation, knowledge entropy and knowledge uncertainty measure in ordered information systems, Applied Soft Computing 9(4) (2009), 1244–1251.
49. Qian and Liang, Combination entropy and combination granulation in rough set theory, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 16(2) (2008), 179–193.
50. Qian, Liang and Wang, A new method for measuring the uncertainty in incomplete information systems, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 17(6) (2009), 855–880.
51. Qian, Liang and Dang, Knowledge structure, knowledge granulation and knowledge distance in a knowledge base, International Journal of Approximate Reasoning 50(1) (2009), 174–188.
52. Zhu, An improved axiomatic definition of information granulation, Fundamenta Informaticae 120(1) (2012), 93–109.
53. Yao and Zhao, A measurement theory view on the granularity of partitions, Information Sciences 213 (2012), 1–13.
54. […] W.Z. and Zhang W.X., Constructive and axiomatic approaches of fuzzy approximation operators, Information Sciences 159(3–4) (2004), 233–254.
55. Yao Y.Y., Relational interpretations of neighborhood operators and rough set approximation operators, Information Sciences 111(1–4) (1998), 239–259.
56. Blyth T.S., Lattices and Ordered Algebraic Structures, Springer-Verlag, London, 2005.
57. Pei, Pei and Zheng, Topology vs generalized rough sets, International Journal of Approximate Reasoning 52(2) (2011), 231–239.
58. Abo-Tabl E.A., A comparison of two kinds of definitions of rough approximations based on a similarity relation, Information Sciences 181(12) (2011), 2587–2596.
59. Slowinski and Vanderpooten, A generalized definition of rough approximations based on similarity, IEEE Transactions on Knowledge and Data Engineering 12(2) (2000), 331–336.
60. Johnsonbaugh, Discrete Mathematics, 7th Edition, Prentice Hall, Upper Saddle River, New Jersey, 2008.
61. Blizard W.D., Multiset theory, Notre Dame Journal of Formal Logic 30(1) (1988), 36–66.
62. Grätzer, Lattice Theory: Foundation, Birkhäuser Verlag, Basel, 2011.
63. Jing, Li, Luo, Horng S.J., Wang and Yu, An incremental approach for attribute reduction based on knowledge granularity, Knowledge-Based Systems 104 (2016), 24–38.
64. Huang, Li and Dias S.M., Attribute significance, consistency measure and attribute reduction in formal concept analysis, Neural Network World 26(6) (2016), 607–623.
65. Ślęzak, Approximate entropy reducts, Fundamenta Informaticae 53(3–4) (2002), 365–390.
66. Quinlan J.R., Induction of decision trees, Machine Learning 1(1) (1986), 81–106.