Abstract
In this paper, one of the variants of so-called post-hoc cluster analysis problem is considered: the characteristics by which the clusters were constructed are ranked according to the degree of their influence on the final cluster partition. Wherein, using the pairwise coefficients proposed in the paper, we estimate the possibility of replacing a pair of characteristics by one of them (the degree of dominance of one characteristic over another), and the strength of the kind of mutual influence of the characteristics that is associated with the existing cluster structure. The latter influence we call a cluster connection. The most meaningful results were obtained, assuming that the addition or elimination of any forming characteristic leads to the grinding or consolidation of clusters, which is true for the algorithms of hierarchical classification.
Keywords
Introduction
Algorithms for dividing observations into groups according to the degree of their proximity are in demand today in virtually any statistical study. If, in doing so, the objects of each of the obtained groups turn out to be closer in some sense than the objects taken from different groups, then the process is called clustering. In this case the family of groups being formed is referred to as a cluster partition, and the groups themselves are called clusters. A cluster method called discriminant analysis is one of the most practical, see Abdi (2007); Chance and Rossman (2013). It implies studying a certain training set. Then the rule constructed on the set is transferred to newly arrived objects. When solving such a problem we can talk about a process of classifying of new objects. So, the solution of the problem of discriminant analysis consists of two steps. First, we produce the clustering of the training set. Then, we construct a classifying rule based on it.
It often turns out that the number of those characteristics of the objects under study should be reduced. For example, it is done to obtain more comprehensive mathematical models and classification rules for practical usage. This leads to the need to estimate the relative information value of the characteristics. Such estimations can be derived from the results already obtained on the training set (post-hoc analysis of the characteristics, see, for example, Dronov & Dementjeva, 2012).
The post-hoc analysis of the characteristics can also be useful to solve some practical issue. Let’s assume that some of the characteristics of a newly arrived object to be unmeasurable (the absence of complicated equipment, lack of time, human factor). Such situations are frequent in medicine, if it is necessary to perform diagnostics with incomplete data. Investigating of such situation is a sphere of our major interest.
Let us consider the problem in a more accurate way. Let some finite set
It should be noted that it is possible to abandon the using of any cluster algorithm for building the proper partition, and assume that it is constructed, say, by peer evaluation. Moreover, when considering medical applications in which, for example, various diagnoses are possible clusters, such proper partition can be considered completely error-free. Indeed, the necessary clarifications in the diagnoses are usually made after enough time has passed since the primary examination of the patients.
Our method for estimating information values is to compare the cluster partitions constructed only by one of the forming characteristics (individual partitions) with the proper partition. The smaller difference of these partitions will mean the greater importance of the characteristic under study.
We believe that the characteristics that received greater importance by the method selected will be more important when constructing a prognostic rule. Thus, the set
One of the possible ways to implement the plan is a clustering algorithm, the high reliability of the results of which for the data class under study is not in doubt. We assume that if a complete set of forming characteristics is submitted to the entry of the algorithm, then it gives a proper partition
Undoubtedly, skipping the values of some characteristics can strongly change the result. But if the excluded characteristic is sufficiently closely related to the rest, its absence is compensated by the remaining data, and the cluster partition may not even change at all. Here, one must consider a special kind of connection between the characteristics, which differs, for example, from the correlation one, since, most often, the required type of connection is essentially nonlinear. Below, we call this type of relationship a cluster connection. It shows the ability to replace one of the characteristics with another when building clusters.
In addition, the influence of one of the characteristics can be absorbed by the influence of the second, against which the first one simply turns out unimportant. For example, the proximity of the values of the second characteristic means the closeness of the values of the first, but not vice versa. In this situation, we will say that the first characteristic is cluster dominated by the second, and the second is dominant. Cluster dominance means the possibility of replacing the aggregation of the two characteristics with the dominant one in the algorithm. This domination can certainly be considered as one of the forms of cluster connection.
One of the key issues that arise when trying to study cluster connection and the cluster dominance of characteristics is the problem of the location of the interaction of the two characteristics on the same scale with the individual contributions of each of them to clustering. The contribution of the interaction on this scale can be located almost arbitrarily (even to be zero if the characteristics in question act, in some sense, in opposite directions). Therefore, without additional assumptions, it is apparently impossible to achieve any significant progress in solving the problem. Due to this, it was decided to make an additional assumption on the method of operation of the main cluster algorithm.
We will call objects close in the aggregate of characteristics, which can consist of a single characteristic, if using this set of characteristics, the selected cluster algorithm assigns them to the same cluster. Thus, the concept of closeness of objects is determined by the algorithm used. So, by the assumptions made about it, the closeness is defined when using the arbitrary fixed set of their characteristics. If we use the only one characteristic for constructing clusters by the closeness in it, then the resulting cluster partition will be called an individual partition by this characteristic.
Basic Assumption. Objects are close in the aggregate of the characteristics if and only if they are close in each of them separately.
This assumption will be further used only when carrying out rigorous proofs. A qualitative interpretation of the coefficients introduced below is possible even if it is violated, which will be discussed each time.
The proper and the joint cluster partitions.
The validity of the basic assumption leads to the fact that the introduction of a new characteristic into the main algorithm can lead only to a more detailed clustering. Namely, earlier clusters can only break into smaller ones if the objects contained in them have essentially different values of the new characteristic. Thus, we believe that as the characteristics are introduced into the algorithm, we will generally get smaller clusters, and hence a hierarchy of cluster partitions will arise. Of course, the class of admissible cluster algorithms is thereby greatly reduced, but many practical problems (for example, medical ones or the classification of animals for subspecies in zoology) use just such algorithms.
It is clear also, that if the basic assumption is fulfilled, then instead of the algorithm, which in general is a kind of black box, it can be assumed that all individual cluster partitions by each of the forming characteristics (
Of course, the proper partition in some sense can be sufficiently arbitrary, whereas in the performance of the basic assumption of the work, only parallelepiped-like sets with edges parallel to the axes determined by the forming characteristics can be obtained as clusters of the partitioning in all characteristics (joint partition). Figure 1 illustrates this situation for the case of two characteristics. In it, the areas
In most cases, the difference between a proper and a joint partition can be neglected, and assume that this is the same partition. Thus, in Fig. 1, clusters
Our main objective is to define the coefficients, using numerical values and properties of which, it is possible to estimate the degree of connection between the characteristics under the proper cluster partition and the possibility of their mutual replacement in the process of constructing the cluster partition. To do this, we introduce the concept of cluster connection and the related concept of cluster dominance of characteristics based on the study of individual cluster partitions.
Introducing the coefficients with the required properties, we will be able to propose a method for ranking the forming characteristics by the degree of their importance in the classification of objects and to estimate the losses when they cannot be used, and so to obtain new ways of solving the post-hoc analysis problem and a problem of reducing dimension of the data.
In this section, we consider different separations of the main finite set
We are going to estimate the degree of the differences between the partitions of the main set. For this, we use the metric
To determine the value of
where
In Dronov and Dementjeva (2012), another formula is also proposed to calculate this metric, which is often more convenient for applications. Let
We consider a special case, important for further considerations, when one of the two cluster partitions will be smaller than the other. Let
Consider two cluster partitions
The assertion of the theorem can be interpreted, for example, as follows. We consider a metric space
Consider two of the forming characteristics –
This coefficient can we call the coefficient of the cluster dominance of the characteristic
Cluster dominance.
If the basic assumption is violated, then the intersection of individual clustered partitions does not have to coincide with the result of the combined action of the two characteristics. In this case, we will mean
Now for any cluster partition
Thus, the value of the coefficient of the cluster dominance turns out to be greater than 1/2 if and only if
The analogue of Eq. (2) for the case of rejecting the basic assumption is
where
Note that while defining
If
Here, of course,
In going from an individual partition by
Suppose that the number of elements of the main set
where [
Here is an easy consequence of Lemma 1.
and the equality is achieved if and only if
If If for any characteristic
The second and third assertions of the Theorem 2 imply
Although the possibility for a characteristic
Let’s find the largest and smallest of its values. Of course, it is the sum of the squares of natural numbers in the sum giving the number of elements
Now we can define a coefficient of the cluster connection between two forming characteristics
This coefficient takes values from 0 to 1, and, the larger it is, the stronger the cluster connection between the characteristics. It is interesting to note that the value
Four clusters in the joint partition and the increasing cluster strength.
So, cluster connection between the characteristics we understand as the ability of one of them to replace the other in the process of constructing the proper cluster partition
Another specification of the degree of cluster connection between the characteristics is how much the number of clusters
where,
Since
We note once more that the numerator of the fraction Eq. (5) estimates the closeness of the cluster partition by the interaction of
In the denominator, here is the maximum possible value of the metric
An example illustrating differences in connection types
The strongest cluster connection.
Here is some illustration of the concept of the cluster connection. In Fig. 3, there are four clusters in the joint partition
Cluster connection, the strength of which is determined by the value of the coefficient
Clusters in the individual cluster partitions
Specifications of cluster partitions
Pairwise coefficients
Consider in this section one practical example. We use the results of examination of 20 patients of Altai medical center. For this example, we take 5 characteristics (the number of leukocytes
Considering the basic assumption fulfilled, we construct all necessary intersections of the partitions corresponding to the interactions, and the correct cluster partition, which considers all the characteristics. The specifications of these partitions are collected in Table 3. In it,
On these data, we estimate the absolute cluster strength of the characteristics. Note that the values of the corresponding coefficients can be used for post-hoc ranking, and their relative magnitude determines the degree of importance of the relevant indicators for classification problems. We get
Then, using the Eq. (5) we find the coefficients of the cluster connection strength between the forming characteristics. Their values are given in Table 4.
The presence of large coefficients shows a strong connection between the characteristics in the cluster sense, which explains the absence of cluster dominance and small values of the coefficients of absolute cluster strength. If we set the task of reducing the dimension to 2, then the coefficients
It should be noted again that our purpose was to define not the most significant characteristics for clustering, but, on the contrary, to identify those whose absence would not affect too much influence. The characteristic here is
Let’s finish the example by comparing the distances from
Discussion and conclusion
The concept of cluster connection introduced in the paper is oriented precisely to an estimation of the degree of interaction between forming characteristics (even not necessarily numerical ones) when constructing a cluster partition of the set of objects under study. Probably, it better corresponds to the very nature of cluster analysis problems than the common statistical types of connections. The concept of cluster dominance allows us to solve the post-hoc problem in a new way, by making the ranking of the forming characteristic by descending the proposed coefficients of their absolute cluster strength.
Excluding those characteristics, the cluster strength of which is not high, we come to one of the variants to reduce the dimensionality of the data. In this case, it is also possible to exclude one of the characteristics of an equal (and sufficiently large) cluster force. To do this, it is possible to use such data analysis procedures as isolating correlation pleiades or constructing a dependency tree of the characteristics (see, for example, Dronov, 2015, 167–169), where the values of the coefficients
Nevertheless, it should be emphasized that the resulting option of reducing the dimension is rather a by-product of our work. The main task was precisely to decide on the degree of reliability of classification in incomplete information. So our main object was to find not the strongest, but the weakest characteristic in the cluster sense, the loss of which slightly affects the classification of objects.
When deriving formulas in the article, the assumption on the way of construction of the joint cluster partitions which was called the basic assumption, was usually assumed to be true. In fact, it unambiguously allows us to determine the cluster partition by any set of characteristics if individual partitions are known. We admit that dealing with the practical issues of cluster analysis this is not always the case. Nevertheless, for all defined coefficients we have proposed their variants, the possibility of using which does not depend on the validity of the basic assumption. Moreover, according to the authors’ conviction, the Eqs (2), (5) and (6) can be used with some degree of approximation even if the basic assumption is violated. After all, as a rule, the smaller clusters are obtained in clustering, the greater is the classifying force of the corresponding characteristic or the group of them.
The results of the work were partially announced in Dronov and Evdokimov (2017).
