Abstract
Cluster ensemble selection frameworks remove low-quality (bad) partitions from the final ensemble. A bad partition may nevertheless contain some reliable clusters, so it is reasonable to apply the selection phase at the cluster level instead. Doing so requires a cluster evaluation metric. Several such metrics have been introduced recently, and each has its limitations; this paper discusses the weak points of each. A new cluster assessment metric, the Balanced Normalized Mutual Information (BNMI) criterion, is then introduced. It compensates for the deficiency of the traditional NMI-based criteria. In addition, an innovative cluster ensemble approach is proposed. To create the consensus partition from the selected clusters, a set of different aggregation functions (also called consensus-functions) is used: those based on the co-association matrix (CAM), those based on hypergraph partitioning algorithms, and those based on an intermediate space. The experimental study indicates that the proposed cluster ensemble approach outperforms the state-of-the-art cluster ensemble methods.
Introduction
Partitioning has been used in various applications such as optimization [1, 2], bioinformatics [3–5], domain adaptation [6–8], healthcare [9] and so on. It is therefore an important task. It is also a challenging one, for at least two reasons: (1) it is unsupervised, so there is no clear way to verify how good a partition is, and (2) a cluster can be viewed from different perspectives (spherical, linear, unstructured, hierarchical and so on). Consequently, it is widely considered to be more challenging than supervised classification [10–18].
Inspired by classifier combination [9, 19] and supervised ensemble learning [20], cluster ensembles emerged and have been applied in various fields such as bioinformatics [21–23], multimedia [24] and data mining [25–33]. A clustering ensemble is a framework that uses the information in a number of primary partitions to produce an aggregated partition, usually called the consensus partition. The primary partitions, taken as a whole, are considered to be an ensemble of clusterings.
A cluster ensemble involves two steps. In the first step, a number of weak base clusterings are produced [34–42]. To obtain them, a simple base clustering algorithm such as k-means can be run with different initializations or different parameters [34–42], on different data perspectives [43, 44], or on different data resamples [45, 46]. The proposed method uses all of these mechanisms.
In the second step, a partition called the consensus partition is extracted from the ensemble created in the previous step, such that it has maximum similarity to all of the ensemble members. The mapping that takes the ensemble as input and gives the consensus partition as output is called the consensus-function. Evidence accumulation clustering (EAC), an exemplary consensus-function first introduced by Jain and Fred, transforms the ensemble into a CAM and then applies a single-linkage hierarchical clustering algorithm to find the consensus partition [47]. Fig. 1 presents a taxonomy of the various consensus-functions.

Different consensus-functions.
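The first stage of EAC described above, turning an ensemble into a co-association matrix, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `co_association` and the label-list layout of the ensemble are assumptions, each entry is taken to be the fraction of base partitions that place two objects in the same cluster, and the subsequent single-linkage step is omitted.

```python
def co_association(partitions):
    """Return the n x n co-association matrix of an ensemble.

    partitions: list of label lists (assumed layout); partitions[p][i] is the
    cluster label of object i in base partition p.
    """
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                # Objects i and j vote for each other when they share a cluster.
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    # Normalize the votes by the number of base partitions.
    return [[c / m for c in row] for row in counts]


# Toy ensemble: two base partitions of three objects.
ensemble = [[0, 0, 1], [0, 1, 1]]
cam = co_association(ensemble)
```

In this toy ensemble, objects 0 and 1 share a cluster in only one of the two partitions, so their co-association is one half, while objects 0 and 2 never share a cluster.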
Clustering ensemble methods fall into two general categories. The first category concentrates on the first step described above and the second category concentrates on the second step, as the two are independent problems. This work belongs to the first category. Methods of the first category try to provide the consensus-function with a suitable ensemble of partitions. An ensemble is considered suitable if its partitions are of high quality. Therefore, some of these methods produce a large number of partitions (referred to as the primitive ensemble) and then select a subset of them as the final ensemble from which the consensus partition is constructed (this type of clustering ensemble is sometimes referred to as clustering ensemble selection). To this end, many approaches and metrics have been proposed to determine the quality of a partition. However, a good cluster may well be a member of a bad partition, in which case a clustering ensemble selection strategy removes it from the ensemble unintentionally. Other methods therefore evaluate each cluster in the primitive ensemble separately; these are referred to as cluster ensemble selection. Cluster evaluation metrics include: (1) the Normalized Mutual Information (NMI) measure [48], (2) the MAX measure [49], (3) the ENMI measure [38] and (4) the APMM measure [50].
As a cluster evaluator, the Normalized Mutual Information measure [48] has many weaknesses, which have been described in [49]. One of its variants, called MAX, has also been shown not to be flawless. To improve on MAX, APMM has been proposed [50]. All of these criteria suffer from the complement cluster effect in some way. This paper therefore introduces an innovative criterion for evaluating a cluster that aims to solve the complement cluster effect. It estimates the averaged similarity of a cluster to the other clusters while removing the effect of its complement cluster. Section 4 discusses all of these methods and their weak points, explains in detail what the improper effect of the complement cluster is and how it affects the consensus partition, and justifies how the proposed method solves these problems.
In the current article, an innovative cluster ensemble selection approach is proposed. Three categories of consensus-functions are applied. The first category is the co-association based consensus-functions. Because EAC cannot derive the co-association matrix from a subset of clusters, the Extended EAC (E-EAC) is used to form the co-association matrix from the selected clusters. The second category is based on hypergraph partitioning algorithms. The third category treats the selected clusters as a new feature space and employs a simple clustering algorithm to find the final aggregated partition.
Cluster stability has been introduced as a way to evaluate the quality of a base cluster, and it has since been used increasingly in the literature [51]. To validate a cluster, the resampling approach explained in [52] and later generalized in [53] has been used. A high stability value indicates that repeated runs of the clustering algorithm on a dataset are likely to reproduce that cluster frequently [53].
Many metrics for evaluating the quality of a partition have been introduced so far [54]. Naldi et al. [31] employed six clustering assessment measures to evaluate the worth of a clustering and combined them into a final evaluation value. Wang et al. [55] employed the rough-set concept to choose an improved subset of clustering members. Lu et al. [56] introduced an innovative covariance-based diversity criterion. Akbari et al. [57] calculated a matrix of pairwise diversity measurements and then used a hierarchical clustering algorithm to take both diversity and quality into consideration. Metrics for evaluating an individual cluster, in contrast, have emerged only recently.
Some cluster assessment metrics have been introduced based on clustering stability [58–63]. A clustering ensemble framework has been introduced in which the most stable partitions are selected [64]. The sum of NMI (S-NMI) has been widely used as a partition evaluator [47, 65]. Although a comparatively small ensemble is chosen, it has outperformed the simple ensemble methods.
Azimi introduced an algorithm that dynamically chooses a different subset of primitive partitions for datasets with different complexity levels [66]. Extending this work, Alizadeh tried to select a subset of the primitive ensemble adaptively [67]. To handle the weakness of the traditional NMI criterion [48], they introduced the MAX criterion as their cluster evaluation measure [49]. To handle a weakness shared by both the MAX and NMI metrics, they introduced another metric, APMM [50].
Zhong et al. have proposed a clustering ensemble framework that is somewhat similar to the proposed one. In [68], Zhong et al. also provide a cluster evaluation and weighting scheme, which is not based on the complement cluster. The main novelty of their work is a new cluster evaluation measure and a corresponding weighting of clusters. Nevertheless, the proposed cluster ensemble framework is essentially different from theirs. In contrast to their framework, ours does not require the original attributes of the dataset, and it does not presume any distribution in the dataset. The only disadvantage of their method relative to ours is its need for the original features of the dataset while integrating the basic partitions into the consensus partition.
Fuzzy clustering ensemble is also a new approach in clustering ensemble, in which the base partitions are fuzzy [34]. Although some of these works perform cluster selection, they first transform fuzzy clusters into crisp clusters, so they are not true fuzzy cluster ensembles [36]. Some of them use the information of different partitions according to their quality when generating the consensus partition [40], while others use both cluster selection and weighting [37]. Because this work uses crisp clustering ensembles, these methods are not considered direct rivals.
Proposed cluster ensemble framework
This section explains the details of the proposed cluster ensemble framework. Like MAX [49] and APMM [50], the proposed framework uses a cluster selection strategy. The most stable clusters are selected and passed to the consensus-function. The stability value of a cluster is computed based on BNMI. The proposed cluster ensemble framework is presented in Fig. 2. The proposed Cluster Ensemble Selection based on BNMI is denoted by the

The suggested CESBNMI cluster ensemble framework.
Based on Fig. 2, the proposed Extraction of an ensemble of the size A stability value has been assigned to each basic cluster obtained in previous step. The stability for the cluster The 3rd step will be discussed at the subsection 4.3. The selection is carried out so that the most stable clusters are selected to participate in the consensus-function. The last step, i.e. the consensus-function step, uses a method from one of the following categories: (1) CAM-based consensus-functions, (2) hypergraph consensus-functions, and (3) intermediate-space consensus-functions (which view the space of clusters as a new data space and apply a clustering algorithm to the data projected into that space). This step is discussed in subsection 4.4.
A set of
Cluster stability
Different definitions of stability have been presented [49, 54]. A stable cluster has been defined as a cluster that almost always appears in a clustering. That is, if an algorithm is run to partition a dataset 10 times and a set of data points emerges as a cluster more than 5 times, the cluster can be called a stable cluster. Such clusters are favorable candidates for the final consensus-function [68]. At this time, suppose that the cluster stability
where
The
The above method has been described by a toy example depicted in Fig. 3. Figure 3 depicts this method which is also called the NMI-Cluster evaluation method. The partition in the Fig. 3 (a) depicts a clustering in the ensemble. The partition in the Fig. 3 (c) depicts a clustering in the reference set. We want to evaluate the cluster

Then MAX-Cluster evaluator has been introduced as follows [70]. The

After introducing the MAX-Cluster evaluator, Alizadeh et al. have introduced the APMM-Cluster evaluator to solve the weak points of the previous criteria. It has been defined as follows [70]. The
Figure 5 depicts this method which is also called the APMM-Cluster evaluation method.

After introducing the APMM-Cluster evaluator, this paper has introduced the BNMI-Cluster evaluator to solve the weak points shared between all of the previous criteria. The BNMI-Cluster evaluator has been defined as follows. The
where ϑ is an element of {
Figure 6 depicts this method which is also called the BNMI-Cluster evaluation method. After defining a NMI criterion for computing the NMI between a partition and a cluster, we define stability of a cluster

Consider a dataset with n = 1000 data samples and an ensemble of 100 primary partitions. In 50 of the primary partitions there is a cluster {1, 2, 3} and another cluster {4, 5, 6}. In the other 50 primary partitions there is a cluster {1, 2, 4} and another cluster {3, 5, 6}. Assume that the main ensemble, i.e. the 100 primary partitions, is used as the reference set, and that we want to calculate the stability of the cluster {1, 2, 3}. The NMI cluster evaluator gives a stability of 0.7597 for the cluster {1, 2, 3}. Now reconsider the same example with one difference: the dataset contains n = 10000 data samples. The NMI cluster evaluator now gives a stability of 0.7783 for the cluster {1, 2, 3}, even though only the size of its complement has changed. This is called the complement cluster effect. It can be severe in real examples where the number of clusters is large. The effect is slightly less severe in the MAX cluster evaluator, but it still exists; the MAX cluster evaluator also has difficulty handling outliers. The effect is likewise slightly less severe in the ENMI cluster evaluator, but it still exists. Although the APMM cluster evaluator more or less solves the complement cluster effect, it causes another problem: it completely ignores the shape of the clusters that overlap with the target cluster. For example, a target cluster that is divided equally by two (or more) reliable clusters is worse off than one that is divided equally by two (or more) unreliable clusters, yet APMM does not distinguish the two cases. The BNMI cluster evaluator solves this problem.
The third step is accomplished as follows. The clusters are first sorted by their stability values. Then the top 50% most stable clusters are selected to participate in the final ensemble.
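The selection step above can be sketched in a few lines. This is an illustrative sketch only: the function name `select_stable_clusters`, the representation of a cluster as a set of object indices, and the rounding down for odd cluster counts are assumptions, and the stability values are taken as given (for instance, computed by any of the cluster evaluators discussed earlier).

```python
def select_stable_clusters(clusters, stabilities, ratio=0.5):
    """Keep the `ratio` fraction of clusters with the highest stability."""
    # Indices of the clusters, from most stable to least stable.
    order = sorted(range(len(clusters)), key=lambda i: stabilities[i], reverse=True)
    # Assumption: round down when the cluster count is not evenly divisible.
    keep = int(len(clusters) * ratio)
    return [clusters[i] for i in order[:keep]]


# Hypothetical clusters (sets of object indices) with made-up stability values.
clusters = [{1, 2}, {3}, {4, 5}, {6}]
stabilities = [0.9, 0.2, 0.7, 0.4]
selected = select_stable_clusters(clusters, stabilities)
```

With these made-up values, the two clusters with stability 0.9 and 0.7 are kept and the other two are discarded.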
Consensus-function
The last phase of the clustering ensemble aggregates the chosen clusters into the consensus (final) partition. One favourable strategy is to feed the chosen clusters as inputs to CSPA, MCLA, and HGPA [65]. The output of these algorithms is the resulting partition of the ensemble. For clarification, consider Fig. 7. Four partitions,

Four partitions partition1, partition2, partition3 and partition4 derived from a hypothetical toy dataset by k-means.

The binary representation of the clusters extracted from partitions presented in Figure 7. For example
Another popular technique for producing the consensus partition from the selected clusters is to view the clusters as a new data perspective and to cluster the new data with a clustering algorithm such as fuzzy c-means (FCM). For example, consider again the example of Fig. 7. As before, the partitions are divided into the 11 clusters illustrated in Fig. 8. The clusters of Fig. 8 are then taken as the new data in a new feature space, and the FCM clustering algorithm is applied to generate the final ensemble partition from them.
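The binary cluster representation that underlies this intermediate-space view can be sketched as below. The partitions used here are hypothetical and are not the ones shown in Fig. 7; the function name `binary_representation` and the label-list layout are likewise assumptions. Each cluster becomes one binary feature, and each object is described by the clusters it belongs to.

```python
def binary_representation(partitions):
    """Map each object to a binary vector with one feature per cluster."""
    columns = []
    for labels in partitions:
        for cluster in sorted(set(labels)):
            # One indicator column per cluster: 1 where the object is a member.
            columns.append([1 if label == cluster else 0 for label in labels])
    # Transpose so that each row describes one object in the new feature space.
    return [list(row) for row in zip(*columns)]


# Two hypothetical base partitions of four objects (2 + 3 = 5 clusters in total).
partitions = [[0, 0, 1, 1], [0, 1, 1, 2]]
features = binary_representation(partitions)
```

The rows of `features` can then be handed to any simple clustering algorithm (FCM in the paper). When only a subset of the clusters has been selected, only the corresponding columns would be kept.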
Thus, three methods are employed to generate the final clusters from the chosen clusters: (a) hypergraph-based models, (b) simple clustering-based models, and (c) co-association-based models. For the co-association-based models, the CAM must first be built and a hierarchical clustering algorithm is then applied to it. E-EAC is used to build the CAM from the chosen clusters. In the method of EAC, the
where
As mentioned before, the last step is carried out using one of the techniques presented in Fig. 9. Therefore, there are 50 different clustering ensemble approaches in this paper, which are presented in Fig. 10.

Different consensus-functions used in the paper.

Different methods for clustering ensemble.
This section first introduces the evaluation metrics used to assess a clustering output. The benchmark datasets are introduced in the second subsection, the initialization of the algorithms' parameters is discussed in the third, and the experimental results are presented in the last.
Assessment metrics
Once the consensus clustering has been generated, an important problem is how to assess it. In contrast to classification, the assessment problem in clustering is crucial. NMI is one of the most well-known clustering assessment criteria. The output (final) clustering is the partition obtained by applying a clustering method to the dataset. After the clustering method terminates, the real labels of the dataset can be used to assess how well the method performed the clustering task. Between two clusterings,
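As a rough illustration of how NMI compares an output clustering with the real labels, the sketch below computes the mutual information of two labelings and normalizes it by the geometric mean of their entropies. The choice of normalization is an assumption (other normalizations are also in common use, and the paper's own equation is not reproduced here), as is the function name `nmi`.

```python
from collections import Counter
from math import log, sqrt


def nmi(labels_a, labels_b):
    """NMI of two labelings, normalized by sqrt(H(a) * H(b)) (assumed form)."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # Mutual information from the contingency counts.
    mutual_info = sum(
        (n_ab / n) * log(n * n_ab / (count_a[a] * count_b[b]))
        for (a, b), n_ab in joint.items()
    )
    entropy_a = -sum((c / n) * log(c / n) for c in count_a.values())
    entropy_b = -sum((c / n) * log(c / n) for c in count_b.values())
    if entropy_a == 0 or entropy_b == 0:
        # A single-cluster labeling carries no information; return 0 by convention.
        return 0.0
    return mutual_info / sqrt(entropy_a * entropy_b)
```

Because NMI depends only on how objects are grouped, two labelings that differ only in the names of their clusters obtain the maximum score, while two statistically independent labelings obtain a score of zero.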
The accuracy metric is another option for assessing a partition when the number of clusters and their actual labels are known. To compute the accuracy of the final clustering, the obtained clusters are first re-labeled so that they have maximum correspondence with the ground-truth labels, and the percentage of correctly classified samples is then calculated. Thus, once the correspondence problem is solved, the error rate between the known classes and the derived labels can be computed. The Hungarian algorithm, which solves the minimum-weight bipartite matching problem, is used to solve this label correspondence problem, and it has been shown to do so effectively [71].
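The relabel-then-count procedure above can be illustrated on a small example. Note that the sketch below does not implement the Hungarian algorithm: for brevity it searches all one-to-one mappings exhaustively, which yields the same optimal matching but is only practical for a handful of clusters. The function name `clustering_accuracy` is an assumption, and the sketch requires that there are no more clusters than classes.

```python
from itertools import permutations


def clustering_accuracy(true_labels, pred_labels):
    """Accuracy under the best one-to-one mapping of clusters to classes."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(pred_labels))
    if len(clusters) > len(classes):
        raise ValueError("this sketch assumes no more clusters than classes")
    # Contingency counts: objects of class k that fall into cluster c.
    counts = {(c, k): 0 for c in clusters for k in classes}
    for t, p in zip(true_labels, pred_labels):
        counts[(p, t)] += 1
    # Exhaustive search stands in for the Hungarian algorithm here.
    best = 0
    for assigned in permutations(classes, len(clusters)):
        matched = sum(counts[(c, k)] for c, k in zip(clusters, assigned))
        best = max(best, matched)
    return best / len(true_labels)


truth = [0, 0, 1, 1, 2, 2]
predicted = [1, 1, 0, 0, 2, 1]
accuracy = clustering_accuracy(truth, predicted)
```

In this example the best mapping sends cluster 1 to class 0, cluster 0 to class 1 and cluster 2 to class 2, so five of the six objects count as correctly classified.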
The last measure used in this paper to assess a clustering algorithm is the F-measure (F-score). After the clustering algorithm terminates, the output clustering and the real labels of the dataset are used to compute it. The F-measure is calculated from both precision and recall. It can be interpreted as a weighted average of precision and recall, with its worst value at 0 and its best value at 1 [72]. In this study, the F-measure is computed according to Equation 15.
One of the serious problems in the artificial intelligence community is the selection of a suitable benchmark. The datasets of the UCI machine learning repository have been widely used for clustering problems [73]. Some of them are therefore used in this manuscript, as they can be considered suitable benchmark datasets for clustering problems.
Experiments have been conducted on 23 typical datasets. Two of them are huge (i.e. have a large number of records): (a) the MNIST dataset and (b) the MNIST 1vs2 dataset, which contains 2000 samples of class one and 2000 samples of class two randomly chosen from the MNIST dataset. One is high-dimensional and sparse (the Lymphoma dataset), and one is both high-dimensional and huge (the MNIST dataset). Some have a high number of clusters, some have a low number of clusters, and some are challenging for the clustering problem. The numbers of clusters, samples and features in the employed datasets are therefore diverse enough to draw fair conclusions from the experimental results of the paper. Table 1 provides some information about the employed datasets [73].
Fundamental characteristics of the employed benchmarks.
All experiments have been conducted on standardized versions of the datasets, i.e. versions whose features are linearly transformed to have a mean of 0 and a standard deviation of 1. The hand-made datasets used in the paper can be considered the most challenging datasets for the clustering task. They are depicted in Fig. 11.

(a) Halfring dataset, (b) 2-Spiral dataset, (c) Aggregation dataset, (d) Flame dataset, (e) 3-Spiral dataset, (f) Open Flame dataset.
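The feature standardization applied to all datasets before clustering can be sketched as follows. The function name `standardize` is hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the paper does not say which is used; constant features are mapped to zero to avoid division by zero.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Shift and scale every feature to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Assumption: population standard deviation.
    stds = [pstdev(col) for col in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical data: three samples with two features on very different scales.
data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled = standardize(data)
```

After the transformation both features are on the same scale, so neither dominates the distance computations of a clustering algorithm such as k-means.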
To make the results more conclusive, the reported results are averaged over 100 independent runs. In every experiment, there are 30 independent base clusterings obtained by 30 independent runs of the k-means clustering algorithm with different initializations (of the seed points) and different random values for the number of clusters in the range 2 to
After a subset of high-quality clusters has been selected, the different consensus-functions use the real number of clusters in the dataset, i.e. the third column of Table 1, as the number of consensus clusters when extracting the consensus clustering.
A soft clustering algorithm (FCM here) assigns every object to all clusters with different membership values. To obtain the consensus clustering from the output of FCM used as the consensus-function, every object is assigned to the cluster in which it has the highest membership value.
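The conversion from soft memberships to a crisp consensus clustering described above amounts to a row-wise argmax. The following sketch assumes a membership matrix with one row per object and one column per cluster; the function name `harden` and the membership values are made up for illustration.

```python
def harden(memberships):
    """Assign each object to the cluster with its highest membership value."""
    # On ties, the cluster with the lowest index wins (an arbitrary choice).
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]


# Hypothetical FCM output for three objects and three clusters.
memberships = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
]
crisp = harden(memberships)
```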
Experimental results
Fig. 12-a shows the NMI value obtained on the Wine dataset when different ratios of the most stable clusters are used in the final ensemble. As Fig. 12-a makes clear, the NMI value increases as the ratio of the most stable clusters participating in the final ensemble increases up to 50%. Beyond 50%, it does not increase and in some cases even declines.

In every panel, the horizontal axis presents the rate of stable clusters chosen by the indicated cluster evaluation method, and the vertical axis presents the indicated performance measure. (a) Selection by BNMI; NMI on the Wine dataset. (b) Selection by BNMI; NMI averaged over all datasets in Table 1. (c) Selection by BNMI; accuracy averaged over all datasets in Table 1. (d) Selection by BNMI; F-measure averaged over all datasets in Table 1. (e) Selection by NMI; accuracy averaged over all datasets in Table 1. (f) Selection by ENMI; accuracy averaged over all datasets in Table 1. (g) Selection by APMM; accuracy averaged over all datasets in Table 1. (h) Selection by MAX; accuracy averaged over all datasets in Table 1.
The same experiment as the one on the Wine dataset (Fig. 12.a) has been repeated on each of the other datasets. The results are approximately the same in all cases: the NMI value increases as the ratio of the most stable clusters participating in the final ensemble increases up to 50%, and it declines beyond 50%. The Yeast dataset is an exception: there, the NMI value increases up to a ratio of 30% and declines after 30%. Even so, on the Yeast dataset the ensemble with 50% of the most stable clusters is still better than the ensemble with all clusters.
Fig. 12-b shows the NMI value averaged over all employed datasets for different ratios of the most stable clusters in the final ensemble. The NMI value increases as the ratio of the most stable clusters participating in the final ensemble increases up to 50% in all cases. Beyond 50%, it does not increase, and it even declines for a number of methods. The proposed technique reaches its maximum when 50% of the most stable clusters are included in the final ensemble, Average-Linkage is used as the consensus-function with the E-EAC method, and the ItoU equation is used to construct the co-association matrix.
Fig. 12-c shows the accuracy averaged over all used datasets for different ratios of the most stable clusters in the final ensemble. The accuracy values in Fig. 12-c confirm the NMI values in Fig. 12-b: accuracy increases as the ratio of the most stable clusters participating in the final ensemble increases up to 50% in nearly all cases. A ratio of 50% of the most stable clusters therefore remains the best value. Average-Linkage as the consensus-function with the E-EAC method, with ItoU used to construct the co-association matrix, is again the best option. Fig. 12.d shows the F-measure averaged over all employed datasets for different ratios of the most stable clusters in the final ensemble. The conclusions drawn from the F-measure results in Fig. 12.d are the same as those drawn from Fig. 12.b and Fig. 12.c.
The experiment of Fig. 12.c, in terms of averaged accuracy, has been repeated with the NMI criterion [48] as the cluster evaluator. The accuracy averaged over all used datasets for different ratios of the most stable clusters in the final ensemble is presented in Fig. 12.e. In Fig. 12.e, BNMI is no longer used as the stability measure; NMI is used to estimate cluster stability instead. The results of Fig. 12.e again confirm that the Average-Linkage aggregator is better than all other aggregators when the ItoU equation is used to construct the co-association matrix. The best ratio of the most stable clusters participating in the final ensemble is still 50%.
The experiments of Fig. 12.c and Fig. 12.e, in terms of averaged accuracy, have been repeated with the ENMI [38], APMM [50] and MAX [49] criteria as cluster evaluators. The accuracy averaged over all used datasets for different ratios of the most stable clusters in the final ensemble is reported in Fig. 12.f, Fig. 12.g and Fig. 12.h for ENMI, APMM and MAX, respectively. In these panels, BNMI is no longer used as the stability measure; the ENMI, APMM and MAX criteria, respectively, are used to estimate cluster stability. The results presented in Fig. 12.f, Fig. 12.g and Fig. 12.h again confirm that the Average-Linkage aggregator is better than all other aggregators when the ItoU equation is used to construct the co-association matrix. The best ratio of the most stable clusters participating in the final ensemble is still 50%.
Comparing the results presented in Fig. 12.c, Fig. 12.e, Fig. 12.f, Fig. 12.g and Fig. 12.h, it can be concluded that using BNMI as the cluster assessment measure in our clustering ensemble framework yields a better final ensemble partition than the other measures. It can also be inferred that using the Average-Linkage algorithm as the aggregator with the E-EAC method is the best option for the consensus-function. Additionally, the results are better when the ItoU technique is used to build the co-association matrix.
To make a decisive comparison among the five cluster evaluation criteria used, i.e. NMI, MAX, ENMI, APMM, and BNMI, consider Table 2. To obtain the results in Table 2, the Average-Linkage algorithm is used as the aggregator along with the E-EAC method, and the ItoU technique is used to build the co-association matrix. BNMI, APMM, ENMI and MAX are clearly better than NMI. In addition, the BNMI-based cluster ensemble outperforms the ENMI-, APMM- and MAX-based clustering ensembles in almost all cases. The study therefore infers that employing BNMI as the cluster evaluator is the best option. Friedman's ANOVA test on Table 2 is depicted in Fig. 13. The p-value is 1.3e-27, which indicates a significant difference.
Accuracy of clusterings generated by cluster selection using the NMI-based, ENMI-based, APMM-based and MAX-based cluster evaluators, verified via paired t-test [78] at the 95% confidence level

Friedman's ANOVA test on Table 2. The p-value is 1.29773e-27.
The base clustering algorithms and some of the state-of-the-art clustering ensemble approaches are compared in Table 3 in terms of accuracy. It is obvious that there is no dominant method. An ensemble of size 100 produced by the k-means algorithm is used for this experiment, with the seed points randomly reinitialized. The average-linkage algorithm is used as the consensus-function to derive the consensus results. Since the last five columns are cluster selection based approaches, their ensembles consist of the best 50% of the clusters of the ensemble. The number of clusters in each partition of ensemble is a random number in
Experimental results in terms of Accuracy
The last five columns of Table 3 report the results of the ensemble methods that use only a subset of the primary clusters. The 7th, 8th, 9th, 10th, and 11th columns show the results of the ensembles that use the traditional NMI, MAX, APMM and ENMI and the proposed BNMI, respectively, to validate the base clusters. These results indicate that, although these approaches use only a subset of the primary clusters, they usually outperform the full ensemble. Comparing the five columns also indicates the superiority of BNMI-based stability over NMI-, MAX-, APMM- and ENMI-based stability.
Table 4 compares the performance of the BNMI-based clustering ensemble with that of the state-of-the-art clustering ensemble methods. In Table 4, the paired t-test [75–78] is used to validate the conclusions at a 95% confidence level. The last row shows triples in the format A-B-C. A is the number of datasets on which the proposed method is significantly superior to the given method according to the paired t-test at a significance level of 0.05 (i.e. a 95% confidence level). B is the number of datasets on which the proposed method is neither significantly worse nor significantly better than the given method. C is the number of datasets on which the given method is significantly superior to the proposed method.
Experimental results of the proposed method compared with those of the state-of-the-art approaches. The sign “-” indicates that the method in that column is significantly worse than the proposed method. The sign “+” indicates that the method in that column is significantly better than the proposed method. The sign “±” indicates that the method in that column is neither significantly worse nor significantly better than the proposed method. Significance is assessed by the paired t-test at a significance level of 0.05. The results are in terms of F-measure.
Noisy data points have been added to the Wine dataset at different rates. At each noise level, the performance of cluster selection by the NMI measure [48], the MAX measure [49], the APMM measure [50], the ENMI measure [38], and the BNMI measure [48] has been computed in terms of NMI. These results are depicted in Fig. 14.

Noise analysis. The horizontal axis depicts the rate of added noisy data samples.
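One way to set up such a noise experiment is sketched below. The text does not state how the noisy points are generated, so the choice here is purely an assumption: noise points are drawn uniformly inside the bounding box of the original data, and their number is the noise rate times the dataset size. The function name `add_uniform_noise` and the toy data are hypothetical.

```python
import random

def add_uniform_noise(points, rate, seed=0):
    """Append round(rate * n) points drawn uniformly from the bounding box
    of `points` (assumed noise model, not taken from the paper)."""
    rng = random.Random(seed)
    n_noise = int(round(rate * len(points)))
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    noise = [
        [rng.uniform(lows[d], highs[d]) for d in range(dims)]
        for _ in range(n_noise)
    ]
    return points + noise, n_noise

# Hypothetical toy data: 10 two-dimensional points, 20% added noise.
data = [[float(i), float(i % 3)] for i in range(10)]
noisy, n_added = add_uniform_noise(data, rate=0.2)
```

Each noise level would then be clustered with every selection measure, and NMI would be computed on the original (non-noise) points only.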
In this subsection, 10 real benchmark datasets are used: MNIST (5000 data points, 10 classes and 784 features) [74], USPS (11000 data points, 10 classes and 256 features) [79], Multiple Features (MF) (2000 data points, 10 classes and 649 features), Semeion (1593 data points, 10 classes and 256 features), Image-Segmentation (IS) (2310 data points, 7 classes and 19 features), Optical-Digit-Recognition (ODR) (5620 data points, 10 classes and 64 features), Forest-Cover-Type (FCT) (3780 data points, 7 classes and 54 features), Landsat-Satellite (LS) (6435 data points, 6 classes and 36 features), Letter-Recognition (LR) (20000 data points, 26 classes and 16 features) and ISOLET (7797 data points, 26 classes and 617 features) [73].
The base partitions are generated by the k-means clustering algorithm. The number of clusters in every ensemble member is randomly chosen from the set {
The NMI values of the suggested clustering ensemble method are evaluated and compared with those of ten state-of-the-art algorithms. The results indicate that the suggested method outperforms all of them, as verified by the paired t-test [78] at the 95% confidence level.
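For reference, NMI between a consensus partition and the ground-truth labels can be computed as in the sketch below. The normalization is an assumption, since this section does not spell it out: mutual information is divided by the geometric mean of the two entropies, which is one common convention. The function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def nmi(labels_a, labels_b):
    """Mutual information normalized by sqrt(H(A) * H(B)) (assumed convention)."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    mi = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    h_a, h_b = entropy(labels_a), entropy(labels_b)
    if h_a == 0 or h_b == 0:
        return 0.0  # convention for a single-cluster partition
    return mi / math.sqrt(h_a * h_b)

same_up_to_renaming = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

NMI is invariant to relabeling of clusters, so the first pair scores 1 and the second, statistically independent pair scores 0.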
The runtimes of the state-of-the-art cluster ensembles listed in Table 6 are reported for different data sizes in Fig. 15. All of these experiments are performed on the LR dataset. Each cluster ensemble algorithm is applied to a randomly selected subsample of the dataset, and its runtime is recorded. The fastest method is MCLA. Because of its heavy computation during cluster assessment, the suggested method is not among the quickest methods, but its runtime is comparable to that of the remaining methods and remains acceptable (i.e., the suggested method is tractable). The time complexity of the suggested method depends heavily on the size of the reference set. Thus, its runtime will decrease if a separate, small reference set is used alongside the ensemble. The time order of second step in
Performance of the suggested approach (i.e., the NMI values) compared with those of ten state-of-the-art approaches, verified by the paired t-test [78] at the 95% confidence level

Runtime of different clustering ensembles at different sizes of data-set.
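The runtime measurement protocol (cluster a random subsample, record the elapsed time, repeat at several sizes) can be outlined as below. This is only a sketch of the protocol: `time_on_subsamples` is a hypothetical helper, `cluster_fn` stands in for any cluster ensemble algorithm, and the built-in `sorted` is used as a placeholder workload.

```python
import random
import time

def time_on_subsamples(cluster_fn, data, sizes, seed=0):
    """Run `cluster_fn` on random subsamples of `data` and record wall time."""
    rng = random.Random(seed)
    timings = []
    for size in sizes:
        subsample = rng.sample(data, size)  # sampling without replacement
        start = time.perf_counter()
        cluster_fn(subsample)
        timings.append((size, time.perf_counter() - start))
    return timings

# Placeholder workload; a real run would pass a cluster ensemble algorithm.
data = list(range(1000))
timings = time_on_subsamples(sorted, data, sizes=[10, 100, 1000])
```

Averaging over several subsamples per size would reduce the variance of the recorded times.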
This study proposes a measure for deciding whether a cluster should participate in a cluster ensemble, and develops a clustering ensemble method based on cluster selection. A measure called BNMI is devised to assess the stability of clusters, and it is used to choose a sub-set of the most stable clusters. The output clustering of the ensemble is then obtained from the selected clusters. After a sub-set of clusters has been chosen, a large number of consensus-functions are used to find the consensus partition. Experimentally, the average-linkage hierarchical clusterer is found to be the most effective consensus-function. One of the main contributions of the article is a new method of EAC matrix construction, called E-EAC.
The empirical results on several datasets from the UCI repository indicate that the suggested method often outperforms the state-of-the-art ones. In addition, the experiments show the usefulness of the suggested method in comparison with the state-of-the-art cluster ensembles in terms of robustness and quality. It is also inferred that the results are superior whenever the ItoU equation is used to construct the co-association matrix. Finally, this study infers that the best option for cluster assessment is BNMI, and that the best option for the consensus-function is the average-linkage algorithm applied to the E-EAC-based co-association matrix extracted by the ItoU equation. The consensus-partition performance reaches its maximum when the ratio of the most stable clusters participating in the final ensemble is 50%.
