An efficient density-based clustering with side information and active learning: A case study for facial expression recognition task

Abstract

Data clustering is one of the most important tasks in machine learning and data mining. It aims to discover the natural structure of the data, identify relationships between observations inside data sets, or detect outliers. Clustering is traditionally seen as part of unsupervised learning, but in many situations, side information about the clusters may be available in addition to the values of the features. For example, the cluster labels of some observations may be known (called seeds), or certain observations may be known to belong (or not) to the same cluster (pairwise constraints). Clustering algorithms using such information are called semi-supervised algorithms. Although many semi-supervised clustering algorithms have been presented in the literature over the last decades, each of them usually uses only one kind of side information. In this work, we propose a new semi-supervised density-based clustering algorithm, named MCSSDBS, which effectively integrates both kinds of side information and embeds an active learning strategy in the process of finding clusters. To evaluate the proposed method and demonstrate its effectiveness compared with a state-of-the-art semi-supervised density-based clustering algorithm (SSDBSCAN), a series of experiments is carried out on both synthetic and real-world data sets. Experiments are first conducted on 6 data sets from the UCI repository. Then, for the facial expression recognition task, tests are performed on two facial data sets: a popular one in the literature, the extended Cohn-Kanade data set (CK+), and our own new facial data set collected from volunteers in Vietnam, named the ITI facial expression data set. The comparative results show that our method can boost the performance of the clustering process.

Keywords

Semi-supervised clustering, density-based clustering, active learning, side information, facial expression recognition

1. Introduction

Clustering is the problem of partitioning a data set $X$ into $k$ clusters such that points in the same cluster are similar and those in different clusters are dissimilar. The primary objective of clustering is to discover the natural structure of the data. Clustering is also used for identifying relationships between observations inside data sets or for detecting outliers. Although this problem has a long history and many clustering algorithms have been introduced, it remains an interesting topic in the machine learning and data mining community due to the rapid increase in data from different sources such as Facebook, Amazon, and Google, to mention a few. There are several kinds of clustering, including partition-based, density-based, and graph-based clustering. Among them, density-based clustering has been widely used because it can detect clusters with arbitrary shapes as well as noise.

In the past two decades, clustering with side information (known as semi-supervised clustering) has received a great deal of attention [1]. Examples include semi-supervised K-Means [2, 3], semi-supervised Fuzzy C-Means [4], semi-supervised spectral clustering [5], semi-supervised density-based clustering [6, 7, 8], and semi-supervised graph-based clustering [11], to mention a few. These works showed that clustering can benefit from some user knowledge to boost its performance. Such information can generally be either constraints or seeds. Constraints comprise must-link and cannot-link constraints: a must-link constraint (ML) between two observations $x$ and $y$ means that $x$ and $y$ should be in the same cluster, and a cannot-link constraint (CL) means that $x$ and $y$ should not be in the same cluster. For seed side information, a small set of labeled data is provided to the semi-supervised clustering algorithm. In practice, this side information is available or can be collected from users.

To begin with, clustering with constraints can be divided into two main families: 1) metric learning-based methods, in which the constraints are used to learn a metric/objective function, and 2) constraint-based methods, in which the constraints are used as hints to guide the algorithm to a useful solution [1].

Given a constraint set, the metric learning methods are first trained to satisfy the constraints so that, after the training step, data objects associated by a must-link constraint are close and data objects linked by a cannot-link constraint are well separated in the learned space.

In constraint-based approaches, two families of methods can be found: on the one hand, algorithms with strict enforcement, which find the best feasible clustering respecting all the constraints, and, on the other hand, algorithms with partial enforcement, which find the best clustering while maximally respecting the constraints. To this aim, several techniques have been proposed so far in the literature: modifying the clustering objective function so that it includes a constraint satisfaction term, enforcing all constraints to be satisfied during the assignment step of the clustering process, or initializing clusters and inferring clustering constraints based on neighborhoods derived from labeled examples.

Secondly, in seed-based clustering, a set of seeds can be used for initializing cluster centers, such as in K-Means and Fuzzy C-Means, for automatically evaluating parameters in semi-supervised density-based clustering (SSDBSCAN) [6], for helping the label propagation process and learning the distance in seed-based hierarchical clustering (HISSCLU) [8], or for identifying connected components in semi-supervised graph-based clustering (SSGC) [11].

Going further, active learning aims to obtain high-quality results using as few labeled objects as possible, minimizing the number of queries put to users to obtain labels. Active learning is very beneficial in many modern machine learning problems where data may be abundant but labels are expensive or time consuming to obtain [9]. While active learning for supervised classification has a long history, the problem of active learning for semi-supervised clustering only appeared once research on integrating prior knowledge into clustering was proposed in 2001 [2]. For constrained clustering, many works have been proposed recently. The main idea is to choose the most useful constraints so that they not only boost the clustering performance but also minimize the number of user questions. In [15], Basu et al. proposed a method based on a min-max strategy for constrained K-Means clustering. In [12], Vu et al. proposed a graph-based method to collect constraints for any kind of semi-supervised clustering algorithm. In [16, 17], Abin and Beigy introduced kernel-based and sequential approaches for K-Means and DBSCAN. For active learning in seed-based clustering, we mention the work in [13, 10], which collects seeds based on a min-max strategy (named S-Min-Max). The idea of S-Min-Max is to build a set of seeds that covers the distribution of the input data.

To sum up, in this work we integrate both pairwise constraints and seed information, and additionally embed an active learning phase in the clustering process, to establish a novel algorithm named MCSSDBS. This work is an extension of our preliminary research [14] and presents a new application to facial expression recognition. It is noteworthy that while a number of constraint-based and seed-based clustering algorithms exist, there is no semi-supervised clustering algorithm that uses both kinds of information. Besides, as part of our current research direction on evaluating citizens’ satisfaction with public services, we collect our own facial expression data set (named the ITI facial expression data set) from local volunteers, and then apply our new algorithm to the facial expression recognition task. Comparative results obtained on a series of data sets show the effectiveness of our new method.

The rest of this paper is organized as follows. A review of related semi-supervised density-based clustering algorithms is given in Section 2. Section 3 presents our proposed semi-supervised density-based clustering algorithm, named MCSSDBS. Section 4 introduces our experimental setup and presents the experiments conducted on benchmark data sets from UCI as well as on facial expression data sets. Finally, Section 5 concludes the paper and discusses several lines of future research.

2. Related work

2.1 Density-based clustering

The notion of density-based clustering was introduced by Ester et al. in 1996, when they proposed the DBSCAN algorithm [18]. Given a data set $X$, an integer value MinPts, and a real number $\epsilon$, we briefly present the definitions of DBSCAN hereafter.

(core object) An object $p\in X$ is called a core object w.r.t. $\epsilon$ and MinPts if $|N_{\epsilon}(p)|\geqslant\textit{MinPts}$, where $N_{\epsilon}(p)=\{q\in X:d(q,p)\leqslant\epsilon\}$.

(density reachable) A point $p$ is density-reachable (directly or transitively) from a point $q$ w.r.t. $\epsilon$ and MinPts if there is a chain of points $p_{1},\ldots,p_{n}$ with $p_{1}=q$ and $p_{n}=p$ such that $p_{i+1}\in N_{\epsilon}(p_{i})$ and $p_{i}$ is a core object for $i=1,\ldots,n-1$.

(density connected) A point $p$ is density-connected to a point $q$ w.r.t. $\epsilon$ and MinPts if there is a point $o$ such that both $p$ and $q$ are density-reachable from $o$ w.r.t. $\epsilon$ and MinPts.

(cluster) Let $X$ be a database of points. A cluster $C$ w.r.t. $\epsilon$ and MinPts is a non-empty subset of $X$ satisfying the following conditions:

• $\forall p,q$: if $p\in C$ and $q$ is density-reachable from $p$ w.r.t. $\epsilon$ and MinPts, then $q\in C$.
• $\forall p,q\in C$: $p$ is density-connected to $q$ w.r.t. $\epsilon$ and MinPts.
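As an illustration of the core-object definition, the following minimal Python sketch computes $N_{\epsilon}(p)$ and checks the core condition on a toy data set. The function names, the toy points, and the choice of the Euclidean distance for $d$ are assumptions made only for this example.

```python
import math

def eps_neighborhood(points, p, eps):
    """Indices q with d(q, p) <= eps; p itself is included, as in N_eps(p)."""
    return [q for q in range(len(points))
            if math.dist(points[q], points[p]) <= eps]

def is_core(points, p, eps, min_pts):
    """True if |N_eps(p)| >= MinPts."""
    return len(eps_neighborhood(points, p, eps)) >= min_pts

# Hypothetical toy data: three nearby points and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]
core_flags = [is_core(points, i, eps=1.0, min_pts=3) for i in range(len(points))]
```

With these toy values only the first point has three objects within distance 1 (itself and its two neighbors), so it is the only core object.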

The advantage of DBSCAN is that it can detect clusters with arbitrary shapes and identify noise. However, as the parameters MinPts and $\epsilon$ are set once for all clusters, DBSCAN can only separate clusters well for data sets with uniform density. In cases where the density differs widely among clusters and the clusters are not totally separated by sparse regions (see the example in Fig. 1), DBSCAN fails to discover the exact cluster structure. In Fig. 2, with MinPts $=$ 5 and $\epsilon$ $=$ 0.9, DBSCAN extracts only two clusters: one merges the two denser clusters A and B, and the other is C. By changing $\epsilon$ to a suitable value, DBSCAN can separate A and B, but it then marks the cluster C as noise (Fig. 3).

Figure 1.
Three clusters: A, B, and C in a data set sample.

Figure 2.
DBSCAN clustering with MinPts $=$ 5, $\epsilon$ $=$ 0.9, A and B are detected as one cluster, C is another.

Figure 3.
In this case, there is no global set of parameters MinPts and $\epsilon$ for DBSCAN to extract exactly 3 clusters A, B, and C in the data set sample with different density.

2.2 Semi-supervised density-based clustering

SSDBSCAN extends the original DBSCAN algorithm by using a small set of labeled data to cope with the problem of finding clusters in data with distinct densities. SSDBSCAN overcomes this problem by using seeds to compute an adapted radius $\epsilon$ for each cluster. Thus, SSDBSCAN has only one parameter, MinPts, while the $\epsilon$ parameter is evaluated from the set of provided seeds. To this aim, the data set is represented as a weighted undirected graph where each vertex corresponds to a unique data object and each edge between objects $p$ and $q$ has a weight determined by the rDist() measure described hereafter.

The rDist(p,q) measure indicates the smallest radius value for which $p$ and $q$ are core points and directly density connected with respect to MinPts. Therefore, rDist() can be formalized as follows:

$\textit{rDist}(p,q)=\max\{\textit{cDist}(p),\,\textit{cDist}(q),\,d(p,q)\}$ (1)

where $d()$ is the metric used in the clustering and $\forall$ o $\in$ X, cDist(o) is the minimal radius such that $o$ is a core-point and has MinPts nearest-neighbors.
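To make Eq. (1) concrete, here is a minimal Python sketch of cDist() and rDist(). It assumes the Euclidean distance for $d()$ and assumes that an object counts as its own neighbor (as in $N_{\epsilon}$), so that cDist(o) is the MinPts-th smallest distance from $o$ including the zero distance to itself. The function names and the toy points are hypothetical.

```python
import math

def c_dist(points, o, min_pts):
    """Smallest radius for which o is a core object (assumes o counts itself)."""
    dists = sorted(math.dist(points[o], q) for q in points)
    return dists[min_pts - 1]

def r_dist(points, p, q, min_pts):
    """Eq. (1): max of the two core distances and the pairwise distance."""
    return max(c_dist(points, p, min_pts),
               c_dist(points, q, min_pts),
               math.dist(points[p], points[q]))

# Hypothetical toy data on a line: three close points and a distant one.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
near = r_dist(points, 0, 1, min_pts=2)   # both points are core at radius 1
far = r_dist(points, 2, 3, min_pts=2)    # dominated by the isolated point
```

In this example the edge towards the isolated point receives a much larger weight than the edges inside the dense group, which is the property SSDBSCAN relies on when looking for a separating edge.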

Given a set of seeds $D$, the SSDBSCAN algorithm proceeds as follows. Using the rDist() distance, it is possible to construct a density-based cluster $C$ that contains the first seed point $p$ by first adding $p$ to $C$ and then iteratively adding the next closest point in terms of rDist() to $C$. The process continues until a point $q$ with a different label from $p$ is reached. At that time, the algorithm backtracks to the point $o$ with the largest rDist() before adding $q$. The current expansion stops and includes all points up to but excluding $o$, yielding a cluster $C$ containing $p$. Conceptually, this is the same as constructing a minimum spanning tree (MST) for a complete graph where the set of vertices equals $X$ and the edge weights are given by rDist().
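The expansion described above can be sketched as a Prim-style MST growth that stops at the first differently labeled seed and then cuts at the largest edge. The code below is a simplified illustration, not the reference SSDBSCAN implementation: it assumes a precomputed symmetric weight matrix standing in for rDist(), and it also treats the edge that attached $q$ as a candidate for the largest edge, which is an assumption of this sketch. All names and data are hypothetical.

```python
import heapq

def expand_from_seed(weights, seed_labels, seed):
    """Grow a cluster from `seed`; weights[i][j] stands in for rDist(i, j)."""
    n = len(weights)
    in_tree = {seed}
    order = []  # (point, weight of the edge that attached it), in insertion order
    heap = [(weights[seed][v], v) for v in range(n) if v != seed]
    heapq.heapify(heap)
    reached_other_seed = False
    while heap:
        w, v = heapq.heappop(heap)
        if v in in_tree:
            continue  # stale heap entry
        in_tree.add(v)
        order.append((v, w))
        if v in seed_labels and seed_labels[v] != seed_labels[seed]:
            reached_other_seed = True
            break
        for u in range(n):
            if u not in in_tree:
                heapq.heappush(heap, (weights[v][u], u))
    if not reached_other_seed:
        return [seed] + [v for v, _ in order]
    # Backtrack: cut at the largest edge; keep the points added before it.
    cut = max(range(len(order)), key=lambda i: order[i][1])
    return [seed] + [v for v, _ in order[:cut]]

# Hypothetical 1-D data; plain distances are used in place of rDist().
xs = [0.0, 1.0, 2.0, 6.0, 7.0]
weights = [[abs(a - b) for b in xs] for a in xs]
seed_labels = {0: "a", 4: "b"}
cluster_a = expand_from_seed(weights, seed_labels, 0)
cluster_b = expand_from_seed(weights, seed_labels, 4)
```

On this toy input the gap between 2.0 and 6.0 is the largest edge in both expansions, so the two seeds recover the groups {0, 1, 2} and {4, 3}.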

The complexity of SSDBSCAN is $O(mN^{2}+mN\log N)$, in which $m$ is the number of seeds and $N$ is the number of data objects.

3. Proposed method

This section presents our proposed method, namely MCSSDBS. Unlike other semi-supervised clustering algorithms in the literature, we integrate seeds, must-link constraints, and cannot-link constraints in the process of finding clusters. Moreover, we argue that choosing the largest edge in the set of all density-connection paths as the separation between two clusters (as in SSDBSCAN) may not be the best solution. Our proposal instead uses an active learning phase to tackle this problem. The main steps of MCSSDBS are presented in Algorithm 3 as follows.

Algorithm 3: MCSSDBS
Input: A set of data X, a set of seeds D, and a set of constraints (must-link and cannot-link)
Output: Clusters and outliers (optional)
1: Construct the graph G from the data w.r.t. new_rDist()
2: for each $x\in D$ do
3:     Construct a part of a cluster from x following the MST algorithm, using CL constraints and using Algorithm 4 to identify the cut point
4:     D $=$ D – {x}
5: end for
6: Construct the final clusters by using must-link constraints
7: (Optional) w.r.t. new_rDist(), add outliers to the nearest clusters

The first step of MCSSDBS (Line 1) is, as in SSDBSCAN, the construction of a graph whose weights are calculated by rDist(). With must-link and cannot-link constraints embedded, however, we obtain new_rDist(), in which cDist(o) in Eq. (1) is now the minimal radius such that $o$ is a core point and has MinPts nearest neighbors that satisfy the must-link and cannot-link constraints. In other words, the constraints are used in the process of identifying nearest neighbors.

Figure 4.

Illustration of steps 2–5 of Algorithm 3: the MCSSDBS enlarging process on our data set sample. In our proposed algorithm, the cut point is detected as edge 7 instead of edge 16 as in SSDBSCAN. The point A, therefore, belongs to the same cluster as Seed 2.

Algorithm 4: Identifying the cut point
Input: A path in the graph
Output: The cut point
1: Sort all the edges of the path in descending order according to their new_rDist() value
2: for each sorted edge do
3:     Ask the expert if the relation between the vertices is a must-link or cannot-link constraint
4:     if the expert answer is CL then
5:         Choose this edge's new_rDist() value as the cut point
6:     end if
7: end for
8: return the cut point

The second step (Lines 2–5) enlarges the clusters from the seeds. The key point here is identifying the cut point. In contrast with SSDBSCAN, we use an active learning phase (Line 3 in Algorithm 3), expressed in Algorithm 4. Our hypothesis is that the longest edge may not be an appropriate cut point. For clarity, we illustrate the MCSSDBS enlarging process on our data sample in Fig. 4. Starting from Seed 1, the algorithm constructs a graph according to the new_rDist() values. Seed 1 connects to other observations in the edge sequence {1, 2, 3, …}. The process stops when another seed with a different label (Seed 2 in our example) is added to the tree. Here, SSDBSCAN looks for the currently largest edge, that is, edge 16, cuts this edge, and puts the point A into the same cluster as Seed 1. However, the point A should in fact belong to the same cluster as Seed 2. In our algorithm, MCSSDBS, the cut point is edge 7. Note also that if we cannot get any CL label from users (the response of users may be “We do not know” for all questions), the largest edge is used as the cut point, as in SSDBSCAN.
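The cut-point search with expert queries can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the path is given as a list of (u, v, weight) edges, the expert is modeled as a callable returning "ML", "CL", or None for "We do not know", and the search stops at the first CL answer. The names `identify_cut_point` and `expert`, and the toy path, are hypothetical.

```python
def identify_cut_point(path_edges, ask_expert):
    """Return (cut_edge, number_of_queries) for a non-empty path.

    path_edges: list of (u, v, weight), with weight standing in for new_rDist().
    ask_expert: callable (u, v) -> "ML", "CL", or None (no answer).
    """
    if not path_edges:
        raise ValueError("the path must contain at least one edge")
    ranked = sorted(path_edges, key=lambda e: e[2], reverse=True)
    queries = 0
    for u, v, w in ranked:
        queries += 1
        if ask_expert(u, v) == "CL":
            return (u, v, w), queries
    # No CL answer obtained: fall back to the largest edge, as in SSDBSCAN.
    return ranked[0], queries

# Hypothetical ground truth used to simulate the expert.
true_cluster = {0: 1, 1: 1, 2: 1, 3: 2, 4: 2}

def expert(u, v):
    return "ML" if true_cluster[u] == true_cluster[v] else "CL"

path = [(0, 1, 1.0), (1, 2, 1.2), (2, 3, 3.0), (3, 4, 4.0)]
cut, n_queries = identify_cut_point(path, expert)
fallback, _ = identify_cut_point(path, lambda u, v: None)
```

In this toy path the largest edge (3, 4) joins two points of the same cluster, so the expert answers ML and the second-largest edge (2, 3) is selected after two queries. Without any expert answer, the sketch falls back to the largest edge.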

In the third step (Line 6), ML constraints are used once more to merge isolated points that have a must-link with a point in a cluster. Finally, the remaining points, which do not belong to any cluster, can be seen as outliers. Line 7 is optional: the outliers can be added to their nearest clusters with respect to the new_rDist() values.

The algorithm complexity: The complexity of SSDBSCAN is $O(mN^{2}+mN\log N)$, in which $m$ is the number of seeds and $N$ is the number of data objects. The complexity of the steps for finding the cut point is calculated as follows. Given a graph with $N$ vertices, the minimum spanning tree of this graph has $N-1$ edges, so in the worst case, finding the cut point for each seed costs $O(N\log N)$ (the cost of sorting the edges), which gives $O(mN\log N)$ time for this step. Finally, the complexity of MCSSDBS is $O(mN^{2}+2mN\log N)$.

4. Experiment and results

4.1 Experiment setup

To evaluate the effectiveness of our new method, a series of experiments is conducted. First, we use data sets from UCI, which are the traditional data sets for evaluating clustering algorithms in the literature. We then apply MCSSDBS to the facial expression problem on the extended Cohn-Kanade (CK+) data set and on our new facial expression data set collected from Vietnamese citizens. In each run, constraints and seeds are randomly generated, and the reported results are averaged over 20 runs.

Evaluation criteria: The Rand Index is used to evaluate the clustering results, as in previous papers [20]. The Rand Index is a measure of the similarity between two data clusterings. Let $S=\{o_{1},o_{2},\ldots,o_{n}\}$ be a set of $n$ elements, and consider two partitions of $S$ to compare: $X=\{x_{1},\ldots,x_{r}\}$, a partition of $S$ into $r$ subsets, and $Y=\{y_{1},\ldots,y_{s}\}$, a partition of $S$ into $s$ subsets. To calculate the Rand Index, we count the number of pairs falling into each of the four possible cases shown in Table 1.

Table 1
Comparing the pairs

	Pairs in same cluster (X)	Pairs in different clusters (X)
Pairs in same cluster (Y)	a	b
Pairs in different clusters (Y)	c	d

Then, the Rand Index is defined as:

$\textit{RI}=\frac{a+d}{a+b+c+d}=\frac{a+d}{\frac{n!}{2!(n-2)!}}$ (2)

The value of RI is a number between 0 and 1 for the original version: RI $=$ 0 when the two clusterings have no similarities, while RI $=$ 1 when the clusterings are identical. In our experiments, we report RI as a percentage. A higher RI value indicates a better performance of the clustering algorithm.
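As a worked illustration of Eq. (2), the following Python sketch counts the four pair types of Table 1 and returns the Rand Index. It assumes that each partition is given as a list of cluster labels, one per element, and that there are at least two elements. The function name and the toy labels are hypothetical.

```python
from itertools import combinations

def rand_index(labels_x, labels_y):
    """Rand Index of two partitions given as per-element label lists (n >= 2)."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_x)), 2):
        same_x = labels_x[i] == labels_x[j]
        same_y = labels_y[i] == labels_y[j]
        if same_x and same_y:
            a += 1      # same cluster in both X and Y
        elif same_y:
            b += 1      # same in Y, different in X
        elif same_x:
            c += 1      # same in X, different in Y
        else:
            d += 1      # different in both
    return (a + d) / (a + b + c + d)

# Toy example with 4 elements, hence 6 pairs.
ri = rand_index([0, 0, 1, 1], [0, 0, 1, 2])
```

Here one pair is together in both partitions, four pairs are separated in both, and one pair is together only in X, so RI $=$ 5/6, i.e. about 83.3%.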

4.2 Experiments on UCI datasets

4.2.1 Description of UCI datasets

To evaluate the effectiveness of our algorithm, we compare MCSSDBS with SSDBSCAN using 6 data sets from the UCI Machine Learning Repository [19], namely Iris, Haberman, Soybean, Glass, Zoo, and Thyroid. The details of these data sets are shown in Table 2.

Table 2
Details of UCI data sets

Data	#Objects	#Attributes	#Clusters
Iris	150	4	3
Haberman	306	3	2
Soybean	47	35	4
Glass	214	9	6
Zoo	101	16	7
Thyroid	215	5	3

We fixed MinPts $=$ 3 in all experiments for MCSSDBS and SSDBSCAN. The labeled objects were randomly selected, independently of the number of classes in the data set, as {5, 10, 15, 20, 25} percent of the whole data set respectively, except for the Soybean and Zoo data sets, for which only {10, 15, 20, 25} percent of the objects are picked as seeds to ensure that this number is at least equal to the number of clusters. We also vary the number of pairwise constraints from 10 to 40, except for Soybean, the smallest data set, where the number of pairwise constraints takes the values 5, 10, 15, and 20. A must-link constraint is formed between two objects if they belong to the same cluster, and a cannot-link constraint otherwise.
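The random generation of seeds and pairwise constraints from ground-truth labels can be sketched as follows. This is a hedged illustration of the protocol described above, not the exact experimental code: the rounding rule for the number of seeds, the possibility of drawing the same pair twice, and the function name are assumptions of this example.

```python
import random

def sample_side_information(labels, seed_fraction, n_constraints, rng):
    """Draw seeds and ML/CL constraints at random from ground-truth labels."""
    n = len(labels)
    n_seeds = max(1, round(seed_fraction * n))
    seeds = {i: labels[i] for i in rng.sample(range(n), n_seeds)}
    must_link, cannot_link = [], []
    for _ in range(n_constraints):
        i, j = rng.sample(range(n), 2)  # two distinct objects
        if labels[i] == labels[j]:
            must_link.append((i, j))    # same cluster: must-link
        else:
            cannot_link.append((i, j))  # different clusters: cannot-link
    return seeds, must_link, cannot_link

# Hypothetical labels for 20 objects in 3 clusters.
labels = [0] * 8 + [1] * 6 + [2] * 6
seeds, must_link, cannot_link = sample_side_information(
    labels, seed_fraction=0.25, n_constraints=10, rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps each run reproducible, which is convenient when results are averaged over several runs.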

4.2.2 Experimental results

Figure 5.

Comparison results between MCSSDBS and SSDBSCAN on 6 widely-used data sets from UCI w.r.t Rand Index (the higher, the better).

The results for MCSSDBS and SSDBSCAN are averaged over 20 runs and depicted in Fig. 5a–f. The blue lines with square markers show the results of our proposed method MCSSDBS using both must-link (ML) and cannot-link (CL) constraints, the pink dashed lines with right-pointing triangle markers show the results of MCSSDBS without constraints, and the remaining lines show the results of the compared algorithm, SSDBSCAN. Based on these results, the following observations can be made:

• A glance at the graphs reveals that MCSSDBS, with or without constraints (ML $+$ CL), clearly outperforms SSDBSCAN in all experiments, indicating the benefit of using both seeds and constraints to build the clusters. In more detail, the improvement of MCSSDBS (ML $+$ CL) is most pronounced for the Haberman (Fig. 5b) and Thyroid (Fig. 5f) data sets, which give the two biggest improvements over SSDBSCAN, of around 6% and 4% respectively.

• The results obtained without constraints support our hypothesis that the longest edge may not be the best criterion in density-based clustering algorithms. MCSSDBS uses active learning in the iteration process, asking example-level queries on whether or not two observations belong to the same cluster, so that a more suitable cut point is detected. Thus, even without constraints, MCSSDBS works well compared to SSDBSCAN. Taking Iris as an example, the graph in Fig. 5a illustrates the advantage of the active learning phase in the iterative clustering process. With 5% of the whole data set used as seeds (corresponding to 8 seeds), SSDBSCAN reaches an RI of 89%. In contrast, MCSSDBS without constraints issues approximately 11.5 additional queries on average and, as a result, gains 3 percentage points, obtaining an RI of 92%.

4.3 A case study for facial expression recognition task

4.3.1 Introduction to facial expression data sets

Facial expression: Facial expression plays a major role in non-verbal communication. According to Mehrabian [21], 55% of communicative cues can be judged from facial expression. Human facial expression recognition (FER) is one of the most active research areas in the fields of Human Computer Interaction (HCI), medical applications, artificial intelligence based robotics, and automated access control. Back in the 1980s, Ekman and Friesen [22] presented 6 basic facial expressions: Happy, Surprise, Disgust, Sad, Angry, and Fear. They are known as the 6 universal emotions and are used by most researchers. Since then, the problem of classifying facial expressions in images has been widely studied, and many related techniques such as feature extraction, face detection, and image normalization have been used in the literature. In contrast to the facial expression classification task, facial expression clustering is an unsupervised learning task that focuses on grouping facial expressions into 6 groups. This task arises in some real applications. In this paper, we use two facial expression data sets, the extended Cohn-Kanade data set (CK+) and the ITI facial expression data set collected from Vietnamese citizens, to test our new algorithm.

CK+ dataset: The CK+ data set, released in 2010, was developed for automated facial image analysis and synthesis and for perceptual studies. The database consists of 593 posed sequences from 123 subjects. The CK+ data set provides protocols and baseline results, and is widely used to compare the performance of different models for facial expression recognition. We use 308 of the labeled expression images, containing 6 different facial expressions (anger, disgust, fear, happy, sadness, and surprise), for the experiments. The number of images belonging to each category is shown in Table 3.

ITI facial expression data set: In our current research direction, we want to use intelligent data analysis methods in real applications. One problem that we are studying is how to improve the quality of public services such as government services, airport services, and bank services, to mention a few. It is natural to record photos through camera systems and then extract the facial regions to evaluate citizens’ feedback. We propose to use clustering algorithms to group facial expression images according to their emotion. These results can then be used by organizations to evaluate their clients’ emotions. For this purpose, we collected the ITI facial expression data set in Vietnam.

The ITI facial data set was collected in 2017 in Vietnam for automated facial image analysis and synthesis and for perceptual studies. Three hundred and fifty-four photos were collected in real conditions with different illumination, pose, and resolution, with and without glasses, from 28 volunteers: 7 women and 21 men. The expression in each photo was rated by 2 subjects and then given an emotion label. We use the 354 labeled expression images, containing 6 different facial expressions (anger, disgust, happy, neutral, sadness, and surprise), for the experiments. The number of images belonging to each category is shown in Table 4. Figure 6 shows some examples of the collected images in the ITI data set. We also note that, in the facial expression task, it is not difficult to get labels from users for seeds and constraints compared with other domains such as gene and disease annotation, speech recognition, or media labeling [9].

Table 3
Distribution of CK+ data set

Category	No. of images	Total images
Angry	45	308
Disgust	58
Fear	25
Happy	69
Sadness	28
Surprise	83

Table 4

Distribution of ITI data set

Category	No. of images	Total images
Angry	38	354
Disgust	18
Happy	132
Neutral	102
Sadness	38
Surprise	26

Figure 6.

Examples of ITI data set.

4.3.2 Facial extraction procedure

Image pre-processing: Image pre-processing is a very important step in an FER system. In real working conditions, captured images often suffer from illumination and pose problems. To deal with such situations, and to depict only a face showing a certain expression with uniform size and shape, we propose an image pre-processing procedure consisting of the following components: eye region localization using the Viola and Jones method [23], orientation normalization, scale normalization based on a geometric face model [24], and brightness normalization with histogram equalization and median filtering.

After eye region localization, the left eye center $(x_{1},y_{1})$ and right eye center $(x_{2},y_{2})$ of a given facial image would be identified. Then, the orientation or the amount of horizontal rotation of the face is given by:

$\displaystyle\theta=\tan^{-1}\frac{|y_{1}-y_{2}|}{|x_{1}-x_{2}|}$

The direction of the rotation (clock-wise or counter clock-wise) depends on the difference $(y_{1}-y_{2})$ .
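The orientation formula above can be evaluated directly from the two eye centers. The sketch below is a minimal illustration with hypothetical names and coordinates. It uses `math.atan2` on the absolute differences, which gives the same angle as the formula and also handles the degenerate case $x_{1}=x_{2}$, and it returns the sign of $(y_{1}-y_{2})$ separately, since which sign corresponds to a clockwise rotation depends on the image coordinate convention.

```python
import math

def eye_line_rotation(left_eye, right_eye):
    """Return (angle in degrees, sign of y1 - y2) for two eye centers."""
    x1, y1 = left_eye
    x2, y2 = right_eye
    theta = math.atan2(abs(y1 - y2), abs(x1 - x2))
    sign = (y1 > y2) - (y1 < y2)  # +1, 0, or -1
    return math.degrees(theta), sign

# Hypothetical eye centers in pixel coordinates.
angle, direction = eye_line_rotation((100.0, 120.0), (200.0, 220.0))
level_angle, level_direction = eye_line_rotation((100.0, 120.0), (200.0, 120.0))
```

For the first pair, the vertical and horizontal offsets are equal, so the face is rotated by 45 degrees. For the second pair, the eyes are level and no rotation is needed.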

Note also that histogram equalization transforms the values in an intensity image so that the histogram of the output image is approximately flat. Median filtering is expected to (more or less) preserve edges while smoothing the image. It is particularly useful for salt-and-pepper noise, since such noisy pixels are very likely to appear at the beginning or at the end of the sorted pixel neighborhood and are therefore not selected as the median.
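The two brightness normalization operations can be illustrated on a tiny grayscale image stored as nested lists. This is a didactic pure-Python sketch, not the pipeline used in the experiments: the 3×3 window, the choice to leave border pixels unchanged, the CDF-based mapping, and the function names are all assumptions of this example.

```python
from statistics import median

def equalize_histogram(image, levels=256):
    """Map intensities through the normalized cumulative histogram."""
    flat = [p for row in image for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = min(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in image]
    scale = (levels - 1) / (n - cdf_min)
    return [[round((cdf[p] - cdf_min) * scale) for p in row] for row in image]

def median_filter3(image):
    """3x3 median filter; border pixels are copied unchanged."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            window = [image[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = median(window)
    return out

# A single "salt" pixel in a flat region is removed by the median filter.
noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
smoothed = median_filter3(noisy)
# A low-contrast image is stretched to the full intensity range.
equalized = equalize_histogram([[50, 50], [100, 200]])
```

In the sorted 3×3 window of the noisy image, the value 255 sits at the very end, so the median is 10 and the outlier disappears, which matches the argument made above.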

Figure 7.

Comparison results between MCSSDBS and SSDBSCAN on CK+ data set.

Figure 8.

Comparison results between MCSSDBS and SSDBSCAN on ITI facial data set.

Facial data extraction: After the face is located and normalized, the next step is to extract and represent the facial changes caused by the facial expression. In our work, we use 2D Gabor filters [25] to extract the features for a face representation. They work in much the same way as conventional filters. A Gabor filter is designed by tuning it to a specific band of spatial frequency and orientation. As is common in most practical cases [26, 27], we design the filters with five frequencies (reciprocals of frequency) and eight orientations. Additionally, because Gabor filters suffer from the disadvantage of a high-dimensional feature space, the obtained vectors are normalized to zero mean and unit variance, and their size is reduced by using PCA [28].
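A bank of 5 × 8 Gabor filters of the kind described above can be sketched with the standard real-valued Gabor kernel (a Gaussian envelope multiplied by a cosine carrier). The specific wavelengths, the kernel size, the ratio between sigma and the wavelength, and the aspect ratio below are illustrative assumptions and are not the parameter values used in the experiments.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Real part of a 2D Gabor filter on a size x size grid (size odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates to the filter orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + gamma * gamma * yr * yr)
                                / (2.0 * sigma * sigma))
            carrier = math.cos(2.0 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

# Assumed parameters: five wavelengths (in pixels) and eight orientations.
wavelengths = [4.0, 4.0 * math.sqrt(2), 8.0, 8.0 * math.sqrt(2), 16.0]
orientations = [k * math.pi / 8 for k in range(8)]
bank = [gabor_kernel(15, lam, th, sigma=0.56 * lam)
        for lam in wavelengths for th in orientations]
```

Convolving a normalized face image with each of the 40 kernels and concatenating the responses yields the high-dimensional feature vector that is subsequently standardized and reduced with PCA.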

4.3.3 Facial expression recognition results

Recognition accuracy: The performance of MCSSDBS for facial expression recognition on the CK+ and ITI data sets is given in Figs 7 and 8 respectively. The blue line with square markers indicates the experiments on MCSSDBS with different numbers of seeds and pairwise constraints. These experiments verify its effectiveness for facial expression recognition, yielding 89.5% clustering accuracy on CK+ and approximately 79% on the ITI facial data set when 25% of the data is used as seeds together with 400 pairwise constraints.

Contribution of must-link and cannot-link constraints in MCSSDBS: To determine whether must-link or cannot-link constraints are more beneficial in the MCSSDBS clustering process, we performed the experiments illustrated in Figs 7 and 8. The constraints are randomly generated in each run, in one setting with only must-link constraints (red lines with downward-pointing triangle markers) and in another with a set of both must-link and cannot-link constraints (blue lines with square markers). As explained, the first step of MCSSDBS uses the constraints in the construction of the graph. Here, if a cannot-link constraint exists between two data points, we do not calculate the distance between them. However, similarly to SSDBSCAN, MCSSDBS uses seeds to partition the graph into connected components. Thus, cannot-link constraints bring little advantage in the partitioning step. Must-link constraints, on the other hand, are used in finding the nearest-neighbor set of each data point. The red lines with downward-pointing triangle markers in Figs 7 and 8 show that MCSSDBS with only must-link constraints outperforms the other settings in all experiments. For example, on the CK+ data set, the algorithm achieves a high RI of 97.21% with 400 must-link constraints. It can also be seen that the more must-link constraints we have, the more benefit we obtain. These results confirm the key contribution of must-link constraints in our proposed algorithm.

5. Conclusion

In this paper, we have presented a density-based clustering algorithm, namely MCSSDBS, that extends our preliminary research [14] on semi-supervised clustering tasks. The special aspect of our algorithm is that it can integrate both kinds of side information: seeds and pairwise constraints. Moreover, an active learning phase is applied to help the algorithm detect the cut point during the iteration process.

Experimental results on UCI data sets show that our algorithm can boost the quality of clustering compared with SSDBSCAN. For the problem of facial expression recognition, our proposed method is verified on the extended Cohn-Kanade data set and on our new facial expression data set collected from volunteers among Vietnamese citizens. The good results obtained indicate that the semi-supervised clustering algorithm MCSSDBS is a good choice for the facial expression clustering task. Finally, in the near future, we will continue to apply MCSSDBS to other problems in Vietnam.

Acknowledgments

This research is funded by Vietnam National University, Hanoi (VNU) under project number QG.17.43.

References

1. Basu, Davidson and Wagstaff K.L., Constrained Clustering: Advances in Algorithms, Theory, and Applications, Chapman and Hall/CRC Data Mining and Knowledge Discovery Series, 1st ed., 2008.

2. Wagstaff K.L., Cardie, Rogers and Schroedl, Constrained K-means Clustering with Background Knowledge, in: Proceedings of the Eighteenth International Conference on Machine Learning (ICML), 2001, pp. 577–584.

3. Basu, Banerjee and Mooney, Semi-supervised Clustering by Seeding, Proceedings of the Nineteenth International Conference on Machine Learning (ICML), 2002, pp. 27–34.

4. Bensaid A.M., Hall L.O., Bezdek J.C. and Clarke L.P., Partially supervised clustering for image segmentation, Pattern Recognition 29(5) (1996), 859–871.

5. Mavroeidis, Accelerating spectral clustering with partial supervision, Data Mining and Knowledge Discovery 21(2) (2010), 241–258.

6. Lelis and Sander, Semi-supervised Density-Based Clustering, Proceedings of IEEE International Conference on Data Mining, 2009, pp. 842–847.

7. Ruiz, Spiliopoulou and Menasalvas, Density-based semi-supervised clustering, Data Mining and Knowledge Discovery 21(3) (2010), 345–370.

8. Böhm and Plant, HISSCLU: A hierarchical density-based method for semi-supervised clustering, Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology (EDBT'08), 2008, pp. 440–451.

9. Settles, Active learning literature survey, Computer Sciences Technical Report 1648, University of Wisconsin-Madison, 2010.

10. V.-V. and Labroche, Active seed selection for constrained clustering, Intelligent Data Analysis 21(3) (2017), 537–552.

11. V.-V., An efficient semi-supervised graph based clustering, Intelligent Data Analysis 22(2) (2018).

12. V.-V., Labroche and Bouchon-Meunier, Improving constrained clustering with active query selection, Pattern Recognition 45(4) (2012), 1749–1758.

13. V.-V., Labroche and Bouchon-Meunier, Active learning for semi-supervised k-means clustering, Proc. 22nd IEEE International Conference on Tools with Artificial Intelligence, 2010, pp. 12–15.

14. V.-V. and Do H.-Q., Density-based clustering with side information and active learning, The 9th International Conference on Knowledge and Systems Engineering, 2017, pp. 174–179.

15. Basu, Banerjee and Mooney R.J., Active Semi-Supervision for Pairwise Constrained Clustering, in: Proceedings of the 2004 SIAM International Conference on Data Mining (SDM), 2004, pp. 333–344.

16. Abin A.A. and Beigy, Active selection of clustering constraints: a sequential approach, Pattern Recognition 47(3) (2014), 1443–1458.

17. Abin A.A. and Beigy, Active constrained fuzzy clustering: A multiple kernels learning approach, Pattern Recognition 48(3) (2015), 953–967.

18. Ester, Kriegel H.-P., Sander and Xu, A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, in: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD), 1996, pp. 226–231.

19. Lichman, UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science, 2013.

20. Rand W.M., Objective criteria for the evaluation of clustering methods, Journal of the American Statistical Association 66(336) (1971), 846–850.

21. Mehrabian, Communication without Words, Psychology Today 1(2) (1968), 53–56.

22. Ekman and Friesen W.V., Constants across Cultures in the Face and Emotion, Journal of Personality and Social Psychology 17(2) (1971), 124–129.

23. Viola and Jones M.J., Robust real-time face detection, International Journal of Computer Vision 57 (2004), 137–154.

24. Shih F.Y. and Chuang C.-F., Automatic extraction of head and face boundaries and facial features, Information Sciences 158 (2004), 117–130.

25. Daugman, How Iris Recognition Works, IEEE Transactions on Circuits and Systems for Video Technology 14(1) (2004), 21–30.

26. Fasel and Luettin, Automatic facial expression analysis: A Survey, Pattern Recognition 36(1) (2003), 259–275.

27. Štruc and Pavešic, The Complete Gabor-Fisher Classifier for Robust Face Recognition, EURASIP Journal on Advances in Signal Processing, 2010.

28. Jolliffe I.T., Principal component analysis, Springer-Verlag, Berlin, 1986.

