From low-level geometric features to high-level semantics: An axiomatic fuzzy set clustering approach

Abstract

In this paper, we developed a new method to extract semantic facial descriptions by using an Axiomatic Fuzzy Set (AFS)-based clustering approach. Landmark-based geometry features are first used to represent facial components, and then we developed a new feature selection algorithm to select salient features based on feature similarities defined in AFS. Finally, the AFS-based clustering technique was used to extract the high-level semantic concepts. Extensive experiments showed that the proposed method can achieve much better results than the conventional clustering approaches like K-means and Fuzzy c-means clustering (FCM).

Keywords

Face representation semantic description AFS learning feature selection fuzzy clustering

1 Introduction

Content-based image retrieval (CBIR) aims to find images in an image database that closely match a query given by the user. In early CBIR systems, a user submits a query image, the system extracts local features such as color and texture [1], and it then tries to find possible matches of these local features in the given database. From the user’s perspective, a more desirable CBIR system can be achieved by improving its capability to process semantic queries, since in many cases an accurate query image may not be available or reliable. A semantic-based image retrieval system allows the user to input a query as a natural language expression [2].

Face image retrieval (FIR) is a special type of CBIR, where the goal is to find human faces similar to a query given either as an image or as a semantic description. Currently only a very limited amount of work has been done on semantic-based FIR systems due to its difficulty. One main challenge is to find a robust and reliable method that can automatically determine the semantic description of a face image based only on low-level features; this is known as the “semantic gap” problem [3]. Among existing FIR systems, one approach uses a probabilistic method based on local low-level features, such as color and texture, and high-level semantic labels to re-rank the database [4]. Another approach begins with 24 manually marked key points and associated keywords that characterize each face; singular value decomposition is applied to create a Latent Semantic Space that allows face images to be retrieved by a semantic query [5]. However, such a method is not appropriate for large databases, as all face images in the database need to be annotated manually. A third approach uses a fuzzy-based method for the semantic retrieval of face images. More specifically, it uses fuzzy sets to bridge the semantic gap between low-level visual features and high-level descriptions, and the fuzzy c-means method (FCM) is used to calculate the similarities between different subjects [6]. Fuzzy sets are attractive for narrowing the semantic gap because fuzzy descriptions map naturally onto high-level semantic descriptions such as “big eyes” and “long nose”. In all these works, the high-level semantic concept descriptions are required and are given partially or fully as ground truth information. This requirement hinders the applicability of these systems, since it is hard to obtain ground truth information for semantic descriptions objectively.

Traditional clustering methods, such as K-means, FCM, and the Gaussian Mixture Model (GMM), have been used for semantic-based clustering [6, 7]. However, these approaches are likely to produce poor results because they lack any relation to semantics or simply adopt the Euclidean distance [8]. To overcome the weakness of those clustering methods, Ren and Liu proposed an approach called AFSC [9]. It uses a landmark detector to extract facial components, such as the eyes and nose, with landmark points, then selects salient features based on a similarity principle, and finally applies an AFS-based fuzzy clustering method [10, 11] to extract semantic descriptions by clustering those facial components. As a by-product of AFS theory, fuzzy rules are extracted to represent the semantic description of the defined facial components. However, we found two shortcomings in that work. First, the features used are mainly based on overall size or area, and such features contain little shape information. Second, the feature selection algorithm tries to find similar features and discards dissimilar ones. As a result, it may lose part of the information and introduce large biases into the clustering process.

In this paper, we follow Ren and Liu’s framework, using facial landmarks to detect facial components and clustering them with the AFS-based fuzzy clustering method. However, we use more features related to local shape information and propose a new feature selection algorithm based on feature similarities. The relationship between the number of features and the clustering results is also investigated. Finally, the clustering results of the proposed approach are compared with those of FCM, K-means, and AFSC, and the comparison confirms the superiority of the proposed approach. In addition, the resulting stable semantic descriptors provide better semantic descriptions in terms of an area metric and other criteria.

The remainder of this paper is organized as follows: Section 2 briefly reviews related work on facial landmarks detection and AFS theory. Section 3 details the proposed semantic facial descriptor extraction method. Experimental results and the effect of parameter k are discussed in Section 4. And the paper is concluded in Section 5.

2 Related works

2.1 Facial landmarks detection

To extract semantic descriptions for each facial component, landmarks on the face are required, so automatic facial landmark detection is necessary for our work. Representative classic approaches are Active Appearance Models (AAMs) [12] and elastic graph matching [13, 14]. Recent work has focused on local part detectors, known as Constrained Local Models (CLMs) [15]. Detecting facial landmarks involves two major tasks. The first is face detection, which aims to find the face(s) in an image and extract their locations; one popular approach is the Viola and Jones face detector [16]. The second is facial landmark detection, which aims to extract facial components, such as the eyes, nose, mouth and face contour, from the detected face by placing landmarks on each component. Some recent facial landmark detectors are reported in [17–19]. Among them, Zhu and Ramanan’s approach [17] is considered one of the best for facial landmark detection.

Even though the performance of this model is sufficient for some applications, it provides only 68 landmark points for a frontal face, which are not sufficient to describe the semantic details of each component. Therefore, Liang and Liu [20] extended the frontal face model from 68 to 130 landmark points, which cover the facial components as regions instead of only lines in order to provide better shape information. Their results show that the model is significantly more accurate at detecting facial landmarks in frontal face images.

2.2 AFS theory

It is common for humans to describe a face with semantic concepts, such as “a face with large eyes, small nose and round face contour”, while the features handled by computers are always low-level visual features (e.g., color, texture, and curvature). The well-known semantic gap between these two levels of features is the major challenge for semantic-based image analysis, especially in face image retrieval. On the other hand, fuzzy sets have been shown to be effective in semantic image analysis [21, 22]. This motivates us to use a fuzzy model to bridge the semantic gap between the feature space and high-level concepts.

Fuzzy set theory, originally proposed by Zadeh [23], provides a general way to acquire linguistic IF-THEN rules. In conventional fuzzy theory, the membership functions are often specified manually by personal intuition, and the logic operations are implemented by a triangular norm (t-norm for short) that is chosen in advance and is independent of the distribution of the raw data. However, large-scale intelligent systems in real-world applications are usually very complex and contain so many concepts that it is difficult to define the membership functions manually.

AFS (axiomatic fuzzy set) algebra was proposed by Liu [10, 24–30], and it has been shown experimentally to be a powerful approach for semantic concept extraction and for the interpretation of fuzzy attributes [9, 32]. The aim of AFS is to explore how fuzzy set theory and probability theory can work in concert, so that the uncertainty of randomness and fuzziness of a concept can be treated in a unified and coherent manner. The main idea behind AFS is to transform the information extracted from the observed data into membership functions and to implement their logic operations. The studies in [30, 33] defined the membership functions based on the fuzzy logic operations by taking both fuzziness and randomness into account. One advantage of the AFS-based clustering algorithm is that it can calculate the membership degree between the feature vectors extracted from the images and the linguistic terms that characterize the semantics. An example demonstrating how the membership functions are computed was presented in [27]. The research monograph [30] offers a comprehensive introduction to AFS theory and its applications.

3 AFS-based semantic facial description

Our approach starts with automatic landmark detection, after which we extract geometric features, both global and local, from the landmark points. To remove redundant features and improve efficiency, we propose an unsupervised feature selection approach. A fuzzy clustering method is then applied to the processed data. With the cluster information, we extract semantic concepts via fuzzy membership. Figure 1 shows the overall framework of the approach.

3.1 Feature extraction from landmark points

Noting that all the features (Fig. 2) used in [9] are global features in terms of size, we argue that global features alone are not enough to keep the shape consistent within each cluster. Besides the global features used in their approach, we need local features that can depict the shape of facial components. We therefore define “star” features for all facial components, in which each feature is the Euclidean distance between the component’s center (e.g., the eye’s center) and one of the boundary points, as shown in Fig. 3. Note that the “star” feature is similar to one of the global features, the centroid distance, but the latter is the sum of the Euclidean distances of all “star” features for the eye. One can also notice from Fig. 3 that the “star” feature defined for the nose is quite different from the centroid distance of the nose: instead of using all boundary points, we pick only some key points related to the shape of the nose. The local features help keep the shape consistent within each cluster, while the global features are still needed to guide the clustering process, with the expectation that all eyes are eventually clustered into three classes of different size. We therefore combine the five features in Ren’s approach with the proposed 12 “star” features, so that 17 features in total are used in this paper.
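As a rough illustration of the “star” features, the sketch below computes the Euclidean distance from a component’s center to each of a chosen set of boundary landmarks. It assumes the center is the mean of the boundary points, which the text does not specify; the function name `star_features` and the sample coordinates are hypothetical.

```python
import math


def star_features(boundary_points, center=None):
    """Distances from the component center to each boundary landmark.

    Assumption: when no center is given, the mean of the boundary
    points is used (the paper does not state how the center is found).
    """
    if center is None:
        n = len(boundary_points)
        center = (sum(x for x, _ in boundary_points) / n,
                  sum(y for _, y in boundary_points) / n)
    cx, cy = center
    return [math.hypot(x - cx, y - cy) for x, y in boundary_points]


# Hypothetical eye outline: four landmarks on an axis-aligned diamond.
eye = [(-2.0, 0.0), (0.0, 1.0), (2.0, 0.0), (0.0, -1.0)]
star = star_features(eye)        # one local "star" feature per landmark
centroid_distance = sum(star)    # global feature: sum of the star distances
```

In this toy outline the horizontal distances are twice the vertical ones, so the individual “star” values carry shape information that the single summed centroid distance does not.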

3.2 Unsupervised feature selection based on fuzzy similarity

After combining the global and local features, some features are clearly redundant, especially among the 12 “star” features. For the eye, one can simply use f₁, f₄, f₇ and f₁₀ to represent the “star” features, but it is hard to find a similarly useful subset for the nose or other components. We therefore propose an unsupervised feature selection algorithm based on fuzzy similarity in AFS theory. For features α and β, the similarity between them is defined as follows: ${SI}_{α β} = \frac{\sum_{x \in X} μ_{α \land β} (x)}{\sum_{x \in X} μ_{α \lor β} (x)}$ (1) where X is the set of samples, and $μ_{α \land β} (x)$ and $μ_{α \lor β} (x)$ are the membership degrees of sample x in the fuzzy sets α ∧ β and α ∨ β, respectively. The membership function is defined as: $μ_{ξ} (x) = \sup_{i \in I} \inf_{γ \in A_{i}} \frac{\sum_{u \in A_{i}^{⪰} (x)} ρ_{γ} (u) N_{u}}{\sum_{u \in X} ρ_{γ} (u) N_{u}}, \forall x \in X$ (2) where N_u is the number of times sample u is observed and ρ_γ is the weighting function [27].

The similarity is motivated by the Jaccard similarity coefficient, which measures the similarity between finite sample sets and is defined as the size of the intersection divided by the size of the union of the sample sets. Here, instead of the sizes of sample sets, we use the fuzzy membership functions. Fig. 4 gives an intuitive graphical explanation of the similarity.
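The Jaccard-style ratio in Eq. (1) can be sketched as follows. This is only an illustration: it assumes the membership degrees of each sample are already available as plain lists, and it uses the pointwise minimum and maximum as stand-ins for the AFS memberships of α ∧ β and α ∨ β, whereas the paper computes those memberships with Eq. (2). The function name `fuzzy_similarity` is hypothetical.

```python
def fuzzy_similarity(mu_a, mu_b):
    """Fuzzy Jaccard-style similarity of two features, in the spirit of Eq. (1).

    mu_a[i] and mu_b[i] are the membership degrees of sample i.
    Assumption: min/max replace the AFS memberships of "a and b" and
    "a or b"; the paper derives those from Eq. (2) instead.
    """
    intersection = sum(min(a, b) for a, b in zip(mu_a, mu_b))
    union = sum(max(a, b) for a, b in zip(mu_a, mu_b))
    return intersection / union if union > 0 else 0.0


# Hypothetical membership degrees of three samples for two features.
mu_alpha = [0.2, 0.8, 0.5]
mu_beta = [0.4, 0.6, 0.5]
si = fuzzy_similarity(mu_alpha, mu_beta)  # (0.2 + 0.6 + 0.5) / (0.4 + 0.8 + 0.5)
```

A feature compared with itself gives a similarity of 1, and the value decreases toward 0 as the two membership profiles diverge.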

The proposed feature selection mainly involves two steps: (1) each feature is voted on by all the others based on pairwise similarity, and the feature that is most similar to the others is found; (2) starting from the selected features, one new feature, the one most dissimilar to the already selected features, is added at each iteration. Intuitively, the first feature picked contains the most information, much like a principal component. In the next step we search for the feature most dissimilar to the existing ones, which is analogous to an orthogonal principal component.

Suppose we have n features. For each feature f_i (1 ≤ i ≤ n), we compute the similarity SI_ij with all features and rank them as f_ij (1 ≤ j ≤ n) in descending order; f_ij is thus a re-ordering of {f₁, f₂, ⋯ , f_n}. Let S_ij be the score that f_i assigns to the feature at position j of the list f_ij, i.e., $S_{ij} = j - 1 (1 \leq j \leq n)$ (3) where 0 ≤ S_ij ≤ n - 1 (S_i1 = 0, S_i2 = 1, . . . , S_in = n - 1). It is easy to see that the more dissimilar two features are, the larger the score S_ij. We then sum the scores that each feature receives from the others, so that each feature f_j has a total score S_j, $S_{j} = \sum_{i = 1}^{n} S_{ij}$ (4)

Fig. 5 illustrates an example of feature voting. Note that the score is a “dissimilarity score”. The “base” feature is therefore defined as the one most similar to all the others, i.e., the one with the lowest score. Let F represent the selected feature set; then $F_{1} = \{f_{j} | j = \arg\min_{1 \leq j \leq n} S_{j}\}$ (5)

After the “base” feature is found, the rest is done by a greedy process. Technically, all features are re-scored based on the selected feature set F as stated below. Let N_F be the size of F; the new score of f_j is the sum of the scores it receives from the N_F selected features, $S_{j} = \sum_{i = 1}^{N_{F}} S_{ij}$ (6)

Our aim is then to select another feature that is the most dissimilar to those already selected, i.e., the one with the largest re-computed score, so the new feature f_j for our selection is $F_{j} = \{f_{j} | j = \arg\max_{1 \leq j \leq n} S_{j}\}$ (7)

One can see that if the above procedure is continued, all features will eventually be selected one by one, so a stopping criterion is required. One intuitive choice is that the score of a newly added feature should be no smaller than the “average score” of the selected feature set F, i.e., $S_{j} \geq \bar{S_{F}}$ (8) where $\bar{S_{F}} = n N_{F} k$ $(0 \leq k \leq 1)$ . The number of selected features is thus controlled by the parameter k, which is in fact the ratio of the average score to the total score. The effect of the parameter k is analyzed in Section 4.3. Another choice for stopping is experimental validation, as discussed in Section 4.3.

Algorithm 1 Unsupervised Feature Selection based on Similarity

Require:

Feature similarity matrix M_n×n.

Ensure:

Selected feature set F

1: Let S_ij represent the score of feature f_j given by feature f_i, and S_j be the total score of f_j (refer to Eq. (3) and Eq. (4));

2: Sort each row of M in descending order and score each feature based on its rank;

3: for j := 1 to n do

4: $S_{j} = \sum_{i = 1}^{n} S_{ij}$ ;

5: end for

6: i := 1;

7: $F_{i} = \arg\min_{1 \leq j \leq n} S_{j}$ ;

8: i ← i + 1;

9: while i ≤ n do

10: for j := 1 to n do

11: if f_j ∈ F then

12: S_j = 0;

13: else

14: $S_{j} = \sum_{f_{k} \in F} S_{kj}$ ;

15: end if

16: end for

17: if $\max_{1 \leq j \leq n} S_{j} < \bar{S_{F}}$ then

18: break;

19: end if

20: $F_{i} = \arg\max_{1 \leq j \leq n} S_{j}$ ;

21: i ← i + 1;

22: end while

23: return F.
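Algorithm 1 can be sketched in Python as follows. This is one possible reading under stated assumptions, not the authors’ implementation: re-scoring sums the rank scores given by the already selected features (Eq. (6)), the stopping rule follows Eq. (8) with the threshold n·N_F·k, and ties in the ranking are broken by feature index. The function names and the 4×4 similarity matrix are hypothetical.

```python
def rank_scores(sim):
    """scores[i][j] is the rank (0 = most similar) of feature j in the
    list of all features sorted by descending similarity to feature i."""
    n = len(sim)
    scores = [[0] * n for _ in range(n)]
    for i in range(n):
        # Stable sort: ties are broken by feature index (an assumption).
        order = sorted(range(n), key=lambda j: -sim[i][j])
        for rank, j in enumerate(order):
            scores[i][j] = rank
    return scores


def select_features(sim, k):
    """Greedy selection: start from the most typical feature, then keep
    adding the feature most dissimilar to the ones already selected."""
    n = len(sim)
    scores = rank_scores(sim)
    totals = [sum(scores[i][j] for i in range(n)) for j in range(n)]
    selected = [min(range(n), key=lambda j: totals[j])]  # "base" feature
    while len(selected) < n:
        rest = [j for j in range(n) if j not in selected]
        # Re-score each candidate using only the selected features.
        rescored = {j: sum(scores[i][j] for i in selected) for j in rest}
        best = max(rest, key=lambda j: rescored[j])
        threshold = n * len(selected) * k  # n * N_F * k
        if rescored[best] < threshold:
            break
        selected.append(best)
    return selected


# Hypothetical 4-feature similarity matrix (symmetric, unit diagonal).
sim = [
    [1.0, 0.9, 0.5, 0.2],
    [0.9, 1.0, 0.6, 0.3],
    [0.5, 0.6, 1.0, 0.4],
    [0.2, 0.3, 0.4, 1.0],
]
print(select_features(sim, 0.5))  # indices of the selected features
```

With this toy matrix, k = 1 keeps only the base feature and k = 0 selects every feature, which matches the role of k as the knob controlling how many features survive.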

3.3 AFS clustering

3.3.1 Fuzzy characteristic description for facial components

To apply AFS clustering, the first step is to transfer the data from the feature space to the AFS fuzzy space. Define the facial component set F = {left eye, right eye, nose, mouth, …}, f ∈ F. Suppose we choose the top k features from the feature ranking in the previous step. We then divide each feature into 3 fuzzy parts with the semantic interpretations “large”, “medium” and “small”, represented by the fuzzy terms $M^{f} = \{m_{i, j}^{f} | 1 \leq i \leq k, 1 \leq j \leq 3\}$ , where $m_{i, j}^{f}$ is the jth fuzzy term associated with the ith feature of facial component f.

Let $I_{k}^{f}$ represent the facial component f ∈ F on the kth face image I_k. For example, $I_{k}^{f_{1}}$ is the right eye on the kth face image I_k. Let $μ_{m} (I_{k}^{f})$ be the membership degree of $I_{k}^{f}$ belonging to fuzzy term m. Then we select salient fuzzy attributes/terms to represent each facial component significantly, which we call fuzzy characteristic description. For such purpose, we first define a set $B_{I_{k}^{f}}$ of fuzzy terms for $I_{k}^{f}$ which can be given as follows:

$B_{I_{k}^{f}} = \{m \in M^{f} | μ_{m} (I_{k}^{f}) = μ_{\lor_{b \in M^{f}} b} (I_{k}^{f})\}$ (9)

where $μ_{\lor_{b \in M^{f}} b} (I_{k}^{f}) = max_{m \in M^{f}} {μ_{m} (I_{k}^{f})}$ , i.e., the membership degrees of $I_{k}^{f}$ belonging to fuzzy terms in $B_{I_{k}^{f}}$ should be the largest value among the membership degrees of $I_{k}^{f}$ belonging to fuzzy terms in M^f. This set $B_{I_{k}^{f}}$ is in fact all the best fuzzy terms to characterize this facial component f and then the fuzzy facial component characterization can be given as $ζ_{I_{k}^{f}} = \land_{β \in B_{I_{k}^{f}}} β,$ (10)

where “∨” and “∧” are the fuzzy logic operations in AFS algebra defined in [30], and the semantic logic expressions of “∨” and “∧” are “or” and “and”. The computation details for $μ_{m} (I_{k}^{f})$ , $B_{I_{k}^{f}}$ and $ζ_{I_{k}^{f}}$ can be found in [27]. The fuzzy terms in $B_{I_{k}^{f}}$ can be regarded as the most salient characteristics of $I_{k}^{f}$ (the facial component f of the face image I_k). Thus, all the most salient characteristics are combined to describe $I_{k}^{f}$ as in Equation (10). Keep in mind that this whole process is carried out separately for each facial component.
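The selection in Equations (9) and (10) can be illustrated with a short sketch. It assumes the membership degrees of one facial component for every fuzzy term have already been computed (the paper obtains them as in [27]); the function name, the term labels and the tolerance `tol` for comparing floating-point degrees are hypothetical.

```python
def characteristic_terms(memberships, tol=1e-9):
    """Return the fuzzy terms whose membership equals the maximum (Eq. (9)).

    memberships maps a term label such as ("f15", "large") to the
    membership degree of one facial component for that term.
    The returned terms are meant to be combined by conjunction (Eq. (10)).
    """
    top = max(memberships.values())
    return sorted(t for t, mu in memberships.items() if top - mu <= tol)


# Hypothetical membership degrees for one eye.
degrees = {
    ("f15", "large"): 0.9,
    ("f15", "small"): 0.1,
    ("f11", "large"): 0.9,
    ("f11", "small"): 0.05,
}
terms = characteristic_terms(degrees)
description = " and ".join(f"{size} {feat}" for feat, size in terms)
```

Here two terms tie for the largest membership, so the description of this eye is their conjunction.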

3.3.2 Facial component clustering

After the data have been transferred to the AFS fuzzy space, the similarity of facial components $I_{i}^{f}$ and $I_{j}^{f}$ is defined as follows: $r_{ij} = \min \{μ_{ζ_{I_{i}} \land ζ_{I_{j}}} (I_{i}), μ_{ζ_{I_{i}} \land ζ_{I_{j}}} (I_{j})\}$ (11) where r_ij (0 ≤ r_ij ≤ 1) represents the similarity between $I_{i}^{f}$ and $I_{j}^{f}$ . The larger r_ij, the more similar $I_{i}^{f}$ and $I_{j}^{f}$ are. We thus obtain a relation matrix R = (r_ij) _n×n for each facial component. However, the similarity matrix R = (r_ij) _n×n does not necessarily satisfy the fuzzy transitive condition r_ij ≥ ∨ _k (r_ik ∧ r_jk), where ∨ and ∧ stand for max and min, respectively. Usually an object is considered similar to another if and only if the degree of similarity between them is greater than or equal to a predefined threshold α. The transitive condition therefore states that, for any three objects i, j and k, if object i is similar to object k (r_ik ≥ α) and object k is similar to object j (r_kj ≥ α), then object i is similar to object j (r_ij ≥ α) as well. Since the transitive condition is indispensable for clustering, the matrix is transformed into its Transitive Closure (denoted by TC (R) = (t_ij) _n×n). TC (R) is defined as the minimal symmetric and transitive matrix containing R. Usually, TC (R) is obtained by searching for an integer k such that (R^k) ² = R^k. With a given α, objects can now be partitioned into different clusters. The optimal threshold α for the best clustering is selected, as proposed in [27], based on a validation index $I_{α}$ defined as follows: $I_{α} = \frac{\sum_{k = 1, 2, \dots, n} μ_{ζ_{bou}} (I_{k}^{f})}{\sum_{k = 1, 2, \dots, n} μ_{ζ_{Total}} (I_{k}^{f})}$ (12) where $ζ_{bou} = \lor_{1 \leq i, j \leq l, i \neq j} (ζ_{{\bar{C}}_{i}} \land ζ_{{\bar{C}}_{j}})$ , $ζ_{Total} = \lor_{1 \leq i \leq l} ζ_{{\bar{C}}_{i}}$ , l ≥ 2. The fuzzy set ζ_bou describes the boundaries among different clusters and thus reflects the clarity of the clusters. The smaller the degree to which an object belongs to ζ_bou, the more clearly it is clustered, which is analogous to compactness. Thus the smaller $I_{α}$ , the clearer the clustering. ζ_Total represents the overall characterization of all clusters, which can be treated as separateness. In practice, the threshold α lies between the minimum and maximum values q_ij in the matrix Q. Let U = {α₁, α₂, …, α_u} = {q_ij| 1 ≤ i ≤ n, 1 ≤ j ≤ n} be the set of all the entries in Q, with α₁ < α₂ < … < α_u. The best α is selected based on $I_{α}$ : $α = \arg\min_{α_{v} \in \{α_{1}, α_{2}, \dots, α_{u}\}} I_{α_{v}}$ (13)
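The transitive closure and the α-cut partition described above can be sketched as follows. The sketch assumes R is symmetric with a unit diagonal, so that repeated max-min squaring converges and the α-cut yields disjoint clusters; it does not cover the selection of α through the validation index, which needs the AFS membership functions. All function names and the 3×3 matrix are hypothetical.

```python
def max_min_compose(a, b):
    """Max-min composition of two square fuzzy relation matrices."""
    n = len(a)
    return [[max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(r):
    """Square R repeatedly until the result no longer changes.

    Assumption: r is symmetric with ones on the diagonal.
    """
    current = [row[:] for row in r]
    while True:
        squared = max_min_compose(current, current)
        if squared == current:
            return current
        current = squared


def alpha_cut_clusters(t, alpha):
    """Group objects i and j together whenever t[i][j] >= alpha."""
    clusters, seen = [], set()
    for i in range(len(t)):
        if i in seen:
            continue
        members = [j for j in range(len(t)) if t[i][j] >= alpha]
        seen.update(members)
        clusters.append(members)
    return clusters


# Hypothetical similarity matrix of three facial components.
R = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.6],
    [0.3, 0.6, 1.0],
]
T = transitive_closure(R)
clusters = alpha_cut_clusters(T, 0.7)
```

In this example the closure raises the similarity between objects 0 and 2 from 0.3 to 0.6, because both are linked through object 1; lowering α from 0.7 to 0.6 then merges all three objects into one cluster.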

The clusters ${\bar{C}}_{1}, {\bar{C}}_{2}, \dots, {\bar{C}}_{l}$ that contain more than one face image can then be obtained (i.e., clusters with a single face image are discarded). In this way, we obtain the initial clusters ${\bar{C}}_{1}, {\bar{C}}_{2}, \dots, {\bar{C}}_{l}$ for Q = (q_ij).

Recall that $ζ_{I_{k}^{f}}$ is a characterization of $I_{k}^{f}$ , and that we have obtained the initial clusters ${\bar{C}}_{i}$ . We then select the best $ζ_{I_{k}^{f}}$ for constructing the semantic description of each cluster based on the initial clusters as follows. $Γ_{i} = \{ζ_{I_{k}^{f}} | \frac{| \{y | y \in {\bar{C}}_{i}, μ_{ζ_{I_{k}^{f}}} (y) \geq λ\} |}{| {\bar{C}}_{i} |} \geq ω\}$ (14)

The elements in Γ_i are some of the fuzzy characteristic descriptions $ζ_{I_{k}^{f}}$ in the ith cluster ( ${\bar{C}}_{i}$ ). The motivation for this selection is that only some representative descriptions $ζ_{I_{k}^{f}}$ can be used to represent the semantic description of ${\bar{C}}_{i}$ , while the others are not typical enough to represent the cluster or are noise. Therefore, the fuzzy descriptions of the representative face images that can represent their facial component cluster are collected in Γ_i as in Equation (14). All the most salient characteristics in Γ_i are combined to describe the initial facial component cluster ${\bar{C}}_{i}$ . Consequently, the semantic description of each facial component cluster can be defined as follows: $ζ_{{\bar{C}}_{i}} = \land_{γ \in Γ_{i}} γ, (\text{if } Γ_{i} \neq \emptyset)$ (15)

As explained previously, this set represents the salient description for each initial cluster. One can see that the universality and particularity of the semantic description of ${\bar{C}}_{i}$ can be controlled by ω and λ. The effects of the variation of parameter ω and λ are analyzed in [27], and plenty of experiments illustrate that the algorithm is not sensitive to the setting of the parameters if the parameters are selected in reasonable intervals.

The semantic descriptions $ζ_{{\bar{C}}_{i}}$ of the facial component clusters can then be regarded as a classifier and used to classify all the instances $I_{k}^{f}, k = 1, 2, \dots, n$ again. This process is called the re-clustering process. Its aim is to revise the initial clusters and to assign the lost instances whose similarities r_ij with other instances are lower than α. Finally, $I_{k}^{f}$ is re-clustered by measuring the membership degrees of $I_{k}^{f}$ belonging to $ζ_{{\bar{C}}_{1}}, ζ_{{\bar{C}}_{2}}, \dots, ζ_{{\bar{C}}_{l}}$ as follows: $I_{k}^{f} \in C_{q}, \text{ if } q = \arg\max_{1 \leq i \leq l} \{μ_{ζ_{{\bar{C}}_{i}}} (I_{k}^{f})\}$

i.e., if the membership degree of $I_{k}^{f}$ belonging to $ζ_{{\bar{C}}_{q}}$ is the largest among the membership degrees of $I_{k}^{f}$ belonging to $ζ_{{\bar{C}}_{1}}, ζ_{{\bar{C}}_{2}}, \dots, ζ_{{\bar{C}}_{l}}$ , then $I_{k}^{f} \in C_{q}$ .
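The selection of representative descriptions in Equation (14) and the final re-clustering rule can be sketched as below. The sketch assumes the membership degrees have already been computed with the AFS machinery of [27] and are passed in as plain numbers; the function names, labels and sample values are hypothetical.

```python
def representative_descriptions(cluster_memberships, lam, omega):
    """Eq. (14): keep a description if at least a fraction omega of the
    cluster members belong to it with degree >= lam.

    cluster_memberships maps a description label to the membership
    degrees of every member of one initial cluster for that description.
    """
    kept = []
    for label, degrees in cluster_memberships.items():
        covered = sum(1 for mu in degrees if mu >= lam)
        if covered / len(degrees) >= omega:
            kept.append(label)
    return kept


def recluster(membership_per_cluster):
    """Index of the cluster description with the largest membership."""
    return max(range(len(membership_per_cluster)),
               key=lambda i: membership_per_cluster[i])


# Hypothetical initial cluster with four members and two candidates.
candidates = {
    "zeta_a": [0.9, 0.8, 0.7, 0.2],
    "zeta_b": [0.9, 0.3, 0.2, 0.1],
}
gamma = representative_descriptions(candidates, lam=0.6, omega=0.75)

# Hypothetical memberships of one instance in three cluster descriptions.
assigned = recluster([0.35, 0.8, 0.1])
```

Raising ω or λ keeps fewer descriptions in Γ_i, which is how the two parameters trade off universality against particularity of the cluster description.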

4 Experimental results

Empirical tests, observations, analysis and evaluations are presented in this section. We use 249 frontal faces from 249 subjects in session one of Multi-PIE and 134 frontal faces from 134 subjects in session one of the AR database. Since the face sizes vary in scale, the images were scaled so that the Euclidean distance between the pupil centers was fixed at the average distance of the database. The results are presented in two aspects. The first is semantic extraction, where we show the relationship between semantic concepts and low-level features. Each facial component is clustered into three groups: “large”, “medium” and “small”. “Medium” is not a useful semantic concept in terms of applications, since it is not very informative to describe a person as having a “medium eye” or “medium nose”. Therefore we consider only the two salient clusters, “large” and “small”. For the second aspect, we show that the clustering result of the proposed approach is better than those obtained by K-means and FCM in terms of the area of facial components and some defined clustering indexes.
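The scale normalization mentioned at the start of this section can be sketched on landmark coordinates. This is an assumption-laden illustration: the paper scales the images themselves, whereas the sketch rescales landmark coordinates about the origin, and the function name and sample values are hypothetical.

```python
import math


def normalize_scale(landmarks, left_pupil, right_pupil, target_distance):
    """Scale landmark coordinates so that the inter-pupil distance
    equals target_distance (e.g., the database average)."""
    dx = right_pupil[0] - left_pupil[0]
    dy = right_pupil[1] - left_pupil[1]
    scale = target_distance / math.hypot(dx, dy)
    return [(x * scale, y * scale) for x, y in landmarks]


# Hypothetical face: pupils 40 pixels apart, database average 60 pixels.
points = normalize_scale([(10.0, 20.0)], (0.0, 0.0), (40.0, 0.0), 60.0)
```

After this step, geometric features such as distances and areas are comparable across faces photographed at different scales.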

4.1 Eye clustering

4.1.1 Comparison of semantic concepts

After feature selection on the 17 features, the global feature centroid distance and some “local” features are selected for the two databases, as shown in Fig. 6.

From Fig. 6 we can see that, besides centroid distance (one of the features in Ren’s approach), which is selected as the “global” feature, three “local” features are also selected to keep the shape information. This is very important, because the global feature can guide the clustering process while the local features help keep the shape consistent. In addition, these features are selected by an automatic feature selection algorithm: for both the Multi-PIE and AR data sets, one global feature and three local features are selected. To illustrate the advantage of the feature selection and eye clustering, the relationships between the semantic concepts and these low-level features after clustering are shown below.

Multi-PIE Large Eyes Cluster: $ζ_{C_{l}} = m_{15, 1}^{f_{1}} m_{11, 1}^{f_{1}} m_{1, 1}^{f_{1}}$ , with the semantic rules: “The eyes in this cluster have large f₁₅, large f₁₁ and large f₁”;

Multi-PIE Small Eyes Cluster: $ζ_{C_{s}} = m_{15, 2}^{f_{1}} m_{11, 2}^{f_{1}} m_{3, 2}^{f_{1}}$ , with the semantic rules: “The eyes in this cluster have small f₁₅, small f₁₁ and small f₃”.

AR Large Eyes Cluster: $ζ_{C_{l}} = m_{15, 1}^{f_{1}} m_{10, 1}^{f_{1}} m_{7, 1}^{f_{1}}$ , with the semantic rules: “The eyes in this cluster have large f₁₅, large f₁₀ and large f₇”;

AR Small Eyes Cluster: $ζ_{C_{s}} = m_{15, 2}^{f_{1}} m_{10, 2}^{f_{1}} m_{7, 2}^{f_{1}}$ , with the semantic rules: “The eyes in this cluster have small f₁₅, small f₁₀ and small f₇”.

From the above cluster descriptions, it is easy to see that the characteristic combinations contain not only the global feature $m_{15}^{f_{1}}$ (centroid distance) but also local features related to the height and width of the eye, which makes the clustering better and more consistent. Figure 7 shows the improvement of our method in terms of shape. In Ren’s approach, some long and narrow eyes are clustered into the large eye group, which is not appropriate.

To show the quality of the clusters visually, Fig. 8 shows four example face images from the Large Eyes cluster and four from the Small Eyes cluster. The comparison is very encouraging in that the eyes of persons in the Large Eyes cluster are distinctly larger than those in the Small Eyes cluster, and the shape is also kept more consistent.

4.1.2 Comparison of clustering results with different clustering approaches

There is no semantic ground truth in the available data sets against which to test the accuracy of our results. As we are actually clustering the “size” of the facial components, the area is chosen as the validity criterion, since it is the closest to human perception (i.e., a larger eye is one with a larger area). The area criterion is also adopted in [9]. In this paper, a new class separability index is also defined for evaluating the clustering performance. The three indexes are listed below.

The average area of the small eyes’ class $\bar{A_{S}}$ . $\bar{A_{S}}$ is defined as the mean area over all subjects in the small cluster; a lower value means that the eyes in the small cluster are smaller than average;

The average area of the large eyes’ class $\bar{A_{L}}$ . $\bar{A_{L}}$ is the mean area over the large cluster; a higher value means that the eyes in the large cluster are larger than average;

The class separability CS. CS is defined as the quotient of the minimum inter-class distance and the maximum intra-class distance. $CS = \frac{\min \{D_{mn} | D_{mn} = \sum_{j = 1}^{M} \sqrt{({\bar{X}}_{m, j} - {\bar{X}}_{n, j})^{2}}\}}{\max \{D_{pq} | D_{pq} = \sum_{j = 1}^{M} \sqrt{(X_{p, j} - X_{q, j})^{2}}\}}$ where X_p,j denotes the jth feature value of sample p, ${\bar{X}}_{m}$ is the center (mean) of the mth cluster, and k and M are the numbers of clusters and features, respectively.
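The class separability index can be computed as in the sketch below. Each term $\sqrt{(a - b)^{2}}$ equals |a − b|, so both distances are sums of absolute feature differences. The sketch assumes that the maximum in the denominator is taken over pairs of samples within the same cluster, following the phrase “intra-class distance”, and that at least one cluster has two distinct samples; the function names and data are hypothetical.

```python
def l1_distance(a, b):
    """Sum over features of |a_j - b_j|."""
    return sum(abs(x - y) for x, y in zip(a, b))


def class_separability(clusters):
    """Minimum distance between cluster centers divided by the maximum
    distance between two samples of the same cluster.

    clusters is a list of clusters, each a list of feature vectors.
    """
    centers = [[sum(col) / len(c) for col in zip(*c)] for c in clusters]
    inter = min(l1_distance(centers[m], centers[n])
                for m in range(len(centers))
                for n in range(m + 1, len(centers)))
    intra = max(l1_distance(p, q) for c in clusters for p in c for q in c)
    return inter / intra


# Hypothetical two-feature data for a "small" and a "large" cluster.
small = [[1.0, 1.0], [2.0, 2.0]]
large = [[5.0, 5.0], [6.0, 7.0]]
cs = class_separability([small, large])  # 8.5 / 3
```

A larger value means the cluster centers are far apart relative to the spread inside the clusters, which is why higher CS indicates a cleaner partition.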

Our proposed method is compared with the conventional clustering algorithms K-means and FCM, as well as the fuzzy approach AFSC, on the Multi-PIE and AR face image data sets. Because K-means and FCM are sensitive to initial conditions, each is run 100 times and the average result is reported.

From Table 1 we can see that, compared with K-means and FCM, our method is much better on all three indices for both Multi-PIE and AR. That is, our method obtains clearer and more reliable clusters in terms of the size and shape of the eye. In addition, K-means and FCM only produce a partition into clusters, with no semantic descriptions that could be further applied in a semantic FIR system. Compared with Ren’s approach AFSC, our method performs significantly better on the Multi-PIE data set and slightly better on the AR data set. It is worth mentioning that in AFSC the area of a facial component is itself one of the low-level features while at the same time acting as the clustering criterion, which can bias the result in its favor. In our algorithm, by contrast, the features used are the “star” features, which are not directly associated with area. In this sense, our approach is more objective.

4.2 Nose clustering

We further apply the algorithm to nose clustering and analyze both the semantic results and the clustering results, as for eye clustering. For the clustering performance, we use the evaluation criteria defined in Section 4.1.2.

4.2.1 Comparison of semantic concepts

Figure 9 shows the features selected by our feature selection algorithm for nose clustering. As we can see, two global features from Ren’s approach (centroid distance and height) and two local “star” features are selected for the AR database, while centroid distance and four local features are selected for the Multi-PIE database. The combination of global and local features is better than the features used in Ren’s approach, as the global feature can guide the clustering process while the local features help keep the shape consistent within each cluster.

To demonstrate the superiority of these feature combinations, we present below the clustering descriptions, each of which is the conjunction of the descriptions of samples in a cluster. Note that a clustering description is a representation in terms of low-level features, while in terms of high-level semantics we can easily tell which cluster is “large nose” and which is “small nose”. The semantic concepts are thus expressed by low-level features, and the semantic gap is naturally bridged.

Multi-PIE Large Noses Cluster: $ζ_{C_{l}} = m_{15, 1}^{f_{2}} m_{14, 1}^{f_{2}} m_{1, 1}^{f_{2}} m_{10, 1}^{f_{2}}$, with the semantic rule: “The noses in this cluster have large f₁₅, large f₁₄, large f₁ and large f₁₀”;

Multi-PIE Small Noses Cluster: $ζ_{C_{s}} = m_{15, 2}^{f_{2}} m_{14, 2}^{f_{2}} m_{1, 2}^{f_{2}} m_{10, 2}^{f_{2}}$, with the semantic rule: “The noses in this cluster have small f₁₅, small f₁₄, small f₁ and small f₁₀”;

AR Large Noses Cluster: $ζ_{C_{l}} = m_{15, 1}^{f_{2}} m_{1, 1}^{f_{2}} m_{9, 1}^{f_{2}} m_{12, 1}^{f_{2}}$, with the semantic rule: “The noses in this cluster have large f₁₅, large f₁, large f₉ and large f₁₂”;

AR Small Noses Cluster: $ζ_{C_{s}} = m_{15, 2}^{f_{2}} m_{6, 2}^{f_{2}} m_{9, 2}^{f_{2}}$, with the semantic rule: “The noses in this cluster have small f₁₅, small f₆ and small f₉”.
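The step from a cluster description to its semantic rule can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the experiments: the function name `describe_cluster`, the `(feature_index, term_index)` encoding and the `TERM_WORDS` table are assumptions introduced here. The only convention taken from the descriptions above is that term index 1 reads as “large” and term index 2 as “small”.

```python
# Term index -> linguistic label, following the descriptions above:
# m_{i,1} reads as "large f_i" and m_{i,2} as "small f_i".
TERM_WORDS = {1: "large", 2: "small"}


def describe_cluster(terms, component="noses"):
    """Render a conjunction of simple concepts m_{i,j} as a semantic rule.

    `terms` is a list of (feature_index, term_index) pairs; this encoding
    is a hypothetical choice made for illustration.
    """
    phrases = [f"{TERM_WORDS[j]} f{i}" for i, j in terms]
    if len(phrases) > 1:
        body = ", ".join(phrases[:-1]) + " and " + phrases[-1]
    else:
        body = phrases[0]
    return f"The {component} in this cluster have {body}"


# AR small noses cluster: m_{15,2} m_{6,2} m_{9,2}
ar_small = [(15, 2), (6, 2), (9, 2)]
print(describe_cluster(ar_small))
```

Because the rule is generated directly from the indices of the simple concepts in the description, the natural-language sentence and the low-level feature conjunction cannot drift apart.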

From the clustering descriptions we can see that not only the global features of Ren’s approach [9] are used, but also some local features. As a result, noses are grouped into distinct clusters according to both size and shape. To show the quality of the clusters visually, Fig. 10 shows four examples of facial images from the large noses cluster and four from the small noses cluster. The comparison is encouraging: the noses in the large noses cluster are distinctly larger than those in the small noses cluster, and the shape is also more consistent within each cluster.

To show the improvement of our method over Ren’s approach [9], Fig. 11 shows some examples that their approach failed to cluster into appropriate groups. These noses are indeed large in terms of area or some other global features used in their approach, but our method clusters them into medium noses because they are not large enough in terms of the local features.

4.2.2 Comparison of clustering results with different clustering approaches

Table 2 presents the performance of different algorithms for nose clustering. K-means and FCM perform similarly, while AFSC performs much better than both on Multi-PIE but similarly on AR. Compared with the traditional approaches K-means and FCM, and with the novel fuzzy approach AFSC, our method is superior on both the Multi-PIE and AR data sets. This can be seen from all three evaluation criteria: the mean area of the small nose group is the smallest, while the mean area of the large nose group is much larger than for the other methods. The class separability also indicates that the proposed method gives a more meaningful clustering result.

4.3 Effect of parameter k

In the proposed algorithm, the size of the reduced feature subset, and hence the scale of detail of the data representation, is controlled by the parameter k (see Equation (8)). Figure 12 illustrates this effect on two data sets, Multi-PIE and AR. For certain ranges of k, there is no change in the reduced subset, i.e., no reduction in dimension occurs. Overall, however, the size of the reduced subset decreases as k increases, as expected. In this way, the choice of k controls the degree of detail at which the data are represented. This characteristic is useful in many areas where a multi-scale representation of the data is necessary. Note that this property is not always possessed by other algorithms, where the input is usually the desired size of the reduced feature set, because changing the size of the reduced set does not necessarily change the level of detail. In contrast, for the proposed algorithm, k acts as a scale parameter that controls the degree of detail in a more direct manner.

Here we conduct experiments to show how the clustering performance changes with the number of selected features. We use two validation indices, V1 and V2, both related to the indices used in Section 4.1.2. V1 is defined as the difference between $\bar{A_{L}}$ and $\bar{A_{S}}$. V2 is the class separability CS. Both are normalized to [0, 1] for clear observation.
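As a minimal sketch of how V1 could be computed and rescaled, the Python below assumes min-max normalization over the tested feature counts; the normalization scheme is an assumption, as are the helper names `v1` and `min_max`. The numbers are made up purely to illustrate the computation and are not results from the experiments.

```python
def v1(areas_large, areas_small):
    """V1: mean area of the large cluster minus mean area of the small cluster."""
    mean_large = sum(areas_large) / len(areas_large)
    mean_small = sum(areas_small) / len(areas_small)
    return mean_large - mean_small


def min_max(values):
    """Rescale a sequence to [0, 1]; a constant sequence maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up raw V1 values for runs with 2, 3, 4, 5 and 6 selected features
# (illustration only, not taken from the paper).
feature_counts = [2, 3, 4, 5, 6]
raw_v1 = [10.0, 14.0, 20.0, 16.0, 12.0]

norm_v1 = min_max(raw_v1)
best_count = feature_counts[norm_v1.index(max(norm_v1))]
print(norm_v1, best_count)
```

The same `min_max` helper would apply to V2, so that both indices can be plotted on a common axis against the number of selected features.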

Figure 13 shows that the clustering performance is sensitive to the number of features. On both the Multi-PIE and AR data sets, four features give the best clustering performance; with more than four features, the performance decreases. This also demonstrates the value of our feature selection algorithm, which helps to reduce the computational cost as well as to find the best scale of detail for data representation.

5 Conclusion

In this paper, we developed a new feature selection algorithm to extract semantic descriptions of facial components based on the AFS fuzzy clustering algorithm. It has been shown that the semantic gap between high-level semantic concepts and low-level visual features is bridged automatically by the fuzzy descriptions of the clusters. Compared to K-means and FCM, which provide no semantic interpretation of their clustering results, the proposed method has a significant advantage. The semantic descriptions produced by our method are also more consistent and meaningful than those of the novel fuzzy approach AFSC. Since additional local features are detected, our method can reveal more latent shape cues. In addition, the proposed method outperforms the other methods on both the Multi-PIE and AR data sets in terms of the area-based validation indices and class separability. The proposed feature selection method proved to be useful, and it provides an effective tool for multi-scale data representation and dimension reduction.

References

1. Castellano, Fanelli and Torsello, Shape annotation by semi-supervised fuzzy clustering, Information Sciences 289 (2014), 148–161.

2. Rasiwasia, Moreno P.J. and Vasconcelos, Bridging the gap: Query by semantic example, IEEE Transactions on Multimedia 9(5) (2007), 923–938.

3. Smeulders A.W., Worring, Santini, Gupta and Jain, Content-based image retrieval at the end of the early years, IEEE Transactions on Pattern Analysis and Machine Intelligence 22(12) (2000), 1349–1380.

4. Sridharan, Nayak, Chikkerur and Govindaraju, A probabilistic approach to semantic face retrieval system, in Audio- and Video-Based Biometric Person Authentication, Springer, 2005, pp. 977–986.

5. Ito and Koshimizu, Face image retrieval and annotation based on two latent semantic spaces in FIARS, in Eighth IEEE International Symposium on Multimedia (ISM’06), IEEE, 2006, pp. 831–836.

6. Conilione and Wang, Fuzzy approach for semantic face image retrieval, Comput J 55 (2012), 1130–1145.

7. Avanija and Ramar, Semantic similarity-based clustering of web documents using fuzzy c-means, International Journal of Computational Intelligence and Applications (2015).

8. Ramathilaga, Jiunn-Yin Leu, Huang K.-K. and Huang Y.-M., Two novel fuzzy clustering methods for solving data clustering problems, Journal of Intelligent & Fuzzy Systems 26(2) (2014), 705–719.

9. Ren, Li, Liu and Li, Semantic facial descriptor extraction via axiomatic fuzzy set, Neurocomputing 171 (2016), 1462–1474.

10. Xiaodong, The fuzzy theory based on AFS algebras and AFS structure, Journal of Mathematical Analysis and Applications 217(2) (1998), 459–478.

11. Liu, The fuzzy sets and systems based on AFS structure, EI algebra and EII algebra, Fuzzy Sets and Systems 95(2) (1998), 179–188.

12. Cootes T.F., Edwards G.J. and Taylor C.J., Active appearance models, IEEE Transactions on Pattern Analysis and Machine Intelligence 23(6) (2001), 681–685.

13. Leung T.K., Burl M.C. and Perona, Finding faces in cluttered scenes using random labeled graph matching, in Proceedings of the Fifth International Conference on Computer Vision, IEEE, 1995, pp. 637–644.

14. Wiskott, Fellous J.-M., Kuiger and Von Der Malsburg, Face recognition by elastic bunch graph matching, IEEE Transactions on Pattern Analysis and Machine Intelligence 19(7) (1997), 775–779.

15. Cristinacce and Cootes T.F., Feature detection and tracking with constrained local models, in BMVC 1(2) (2006), 3, Citeseer.

16. Viola and Jones, Rapid object detection using a boosted cascade of simple features, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, IEEE, 2001, p. I-511.

17. Zhu and Ramanan, Face detection, pose estimation, and landmark localization in the wild, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2012, pp. 2879–2886.

18. Le, Brandt, Lin, Bourdev and Huang T.S., Interactive facial feature localization, in Computer Vision–ECCV, Springer, 2012, pp. 679–692.

19. Valstar, Martinez, Binefa and Pantic, Facial point detection using boosted regression and graph models, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2010, pp. 2729–2736.

20. Liang, Liu, Li, Farid M.R. and Le, Accurate facial landmarks detection for frontal faces with extended tree-structured models, in 22nd International Conference on Pattern Recognition (ICPR), IEEE, 2014, pp. 538–543.

21. Krishnapuram, Medasani, Jung S.-H., Choi Y.-S. and Balasubramaniam, Content-based image retrieval based on a fuzzy approach, IEEE Transactions on Knowledge and Data Engineering 16(10) (2004), 1185–1199.

22. Krishnapuram, Keller J.M. and Ma, Quantitative analysis of properties and spatial relations of fuzzy image regions, IEEE Transactions on Fuzzy Systems 1(3) (1993), 222–233.

23. Zadeh L.A., Fuzzy sets, Information and Control 8(3) (1965), 338–353.

24. Liu, Chai, Wang and Liu, Approaches to the representations and logic operations of fuzzy concepts in the framework of axiomatic fuzzy set theory I, Information Sciences 177(4) (2007), 1007–1026.

25. Liu and Pedrycz, The development of fuzzy decision trees in the framework of axiomatic fuzzy set logic, Applied Soft Computing 7(1) (2007), 325–342.

26. Liu, Wang and Chai, The fuzzy clustering analysis based on AFS theory, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 35(5) (2005), 1013–1027.

27. Liu and Ren, Novel artificial intelligent techniques via AFS theory: Feature selection, concept categorization and characteristic description, Applied Soft Computing 10(3) (2010), 793–805.

28. Liu and Liu, The framework of axiomatics fuzzy sets based fuzzy classifiers, Journal of Industrial Management Optimization 4(3) (2008), 581–609.

29. Liu, Pedrycz, Chai and Song, The development of fuzzy rough sets with the use of structures and algebras of axiomatic fuzzy sets, IEEE Transactions on Knowledge and Data Engineering 21(3) (2009), 443–462.

30. Liu and Pedrycz, Axiomatic fuzzy set theory and its applications, Springer, 2009.

31. Liu, Wang and Pedrycz, Fuzzy clustering with semantic interpretation, Applied Soft Computing 26 (2015), 21–30.

32. Wang, Ma, Xu, Wang and Liu, Vehicle routing problem based on a fuzzy customer clustering approach for logistics network optimization, Journal of Intelligent & Fuzzy Systems 29 (2015), 1427–1442.

33. Ren, Liu and Cao, A parsimony fuzzy rule-based classifier using axiomatic fuzzy set theory and support vector machines, Information Sciences 181(23) (2011), 5180–5193.