Graph-based semisupervised and manifold learning for image retrieval with SVM-based relevant feedback

Abstract

Over the years, many content-based image retrieval (CBIR) methods, which use SVM-based relevant feedback, are proposed to improve the performance of image retrieval systems. However, the performance of these methods is low due to the following limitations: (1) ignore the unlabeled samples; (2) only exploit the global Euclidean structure and (3) not taking advantage of the various useful aspects of the object. In order to solve the first problem, we propose a graph-based semisupervised learning (GSEL), which can add positive samples and construct balanced sets. With the second problem, we propose a manifold learning for dimensional reduction (MAL), which exploits the geometric properties of the manifold data. With the third problem, we propose a combination of classifiers by aspect (CCA), which exploits the various useful aspects of the object. Experimental results reported in the Corel Photo Gallery (with 31,695 images), which demonstrate the accuracy of our proposed method in improving the performance of the content-based image retrieval system.

Keywords

Content-based image retrieval (CBIR)relevance feedback support vector machines (SVM)Graph-based Semisupervised learning and manifold learning

1 Introduction

Over the years, much attention has been devoted to the field of image-based image retrieval [2 , 51]. Measure, which measures the similarity between the query image and the images in the database [2 , 36] is commonly used in traditional image retrieval systems as Euclidean distance measures. Due to the gap between low-level visual features and high-level semantic concepts, these measures become very inefficient. To reduce the gap between low-level visual features and high-level semantic concepts, the relevant feedback (RF) approach is given priority in the proposed methods [39 , 47]. In RF, the interaction between the user and the image retrieval system is allowed and the purpose of the RF is to obtain positive and negative feedback samples from the user.

Over the past few years, on the basis of different assumptions about positive and negative feedback samples, many methods have been proposed [21 , 51]. The One-Class SVM method only considers the density of positive feedback samples [48]. The two-class SVM examines both positive and negative feedback samples, but not considers equal importance for these two groups [27]. SVM active learning [32] is interested in patterns close to the boundary (for the user making the label) and the samples on similarity the positive side that are the furthest away from the boundary. A constrained similarity measure for image retrieval has been developed by Guo et al. [12]. Based on the defined boundary, this similarity measure uses the Euclidean distance to rank images in the margins according to the query image. In [6], the authors proposed a method that views the positive feedback samples as a group and divides manifold the negative feedback samples into several groups, then the kernel marginal convex machines were developed between a positive group and a few negative groups. The performance of this method has been significantly improved. Several other image retrieval methods that use SVM-based relevance feedback have been proposed [38]. However, their performance needs to be improved. The performance of image retrieval systems using SVM-based relevance feedback, have been proposed in the past, is not as expected due to:

Firstly, methods often ignore unlabeled samples. However, in image retrieval using relevant feedback, the user only labels a few images and does not guarantee that the labels of each response sample are correctly assigned by the user for all the times. Besides that, during the feedback process, the number of samples with negative labels is often much higher than that of positive samples. In addition, the manifold structure of the images can be explored through unlabeled samples. Therefore, the image retrieval method, which is based on the number of user-feedback samples, creates an unstable SVM classifier.

Second, the methods only explore the global Euclidean structure, while the local manifold structure of the low-level visual features is ignored. Global statistics such as variance are often difficult to estimate when the number of samples is insufficient.

Finally,methods only exploit a small number of aspects of the object.An object can consist of many different useful aspects,using a classifier that can eardly represent the various useful aspects of an object.

In image retrieval using relevant feedback, the samples provided by the user are usually very small compared to the dimension of the image feature. This makes the algorithms that can be effective in low-dimensional space becomes meaningless, we have to solve the problem called “curse of dimensionality”. In many problems such as face recognition [33, 34], image classification [26] and image annotation [9, 17], the subspace analysis approach is shown to be effective. This approach retains the specific statistical properties of high-dimensional space in the low-dimensional space. Two algorithms are widely used including principal component analysis(PCA) [30, 50] and linear discriminant analysis (LDA) [8, 28]. The PCA finds a projection on which the variance is maximized. The LDA finds a projection on which samples of the same label are grouped into a cluster, with different label samples located in different clusters. With image retrieval systems using relevant feedback, LDA gives better performance than PCA. However, when applying the LDA to the RF process, it is not feasible to collect large label samples [39]. Both PCA and LDA only consider Euclidean structures, but the low-level visual feature space can be a manifold [41, 42].

In recent years, many manifold learning algorithms have been proposed to explore the manifold structure. The method proposed by He et al. [41], named LPP, to perform locality preserving projections to find a linear approximation of the internal data manifold.

Based on LPP, the image retrieval method proposed in [41, 42] to perform comparison images in the reduced subspace by using a Euclidean metric. The image retrieval method learns a semantic manifold with the user’s preferences being included, called augmented relation embedding (ARE), proposed in [49]. Next, an image retrieval method can discover the local manifold structure through the extreme margin between the negative and positive samples in each local neighborhood, named maximum margin projection (MMP), proposed by [43]. Some methods consider the case when the data lies on or near a nonlinear submanifold in ambient space consisting of locally linear embedding (LLE) [35], Isomap [16] and Laplacian Eigenmaps [24]. The method proposed by Li et al. [17] which preserves the distinguishing information in the encoding by combining subspace learning with the maximum margin principle. These methods estimate both the geometrical and the discriminant properties of the submanifold from random points located on this unknown submanifold. However, these methods only work with data points in the training set, and it does not explicitly give the projection possible for new test points. Besides, these methods only consider geometric properties in a class, while ignoring the relationship of samples from different classes. Therefore, these methods are not effective for image retrieval.

In this paper, we propose a graph-based semisupervised and manifold learning method for image retrieval with SVM-based relevant feedback SEMAL (Graph-based Semisupervised and Manifold Learning). SEMAL based on the observation that 1) a sample that is located in a region with high positive label density, that will have a high probability of carrying a positive label and using graph partition with the Ncut criterion makes that assertion stronger; 2) A training set can have many different useful aspects, we need to have multiple classifiers to represent the different useful aspects; and 3) The global Euclidean structures do not necessarily represent the manifold of the data, we need to look at the local manifold structure of the low-level visual features. Combining these three observations with the SVM model, our proposed method improves the performance of SVM-based relevant feedback for image retrieval. In SEMAL, we chose the majority vote [4] as the similarity measure in a combination of different classifiers. The reason for this is that each different classifier has its own ability in distinguishing relevant and irrelevant images. The main contributions of this paper include:

It proposes a semi-supervised learning algorithm to add positive samples and establish a balanced training set GSEL, which take advantage of the information of the unlabeled samples in the image set.

It solves the overfitting problem of the classification model SVM through the manifold learning algorithm for dimensional reduction (MAL), which exploits the geometric properties of the data manifold. Our proposed algorithm can explore the local ma for the class manifold structure, so it is suitable for image retrieval where the nearest neighbor search is usually included.

It proposes a combination of classifiers by aspect (CCA), which exploits the statistical nature of the aspects of the class.

The rest of the paper is organized as follows. In Part 2, we present GSEL, a graph-based semisupervised learning algorithm for building the balanced training set from labeled and unlabeled samples. In part 3, we present an manifold leaning algorithm (MAL) for dimensionality reduction. Section 4 focuses on presenting a combination of classifiers by aspect and exploring the statistical properties for classification. Next, section 5 describes our proposed image retrieval algorithm (SEMAL). After that, our proposed image retrieval system diagram is presented in section 6. Analyzing our experimental results is described in section 7. Finally, we draw the conclusions in section 8.

2 Graph-based semisupervised learning for building a balanced training set

As mentioned in the previous section, most of SVM-based relevant techniques do not utilize the information of unlabeled samples, although they are very useful in building a strong classifier. Besides that, in the image retrieval with relevant feedback, the number of positive samples is much less than the number of negative samples, which creates an imbalance training sample set. In this section, we propose a graph-based semi-supervised learning technique to add positive samples to the training set in order to obtain a balanced sample set.

Given N points X₁, X₂, . . . , X_N ∈ Rⁿ we construct a nearest neighbor graph G = (X, S) [37] with N nodes, one for each point. These images (nodes) are the result of the previous query. Graph G is also an undirected weighted graph. In graph G, each edge is created by two vertices x_i and x_j carrying a non-negative weight S_ij ⩾ 0.

Assume the weighted adjacency matrix of the graph, constructed above, is the matrix S = (S_ij) _i,j=1,...N. Call kNN (X_i) is the k nearest neighbors of point x_i. If X_i ∈ kNN (X_j) (or X_i ∈ kNN (X_i)) then S_ij = 1. Otherwise, S_ij = 0. Since G is undirected, we require S_ij = S_ji. Assume that there are m points already labeled by the user (including the original query image) LX = { X₁, X₂, . . . , X_N } ∈ Rⁿ and N - m points have not been labeled by the user UX = { X_N-m+1, X_N-m+2, . . . , X_N-m } ∈ Rⁿ. To determine the temporary label of point x_i, where the density of the positive class around that point is high, we construct graph G ^label. In the graph G ^label, its vertices are those of the graph G and its weight matrix is S^label. Let label (x_i) be the label of the point x_i (this label is either relevant or irrelevant). For each point x_i, point set kNN^label (x_i) include its neighboring points that have the same class or not labeled. The reason for this is that we look at the points closest to x_i to be similar to x_i, i.e.,

$\begin{matrix} {kNN}^{label} (x_{i}) = {x | label (x) \\ = = label ({kNN}^{label} (x_{i}) or x \in UX} \end{matrix}$ (1)

We define the following: $S_{ij}^{label} = {\begin{matrix} β, if label (x_{i}) = label (x_{i}) \\ 1, {ifx}_{i} and x_{j} \in UX but x_{i} \in {kNN}^{label} (x_{i}) \\ or x_{j} \in {kNN}^{label} (x_{i}) \\ 0, otherwise \end{matrix}$ (2)

In Equation (2), the value of β is large, it implies that the two images have the same label.

On the graph G ^label, the degree of node x_i ∈ X is defined as follows: $d_{i}^{label} = \sum_{j = 1}^{N} S_{ij}^{label}$ (3)

For each point x_i ∈ UX that has not been labeled, find the point with the highest degree $d_{i}^{label}$ of the neighboring points kNN^label (x_i)and get the label of that point as the temporary label of x_i. Thus, the temporary label of x_i will be the label of x^*, where x is defined as follows: $x^{*} = \underset{x_{j \in {kNN}^{label}} (x_{i})}{argmax} (d_{j}^{label})$ (4)

The reason for this is based on the observation that matrix D^label provides a natural measure on data points. If $d_{i}^{label}$ is large, it implies that the class containing x_i has a high density around x_i. Therefore, the value of $d_{i}^{label}$ is large, x_i becomes more important. This temporary labeling process is illustrated in Fig. 1.

Fig.1

(a) Graph G. Points of the k-nearest neighbors are connected and weighted by 1, (b) On graph G ^label, both labeled and unlabeled points are connected, (c) The graph with point x₆ is temporarily labeled as (+) (because $d_{3}^{label}$ is bigger than $d_{5}^{label}$ ).

The idea of identifying the last label of a point x_i is presented as follows. First, we divide the graph G ^label into two classes: negative and positive class. Then, we check whether the point x_i belongs to any of the two classes, if x_i belongs to the positive class, the last label of point x_i is positive. In the opposite case, we will ignore point x_i.

Here, the criterion used to partition the graph G ^label is Ncut [18]. The reason for using Ncut is because it uses both basic principles for partitioning including minimize the number of between-group connections and maximize the number of within-group connections. Therefore, it produces a more balanced partitions. Besides, we also have methods to find the approximate solution an acceptable time. The following is a brief description of the Ncut criterion and how to find the minimum Ncut value.

On the graph G ^label, to consider inter-class, we define a cut as a set of edges with only one vertex in a class (either positive or negative). $cut (P, N) = \sum_{i \in P, j \in N} {S_{ij}}^{label}$ (5)

Here P ∈ G ^label is a symbol that denotes a positive class and N ∈ G ^label is a symbol that denotes a negative class.

To consider within-class, u’e define a vol (p) as the total weight of edges with at least one endpoint in P. $vol (p) = \sum_{i \in P} {e_{i}}^{label}$ (6)

where, $e_{i}^{label}$ is the edge of the graph G ^label that has at least one endpoint in P.

The Nut criterion is defined as follows: $NCut (P, N) = \frac{cut (P, N)}{vol (P)} + \frac{cut (P, N)}{vol (N)}$ (7)

The problem that finds the minimum Ncut value for a graph is an NP-complete problem. Fortunately, the method proposed by Shi and Malik [18] allow us to find approximations by solving the generalized eigenvalue problem. $(D^{label} - S^{label}) \vec{y} = λ D^{label} \vec{y}$ (8)

Generalized eigen-vector corresponding to the second smallest generalized eigen-value used to partition the graph G ^label. The procedure for determining the formal label is illustrated in Fig. 2.

Fig.2

Graph G ^label is partitioned according to Ncut criterion. x₆ will belong to the positive class because Ncut criterion produces a balanced partition, so x is officially labeled as (+).

Figure 3 below is a graph-based semi-supervised learning algorithm for building a balanced sample set from labeled and unlabeled samples GSEL. The input of the algorithm is a set of images that are the result of a previous query, with m points labeled and N-m points unlabeled. The algorithm can determine the label of some unlabeled points and produce a result set that includes balanced sample set.

Fig.3

The algorithm of Graph-based semisupervised learning (GSEL).

3 Manifold learning for dimensional reduction

In general, the dimension of the vectors that represent the image data is usually very high. In image retrieval using associated feedback, the number of samples provided by the user is often much smaller than the number of dimensions of the feature. SVM learning model in such high space and small sample sizes often yields poor performance. Therefore, to improve the performance of image retrieval using SVM-based relevant feedback, the number of dimensions of the feature vector must be reduced and the used dimensional reduction method should consider the local manifold structure of the low-level visual features. In this section, we present the manifold learning algorithm MAL (Manifold Learning for dimensional reduction) for reducing the dimensionality of the feature.

Let G = (X, S) be an undirected graph with a set of vertices X = {x₁, x₂, . . . x_N} ∈ R^n×N. x_i = (i = 1, . . . , N) is the point that is sampled from the basic submanifold M and they have been labeled. We construct a neighborhood graph G to model the geometric properties of the submanifold. In practice, the dimension of the feature n is very high. The goal of reducing the feature is to find a projection W which transforms the X data from the original high-dimensional space into Y = {y₁, y₂, . . . y_N} ∈ R^N×d in the low-dimensional space d [3, 34]. $y_{i} = W^{T} x_{i} \in R^{d}$ (9) with any x_i ∈ Rⁿ and d << n where W = {w₁, w₂, . . . w_d} ∈ R^N×d.

Assume S = (S_ij) _i,j=1,..N is the adjacent matrix of graph G. Because our goal is to build a graph that represents local neighborhood relations, so the Gaussian similarity function [24] $s (x_{i}, x_{j}) = exp (- \frac{{∥ x_{i} - x_{j} ∥}^{2}}{2 σ^{2}})$ is chosen (here σ controls the width of the neighborhood).

On the graph G, we construct the diagonal matrix D with its entries are row (or column) sums of S, d_i = ∑_js_ji After constructing the diagonal matrix D and the matrix S, we compute the Laplacian matrix L [10] as follows: $L = D - S$ (10)

Suppose the map from a graph to a real line is denoted by Y = [y₁ ; y₂ ; . . . ; y_N] Suppose a linear projection Y = X^TW will give us the low-dimensional vector representations of the vertices of the graph G. In order to obtain the optimal Y, it is necessary to perform the minimum $\sum_{i \neq j} {∥ y_{i} - y_{i} ∥}^{2} S_{ij}$ (11)

In the objective function (11), if the vertices have the same label and neighbors i and j are mapped far apart, the objective function will perform a heavy penalty.

Therefore, the above minimum will to ensure that if the vertex i and j are “close” then y_iand y_jmust also be close [31]. We get $\sum_{i \neq j} {∥ y_{i} - y_{i} ∥}^{2} S_{ij} = Y^{T} LY$ (12)

At this point, the objective function will be as follows. $\begin{matrix} W^{*} = \underset{W^{T} {XDX}^{T} W = d}{argmin} \sum_{i \neq j} {∥ W^{T} x_{i} - W^{T} x_{j} ∥}^{2} S_{ij} \\ = \underset{W^{T} {XDX}^{T} W = d}{argmin} W^{T} {XLX}^{T} W \\ = \underset{W}{argmin} \frac{trace (W^{T} {XLX}^{T} W)}{trace (W^{T} {XDX}^{T} W)} \end{matrix}$ (13)

Here d is a constant and a constraint matrix W^TXDX^TW = d to avoid a trivial solution of the objective function.

The above optimization problem refers to finding the eigenvector of the generalized eigenvalue problem in combination with the smallest eigenvalues. ${XLX}^{T} W = λ {XDX}^{T} W$ (14)

That is, the columns of W are eigenvectors corresponding to the smallest eigenvalues of the generalized eigenvalue problem.

Figure 4 below is the manifold learning algorithm MAL (Manifold Learning for dimensional reduction). The input of the algorithm is a set of n-dimensional images and produces a projection matrixW.

Fig.4

The algorithm of manifold learning for dimensional reduction (MAL).

4 Combine the aspect classifiers for content - based image retrieval

In this section, we first focus on presenting a technique for constructing aspect classifiers of the object. We then describe a combinatorial technique for a generalized classifier that covers various useful aspects of the object.

With the belief that there is no classifier that can represent all the useful aspects of the object. Therefore, in order to represent the various useful aspects of the object, we need to have many classifiers in terms of the aspect. On the basis of aspect classifiers, we construct a generalized classifier that can represent all the various useful aspects of an object. For example, an object is a human, a classifier can be based on aspects: face, gait, and fingerprint. Given the various useful aspects of the object under consideration, these classifiers can be independently trained on a training set of aspects. Then, a generalized classifier will be determined by combining from the existing aspect classifiers. Thus, this process can represent different useful aspects of the object and lead to improved performance of the lookup system. The aspects are determined according to the specific objectives of each problem. In this paper, we chose five aspects, including color moment, color histogram, color correlation, Gabor features and wavelet features.

We expect that a generalized classifier, which is combined with many aspect classifiers with using the majority voting technique [4], yield good results based on the belief that the majority of experts are capable of making the right decision. Thus, if there are m aspect classifiers and more than half of them decide that observation x belongs to class A, then the whole determines that x ∈ A.

Figure 5 below is an algorithm for combining aspect classifiers CCA. The input of the algorithm is a training sample set and the number of aspects of the object. The algorithm results in a generalized classifier.

Fig.5

An algorithm of the combination of classifiers by aspect (CCA).

5 Proposed image retrieval algorithm

In this section, we propose an image retrieval algorithm for improving image retrieval system’s accuracy with relevant feedback. Our proposed algorithm is called SEMAL (Graph-based Semisupervised and Manifold learning). Figure 6 below is the SEMAL algorithm.

Fig.6

Algorithm of graph-based semisupervised and manifold learning for image retrieval (SEMAL).

The SEMAL algorithm is implemented as follows: Each image in S is represented as a point in a high-dimensional feature space. When a user submits initial query Q via a query-by-example, the algorithm uses the same procedure as for the database image to represent the image Q into a point in a high-dimensional feature space. Initial query Retrieval_Init(Q, S, N > is performed (Step 1), where S is the database image set and N is the number of images that are retrieved for each iteration. The result of the initial query is assigned to the Result_init(Q). On the result set Result_init(Q) which is returned by the initialization query, the user performs the feedback on a graphical interface via the function Feedback(Result_init(Q)) to get the relevant image and save it to the set R Relevant(Q,m) (Step 2.1). The process that adds positive samples and builds balanced training set is done through the function GSEL(S,Relevant(Q,m),TS) (Step 2.2). The Take_Aspect(TS) function (Step 2.3) performs the separation of the aspects of the set TS to obtain the set Aspect_i. With each set of aspects i, the MAL procedure will learn a projection W to perform a dimensional reduction on the set Aspect_i and we will get the set reduced_Aspect₁ (Step 2.4). With each aspect set Aspect_i (i = 1,...,k), CCA(reduced_Aspect₁,..reduced_Aspect_k,C^*) (Step 2.5) creates a aspect classifier C^* (reduced_Aspect_i) and combining these classifiers, we will get a generalized classifier C^*. Based on the generalized classifier C^*. The Retrieval(C^*,S,N) function (Step 2.6) will rank the images in the set S and take out the N images at the top to assign them to R. This process is repeated until the user satisfies thedemand.

6 The content-based image retrieval system

Figure 7 shows the schema of the proposed image retrieval system, named SEMAL. The first stage, when a query image is given by the user, low-level visual features are extracted. Next, the system will calculate a similar measure for a distance function. Images are sorted by this measure and it returns a set of images at the top for the user.

Fig.7

The image retrieval system flow chart (SEMAL).

The second stage, users label the images at the top as negative samples or positive samples. The system will obtain a small training set, in which the number of samples is greater than the number of positive samples. Thus, in order to create a large and balanced training set, the system implements the block “Semi-supervised learning for balancing the training set”. Based on this training set, the system generates k training sets by aspect (Here, we choose two aspects of color and texture). With each training set by aspect i (i = 1, 2, \dots k), The system reduces the number of dimensions of the training set by aspect i via the manifold learning algorithm. We get the training set by aspect, which the dimension is small. With each training set by aspect, SVM model is applied to obtain the aspect classifier. We get k aspect classifiers. The system combines k classifiers to obtain a generalized classifier. Apply the generic classifier to the image retrieval process, the similarity of the images can be updated. After that, all images are reordered based on similarity that has just been updated. If the user is not satisfied with the result, the relevance feedback process will be repeated.

This system implements the SEMAL algorithm in Fig. 6. In the system, images are represented by two main features, including color [1, 25; 45] and texture [13]. Color features are the most important feature for image retrieval because this feature is not sensitive to scaling, rotation and distortion [25]. Texture features are another important signal for image retrieval. Details of these two features are presentedbelow.

Color information: We use three features including color moment, color histogram and color correlation. First color features we extracted two moments include color mean, color Standard Deviation, in each color component (H, S, and V respectively), so the number of dimensions of the color moment is 6 dimensional. The second color feature is the HSV color histogram, which consists of 32 dimensions (8 * 2 * 2). The H component is quantized into 8 bin. Both S and V components are quantized into 2 bin. The final color feature is color correlation, it uses RGB space. The number of dimensions of this feature is 64 (4 * 4 * 4). These three types of visual color features are used to represent the color information in ourexperiments.

Texture: We extract classic global texture descriptions such as Gabor features and wavelet features. Gabor features include Mean-squared energy and meanAmplitude for 4 levels, 6 directions for the grayscale image; The number of dimensions of the wavalet feature is 40, which consists of two wavelet moments as the mean, standard deviation.

In summary, these features are combined into a feature vector of 190 values (6 + 32 + 64 + 40 + 48 = 190). After that, all the feature components are normalized into Normal distributions with zero mean and standard deviation of one to represent the images.

7 Experiments

In this experiment, we use the Corel Photo Gallery, which contains 31,695 images [11]. Figure 8 shows some samples in this database.

Fig.8

Some samples in the image database.

In the Corel Photo Gallery, each folder contains 100 images and ground truth consist of 320 concepts.

The feedback process in the experiment is automatically simulated by the computer program. This means that an image in the returned image set is labeled as positive or negative depending on whether it has the same concept as the query image (in the ground truth). First, the system randomly selects 500 images from the image Set as query images. Then, with each query, 80 images are returned. Next, the computer automatically assigns positive or negative labels to each of these 80 images. In the returned image set, we chose 80 images because a system usually requires users to mark three to four screenshots of images during the current lookup. While, with some users, they want to just label a small number of feedback samples and expect to receive the best results.

In this paper, precision is used to evaluate the performance of the relevant feedback algorithm. Precision is the percentage of images in N images that are retrieved at the top. Precise curves are the average precision values of 500 queries. We use the precision curve to evaluate the precision of a given algorithm.

7.1 Performance of GSEL

Figure 9 shows the accuracy values of O-SVM (Original SVM) [22], the SVM-MSMOTE method (SVM with Modified Synthetic Minority Oversampling Technique) [23] and the SVM-GSEL method (combine SVM with graph-based semisupervised learning). Experimental results show that as the number of loops increases, the performance of the original SVM method is greatly affected. The reason for this is that as the number of loops increases, the number of samples increases rapidly and thus, resulting in an imbalance of the training set. In addition, the SVM-MSMOTE method is higher than the O-SVM method but the precision is lower than the SVM-GSEL method. The reason for this is that the SVM-MSMOTE method uses artificial positive samples, which do not properly represent the context of the data space.

Fig.9

The precision of O-SVM and SVM-GSEL.

From this experiment, we can see that the SVM-GSEL algorithm outperforms the original SVM method. The precision curve of the SVM-GSEL algorithm is higher than that of the O-SVM.

7.2 Performance of MAL

Figure 10 shows the accuracy values of O-SVM (original SVM) [22] and the SVM-MAL method (combine manifold learning for dimensional reduction with SVM). The results show that as the number of loops increases, the performance of both methods increases. However, when the number of loops is small (from 1 to 2), the performance of the O-SVM method is affected. The reason for this is that when the number of loops is small, the number of samples is too small compared to the number of dimensions of the feature, the O-SVM classifier is unstable.

Fig.10

The precision of O-SVM and SVM-MAL.

From this experiment, we can see that the SVM-MAL algorithm performs better than the original SVM method. The precision curve of the SVM-MAL algorithm is higher than that of the O-SVM.

7.3 Performance of CCA

This experiment evaluates the performance of SVM-CCA (combination of classifiers by aspect with SVM) versus O-SVM (original SVM). We select the number of aspect k = 5 (including color moment, color histogram, color correlation, Gabor features and wavelet features) for SVM-CCA. Experimental results show that as the number of aspects increases, the O-SVM method is not affected, but the SVM-CCA increases rapidly in number of aspects. The reason for this is that the generalized classifier exploits the various useful aspects of the object.

The results in Fig. 11 show that the performance of the SVM-CCA algorithm is increasing rapidly.

Fig.11

The precision of O-SVM and SVM-CCA.

7.4 The overall performance of SEMAL

Our proposed approach was compared with four other methods, including the O-SVM (the original SVM-based relevant feedback) [22], DSSA (the discriminative semantic subspace analysis) [21], DMINTIR (discriminative multi-view interactive image re-ranking) [17] and HMR (heterogeneous manifold ranking) [19]. We evaluate all algorithms on 5 loops. The precision curves are shown in Fig. 12. From this figure, we see that the SMAL method performs better than the other four, O-SVM, DSSA, DMINTIR and HMR.

In Fig. 12, we find that the performance of the DSSA method is higher than the O-SVM, because it is possible to learn a semantic subspace from similar and dissimilar pairwise constraints without using class label information. However, according to empirical results, DSSA’s performance is worse than DMINTIR, because it does not exploit the different viewing angles of the object. From Fig. 12, we see that the performance of DMINTIR is slightly lower than that of HMR because HMR exploits the locality of the data manifold. SEMAL takes advantage of semi-supervised learning for balancing the training set, manifold learning for dimensional reduction, various useful aspects of the object. Therefore, it gives the highest results.

Fig.12

Performance of the five methods.

8 Conclusions

Content-based image retrieval, which uses SVM-based relevant feedback, has been widely used to reduce the semantic gap and improve the accuracy of the content-based retrieval system. However, this approach has three limitations as follows. First, based only on the labels provided by the user to build the training set, thus generating an unstable SVM classifier. Second, bypass the local manifold structure of visual features of the image. Finally, do not fully exploit the various useful aspects of the object. In this paper, we proposed solutions based on the argument that if a sample is located in a region with a high positive-label density and it is in the positive class of the Ncut criterion, that point will be labeled positive; A training set may have many different useful aspects, so we need more classifiers to represent those aspects and; The global Euclidean structures do not fully reflect the manifold of the data, we need to look at the local manifold structure. We have proposed the SEMAL algorithm to improve the efficiency of the image retrieval system with RF. Our proposed method can: (1) utilize the information of unlabeled samples to create good training sample set; (2) exploit the geometric properties of the data manifold for reducing the dimensionality of the feature and (3) Takes advantage of the various useful aspects of the object to enhance classification accuracy. Experiments using the Corel Photo Gallery, which includes 31,695 labeled images, demonstrate that our proposed SEMAL method outperforms better than some established CBIR RF methods in terms of accuracy.

Footnotes

Acknowledgments

This research was supported in part by a research project titled “Improve the efficiency of image retrieval via distance metric learning” and Code VAST01.07/19-20. The authors also expressed their gratitude for the reviewers, their comments have improved the quality of the paper.

References

Jain

and Vailaya

, Image Retrieval Using Color and Shape, Pattern Recognition 29(8) (1996), 1233–1244.

Smeulders

A.W.M.

, Worring

, Santini

, Gupta

and Jain

, Content-based image retrieval at the end of the early years, IEEE Transactions on Pattern Analysis and Machine Intelligence 22(12) (2000), 1349–1380.

Wang

, Zhao

, He

, Chen

and Bu

, Image retrieval using nonlinear manifold embedding, Neurocomputing 72 , (16–18) (2009), 3922–3929.

Bahler

and Navarro

, Methods for Combining Heteroge-neous Sets of Classifiers. Proc. 17th Nat’l Conf. Am. Assoc. for Artificial Intelligence, (2000).

Swets

D.L.

and Weng

, Using Discriminant Eigenfeatures for Image Retrieval, IEEE Trans. Pattern Analysis and Machine Intelligence 18(8) (1996), 831–836.

Tao

, Tang

, Li

and Rui

, Direct kernel biased discriminant analysis: A new content-based image retrieval relevance feedback algorithm, IEEE Transactions on Multimedia 8(4) (2006), 716–727.

Tao

, Tang

, Li

and Wu

, Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence 28(7) (2006), 1088–1099.

Tao

, Li

, Wu

and Maybank

S.J.

, Geometric mean for sub-space selection, IEEE Transactions on Pattern Analysis and Machine Intelligence 31(2) (2009), 260–274.

Zhang

, Islam

M.M.

and Lu

, A review on automatic image annotation techniques, Pattern Recognition 45(1) (2012), 346–362.

10.

Chung

F.R.K.

, Spectral Graph Theory, volume 92 of Regional Conference Series in Mathematics. AMS, (1997).

11.

Carneiro

, Chan

A.B.

, Moreno

P.J.

and Vasconcelos

, Supervised Learning of Semantic Classes for Image Annotation and Retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence 29(3) (2007), 394–410.

12.

Guo

, Jain

A.K.

, Ma

and Zhang

, Learning similarity mea-sure for natural image retrieval with relevance feedback, IEEE Trans Neural Netw 12(4) (2002), 811–820.

13.

Tamura

, Mori

and Yamawaki

, Texture Features Corresponding to Visual Perception, IEEE Trans. Systems, Man, and Cybernetics 8(6) (1978), 460–473.

14.

Huu

Q.N.

, Thuy

Q.D.T.

, Van

C.P.

, Van

C.N.

and Quoc

T.N.

, An efficient image retrieval method using adaptive weights,, Appl Intell (2018).

15.

Chen

, Shan

, Zhao

, Chen

and Gao

, Matti Pietikainen, WLD: A robust descriptor based on Weber’s law, IEEE Trans Pattern Anal Mach Intell 32(9) (2010), 1705–1720.

16.

Tenenbaum

, Silva

and Langford

, A global geometric framework for nonlinear dimensionality reduction, , Science 290 (2000), 2319–2323.

17.

, Xu

, Yang

, Sun

and Tao

, Discriminative multi-view interactive image re-ranking,”, IEEE Trans Image Process 26(7) (2017), 3113–3127.

18.

Shi

and Malik

, Normalized cuts and image segmentation, IEEE Trans Pattern Anal Mach Intell 22(8) (2000), 888–905.

19.

Jun

, He

, Guo

Xiangnan

, Zhang

Yujia

and Zhao

, Heterogeneous Manifold Ranking for Image Retrieval”, , Access IEEE 5 (2017), 16871–16884.

20.

Shao

, Zhu

and Li

, Transfer learning for visual categorization: A survey, IEEE Transactions on Neural Networks and Learning Systems 26(5) (2015), 1019–1034.

21.

Zhang

, Shum

H.P.H.

and Shao

, Discriminative semantic subspace analysis for relevance feedback,”, IEEE Transactions on Image Processing 25(3) (2016), 1275–1287.

22.

Zhang

, Lin

and Zhang

, Support Vector Machine Learning for Image Retrieval,”, Proc. IEEE Int’l Conf. Image Processing (2001), pp.721–724.

23.

Nanni

, Fantozzi

and Lazzarini

, Coupling different methods for overcoming the class imbalance problem, Neurocomputing (2015), 48–61.

24.

Belkin

and Niyogi

, Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering, Advances in Neural Information Processing System 14 (2001), 585–591.

25.

Swain

and Ballard

, Color Indexing, Int’l J. Computer Vision 7(1) (1991), 11–32.

26.

, Shao

, Zhen

and He

, Local feature discriminant projection,”, IEEE Transactions on Pattern Analysis and Machine Intelligence PP(99) (2015).

27.

Hong

, Tian

and Huang

T.S.

, Incorporate support vector machines to content-based image retrieval with relevance feedback, inProceedings of the IEEE International Conference on Image Processing (2000), pp. 750–753.

28.

Fisher

R.A.

, The use of multiple measurements in taxonomic problems,”, Annals of Human Genetics 7(2) (1936), 179–188.

29.

Datta

, Joshi

, Li

and Wang

J.Z.

, Image retrieval: Ideas, influences, and trends of the new age., ACM Computing Surveys 40(2) (2008), 1–60.

30.

Duda

R.O.

, Hart

P.E.

and Stork

D.G.

, Pattern Classification, second ed. Wiley-Interscience, (2000).

31.

Guattery

and Miller

G.L.

, Graph embeddings and laplacian eigenvalues, SIAM Journal on Matrix Analysis and Applications 21(3) (2000), 703–723.

32.

Tong

and Chang

, Support Vector Machine Active Learning for Image Retrieval,”, Proc. ACM Int’l Conf. Multimedia (2001), pp.107–118.

33.

, Tao

and Geng

, Bregman divergence-based regularization for transfer subspace learning,”, IEEE Transactions on Knowledge and Data Engineering 22(7) (2010), 929–942.

34.

Yan

, Xu

, Zhang

, Yang

, Zhang

and Lin

, Graph embedding and extensions: A general framework for dimensionality reduction, IEEE Trans Pattern Anal Mach Intell 29(1) (2007), 40–51.

35.

Roweis

and Saul

, Nonlinear dimensionality reduction by locally linear embedding, Science 290(5500) (2000) 2323–2326.

36.

Thuy

Q.D.T.

, Huu

Q.N.

, Van

C.P.

and Quoc

T.N.

, An efficient semantic—related image retrieval method. , Exp Syst Appl 72 (2017), 30–41.

37.

von Luxburg

, A Tutorial on Spectral Clustering, Statistics and Computing 17 (2007), 395–416.

38.

Zhou

and Huang

T.S.

, Relevance Feedback for Image Retrieval: A Comprehensive Review, ACM Multimedia Systems J 8(6) (2003), 536–544.

39.

Zhou

X.S.

and Huang

T.S.

, Relevance feedback in image retrieval: A comprehensive review,”, Multimedia Systems 8(6) (2003), 536–544.

40.

, Ma

W.-Y.

and Zhang

H.-J.

, Learning an image manifold for retrieval, in: Proceedings of the ACM Conference on Multimedia, New York, (2004).

41.

, Incremental semi-supervised subspace learning for image retrieval, in: Proceedings of the ACM Conference on Multimedia, New York, (2004).

42.

, Yan

, Hu

, Niyogi

and Zhang

H.-J.

, Face recognition using Laplacian faces, IEEE Transactions on Pattern Analysis and Machine Intelligence 27(3) (2005), 328–340.

43.

, Cai

and Han

, Learning a maximum margin subspace for image retrieval, IEEE Transactions on Knowledge and Data Engineering 20(2) (2008).

44.

and Niyogi

, Locality preserving projections,”, in Advances in Neural Information Processing Systems (2004).

45.

Niblack

, Barber

, Equitz

, Flickner

, Glasman

, Petkovic

, Yanker

, Faloutsos

and Taubino

, The QBIC Project: Querying Images by Content Using Color, Texture, and Shape. Proc. SPIE Storage and Retrieval for Images and Video Databases, (1993), pp.173–181.

46.

Rui

, Huang

, Ortega

and Mehrotra

, Relevance feedback: A power tool in interactive content-based image retrieval, IEEE Trans Circuits Syst Video Technol 8(5) (1998), 644–655.

47.

Rui

, Huang

and Chang

, Image retrieval: Current techniques, promising directions and open issues, J Vis Commun Image Repre-sent 10(1) (1999), 1–23.

48.

Chen

, Zhou

X.S.

and Huang

T.S.

, One-class svm for learning in image retrieval, inProceedings of IEEE International Conference on Image Processing (2001), pp. 34–37.

49.

Lin

Y.Y.

, Liu

T.-L.

and Chen

H.-T.

, Semantic Manifold Learning for Image Retrieval,”, Proc. 13th Ann. ACM Int’l Conf. Multimedia (Multimedia ’05) (2005).

50.

, Li

and Zhang

H.-J.

, Extraction of Feature Subspace for Content-Based Retrieval Using Relevance Feedback. Proc. Ninth Ann. ACM Int’l Conf. Multimedia, (2001).

51.

Zhang

, Lai

, Xu

, Shao

, Wu

and Xie

, Discriminative Elastic-Net Regularized Linear Regression”, IEEE T-IP 26(3) (2017), 1466–1481.