Semi-supervised LDA pedestrian re-identification algorithm based on K-nearest neighbor resampling

Abstract

Person re-identification identifies a specific person in a surveillance network by measuring the similarity between images from different camera views. However, existing metric learning based methods suffer from over-fitting. To solve this problem, a resampled linear discriminant analysis (LDA) method is proposed based on the statistical and topological characteristics of pedestrian images. The method uses the k-nearest neighbours to form potential positive sample pairs, which are used to improve the metric model and to generalize it to the test data. By minimizing the inter-class divergence of potential positive sample pairs, a semi-supervised re-sampling LDA person re-identification algorithm is established. It is then tested on the VIPeR, CUHK01 and Market-1501 datasets. The results show that the proposed method achieves the best performance among the compared methods. In particular, it outperforms the best comparison method by 0.6% and 5.76% in rank-1 identification rate on the VIPeR and CUHK01 datasets, respectively. At the same time, the improved LDA algorithm raises the rank-1 identification accuracy of the traditional LDA method by 9.36% and 32.11% on these two datasets, respectively. The proposed method is, however, limited on the Market-1501 dataset, where the test data is large.

Keywords

Person re-identification, measurement model, LDA, semi-supervised learning

1 Introduction

With the rapid development of computer vision technology and the construction of video surveillance networks, person re-identification has become an emerging hot topic that attracts more and more researchers [1–5]. The task of person re-identification is to identify pedestrians who appear in different locations or at different times in non-overlapping surveillance networks. Compared with traditional image recognition tasks, the appearance features of pedestrian images captured by surveillance cameras suffer from drastic variations caused by multiple factors such as illumination, posture, shooting angle and background changes [2, 3], and the application scenarios are more complex. As shown in Fig. 1, existing metric learning based methods learn a similarity measurement model by forcing the positive samples close to the test individual and the negative samples far away from it. However, negative samples differ in their degree of similarity to the test individual. Metric learning based methods should take this degree of similarity into consideration; otherwise, the metric subspace will be distorted and weak in similarity measurement.

Fig. 1

Metric learning for person re-identification.

Person re-identification mainly includes 3 steps: feature extraction, metric model training and similarity measurement. The goal is an image feature expression model with strong discrimination and robustness. Feature extraction is the key point of the person re-identification task. Because the image resolution of a pedestrian is low, traditional methods based on pose and face recognition cannot be applied to person re-identification. Instead, the color and texture information of the image is used to build a feature model. Traditional color and texture feature extraction techniques, such as LBP, the Gabor operator, the Schmid operator, and color histograms, were widely used in early person re-identification research. However, these traditional handcrafted features have poor robustness, resulting in a very low recognition rate. Zhang et al. [4] proposed a feature model based on the Codebook method, which converts image pixel features into visual vocabulary features and effectively improves the robustness of the features. On the basis of the salient color names based color descriptor (SCNCD), Yang et al. [6] proposed to describe the person re-identification problem as a color distribution matching problem. This method significantly improved the recognition accuracy of person re-identification. Liao et al. [7] proposed the local maximal occurrence (LOMO) feature representation, which combines the color and texture features of the image and effectively improved the state-of-the-art rank-1 identification rate on person re-identification datasets.

Although effective and improved appearance feature models have been proposed for person re-identification over many years, the accuracy of person re-identification is still low because the appearance of pedestrians changes dramatically and in complex ways across camera views. To overcome the interference of these factors, person re-identification methods based on transformation models have been proposed. Methods based on brightness transfer functions [8, 9] handle the interference caused by changes in ambient light by constraining the cumulative brightness histogram of images of the same target. However, in real life, the changes in the lighting environment of pedestrian images are irregular. Human body structure information is another important content of pedestrian images. Using it can effectively reduce the drastic changes in appearance features caused by the misalignment of human body structure [10]. Considering the symmetry of the human body structure, Farenzena et al. [11] proposed the Symmetry-Driven Accumulation of Local Features (SDALF) model to extract the features of the human body. However, person re-identification algorithms based on transformation models bring very limited improvement in recognition accuracy, and the complex high-dimensional features also place higher requirements on the measurement model. Therefore, in follow-up research, the establishment of similarity distance measurement models has received extensive attention, and researchers have carried out in-depth research on person re-identification algorithms based on metric models.

Research on metric learning based person re-identification has produced many algorithms, such as RDC [12], ITML [13], KISSME [14], XQDA [7], and NFST [15]. RDC, proposed by Zheng et al., is based on learning a metric projection space: it learns a projection subspace in which the distance between a positive sample pair is smaller than the distance between a negative sample pair. In RDC, the person re-identification problem is transformed into a relative distance comparison problem, an optimization objective function is established, and an iterative method for solving the projection space is derived. In addition, Zheng et al. proposed the PRDC algorithm, which improves RDC through a probability model and effectively enhances the recognition accuracy. M. Kostinger et al. [14] established a metric model for person re-identification from the perspective of the overall distribution of positive and negative samples. The algorithm assumes that the populations of positive and negative samples obey normal distributions with different parameters, and establishes a probability model for each of the two distributions. Then, according to the probability density functions of the two distributions, a statistical inference model is established to learn the distance measurement model for person recognition. Also from the perspective of statistical inference, Du Yuning et al. [16] proposed a new person re-identification algorithm whose performance, as verified by training experiments, is better than that of the existing pedestrian recognition algorithms. Combining the ideas of projection subspace learning and Mahalanobis distance learning, Liao et al. [7] proposed a novel method which learned a Mahalanobis distance in the projection space for person recognition. They first learned a.

Qi Ji et al. [17] proposed a metric learning algorithm based on the distance between geometries, building on existing metric model research. The algorithm starts from the appearance features of the sample and combines the RDC algorithm to improve the measurement model; its effectiveness is verified through a large number of comparative experiments. Chen Feng et al. [18] introduced the idea of semi-supervision to improve KISSME: an unlabeled training dataset is used to generate potential image pairs, which are then used for unsupervised weight learning in the KISSME metric model. Zhang et al. [15] proposed to solve the person re-identification problem by learning the discriminative null space of the training samples, which yields the relatively simple NFST model. It learns a projection subspace in which the samples of each class collapse to a point, by enforcing the within-class scatter to be zero in the subspace. The NFST algorithm was also kernelized to further improve its recognition accuracy, and with feature fusion the accuracy of person re-identification was greatly improved. Although years of research have significantly improved the identification accuracy of metric learning based person re-identification, in practice the appearance features of pedestrians crossing camera views are very complicated, and the overall sample population is huge. The existing data contains only a small part of the feature changes, so the metric model learned from the training samples overfits the training data, resulting in unsatisfactory measurement results on the test data.

Deep learning based methods are now the mainstream approach for person re-identification. They reach high identification accuracy on this task, and many methods have been proposed [17, 18]. However, deep learning based methods require a large amount of training data, which takes a lot of manpower to label. In this paper, we study metric learning based methods and try to design one that catches up with the performance of deep learning models.

In this paper, based on the existing research results, the person re-identification algorithm based on metric learning is studied. Aiming at the over-fitting problem in person re-identification task, an improved algorithm is proposed, and a metric learning model with stronger generalization ability is designed. The main contributions of the algorithm in this paper are as follows:

Based on the linear discriminant analysis algorithm, a metric projection space learning algorithm with strong discrimination ability is proposed. This method establishes the objective function and solves the projection matrix by constraining the positive sample pair divergence to be smaller than the negative sample pair divergence;

The semi-supervised learning algorithm is introduced, and a semi-supervised linear discriminant analysis metric learning method is proposed. Different from the existing metric learning method, this method improves the learning of the projection matrix through the semi-supervised learning method, and enhances the generalization ability of the algorithm;

Through tests on the three public datasets VIPeR, CUHK01 and Market1501, the effectiveness of the proposed method and the reliability of its recognition accuracy are verified, which provides a useful reference for similar research.

2 Pedestrian re-identification based on linear discriminant algorithm

2.1 Basic model of pedestrian re-identification

The main task in pedestrian re-identification is to match pedestrian images captured by different cameras. Let the data sets $\{x_{i}^{p}\}$ and $\{x_{j}^{g}\}$ represent the pedestrian samples taken by cameras A and B, respectively, where $x_{i}^{p}$ denotes the feature vector of the i-th sample from camera A and $x_{j}^{g}$ denotes the feature vector of the j-th sample from camera B. In this paper, the pedestrian re-identification problem is defined as a sample classification problem. Let $l_{i}^{p}$ represent the category label of the i-th person in $\{x_{i}^{p}\}$ and $l_{j}^{g}$ the category label of the j-th person in $\{x_{j}^{g}\}$ . When $l_{i}^{p} = l_{j}^{g}$ , the samples $x_{i}^{p}$ and $x_{j}^{g}$ correspond to the same pedestrian, and $(x_{i}^{p}, x_{j}^{g})$ is a positive sample pair; conversely, when $l_{i}^{p} \neq l_{j}^{g}$ , the samples correspond to different pedestrians, and $(x_{i}^{p}, x_{j}^{g})$ is a negative sample pair. The problem of pedestrian re-identification is thus transformed into a binary classification problem over pedestrian image sample pairs.

Next, we calculate the similarity between image sample pairs by defining a metric model. The measurement model defined in this paper is shown in Equation (1):

$d (x_{i}^{p}, x_{j}^{g}) = (x_{i}^{p} - x_{j}^{g})^{T} W^{T} W (x_{i}^{p} - x_{j}^{g}),$ (1)

where W is the projection matrix of the metric subspace. The purpose of establishing the metric learning model in this paper is to learn an effective metric subspace and to project the original data into it to calculate the similarity of the samples, so as to measure pedestrian images more accurately and make the distance of the positive sample pairs to be identified smaller than the distance of the negative sample pairs.
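As an illustration of Equation (1), the following minimal NumPy sketch evaluates the distance for a toy pair. It assumes that W is stored with one projection direction per row (shape r × m), so that the product W^T W matches the equation; the function name `metric_distance` and the toy values are hypothetical and not part of the original method.

```python
import numpy as np

def metric_distance(x_p, x_g, W):
    """Squared distance of Equation (1): (x_p - x_g)^T W^T W (x_p - x_g).

    Assumption: W has shape (r, m), one projection direction per row.
    """
    projected = W @ (np.asarray(x_p, dtype=float) - np.asarray(x_g, dtype=float))
    return float(projected @ projected)

# Toy example: a 2 x 3 projection that ignores the third feature.
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])
x_p = np.array([1.0, 1.0, 5.0])
x_g = np.array([0.0, 0.0, 0.0])

print(metric_distance(x_p, x_g, W))  # 1^2 + 2^2 = 5.0
```

Because the third feature is projected out, the large difference in that coordinate does not contribute to the distance, which is the effect a learned metric subspace is meant to have on uninformative features.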

Let $y_{ij} = (x_{i}^{p} - x_{j}^{g})$ . If $l_{i}^{p} = l_{j}^{g}$ , then y_ij is the difference vector of a positive sample pair; otherwise, if $l_{i}^{p} \neq l_{j}^{g}$ , y_ij is the difference vector of a negative sample pair. Let $Ω_{p} = \{y_{ij} | l_{i}^{p} = l_{j}^{g}\}$ denote the set of difference vectors of positive sample pairs, $Ω_{n} = \{y_{ij} | l_{i}^{p} \neq l_{j}^{g}\}$ the set of difference vectors of negative sample pairs, and Ω the set of all difference vectors of sample pairs. Writing $d_{w} (y_{ij}) = y_{ij}^{T} W^{T} W y_{ij}$ for the distance of Equation (1), the relative distance function between positive and negative samples is established, as shown in Equation (2):

$C (w) = \sum_{i = 1}^{n} \sum_{j = 1}^{n_{1}} \sum_{s = 1}^{n_{2}} (∥ d_{w} (y_{ij}) - d_{w} (y_{is}) ∥_{2}^{2})$ (2)

where y_ij ∈ Ω_p and y_is ∈ Ω_n; n denotes the number of image samples from camera A, n₁ the number of positive image samples from camera B, and n₂ the number of negative image samples from camera B.

As shown in Equation (2), the purpose of the metric learning model is to learn the distance metric function d_w (·) so that the positive sample distance of each pedestrian in the training data is smaller than its negative sample distance. Because the features of pedestrian image samples are complex, we instead start from the average distances of the positive and negative samples in the training data and establish the optimization model for metric learning in Equation (3):

$\min C (w) = \frac{1}{N_{+}} \sum_{y_{ij} \in Ω_{p}} ∥ d_{w} (y_{ij}) ∥_{2}^{2} - \frac{1}{N_{-}} \sum_{y_{is} \in Ω_{n}} ∥ d_{w} (y_{is}) ∥_{2}^{2}$ (3)

where N₊ represents the number of positive sample pairs, and N_- represents the number of negative sample pairs. To solve this optimization problem, this article transforms it into a data classification problem based on linear discriminant analysis (LDA) and establishes a pedestrian re-identification metric learning algorithm, which is introduced in Section 2.2.
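The construction of the positive and negative difference sets and the average-distance objective of Equation (3) can be sketched as follows. This is a minimal illustration on toy data, not the authors' implementation: the helper names `pair_differences`, `metric_dist` and `objective` are hypothetical, W is assumed to have shape r × m as in Equation (1), and the norm of the scalar distance is taken literally as its square.

```python
import numpy as np

def pair_differences(Xp, lp, Xg, lg):
    """Return the difference vectors y_ij = x_i^p - x_j^g, split into
    positive pairs (equal labels) and negative pairs (different labels)."""
    Xp, Xg = np.asarray(Xp, dtype=float), np.asarray(Xg, dtype=float)
    diffs = Xp[:, None, :] - Xg[None, :, :]                    # (n_p, n_g, m)
    same = np.asarray(lp)[:, None] == np.asarray(lg)[None, :]  # (n_p, n_g)
    return diffs[same], diffs[~same]

def metric_dist(Y, W):
    """d_w(y) = y^T W^T W y for every row y of Y; W has shape (r, m)."""
    return np.sum((Y @ W.T) ** 2, axis=1)

def objective(W, pos, neg):
    """Equation (3): mean squared positive distance minus mean squared
    negative distance (smaller is better)."""
    return np.mean(metric_dist(pos, W) ** 2) - np.mean(metric_dist(neg, W) ** 2)

# Toy data: two identities seen by camera A (Xp) and camera B (Xg).
Xp = np.array([[0.0, 0.0], [4.0, 0.0]]); lp = [0, 1]
Xg = np.array([[1.0, 0.0], [4.0, 2.0]]); lg = [0, 1]
pos, neg = pair_differences(Xp, lp, Xg, lg)
value = objective(np.eye(2), pos, neg)
print(value)  # (1 + 16)/2 - (400 + 81)/2 = -232.0
```

A more negative value indicates that, on average, positive pairs are closer than negative pairs under the chosen W.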

2.2 Pedestrian re-identification method based on linear discriminant analysis

Linear discriminant analysis is a classic method for metric learning. The idea of this algorithm is to project the sample data from the high-dimensional original feature space to a feature subspace with stronger discriminative power. The algorithm uses the prior information of the sample category labels to extract this subspace from the original feature space and thereby achieve feature dimension reduction.

Given the pedestrian sample set {y_ij|y_ij ∈ R^m}, where y_ij is the feature difference vector of the sample pair composed of the i-th image captured by camera A and the j-th image sample captured by camera B, where m represents the feature vector dimension of the sample. In the task of person re-identification, we divide all sample pairs into 2 categories, positive sample pairs and negative sample pairs. Define the inter-class divergence and intra-class divergence of the sample set {y_ij} as follows:

$\begin{matrix} S_{b} & = \frac{1}{n} \sum_{i = 1}^{2} n_{i} (u_{i} - u) (u_{i} - u)^{T} \\ S_{w} & = \frac{1}{n_{1}} \sum_{y_{ij} \in Ω_{p}} (y_{ij} - u_{1}) (y_{ij} - u_{1})^{T} + \frac{1}{n_{2}} \sum_{y_{ik} \in Ω_{n}} (y_{ik} - u_{2}) (y_{ik} - u_{2})^{T} \end{matrix}$ (4)

where S_b represents the inter-class divergence and S_w represents the intra-class divergence. u₁ represents the mean of the positive sample pairs, u₂ the mean of the negative sample pairs, and u the overall mean of all sample pairs. n denotes the number of all sample pairs, n₁ the number of positive sample pairs, and n₂ the number of negative sample pairs. Therefore n₁ + n₂ = n, and the means are:

$\begin{matrix} u_{1} & = \frac{1}{n_{1}} \sum_{y_{ij} \in Ω_{p}} y_{ij} \\ u_{2} & = \frac{1}{n_{2}} \sum_{y_{ik} \in Ω_{n}} y_{ik} \\ u & = \frac{1}{n} \sum_{y_{ij} \in Ω} y_{ij} \end{matrix}$ (5)

According to the definitions in Equation (4), the inter-class divergence reflects the difference between different classes of the population, and the intra-class divergence reflects the differences within the same class. According to the distribution hypothesis for the positive and negative sample populations in the literature [7], Ω_p and Ω_n each obey a zero-mean Gaussian distribution with different parameters, so that u₁ = u₂ = u = 0. Equation (4) can then be rewritten, up to the normalization constants, as:

$\begin{matrix} S_{b} & = 0 \\ S_{w} & = \sum_{y_{ij} \in Ω_{p}} y_{ij} y_{ij}^{T} + \sum_{y_{ij} \in Ω_{n}} y_{ij} y_{ij}^{T} \end{matrix}$ (6)

Under the distribution assumption in the literature [7], building the projection space learning model directly from Equation (6) is problematic: the inter-class divergence of the positive and negative populations is 0, and minimizing the intra-class divergence would make the scatter of both the positive and the negative populations as small as possible, which contradicts the optimization goal. Therefore, this paper establishes a metric learning model by minimizing the overall divergence of the positive samples and maximizing the overall divergence of the negative samples. First, the overall divergences of positive and negative samples are defined as in Equation (7):

$\begin{matrix} S_{p} & = \sum_{y_{ij} \in Ω_{p}} y_{ij} (y_{ij})^{T} \\ S_{n} & = \sum_{y_{ik} \in Ω_{n}} y_{ik} (y_{ik})^{T} \end{matrix}$ (7)

For person re-identification, the purpose of the metric learning model is to find a metric projection space in which the divergence of the positive sample population is as small as possible and that of the negative sample population is as large as possible. Projecting Equation (7) onto a direction w gives the divergences of the positive and negative populations in the projection space, as shown in Equation (8):

$\begin{matrix} S_{p} (w) & = \sum_{y_{ij} \in Ω_{p}} w^{T} y_{ij} (y_{ij})^{T} w = w^{T} S_{p} w \\ S_{n} (w) & = \sum_{y_{ij} \in Ω_{n}} w^{T} y_{ij} (y_{ij})^{T} w = w^{T} S_{n} w \end{matrix}$ (8)

According to the idea of linear discriminant analysis algorithm, construct the Fisher discriminant criterion function, as shown in Equation (9):

$max J (w) = \frac{w^{T} S_{n} w}{w^{T} S_{p} w}$ (9)

For the Fisher discriminant function in Equation (9), the goal of the optimization model is to find the projection subspace in which the numerator w^T S_n w is as large as possible and the denominator w^T S_p w is as small as possible. Since Equation (9) is difficult to solve directly, and J (w) does not change when w is rescaled, it is transformed into the equivalent constrained form:

$\max J (w) = w^{T} S_{n} w \quad \text{s.t.} \quad w^{T} S_{p} w = 1$ (10)
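The scatter matrices of Equation (7), the Fisher ratio of Equation (9) and its constrained form in Equation (10) can be checked numerically on toy difference vectors. The sketch below is illustrative only; the helper names `scatter` and `fisher_ratio` and the toy vectors are assumptions made for this example.

```python
import numpy as np

def scatter(Y):
    """Sum of outer products y y^T over the rows of Y, as in Equation (7)."""
    Y = np.asarray(Y, dtype=float)
    return Y.T @ Y

def fisher_ratio(w, S_p, S_n):
    """J(w) = (w^T S_n w) / (w^T S_p w), as in Equation (9)."""
    return float(w @ S_n @ w) / float(w @ S_p @ w)

# Toy difference vectors of positive and negative pairs.
pos = np.array([[-1.0, 0.0], [0.0, -2.0]])
neg = np.array([[-4.0, -2.0], [3.0, 0.0]])
S_p, S_n = scatter(pos), scatter(neg)

w = np.array([1.0, 1.0])
J = fisher_ratio(w, S_p, S_n)   # 45 / 5 = 9.0

# Rescaling w so that w^T S_p w = 1 leaves J unchanged, which is why the
# ratio can be replaced by the constrained problem of Equation (10).
w_unit = w / np.sqrt(w @ S_p @ w)
print(J, float(w_unit @ S_n @ w_unit))  # both equal 9.0
```

The projected scatter w^T S_p w also equals the sum of squared projections of the positive difference vectors, which is the identity stated in Equation (8).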

3 Generalization weighted linear discrimination analysis method for person re-identification

Existing person re-identification methods learn a metric projection space or a Mahalanobis distance through a supervised metric learning model, thereby obtaining a metric subspace with stronger discriminative ability. However, because the appearance of person re-identification sample images changes dramatically and the training samples are few, the generalization ability of the resulting metric model is usually weak; that is, the performance on the training data and on the test data in the metric space differs considerably. In response to this problem, this section proposes a person re-identification metric model based on improved linear discriminant analysis.

The linear discriminant analysis algorithm establishes a metric learning model for the person re-identification problem. It converts the Fisher discriminant criterion into a solvable objective function by constraining the overall divergence of the positive samples. However, because the training samples are few while the sample features are high-dimensional vectors, the optimization model is under-determined, and the trained metric model over-fits. Literature [18] improves the discriminative power of the model by adding unlabeled training samples, but it does not enhance the generalization ability of the model. This article studies a semi-supervised metric learning method that improves the generalization ability of the final metric model.

Given the training set $\{y_{ij} = x_{i}^{p} - x_{j}^{g} | y_{ij} \in R^{m}\}$ and the test set $\{z_{ij} = s_{i}^{p} - s_{j}^{g} | z_{ij} \in R^{m}\}$ , first establish the following objective function according to Equation (10):

$\max J (w) = \frac{w^{T} S_{n} w}{w^{T} S_{p} w} + \frac{w^{T} S_{n}^{'} w}{w^{T} S_{p}^{'} w}$ (11)

where $S_{p}^{'}$ and $S_{n}^{'}$ respectively represent the divergence of the positive and negative sample pairs of the test data. According to Equation (11), the improved metric learning model increases the generalization ability of the model by also constraining the overall divergence of the positive and negative samples of the test data.

However, the label information of the test data is unknown, so $S_{p}^{'}$ and $S_{n}^{'}$ cannot be calculated directly. Therefore, this paper adopts a semi-supervised learning method and solves the above metric model through stepwise optimization, as follows:

According to the training samples, calculate the initial metric model, that is, use the Lagrange multiplier method to solve Equation (10), then:

$max J (w) = w^{T} S_{n} w - λ w^{T} S_{p} w$ (12)

Taking the derivative of the above formula, we get:

$\frac{\partial J (w)}{\partial w} = 2 S_{n} w - 2 λ S_{p} w$ (13)

Let the derivative in Equation (13) be equal to 0, then:

$S_{n} w = λ S_{p} w$ (14)

Therefore, the solution of the optimization problem in Equation (10) could be found by solving the eigenvalue problem in Equation (15):

$| S_{p}^{- 1} S_{n} - λ I | = 0$ (15)

Solving the eigenvalue problem in Equation (15) yields the eigenvectors (w₁, w₂, w₃, ⋯ , w_m). They are arranged in descending order of their eigenvalues and are mutually orthogonal with respect to S_p, that is, w_a^T S_p w_b = 0 for a ≠ b. The first r eigenvectors are selected to form the projection matrix W = (w₁, w₂, ⋯ , w_r), which is the solution of the optimization problem.
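One way to solve the generalized eigenvalue problem S_n w = λ S_p w numerically is sketched below with NumPy. It is an illustration under stated assumptions rather than the authors' Matlab code: the function name `lda_projection` is hypothetical, the small ridge term `reg` is an added assumption that keeps S_p invertible when samples are scarce, and the Cholesky whitening step is a standard substitute for forming S_p^{-1} S_n explicitly.

```python
import numpy as np

def lda_projection(S_p, S_n, r, reg=1e-6):
    """Top-r solutions of S_n w = lambda (S_p + reg*I) w.

    Returns the eigenvalues in descending order and W of shape (m, r),
    whose columns w_k satisfy w_k^T (S_p + reg*I) w_k = 1.
    """
    m = S_p.shape[0]
    L = np.linalg.cholesky(S_p + reg * np.eye(m))  # S_p + reg*I = L L^T
    A = np.linalg.solve(L, S_n)                    # L^{-1} S_n
    M = np.linalg.solve(L, A.T).T                  # L^{-1} S_n L^{-T}, symmetric
    lam, V = np.linalg.eigh((M + M.T) / 2)         # eigenvalues in ascending order
    top = np.argsort(lam)[::-1][:r]                # indices of the r largest
    W = np.linalg.solve(L.T, V[:, top])            # map back: w = L^{-T} v
    return lam[top], W

# Toy scatter matrices (S_p is positive definite, so reg can be 0 here).
S_p = np.array([[1.0, 0.0], [0.0, 4.0]])
S_n = np.array([[25.0, 8.0], [8.0, 4.0]])
lam, W = lda_projection(S_p, S_n, r=2, reg=0.0)

# Each column satisfies Equation (14) and the constraint of Equation (10).
print(np.allclose(S_n @ W, (S_p @ W) * lam))
print(np.allclose(W.T @ S_p @ W, np.eye(2)))
```

Working with the symmetric matrix L^{-1} S_n L^{-T} keeps the eigenvalues real and avoids the numerical asymmetry of S_p^{-1} S_n.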

Next, calculate the positive and negative sample divergences of the unlabeled data. The formulas for $S_{p}^{'}$ and $S_{n}^{'}$ are as follows:

$\begin{array}{l} S_{p}^{'} = \sum_{i, j} k_{ij} z_{ij} (z_{ij})^{T} \\ S_{n}^{'} = \sum_{i, j} (1 - k_{ij}) z_{ij} (z_{ij})^{T} \end{array}$ (16)

where {z_ij} represents the test data set, and k_ij indicates whether the test sample $s_{j}^{g}$ belongs to the k nearest neighbors of $s_{i}^{p}$ in the metric subspace, which is defined as follows:

$k_{ij} = \begin{cases} 1, & \text{if } rank (d_{w} (z_{ij}); D_{w}^{(i)}) \leq k \\ 0, & \text{otherwise} \end{cases}$ (17)

where rank (·) represents the ordering function of sample similarity; $D_{w}^{(i)}$ represents the set of similarity distance measurement results between the i-th sample to be identified in scene A and all samples in scene B; and k represents the k-nearest neighbor parameter of sample $s_{i}^{p}$ . Therefore, $rank (d_{w} (z_{ij}); D_{w}^{(i)})$ is the rank of the similarity distance d_w (z_ij) among all similarity measurement results of sample $s_{i}^{p}$ .

For the test data, the intra-class divergence of positive and negative samples is estimated by the k-nearest neighbor relationship of the samples in the metric subspace. In the metric subspace, the k-nearest neighbor similar samples of the sample to be identified are used to obtain the preliminary identification result. However, the model has an overfitting problem to the training samples and poor generalization ability. In this paper, the semi-supervised learning method is used to roughly estimate the divergence of positive and negative samples of unlabeled data, and the metric model is revised by the estimated value of the overall divergence of positive and negative samples.
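The k-nearest-neighbor estimate of Equations (16) and (17) can be sketched as follows. This is a minimal illustration, assuming the projection matrix W stores the eigenvectors as columns (shape m × r) so that a difference vector is projected as `z @ W`; the function name `knn_scatter` and the toy features are hypothetical.

```python
import numpy as np

def knn_scatter(probe, gallery, W, k):
    """Estimate S_p' and S_n' of Equation (16) from unlabeled features.

    probe:   (n_p, m) features s_i^p from scene A
    gallery: (n_g, m) features s_j^g from scene B
    W:       (m, r) projection matrix with eigenvectors as columns
    """
    probe, gallery = np.asarray(probe, float), np.asarray(gallery, float)
    Z = probe[:, None, :] - gallery[None, :, :]             # z_ij, (n_p, n_g, m)
    D = np.sum((Z @ W) ** 2, axis=2)                         # d_w(z_ij)
    ranks = np.argsort(np.argsort(D, axis=1), axis=1) + 1    # 1-based rank in D_w^(i)
    K = (ranks <= k).astype(float)                           # k_ij of Equation (17)
    S_p_new = np.einsum('ij,ijm,ijn->mn', K, Z, Z)           # pseudo-positive scatter
    S_n_new = np.einsum('ij,ijm,ijn->mn', 1.0 - K, Z, Z)     # pseudo-negative scatter
    return K, S_p_new, S_n_new

# Toy unlabeled data: two probes, three gallery samples, identity metric.
probe = np.array([[0.0, 0.0], [10.0, 10.0]])
gallery = np.array([[1.0, 0.0], [9.0, 10.0], [5.0, 5.0]])
K, S_p_new, S_n_new = knn_scatter(probe, gallery, np.eye(2), k=1)
print(K)        # each probe selects its single nearest gallery sample
print(S_p_new)  # scatter of the two selected difference vectors
```

The dense (n_p, n_g, m) array of difference vectors is convenient for small examples; for a large test set it would need to be processed in blocks.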

Through the calculation in Equation (16), an approximate estimation of the overall divergence of positive and negative samples of unlabeled data is obtained. Similarly, using Lagrange multiplier method to solve Equation (11), we get:

$max J (w) = w^{T} (S_{n} + S_{n}^{'}) w - λ w^{T} (S_{p} + S_{p}^{'}) w$ (18)

Therefore, model (18) is transformed into the solution of the following eigenvalue problems:

$| (S_{p} + S_{p}^{'})^{- 1} (S_{n} + S_{n}^{'}) - λ I | = 0$ (19)

By solving the eigenvalue problem in Equation (19), the metric space projection matrix of the improved algorithm is obtained.

Through the above steps, the person re-identification algorithm modeling based on the improved linear discriminant analysis method is completed. It can be seen from the modeling process that this paper uses a semi-supervised learning method to estimate the divergence of the positive and negative sample population of unlabeled data, and then uses the divergence of the positive and negative samples of unlabeled data to improve the projection space of the metric model.

This paper designs an improved linear discriminant analysis method for person re-identification. Our method first learns the initial metric projection space with the linear discriminant analysis algorithm on the training sample set. Then, the semi-supervised learning method is used to estimate the overall divergence of positive and negative samples of unlabeled data in the metric projection space. The calculation process is shown in Table 1.

Table 1

Algorithm calculation process of this paper

Input: training set y_ij, training set labels l_ij, and unlabelled test set z_ij.
(1) Train the supervised linear discriminant analysis model on the training set to obtain the projection matrix W of the initial metric model;
(2) Use the metric model learned by the linear discriminant analysis method to measure the similarity of the unlabelled data, and obtain the metric result D_w;
(3) According to the measurement result, find the k-nearest neighbor samples of each individual to be identified in the test data, and calculate the estimated values $S_{p}^{'}$ and $S_{n}^{'}$ of the overall divergence of the positive and negative samples of the unlabelled data according to formula (16);
(4) According to formula (19), calculate the metric model projection matrix of the improved algorithm;
(5) According to the metric model learned by the improved algorithm, calculate the similarity measurement result of the test sample;
(6) Sort the test samples according to the similarity measurement results to get the final recognition result.
Output: projection matrix W of the metric model and the test results of the sample.
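The six steps of Table 1 can be sketched end to end as below. This is a hedged illustration on synthetic data, not the authors' Matlab implementation: the function names, the ridge term `reg`, the column layout of W (shape m × r) and the synthetic features are all assumptions made for this example.

```python
import numpy as np

def solve_gep(S_n, S_p, r, reg):
    """Top-r eigenvectors of S_n w = lambda (S_p + reg*I) w, as columns."""
    L = np.linalg.cholesky(S_p + reg * np.eye(S_p.shape[0]))
    M = np.linalg.solve(L, np.linalg.solve(L, S_n).T).T   # L^{-1} S_n L^{-T}
    lam, V = np.linalg.eigh((M + M.T) / 2)
    top = np.argsort(lam)[::-1][:r]
    return np.linalg.solve(L.T, V[:, top])

def pair_distances(P, G, W):
    """Difference vectors and their distances in the metric subspace."""
    Z = P[:, None, :] - G[None, :, :]
    return Z, np.sum((Z @ W) ** 2, axis=2)

def semi_supervised_lda(Xp, lp, Xg, lg, Tp, Tg, r, k, reg=1e-3):
    m = Xp.shape[1]
    # Step (1): supervised LDA on the labelled training pairs.
    Y = Xp[:, None, :] - Xg[None, :, :]
    same = lp[:, None] == lg[None, :]
    pos, neg = Y[same], Y[~same]
    S_p, S_n = pos.T @ pos, neg.T @ neg
    W0 = solve_gep(S_n, S_p, r, reg)
    # Step (2): measure the unlabelled test pairs with the initial model.
    Z, D0 = pair_distances(Tp, Tg, W0)
    # Step (3): k-nearest neighbours give pseudo-positive pairs, Equation (16).
    ranks = np.argsort(np.argsort(D0, axis=1), axis=1) + 1
    kf = (ranks <= k).astype(float).ravel()
    Zf = Z.reshape(-1, m)
    S_p_u = (Zf * kf[:, None]).T @ Zf
    S_n_u = (Zf * (1.0 - kf)[:, None]).T @ Zf
    # Step (4): re-solve with the combined scatter matrices, Equation (19).
    W = solve_gep(S_n + S_n_u, S_p + S_p_u, r, reg)
    # Steps (5)-(6): final distances and gallery ranking for each probe.
    _, D = pair_distances(Tp, Tg, W)
    return W, np.argsort(D, axis=1)

# Synthetic data: 30 identities, 8-dimensional features, two noisy views.
rng = np.random.default_rng(0)
ids = rng.normal(size=(30, 8)) * 3.0
A = ids + rng.normal(size=ids.shape) * 0.3
B = ids + rng.normal(size=ids.shape) * 0.3
labels = np.arange(30)
W, ranking = semi_supervised_lda(A[:20], labels[:20], B[:20], labels[:20],
                                 A[20:], B[20:], r=4, k=2)
print(W.shape, ranking.shape)  # (8, 4) (10, 10)
```

Each row of `ranking` lists the gallery indices of the test set from most to least similar to the corresponding probe; the first 20 identities are used for training and the remaining 10 for testing.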

4 Experimental results

4.1 Datasets and parameter settings

In order to verify the effectiveness of the algorithm in this paper, we select the VIPeR [3], CUHK01, and Market1501 person re-identification datasets for model testing. Among them, VIPeR is the most widely used dataset for testing algorithm accuracy. It contains 1264 images of 632 people. Each pedestrian has one and only one image in each monitoring scene, and the images are normalized to 128×48 pixels. The CUHK01 dataset is larger than VIPeR and is a multi-shot dataset: each pedestrian has 2 images in each surveillance scene, giving 3884 images of 971 pedestrians.

This article is based on Matlab software to program the algorithm. The hardware environment is: PC (Intel i3-2130 3.40GHz CPU, memory 4 GB, hard disk 512GB).

4.2 Evaluation index

To evaluate the accuracy of the person re-identification algorithm, this paper uses the cumulative matching characteristic (CMC) curve [5, 7] as the evaluation method of the recognition accuracy. The CMC curve sorts the measured distances of the samples from small to large, and calculates the ratio of the number of samples whose correct match is ranked in the first l results to the total number of samples to be recognized. The calculation formula is as follows:

$CMC (l) = \frac{1}{N} \sum_{i = 1}^{N} \mathrm{II} (rank (d_{i}) \leq l)$ (20)

where l is the rank threshold, rank (d_i) is the position of the correct match of the i-th sample in its sorted recognition results, and N represents the total number of samples to be identified. II (·) is an indicator function which indicates whether the recognition result d_i is ranked in the top l; its value is 1 if so and 0 otherwise.
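A CMC curve of this kind can be computed from a probe-by-gallery distance matrix as sketched below. The sketch counts a probe as correctly recognized at rank l when its true match is among the first l gallery entries, and it assumes a single-shot setting in which every probe identity appears exactly once in the gallery; the function name `cmc` and the toy distance matrix are hypothetical.

```python
import numpy as np

def cmc(D, probe_ids, gallery_ids, max_rank):
    """Cumulative matching characteristic for ranks 1..max_rank.

    D[i, j] is the distance between probe i and gallery sample j
    (smaller means more similar). Assumes each probe identity occurs
    exactly once in the gallery.
    """
    probe_ids, gallery_ids = np.asarray(probe_ids), np.asarray(gallery_ids)
    order = np.argsort(D, axis=1)                       # gallery sorted per probe
    matches = gallery_ids[order] == probe_ids[:, None]  # True at the correct match
    first_hit = matches.argmax(axis=1) + 1              # 1-based rank of the match
    return np.array([(first_hit <= l).mean() for l in range(1, max_rank + 1)])

# Toy example with three probes and three gallery samples.
D = np.array([[0.1, 0.5, 0.9],
              [0.8, 0.7, 0.2],
              [0.3, 0.9, 0.6]])
curve = cmc(D, probe_ids=[0, 1, 2], gallery_ids=[0, 1, 2], max_rank=3)
print(curve)  # rank-1: 1/3, rank-2: 1.0, rank-3: 1.0
```

In the toy example only the first probe has its true match at rank 1, while the other two find it at rank 2, so the curve reaches 1.0 from rank 2 onward.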

4.3 Sensitivity analysis of parameters

In order to further verify the effectiveness and robustness of the algorithm in this paper, the recognition accuracy of the algorithm was tested under different training sample sizes. The results are shown in Fig. 2.

Fig. 2

Comparison of algorithm identification accuracy under different number of training samples. P denotes the size of data used for model training.

In Fig. 2, the abscissa represents the identification accuracy under different training sample sizes, and the ordinate represents the identification rate. It can be seen from Fig. 2 that the algorithm in this paper performs well under different training sample sizes and achieves the best recognition results in every setting. The algorithm is also robust to changes in the size of the training set: when the number of training individuals is reduced from 485 to 250, the recognition accuracy is only slightly reduced. Only when the number of training individuals is reduced to 100 does the recognition accuracy decrease significantly, because there are too few samples.

Moreover, the sensitivity of the proposed method to the value of k is analysed. Fig. 3 displays the rank-1, rank-5, rank-10, and rank-20 identification rates under different k values. According to the results, the proposed method reaches the best performance when k = 4.

Fig. 3

Identification rates under different k values. k denotes the number of nearest neighbours.

4.4 Experimental results on VIPeR

In order to ensure the fairness of the comparison test results, this paper adopts a standard experimental procedure, randomly divides the data set to generate training samples and test samples, and tests the recognition accuracy of the algorithm under the same training scale. At the same time, in order to ensure the validity of the statistical results, 10 independent repeated tests were performed, and the average accuracy was calculated as the final recognition result.
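The repeated random-split protocol described above can be sketched as follows. This is only an outline of the procedure: `evaluate` is a hypothetical callable that would train on the given training identities and return, for example, a CMC curve on the test identities, and the fixed seed is an assumption added for reproducibility.

```python
import numpy as np

def random_split(num_ids, num_train, rng):
    """Randomly split identity indices into disjoint training and test sets."""
    perm = rng.permutation(num_ids)
    return perm[:num_train], perm[num_train:]

def average_over_trials(evaluate, num_ids, num_train, trials=10, seed=0):
    """Average the result of `evaluate(train_ids, test_ids)` over
    independent random splits, as in the 10-trial protocol."""
    rng = np.random.default_rng(seed)
    scores = [evaluate(*random_split(num_ids, num_train, rng))
              for _ in range(trials)]
    return np.mean(scores, axis=0)

# Placeholder evaluation that only reports the split sizes; a real run
# would train the metric model and return its CMC curve instead.
def dummy_evaluate(train_ids, test_ids):
    assert len(np.intersect1d(train_ids, test_ids)) == 0  # splits are disjoint
    return np.array([len(train_ids), len(test_ids)], dtype=float)

mean_score = average_over_trials(dummy_evaluate, num_ids=632, num_train=316)
print(mean_score)  # [316. 316.]
```

With 632 identities and 316 training individuals, this corresponds to the VIPeR setting in which half of the identities are used for training and the other half for testing in each trial.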

As shown in Table 2 and Fig. 4, the comparison results of the proposed method on the VIPeR dataset are reported as CMC curves and as recognition accuracy at rank-1, rank-5, rank-10, and rank-20 for the different algorithms. P = 316 is the size of the training data, meaning that 316 individuals are used for metric model training.

Fig. 4

Comparison of algorithm recognition accuracy.

Table 2

Comparison results on VIPeR

	P = 316
Methods	r = 1	r = 5	r = 10	r = 20
ITML[13]	14.41	18.30	23.20	33.72
LMNN[19]	25.15	56.68	71.43	85.77
RDC[12]	11.71	25.32	35.06	45.68
MVSLDDL[21]	16.86	41.22	58.06	95.57
KISSME[15]	19.94	47.89	62.77	77.21
XQDA[7]	40.00	68.33	81.07	91.24
MLAPG[22]	40.85	69.26	81.92	91.78
LDA	37.44	64.81	77.63	89.40
HRNet[23]	48.7	73.4	81.7	-
CAD-Net++[24]	43.4	68.7	78.2	-
INTACT[25]	46.2	73.1	91.6	-
Improved LDA	46.80	70.15	80.16	91.22

Experimental results show that the proposed method achieves high recognition accuracy. Compared with the state-of-the-art person re-identification algorithms, its recognition accuracy is significantly improved, especially at rank-1, where it improves on the best comparison algorithm by 0.60%. The improved LDA algorithm also improves the rank-1 recognition accuracy by 9.36% compared with the LDA model. For the rank-5, rank-10, and rank-20 identification rates, the proposed method achieves the best or close to the best recognition accuracy.
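The margins quoted above are absolute differences between rank-1 rates in Table 2. As a quick check, the snippet below recomputes them. The dictionary layout and variable names are illustrative, and matching the 0.60 figure against the INTACT [25] row is an assumption about which comparison entry the text refers to.

```python
# Rank-1 identification rates (%) copied from Table 2 (VIPeR, P = 316).
rank1 = {"INTACT": 46.2, "LDA": 37.44, "Improved LDA": 46.80}

# Absolute differences in percentage points, rounded to two decimals
# to avoid floating-point noise.
gain_over_intact = round(rank1["Improved LDA"] - rank1["INTACT"], 2)
gain_over_lda = round(rank1["Improved LDA"] - rank1["LDA"], 2)
```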

4.5 Experimental results on CUHK01

In addition, Table 3 compares the recognition accuracy of the proposed method on the CUHK01 dataset with that of XQDA [7], NFST [15], MLAPG [19], Improved Deep [20], and other methods. P = 485 is the size of the training data of the CUHK01 dataset. The table shows that the proposed method performs very well. It is also worth noting that the improved algorithm greatly increases the recognition accuracy over the baseline LDA person re-identification algorithm, outperforming LDA by 32.11% at rank-1.

Table 3

Comparison results on CUHK01

	P = 485
Methods	r = 1	r = 5	r = 10	r = 20
Improved Deep[14]	47.53	71.00	80.00	-
NFST[16]	65.0	85.0	89.9	94.4
XQDA[7]	63.21	83.89	90.04	94.14
MLAPG[22]	64.24	85.41	90.84	94.92
LDA	43.23	66.21	75.05	83.23
PN-GAN[25]	67.65	86.64	91.77	-
Quadruplet[26]	62.55	83.44	89.71	-
DECAMEL[27]	65.81	-	-	-
pt-GAN[28]	73.95	93.04	96.97	-
Improved LDA	75.34	91.56	96.61	99.38

4.6 Experimental results on Market-1501

In addition, more experiments were conducted on the Market-1501 dataset to further demonstrate the effectiveness of the proposed method. The experiments were repeated 10 times, and the average identification rates are used for performance evaluation.

We first set the test data size to 100; the comparison results of the proposed method on the Market-1501 dataset are presented in Table 4. Several state-of-the-art models are selected for comparison. The results indicate that the proposed algorithm performs very well, achieving the second-highest rank-1 accuracy. The proposed method reaches identification rates of 91.12%, 96.97%, 98.69%, and 99.79% at rank-1, rank-5, rank-10, and rank-20 respectively. It is also worth noting that the proposed method reaches the same level of identification accuracy as the deep learning models, which shows that it is effective in improving metric learning based methods.

Table 4

Comparison results on Market-1501

Methods	r = 1	r = 5	r = 10	r = 20
PN-GAN[25]	89.43	-	-	-
pt-GAN[28]	90.87	-	-	-
HRNet[23]	87.8	-	-	-
TO[29]	92.00	97.10	-	-
Improved LDA	91.12	96.97	98.69	99.79

Moreover, we set the test data size to 100 and 750 to evaluate the proposed method under different test sizes; the comparison results are displayed in Figs. 5 and 6. As shown in these two figures, the accuracy of the proposed method is low when the test size is large. This is because the improvement relies on the k-nearest neighbours to estimate positive sample pairs, and there is a large error between the estimated overall divergence of positive samples and the ground truth. Therefore, the proposed method is limited when the test set is large.

Fig. 5

Identification rate on Market-1501 (test size = 100).

Fig. 6

Identification rate on Market-1501 (test size = 750).

4.7 Training Size

To further demonstrate the effectiveness of the proposed method, we test it with different training sizes on the VIPeR dataset. The results are shown in Table 5. As the table shows, the proposed method is robust to the training size: when the training size decreases, its identification accuracy decreases only slightly.

Deep learning based methods need a large amount of training data, and labelling training samples for deep model training costs considerable time and human resources. In contrast, the proposed method needs only a small number of samples to reach good performance. The proposed method also provides a novel way to improve metric learning based methods.

Table 5

Comparison results on VIPeR under different training sizes

Training Size	r = 1	r = 5	r = 10	r = 20
P = 100	43.04	67.41	79.30	89.24
P = 200	45.24	68.43	78.99	90.15
P = 316	46.80	70.15	80.16	91.22

4.8 Time Complexity

In this section, the time complexity of our method is analysed. According to the method introduced in Sections 2 and 3, the time complexity is mainly determined by the solution of the eigenvalue problem in formula (19). Therefore, the time complexity of our method is O(n²), where n denotes the dimension of the sample feature vectors. In addition, our method resamples the k-nearest neighbour samples to fine-tune the metric model, based on the distance given by the model in formula (15). The total time complexity is then O(2n²).
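To make the dominant step concrete, the sketch below solves a generalized eigenvalue problem of the standard LDA form $S_b w = \lambda S_w w$ with NumPy. Formula (19) is not reproduced in this section, so the standard LDA form, the function name `lda_projection`, the regularisation constant `reg`, and the toy data are all assumptions; the sketch only illustrates where an eigen-solve on an n × n matrix enters, not the authors' exact formulation.

```python
import numpy as np


def lda_projection(X: np.ndarray, y: np.ndarray, dim: int, reg: float = 1e-6) -> np.ndarray:
    """Return an (n_features, dim) projection from the top generalized eigenvectors."""
    n_features = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((n_features, n_features))  # within-class scatter
    Sb = np.zeros((n_features, n_features))  # between-class scatter
    for label in np.unique(y):
        Xc = X[y == label]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        Sw += centered.T @ centered
        diff = (mean_c - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Whiten with the Cholesky factor of the regularised Sw (Sw = L L^T),
    # which turns Sb w = lambda Sw w into a symmetric eigenproblem.
    L = np.linalg.cholesky(Sw + reg * np.eye(n_features))
    L_inv = np.linalg.inv(L)
    M = L_inv @ Sb @ L_inv.T
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :dim]       # eigenvectors of the largest eigenvalues
    return L_inv.T @ top                  # map back to the original feature space


# Toy data (hypothetical): two well-separated classes in 3-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 3)), rng.normal(4.0, 1.0, (20, 3))])
y = np.array([0] * 20 + [1] * 20)
W = lda_projection(X, y, dim=1)
```

Under this reading, the resampling stage repeats the same solve once more on the augmented pair set, which is consistent with the doubling of the cost described above.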

4.9 Pros and Cons

In this paper, we design a novel method of similarity measurement for person re-identification. The pros and cons are summarized as follows:

Our method offers a simple approach to improving existing metric learning based methods: it resamples similar samples from the nearest neighbours and retrains the metric model. It is a general approach that could be extended to other methods. In addition, our method addresses the person re-identification problem from a novel angle, namely that similar negative samples should be closer to the test individual than dissimilar negative samples.

However, the shortcoming of the proposed method is that it needs to retrain the metric model for the test data, whereas most metric learning based methods train their models only on the training data and then measure the test data with the parameters learned there.

5 Conclusion

Person re-identification is a key and difficult problem in computer vision research. Due to the complexity of the application scenarios and the drastic changes in image features, current person re-identification algorithms are still far from practical application. In response to these problems, this paper proposes a sampling method based on the normality of sample characteristics. Based on the normal characteristics of paired samples from multiple views of pedestrians, a linear discriminant analysis (LDA) algorithm is established, and a new generalized pedestrian re-identification algorithm is proposed by weighting the results of the generalized model and the LDA model. The effectiveness, accuracy, and robustness of this algorithm are demonstrated by comparison with algorithms such as ITML, LMNN, RDC, MVSLDDL, KISSME, XQDA, and MLAPG. In particular, compared with the baseline LDA method, the proposed algorithm improves rank-1 recognition accuracy by 9.36% and 32.11% on the VIPeR and CUHK01 datasets respectively. It also achieves the second-highest identification accuracy on the Market-1501 dataset. It is worth noting that the proposed method is limited by the accuracy of the k-nearest neighbours. The proposed method provides a new perspective on similarity measurement.

Acknowledgment

This project was supported by the Scientific Research Project of the Hubei Province Department of Education (Q20201601). This work was completed at Wuhan Polytechnic University, Wuhan.

Data Availability Statement

Research data are not shared.

References

1. Liu, Tao, Learning to track multiple targets, IEEE Trans on Neural Networks & Learning Systems 26(5) (2015), 1060–1073.

2. Yang, Yang, Yan et al., Salient color names for person re-identification, in: European Conference on Computer Vision, (2014), pp. 536–551.

3. Gray, Tao, Viewpoint invariant pedestrian recognition with an ensemble of localized features, in: European Conference on Computer Vision, Marseille, France, (2008), pp. 262–275.

4. Zhang, Chen, Saligrama, A Novel Visual Word Co-occurrence Model for Person Re-identification, in: Workshop at the European Conference on Computer Vision, Springer International Publishing, (2014), pp. 122–133.

5. Hirzer, Roth P.M., Köstinger, Bischof, Relaxed pairwise learned metric for person re-identification, in: European Conference on Computer Vision, Florence, (2012), pp. 780–793.

6. Yang, Yang, Yan et al., Salient Color Names for Person Re-identification, State Key Laboratory of Pattern Recognition 8689(9) (2014), 536–551.

7. Liao, Hu, Zhu et al., Person re-identification by local maximal occurrence representation and metric learning, in: IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, (2015), pp. 2197–2206.

8. Prosser, Gong, Xiang, Multi-camera matching under illumination change over time, in: Workshop on Multi Camera and Multi-Modal Sensor Fusion Algorithms and Applications, (2008).

9. Datta, Brown L.M., Feris et al., Appearance modeling for person re-identification using Weighted Brightness Transfer Functions, in: IEEE Int. Conference on Pattern Recognition, Tsukuba, Japan, (2012), pp. 2367–2370.

10. Dong S.C., Cristani, Stoppa et al., Custom Pictorial Structures for Re-identification, in: British Machine Vision Conference, Dundee, Scotland, (2011), pp. 68.1–68.11.

11. Farenzena, Bazzani, Perina, Cristani et al., Person re-identification by symmetry-driven accumulation of local features, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, (2010), pp. 2360–2367.

12. Zheng, Gong, Xiang, Reidentification by relative distance comparison, IEEE Trans on Pattern Analysis and Machine Intelligence 35(3) (2013), 653–668.

13. Davis J.V., Kulis, Jain et al., Information theoretic metric learning, in: International Conference on Machine Learning, Corvallis, Oregon, USA, (2007), pp. 209–216.

14. Köstinger, Hirzer, Wohlhart et al., Large scale metric learning from equivalence constraints, in: IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, (2012), pp. 2288–2295.

15. Zhang, Xiang, Gong, Learning a Discriminative Null Space for Person Re-identification, in: IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, (2016), pp. 1239–1248.

16. Y.N., Ai H.Z., Pedestrian re-identification algorithm based on statistical inference, Journal of Electronics and Information Technology 36(7) (2014), 1612–1618.

17. Yaghoubi, Kumar, Proença, SSS-PR: A short survey of surveys in person re-identification, Pattern Recognition Letters 143 (2021), 50–57.

18. …, Shen, Lin et al., Deep learning for person re-identification: A survey and outlook, IEEE Transactions on Pattern Analysis and Machine Intelligence (99) (2021), 1–1.

19. Weinberger K.Q., Saul L.K. et al., Distance metric learning for large margin nearest neighbor classification, The Journal of Machine Learning Research 10(1) (2009), 207–244.

20. Jing, Zhu, Wu, Hu et al., Super-resolution person re-identification with semi-coupled low-rank discriminant dictionary learning, IEEE Trans on Image Processing 26(3) (2017), 1363–1378.

21. Liao, Li S.Z., Efficient PSD Constrained Asymmetric Metric Learning for Person Re-Identification, in: IEEE International Conference on Computer Vision, Santiago, Chile, (2015), pp. 3685–3693.

22. Ahmed, Jones, Marks T.K., An improved deep learning architecture for person re-identification, in: IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, (2015), pp. 3908–3916.

23. Y.J., Chen Y.C., Lin Y.Y., Wang Y.C.F., Cross-resolution adversarial dual network for person re-identification and beyond, arXiv preprint arXiv:2002.09274 (2020).

24. Cheng, Dong, Gong, Zhu, Inter-task association critic for cross-resolution person re-identification, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, (2020), pp. 2602–2612.

25. Qian, Fu, Xiang et al., Pose-normalized image generation for person re-identification, in: Proceedings of the European Conference on Computer Vision, Munich, Germany, (2018), pp. 650–667.

26. Dong, Shen, Wu et al., Quadruplet network with one-shot learning for fast visual object tracking, IEEE Transactions on Image Processing 28(7) (2019), 3516–3527.

27. …, Wu, Zheng, Unsupervised person re-identification by deep asymmetric metric embedding, IEEE Transactions on Pattern Analysis and Machine Intelligence PP(99) (2018), 1–

28. Karmakar, Mishra, Pose invariant person re-identification using robust pose-transformation GAN, arXiv preprint arXiv:2105.00930 (2021).

29. …, Zheng S.J., Yuan C.A., Huang D.S., A deep model with combined losses for person re-identification, Cognitive Systems Research 54(May) (2019), 74–82.
