Doubly stochastic subdomain mining with sample reweighting for unsupervised domain adaptive person re-identification

Abstract

Clustering-based unsupervised domain adaptive person re-identification methods have achieved remarkable progress. However, existing methods easily fall into local minima because they jointly optimize two variables, the feature representation and the pseudo labels. Moreover, the model can be hurt by the inevitable false assignment of pseudo labels. To solve these problems, we propose Doubly Stochastic Subdomain Mining (DSSM), which prevents the nonconvex optimization from falling into local minima. We also design a novel reweighting algorithm based on a similarity correlation coefficient between samples, referred to as Maximal Heterogeneous Similarity (MHS), which reduces the adverse effect of noisy labels. Extensive experiments on two popular person re-identification datasets demonstrate that our method outperforms other state-of-the-art works. The source code is available at https://github.com/Tchunansheng/DSSM.

Keywords

Unsupervised domain adaptation; person re-identification; sample reweighting; feature learning

1. Introduction

Person re-identification (ReID) aims to identify a specific pedestrian across different surveillance cameras. It has a wide range of applications, such as suspect tracking, finding lost children or elderly people, and crowd trajectory analysis. Existing works have achieved satisfactory performance for supervised person re-identification [17,18,27,28,30], in which the training data and testing data are collected from the same domain. However, the performance of these models usually drops dramatically when they are applied to a new target domain. To address this problem, more and more researchers concentrate on Unsupervised Domain Adaptive (UDA) person ReID tasks [21,34,41], in which both labeled source domain data and unlabeled target domain data are utilized in model training.

For UDA person ReID tasks, many methods [5,7,13,22,25,33,43] are devoted to reducing the gap between the source domain and the target domain. Following the typical GAN-based approach, some models [5,7,33,43] map the data from the source domain to the target domain while retaining their labels. Other approaches [13,22,25] focus on aligning the source and target domains at the feature level. However, because these methods do not fully exploit the target domain data in model training, their performance is not satisfactory.

Another typical approach to UDA person ReID focuses on the generation of pseudo labels [4,10,37,41] and consists of three steps: pre-training the model with labeled source domain data, assigning clustering-based pseudo labels to the samples in the target domain, and fine-tuning the model on the target domain data with these pseudo labels. The last two steps are generally carried out iteratively to optimize the feature representation and the clustering results. However, noisy pseudo labels are inevitably generated due to the gap between the source domain and the target domain and the instability of the clustering algorithm. These noisy labels mislead model training, which in turn leads to inaccurate clustering results. Therefore, addressing the mutual negative influence between the last two steps is important and necessary for UDA person ReID models.

Addressing this problem involves two challenges. First, UDA person ReID involves the optimization of two variables, the feature representation and the pseudo labels, which is more complicated than traditional supervised ReID and brings a higher risk of falling into local minima. The second challenge arises from the interaction between the two variables: feature learning is based on the pseudo labels, while the pseudo labels are obtained from clustering results computed in the feature space. In the early stage of the iterative optimization, the feature space has not yet been trained well enough to distinguish different person IDs, so the pseudo labels are still noisy.

To solve the first challenge, we propose a stochastic sampling scheme to prevent the model from falling into local minima. Although previous works used minibatch-based SGD to reduce the risk of falling into local minima in feature learning, they still used all training samples to generate pseudo labels, which makes the clustering results prone to local minima. Moreover, the adverse effect of the noisy labels in the current epoch is difficult to correct in the next epoch if the target training samples remain the same. To overcome these issues, we present a new sampling algorithm, referred to as Doubly Stochastic Subdomain Mining (DSSM), with which both the feature learning and the sample clustering in each training round are performed on a randomly selected part of the samples instead of the whole training set. These samples can be viewed as a subdomain of the training set. Varying the subdomain across rounds provides better diversity for model learning, which reduces the risk of falling into local minima and helps to correct noisy pseudo labels.

For the second challenge, we propose a new sample reweighting algorithm to reduce the adverse impact of noisy labels. The weight of each sample is generated based on a new metric referred to as Maximal Heterogeneous Similarity (MHS), which reflects the confidence level of the assigned pseudo label. The proposed MHS-based sample Reweighting algorithm (R-MHS) can effectively suppress the negative impact of noisy pseudo labels and consequently break down the vicious circle between poor feature representation and false clustering results.

Overall, the main contributions of this paper can be summarized in three aspects:

We propose the Doubly Stochastic Subdomain Mining sampling algorithm to supply different sample subdomains for each training epoch, which helps prevent the model from falling into local minimum traps.

A new sample reweighting algorithm is designed based on Maximal Heterogeneous Similarity to reduce the negative impact caused by noisy pseudo labels.

Extensive experiments on popular benchmarks show that our model achieves the state-of-the-art performance, which demonstrates the effectiveness of our proposed model.

2. Related work

2.1. Unsupervised domain adaptive person ReID

UDA person ReID aims at transferring knowledge from a labeled source domain to an unlabeled target domain. Existing UDA person ReID methods can be summarized into two categories. Methods in the first category attempt to reduce the gap between the source domain and the target domain. Among them, some works [5,7,33,43] apply Generative Adversarial Networks to transfer the image style from the source domain to the target domain, which reduces the domain bias at the image level. Other methods attempt to eliminate the domain bias at the feature level. Mekhazni et al. [22] proposed to utilize the maximum mean discrepancy to reduce the difference between the distributions of the source and target domains. EANet [13] utilized key points to align the features across the source and target domain distributions. However, the samples of the target domain and their underlying identity information are not fully used in the training of the feature network. In the second category, clustering-based methods are widely used to generate pseudo labels as supervision to fine-tune the model. Some methods [10,20,35] directly use the pseudo labels generated by a clustering algorithm to fine-tune the pretrained model. Since noisy pseudo labels are inevitably introduced in the clustering step, some works strive to reduce the negative impact caused by these noisy labels, such as MMT [9], NRMT [37] and MEB-Net [36]. However, these methods can still fall into local minima during the alternating iterative learning between the feature network and the pseudo labels.

In this paper, we follow the clustering-based approach to solve the UDA person ReID task. The difference compared with the above mentioned works is that our model is motivated to learn knowledge from diverse target subdomains instead of the whole target domain in different training rounds, and a new reweighting method is proposed to alleviate the influence of noisy pseudo labels.

2.2. Dealing with local minimum traps

The training of deep neural networks can easily fall into local minima, and many works have analysed and addressed this issue. Some of these methods [15,19,23,29] concentrate on the objective function and impose constraints on it so that the global optimum can still be reached. Other works [3,31,42] study the convergence of gradient-based methods for optimizing losses. For example, Liang et al. [19] proposed to add one special neuron with a skip connection to the output, or one special neuron per layer; with these new structures, the new loss function can eliminate spurious local minima. Brutzkus et al. [3] showed that when the input data follows the Gaussian distribution, gradient descent on a lightweight network converges to the global optimum. However, these methods only address local minima in the supervised learning of a model, so they cannot solve the local minimum problem in the iterative optimization between clustering and feature learning in UDA person ReID. To address this problem, we propose a new stochastic sampling scheme that supplies various subdomains for clustering and feature learning in different training rounds, which can prevent the nonconvex iterative optimization between the two from falling into local minima.

2.3. Sample reweighting

In some tasks, there exist outlier samples or noisy data that confuse model training. Because these samples cannot be identified directly, sample reweighting strategies are frequently used to reduce their adverse impact. To keep the model from suffering from noisy labels, some methods [4,6,11] discard the samples that may have noisy labels during the training phase. Co-teaching [11] proposed to train two networks at the same time, which work together to select a subset of samples as reliable data for updating the model parameters. DCML [4] proposed to progressively adopt samples based on two credibility metrics: the K-Nearest Neighbor similarity for density evaluation and the prototype similarity for centrality evaluation. These works need a pre-defined threshold as an empirical parameter to filter out noisy labels. Cheng et al. [6] proposed an adaptive dynamic threshold to ensure that the clean samples are selected. However, it is hard to guarantee that the discarded samples are noisy, so discarding reduces the diversity of the training data and wastes some useful samples. Therefore, some works [26,38] assign continuous weights to different samples instead of removing them directly, which makes the training process more robust. Shu et al. [26] used an MLP with one hidden layer to reweight each sample, which automatically learns the mapping from loss to weight under the guidance of unbiased meta data. UNRN [38] adopted the KL divergence to evaluate the reliability of each sample and utilized this uncertainty to reweight its contribution to model training. However, these methods do not consider the correlations between samples and may therefore neglect hidden information.

Different from the above works, this paper proposes a novel sample reweighting method based on a new metric referred to as Maximal Heterogeneous Similarity, which takes the similarity between different samples into account. Based on the MHS, samples with high credibility labels are enhanced and the samples with low quality labels are suppressed in the training of the model, which can reduce the negative impact caused by noisy pseudo labels.

3. Method

3.1. Overall framework

We are given a labeled training dataset $D^{s} = \{(x_{j}^{s}, y_{j}^{s})\}_{j = 1}^{N_{s}}$ collected from the source domain, where $x_{j}^{s}$ and $y_{j}^{s}$ denote the j-th source sample and its person identity label respectively, and $N_{s}$ is the number of training samples in the source domain. The unlabeled target domain samples are denoted as $D^{t} = \{x_{i}^{t}\}_{i = 1}^{N_{t}}$ . The model proposed in this paper adopts the mean-teacher framework, which consists of a feature network F and the corresponding momentum update network $\tilde{F}$ . The feature network F learns the feature representation under the guidance of our proposed loss functions and is updated by gradient-based back propagation. The network $\tilde{F}$ is designed to provide pseudo labels for the samples of the target domain, and its parameters are updated by momentum from the feature network F. For the source domain, the feature network F learns directly from the ground-truth labels of the samples; for the target domain, F learns from the pseudo labels provided by the mean teacher network $\tilde{F}$ . The momentum update of the network $\tilde{F}$ can be described as Eq. (1): $\begin{matrix} (1) & \tilde{θ} (t) = α \tilde{θ} (t - 1) + (1 - α) θ (t) \end{matrix}$ where $\tilde{θ}$ and $θ$ represent the parameters of $\tilde{F}$ and F respectively, t and $t - 1$ represent the current and the previous update round, and α is the momentum coefficient.
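As a minimal illustration of the momentum update in Eq. (1), the sketch below applies it to parameters stored as plain Python lists. The function name `momentum_update` and the dictionary-of-lists layout are assumptions made for this example only; an actual implementation would apply the same rule to the parameter tensors of F and $\tilde{F}$ .

```python
def momentum_update(teacher_params, student_params, alpha=0.999):
    """Return updated teacher parameters following Eq. (1).

    teacher_params / student_params: dict mapping a (hypothetical) parameter
    name to a list of floats. alpha is the momentum coefficient.
    """
    updated = {}
    for name, teacher_values in teacher_params.items():
        student_values = student_params[name]
        # theta_tilde(t) = alpha * theta_tilde(t-1) + (1 - alpha) * theta(t)
        updated[name] = [
            alpha * t_old + (1.0 - alpha) * s_new
            for t_old, s_new in zip(teacher_values, student_values)
        ]
    return updated
```

With α close to 1 (the paper uses 0.999), the teacher changes slowly, so the pseudo labels it provides are less sensitive to any single noisy gradient step of F.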

According to the above analysis, model training involves the co-optimization of two interacting factors, the feature representation and the pseudo labels. This co-optimization easily falls into local minima, and it is also prone to a vicious circle due to the adverse effect of noisy pseudo labels. Therefore, this paper proposes two new modules based on the traditional mean-teacher architecture: Doubly Stochastic Subdomain Mining (DSSM) and sample Reweighting based on Maximal Heterogeneous Similarity (R-MHS), as Fig. 1 shows. The former reduces the risk of falling into a local minimum during training by enhancing the diversity of clustering in each training round; the latter reduces the adverse effect of noisy pseudo labels on the model by reweighting the samples.

Fig. 1.

The framework of our work, among which the two parts circled by purple rectangles respectively represent the doubly stochastic subdomain mining (DSSM) and reweighting method based on our proposed maximal heterogeneous similarity (R-MHS) modules.

With these two modules, the learning process of the entire model can be described by the following steps: 1) sampling from the target domain based on DSSM; 2) generating pseudo labels for the sampled target domain data using the DBSCAN [1] clustering algorithm; 3) reweighting the samples using the R-MHS algorithm; 4) training the model with source samples and target samples according to their weights. The feature representation and the pseudo labels are optimized iteratively in this way until the model converges, which effectively boosts the performance of the model.

3.2. Doubly stochastic subdomain mining

Traditional clustering-based UDA person ReID methods generally generate pseudo labels for the whole target training set. Unfortunately, clustering itself is a nonconvex combinatorial optimization problem with many local minima. In addition, the feature space is not reliable in the early stage of training, so the clustering results are unreliable at that time, which leads to more noisy pseudo labels and further increases the risk of falling into local minima. To address this issue, this paper proposes the DSSM algorithm, in which a subset of samples is randomly selected as a subdomain of the target domain in each training epoch. Since the subdomains in each round are different, the objective function of clustering changes along with the training rounds, which can effectively prevent the optimization process from being trapped in a local minimum. Besides, the proposed sampling scheme based on DSSM reduces the dependence of subdomain selection on the hyperparameters and consequently improves the universality and reliability of the algorithm.

Randomly selecting subsets from the whole training set is a classic strategy to reduce the local minimum risk and improve the generalization ability of the model. It is widely used in ensemble learning and SGD. However, most of these methods apply a fixed size for subset selection. As the size of the subdomain grows, the diversity between subdomains declines while the reliability of each subdomain increases, and vice versa. This means that the convergence and the generalization ability of the model are sensitive to the size of the subdomains. To overcome this limitation, we propose the doubly stochastic subdomain mining algorithm. By “doubly stochastic”, we mean that each subdomain is mined with a random size and random sampling. The size of the subdomain $D_{k}^{t}$ for the k-th epoch is randomly generated following the normal distribution, as described in Eq. (2): $\begin{matrix} (2) & \begin{aligned} r_{k} & \sim N (μ, δ^{2}) \\ M_{k}^{t} & = r_{k} N_{t} \end{aligned} \end{matrix}$ where μ and δ are the hyperparameters for the mean value and standard deviation of the normal distribution, $r_{k}$ denotes the sampling rate in the k-th epoch, and $M_{k}^{t}$ is the number of samples in the subdomain $D_{k}^{t}$ , which consists of random samples selected from the whole target training set. Note that the value of $r_{k}$ is constrained by a specific minimum and maximum. To avoid subdomains that contain too few samples to provide stable knowledge for model training, the value of $r_{k}$ is limited to be larger than 0.3. Similarly, the maximum of $r_{k}$ is constrained to 0.9, which ensures that the model can learn features from different subdomains in adjacent training epochs.
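The sampling in Eq. (2) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function name `sample_subdomain` is hypothetical, and two details are assumptions because the text does not specify them, namely that the bounds on $r_{k}$ are enforced by clipping (rather than by redrawing) and that $M_{k}^{t}$ is rounded to the nearest integer.

```python
import random


def sample_subdomain(num_target, mu=0.5, delta=0.3, r_min=0.3, r_max=0.9, rng=None):
    """Draw one subdomain of the target set for the current epoch.

    Returns the sampling rate r_k and a list of distinct sample indices.
    """
    rng = rng or random
    # First source of randomness: the subdomain size, r_k ~ N(mu, delta^2).
    r_k = rng.gauss(mu, delta)
    # Assumption: out-of-range rates are clipped to [r_min, r_max].
    r_k = min(max(r_k, r_min), r_max)
    # Assumption: M_k = r_k * N_t is rounded to the nearest integer.
    m_k = int(round(r_k * num_target))
    # Second source of randomness: which samples are selected (no repeats).
    indices = rng.sample(range(num_target), m_k)
    return r_k, indices
```

Calling `sample_subdomain(N_t)` once per epoch yields subdomains whose size and membership both change between epochs, which is the "doubly stochastic" behavior described above.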

With the help of the doubly stochastic sampling, better diversity between different subdomains can be preserved in the whole training process while the model’s sensitivity to the size of subdomains can be effectively suppressed. Experimental results demonstrate that the reliability of the convergence is improved remarkably when DSSM algorithm is applied.

3.3. Sample reweighting based on maximal heterogeneous similarity

Because the generation of pseudo labels is based on an unsupervised clustering algorithm, the false assignment of pseudo labels is unavoidable, especially for the samples on the boundaries of the clusters. Once these noisy labels are introduced in the training process, the optimization of feature representation could be misled to a wrong direction, and in turn, make the clustering results in the next epoch get worse. The iterative accumulation of these errors could reinforce a vicious circle which can break down the whole training process. To reduce the adverse impact of these noisy labels, we propose a new sample reweighting algorithm to control the influence of the samples according to their confidence level. We present a new metric, referred to as Maximal Heterogeneous Similarity (MHS), to measure the reliability level of each pseudo label quantitatively.

The similarity between any two samples $x_{i}$ and $x_{j}$ is defined as their correlation coefficient, as shown in Eq. (3): $\begin{matrix} (3) & \begin{array}{r} s_{i, j} = \frac{f {(x_{i})}^{T} f (x_{j})}{‖ f (x_{i}) ‖ \cdot ‖ f (x_{j}) ‖} \end{array} \end{matrix}$ where $f (x)$ is the feature vector of the sample $x$ , which is provided by the momentum update network $\tilde{F}$ . The MHS of a given sample $x_{i}^{t}$ in the target domain is then defined as follows: $\begin{matrix} (4) & \begin{array}{r} s_{i}^{mhs} = \max_{j = 1, \dots, M_{k}^{t}, \; y_{i}^{t} \neq y_{j}^{t}} (s_{i, j}) \end{array} \end{matrix}$ where $y_{i}^{t}$ and $y_{j}^{t}$ represent the pseudo labels of the samples $x_{i}^{t}$ and $x_{j}^{t}$ respectively. Eq. (4) indicates that the MHS value of a sample $x_{i}^{t}$ is its similarity to its nearest neighbor with a different pseudo label. A sample with a larger MHS value has a less reliable pseudo label, because it lies close to a sample with a different pseudo label, and it should therefore be assigned a lower weight in the current round. Accordingly, the weight $ω_{i}$ of the sample $x_{i}^{t}$ is designed based on the MHS values as follows: $\begin{matrix} (5) & \begin{array}{r} ω_{i} = M_{k}^{t} \times \frac{\exp (- s_{i}^{mhs})}{\sum_{j = 1}^{M_{k}^{t}} \exp (- s_{j}^{mhs})} \end{array} \end{matrix}$ Eq. (5) normalizes the weights of all the samples in the current epoch and increases the gaps between samples with different confidence levels by applying the nonlinear function $\exp (- s)$ . The factor $M_{k}^{t}$ makes the average weight of all samples equal to 1, which keeps the weight of each sample in an appropriate range and benefits the stability of parameter updates during training. Without $M_{k}^{t}$ in Eq. (5), the weight of each sample would be extremely small, which could hurt model training. Our proposed R-MHS method designs a soft metric to evaluate the reliability of each sample instead of selecting or discarding samples directly with a hard threshold. The smoothing of the similarity between features can suppress the negative impact caused by an unreliable feature space. Therefore, although MHS is calculated in the feature space, the resulting sample weight is reliable.
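To make Eqs. (3) to (5) concrete, the following sketch computes MHS values and sample weights for a small set of feature vectors. It is a plain-Python illustration under stated assumptions: the names `cosine_similarity` and `mhs_weights` are hypothetical, features are given as lists of floats, and every sample is assumed to have at least one sample with a different pseudo label in the subdomain.

```python
import math


def cosine_similarity(a, b):
    """Eq. (3): normalized inner product of two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mhs_weights(features, pseudo_labels):
    """Return (MHS values, weights) for all samples in the subdomain."""
    m = len(features)
    mhs = []
    for i in range(m):
        # Eq. (4): maximum similarity to any sample with a different pseudo label.
        hetero = [
            cosine_similarity(features[i], features[j])
            for j in range(m)
            if pseudo_labels[j] != pseudo_labels[i]
        ]
        mhs.append(max(hetero))  # assumes `hetero` is non-empty
    # Eq. (5): softmax over -MHS, rescaled by M so the mean weight is 1.
    exps = [math.exp(-s) for s in mhs]
    total = sum(exps)
    weights = [m * e / total for e in exps]
    return mhs, weights
```

For example, with features `[[1, 0], [0.9, 0.1], [0, 1], [0.6, 0.8]]` and pseudo labels `[0, 0, 1, 0]`, the last sample lies close to the sample labeled 1, so it receives a larger MHS value and a smaller weight than the first two samples.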

Figure 2 illustrates the effect of the proposed sample reweighting algorithm by analyzing the calculation of MHS values between different samples. In the figure, different shapes represent different ground-truth labels and different colors represent different pseudo labels. Consider the red square sample in the left column and assume that it is assigned a false pseudo label. Then there must be a sample with the true pseudo label among its heterogeneous samples, such as the sample corresponding to the green square in the middle column. In this case, the correlation coefficient between the two samples is high and the MHS value is correspondingly large, so the weight calculated by Eq. (5) is small, which is consistent with the assumption that the pseudo label of the red square sample has low reliability.

Fig. 2.

The description of our proposed R-MHS algorithm.

For the yellow pentagon sample on the left, we assume that its pseudo label is correct and that the samples with the same ground-truth label are all clustered into the same class and assigned the same pseudo label. Then none of its heterogeneous samples is highly correlated with it, so the corresponding MHS value is low and the weight of this sample according to Eq. (5) is large, which is also consistent with the assumption that the pseudo label of this sample is reliable.

According to the above analysis, the proposed Reweighting algorithm based on MHS (R-MHS) can assign reasonable weights for different samples, which can further improve the reliability of model by modulating the training process with these weights. The results of ablation study in Section 4 confirm the effect of our proposed R-MHS algorithm.

3.4. Overall loss and algorithm steps

As the proposed DSSM sampling and R-MHS reweighting modules are embedded into the pipeline of our model, the whole framework can be trained with our proposed loss functions on source domain during the pretraining process, and then both source domain and target domain training sets are used to further fine-tune the model.

For the mini-batch $B^{t}$ in the target domain and the mini-batch $B^{s}$ in the source domain, we present a Variant of Cross Entropy loss function $L_{VCE}$ as Eq. (6): $\begin{matrix} (6) & \begin{array}{r} L_{VCE} = - \left[ \frac{1}{| B^{t} |} \sum_{x_{i}^{t} \in B^{t}} ω_{i} \log P (y_{i}^{t} | x_{i}^{t}) + \frac{1}{| B^{s} |} \sum_{x_{j}^{s} \in B^{s}} u_{j} \log P (y_{j}^{s} | x_{j}^{s}) \right] \end{array} \end{matrix}$ where $P (y_{i}^{t} | x_{i}^{t})$ and $P (y_{j}^{s} | x_{j}^{s})$ represent the predicted posterior probabilities that the target sample $x_{i}^{t}$ and the source sample $x_{j}^{s}$ belong to the pseudo label $y_{i}^{t}$ and the ground-truth label $y_{j}^{s}$ respectively. Under the guidance of the proposed VCE loss, the feature network F is trained to gather samples with the same label and distinguish samples with different labels. Note that the proposed R-MHS algorithm assigns a different weight $ω_{i}$ to each pseudo label $y_{i}^{t}$ , while the weights of all the source samples are set as $u_{j} = 1$ , $\forall x_{j}^{s} \in B^{s}$ .
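A minimal sketch of Eq. (6) is given below, assuming the classifier outputs are already available as per-sample probability lists. The function names `weighted_log_loss` and `vce_loss` are hypothetical, and returning 0 for an empty target mini-batch is an assumption made so that the same function also covers source-only pretraining.

```python
import math


def weighted_log_loss(probs, labels, weights):
    """Batch mean of -w * log P(label | sample); 0.0 for an empty batch."""
    if not labels:
        return 0.0  # assumption: an empty mini-batch contributes nothing
    total = sum(w * math.log(p[y]) for p, y, w in zip(probs, labels, weights))
    return -total / len(labels)


def vce_loss(target_probs, target_labels, target_weights,
             source_probs, source_labels):
    """Eq. (6): weighted cross entropy on target plus source mini-batches."""
    # Source samples carry ground-truth labels, so u_j = 1 for all of them.
    source_weights = [1.0] * len(source_labels)
    target_term = weighted_log_loss(target_probs, target_labels, target_weights)
    source_term = weighted_log_loss(source_probs, source_labels, source_weights)
    return target_term + source_term
```

A target sample with weight 0 contributes nothing to the loss, while a weight above 1 amplifies its gradient, which is how the R-MHS weights modulate training.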

To further improve the recognition ability of the model on hard samples, we combine the classic triplet loss with the sample weights $ω_{i}$ and $u_{j}$ to construct a new loss function $L_{WTRI}$ as Eq. (7): $\begin{matrix} (7) & \begin{aligned} L_{WTRI} = {} & \frac{1}{| B^{t} |} \sum_{x_{i}^{t} \in B^{t}} \frac{2 ω_{i} + ω_{i, +} + ω_{i, -}}{4} {[τ^{t} + {‖ f_{i}^{t} - f_{i, +}^{t} ‖}_{2}^{2} - {‖ f_{i}^{t} - f_{i, -}^{t} ‖}_{2}^{2}]}_{+} \\ & + \frac{1}{| B^{s} |} \sum_{x_{j}^{s} \in B^{s}} \frac{2 u_{j} + u_{j, +} + u_{j, -}}{4} {[τ^{s} + {‖ f_{j}^{s} - f_{j, +}^{s} ‖}_{2}^{2} - {‖ f_{j}^{s} - f_{j, -}^{s} ‖}_{2}^{2}]}_{+} \end{aligned} \end{matrix}$ where ${[\cdot]}_{+} = \max (\cdot, 0)$ , and $τ^{t}$ and $τ^{s}$ are the constant margins of the traditional triplet loss. $f_{i}^{t}$ , $f_{i, +}^{t}$ and $f_{i, -}^{t}$ denote the features of the anchor sample $x_{i}^{t}$ , its farthest positive sample $x_{i, +}^{t}$ and its nearest negative sample $x_{i, -}^{t}$ in the target domain mini-batch $B^{t}$ respectively. Their corresponding weights $ω_{i}$ , $ω_{i, +}$ and $ω_{i, -}$ are combined as shown in Eq. (7) to balance the contribution of the three samples. Similarly, $f_{j}^{s}$ , $f_{j, +}^{s}$ and $f_{j, -}^{s}$ and their weights $u_{j}$ , $u_{j, +}$ and $u_{j, -}$ are used to compute the WTRI loss term for the source domain mini-batch $B^{s}$ . The proposed WTRI loss can correct the boundaries between two neighboring clusters by increasing the distance to the hardest negative samples and reducing the distance to the hardest positive samples. For the target domain samples, this training is still guided by the MHS-based reweighting, which is expected to boost the performance of the model on the target domain because the adverse impact of noisy pseudo labels is suppressed by the weights.
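The per-domain term of Eq. (7) can be sketched with batch-hard mining in plain Python as shown below. The function names are hypothetical, and the sketch assumes that every anchor has at least one positive and one negative sample in the mini-batch, which holds when each identity contributes several samples to a batch that contains several identities.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_triplet_loss(features, labels, weights, margin=0.3):
    """One domain's term of Eq. (7) with batch-hard positives and negatives."""
    n = len(features)
    total = 0.0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        negatives = [j for j in range(n) if labels[j] != labels[i]]
        # Farthest positive and nearest negative of the anchor i.
        pos = max(positives, key=lambda j: squared_distance(features[i], features[j]))
        neg = min(negatives, key=lambda j: squared_distance(features[i], features[j]))
        # Combined weight (2 * w_anchor + w_pos + w_neg) / 4 from Eq. (7).
        coeff = (2.0 * weights[i] + weights[pos] + weights[neg]) / 4.0
        hinge = max(
            margin
            + squared_distance(features[i], features[pos])
            - squared_distance(features[i], features[neg]),
            0.0,
        )
        total += coeff * hinge
    return total / n
```

The full $L_{WTRI}$ is the sum of this function evaluated on the target mini-batch with the weights $ω$ and on the source mini-batch with all weights equal to 1.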

Finally, the proposed VCE loss and WTRI loss are combined to train the model in an end-to-end fashion. The overall loss function can be written as Eq. (8): $\begin{matrix} (8) & \begin{array}{r} L = β L_{VCE} + γ L_{WTRI} \end{array} \end{matrix}$ where β and γ are the contribution coefficients of the two loss terms. Note that the target domain samples are not involved in the pretraining process, so the mini-batch $B^{t}$ can be viewed as an empty set in the pretraining stage.

Under the guidance of the above overall loss, the pretrained model is adapted to the target domain until it converges. The whole training process is described as follows (see Algorithm 1).

Algorithm 1:

Training process

4. Experiments

4.1. Datasets

All the experiments are carried out on two popular person ReID datasets, Market-1501 [40] and DukeMTMC-reID [24]. The Market-1501 dataset contains 32668 labeled images of 1501 identities from disjoint cameras, in which the training set includes 751 identities and 12936 images, the gallery set contains 19732 images from 750 identities and the query set contains 3368 images from 750 identities. The DukeMTMC-reID dataset is collected from 8 non-overlapping cameras and includes 16522 images of 702 identities for training, 2228 images for query and 17661 images for the gallery set. Based on the above two datasets, two inverse domain adaptation tasks, Market-to-Duke (M2D) and Duke-to-Market (D2M), are designed for testing.

4.2. Settings

Implementation details: We adopt ResNet-50 [12] pretrained on ImageNet as the backbone for networks F and $\tilde{F}$ . Compared with the original ResNet-50, the fully connected layer is removed. During the training process, the input images are uniformly resized to 256 × 128. To generate augmented data, traditional image processing operations including horizontal flipping, random cropping and erasing are adopted. The mini-batch size is set as $| B^{t} | = | B^{s} | = 60$ , in which each identity contains 4 different samples. The hyperparameters in Eq. (1), Eq. (2), Eq. (7) and Eq. (8) are empirically set as $α = 0.999$ , $μ = 0.5$ , $δ = 0.3$ , $τ^{t} = τ^{s} = 0.3$ and $β = γ = 1$ . The Adam optimizer is employed with the initial learning rate $η = 0.00035$ . The whole training process is finished after 60 epochs: the first epoch is used to pretrain the model using only the source domain dataset, and the other epochs are carried out on the source and target domains jointly. The model is built and trained with PyTorch on an NVIDIA 2080 Ti GPU.

Evaluation metrics: In the testing phase, Cumulative Matching Characteristic (CMC) at Rank-1, Rank-5, Rank-10 and mean Average Precision (mAP) are applied to evaluate the performance of our model and the competing models.
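For readers unfamiliar with these metrics, the sketch below shows how CMC at Rank-k and mAP are computed from ranked retrieval results. It is a simplified illustration with hypothetical function names: each query is represented by a list of booleans marking whether the gallery item at each rank shares the query identity, and the camera-based filtering of gallery images that standard ReID evaluation protocols typically apply is omitted.

```python
def average_precision(matches):
    """AP of one query; `matches[r]` is True if the item at rank r+1 is correct."""
    hits = 0
    precisions = []
    for rank, is_match in enumerate(matches, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / rank)  # precision at this correct hit
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(all_matches):
    """mAP: mean of the per-query average precision."""
    return sum(average_precision(m) for m in all_matches) / len(all_matches)


def cmc_at_rank(all_matches, k):
    """CMC at Rank-k: fraction of queries with a correct match in the top k."""
    return sum(any(m[:k]) for m in all_matches) / len(all_matches)
```

Rank-k only asks whether the first correct match appears within the top k results, whereas mAP also rewards ranking all correct gallery images of an identity near the top.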

4.3. Comparison with the state-of-the-art methods

We compare our model with some state-of-the-art works on both of the M2D and D2M tasks. The comparison results are shown in Table 1.

Table 1 reports the results on the two domain adaptation tasks. We first compare our work with the methods that attempt to reduce the gap between the target and source domains, including GAN-based models such as PTGAN [33], SPGAN [7] and CR-GAN [5], and feature alignment-based methods such as DAAM [14], D-MMD [22] and SADA [32]. The comparison results show that our work outperforms these methods by a large margin of about 13–21% in mAP (69.0% vs 55.8% on the M2D task, and 80.8% vs 59.8% on the D2M task).

Table 1
Performance (%) comparison with the state-of-the-art methods for UDA person ReID on M2D and D2M domain adaptation tasks

Methods	M2D mAP	M2D R1	M2D R5	M2D R10	D2M mAP	D2M R1	D2M R5	D2M R10
PTGAN [33](CVPR’18)	-	27.4	-	50.7	-	38.6	-	66.1
SPGAN [7](CVPR’18)	22.3	41.1	56.6	63.0	22.8	51.5	70.1	76.8
ECN [41](CVPR’19)	40.4	63.3	75.8	80.4	43.0	75.1	87.6	91.6
CR-GAN [5](ICCV’19)	48.6	68.9	80.2	84.7	54.0	77.7	89.7	92.7
SSG [8](ICCV’19)	53.4	73.0	80.6	83.2	58.3	80.0	90.0	92.4
D-MMD [22](CVPR’20)	46.0	63.5	78.8	83.9	48.8	70.6	87.0	91.5
DAAM [14](AAAI’20)	48.8	71.3	82.4	86.3	53.1	77.8	89.9	93.7
SADA [32](CVPR’20)	55.8	74.5	85.3	88.7	59.8	83.0	91.8	94.1
NRMT [37](ECCV’20)	62.2	77.8	86.9	89.5	71.7	87.8	94.6	96.5
MMT [9](ICLR’20)	65.1	78.0	88.8	92.5	71.2	87.7	94.9	96.9
SpCL [10](NeurIPS’20)	68.8	82.9	90.1	92.5	76.7	90.3	96.2	97.7
MMT + RDSBN [2](CVPR’21)	66.6	80.3	89.1	92.6	81.5	92.9	97.6	98.4
UNRN [38](AAAI’21)	69.1	82.0	90.7	93.5	78.1	91.9	96.1	97.8
GLT [39](CVPR’21)	69.2	82.0	90.2	92.8	79.5	92.2	96.5	97.8
Ours	69.0	82.0	91.3	94.5	80.8	92.3	97.5	98.5

Compared with clustering-based methods such as SSG [8] and ECN [41], our approach also leads on all metrics for both adaptation tasks. Besides the above works, some recent models aim at handling noisy labels, such as NRMT [37], SpCL [10], UNRN [38] and GLT [39], and have achieved remarkable performance. Although some methods show a slight advantage in mAP or Rank-1, our method remains competitive on the other metrics. For example, GLT and UNRN improve mAP by about 0.1–0.2% on the M2D task, but our approach outperforms them by 1.3–2.7% mAP on the D2M task. Similarly, MMT + RDSBN [2] scores well on the D2M task, but its performance is clearly lower on the M2D task (66.6% mAP vs 69.0% for our method). The same phenomenon occurs for the Rank-1 score. These methods achieve better performance on a single task, but their performance drops when they are applied to the other task. In contrast, our method performs well on both the M2D and D2M tasks, so its universality is better, and it provides a solid and universal baseline for future research on UDA person ReID tasks.

4.4. Analysis on DSSM algorithm

We design a group of experiments to validate the proposed Doubly Stochastic Subdomain Mining (DSSM) algorithm. Two other sampling schemes, “Without Subdomain Mining (WSM)” and “Single Stochastic Subdomain Mining (SSSM)”, are used as comparative experiments to study the effect of DSSM. WSM uses all the training samples of the target domain for clustering and training, whereas SSSM fixes the subdomain sampling ratio $r_{k}$ at the mean value μ instead of drawing it randomly from the normal distribution $N (μ, δ^{2})$. By assigning different values to μ, nine experiments are carried out to compare DSSM with WSM and SSSM, all under our proposed R-MHS algorithm. In Table 2, the first row corresponds to WSM, rows 2 to 5 show the results of SSSM with different values of μ, and the last four rows show the results of DSSM.

Table 2
Performance (%) of the model on the M2D and D2M tasks under different experimental settings, in which the parameter μ represents the fixed sample rate of SSSM and the mean in normal distribution of DSSM

Sampling	μ	M2D				D2M
		mAP	R1	R5	R10	mAP	R1	R5	R10
WSM	1	64.0	79.6	89.3	92.3	64.1	84.8	93.1	95.6
SSSM	0.4	66.8	80.2	89.5	92.3	75.9	90.3	96.4	97.2
SSSM	0.5	67.7	81.7	91.0	93.7	78.6	91.7	96.9	98.1
SSSM	0.6	67.9	81.9	90.8	93.4	79.8	91.5	97.4	98.2
SSSM	0.7	66.5	80.7	90.4	93.0	79.0	92.0	97.3	98.4
DSSM	0.4	68.8	82.4	91.2	93.5	80.6	92.2	97.4	98.1
DSSM	0.5	69.0	82.0	91.3	94.5	80.8	92.3	97.5	98.5
DSSM	0.6	68.7	82.6	91.1	93.3	80.9	92.1	97.5	97.9
DSSM	0.7	68.9	82.2	91.1	93.4	79.9	91.7	97.3	98.2

It can be clearly observed that both SSSM and DSSM outperform WSM by a large margin, which shows that the stochastic sampling strategy improves the performance of the model significantly. This is because stochastic sampling can counteract the misleading effect of noisy labels across different epochs, avoid convergence to suboptimal clustering results, and consequently prevent the model from falling into local minima. From another perspective, each round of sampling can be regarded as knowledge mining for a subdomain, which makes the procedure comparable to ensemble learning across the whole target domain and gives the model better generalization ability.

In addition, the comparison between DSSM and SSSM reveals the advantages of the proposed DSSM algorithm in precision and stability. Specifically, DSSM achieves about 1% improvement in mAP on both the M2D and D2M tasks. We believe that this advantage arises because DSSM adopts a random sampling ratio $r_{k}$, which provides greater diversity across rounds of subdomain mining and benefits both model optimization and ensemble learning. From the optimization view, the greater the difference between the subdomain distributions of different rounds, the easier it is for the model to escape from the local minimum traps of the previous round. From the view of ensemble learning, the stronger the diversity, the smaller the knowledge redundancy between subdomains and the stronger the generalization ability of the model. Therefore, the DSSM algorithm achieves better performance on the target domain.
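The difference between the three sampling schemes can be made concrete with a short sketch. The Python code below is a minimal illustration and not the released implementation. Only the rule that WSM uses all samples, SSSM fixes $r_{k}$ at μ, and DSSM draws $r_{k}$ from $N (μ, δ^{2})$ comes from the text; the function names `sampling_ratio` and `mine_subdomain`, the default δ = 0.1, the clipping of $r_{k}$ to [0.05, 1], and uniform sampling without replacement are assumptions made for this example.

```python
import random


def sampling_ratio(scheme, mu, delta=0.1, rng=random):
    """Return the subdomain sampling ratio r_k for one training round."""
    if scheme == "WSM":
        # Without subdomain mining: the whole target domain is used.
        return 1.0
    if scheme == "SSSM":
        # Single stochastic: the ratio is fixed at the mean value mu.
        return mu
    if scheme == "DSSM":
        # Doubly stochastic: the ratio itself is drawn from N(mu, delta^2).
        r_k = rng.gauss(mu, delta)
        # Assumption: clip so that the ratio stays a valid fraction.
        return min(max(r_k, 0.05), 1.0)
    raise ValueError(f"unknown sampling scheme: {scheme}")


def mine_subdomain(num_samples, scheme, mu, delta=0.1, rng=random):
    """Return sorted indices of the target samples used in this round."""
    r_k = sampling_ratio(scheme, mu, delta, rng)
    size = max(1, round(r_k * num_samples))
    # Assumption: samples are drawn uniformly without replacement.
    return sorted(rng.sample(range(num_samples), size))


# Subdomain sizes over five rounds for a hypothetical target domain.
rng = random.Random(0)
sssm_sizes = [len(mine_subdomain(1000, "SSSM", 0.5, rng=rng)) for _ in range(5)]
dssm_sizes = [len(mine_subdomain(1000, "DSSM", 0.5, rng=rng)) for _ in range(5)]
```

In this sketch, SSSM yields subdomains of identical size with random membership, whereas DSSM varies both the size and the membership from round to round, which is the additional source of diversity discussed above.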

Besides, the comparison shows that DSSM is more robust than SSSM with respect to the hyperparameter μ. For example, when μ changes from 0.6 to 0.4, the mAP scores of SSSM drop by about 1.1–3.9% (mAP: 67.9% vs 66.8% on the M2D task, and 79.8% vs 75.9% on the D2M task), and a similar degradation occurs in the Rank-1 score. However, under the same change of μ, the performance of DSSM remains stable (mAP: 68.7% vs 68.8% on the M2D task, and 80.9% vs 80.6% on the D2M task). Because the sampling rate $r_{k}$ of DSSM is highly random rather than strongly dependent on a single parameter, DSSM is more robust to parameter changes and more suitable for practical applications.
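The robustness claim can be checked directly against the mAP columns of Table 2. The short Python snippet below is an illustrative calculation only; the names `MAP_SCORES` and `map_spread` are hypothetical. It computes the spread (maximum minus minimum) of mAP over μ ∈ {0.4, 0.5, 0.6, 0.7} for each scheme and task.

```python
# mAP (%) copied from Table 2, ordered by mu = 0.4, 0.5, 0.6, 0.7.
MAP_SCORES = {
    ("SSSM", "M2D"): [66.8, 67.7, 67.9, 66.5],
    ("SSSM", "D2M"): [75.9, 78.6, 79.8, 79.0],
    ("DSSM", "M2D"): [68.8, 69.0, 68.7, 68.9],
    ("DSSM", "D2M"): [80.6, 80.8, 80.9, 79.9],
}


def map_spread(scores):
    """Difference between the best and worst mAP, rounded to one decimal."""
    return round(max(scores) - min(scores), 1)


spreads = {key: map_spread(values) for key, values in MAP_SCORES.items()}
```

Under this measure, the mAP of SSSM varies by 1.4% (M2D) and 3.9% (D2M) across the four values of μ, while that of DSSM varies by only 0.3% and 1.0%.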

4.5. Ablation study

The experiments described above only discuss the impact of the DSSM algorithm on model performance. In this section, combinations of different sampling and reweighting algorithms are used to verify the effectiveness of the proposed method. We design three groups of experiments with different ablation settings. In the first group, SSSM and DSSM are compared without any sample reweighting algorithm, which can be regarded as the baseline model. In the second group, three sample reweighting schemes are each combined with DSSM and compared: no reweighting, a classic reweighting algorithm following CleanNet [16], and our proposed R-MHS algorithm. Note that CleanNet reweights samples based on the similarity between each sample and its corresponding class prototype. This group of experiments is designed to evaluate the influence of the proposed R-MHS reweighting algorithm. In the third group, we combine the proposed R-MHS method with both DSSM and SSSM to evaluate the overall performance of our model.

Table 3
Ablation studies on the effectiveness of our proposed R-MHS and DSSM on the domain adaptation tasks between Market and Duke

Group	Sampling	Reweighting	M2D				D2M
			mAP	R1	R5	R10	mAP	R1	R5	R10
1	SSSM	✗	65.6	80.9	90.2	92.6	78.3	91.1	97.0	98.2
	DSSM	✗	66.8	81.1	90.8	93.5	78.9	92.0	97.2	98.3
2	DSSM	✗	66.8	81.1	90.8	93.5	78.9	92.0	97.2	98.3
	DSSM	CleanNet	67.2	81.2	90.9	93.8	79.2	92.2	97.0	98.4
	DSSM	R-MHS	69.0	82.0	91.3	94.5	80.8	92.3	97.5	98.5
3	SSSM	✗	65.6	80.9	90.2	92.6	78.3	91.1	97.0	98.2
	SSSM	R-MHS	67.9	81.9	90.8	93.4	79.8	91.5	97.4	98.2
	DSSM	✗	66.8	81.1	90.8	93.5	78.9	92.0	97.2	98.3
	DSSM	R-MHS	69.0	82.0	91.3	94.5	80.8	92.3	97.5	98.5

As shown in Table 3, the results of the first group show that the proposed pipeline, based on the mean teacher architecture and stochastic sampling, provides a strong baseline for the UDA person ReID task. Besides, the proposed DSSM algorithm has a slight advantage over SSSM even without sample reweighting. The results of the second group reveal that the proposed R-MHS reweighting algorithm is effective and superior to the popular reweighting algorithm CleanNet. Compared with the baseline without reweighting, our approach achieves a 2.2% improvement in mAP on the M2D task and 1.9% on the D2M task. CleanNet also boosts the performance on both domain adaptation tasks, but its improvement is limited (0.4% and 0.3% mAP, respectively), because a class prototype is generated from all the samples in the same cluster, so any noisy sample in the cluster affects all the weights in that cluster. In contrast, R-MHS takes the correlations between all the samples of the subdomain into account, so it suppresses the negative effect of noisy labels better. The results of the third group demonstrate that R-MHS achieves about 2% improvement in mAP when it is combined with either of the stochastic sampling strategies, SSSM or DSSM. Among these settings, R-MHS + DSSM achieves the best performance, which shows that R-MHS and DSSM not only boost performance when adopted alone but also further improve it when they work together. With both DSSM and R-MHS employed in model training, we achieve state-of-the-art performance.
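The limitation of prototype-based reweighting noted above can be illustrated with a toy example. The Python sketch below is a deliberate simplification and is neither the CleanNet implementation nor the R-MHS algorithm: it assumes that the prototype is the plain mean of the cluster features and that the weight of a sample is its cosine similarity to the prototype, clipped at zero. The function names and the two-dimensional features are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def prototype_weights(features):
    """Weight each sample by its similarity to the cluster prototype.

    Assumption: the prototype is the mean of all features in the cluster.
    """
    dim = len(features[0])
    prototype = [sum(f[d] for f in features) / len(features) for d in range(dim)]
    return [max(0.0, cosine(f, prototype)) for f in features]


# A clean cluster, and the same cluster with one mislabeled sample added.
clean_cluster = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1]]
noisy_cluster = clean_cluster + [[0.0, 1.0]]

clean_weights = prototype_weights(clean_cluster)
noisy_weights = prototype_weights(noisy_cluster)
```

In this toy case the mislabeled sample receives the lowest weight, but it also pulls the prototype toward itself, so the weights of all three correctly labeled samples decrease as well. This is the effect that limits prototype-based reweighting when a cluster contains noisy samples.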

5. Conclusion

In this paper, we propose a novel method to address local minimum traps and noisy labels in unsupervised domain adaptive person re-identification. The method adopts doubly stochastic subdomain mining to supply a different sample subdomain for each training round, which helps prevent the model from falling into local minimum traps. We also propose a new sample reweighting algorithm, based on our proposed Maximal Heterogeneous Similarity, to reduce the negative impact of noisy pseudo labels. Extensive experiments demonstrate the effectiveness of the different components, and the results indicate that our method outperforms most state-of-the-art methods on popular datasets. We also show that our method is robust, which makes it feasible to apply in real-world scenes.

Acknowledgement

This work was supported in part by The Fundamental Research Funds for the Central Universities (N2104027), Innovation Fund of Chinese Universities Industry University Research (2020HYA06003), Guangdong Basic and Applied Basic Research Foundation (2021B1515120064).

References

1. Bäcklund, Hedblom and Neijman, A density-based spatial clustering of application with noise, Data Mining TNM033 (2011), 11–30.
2. Bai, Wang, Hu and Ding, Unsupervised multi-source domain adaptation for person re-identification, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 12914–12923.
3. Brutzkus and Globerson, Globally optimal gradient descent for a convnet with Gaussian inputs, in: International Conference on Machine Learning, PMLR, 2017, pp. 605–614.
4. Chen, Lu, Lu and Zhou, Deep credible metric learning for unsupervised domain adaptation person re-identification, in: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VIII, Vol. 16, Springer, 2020, pp. 643–659.
5. Chen, Zhu and Gong, Instance-guided context rendering for cross-domain person re-identification, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 232–242.
6. Cheng, Zhu, Li, Gong, Sun and Liu, Learning with instance-dependent label noise: A sample sieve approach, 2020, arXiv preprint arXiv:2010.02347.
7. Deng, Zheng, Ye, Kang, Yang and Jiao, Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person re-identification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 994–1003.
8. Fu, Wei, Wang, Zhou, Shi and T.S. Huang, Self-similarity grouping: A simple unsupervised cross domain adaptation approach for person re-identification, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 6112–6121.
9. Ge, Chen and Li, Mutual mean-teaching: Pseudo label refinery for unsupervised domain adaptation on person re-identification, in: International Conference on Learning Representations, 2020.
10. Ge, Zhu, Chen, Zhao and Li, Self-paced contrastive learning with hybrid memory for domain adaptive object re-id, in: Advances in Neural Information Processing Systems, 2020.
11. Han, Yao, Yu, Niu, Xu, Hu, Tsang and M.S. Sugiyama, Co-teaching: Robust training of deep neural networks with extremely noisy labels, Advances in Neural Information Processing Systems 31 (2018).
12. He, Zhang, Ren and Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
13. Huang, Yang, Chen, Zhao, Huang, Lin, Huang and Du, Eanet: Enhancing alignment for cross-domain person re-identification, 2018, arXiv preprint arXiv:1812.11369.
14. Huang, Peng, Li, Jin, Xing and Ge, Domain Adaptive Attention Model for Unsupervised Cross-Domain Person Re-Identification, AAAI, 2020.
15. Kawaguchi, Deep learning without poor local minima, Advances in Neural Information Processing Systems 29 (2016).
16. K.-H. Lee, He, Zhang and L. Yang, CleanNet: Transfer learning for scalable image classifier training with label noise, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 5447–5456.
17. Li, Zhang, Tian, Wang and Gao, Pose-guided representation learning for person re-identification, in: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
18. Li, Zhao, Xiao and Wang, Deepreid: Deep filter pairing neural network for person re-identification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 152–159.
19. Liang, Sun, J.D. Lee and Srikant, Adding one neuron can eliminate all bad local minima, Advances in Neural Information Processing Systems 31 (2018).
20. Lin, Dong, Zheng, Yan and Yang, A bottom-up clustering approach to unsupervised person re-identification, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, 2019, pp. 8738–8745.
21. Liu, Z.-J. Zha, Chen, Hong and Wang, Adaptive transfer network for cross-domain person re-identification, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 7202–7211.
22. Mekhazni, Bhuiyan, Ekladious and Granger, Unsupervised domain adaptation in the dissimilarity space for person re-identification, in: European Conference on Computer Vision, Springer, 2020, pp. 159–174.
23. Nguyen and Hein, Optimization landscape and expressivity of deep cnns, in: International Conference on Machine Learning, PMLR, 2018, pp. 3730–3739.
24. Ristani, Solera, Zou, Cucchiara and Tomasi, Performance measures and a data set for multi-target, multi-camera tracking, in: European Conference on Computer Vision, Springer, 2016, pp. 17–35.
25. Shan, Li, C.T. Li and A.C. Kot, Multi-task mid-level feature alignment network for unsupervised cross-dataset person re-identification, in: BMVC 2018, 2018.
26. Shu, Xie, Yi, Zhao, Zhou, Xu and Meng, Meta-weight-net: Learning an explicit mapping for sample weighting, Advances in Neural Information Processing Systems 32 (2019).
27. Su, Li, Zhang, Xing, Gao and Tian, Pose-driven deep convolutional model for person re-identification, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 3960–3969.
28. Sun, Zheng, Yang, Tian and Wang, Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline), 2018, pp. 480–496.
29. Swirszcz, W.M. Czarnecki and Pascanu, Local minima in training of neural networks, 2016, arXiv preprint arXiv:1611.06310.
30. C.-P. Tay, Roy and K.-H. Yap, AANet: Attribute attention network for person re-identifications, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 7134–7143.
31. Tian, An analytical formula of population gradient for two-layered relu network and its applications in convergence and critical point analysis, in: International Conference on Machine Learning, PMLR, 2017, pp. 3404–3413.
32. Wang, J.-H. Lai, Liang and Wang, Smoothing adversarial domain attack and p-memory reconsolidation for cross-domain person re-identification, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 10568–10577.
33. Wei, Zhang, Gao and Tian, Person transfer gan to bridge domain gap for person re-identification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 79–88.
34. H.-X. Yu, W.-S. Zheng, Wu, Guo, Gong and J.-H. Lai, Unsupervised person re-identification by soft multilabel learning, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 2148–2157.
35. Zeng, Ning, Wang and Guo, Hierarchical clustering with hard-batch triplet loss for person re-identification, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 13657–13665.
36. Zhai, Ye, Lu, Jia, Ji and Tian, Multiple expert brainstorming for domain adaptive person re-identification, in: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VII, Vol. 16, Springer, 2020, pp. 594–611.
37. Zhao, Liao, G.-S. Xie, Zhao, Zhang and Shao, Unsupervised domain adaptation with noise resistible mutual-training for person re-identification, in: European Conference on Computer Vision, Springer, 2020, pp. 526–544.
38. Zheng, Lan, Zeng, Zhang and Z.-J. Zha, Exploiting sample uncertainty for domain adaptive person re-identification, in: AAAI 2021, AAAI, 2021.
39. Zheng, Liu, He, Mei, Luo and Z.-J. Zha, Group-aware label transfer for domain adaptive person re-identification, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 5310–5319.
40. Zheng, Shen, Tian, Wang, Wang and Tian, Scalable person re-identification: A benchmark, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1116–1124.
41. Zhong, Zheng, Luo, Li and Yang, Invariance matters: Exemplar memory for domain adaptive person re-identification, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 598–607.
42. Zhou, Yang, Zhang, Liang and Tarokh, Sgd converges to global minimum in deep learning via star-convex path, 2019, arXiv preprint arXiv:1901.00451.
43. Zou, Yang, Yu, B.V.K. Vijaya Kumar and Kautz, Joint disentangling and adaptation for cross-domain person re-identification, in: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VII, Vol. 16, Springer, 2020, pp. 87–104.
