Boosting active domain adaptation with exploration of samples

Abstract

In recent years, active learning has increasingly been adopted to assist domain adaptation. However, because of domain shift, traditional active learning methods, which originate from semi-supervised scenarios, cannot be directly applied to domain adaptation. To address this problem, active domain adaptation (Active DA) has been proposed as a new domain adaptation paradigm that aims to improve model performance by annotating a small number of target domain samples. We propose an Active DA method named Boosting Active Domain Adaptation with Exploration of Samples (BADA), which divides Active DA into two related issues: sample selection and sample utilization. For sample selection, we design an instability selection criterion based on predictive consistency and a diversity selection criterion. For the remaining unlabeled samples, we design a self-training framework that separates reliable samples from unreliable ones through a screening mechanism similar to the selection criteria, and we adopt a separate loss function for each group. Experiments show that BADA outperforms previous active learning methods and Active DA methods on several domain adaptation datasets.

Keywords

Domain adaptation, active learning, active domain adaptation, self-training

1. Introduction

Deep neural networks have achieved impressive performance in various scenarios such as image classification [1, 2] and semantic segmentation [3, 4] by training on large amounts of labeled data. However, these networks, which depend on labeled data from a single domain, do not generalize well to an unlabeled domain with a different data distribution [5]. To solve this problem, unsupervised domain adaptation (UDA) [6, 7] aims to transfer knowledge from the labeled source domain to the unlabeled target domain. Nonetheless, there is still a wide gap between the results of UDA methods and those of fully supervised learning [8]. In practice, it is feasible to label a small number of target domain samples. Therefore, the paradigm of labeling samples to improve domain adaptation performance, known as Active Domain Adaptation (Active DA), has attracted the attention of researchers [9].

Active learning focuses on how to select the most informative samples to improve the performance of the model [10, 11]. Most active learning algorithms design sample selection criteria from the perspectives of the uncertainty and diversity of unlabeled samples. The uncertainty of the sample usually depends on the classification results of the model [12, 13], and the diversity of the sample is regarded as its representativeness [11, 14]. Although existing sample selection criteria show superior performance in active learning tasks, directly applying these criteria to domain adaptation is infeasible. Due to domain shift, traditional uncertainty and diversity estimation on the target domain may be miscalibrated, which leads to sub-optimal sample selection [15].

Figure 1.

Comparison between the pipeline of existing Active DA methods and our proposed pipeline. Some recent Active DA methods achieve initial domain alignment through domain adversarial learning, then perform sample selection, and finally train on the selected samples together with source domain samples in a supervised manner. Our proposed method first selects informative samples without pre-alignment, and then screens out appropriate samples for self-training.

To overcome the negative effects of domain shift, recent work in Active DA has sought a pipeline that suits both domain adaptation and sample selection. AADA [16], the state-of-the-art Active DA method, proposed such a pipeline. As shown in Fig. 1, it first aligns the source and target domains through domain adversarial learning [17], and then applies complex sample selection criteria to unlabeled samples based on the output of the domain discriminator and the entropy. The pipelines of subsequent Active DA methods, such as TQS [18] and S3VAADA [19], are similar in that the target domain is first aligned with the source domain through DANN [20] before sample selection; they differ in their sample selection strategies. On the whole, these Active DA methods select samples to label based on uncertainty and diversity, inheriting the perspective of traditional active learning, and they follow the same pipeline in which sample selection is performed on top of domain adversarial training.

Although existing methods have achieved good results, we believe that two problems remain. First, most Active DA methods rely on domain adversarial learning, which motivates us to design effective sample selection strategies that can overcome domain shift without adversarial learning. Second, the information in the available unlabeled samples is not fully utilized. Some Active DA methods do try to incorporate semi-supervised domain adaptation methods to exploit this information. However, because the experimental settings of semi-supervised DA and Active DA differ, the expected results may not be achieved. In general, existing Active DA methods do not fully exploit the value of the samples that remain after selection, and making full use of these unlabeled samples can further boost performance.

To solve the above problems, we propose a new method named Boosting Active Domain Adaptation with Exploration of Samples (BADA). Its pipeline is shown in Fig. 1. For sample selection, to avoid the sub-optimal selection caused by using a miscalibrated model under domain shift, we design a consistency-based instability criterion built on the consistency learning paradigm. Specifically, the instability of a sample is estimated from the predictive consistency between the sample and its label-preserving transformed versions, and samples with high inconsistency are prioritized for labeling. The insight is that consistency checks based on data augmentation, which can be viewed as a form of perturbation injection, can effectively detect initial errors of the model [21]. Second, to reduce redundancy among the selected samples, we further consider sample diversity. Specifically, we estimate the local density of a sample from the distances between the sample and its neighbors in the feature space, and then evaluate its diversity using entropy as an uncertainty weight. Finally, for the remaining samples, we design a self-training framework. Based on predictive consistency checks, we screen out the highly stable samples among the high-confidence samples and assign pseudo-labels to them for self-training. In addition, for the remaining unreliable samples, we develop a negative learning loss to further improve the model.

In general, the contributions of this work are as follows: firstly, we have designed simple and effective sample selection criteria, which can select the most valuable samples that are worth labeling. Our proposed sample selection criteria take the instability and the diversity of the sample into account. Secondly, we design the self-training framework to boost performance. By screening out reliable samples and unreliable samples, our proposed framework can further explore and utilize the value of samples. Finally, through conducting extensive experiments on mainstream datasets, we show that BADA can achieve excellent performance.

The rest of this article is organized as follows. We first review related work in Section 2, and then describe our proposed method in detail in Section 3. Next, the results of our experiments are presented and discussed in Section 4. Finally, we summarize the article in Section 5.

2. Related work

2.1 Domain adaptation

Domain adaptation aims to transfer the knowledge of the source domain to the target domain. One typical line of approaches [22] mines domain-invariant features through adversarial learning to align the source domain and the target domain, and another typical line of approaches improves feature alignment based on the clustering hypothesis [23] through the minimization of conditional entropy [24, 25]. However, there is still a large gap between the performance of these methods and that of full supervision.

Some studies assign pseudo-labels to samples with high confidence for self-training [26, 27]. These methods also show that pseudo-labeling part of the target domain samples can effectively improve performance. In practice, it is feasible to give ground-truth labels to some samples. Inspired by active learning, we can select the most valuable target domain samples for labeling.

2.2 Active learning

Active learning aims to improve model performance by selecting the most informative samples. Existing methods can be categorized by two mainstream sampling criteria: uncertainty sampling and diversity sampling. Uncertainty sampling methods identify the most ambiguous samples under the current model. The rationale is that by labeling samples whose classification results are ambiguous, the model can learn more useful knowledge in the next round of training. These methods usually evaluate the uncertainty of samples based on the confidence, the entropy [28, 29] or the classification margin [30, 31]. Some methods measure the predictive consistency of samples under ensemble models [32, 33] as the basis of uncertainty. Diversity sampling methods usually cluster the unlabeled samples in the feature space in advance and annotate the most representative samples [34, 35]. There are also approaches that combine the two, selecting samples that are both uncertain and diverse. For example, BADGE [36] explores the diversity and uncertainty of samples in the gradient embedding space by running k-means++ on hallucinated gradient embeddings.

However, these traditional active learning algorithms were all designed for a single domain and do not address domain shift. For this reason, we propose BADA, which selects samples for labeling according to their instability and diversity.

2.3 Active domain adaptation

With the development of deep learning in domain adaptation, AADA [16] started with pre-alignment via DANN [20], and selected samples by measuring their entropy and their domain similarity based on the domain discriminator. TQS [18], after pre-alignment, designed rather complex selection criteria that combined the predictive consistency, margin, and domainness of samples under ensemble classifiers to select worthy samples. S3VAADA [19] designed a score function based on the sensitivity of samples to adversarial perturbation, their diversity and their representativeness to construct the pool of annotation candidates after virtual adversarial domain adaptation. CLUE [37], similar to BADGE, selected samples with an entropy-weighted clustering algorithm and was also combined with MME [38]. SDM-AG [39] proposed a selection criterion that calculates the distance between a sample and different categorical clusters.

Several of the above Active DA methods adopt a similar pipeline. In contrast, our proposed method first selects samples based on instability and diversity without pre-alignment, and then screens out reliable samples and unreliable samples for self-training.

3. Method

In this section, we describe BADA in detail. Firstly, we propose simple and effective sample selection criteria to select samples. Then, we explore the value of the remaining unlabeled samples.

In the Active DA setting, we have a fully labeled source domain defined as ${{\cal D}_{s}=\{(x_{i}^{s},y_{i}^{s})\}_{i=1}^{N_{s}}}$ and a target domain with a large distribution difference from the source domain defined as ${{\cal D}_{t}=\{x_{i}^{t}\}_{i=1}^{N_{t}}}$ . Here ${N_{s}}$ and ${N_{t}}$ denote the numbers of source and target domain samples. We also have a labeling budget ${B}$ for annotating unlabeled target domain samples, which is far less than ${N_{t}}$ . After sample selection, the target domain is divided into a labeled set ${{\cal D}_{tl}=\{(x_{i}^{tl},y_{i}^{tl})\}_{i=1}^{N_{tl}}}$ and an unlabeled set ${{\cal D}_{tu}=\{x_{i}^{tu}\}_{i=1}^{N_{tu}}}$ , where ${N_{tl}}$ and ${N_{tu}}$ denote the numbers of labeled and unlabeled target domain samples. Our goal is to learn a neural network mapping function ${h_{\theta}(x)=f_{\theta}(g_{\theta}(x))}$ that boosts performance with fewer annotations, where ${g_{\theta}}$ maps the input to an embedding and ${f_{\theta}}$ performs the final classification.

As shown in Fig. 2, we divide Active DA into two phases: (1) fully supervised training and (2) self-training. In the first phase, appropriate samples are selected through the designed sample selection criteria, and these labeled samples are used for training along with the labeled source domain samples. In the second phase, we exploit the value contained in the unlabeled samples through self-training.

Figure 2.

The architecture of BADA, which contains two phases. The first phase is the fully supervised training on source domain samples and labeled target domain samples: the samples to be labeled are chosen through the sample selection criteria and then combined with the source domain samples for fully supervised learning. In the second phase, the remaining unlabeled target domain samples are split into reliable samples and unreliable samples by the sample screening mechanism, and a separate loss function is adopted for each group.

3.1 Consistency-based instability criterion

Traditional active learning methods use top-1 softmax confidence to measure the uncertainty of a sample under the model, and then label samples with high uncertainty. In domain adaptation, however, the scores produced by such selection criteria can be unreliable because of domain shift. Therefore, most current mainstream Active DA methods achieve domain alignment through adversarial training before sample selection.

In contrast, our proposed sample selection criteria do not require the source domain to be aligned with the target domain beforehand. To avoid the miscalibration problem that traditional active learning methods face under domain shift, we do not rely on uncertainty estimation, but instead consider the instability of the samples themselves. The stability of a sample refers to the degree of consistency between the classification results of the sample and those of its label-preserving transformed versions. The insight is twofold. First, data augmentations can be seen as a way to inject perturbation into samples [40], and samples with high stability are less affected by perturbation. Second, for a given classifier, the performance on highly stable samples remains robust despite the presence of noise.

We suggest using predictive inconsistency as a more stable sample selection criterion, namely the Consistency-based Instability Criterion (CIC). Specifically, given a sample ${x}$ , we generate $k$ augmentations $\{\alpha_{1}(x),\alpha_{2}(x),\ldots,\alpha_{k}(x)\}$ and obtain the class probability vectors $\{p(\alpha_{1}(x)),p(\alpha_{2}(x)),\ldots,p(\alpha_{k}(x))\}$ through ${h_{\theta}}$ . We measure the instability of the sample as follows:

$\displaystyle\textit{Ins}(x)=\sqrt{\frac{\sum_{i=1}^{k}\left\|p(\alpha_{i}(x))-\frac{1}{k}\sum_{j=1}^{k}p(\alpha_{j}(x))\right\|}{k}}$ (1)

We use standard deviation to calculate the inconsistency of predictions. With Eq. (1), we can calculate the instability score for all unlabeled samples. We select the samples with the top ${t\%}$ scores as the candidates to be labeled.
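As a rough illustration, Eq. (1) can be sketched in plain Python as below. The function name `instability` and the toy probability vectors are hypothetical and not part of the original method description. The sketch follows Eq. (1) as printed, using the unsquared L2 norm of each deviation; squaring the norm would give a standard deviation in the strict sense.

```python
import math


def instability(prob_vectors):
    """Ins(x) as printed in Eq. (1).

    prob_vectors holds the k class-probability vectors p(alpha_i(x)),
    one per augmentation of the same sample x.
    """
    k = len(prob_vectors)
    num_classes = len(prob_vectors[0])
    # Mean prediction over the k augmentations.
    mean = [sum(p[c] for p in prob_vectors) / k for c in range(num_classes)]
    # Sum of L2 distances between each prediction and the mean prediction.
    total = 0.0
    for p in prob_vectors:
        total += math.sqrt(sum((p[c] - mean[c]) ** 2 for c in range(num_classes)))
    return math.sqrt(total / k)


# Toy predictions (assumed values) for k = 3 augmentations and 3 classes.
stable = [[0.9, 0.05, 0.05]] * 3
unstable = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
print(instability(stable) < instability(unstable))  # True
```

A sample whose predictions agree across augmentations scores close to zero, while disagreeing predictions raise the score, so ranking by this value puts the least consistent samples first.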

3.2 Uncertainty-based diversity criterion

By performing CIC, we can initially construct a candidate pool of samples that have high instability or are out of distribution. However, relying only on CIC causes redundancy in sample selection, resulting in a pool of samples with similar properties. In addition, many informative samples other than those with high instability would remain unselected. To reduce redundancy as much as possible and further select samples that are worth labeling, we need to select the most diverse samples from the candidate pool.

The diversity of a sample usually refers to its representativeness among the unlabeled samples. A sample with high diversity tends to lie close to its neighbors in the feature space, so its local density is large. In this paper, we measure diversity based on the local density of the sample in the feature space. Specifically, given an unlabeled sample ${x}$ , we evaluate its density by averaging an inverse function of the squared Euclidean distance between the sample and each of its neighbors:

$\displaystyle\textit{Den}(x)=\frac{1}{M}\sum\limits^{M}_{i=1}\frac{1}{1+||g(x)-g(x_{i})||^{2}}$ (2)

where ${M}$ denotes the number of the neighbors, ${g(x)}$ and ${g(x_{i})}$ denote the feature representations of ${x}$ and its neighbor $x_{i}$ .
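A minimal sketch of Eq. (2) follows, assuming that features are plain lists of floats and that the $M$ neighbors are found by brute-force search. The helper names (`squared_distance`, `nearest_neighbors`, `density`) and the toy feature pool are hypothetical; the text does not specify how neighbors are retrieved.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest_neighbors(feature, pool, m):
    """Return the m features in pool closest to feature.

    Assumption: pool does not contain the query feature itself.
    """
    return sorted(pool, key=lambda other: squared_distance(feature, other))[:m]


def density(feature, neighbors):
    """Den(x) from Eq. (2): mean of 1 / (1 + squared distance) over the neighbors."""
    total = sum(1.0 / (1.0 + squared_distance(feature, nb)) for nb in neighbors)
    return total / len(neighbors)


# Toy 2-D features (assumed values).
pool = [[0.0, 1.0], [1.0, 0.0], [3.0, 4.0]]
neighbors = nearest_neighbors([0.0, 0.0], pool, m=2)
print(density([0.0, 0.0], neighbors))  # 0.5
```

The score lies in (0, 1] and approaches 1 when all neighbors coincide with the sample, which matches the reading that a large density indicates a representative sample.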

In addition, entropy can not only reflect the uncertainty and information of the sample, but also measure the domainness of the sample [37]. The domainness of the sample represents how private the sample is to the target domain. Since the model is initially trained by the source domain samples, the sample easily classified by the model is biased to the source domain and less private to the target domain. We use the entropy of the sample and its augmentations as the weighting factor, so that the criterion can not only estimate the diversity, but also capture the uncertainty and domainness. The score of Uncertainty-based Diversity Criterion is calculated by the following formula:

$\displaystyle\textit{Div}(x)=\frac{1}{k+1}\left[H(p(x))+\sum^{k}_{i=1}H(p(\alpha_{i}(x)))\right]\textit{Den}(x)$ (3)

where $H(p(x))$ denotes the predictive entropy of the sample $x$ .
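The weighting in Eq. (3) can be sketched as follows, assuming the density value has already been computed and that entropy uses the natural logarithm (the text does not state the base). The names `entropy` and `diversity_score` and the toy inputs are hypothetical.

```python
import math


def entropy(p):
    """Predictive entropy H(p) with the natural logarithm; zero entries are skipped."""
    return -sum(q * math.log(q) for q in p if q > 0)


def diversity_score(p_x, p_augs, den):
    """Div(x) from Eq. (3): mean entropy of x and its k augmentations, times Den(x)."""
    k = len(p_augs)
    mean_entropy = (entropy(p_x) + sum(entropy(p) for p in p_augs)) / (k + 1)
    return mean_entropy * den


# A maximally uncertain two-class prediction with density 0.5 (assumed values).
score = diversity_score([0.5, 0.5], [[0.5, 0.5]], den=0.5)
print(abs(score - 0.5 * math.log(2)) < 1e-12)  # True
```

Because the entropy term multiplies the density, a confidently classified sample receives a score near zero regardless of how dense its neighborhood is.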

In the process of sample selection, we first obtain the instability score of each sample through CIC. The samples with the top ${t\%}$ instability scores form the candidate pool, and the Uncertainty-based Diversity Criterion (UDC) is then applied to the candidates to determine the final selected samples.
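The two-stage selection can be sketched as below, assuming the instability and diversity scores are already available as lists indexed by sample. The function name `select_for_labeling`, its parameters and the toy scores are hypothetical. For simplicity the sketch takes diversity scores for every sample, although only the candidates' scores are actually used.

```python
import math


def select_for_labeling(ins_scores, div_scores, t_percent, n_select):
    """Keep the top t% of samples by instability, then pick n_select by diversity."""
    n_candidates = math.ceil(len(ins_scores) * t_percent / 100)
    # Stage 1 (CIC): rank all unlabeled samples by instability, highest first.
    by_instability = sorted(
        range(len(ins_scores)), key=lambda i: ins_scores[i], reverse=True
    )
    candidates = by_instability[:n_candidates]
    # Stage 2 (UDC): rank only the candidates by diversity, highest first.
    by_diversity = sorted(candidates, key=lambda i: div_scores[i], reverse=True)
    return by_diversity[:n_select]


# Ten toy samples (assumed scores); t = 40 keeps four candidates.
ins = [0.9, 0.1, 0.8, 0.7, 0.2, 0.6, 0.3, 0.4, 0.5, 0.0]
div = [0.2, 0.9, 0.5, 0.1, 0.8, 0.7, 0.6, 0.3, 0.4, 0.0]
chosen = select_for_labeling(ins, div, t_percent=40, n_select=2)
print(chosen)  # [5, 2]
```

Sample 1 has the highest diversity score in the toy data but is never chosen, because it does not pass the instability stage.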

The final selected target domain samples and source domain samples participate in supervised training. The objective function is defined as:

$\displaystyle\min\limits_{h_{\theta}}L_{ce}=\mathbb{E}_{(x,y)\thicksim D_{s}\cup D_{tl}}L_{ce}(p(x),y)$ (4)

3.3 Self-training

Through sample selection, we determine the samples that are ultimately worth labeling. For the remaining large amount of unlabeled samples, there are few Active DA methods to explore the value thoroughly. Due to the difference between the experimental setup of semi-supervised DA and that of Active DA, the effect of directly applying semi-supervised DA methods, such as MME [38], may not be as good as expected. To this end, we propose a self-training framework for the remaining samples. We divide the remaining samples into reliable samples and unreliable samples through the sample screening mechanism, and based on this, we design respective loss functions.

Reliable Samples

Usually, DA methods based on self-training manually set the confidence threshold to screen out reliable samples. The reliable samples can be defined as:

$\displaystyle\left\{x\in{D_{tr}}|\max\limits_{i}p^{i}(x)>r,i=1,2,\ldots,C\right\}$ (5)

where ${D_{tr}}$ denotes the set of reliable samples, $C$ denotes the number of classes and $p^{i}(x)$ denotes the probability that the target sample $x$ belongs to the i-th class and ${r}$ denotes the confidence threshold. We can generate the pseudo-label $\hat{y}$ of the sample $x$ by the following formula:

$\displaystyle\hat{y}=\arg\max\limits_{i}p^{i}(x)$ (6)

This method is simple and effective, but it cannot overcome the errors caused by domain shift. Therefore, we need to further screen for more reliable samples and their pseudo-labels. Given the effectiveness of CIC in estimating the instability of a sample, we adopt a similar approach to evaluate the instability of $p^{i}(x)$ by the following formula:

$\displaystyle\textit{Ins}(p^{i}(x))=\sqrt{\frac{\sum_{j=1}^{k}\left|p^{i}(\alpha_{j}(x))-\frac{1}{k}\sum_{j^{\prime}=1}^{k}p^{i}(\alpha_{j^{\prime}}(x))\right|}{k}}$ (7)

So combined with the instability of pseudo-labels, the sample screening mechanism of reliable samples can be defined as:

$\displaystyle\left\{x\in{D_{tr}}|\max\limits_{i}p^{i}(x)>r\wedge\textit{Ins}(p^{i}(x))\leqslant\kappa\right\}$ (8)

where $\kappa$ denotes the instability threshold, whose value lies between 0 and 1. When the instability of the pseudo-label is greater than $\kappa$ , we consider the sample unreliable and exclude it from pseudo-label training. It is worth noting that $\kappa$ is increased by $\Delta\kappa$ after each round of sample selection. The insight is that, as the labeling budget gradually increases, the model after supervised training already classifies the low-instability, high-confidence samples well, so these samples provide little additional knowledge. By gradually relaxing $\kappa$ , more high-reliability samples can be selected for training while the accuracy of the pseudo-labels is maintained, which improves the generalization of the model.
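The screening rule in Eqs. (5) to (8) can be sketched as follows. The sketch assumes that the class index $i$ in Eq. (8) is the pseudo-label class from Eq. (6), which the text does not state explicitly. The function names and the value $\kappa = 0.1$ are hypothetical; $r = 0.9$ is the confidence threshold reported for Office31 and VisDA.

```python
import math


def class_instability(aug_probs, cls):
    """Ins(p^i(x)) as printed in Eq. (7) for a single class index cls."""
    k = len(aug_probs)
    mean = sum(p[cls] for p in aug_probs) / k
    return math.sqrt(sum(abs(p[cls] - mean) for p in aug_probs) / k)


def screen_reliable(p_x, aug_probs, r, kappa):
    """Return the pseudo-label if the sample passes Eq. (8), otherwise None."""
    pseudo_label = max(range(len(p_x)), key=lambda c: p_x[c])  # Eq. (6)
    confident = p_x[pseudo_label] > r  # confidence test from Eq. (5)
    stable = class_instability(aug_probs, pseudo_label) <= kappa
    return pseudo_label if confident and stable else None


# Toy predictions (assumed values) for k = 3 augmentations.
stable_augs = [[0.94, 0.03, 0.03], [0.96, 0.02, 0.02], [0.95, 0.03, 0.02]]
unstable_augs = [[0.95, 0.03, 0.02], [0.50, 0.30, 0.20], [0.20, 0.50, 0.30]]

print(screen_reliable([0.95, 0.03, 0.02], stable_augs, r=0.9, kappa=0.1))    # 0
print(screen_reliable([0.95, 0.03, 0.02], unstable_augs, r=0.9, kappa=0.1))  # None
print(screen_reliable([0.47, 0.47, 0.06], stable_augs, r=0.9, kappa=0.1))    # None
```

The second sample is confident on the original input but inconsistent across augmentations, which is exactly the case that a confidence threshold alone would let through.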

The objective function for reliable samples is defined as:

$\displaystyle L_{PL}=\mathbb{E}_{x\thicksim D_{tr}}L_{ce}(p(x),\hat{y})$ (9)

By optimizing Eq. (9), the probability of the pseudo-label $\hat{y}$ of the sample $x$ is pushed to 1, and the probability of other labels of the sample $x$ is pushed to 0, which can be called positive learning.

Unreliable Samples

For the remaining samples that belong to neither $D_{tl}$ nor $D_{tr}$ , namely unreliable samples, we focus on exploring the value of their complementary labels. For example, suppose the number of classes $C$ is 3 and the prediction for the sample ${x}$ is [0.47, 0.47, 0.06]. According to Eq. (5), the sample is not reliable enough to be assigned a pseudo-label. However, the prediction also indicates that the sample is very unlikely to belong to the class with probability 0.06; this label is called the complementary label. We generate complementary labels by the following formula:

$\displaystyle N(p^{i}(x))=\left\{\begin{array}{ll}1,&\text{if }p^{i}(x)<\tau,\\ 0,&\text{otherwise}.\end{array}\right.$ (10)

where $\tau$ denotes the threshold of complementary labels. The negative learning loss function for unreliable samples is defined as:

$\displaystyle L_{NL}=\frac{-1}{S}\sum\limits_{i=1}^{N_{\textit{tur}}}\sum\limits_{j=1}^{C}N(p^{j}(x_{i}))\log(1-p^{j}(x_{i}))$ (11)

where ${S}$ denotes the total number of complementary labels of all unreliable samples participating in self-training and ${N_{\textit{tur}}}$ denotes the number of unreliable samples. By optimizing Eq. (11), the probabilities of the complementary labels are pushed toward 0, and the probabilities of the other classes increase accordingly.
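A minimal sketch of Eqs. (10) and (11) on the three-class example above is given below. The function names are hypothetical, and the sketch uses $\tau = 0.1$ purely so that the 0.06 entry is marked as complementary; this value is an assumption for illustration, not the paper's setting. Returning 0 when no complementary label exists is also an assumption, since Eq. (11) is undefined for $S = 0$.

```python
import math


def complementary_mask(p, tau):
    """N(p^i(x)) from Eq. (10): 1 for classes with probability below tau, else 0."""
    return [1 if q < tau else 0 for q in p]


def negative_learning_loss(batch_probs, tau):
    """L_NL from Eq. (11), averaged over all complementary labels in the batch."""
    total = 0.0
    n_complementary = 0  # S in Eq. (11)
    for p in batch_probs:
        for q, flag in zip(p, complementary_mask(p, tau)):
            if flag:
                total += math.log(1.0 - q)
                n_complementary += 1
    if n_complementary == 0:
        return 0.0  # assumption: no complementary labels contributes no loss
    return -total / n_complementary


mask = complementary_mask([0.47, 0.47, 0.06], tau=0.1)
loss = negative_learning_loss([[0.47, 0.47, 0.06]], tau=0.1)
print(mask)  # [0, 0, 1]
print(abs(loss + math.log(0.94)) < 1e-12)  # True
```

Only the third class contributes to the loss, and minimizing it drives that probability toward 0 without forcing a choice between the two tied classes.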

Algorithm: BADA
Input: labeled source domain $\mathcal{D}_{s}$ , unlabeled target domain $\mathcal{D}_{tu}$ , labeled target domain $\mathcal{D}_{tl}={\emptyset}$ , a labeling budget $B$ , selection rounds $R$ and selection ratio $t$ .
Output: model $h_{\theta}$ .
for $i=1$ to $R$ do
    $\forall x\in\mathcal{D}_{tu}$ , compute $\textit{Ins}(x)$ via Eq. (1);
    $\mathcal{D}_{tl}\leftarrow$ select the $t{\%}$ of samples with the highest Ins scores;
    $\forall x\in\mathcal{D}_{tl}$ , compute $\textit{Div}(x)$ via Eq. (3);
    $\mathcal{D}_{tl}\leftarrow$ select the $B/R$ samples with the highest Div scores;
    Update $\mathcal{D}_{tl}$ and $\mathcal{D}_{tu}$ ;
    Update $h_{\theta}$ via Eq. (4);
    $\forall x\in\mathcal{D}_{tu}$ , compute the instability of its pseudo-label via Eq. (7);
    $\mathcal{D}_{tr}\leftarrow$ select reliable samples via Eq. (8);
    $\forall x\notin\mathcal{D}_{tl}\cup D_{tr}$ , generate its complementary labels via Eq. (10);
    Update $h_{\theta}$ via Eq. (12);
end for

To be clear, the selection and training processes described above are summarized in Algorithm 3.3. The overall objective function is defined as:

$\displaystyle\min\limits_{h_{\theta}}L_{\textit{all}}=L_{ce}+L_{PL}+L_{NL}$ (12)

4. Experiment

We begin by describing our experimental setup: datasets, implementation details and baselines. Next, we present our results and compare them with other state-of-the-art methods. Finally, we perform ablation experiments, examine the rationality of our method, analyze parameter sensitivity and visualize the learned features.

4.1 Datasets

Office31 [5]: It is a small-scale dataset that contains three different domains, Amazon (A), DSLR (D) and Webcam (W), with 4110 images from 31 categories. The images in Amazon are of medium resolution and were taken under studio lighting conditions. The second domain, DSLR, contains high-resolution images taken in an office environment under natural light. The images in Webcam were taken in an ordinary room exposed to natural light, but are of low resolution. We study six domain adaptation tasks: A $\rightarrow$ W, A $\rightarrow$ D, W $\rightarrow$ A, W $\rightarrow$ D, D $\rightarrow$ A and D $\rightarrow$ W.

OfficeHome [41]: It is a medium-sized image classification benchmark for domain adaptation containing 15,500 images, spanning 4 domains: Art (A), Clipart (C), Product (P) and Real World (R). Each domain has images from 65 categories of everyday objects. We take one domain as the source domain and each of the other domains as the target domain, which yields 12 domain adaptation tasks: A $\rightarrow$ C, A $\rightarrow$ P, A $\rightarrow$ R, C $\rightarrow$ A, C $\rightarrow$ P, C $\rightarrow$ R, P $\rightarrow$ A, P $\rightarrow$ C, P $\rightarrow$ R, R $\rightarrow$ A, R $\rightarrow$ C and R $\rightarrow$ P.

VisDA [42]: It is a large domain adaptation benchmark that contains over 280K images across 12 categories belonging to 2 domains: Synthetic and Real. Synthetic consists of images of 3D models rendered under different angles and lighting conditions, while Real consists of images from the Microsoft COCO dataset. We take the training set (Synthetic) as the source domain and the validation set (Real) as the target domain, namely S $\rightarrow$ R.

4.2 Implementation details

We perform all the experiments with PyTorch. On Office31, OfficeHome and VisDA, we adopt ResNet-50 [43] pre-trained on ImageNet [44] as our backbone network. All the networks are optimized with mini-batch Stochastic Gradient Descent (SGD) [45], where the momentum is set to 0.9 and the weight decay is set to 5e-3. In sample selection, we conduct five rounds of sampling, each round selecting 1% of the target domain samples, so the labeling budget $B$ is 5% of the target domain samples. In CIC, we adopt RandAugment [46], which applies image transformations randomly sampled from a set of 14 transforms. We use a committee of $k=$ 3 transforms for all experiments and set $t=$ 5 to build the preliminary candidate sample pool on OfficeHome and VisDA. The value of $t$ on Office31 is set to 10. In UDC, we set $M=$ 15 for OfficeHome, $M=$ 5 for Office31, and $M=$ 3 for VisDA, respectively. We set $r=$ 0.7 for OfficeHome, and $r=$ 0.9 for Office31 and VisDA. We dynamically set the values of $\kappa$ and $\Delta\kappa$ for different domain adaptation tasks. The value of $\tau$ is set to 1e-3 for all experiments.

4.3 Baselines

We compare BADA with several active learning methods and state-of-the-art Active DA methods. For active learning methods, we compare BADA with RANDOM (denoted RAN in the tables), UCN [28], QBC [47], Cluster [48] and BADGE [36]. For Active DA methods, we compare BADA with ADMA [14], AADA [16], TQS [18], S3VAADA [19], CLUE [37] and SDM-AG [39]. Among these methods, AADA, TQS and S3VAADA are based on domain adversarial learning. BADA-Selection is a special case of BADA in which only sample selection is performed and the self-training framework is omitted.

Table 1
Results on Office31 with 5% target domain samples as the labeling budget. We highlight the best result

Method	Office31
	A $\rightarrow$ W	A $\rightarrow$ D	W $\rightarrow$ A	W $\rightarrow$ D	D $\rightarrow$ A	D $\rightarrow$ W	Avg
ResNet-50	81.5	75.0	63.1	95.2	65.7	99.4	79.9
RAN	87.1	84.1	75.5	98.1	75.8	99.6	86.7
UCN	89.8	87.9	78.2	99.0	78.6	100.0	88.9
QBC	89.7	87.3	77.1	98.6	78.1	99.6	88.4
Cluster	88.1	86.0	76.2	98.3	77.4	99.6	87.6
BADGE	90.2	89.7	80.2	98.4	80.3	100.0	89.8
ADMA	90.0	88.3	79.2	100.0	79.1	100.0	89.4
AADA	89.2	87.3	78.2	99.5	78.7	100.0	88.8
S3VAADA	93.0	93.7	75.9	99.4	78.2	100.0	90.0
TQS	92.2	92.8	80.4	100.0	80.6	100.0	91.0
CLUE	88.7	92.3	79.5	100.0	78.2	100.0	89.8
BADA-Selection	91.3	92.4	79.8	100.0	80.4	100.0	90.7
BADA	92.1	92.7	82.8	100.0	83.2	100.0	91.8

Table 2

Results on OfficeHome and VisDA with 5% target domain samples as the labeling budget. We highlight the best result

Method	OfficeHome													VisDA
	A $\rightarrow$ C	A $\rightarrow$ P	A $\rightarrow$ R	C $\rightarrow$ A	C $\rightarrow$ P	C $\rightarrow$ R	P $\rightarrow$ A	P $\rightarrow$ C	P $\rightarrow$ R	R $\rightarrow$ A	R $\rightarrow$ C	R $\rightarrow$ P	Avg	S $\rightarrow$ R
ResNet-50	42.6	66.3	73.3	50.7	59.2	62.6	51.9	37.9	71.2	64.2	42.6	76.6	58.3	42.4
RAN	52.5	78.1	77.7	58.9	70.7	70.5	60.9	53.2	76.8	71.5	57.5	81.8	67.5	78.4
UCN	57.3	78.6	79.3	60.2	74.0	70.9	59.5	52.6	77.2	71.2	56.4	84.5	68.5	80.3
QBC	56.9	78.0	78.4	58.5	73.3	69.6	60.2	53.3	76.1	70.3	57.1	83.1	67.9	80.5
Cluster	60.9	77.8	78.1	58.4	74.6	69.2	59.4	54.2	75.8	70.7	56.4	84.7	68.4	79.8
BADGE	59.2	79.9	81.6	61.8	75.1	73.3	64.7	54.8	79.4	73.1	59.7	85.7	70.7	84.3
ADMA	57.2	79.0	79.4	58.2	74.0	71.1	60.2	52.2	77.6	71.0	57.5	85.4	68.6	83.1
AADA	56.6	78.1	79.1	58.5	73.7	71.2	60.1	53.1	77.2	70.6	57.2	84.5	68.3	81.3
S3VAADA	57.3	73.9	76.6	60.3	76.5	71.1	57.6	56.3	78.7	71.4	63.1	83.3	68.8	77.7
TQS	58.6	81.1	81.5	61.1	76.1	73.3	61.2	54.7	79.7	73.4	58.9	86.1	70.5	83.1
CLUE	63.6	79.3	80.9	68.8	77.5	76.7	66.3	57.9	81.0	76.0	60.8	86.3	72.9	82.7
SDM-AG	61.2	83.9	82.7	66.1	77.9	76.1	66.1	58.4	81.0	76.0	62.5	87.0	73.2	80.3
BADA-Selection	60.7	81.2	79.5	62.1	77.6	75.1	61.7	58.5	78.8	70.6	63.7	85.8	71.3	84.4
BADA	64.3	82.6	81.6	65.2	80.2	75.4	67.2	64.7	81.1	73.4	65.4	87.9	74.1	85.6

4.4 Results

The results of experiments on Office31, OfficeHome and VisDA are shown in Tables 1 and 2. Our method achieves the highest average accuracy among the compared methods on all three datasets. In addition, although traditional active learning methods do not take domain shift into account, they achieve better performance than ResNet-50, which indicates that Active DA has broad prospects.

On Office31, BADA-Selection obtains mean accuracy comparable to that of current state-of-the-art Active DA methods, and BADA outperforms all compared methods, exceeding the second best by 0.8 percentage points. On OfficeHome, BADA-Selection outperforms state-of-the-art active learning methods and the Active DA methods based on DANN, namely AADA, TQS and S3VAADA. When combined with self-training, the mean accuracy further improves from 71.3% to 74.1%. BADA achieves the best accuracy on tasks A $\rightarrow$ C, P $\rightarrow$ C and R $\rightarrow$ C, and on P $\rightarrow$ C and R $\rightarrow$ C it exceeds the best compared method by more than 2 percentage points. A similar conclusion holds on VisDA: from Table 2, BADA-Selection outperforms all compared methods, and BADA improves on BADA-Selection, as on the other datasets.

Figure 3.

(a) Comparison results when varying the percentage of selected samples on VisDA. (b) Mean accuracy of BADA and its components when varying the percentage of selected samples on OfficeHome.

4.5 Analyses

4.5.1 Varying budget

To demonstrate the superiority of our method, we compare different methods under different labeling budgets. Four other methods are compared with BADA-Selection and BADA. In addition, we combine BADA-Selection with MME to study how the effect of MME on BADA-Selection changes as the labeling budget increases. The results on VisDA are shown in Fig. 3a. As the labeling budget goes from 1% to 10%, the accuracy of BADA-Selection increases steadily, and BADA consistently achieves better performance than the compared methods. Under a 1% labeling budget, combining BADA-Selection with MME outperforms both BADA-Selection and BADA. However, once the labeling budget reaches 5% and continues to increase, the benefit of MME saturates and the combination falls behind BADA-Selection alone. We conjecture that this is because the experimental setting of MME assumes only a small number of labeled samples per class, whereas the labeling budget of Active DA far exceeds that amount. This phenomenon also indicates that directly combining Active DA with MME may not work as well as expected.

Table 3
Ablation results on OfficeHome with 5% target domain samples as the labeling budget

Selection: CIC	Selection: UDC	Training: $L_{PL}$	Training: $L_{NL}$	A $\rightarrow$ C	A $\rightarrow$ P	A $\rightarrow$ R	C $\rightarrow$ A	C $\rightarrow$ P	C $\rightarrow$ R	P $\rightarrow$ A	P $\rightarrow$ C	P $\rightarrow$ R	R $\rightarrow$ A	R $\rightarrow$ C	R $\rightarrow$ P	Avg
✓				58.3	78.3	77.9	60.3	75.7	73.6	60.2	55.7	78.7	69.8	62.1	83.5	69.5
	✓			59.9	80.3	78.2	60.8	75.2	72.6	61.4	57.5	78.6	69.2	62.1	82.5	69.9
✓	✓			60.7	81.2	79.5	62.1	77.6	75.1	61.7	58.5	78.8	70.6	63.7	85.8	71.3
✓	✓	✓		64.0	82.4	81.7	64.4	80.1	75.6	66.2	63.5	80.6	73.1	65.3	85.6	73.5
✓	✓		✓	62.5	79.9	80.7	63.2	77.0	76.8	62.3	59.6	80.7	71.2	62.6	85.4	71.8
✓	✓	✓	✓	64.3	82.6	81.6	65.2	80.2	75.4	67.2	64.7	81.1	73.4	65.4	87.9	74.1

4.5.2 Ablation study

To quantify the effect of each component of our proposed method, we conduct ablation experiments on all 12 tasks of OfficeHome in this subsection. The results under a 5% labeling budget can be found in Table 3, and the mean accuracy under an increasing labeling budget on OfficeHome is shown in Fig. 3b. For the sample selection criteria, we study the effect of each criterion. When performing UDC or CIC alone, we directly select samples according to the respective scores, without setting the parameter $t$. As shown in Table 3 and Fig. 3b, UDC helps more than CIC on many of the tasks and in mean accuracy. Comparing the results under a single selection criterion with those under the combination of the two, relying on one selection criterion alone results in sub-optimal sample selection. Overall, UDC equipped with CIC improves the mean accuracy from 69.9% to 71.3%.

After sample selection, we further examine the function of each loss in self-training. Taking full advantage of the samples with high reliability and high stability among the remaining unlabeled samples is beneficial to the model, yielding a 2.2% improvement (from 71.3% to 73.5%), which in turn helps the next round of selection find the most informative samples. From Fig. 3b, we can also observe that as the labeling budget gradually grows to 5%, the improvement brought by pseudo-labels gradually declines and is smaller than that under a 1% labeling budget. The reason is that, as the labeling budget increases, the model learns more effective information from the samples with ground-truth labels, so the contribution of pseudo-labels decreases. Complementary labels further improve the mean accuracy slightly through negative learning. In total, applying $L_{PL}$ together with $L_{NL}$ helps fully explore the value of unlabeled samples.
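The gains discussed in this subsection can be re-derived from the "Avg" column of Table 3. The short script below is only a bookkeeping sketch: the dictionary layout and the name `ablation_avg` are our own, and the numbers are copied from Table 3.

```python
# Mean accuracies (%) copied from the "Avg" column of Table 3.
# Keys are (CIC, UDC, L_PL, L_NL) flags; this layout is illustrative only.
ablation_avg = {
    (True, False, False, False): 69.5,
    (False, True, False, False): 69.9,
    (True, True, False, False): 71.3,
    (True, True, True, False): 73.5,
    (True, True, False, True): 71.8,
    (True, True, True, True): 74.1,
}


def gain(after, before):
    """Improvement in mean accuracy (percentage points), rounded to one decimal."""
    return round(ablation_avg[after] - ablation_avg[before], 1)


selection_only = (True, True, False, False)
gains = {
    "CIC added to UDC": gain(selection_only, (False, True, False, False)),
    "L_PL added to selection": gain((True, True, True, False), selection_only),
    "L_NL added to selection": gain((True, True, False, True), selection_only),
    "L_NL added to L_PL": gain((True, True, True, True), (True, True, True, False)),
    "full self-training": gain((True, True, True, True), selection_only),
}

for name, value in gains.items():
    print(f"{name}: +{value}")
```

Read this way, the pseudo-label loss accounts for most of the self-training gain, while negative learning adds a smaller increment on top of it.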

4.5.3 Sensitivity of parameters

Because different parameter values influence the performance of the model differently, the values of the parameters $t$ and $M$ in BADA-Selection need to be determined. This subsection analyzes the influence of these two parameters on the results of BADA-Selection. We study their effects on the tasks A $\rightarrow$ C and P $\rightarrow$ C of OfficeHome under a 1% labeling budget. We choose these two tasks because their accuracy is the lowest when adopting ResNet-50 for domain adaptation, which indicates that they are hard domain adaptation tasks; we expect the conclusions drawn on them to carry over to the other tasks. A higher value of $t$ means that there are more samples in the candidate pool, which makes CIC less influential and UDC more important, because UDC eventually filters out more samples, and vice versa. For different datasets, it is important to experiment with different values of $t$ to balance the effects of the two criteria and achieve good results. In addition, a lower value of $M$ means that fewer neighbors are taken into account by UDC. As with $t$, it is essential to choose an appropriate value of $M$ for each dataset. The results are presented in Fig. 4. The performance changes only slightly as $t$ and $M$ vary, which indicates that our method is only mildly sensitive to $t$ and $M$. We choose $t\in\{5,10\}$ and $M\in\{10,15\}$ in our experiments.
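To make the roles of $t$ and $M$ concrete, the following is a minimal sketch of a two-stage selection in which a candidate pool of size $t$ times the budget is formed first and the final samples are then chosen for diversity. It is not the scoring used by BADA: the instability scores are taken as given, the diversity score (mean distance to the $M$ nearest neighbours in the pool) is a stand-in chosen for illustration, and the names `select_samples`, `instability` and `features` are hypothetical.

```python
import math


def select_samples(instability, features, budget, t, M):
    """Two-stage selection sketch (stand-in scores, not the paper's CIC/UDC).

    Stage 1: keep the t * budget samples with the highest instability.
    Stage 2: rank the pool by mean distance to the M nearest pool
             neighbours and return the `budget` most isolated samples.
    """
    n = len(instability)
    pool_size = min(n, t * budget)
    pool = sorted(range(n), key=lambda i: instability[i], reverse=True)[:pool_size]

    def diversity(i):
        dists = sorted(math.dist(features[i], features[j]) for j in pool if j != i)
        nearest = dists[:M]
        return sum(nearest) / len(nearest) if nearest else 0.0

    ranked = sorted(pool, key=diversity, reverse=True)
    return ranked[:budget]


# Toy data: a tight cluster near the origin plus a few isolated points.
features = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (9, 0), (0.1, 0.1), (20, 20)]
instability = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.0]

small_pool = select_samples(instability, features, budget=2, t=3, M=2)
large_pool = select_samples(instability, features, budget=2, t=4, M=2)
print(small_pool, large_pool)
```

In this toy example the most isolated point (index 6) has the lowest instability, so it is excluded when $t=3$ and selected when $t=4$. This mirrors the observation above that a larger candidate pool shifts influence from the instability criterion to the diversity criterion.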

Figure 4.

Sensitivity analysis of $t$ and $M$ under 1% labeling budget on OfficeHome.

Figure 5.

Results of different values of $\kappa$ when setting different values of $r$ .

Figure 6.

(a) Results on the task A $\rightarrow$ C under 1% labeling budget when setting $r$ to 0.7 and 0.3. (b) Results of three variants on the task A $\rightarrow$ C with increasing labeling budget.

4.5.4 Effects of sample screening mechanism

In this subsection, we study the effects of the sample screening mechanism in detail. The key to the mechanism lies in the values of $\kappa$ and $r$, so we analyze the results for different values of $\kappa$ under different values of $r$. For each value of $r$, we analyze the accuracy of pseudo-labels and the number of reliable samples for different values of $\kappa$ under a 1% labeling budget. The results can be seen in Fig. 5. Decreasing $\kappa$ effectively increases the accuracy of pseudo-labels while decreasing the number of reliable samples, which indicates that a lower $\kappa$ screens out samples with higher reliability. As shown in Fig. 6a, we further compare the results for different values of $\kappa$ under a 1% labeling budget when setting $r$ to 0.5 and 0.7, which also indicates that our method is sensitive to the values of $r$ and $\kappa$ and that setting $r$ to 0.7 is better. Although decreasing $\kappa$ improves the accuracy of pseudo-labels, this does not mean that a lower value of $\kappa$ always leads to a better final model. To better present the necessity of relaxing $\kappa$ when increasing the labeling budget, we design three variants: fixing $\kappa$, removing $\kappa$ and relaxing $\kappa$. The initial value of $\kappa$ is set to 0.6 and the value of $\Delta\kappa$ is set to 0.1. The results on the task A $\rightarrow$ C can be seen in Fig. 6b. Under a 1% labeling budget, the accuracy of removing $\kappa$ is lower than that of fixing $\kappa$ and that of relaxing $\kappa$. As the labeling budget increases, the accuracy of fixing $\kappa$ becomes lower than that of removing $\kappa$. The reason is that the model has better classification performance as the labeling budget increases, so the benefit of samples with high reliability and high stability tends to saturate. By relaxing $\kappa$, the performance is superior to that of the other variants.
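The three variants compared above can be illustrated with a small sketch. It rests on assumptions that are not stated in this subsection: $r$ is treated as a lower bound on the maximum predicted probability, $\kappa$ as an upper bound on a per-sample instability score in $[0,1]$, and $\kappa$ is increased by $\Delta\kappa$ once per selection round. The function names and the choice of complementary label are hypothetical; the two losses are the generic pseudo-label cross-entropy and negative-learning forms, not necessarily the exact $L_{PL}$ and $L_{NL}$ of the method.

```python
import math


def kappa_schedule(variant, round_idx, kappa0=0.6, delta=0.1):
    """Return the kappa used in a given round (None means no kappa check)."""
    if variant == "fixing":
        return kappa0
    if variant == "removing":
        return None
    if variant == "relaxing":
        # Assumed schedule: add delta per round, capped at 1.0.
        return min(1.0, round(kappa0 + delta * round_idx, 2))
    raise ValueError(f"unknown variant: {variant}")


def screen(probs, instability, r, kappa=None):
    """Split sample indices into (reliable, unreliable) under the assumed rule:
    reliable if max probability >= r and, when kappa is set, instability <= kappa."""
    reliable, unreliable = [], []
    for i, (p, s) in enumerate(zip(probs, instability)):
        confident = max(p) >= r
        stable = kappa is None or s <= kappa
        (reliable if confident and stable else unreliable).append(i)
    return reliable, unreliable


def pseudo_label_loss(p):
    """Cross-entropy against the arg-max pseudo-label: -log(max p)."""
    return -math.log(max(p))


def negative_learning_loss(p, complementary):
    """Negative-learning loss for a class the sample is assumed not to belong to."""
    return -math.log(1.0 - p[complementary])


probs = [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10], [0.40, 0.35, 0.25]]
instability = [0.10, 0.65, 0.20]

fixed = screen(probs, instability, r=0.7, kappa=kappa_schedule("fixing", 1))
removed = screen(probs, instability, r=0.7, kappa=kappa_schedule("removing", 1))
relaxed = screen(probs, instability, r=0.7, kappa=kappa_schedule("relaxing", 1))
print(fixed, removed, relaxed)
```

With these toy numbers, the second sample is confident but slightly unstable: it is rejected under the fixed $\kappa=0.6$ and accepted once $\kappa$ is relaxed to 0.7 or removed altogether, which is the trade-off between pseudo-label accuracy and the number of reliable samples described above.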

Figure 7.

Visualization of the feature distribution on the task A $\rightarrow$ D on Office31 when adopting ResNet-50, UCN and BADA. Red denotes source domain samples and green denotes target domain samples.

4.5.5 Feature visualization

In this subsection, we use feature visualization to show the experimental effects intuitively. Specifically, we apply t-SNE to the features of ResNet-50, UCN and BADA on the task A $\rightarrow$ D on Office31. As shown in Fig. 7, when adopting ResNet-50 alone, the source domain and the target domain are barely aligned, which results in the samples being scattered in the feature space. With UCN, the samples in the feature space become more compact, but a number of samples are still scattered near the class boundaries. Compared with these results, the proposed method yields a better feature distribution, and the boundaries between classes are clearer.

5. Conclusion

In this paper, we introduced a new method named Boosting Active Domain Adaptation with Exploration of Samples (BADA) to address the problems of Active DA. We proposed comprehensive sample selection criteria that consider both the instability and the diversity of samples. First, candidate samples were determined based on CIC, and then the most diverse samples were selected through UDC to overcome redundancy in sample selection. We further designed a self-training framework to explore the value of the unlabeled samples remaining after sample selection. We adopted the sample screening mechanism to distinguish reliable samples from unreliable samples, and applied a separate loss function to each group. Through extensive experiments, we showed that BADA outperforms other state-of-the-art methods. In the future, we will explore how to extend the proposed method to source-free domain adaptation and open-set domain adaptation.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant 62176128, the Open Projects Program of State Key Laboratory for Novel Software Technology of Nanjing University under Grant KFKT2022B06, the Fundamental Research Funds for the Central Universities No. NJ2022028, the Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD) fund, as well as the Qing Lan Project.

