Correlation alignment with attention mechanism for unsupervised domain adaptation

Abstract

Domain adaptation addresses the problem of scarce labeled data in a target domain. Most existing work on domain adaptation focuses on aligning the feature distributions of the source and target domains. However, in natural language processing, some words convey different sentiment in different domains. Thus not all features of the source domain should be transferred, and aligning untransferable features causes negative transfer. To address this issue, we propose a Correlation Alignment with Attention mechanism for unsupervised Domain Adaptation (CAADA) model. The model introduces an attention mechanism into the transfer process to capture the positively transferable features in the source and target domains. Moreover, the CORrelation ALignment (CORAL) loss is used to minimize the domain discrepancy by aligning the second-order statistics of the positively transferable features extracted by the attention mechanism. Extensive experiments on the Amazon review dataset demonstrate the effectiveness of the CAADA method.

Keywords

Domain adaptation, CORAL, attention mechanism, domain discrepancy

1. Introduction

Neural network based models have achieved great success in many fields, such as natural language processing [1,2] and computer vision [3]. However, because of domain shift, models trained on large-scale labeled datasets generalize poorly to novel tasks and datasets. Domain adaptation [4], a subfield of transfer learning, uses knowledge extracted from a label-rich source domain to improve performance on a target domain with scarce annotated data, thereby reducing the negative effect of domain shift.

Earlier domain adaptation methods focus on learning domain-invariant feature representations by minimizing the discrepancy between the source and target domains, such as Transfer Kernel Learning [5] and Transfer Component Analysis [6]. Recently, neural network based models have been widely used in domain adaptation. Long et al. [7] proposed a deep convolutional neural network for domain adaptation that uses multiple-kernel selection for mean embedding matching. Ganin et al. [8] proposed an adversarial framework for domain adaptation that introduces adversarial training to extract domain-invariant features. The feature extractor is trained to minimize the classification loss and to fool the discriminator, which in turn is trained to distinguish source data from target data. Zhuo et al. [9] proposed a Deep Unsupervised Convolutional Domain Adaptation (DUCDA) method, which uses two correlation statistics losses to align distributions and measure the discrepancy between the source and target domains.

The aforementioned works mainly align the feature distributions of the source and target domains. However, not all features of the source domain should be transferred. For example, “convenient” and “clean” are frequent words in the kitchen domain but rare in the books domain. Moreover, some words convey different sentiment in different domains. For instance, in the DVDs domain, the word “unpredictable” in “unpredictable storyline!” is positive, whereas in the electronics domain “unpredictable” is negative. If the model aligns such untransferable features, negative transfer may occur.

To address this problem, we propose an attention transfer process for domain adaptation that aligns features selectively. An attention mechanism is introduced into our model to capture the positively transferable features of the source domain, that is, those that benefit the target domain. To minimize the domain discrepancy, we use the CORAL loss to align the second-order statistics of the positively transferable features extracted by the attention mechanism. Experimental results demonstrate that our model effectively improves the performance of domain adaptation.

In summary, the main contributions of this paper are as follows:

We propose an attention transfer process for domain adaptation, which introduces an attention mechanism into the model to capture the positively transferable features in the source and target domains. In addition, the CORAL loss is introduced to minimize the domain discrepancy, which further improves the robustness of CAADA.

We evaluate our model on Amazon reviews datasets, and experimental results show that our approach outperforms other baseline methods.

The remainder of the paper is organized as follows. In Section 2, we present related work on domain adaptation. In Section 3, we describe the proposed method. In Section 4, we present the implementation details and experimental results. Finally, Section 5 concludes the study.

2. Related work

In recent years, domain adaptation has become a promising research direction. We focus on unsupervised domain adaptation methods and attention mechanisms.

2.1. Unsupervised domain adaptation

Unsupervised domain adaptation, a special case of transfer learning, uses knowledge extracted from a source domain with abundant labeled data to improve performance on a target domain with no labeled data. Training a classifier on labeled source data and then fine-tuning it on labeled target data is the most common practice when target labels are available. Many works use the Maximum Mean Discrepancy (MMD) to minimize the distance between the source and target domains. Specifically, Tzeng et al. [10] proposed a CNN-based model for domain adaptation. This model introduces an adaptation layer and a domain confusion loss, which aim to learn semantically meaningful and domain-invariant features respectively. However, this model uses only single-kernel MMD to decrease the domain discrepancy. Building on this model, Long et al. [11] proposed using multi-kernel Maximum Mean Discrepancy (MK-MMD) to minimize the discrepancy between the source and target domains. In contrast to these two methods, our method uses CORAL to minimize the domain discrepancy by aligning the second-order statistics of the source and target domains. The proposed method also achieves better results on a popular benchmark dataset.

2.2. Attention mechanism

Attention mechanisms have been very successful in network architectures because they can capture important features. Successful applications of attention include machine translation [12] and image segmentation [13]. The key idea is that not all low-level positions contribute equally to the high-level representation. Recently, attention mechanisms have also been used in domain adaptation. Wang et al. [14] proposed a multi-adversarial domain adaptation method. This model introduces transferable local attention and transferable global attention, which aim to highlight transferable regions and transferable images respectively. Li et al. [15] proposed a hierarchical attention transfer mechanism for domain adaptation, which transfers attention for emotions across domains to decrease the domain discrepancy. These methods have a working procedure similar to ours. However, our approach is simpler: feature learning, classifier learning and domain adaptation are combined in a unified architecture trained with a single back-propagation.

Table 1
Notation and descriptions

Symbol	Description
$D_{s}$ , $D_{t}$	The source and target domains
$n^{s}$	The number of instances in the source domain
$n^{t}$	The number of instances in the target domain
C	Classifier C is trained on $D_{s}$
$(x^{s}, y^{s})$	The instances from the source domain
$x^{t}$	The instances from the target domain

Fig. 1.

An overview of CAADA approach. Dashed lines indicate fixed network parameters.

3. Model

3.1. Task description and notations

In unsupervised domain adaptation, we have a source domain $D_{s} = \{(x_{i}^{s}, y_{i}^{s})\}_{i = 1}^{n^{s}}$ with $n^{s}$ labeled samples, and a target domain $D_{t} = \{x_{i}^{t}\}_{i = 1}^{n^{t}}$ with $n^{t}$ unlabeled samples. The source and target domains follow the distributions $p_{s} (x, y)$ and $p_{t} (x, y)$ respectively, with $p_{s} (x, y) \neq p_{t} (x, y)$. The goal is to build a robust classifier $y = θ (x)$ using source supervision that minimizes the target risk $ϵ_{t} (θ) = \Pr_{(x, y) \sim p_{t}} [θ (x) \neq y]$. Table 1 summarizes the frequently used notation.
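The target risk defined above can only be estimated on a labeled sample. The following is a minimal sketch of that empirical estimate, the fraction of labeled target samples that the classifier gets wrong. The function name `empirical_target_risk` and the toy threshold classifier are hypothetical and serve only as illustration.

```python
from typing import Callable, Sequence


def empirical_target_risk(
    classifier: Callable[[float], int],
    xs: Sequence[float],
    ys: Sequence[int],
) -> float:
    """Estimate Pr[theta(x) != y] as the error rate on labeled target samples."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    errors = sum(1 for x, y in zip(xs, ys) if classifier(x) != y)
    return errors / len(xs)


# Toy example: a hypothetical threshold classifier on scalar inputs.
def toy_classifier(x: float) -> int:
    return 1 if x > 0.5 else 0


risk = empirical_target_risk(toy_classifier, [0.9, 0.2, 0.7, 0.4], [1, 0, 0, 0])
```

In the experiments of Section 4, the reported accuracy corresponds to one minus this quantity, computed on the labeled target test reviews.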

3.2. Overview

CAADA consists of feature extractors $E_{S}$ and $E_{T}$, a classifier C, and a CORAL loss, which respectively extract features from both domains, classify instances from the source domain, and minimize the distance between the second-order statistics of the two domain distributions. To learn transferable features better, we introduce an attention mechanism into the transfer process, which captures the positively transferable features in the source and target domains. The training procedure is shown in Fig. 1. We first pre-train $E_{S}$ and C using labeled source data. This step yields a reference feature space and a reference classifier. In the subsequent adaptation step, we fix the parameters of $E_{S}$ and train $E_{T}$ and C with the CORAL loss. This step introduces the attention transfer process and the CORAL loss for domain adaptation, which force the feature extractor $E_{T}$ to learn positively transferable and domain-invariant features. Finally, during testing, target data are mapped by $E_{T}$ and classified by C. All modules are neural networks, whose architectures are detailed in Section 4.3.

3.3. Attention mechanism

Not all features of the source domain should be transferred, and aligning untransferable features causes negative transfer. We therefore introduce an attention mechanism into the transfer process to capture the positively transferable features of the source domain, that is, those that benefit the target domain. The attention mechanism on the source stream can be formulated as

$$g_{s} = σ (W_{s} v_{s} + b_{s}) \tag{1}$$

$$z_{s} = g_{s} ⊙ v_{s} \tag{2}$$

where σ is the softmax function, $W_{s}$ is the weight matrix, $b_{s}$ is the bias term, ⊙ is element-wise multiplication, and $v_{s}$ is the output of the feature extractor $E_{S}$ on the source stream. Similarly, $z_{t}$ is the output of the attention mechanism on the target stream.
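Equations (1) and (2) can be sketched in NumPy as follows. This is an illustrative sketch under assumptions the text does not state: the weight matrix is taken to be square so that the gate has the same dimension as the feature vector, the softmax is taken over the feature dimension, and the batch size, initialization and function names are hypothetical.

```python
import numpy as np


def softmax(a: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = a - a.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def attention_gate(v: np.ndarray, W: np.ndarray, b: np.ndarray):
    """Apply Eq. (1) and Eq. (2) to a batch of feature vectors.

    v: (batch, d) extractor outputs; W: (d, d) weights (assumed square);
    b: (d,) bias. Returns the gate g and the gated features z.
    """
    g = softmax(v @ W.T + b, axis=-1)  # Eq. (1): g = softmax(W v + b), per row
    z = g * v                          # Eq. (2): element-wise product
    return g, z


rng = np.random.default_rng(0)
d = 64                                      # hidden size from Section 4.3
v_s = rng.standard_normal((8, d))           # stand-in for E_S outputs
W_s = 0.1 * rng.standard_normal((d, d))     # hypothetical initialization
b_s = np.zeros(d)
g_s, z_s = attention_gate(v_s, W_s, b_s)
```

Because the softmax weights of each instance sum to one, the gate rescales every feature by a factor between 0 and 1, so features with small weights contribute little to the statistics that are aligned afterwards.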

3.4. Correlation alignment

To minimize the domain discrepancy, we introduce the CORrelation ALignment (CORAL) [16] loss into our model. It aligns the second-order statistics of the positively transferable features extracted by the attention mechanism on the source and target streams. CORAL has achieved good performance in domain adaptation [17,18]. Let $M_{S}$ be a batch of $N_{S}$ instances randomly selected from $z_{s}$, and $M_{T}$ a batch of $N_{T}$ instances randomly selected from $z_{t}$. The CORAL loss is defined as the distance between the second-order statistics of the source and target domains:

$$L_{CORAL} = \frac{1}{4 d^{2}} \| C_{S} - C_{T} \|_{F}^{2} \tag{3}$$

where $\| \cdot \|_{F}^{2}$ is the squared matrix Frobenius norm, d is the dimension of the features, and $C_{S}$ and $C_{T}$ are the covariance matrices of the source and target domains respectively, given by

$$C_{S} = \frac{1}{N_{S} - 1} \left( M_{S}^{T} M_{S} - \frac{1}{N_{S}} (1^{T} M_{S})^{T} (1^{T} M_{S}) \right) \tag{4}$$

$$C_{T} = \frac{1}{N_{T} - 1} \left( M_{T}^{T} M_{T} - \frac{1}{N_{T}} (1^{T} M_{T})^{T} (1^{T} M_{T}) \right) \tag{5}$$

where 1 is a column vector with all elements equal to 1.
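As a worked illustration of Eqs. (3) to (5), the NumPy sketch below computes the batch covariance matrices and the CORAL loss. It assumes each batch is stored with one instance per row and takes d to be the feature dimension, as in the usual CORAL formulation. The random batches and function names are hypothetical.

```python
import numpy as np


def covariance(M: np.ndarray) -> np.ndarray:
    """Covariance of a batch M with shape (N, d), following Eqs. (4)/(5)."""
    n = M.shape[0]
    ones = np.ones((n, 1))
    col_sum = ones.T @ M  # 1^T M, shape (1, d)
    return (M.T @ M - (col_sum.T @ col_sum) / n) / (n - 1)


def coral_loss(M_s: np.ndarray, M_t: np.ndarray) -> float:
    """Squared Frobenius distance between covariances, scaled by 1/(4 d^2)."""
    d = M_s.shape[1]
    diff = covariance(M_s) - covariance(M_t)
    return float(np.sum(diff ** 2) / (4 * d * d))


rng = np.random.default_rng(0)
M_s = rng.standard_normal((64, 16))              # stand-in source batch
M_t = 2.0 * rng.standard_normal((64, 16)) + 1.0  # stand-in target batch
loss_same = coral_loss(M_s, M_s)  # identical batches give zero loss
loss_diff = coral_loss(M_s, M_t)  # differently scaled batches give a positive loss
```

The expression in Eqs. (4) and (5) is the unbiased sample covariance, so `covariance(M)` agrees with `np.cov(M, rowvar=False)`. The loss is insensitive to a shift in the batch mean, since only second-order statistics are compared.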

3.5. Overall objective

The overall objective function of our proposed model is

$$L = L_{C} + α L_{CORAL} \tag{6}$$

where $L_{C}$ is the classification loss and α is the weight that balances adaptation against classification. The classification loss is

$$L_{C} = - E_{(x^{s}, y^{s}) \sim (X_{s}, Y_{s})} \log C (x^{s}) = - \frac{1}{n^{s}} \sum_{i = 1}^{n^{s}} \sum_{k = 1}^{l} 1 (y_{i}^{s} = k) \cdot \log C_{k} (x_{i}^{s}) \tag{7}$$

where l is the number of source data categories, $1 (y_{i}^{s} = k)$ is the indicator function, and $C_{k} (x_{i}^{s})$ is the probability that the classifier assigns to category k for the instance $x_{i}^{s}$.
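The objective in Eqs. (6) and (7) can be sketched as below. The classification loss is written for a matrix of predicted class probabilities, with classes indexed from 0 rather than from 1. The probabilities, labels and CORAL value in the example are made-up placeholders, not results from the paper.

```python
import numpy as np


def classification_loss(probs: np.ndarray, labels: np.ndarray) -> float:
    """Eq. (7): mean negative log-probability of the true class.

    probs: (n, l) predicted class probabilities; labels: (n,) ints in [0, l).
    The indicator in Eq. (7) selects exactly one probability per instance.
    """
    n = probs.shape[0]
    picked = probs[np.arange(n), labels]
    return float(-np.mean(np.log(picked + 1e-12)))  # epsilon guards log(0)


def overall_objective(l_c: float, l_coral: float, alpha: float) -> float:
    """Eq. (6): classification loss plus alpha-weighted CORAL loss."""
    return l_c + alpha * l_coral


# Placeholder values for illustration only.
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
labels = np.array([0, 1, 1])
l_c = classification_loss(probs, labels)
total = overall_objective(l_c, l_coral=0.02, alpha=5.0)
```

A larger α makes the alignment term dominate the gradient, which is the trade-off examined in Section 4.5.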

4. Experiments

4.1. Dataset

We evaluate our model on the Amazon reviews benchmark datasets [19], which have been widely used in domain adaptation. The dataset (available at https://www.cs.jhu.edu/~mdredze/datasets/sentiment/) contains reviews of four different categories of products from Amazon.com: books (B), DVDs (D), electronics (E) and kitchen appliances (K). Each review is assigned either a positive polarity (higher than 3 stars) or a negative polarity (3 stars or lower). Each domain consists of 2,000 labeled reviews and approximately 4,000 unlabeled ones (varying slightly between domains), and the two classes are exactly balanced. In our experiments, we follow [20] and use the 5,000 most frequent unigrams and bigrams as the input. We construct 12 domain adaptation tasks: B → D, B → E, B → K, D → B, D → E, D → K, E → B, E → D, E → K, K → B, K → D, K → E, where the letter before the arrow denotes the source domain and the letter after the arrow denotes the target domain. We use the 2,000 labeled reviews from the source domain and all unlabeled reviews from the target domain for training, and test on the 2,000 labeled reviews from the target domain. The statistics of the dataset are shown in Table 2.

Table 2

Statistics of the Amazon review dataset, including the number of documents and of labeled and unlabeled reviews for each domain, as well as the ratio of negative samples in the unlabeled data

Domain	Docs	Labeled	Unlabeled	% Neg.
Books	6,465	2,000	4,465	50%
DVD	5,586	2,000	3,586	50%
Electronics	7,681	2,000	5,681	50%
Kitchen	7,945	2,000	5,945	50%
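The task construction in Section 4.1 and the counts in Table 2 can be reproduced with a short script. The sketch below enumerates all ordered source/target pairs of the four domains and checks that each document count in Table 2 equals the sum of its labeled and unlabeled reviews. The variable names are illustrative.

```python
from itertools import permutations

DOMAINS = ["B", "D", "E", "K"]  # books, DVDs, electronics, kitchen

# Every ordered (source, target) pair with source != target.
tasks = list(permutations(DOMAINS, 2))

# Rows of Table 2: domain -> (docs, labeled, unlabeled).
table2 = {
    "Books": (6465, 2000, 4465),
    "DVD": (5586, 2000, 3586),
    "Electronics": (7681, 2000, 5681),
    "Kitchen": (7945, 2000, 5945),
}

# True when docs == labeled + unlabeled for every domain.
counts_consistent = all(
    docs == labeled + unlabeled for docs, labeled, unlabeled in table2.values()
)
```

`permutations` yields the pairs in the same order in which the tasks are listed in the text, starting with B → D and ending with K → E.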

4.2. Baseline methods

We compare our proposed model CAADA with several baseline approaches, listed as follows:

SONLY: A non-adaptive baseline, in which a model is trained only on the source domain and tested directly on the target test data.

JDOT: Joint distribution optimal transport [21] performs domain adaptation with optimal transport, transforming the joint distribution of the source domain into that of the target domain to align the feature spaces of both.

MMD: Maximum Mean Discrepancy (MMD) [22] is a metric that measures the discrepancy between the source and target domains.

CADA: A variant of CAADA with the attention mechanism removed in the training phase.

4.3. Implementation details

In this paper, we use a multi-layer perceptron (MLP) as our basic network. For each task, the feature extractors $E_{S}$ and $E_{T}$ each consist of one hidden layer of 64 nodes with a tanh activation function, followed by a dropout layer with rate 0.5. The classifier C consists of one hidden layer of 64 nodes followed by a sigmoid output function. The batch size is set to 64 and the learning rate β is set to 0.0001. The coefficient α is set to 5, except for the tasks B → D, D → B, D → E and E → D, where it is set to 10.
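The architecture described above can be sketched as a NumPy forward pass. Several details are assumptions, since the text does not specify them: a single sigmoid output unit for the binary sentiment label, a tanh activation in the hidden layer of the classifier, the weight initialization, and inverted dropout applied only in training mode. All function names are hypothetical.

```python
import numpy as np

INPUT_DIM = 5000  # 5,000 most frequent unigrams and bigrams (Section 4.1)
HIDDEN = 64       # hidden layer size of the extractors and the classifier


def init_layer(rng, n_in, n_out):
    """Hypothetical initialization: scaled Gaussian weights, zero bias."""
    return rng.standard_normal((n_in, n_out)) * np.sqrt(1.0 / n_in), np.zeros(n_out)


def feature_extractor(x, params, rng=None, drop_rate=0.5):
    """One tanh hidden layer; dropout is applied only when rng is given."""
    W, b = params
    h = np.tanh(x @ W + b)
    if rng is not None:  # training mode (inverted dropout, an assumption)
        mask = rng.random(h.shape) >= drop_rate
        h = h * mask / (1.0 - drop_rate)
    return h


def classifier(h, params):
    """One hidden layer (tanh assumed) followed by a sigmoid output."""
    (W1, b1), (W2, b2) = params
    hidden = np.tanh(h @ W1 + b1)
    logit = hidden @ W2 + b2
    return 1.0 / (1.0 + np.exp(-logit))


rng = np.random.default_rng(0)
E_params = init_layer(rng, INPUT_DIM, HIDDEN)
C_params = (init_layer(rng, HIDDEN, HIDDEN), init_layer(rng, HIDDEN, 1))
x = rng.random((64, INPUT_DIM))            # one batch of 64 stand-in inputs
features = feature_extractor(x, E_params)  # evaluation mode, no dropout
p_positive = classifier(features, C_params)
```

The same extractor definition would be instantiated twice, once for $E_{S}$ and once for $E_{T}$, with separate parameters.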

4.4. Experimental results

The results of the different methods on the 12 unsupervised domain adaptation tasks are shown in Table 3. CAADA achieves the best accuracy among all models on every task. SONLY performs poorly, with 75.69% on average, because it performs no adaptation. JDOT achieves 78.70% on average, which we attribute to its weaker transformation between the source and target domains, whereas CAADA captures the positively transferable features and aligns them better. CAADA outperforms the MMD method by 4.13 percentage points on average, which suggests that CORAL is superior to MMD to some extent. To validate the effectiveness of the attention mechanism, we also compare CAADA with its variant CADA. CAADA outperforms CADA by 1.03 percentage points on average. This indicates that the attention mechanism captures the positively transferable features and further enhances the feature transferability of CAADA.

Table 3
Accuracy (%) of our method and competing approaches on the Amazon reviews dataset

Domain	SONLY	JDOT	MMD	CADA	CAADA
B → D	79.70	79.50	79.95	81.65	82.34
B → E	73.61	78.10	76.54	81.25	82.39
B → K	76.00	79.40	79.01	85.01	85.58
D → B	73.59	76.30	76.23	76.86	79.64
D → E	72.93	78.80	76.69	81.94	83.36
D → K	76.28	82.10	79.06	85.18	85.67
E → B	72.09	74.90	73.28	76.98	78.34
E → D	71.58	73.70	73.48	77.23	77.44
E → K	85.21	87.20	86.11	88.51	88.91
K → B	71.31	72.80	75.38	75.12	75.99
K → D	73.73	76.50	75.15	76.91	78.83
K → E	82.34	84.50	84.53	85.97	86.48
AVG	75.69	78.70	77.95	81.05	82.08
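The average gains quoted in Section 4.4 follow directly from Table 3. The sketch below recomputes the CADA and CAADA averages from their per-task columns and derives the gap between CAADA and each baseline from the AVG row. All numbers are copied from Table 3.

```python
# Per-task accuracies from Table 3, in the order B→D ... K→E.
cada = [81.65, 81.25, 85.01, 76.86, 81.94, 85.18,
        76.98, 77.23, 88.51, 75.12, 76.91, 85.97]
caada = [82.34, 82.39, 85.58, 79.64, 83.36, 85.67,
         78.34, 77.44, 88.91, 75.99, 78.83, 86.48]

cada_avg = sum(cada) / len(cada)
caada_avg = sum(caada) / len(caada)

# AVG row of Table 3.
avg = {"SONLY": 75.69, "JDOT": 78.70, "MMD": 77.95, "CADA": 81.05, "CAADA": 82.08}

# Gap in percentage points between CAADA and each other method.
gains = {m: round(avg["CAADA"] - a, 2) for m, a in avg.items() if m != "CAADA"}

# CAADA is at least as accurate as CADA on every task.
caada_never_worse = all(a >= b for a, b in zip(caada, cada))
```

The gaps are differences between accuracies, so they are percentage points rather than relative improvements.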

4.5. Parameter analysis

In the following, we investigate the effect of the hyper-parameter α, which is sampled from $\{0.1, 0.5, 1, 5, 10, 50, 100\}$. The classification performance for different values of α on the tasks D → B, K → D and E → B is shown in Fig. 2. α is the weight that balances adaptation against classification: a higher value of α makes the CORAL loss more important. We observe that, on all three tasks, the accuracy of CAADA first increases and then decreases as α grows. The task D → B obtains its best performance when α is set to 10, whereas the tasks K → D and E → B obtain their best performance when α is set to 5. This demonstrates that a good trade-off between adaptation and classification enhances feature transferability.

Fig. 2.

Performance at different α of CAADA.

4.6. Influence of labeled data size

To investigate the influence of the amount of labeled data, we experiment with different ratios (20%, 40%, 60%, 80%, 100%) of labeled data. The performance of our method (CAADA) and SONLY on the task E → K is shown in Fig. 3. The accuracy of both CAADA and SONLY increases as the ratio of labeled data increases. However, CAADA outperforms SONLY at every ratio of labeled data. This demonstrates that our approach transfers knowledge from the source domain to improve performance on the target domain. Moreover, CAADA with 60% of the labeled data still outperforms SONLY with 100% of the labeled data. This further suggests that our proposed model reduces the need for large-scale labeled datasets by exploiting domain adaptation.

Fig. 3.

Accuracy on task E → K of our method (CAADA) and SONLY with different ratios of labeled data.

Fig. 4.

The t-SNE visualization on the distribution of source and target domain in the hidden space. The red and blue points denote the positive and negative instances of source domain, purple and darkcyan points denote the positive and negative instances of target domain.

4.7. Visualization

To gain an intuitive understanding of the features learned by CAADA, we visualize the distributions of the source and target domains learned by CADA and CAADA on the tasks D → B and E → K using t-SNE embeddings [23] in Fig. 4. The representations learned by CADA overlap heavily between the domains D and B and between E and K: the positive and negative instances of the source domain are mixed thoroughly with those of the target domain. In contrast, the representations learned by CAADA have both overlapping and non-overlapping areas, indicating that features are aligned selectively between the source and target domains. This is desirable because not all features of the source domain should be transferred, and aligning untransferable features causes negative transfer. The attention mechanism forces the model to learn the positively transferable features, which enhances the feature transferability and robustness of CAADA. Moreover, the category boundaries of the source and target domains are clear for both methods: the red (purple) and blue (darkcyan) points are well separated.

5. Conclusions

In this paper, we propose a Correlation Alignment with Attention mechanism for unsupervised Domain Adaptation (CAADA) method. Our approach introduces an attention mechanism into the transfer process to capture the positively transferable features, that is, those that benefit the target domain. In addition, the CORAL loss is used to minimize the domain discrepancy by aligning the second-order statistics of the positively transferable features extracted by the attention mechanism on the source and target streams, which further enhances the robustness of CAADA. Experiments on the Amazon reviews benchmark corpora demonstrate that CAADA outperforms other baseline methods and effectively improves the performance of domain adaptation.

This work introduces a simple transfer process and achieves good results. In the future, we plan to combine domain adaptation with reinforcement learning to capture the positively transferable features in the source and target domains.

Acknowledgements

This work is supported in part by the Shandong Natural Science Foundation (No. ZR2017LF004, No. ZR2018LF002), the National Natural Science Foundation of China (No. 31500669), and the Shandong Provincial Higher Education Science and Technology Plan Project (No. J16LN20).

References

1. Tang, Qin and Liu, Document modeling with gated recurrent neural network for sentiment classification, in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015, pp. 1422–1432. doi:10.18653/v1/D15-1167.

2. Yang, Dyer, He, Smola and Hovy, Hierarchical attention networks for document classification, in: Proceedings of NAACL-HLT 2016, 2016, pp. 1480–1489.

3. Feng, Zhang, Zhao, Ji and Gao, GVCNN: Group-view convolutional neural networks for 3D shape recognition, in: 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 264–272. doi:10.1109/CVPR.2018.00035.

4. Razzaghi, Razzaghi and Abbasi, Transfer subspace learning via low-rank and discriminative reconstruction matrix, Knowledge-Based Systems 163 (2019), 174–185. doi:10.1016/j.knosys.2018.08.026.

5. Long, Wang, Sun and P.S. Yu, Domain invariant transfer kernel learning, IEEE Transactions on Knowledge and Data Engineering 27(6) (2014), 1519–1532. doi:10.1109/TKDE.2014.2373376.

6. S.J. Pan, I.W. Tsang, J.T. Kwok and Yang, Domain adaptation via transfer component analysis, IEEE Transactions on Neural Networks 22(2) (2010), 199–210. doi:10.1109/TNN.2010.2091281.

7. Long, Cao, Wang and M.I. Jordan, Learning transferable features with deep adaptation networks, in: Proceedings of the Thirty-Second International Conference on Machine Learning, 2015, pp. 97–105.

8. Ganin, Ustinova, Ajakan, Germain, Larochelle, Laviolette, Marchand and Lempitsky, Domain-adversarial training of neural networks, The Journal of Machine Learning Research 17(1) (2016), 2096–3030.

9. Zhuo, Wang, Zhang and Huang, Deep unsupervised convolutional domain adaptation, in: Proceedings of the Twenty-Fifth ACM International Conference on Multimedia, 2017, pp. 261–269. doi:10.1145/3123266.3123292.

10. Tzeng, Hoffman, Zhang, Saenko and Darrell, Deep domain confusion: Maximizing for domain invariance, 2014, arXiv preprint arXiv:1412.3474.

11. Long, Wang, Cao, Sun and P.S. Yu, Deep learning of transferable representation for scalable domain adaptation, IEEE Transactions on Knowledge and Data Engineering 28(8) (2016), 2027–2040. doi:10.1109/TKDE.2016.2554549.

12. Li, Zhang, Xiong, W.M. Hwu and Chen, Implementing neural machine translation with bi-directional GRU and attention mechanism on FPGAs using HLS, in: Proceedings of the Twenty-Fourth Asia and South Pacific Design Automation Conference, 2019, pp. 693–698. doi:10.1145/3287624.3287717.

13. Kaul, Manandhar and Pears, FocusNet: An attention-based Fully Convolutional Network for Medical Image Segmentation, 2019, arXiv preprint arXiv:1902.03091.

14. Wang, Li, Ye, Long and Wang, Transferable attention for domain adaptation, in: AAAI Conference on Artificial Intelligence (AAAI), 2019.

15. Li, Wei, Zhang and Yang, Hierarchical attention transfer network for cross-domain sentiment classification, in: Thirty-Second AAAI Conference on Artificial Intelligence, 2018, pp. 5852–5859.

16. Sun, Feng and Saenko, Return of frustratingly easy domain adaptation, in: Thirtieth AAAI Conference on Artificial Intelligence, 2016, pp. 2058–2065.

17. Morerio, Cavazza and Murino, Minimal-entropy correlation alignment for unsupervised deep domain adaptation, in: International Conference on Learning Representations, 2018, pp. 1–15.

18. Sun, Feng and Saenko, Correlation alignment for unsupervised domain adaptation, Domain Adaptation in Computer Vision Applications (2017), 153–171. doi:10.1007/978-3-319-58347-1_8.

19. Blitzer, McDonald and Pereira, Domain adaptation with structural correspondence learning, in: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, 2006, pp. 120–128.

20. Chen, Xu, Weinberger and Sha, Marginalized denoising autoencoders for domain adaptation, in: Proceedings of the Twenty-Ninth International Conference on Machine Learning, 2012.

21. Courty, Flamary, Habrard and Rakotomamonjy, Joint distribution optimal transportation for domain adaptation, in: Advances in Neural Information Processing Systems, 2017, pp. 3730–3739.

22. Gretton, K.M. Borgwardt, M.J. Rasch, B. Schölkopf and Smola, A kernel two-sample test, Journal of Machine Learning Research 13 (2012), 723–773.

23. Donahue, Jia, Vinyals, Hoffman, Zhang, Tzeng and Darrell, Decaf: A deep convolutional activation feature for generic visual recognition, in: International Conference on Machine Learning, 2014, pp. 647–655.
