Adversarial unsupervised domain adaptation based on generative adversarial network for stock trend forecasting

Abstract

Stock trend forecasting, which refers to the prediction of the rise and fall of the next day’s stock price, is a promising research field in financial time series forecasting, and a large number of well-performing algorithms and models have been proposed. However, most studies focus on trend prediction for stocks with a large number of samples, while the trend prediction problem of newly listed stocks with only a small number of samples is neglected. In this work, we design a solution to the Small Sample Size (SSS) trend prediction problem of newly listed stocks. Traditional Machine Learning (ML) and Deep Learning (DL) techniques assume that substantial labeled samples are available, which does not hold for SSS trend prediction of newly listed stocks. To overcome this limitation, we propose a novel Adversarial Unsupervised Domain Adaptation Network (AUDA-Net), based on Generative Adversarial Network (GAN), designed specifically for SSS stock trend forecasting. Unlike traditional domain adaptation algorithms, we employ a GAN model, trained on the target stock dataset, to address the lack of available samples. Notably, AUDA-Net can transfer the knowledge learned from the source stock dataset to newly listed stocks with only a few samples. The stock trend forecasting performance of the proposed AUDA-Net model has been verified through extensive experiments conducted on several real stock datasets from the U.S. stock market. Using stock trend forecasting as a case study, we show that the SSS forecasting results produced by AUDA-Net compare favorably with the state of the art.

Keywords

Small sample size problem, transfer learning, generative adversarial network, adversarial unsupervised domain adaptation, stock trend forecasting

1. Introduction

Stock trend forecasting has long attracted attention from both the artificial intelligence community and the finance industry. Unlike stock price prediction, stock trend prediction refers to the prediction of the direction of the stock price. In other words, we need to predict the rise and fall of the next day’s stock price. Hence, stock trend prediction is by nature a binary classification problem rather than a regression problem. However, stock time series data is essentially unstable, complicated, and nonlinear [1]. In addition, the stock trend is easily affected by many external factors, including national policies, international events, personnel changes within the company, and the economic conditions of the firm. Therefore, effectively predicting the stock trend is a challenging and valuable task. Traditional time series forecasting models, such as the Stochastic Volatility (SV) model, the Moving Average (MA) model, and other statistical models, have limitations in stock trend prediction, since stock time series data is complex and noisy and cannot be captured by analytic equations with parameters [2].
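To make the binary labeling concrete, the following minimal sketch derives next-day trend labels from a sequence of closing prices. The function name `trend_labels` and the example prices are hypothetical, and labeling an unchanged price as a fall (0) is an assumption, since the text does not specify how ties are handled.

```python
def trend_labels(close):
    """Label each day with the direction of the next day's close.

    1 = next day's close is higher (rise), 0 = otherwise (fall).
    Assumption: an unchanged price is labeled 0; the paper does not
    state how ties are treated.
    """
    return [1 if nxt > cur else 0 for cur, nxt in zip(close, close[1:])]


# Hypothetical closing prices for five consecutive trading days.
prices = [10.0, 10.5, 10.2, 10.2, 11.0]
labels = trend_labels(prices)  # one label per day except the last
```

Note that a series of length n yields n - 1 labels, because the last day has no next-day price to compare against.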

Machine Learning (ML) models, which possess strong feature extraction capability, are suitable for dealing with complicated and nonlinear data. Therefore, ML models have a wide range of applications in the field of stock trend prediction. In the past few years, DL algorithms have become the best ML algorithms in various research fields. Sezer et al. [3] showed that the performance of the DL models notably surpasses traditional ML models in financial time series forecasting.

Stock trend prediction, as one of the hot research topics in financial time series prediction, is no exception [4, 5]. Existing research on stock trend forecasting mainly focuses on stocks with a large number of samples, while neglecting newly listed stocks with few samples. The stock market contains a number of newly listed stocks that have only very few samples. However, general ML models, such as Support Vector Machines (SVM), Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM), require massive numbers of samples. Consequently, it is a challenging and valuable task to investigate how to predict the trend of newly listed stocks with only a few samples.

There exists a dedicated research field of ML, called few-shot learning, that specifically addresses the problem of too few available samples [6]. However, we do not employ the few-shot learning paradigm in this work, for the reasons explained below.

In the general few-shot learning scenario, the datasets usually contain many classes with a few samples each [7]. We assume that the number of classes is $N$. In the training phase, $M$ ($M<N$) classes with $K$ samples each are randomly selected from the training set during each iteration. These $M\times K$ samples are used to build a task and serve as the support-set of the model. Here, a task refers to the training set used in one iteration of the training phase. Then, several samples are chosen from the remaining samples of the $M$ classes as the query-set of the model [8]. The purpose of training in the general few-shot learning scenario is to learn what different tasks have in common, so that the model acquires a strong generalization ability. However, in the scenario of this work, all the datasets contain only two classes, and each task contains the same combination of categories. Therefore, a model trained with few-shot learning algorithms cannot acquire strong generalization ability in the SSS stock trend forecasting setting studied in this work.
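The episode construction described above can be sketched as follows. This is only an illustration of generic $M$-way $K$-shot sampling, not code from the paper; the function name `sample_episode`, the query size `q`, and the toy dataset are assumptions.

```python
import random


def sample_episode(data_by_class, m, k, q, rng):
    """Sample one few-shot task: M classes, K support and q query samples each.

    data_by_class maps a class label to a list of samples.
    """
    classes = rng.sample(sorted(data_by_class), m)  # pick M of the N classes
    support, query = [], []
    for c in classes:
        items = rng.sample(data_by_class[c], k + q)  # disjoint support/query
        support += [(x, c) for x in items[:k]]
        query += [(x, c) for x in items[k:]]
    return support, query


# Toy dataset (hypothetical): N = 5 classes with 10 samples each.
toy = {c: [(c, i) for i in range(10)] for c in range(5)}
support, query = sample_episode(toy, m=3, k=2, q=4, rng=random.Random(0))

# With only two classes (rise/fall), every episode with m = 2 contains the
# same class combination, which is the limitation discussed in the text.
binary = {c: [(c, i) for i in range(10)] for c in (0, 1)}
episode_classes = {c for _, c in sample_episode(binary, 2, 2, 2, random.Random(1))[0]}
```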

Another serious problem is that few-shot learning algorithms are developed under the assumption that, although the target classification task lacks massive samples, there are many similar classification tasks with a small number of samples. In the scenario of this work, this assumption does not hold, because there are no other similar classification tasks in our datasets. Based on the above analysis, we did not adopt the few-shot learning paradigm in this work.

Generally, the stock market contains many stocks with numerous samples, and the distributions of these stocks differ from one another. Motivated by this observation, we introduce the domain adaptation paradigm into the research of SSS stock trend forecasting. We use a stock with massive samples as the source domain, and the newly listed stock with only a few samples, whose trend we want to predict, as the target domain. The domain adaptation paradigm is used to accomplish the learning task for the target domain when the distributions of the source domain and the target domain are inconsistent.

Nowadays, DL algorithms play a crucial role in domain adaptation. One typical domain adaptation method projects the source and target domains into a common representation space by learning a deep transformation [9, 10]. Another classical method reconstructs the target features according to the source domain [11], with the adversarial adaptation method being its mainstream [12]. However, most studies of the domain adaptation methods focus on the problem setting that the target domain has a large number of samples, while, in contrast, there are few studies of domain adaptation on the problem setting that the target domain has few samples.

Motiian et al. [13] proposed a supervised domain adaptation algorithm using a few samples. However, this algorithm trains the model by randomly selecting a few labeled samples from a large number of labeled samples. In essence, it still requires massive labeled samples to exist in the target domain. In reality, newly listed stocks do not possess many samples, and therefore supervised domain adaptation algorithms for small samples are not suitable for the problem studied in this work. In contrast, we propose a novel Adversarial Unsupervised Domain Adaptation Network (AUDA-Net), based on Generative Adversarial Network (GAN) and designed specifically for SSS stock trend forecasting and other similar application problems. AUDA-Net removes the requirement of a large number of samples in stock trend forecasting, and can effectively predict the trend of newly listed stocks with a small number of samples.

Our proposed AUDA-Net model consists of two parts. The first part, which is composed of one GAN, is designed to solve the problem of insufficient samples in the target domain. We train a GAN model to generate a large quantity of fake samples with the same distribution as the target domain samples. Then, the fake samples and real target samples are combined to form a new target domain dataset. The second part is an Adversarial Unsupervised Domain Adaptation (AUDA) model, which consists of the feature extractor $F$, the class classifier $C$ and the domain discriminator $D_{d}$. We embed the domain adaptation phase into the representation learning process. We train the feature extractor $F$ and the domain discriminator $D_{d}$ so that the distributions of the source and target features extracted by $F$ are the same or very similar. Meanwhile, we also train the feature extractor $F$ and the class classifier $C$ to minimize the error of $C$ on the source features. In the testing phase, we use the feature extractor $F$ and the class classifier $C$, which is trained on the source features, to predict the labels of the target samples. Figure 1 shows the overall architecture of the proposed AUDA-Net model.

Figure 1.

The overall architecture of AUDA-Net.

Data augmentation methods in few-shot learning usually leverage prior knowledge, such as hand-crafted rules and transformation procedures, to augment samples and thereby enrich the supervised information in the sample set [8]. In the computer vision field, the hand-crafted rules include flipping images [14, 15], rotating images [16], and scaling images [17]. According to which samples are selected, transformed and added to the target dataset, data augmentation methods based on transformation procedures are categorized into three classes [8]: transforming samples from the target dataset [18], transforming samples from a large unlabeled dataset which contains the target label [19], and transforming samples from other similar large datasets [20]. In general, an augmentation strategy that uses prior knowledge of the target classification task is often specific to a dataset, so it cannot easily be applied to other datasets. To solve the Small Sample Size (SSS) problem, the sample generation part of our proposed AUDA-Net model trains a GAN model to generate abundant fake samples. Unlike the data augmentation methods in few-shot learning, the augmentation method of our model is not specific to a dataset and can be applied to other datasets.

The major innovations and contributions of this work are summarized as follows:

First, to the best of our knowledge, we are the first to predict the trend of newly listed stocks with a few samples, which fills the gap in the research of SSS stock trend prediction. The trend forecasting effectiveness of the proposed AUDA-Net model has been verified through extensive experiments conducted in this work.

Second, to the best of our knowledge, we are the first to introduce the domain adaptation paradigm into the research field of stock trend prediction. We implement transfer learning across different stock datasets. Our extensive experiments show that knowledge transfer can be realized between stocks, and that the knowledge learned from one stock can be used to help forecast the trend of another stock.

Third, we propose a novel Adversarial Unsupervised Domain Adaptation Network (AUDA-Net) model based on GAN, designed specifically for SSS stock trend forecasting and other similar applications. AUDA-Net can effectively tackle the lack of sufficient samples for newly listed stocks and forecast their trends.

It is worth noting that, the proposed AUDA-Net model can also be applied to solve other SSS time series trend prediction problems, whereas the research focus of this work is on SSS stock trend forecasting.

In this paper, we evaluate and compare the SSS stock trend forecasting performance of our AUDA-Net model and several state-of-the-art rival models by carrying out experiments on real stock data from the U.S. stock market, i.e., Oracle (ORCL), Intel (INTC), eBay (EBAY) and Qualcomm (QCOM). Experimental results demonstrate the performance superiority of the proposed model for SSS stock trend prediction.

The remainder of this paper is organized as follows. In Section 2, existing research works on stock trend forecasting and domain adaptation are introduced. In Section 3, our proposed AUDA-Net model is described in detail. Experimental results and an analysis of the results are presented in Section 4. Finally, Section 5 concludes the paper and discusses prospects of future research work.

2. Related work

2.1 Stock trend forecasting

In recent years, stock trend forecasting has always been the research focus of financial time series forecasting. In financial and artificial intelligence circles, there are three main methods to predict stock trend.

The first method is to use econometrics, technical analysis of stock trend and other theories to analyze the stock trend [21, 22, 23]. The second method is to use the traditional statistical models, such as SV model, Autoregressive Integrated Moving Average (ARIMA) model and MA model, to predict stock price trend [24, 25, 26]. The third method is to use the ML models to forecast stock trend. Because we use the ML model to forecast the stock trend, we mainly introduce the research of stock trend prediction using ML algorithms.

There are many different models and methods in the research of stock trend prediction using ML models. Some studies use original stock data, while others use news, technical indicators, twitter mood, social media data, etc. Some studies use traditional ML algorithms, such as SVM and Single Layer Perceptron (SLP), while others use DL algorithms, such as LSTM, CNN and Recurrent Neural Network (RNN) [27, 28, 29].

There are many methods on stock trend prediction using original stock data. In [30], Das et al. used Neural Network (NN) model to forecast the trend of S&P 500 Index. In [31], Probabilistic Neural Network (PNN), RNN and Time Delay Neural Network (TDNN) are used to forecast the trend of stocks. Of course, there are some relatively new methods to predict the trend only using the original stock data. In [32], the raw stock data is transformed into two-dimensional images, and then the CNN model is used for trend prediction. In [33], the Empirical Mode Decomposition and Factorization Machine based Neural Network (EMD2FNN) is used to predict the stock trend.

There are also some studies on stock trend forecasting using raw stock data, technical indicators and other types of data simultaneously. In [34], four different artificial neural networks, SVM, Naïve Bayes and Decision Tree (NBDT) are compared for stock trend prediction. In [35], technical indicators, raw stock data and LSTM model are used to forecast the trend of stocks from Brazil stock exchange. In [36], Liang et al. used Restricted Boltzmann Machine (RBM), technical indicators and OCHLV to forecast stock trends. In [37], the feature combinations are used as the model input, and the CNN model is used to forecast the stock trend.

In addition, text information, such as news and Twitter posts, has an important impact on the stock price [38]. Therefore, there exist some studies on the prediction of stock price trend using text information. In [39], Peng et al. proposed an algorithm for stock trend forecasting using Deep Neural Network (DNN) and word embedding technology. In [40], a DNN model and Twitter sentiment are used for trend prediction.

At present, almost all existing stock trend prediction studies focus on stocks with a large number of samples. Research on the trend prediction of newly listed stocks, which is both valuable and challenging, is virtually nonexistent. Our work fills this gap.

2.2 Domain adaptation

2.2.1 Definition

In this section, we introduce the formal definition of domain adaptation [41]. Assume there exist two datasets: a source domain $S$, which consists of abundant labeled training samples, and a target domain $T$, which consists of a few labeled or unlabeled samples. We assume that the source domain $S$ is sampled from a probability distribution $p_{S}$ and the target domain $T$ is sampled from a probability distribution $p_{T}$. In the setting of domain adaptation, the source domain and target domain are sampled from different probability distributions, i.e., $p_{S}\neq p_{T}$. The learning problem of domain adaptation aims to find a mapping function that realizes an effective transfer from the source domain $S$ to the target domain $T$, i.e., a model trained on samples drawn from the distribution $p_{S}$ generalizes well to samples drawn from the distribution $p_{T}$.

2.2.2 Existing works on domain adaptation

Domain adaptation is a very popular field in transfer learning [42, 43, 44]. Early domain adaptation methods focused on problems in the field of natural language processing [45, 46, 47]. More recently, domain adaptation methods have been widely used in the field of computer vision [48, 49, 50]. At present, domain adaptation methods have not been applied to stock trend prediction. To the best of our knowledge, we are the first to use domain adaptation methods in the field of stock trend forecasting.

Because our algorithm uses the adversarial domain adaptation method, this paper mainly introduces the adversarial domain adaptation.

In [51], through the confrontation between generator and discriminator, the performance of generator and discriminator is improved. The goal of the generator is to generate fake samples that can fool the discriminator. The goal of the discriminator is to accurately judge whether the samples come from the fake samples or the real samples. The thought of adversarial domain adaptation is very similar to this algorithm. The adversarial domain adaptation methods exploit the confrontation between domain discriminator and feature extractor to acquire domain-invariant features [52].

In [12], the Adversarial Discriminative Domain Adaptation (ADDA) algorithm achieves domain adaptation by training the target encoder and the discriminator to make the features extracted by the target encoder closer to the source features. In [53], CDAN learns the transferable and disentangled features by embedding the adversarial learning into the deep architectures. In [54], Rios et al. proposed an adversarial domain adaptation algorithm for solving the problem of relation classification. In [55], ADA matches the distributions between the real-world domain and the simulation domain through the adversarial domain adaptation process.

In the Unsupervised Domain Adaptation by Backpropagation (UDAB) algorithm, a gradient reversal layer is used to maximize the loss of the domain classifier, and the domain adaptation process is integrated into the deep feedforward network [56]. This idea inspired us, and we draw on it in the domain adaptation part of our model. However, our work differs substantially from UDAB: UDAB is designed for a target domain with a large number of unlabeled samples, while we focus on a target domain with few unlabeled samples. The setting we address is more difficult than that of UDAB, and our model performs much better than UDAB on a target domain with only a few samples.

3. Proposed method

In this section, firstly, we briefly introduce the working principle of GAN. Secondly, we explain our proposed Adversarial Unsupervised Domain Adaptation Network (AUDA-Net), in detail.

3.1 GAN

The goal of GAN is to generate fake samples with the same distribution as real samples. In order to achieve this goal, GAN uses a generator for generating fake samples and a discriminator for judging whether the samples are real or fake [51]. GAN improves the ability of generator and discriminator through the game between them. GAN is mainly used in the field of computer vision. Next, we explain how we train a GAN model to generate fake images that are as similar to real images as possible. Firstly, we generate some fake images by inputting random noise vectors into the generator. Secondly, we input both fake images and real images into the discriminator, at the same time. After that, we update the generator and the discriminator according to the feedback of the discriminator, so that the fake images generated by the generator can fool the discriminator, and, simultaneously, the discriminator can judge whether the images are real or fake. Figure 2 illustrates the general architecture of GAN, intuitively.

Figure 2.

The general architecture of GAN.

3.2 The proposed Adversarial Unsupervised Domain Adaptation Network (AUDA-Net)

In this section, we describe, in detail, our proposed model for the trend prediction of stocks with few samples. Firstly, we expound the two parts of the proposed AUDA-Net model, i.e., a sample generation part and a domain adaptation part. Secondly, we explain the process of searching for the optimal parameters of our model.

3.2.1 Sample generation part

We suppose that the input of our model comes from the source and target domains with different distributions. The source domain is assumed to contain $N_{s}$ different samples, i.e., ${\bm{X}}_{s}=\left\{{\left({{\bm{x}}_{1}^{s},y_{1}^{s}}\right),\left({{\bm{x}}_{2}^{s},y_{2}^{s}}\right),\ldots,\left({{\bm{x}}_{N_{s}}^{s},y_{N_{s}}^{s}}\right)}\right\}$. The distribution of ${\bm{X}}_{s}$ is $p_{s}\left({{\bm{x}},y}\right)$, and the marginal distribution of ${\bm{x}}$ is $p_{s}\left({\bm{x}}\right)$. The target domain is assumed to contain $N_{tr}$ different samples, i.e., ${\bm{X}}_{t}=\left\{{\left({{\bm{x}}_{1}^{t},y_{1}^{t}}\right),\left({{\bm{x}}_{2}^{t},y_{2}^{t}}\right),\ldots,\left({{\bm{x}}_{N_{tr}}^{t},y_{N_{tr}}^{t}}\right)}\right\}$. The distribution of ${\bm{X}}_{t}$ is $p_{t}\left({{\bm{x}},y}\right)$, and the marginal distribution of ${\bm{x}}$ is $p_{t}\left({\bm{x}}\right)$. Because the sample size of the target domain is very small, we have to solve the Small Sample Size (SSS) problem specifically for the target domain. To address this issue, we train a GAN model to generate massive fake samples with the same or very similar distribution as the target samples, i.e., the transferable samples.

The sample generation part of our proposed AUDA-Net model is composed of a generator $G$ and a discriminator $D$ . As with [51], the input of the generator $G$ is the random noise vectors $\left\{{{\bm{r}}_{1},{\bm{r}}_{2},\ldots,{\bm{r}}_{N_{r}}}\right\}$ and the input of the discriminator $D$ is the real target samples and fake target samples generated by the generator $G$ . In order to learn the distribution $p_{G}\left({\bm{x}}\right)$ of generator $G$ over the input samples ${\bm{x}}$ , firstly, we suppose that the distribution of random noise vectors is $p\left({\bm{r}}\right)$ . Then, it is assumed that all the parameters of the generator $G$ and discriminator $D$ are $\theta_{G}$ and $\theta_{D}$ , and their corresponding mappings are $M_{G}\left({{\bm{r}};\theta_{G}}\right)$ and $M_{D}\left({{\bm{x}};\theta_{D}}\right)$ , respectively. We input a random noise vector ${\bm{r}}$ into the generator $G$ to generate fake samples $M_{G}\left({\bm{r}}\right)$ . The fake samples generated by the generator $G$ and the real samples are passed into the discriminator $D$ to determine whether the incoming samples are real or fake.

The generator $G$ looks for the optimum parameters $\theta_{G}^{\ast}$ that can maximize the error of the discriminator $D$ on the generated fake samples. At the same time, the discriminator $D$ seeks the optimum parameters $\theta_{D}^{\ast}$ which can minimize the error in determining whether the samples are real or fake. Its mathematical expression is as follows:

$\displaystyle{\min}_{\theta_{G}}{\max}_{\theta_{D}}V\left({M_{D},M_{G}}\right)=E_{1}+E_{2}$ (1)

$\displaystyle E_{1}=\mathbb{E}_{{\bm{x}}\sim p_{t}\left({\bm{x}}\right)}\left[{\log M_{D}\left({\bm{x}}\right)}\right]$ (2)

$\displaystyle E_{2}=\mathbb{E}_{{\bm{r}}\sim p\left({\bm{r}}\right)}\left[{\log\left({1-M_{D}\left({M_{G}\left({\bm{r}}\right)}\right)}\right)}\right]$ (3)

This min-max problem is solved as two alternating optimization problems: we first optimize the discriminator $D$ and then optimize the generator $G$, as shown in the following formulae:

$\displaystyle{\max}_{\theta_{D}}V\left({M_{D},M_{G}}\right)=E_{1}+E_{2}=\mathbb{E}_{{\bm{x}}\sim p_{t}\left({\bm{x}}\right)}\left[{\log M_{D}\left({\bm{x}}\right)}\right]+\mathbb{E}_{{\bm{r}}\sim p\left({\bm{r}}\right)}\left[{\log\left({1-M_{D}\left({M_{G}\left({\bm{r}}\right)}\right)}\right)}\right]$ (4)

$\displaystyle{\min}_{\theta_{G}}V\left({M_{D},M_{G}}\right)=E_{2}=\mathbb{E}_{{\bm{r}}\sim p\left({\bm{r}}\right)}\left[{\log\left({1-M_{D}\left({M_{G}\left({\bm{r}}\right)}\right)}\right)}\right]$ (5)

We solve the optimization problems in Eqs (4) and (5) with the stochastic gradient descent method.
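As an illustration of Eqs (1)–(5), the sketch below estimates $E_{1}$, $E_{2}$ and $V$ from a mini-batch of discriminator outputs by replacing the expectations with sample means. The function name `gan_value` and the example probabilities are hypothetical; the discriminator and generator networks themselves are not modeled here.

```python
import math


def gan_value(d_real, d_fake):
    """Mini-batch estimate of the GAN value function.

    d_real: discriminator outputs M_D(x) on real target samples, in (0, 1).
    d_fake: discriminator outputs M_D(M_G(r)) on generated samples, in (0, 1).
    Returns (E1, E2, V) with V = E1 + E2, as in Eqs (1)-(3).
    """
    e1 = sum(math.log(p) for p in d_real) / len(d_real)
    e2 = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return e1, e2, e1 + e2


# A confident discriminator: real samples scored high, fakes scored low.
e1, e2, v_confident = gan_value([0.9, 0.8], [0.1, 0.2])

# A fooled discriminator outputs 0.5 everywhere, giving V = -log 4.
_, _, v_fooled = gan_value([0.5, 0.5], [0.5, 0.5])
```

The discriminator step (Eq. (4)) pushes $V$ up by increasing both terms, while the generator step (Eq. (5)) only affects $E_{2}$ and pushes it down, which is why the fooled case yields a lower value than the confident one.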

3.2.2 Domain adaptation part

Through the above process, we obtain the generation distribution $p_{G}\left({M_{G}\left({\bm{r}}\right)}\right)$ of the random noise ${\bm{r}}$, with $p_{G}\left({M_{G}\left({\bm{r}}\right)}\right)=p_{t}\left({\bm{x}}\right)$. The generator $G$ can generate a large number of fake samples with the same distribution as the target domain samples according to $p_{G}\left({M_{G}\left({\bm{r}}\right)}\right)$. We combine the fake samples and real target samples to form a new target domain. We assume that there are $N_{t}$ samples in the new target domain. At this point, only a few samples in the new target domain are real, and the rest are fake samples with the same distribution as the target domain samples. In order to distinguish whether each sample belongs to the source domain or the target domain, we stipulate that if the sample ${\bm{x}}$ belongs to the source domain, its domain label is $d=0$; otherwise, its domain label is $d=1$. Therefore, the new source domain is ${\bm{X}}_{s}^{\prime}=\left\{{\left({{\bm{x}}_{1}^{s},y_{1}^{s},d_{1}^{s}}\right),\left({{\bm{x}}_{2}^{s},y_{2}^{s},d_{2}^{s}}\right),\ldots,\left({{\bm{x}}_{N_{s}}^{s},y_{N_{s}}^{s},d_{N_{s}}^{s}}\right)}\right\}$ and the new target domain is ${\bm{X}}_{t}^{\prime}=\left\{{\left({{\bm{x}}_{1}^{t},d_{1}^{t}}\right),\left({{\bm{x}}_{2}^{t},d_{2}^{t}}\right),\ldots,\left({{\bm{x}}_{N_{t}}^{t},d_{N_{t}}^{t}}\right)}\right\}$. Because we use an unsupervised method, the samples in the new target domain do not contain class labels.

Our ultimate goal is to accurately predict the real class labels of unseen samples from the target domain. Since we lack the class labels of the training target samples, we cannot train our model on the target domain in a supervised way. Instead, our main goal is to train the feature extractor $F$ and the class classifier $C$ on the source features, so that $C$ can be used to predict the real class labels of the target features generated by $F$.
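The construction of the new source and target domains can be sketched as follows. The function name `build_domains` and the toy feature vectors are hypothetical; the sketch only attaches the domain labels ($d=0$ for source, $d=1$ for target) and leaves the target samples without class labels, as described above.

```python
def build_domains(source, real_target, fake_target):
    """Attach domain labels and merge real and generated target samples.

    source:      list of (x, y) pairs with class labels.
    real_target: list of real target feature vectors x (class labels unused).
    fake_target: list of GAN-generated feature vectors x.
    Returns (new_source, new_target) where source entries are (x, y, 0)
    and target entries are (x, 1).
    """
    new_source = [(x, y, 0) for x, y in source]
    new_target = [(x, 1) for x in list(real_target) + list(fake_target)]
    return new_source, new_target


# Toy data (hypothetical): three labeled source samples, one real target
# sample and two generated target samples, each a 2-dimensional vector.
src = [([0.1, 0.2], 1), ([0.3, 0.1], 0), ([0.5, 0.4], 1)]
real_t = [[0.9, 0.8]]
fake_t = [[0.85, 0.75], [0.95, 0.82]]
new_src, new_tgt = build_domains(src, real_t, fake_t)
```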

In order for the class classifier $C$ trained on the source features to accurately predict the real class label of the target features, we must make the distributions of the source and target features, extracted by the feature extractor $F$ , the same or very similar, and we must also ensure that the class classifier $C$ can accurately predict the real label of the source features. This goal is achieved by the feature extractor $F$ , the class classifier $C$ and the domain discriminator $D_{d}$ .

We denote the parameters of the feature extractor $F$, the class classifier $C$ and the domain discriminator $D_{d}$ by $\theta_{F}$, $\theta_{C}$ and $\theta_{D_{d}}$, respectively. In the training phase, we input the training samples of the source and target domains ${\bm{I}}=\left\{{{\bm{x}}_{1}^{s},{\bm{x}}_{2}^{s},\ldots,{\bm{x}}_{N_{s}}^{s},{\bm{x}}_{1}^{t},{\bm{x}}_{2}^{t},\ldots,{\bm{x}}_{N_{t}}^{t}}\right\}$ to the feature extractor $F$. We suppose that the feature extractor $F$ maps each sample ${\bm{x}}\in{\bm{R}}^{m}$ to an $n$-dimensional feature ${\bm{f}}_{n}\in{\bm{R}}^{n}$, i.e., ${\bm{f}}_{n}=M_{F}\left({{\bm{x}};\theta_{F}}\right)$. Then, the $n$-dimensional features of the source and target domains are fed into the domain discriminator $D_{d}$ to predict the domain labels of the samples, i.e., $d=M_{D_{d}}\left({{\bm{f}}_{n};\theta_{D_{d}}}\right)\in\left\{{0,1}\right\}$. We feed the $n$-dimensional features of the source domain into the class classifier $C$ to predict the real class labels of the samples, i.e., $y=M_{C}\left({{\bm{f}}_{n};\theta_{C}}\right)$.

Next, we present the formal definitions of feature extractor $F$ and domain discriminator $D_{d}$ . For simplicity, we only consider a single layer neural network architecture. The feature extractor $F$ is designed to learn a mapping function $M_{F}\left({{\bm{x}};\theta_{F}}\right):{\bm{I}}\to{\bm{R}}^{n}$ , and the parameters $\theta_{F}$ are composed of a matrix-vector pair $\left({{\bm{W}},{\bm{b}}}\right)\in{\bm{R}}^{n\times m}\times{\bm{R}}^{n}$ :

$\displaystyle M_{F}\left({{\bm{x}};\theta_{F}}\right)=M_{F}\left({{\bm{x}};{\bm{W}},{\bm{b}}}\right)=\text{sigm}\left({{\bm{W}}{\bm{x}}+{\bm{b}}}\right)$ (6)

where in Eq. (6),

$\displaystyle\text{sigm}\left({\bm{\alpha}}\right)=\left[{\frac{1}{1+\exp\left({-\alpha_{i}}\right)}}\right]_{i=1}^{\left|{\bm{\alpha}}\right|}$ (7)

Similarly, the domain discriminator $D_{d}$ learns a mapping function $M_{D_{d}}\left({{\bm{f}}_{n};\theta_{D_{d}}}\right):{\bm{R}}^{n}\to\left[{0,1}\right]$, and the parameters $\theta_{D_{d}}$ consist of a vector-scalar pair $\left({{\bm{\sigma}},z}\right)\in{\bm{R}}^{n}\times{\bm{R}}$:

$\displaystyle M_{D_{d}}\left({{\bm{f}}_{n};\theta_{D_{d}}}\right)=\text{sigm}\left({{\bm{\sigma}}^{T}{\bm{f}}_{n}+z}\right)=\text{sigm}\left({{\bm{\sigma}}^{T}M_{F}\left({{\bm{x}};{\bm{W}},{\bm{b}}}\right)+z}\right)$ (8)
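The single-layer mappings in Eqs (6)–(8) can be written out directly. The sketch below uses plain Python lists for clarity; the function names and the example parameter values are hypothetical, and the paper's actual networks may be deeper than this single-layer simplification.

```python
import math


def sigm(alpha):
    """Element-wise logistic sigmoid, Eq. (7)."""
    return [1.0 / (1.0 + math.exp(-a)) for a in alpha]


def feature_extractor(x, W, b):
    """M_F(x; W, b) = sigm(W x + b), Eq. (6). W is n x m, b has length n."""
    pre = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
           for row, b_i in zip(W, b)]
    return sigm(pre)


def domain_discriminator(f, sigma, z):
    """M_Dd(f; sigma, z) = sigm(sigma^T f + z), Eq. (8). Returns a scalar."""
    return sigm([sum(s_i * f_i for s_i, f_i in zip(sigma, f)) + z])[0]


# Hypothetical parameters: m = 2 input dimensions, n = 3 feature dimensions.
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
b = [0.0, 0.1, -0.1]
sigma, z = [1.0, -1.0, 0.5], 0.2

f = feature_extractor([1.0, 2.0], W, b)       # n-dimensional feature in (0, 1)
p_target = domain_discriminator(f, sigma, z)  # estimated probability of d = 1
```

The discriminator output lies in $[0,1]$; thresholding it (for example at 0.5) would give a hard domain label, although the threshold is an illustrative choice rather than something the text specifies.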

In the training stage, we need to find the optimum parameters $\theta_{F}^{\ast}$ , $\theta_{C}^{\ast}$ of feature extractor $F$ and the class classifier $C$ , respectively, to minimize the error of the class classifier $C$ on the source features. In order to make the source features and target features have the same distribution, we have to seek the optimum parameters $\theta_{F}^{\ast}$ that maximize the error of the domain discriminator $D_{d}$ on the features extracted by feature extractor $F$ . Besides, we also need to find the optimum parameters $\theta_{D_{d}}^{\ast}$ of domain discriminator $D_{d}$ to minimize the error of domain discriminator $D_{d}$ . In order to find these optimum parameters $\theta_{F}^{\ast}$ , $\theta_{C}^{\ast}$ and $\theta_{D_{d}}^{\ast}$ , we need to optimize the following value function:

$\displaystyle\mathop{\max}\limits_{\theta_{D_{d}}}\mathop{\min}\limits_{\theta% _{F},\theta_{C}}\textit{Loss}=L_{1}-\alpha\left({L_{2}+L_{3}}\right)$ (9) $\displaystyle L_{1}=\mathop{\sum}\limits_{i=1}^{N_{s}}L_{C}\left({M_{C}\left({% M_{F}\left({{\bm{x}}_{i}^{s};\theta_{F}}\right);\theta_{C}}\right),y_{i}^{s}}\right)$ (10) $\displaystyle L_{2}=\mathop{\sum}\limits_{i=1}^{N_{s}}L_{D_{d}}\left({M_{D_{d}% }\left({M_{F}\left({{\bm{x}}_{i}^{s};\theta_{F}}\right);\theta_{D_{d}}}\right)% ,d_{i}^{s}}\right)$ (11) $\displaystyle L_{3}=\mathop{\sum}\limits_{i=1}^{N_{t}}L_{D_{d}}\left({M_{D_{d}% }\left({M_{F}\left({{\bm{x}}_{i}^{t};\theta_{F}}\right);\theta_{D_{d}}}\right)% ,d_{i}^{t}}\right)$ (12)

where $L_{C}$ represents the loss function of the class classifier $C$ , $L_{D_{d}}$ represents the loss function of the domain discriminator $D_{d}$ , and the parameter $\alpha$ controls the balance between the classification task and the discrimination task. Eq. (9) can be split into two optimization problems, as shown in the following formulas:

$\displaystyle{\min}_{\theta_{F},\theta_{C}}\textit{Loss}=L_{1}-\alpha\left({L_{2}+L_{3}}\right)$ (13)

$\displaystyle{\max}_{\theta_{D_{d}}}\textit{Loss}=-\alpha\left({L_{2}+L_{3}}\right)$ (14)

According to Eqs (9)–(12), the loss function Loss is composed of three parts. $L_{1}$ represents the loss of the class classifier $C$ on the source feature extracted by the feature extractor $F$ . In order to reduce the classification error of source features, we need to seek $\theta_{F}^{\ast}$ and $\theta_{C}^{\ast}$ that can minimize $L_{1}$ . $L_{2}$ and $L_{3}$ represent the loss of the domain discriminator $D_{d}$ on the source features and the target features respectively. In order to make the source features and target features have the same distribution, we need to find the parameters $\theta_{F}^{\ast}$ that can maximize $L_{2}$ and $L_{3}$ . Meanwhile, we need to find the parameters $\theta_{D_{d}}^{\ast}$ that can minimize $L_{2}$ and $L_{3}$ to improve the discrimination ability of the domain discriminator $D_{d}$ . We multiply the sum of $L_{2}$ and $L_{3}$ by a negative number in order to achieve the above goals. Therefore, we only need to seek the parameters $\theta_{F}^{\ast},\theta_{C}^{\ast}$ that can minimize the value function Loss and the parameters $\theta_{D_{d}}^{\ast}$ that can maximize the value function Loss.
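As an illustration of how the three terms in Eqs (9)–(12) combine, the following minimal Python sketch evaluates the value function for one batch of network outputs. It assumes that both $L_{C}$ and $L_{D_{d}}$ are binary cross-entropy losses and that source and target samples carry the domain labels 0 and 1, respectively; these choices, as well as the function names, are assumptions of the sketch rather than settings stated in this section.

```python
import math


def bce(p, label, eps=1e-12):
    """Binary cross-entropy of one predicted probability p against a 0/1 label."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def value_function(class_probs, class_labels, dom_probs_src, dom_probs_tgt, alpha):
    """Return (Loss, L1, L2, L3) in the sense of Eqs (9)-(12).

    class_probs   : classifier outputs on source features (probability of "rise")
    class_labels  : true class labels of the source samples
    dom_probs_src : domain discriminator outputs on source features
    dom_probs_tgt : domain discriminator outputs on target features
    Assumption: domain label 0 = source, 1 = target.
    """
    l1 = sum(bce(p, y) for p, y in zip(class_probs, class_labels))
    l2 = sum(bce(p, 0) for p in dom_probs_src)
    l3 = sum(bce(p, 1) for p in dom_probs_tgt)
    return l1 - alpha * (l2 + l3), l1, l2, l3


# A discriminator that cannot tell the domains apart outputs 0.5 everywhere,
# which makes L2 + L3 large and therefore Loss small.
confused = value_function([0.9, 0.2], [1, 0], [0.5, 0.5], [0.5, 0.5], alpha=1.0)
# A discriminator that separates the domains well makes L2 + L3 small,
# which raises Loss (the direction the discriminator pushes in Eq. (14)).
sharp = value_function([0.9, 0.2], [1, 0], [0.1, 0.1], [0.9, 0.9], alpha=1.0)
```

In this toy batch the classification term $L_{1}$ is identical in both calls, so the difference in Loss comes entirely from the discrimination terms, which is the quantity the feature extractor and the domain discriminator compete over.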

3.2.3 Parameter optimization part

Next, we introduce how to adopt the Stochastic Gradient Descent (SGD) method to solve all the optimization problems in Eqs (4)–(5) and Eqs (13)–(14). We suppose that $n$ samples are randomly drawn from the target domain and from the random noise, respectively, i.e., ${\bm{X}}=\left\{{{\bm{x}}_{1},{\bm{x}}_{2},\ldots,{\bm{x}}_{n}}\right\},{\bm{x}}_{i}\in{\bm{X}}_{t},i=1,2,\ldots,n$ , ${\bm{R}}=\left\{{{\bm{r}}_{1},{\bm{r}}_{2},\ldots,{\bm{r}}_{n}}\right\}$ . Consequently, the gradient of Eq. (4) with respect to $\theta_{D}$ is as follows:

$\displaystyle\nabla\theta_{D}=\nabla_{\theta_{D}}V\left({M_{D},M_{G}}\right)=\frac{\partial V}{\partial\theta_{D}}=\frac{\partial}{\partial\theta_{D}}\left\{{\frac{1}{n}\mathop{\sum}\limits_{i=1}^{n}\left[\log\left(M_{D}\left({{\bm{x}}_{i}}\right)\right)+\log\left(1-M_{D}\left({M_{G}\left({{\bm{r}}_{i}}\right)}\right)\right)\right]}\right\}$ (15)

The gradient of Eq. (5) with respect to $\theta_{G}$ is as follows:

$\displaystyle\nabla\theta_{G}=\nabla_{\theta_{G}}V\left({M_{D},M_{G}}\right)=\frac{\partial V}{\partial\theta_{G}}=\frac{\partial}{\partial\theta_{G}}\left[\frac{1}{n}\mathop{\sum}\limits_{i=1}^{n}\log\left({1-M_{D}\left({M_{G}\left({{\bm{r}}_{i}}\right)}\right)}\right)\right]$ (16)

We suppose that $m$ samples are drawn from the new source domain and from the new target domain, respectively, i.e., ${\bm{S}}=\left\{{\left({{\bm{x}}_{1}^{s},y_{1}^{s},d_{1}^{s}}\right),\left({{\bm{x}}_{2}^{s},y_{2}^{s},d_{2}^{s}}\right),\ldots,\left({{\bm{x}}_{m}^{s},y_{m}^{s},d_{m}^{s}}\right)}\right\},\left({{\bm{x}}_{i}^{s},y_{i}^{s},d_{i}^{s}}\right)\in{\bm{X}}_{s}^{\prime},i=1,2,\ldots,m$ , ${\bm{T}}=\left\{{\left({{\bm{x}}_{1}^{t},d_{1}^{t}}\right),\left({{\bm{x}}_{2}^{t},d_{2}^{t}}\right),\ldots,\left({{\bm{x}}_{m}^{t},d_{m}^{t}}\right)}\right\},\left({{\bm{x}}_{i}^{t},d_{i}^{t}}\right)\in{\bm{X}}_{t}^{\prime},i=1,2,\ldots,m$ . Consequently, the gradients of Eq. (13) with respect to $\theta_{F}$ and $\theta_{C}$ are as follows:

$\displaystyle\nabla\theta_{F}=\nabla_{\theta_{F}}\textit{Loss}=\frac{\partial\textit{Loss}}{\partial\theta_{F}}=\frac{\partial}{\partial\theta_{F}}\left[{L_{1}-\alpha\left({L_{2}+L_{3}}\right)}\right]=\frac{\partial L_{1}}{\partial\theta_{F}}-\alpha\left(\frac{\partial L_{2}}{\partial\theta_{F}}+\frac{\partial L_{3}}{\partial\theta_{F}}\right)$ (17)

$\displaystyle\nabla\theta_{C}=\nabla_{\theta_{C}}\textit{Loss}=\frac{\partial\textit{Loss}}{\partial\theta_{C}}=\frac{\partial}{\partial\theta_{C}}\left[{L_{1}-\alpha\left({L_{2}+L_{3}}\right)}\right]=\frac{\partial L_{1}}{\partial\theta_{C}}$ (18)

The gradient of Eq. (14) with respect to $\theta_{D_{d}}$ is as follows:

$\displaystyle\nabla\theta_{D_{d}}=\nabla_{\theta_{D_{d}}}\textit{Loss}=\frac{\partial\textit{Loss}}{\partial\theta_{D_{d}}}=\frac{\partial}{\partial\theta_{D_{d}}}\left[{-\alpha\left({L_{2}+L_{3}}\right)}\right]=-\alpha\left(\frac{\partial L_{2}}{\partial\theta_{D_{d}}}+\frac{\partial L_{3}}{\partial\theta_{D_{d}}}\right)$ (19)

When all the optimization problems are solved by SGD, the parameters are updated according to the following formulas:

$\displaystyle\theta_{D}=\theta_{D}+\beta\nabla\theta_{D}$ (20)

$\displaystyle\theta_{G}=\theta_{G}-\beta\nabla\theta_{G}$ (21)

$\displaystyle\theta_{F}=\theta_{F}-\gamma\nabla\theta_{F}$ (22)

$\displaystyle\theta_{C}=\theta_{C}-\gamma\nabla\theta_{C}$ (23)

$\displaystyle\theta_{D_{d}}=\theta_{D_{d}}+\gamma\nabla\theta_{D_{d}}$ (24)

where the parameters $\beta,\gamma$ represent the learning rates of the sample generation part and the domain adaptation part, respectively. Since $\theta_{D}$ and $\theta_{D_{d}}$ are obtained by maximizing their value functions, Eqs (20) and (24) move along the gradient, whereas Eqs (21)–(23), which correspond to minimization, move against it. The implementation details of our algorithm are shown in Algorithm 1. In short, our algorithm can be roughly divided into two parts. In the first part, we train a GAN model to generate plenty of unlabeled fake samples with the same distribution as the target samples. In the second part, in order to accurately predict the real class labels of the target samples using the class classifier $C$ trained on the source samples, we adopt the adversarial unsupervised domain adaptation method. In this part, the feature extractor $F$ and the domain discriminator $D_{d}$ are trained so that the source features and target features follow the same distribution, and the class classifier $C$ is trained to accurately predict the class of the source features.
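The alternating character of these updates can be illustrated with a small Python sketch. It follows the min/max roles of Eqs (13)–(14): parameters that minimize the value function step against their gradient, and parameters that maximize it step along their gradient. The quadratic toy value function, the helper name `sgd_step` and all numerical settings are hypothetical and stand in for the network losses, which are not reproduced here.

```python
def sgd_step(theta, grad, lr, maximize=False):
    """One update: step against the gradient to minimize, along it to maximize."""
    sign = 1.0 if maximize else -1.0
    return [t + sign * lr * g for t, g in zip(theta, grad)]


# Toy value function (an assumption of this sketch, not the AUDA-Net loss):
#   value(f, d) = (f - 1)^2 - alpha * (d - 2)^2
# f plays the role of the minimizing parameters (theta_F, theta_C),
# d plays the role of the maximizing parameters (theta_Dd).
alpha, gamma = 0.5, 0.1
f, d = [5.0], [-3.0]

for _ in range(300):
    grad_f = [2.0 * (f[0] - 1.0)]            # d value / d f
    grad_d = [-2.0 * alpha * (d[0] - 2.0)]   # d value / d d
    f = sgd_step(f, grad_f, gamma)                   # descent on the value
    d = sgd_step(d, grad_d, gamma, maximize=True)    # ascent on the value
```

With these settings both parameters approach the saddle point of the toy function, $f=1$ and $d=2$. In a real implementation the gradients would come from backpropagation through $F$, $C$ and $D_{d}$ on each mini-batch.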

The computational cost of our AUDA-Net model comes from its sample generation part and its domain adaptation part. To generate massive fake target samples, the sample generation part takes ${\bm{O}}\left({\textit{num\_iter}_{1}\ast N_{\textit{tr}}}\right)$ time, where $\textit{num\_iter}_{1}$ represents the number of iterations in the sample generation part, and $N_{\textit{tr}}$ is the number of samples in the target domain. Since there are very few samples in the target domain, this part is fast. To decrease the discrepancy between the source and target domains and use the classifier trained on the source domain to predict the labels of the target samples, the domain adaptation part has a computational complexity of ${\bm{O}}\left(\textit{num\_iter}_{2}\ast\min\left\{N_{t},N_{s}\right\}\right)$ , where $\textit{num\_iter}_{2}$ is the number of iterations in the domain adaptation part, $N_{t}$ represents the number of samples in the new target domain, which is formed by combining the few real target samples and the massive fake target ones, $N_{s}$ represents the number of samples in the source domain, and $\min\left\{{N_{t},N_{s}}\right\}$ is the minimum of $N_{t}$ and $N_{s}$ . Therefore, the computational complexity of our proposed AUDA-Net model is ${\bm{O}}\left(\textit{num\_iter}_{1}\ast N_{\textit{tr}}+\textit{num\_iter}_{2}\ast\min\left\{N_{t},N_{s}\right\}\right)$ . The computational complexity of the other benchmark algorithms is ${\bm{O}}\left({\textit{num\_iter}\ast N_{\textit{tr}}}\right)$ , where $\textit{num\_iter}$ is the number of iterations of their model. Because our model improves predictive accuracy by generating a large number of fake target samples, its computational complexity is somewhat higher than that of the other benchmark algorithms.

Algorithm 1: The AUDA-Net Algorithm

Input: ${\bm{X}}_{s}$ -source domain with massive samples ${\bm{X}}_{s}=\left\{{\left({{\bm{x}}_{1}^{s},y_{1}^{s}}\right),\left({{\bm{x}}_{2}^{s},y_{2}^{s}}\right),\ldots,\left({{\bm{x}}_{N_{s}}^{s},y_{N_{s}}^{s}}\right)}\right\}$ .

${\bm{X}}_{t}$ -target domain with a few samples ${\bm{X}}_{t}=\left\{{\left({{\bm{x}}_{1}^{t},y_{1}^{t}}\right),\left({{\bm{x}}_{2}^{t},y_{2}^{t}}\right),\ldots,\left({{\bm{x}}_{N_{\textit{tr}}}^{t},y_{N_{\textit{tr}}}^{t}}\right)}\right\}$ .

learning rate $\beta,\gamma$ , free parameter $\alpha$

Outputs: Feature extractor $F$ , Adapted class classifier $C$

Phase 1:

1: for each iteration do

2: for each mini-batch do

3: Sample $n$ samples ${\bm{R}}$ from random noise, i.e., ${\bm{R}}=\left\{{{\bm{r}}_{1},{\bm{r}}_{2},\ldots,{\bm{r}}_{n}}\right\}$

4: Sample $n$ samples ${\bm{X}}$ from target domain ${\bm{X}}_{t}$ , i.e., ${\bm{X}}=\left\{{{\bm{x}}_{1},{\bm{x}}_{2},\ldots,{\bm{x}}_{n}}\right\},{\bm{x}}_{i}\in{\bm{X}}_{t},i=1,2,\ldots,n.$

5: Compute the gradient of Eq. (4) for $\theta_{D}$ according to Eq. (15)

6: Compute the gradient of Eq. (5) for $\theta_{G}$ according to Eq. (16)

7: Update the parameters $\theta_{D}$ of the discriminator $D$ according to Eq. (20)

8: Update the parameters $\theta_{G}$ of the generator $G$ according to Eq. (21)

9: end for

10: end for

Phase 2:

11: According to Phase 1, generate a large number of fake samples with the same distribution as the target samples.

12: The fake samples and real target samples are combined to form a new target domain ${\bm{X}}_{t}^{\prime}$ , and each sample in the new target domain is labeled with a domain label, i.e., ${\bm{X}}_{t}^{\prime}=\left\{{\left({{\bm{x}}_{1}^{t},d_{1}^{t}}\right),\left({{\bm{x}}_{2}^{t},d_{2}^{t}}\right),\ldots,\left({{\bm{x}}_{N_{t}}^{t},d_{N_{t}}^{t}}\right)}\right\}$

13: Label each sample in the source domain with a domain label to form a new source domain ${\bm{X}}_{s}^{\prime}$ , denoted as ${\bm{X}}_{s}^{\prime}=\left\{{\left({{\bm{x}}_{1}^{s},y_{1}^{s},d_{1}^{s}}\right),\left({{\bm{x}}_{2}^{s},y_{2}^{s},d_{2}^{s}}\right),\ldots,\left({{\bm{x}}_{N_{s}}^{s},y_{N_{s}}^{s},d_{N_{s}}^{s}}\right)}\right\}$

14: for each iteration do

15: for each mini-batch do

16: Sample $m$ samples ${\bm{S}}$ from the new source domain ${\bm{X}}_{s}^{\prime}$ , i.e.,

${\bm{S}}=\left\{{\left({{\bm{x}}_{1}^{s},y_{1}^{s},d_{1}^{s}}\right),\left({{\bm{x}}_{2}^{s},y_{2}^{s},d_{2}^{s}}\right),\ldots,\left({{\bm{x}}_{m}^{s},y_{m}^{s},d_{m}^{s}}\right)}\right\},\left({{\bm{x}}_{i}^{s},y_{i}^{s},d_{i}^{s}}\right)\in{\bm{X}}_{s}^{\prime},i=1,2,\ldots,m$

17: Sample $m$ samples ${\bm{T}}$ from the new target domain ${\bm{X}}_{t}^{\prime}$ , i.e.,

${\bm{T}}=\left\{{\left({{\bm{x}}_{1}^{t},d_{1}^{t}}\right),\left({{\bm{x}}_{2}^{t},d_{2}^{t}}\right),\ldots,\left({{\bm{x}}_{m}^{t},d_{m}^{t}}\right)}\right\},\left({{\bm{x}}_{i}^{t},d_{i}^{t}}\right)\in{\bm{X}}_{t}^{\prime},i=1,2,\ldots,m$

18: Compute gradients of Eq. (13) for $\theta_{F}$ and $\theta_{C}$ according to Eqs (17)–(18).

19: Compute the gradient of Eq. (14) for $\theta_{D_{d}}$ according to Eq. (19).

20: Update the parameters $\theta_{F}$ of the feature extractor $F$ according to Eq. (22).

21: Update the parameters $\theta_{C}$ of the class classifier $C$ according to Eq. (23).

22: Update the parameters $\theta_{D_{d}}$ of the domain discriminator $D_{d}$ according to Eq. (24).

23: end for

24: end for
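Steps 11–13 of Algorithm 1 amount to simple bookkeeping, which the following Python sketch makes concrete. The function name `build_domains` and the 0/1 encoding of the domain labels are assumptions of the sketch; the algorithm itself only requires that source and target samples receive distinct domain labels.

```python
def build_domains(source, real_target, fake_target, src_label=0, tgt_label=1):
    """Attach domain labels as in steps 12-13 of Algorithm 1.

    source      : list of (x, y) pairs, where y is the class label
    real_target : list of x (class labels are not used for adaptation)
    fake_target : list of x produced by the trained generator G
    Returns (new_source, new_target) with entries (x, y, d) and (x, d).
    The numeric domain labels 0 and 1 are an assumption of this sketch.
    """
    new_source = [(x, y, src_label) for x, y in source]
    new_target = [(x, tgt_label) for x in list(real_target) + list(fake_target)]
    return new_source, new_target


# Hypothetical toy data: two labeled source samples, one real and two
# generated target samples, each with two features.
source = [([0.1, 0.2], 1), ([0.3, 0.4], 0)]
real_target = [[0.5, 0.6]]
fake_target = [[0.55, 0.61], [0.48, 0.63]]
new_source, new_target = build_domains(source, real_target, fake_target)
```

Mini-batches ${\bm{S}}$ and ${\bm{T}}$ in steps 16–17 would then be drawn from `new_source` and `new_target`, respectively.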

3.2.4 Application of AUDA-Net in stock trend prediction

In this section, we present an example of using the proposed AUDA-Net model to predict the stock trend. We refer to the newly listed stock whose trend is to be predicted as the target stock, and to the stock with a large number of samples, which is used to assist the trend forecast, as the source stock.

The raw stock data and technical indicators of the source and target stocks are combined to form the source and target domains, respectively. We divide the target domain into two parts, i.e., a training set and a test set. Since the value ranges of different variables in the source and target domains are very different, we standardize each variable. Similar to other stock trend prediction methods [57], we reconstruct the samples collected from the source and target domains according to the time window size $l$ . In other words, we use the historical data over the past $l$ days to predict the stock trend. The mathematical expression of the source or target domain is described below:

$\displaystyle{\bm{D}}=\left\{{\left({{\bm{x}}_{1},y_{1}}\right),\left({{\bm{x}}_{2},y_{2}}\right),\ldots,\left({{\bm{x}}_{n},y_{n}}\right)}\right\}$ (25)

where in Eq. (25),

$\displaystyle{\bm{x}}_{i}=\left\{{{\bm{s}}_{i-l+1},{\bm{s}}_{i-l+2},\ldots,{\bm{s}}_{i},{\bm{t}}_{i-l+1},{\bm{t}}_{i-l+2},\ldots,{\bm{t}}_{i}}\right\}$ (26)

where ${\bm{x}}_{i}$ is the $i$ -th sample of the source or target domain, ${\bm{s}}_{i}$ and ${\bm{t}}_{i}$ respectively denote the raw stock data and technical indicators on the $i$ -th day, and $l$ is the time window size. The specific settings of the experiments, such as the technical indicators, the time window size and the data standardization method, are explained in Section 4.
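The windowed sample construction of Eqs (25)–(26) can be sketched in Python as follows. The function name `make_windows` and the labeling rule (class 1 if the next day's close is higher than the current close, otherwise 0) are assumptions of the sketch; the exact rule used to derive $y_{i}$ is not spelled out in this section.

```python
def make_windows(raw, indicators, closes, l):
    """Build (x_i, y_i) pairs in the layout of Eqs (25)-(26).

    raw[i]        : list of raw stock features on day i (s_i)
    indicators[i] : list of technical indicators on day i (t_i)
    closes[i]     : closing price on day i, used only to derive the label
    l             : time window size
    Assumption: y_i = 1 if closes[i + 1] > closes[i], else 0.
    """
    samples = []
    # Day i needs l days of history (i >= l - 1) and a following day for the label.
    for i in range(l - 1, len(raw) - 1):
        x = []
        for day in range(i - l + 1, i + 1):
            x.extend(raw[day])          # s_{i-l+1}, ..., s_i
        for day in range(i - l + 1, i + 1):
            x.extend(indicators[day])   # t_{i-l+1}, ..., t_i
        y = 1 if closes[i + 1] > closes[i] else 0
        samples.append((x, y))
    return samples


# Hypothetical toy series with one raw feature and one indicator per day.
raw = [[1], [2], [3], [4], [5]]
indicators = [[10], [20], [30], [40], [50]]
closes = [1, 2, 3, 2, 5]
windows = make_windows(raw, indicators, closes, l=3)
```

With five days and $l=3$, only days 2 and 3 (zero-based) have both a full window and a next-day label, so two samples are produced.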

Next, we introduce the training process of the proposed AUDA-Net model. In each iteration of the sample generation part, we first input a random noise vector into the generator $G$ to generate fake samples. The fake samples generated by the generator $G$ and the real target domain samples are then passed into the discriminator $D$ , which determines whether the incoming samples are real or fake.

After the sample generation phase, the trained generator $G$ can generate a large number of unlabeled fake samples that have the same probability distribution as the target samples, i.e., fake target samples. We then combine the fake target samples generated by the generator $G$ with the real target samples to form a new target domain.

In each iteration of the domain adaptation part, we first input the training samples of the source and target domains into the feature extractor $F$ to extract the source and target features, respectively. Then, the source and target features are fed into the domain discriminator $D_{d}$ , which distinguishes whether an input feature comes from the source features or the target features. Finally, we input the source features into the class classifier $C$ to predict the real class labels of the samples.

In the testing phase, we use the feature extractor $F$ and the class classifier $C$ , which is trained on the basis of the source features, to predict the labels of the target test samples.

4. Experiments

4.1 Experimental data

In our work, we evaluate the trend prediction performance of our proposed AUDA-Net model on eight groups of real stock data from the U.S. stock market; the specific settings of these eight groups of experiments are given in Section 4.4. The stocks used in this paper include Amazon (AMZN), ORCL, Apple Inc. (AAPL), EBAY, INTC, Alibaba Group (BABA) and QCOM. All the stock data are from Yahoo Finance.

In order to better show the trend prediction performance of our algorithm on newly listed stocks, we compare our model with some supervised learning algorithms that require a large number of labeled samples. Therefore, stock datasets with massive samples, i.e., ORCL, INTC, EBAY and QCOM, are selected as the target domains. However, since this research focuses on SSS trend forecasting for newly listed stocks, only 23 samples from July 10, 2019 to August 12, 2019 are selected as the training set of the target domain, and 42 samples from August 12, 2019 to October 10, 2019 are selected as the test set of the target domain. The samples incorporated into the source domain range from the listing date of each stock to July 10, 2019.

In the financial community, many investors and fund managers use technical indicators to study the trend of stock prices [58]. We select the raw stock data and some technical indicators as the input of the model. The model output is the rise or fall of the next day’s stock price. The raw stock data include the lowest price, the highest price, close price, adjusted close price, open price and volume.

We now introduce our selected technical indicators in detail. The first technical indicator is the Simple Moving Average (SMA). SMA reflects the trend of stock price change. If SMA rises within a period of time, it indicates that the stock price has a tendency to rise; otherwise, it indicates that the stock price has a downward trend. The second technical indicator is the Exponentially Weighted Moving Average (EWMA). EWMA, which is an improvement of SMA, gives a different weight to the data of each day, and the weight of the daily data decreases as the data become older. The third technical indicator is the Force Index (FI). FI reflects the strength of the upward or downward trend of the stock price. The fourth technical indicator is the Commodity Channel Index (CCI). CCI is used to detect whether the stock price is abnormal. The fifth technical indicator is the Ease of Movement Value (EMV). EMV reflects the trend of stock price fluctuation according to the change of stock price and trading volume. The sixth technical indicator is the Price Rate of Change (ROC). ROC is used to detect whether the trend of the stock will change. The seventh technical indicator is the Bollinger Bands (BB). BB reflects the normal fluctuation range of the stock price by calculating the standard deviation of the stock price.

Let $\textit{Low}_{t}$ , $\textit{High}_{t}$ , $\textit{Close}_{t}$ , $\textit{AdjClose}_{t}$ , $\textit{Open}_{t}$ , and $\textit{Volume}_{t}$ denote the lowest price, the highest price, close price, adjusted close price, open price, and volume of the stock on day $t$ , respectively. Let $\textit{SMA}\left(n\right)_{t}$ , $\textit{EWMA}\left(n\right)_{t}$ , $\textit{FI}_{t}$ , $\textit{CCI}\left(n\right)_{t}$ , $\textit{EMV}\left(n\right)_{t}$ , $\textit{ROC}\left(n\right)_{t}$ , $\textit{UpperBB}\left(n\right)_{t}$ and $\textit{LowerBB}\left(n\right)_{t}$ denote SMA, EWMA, FI, CCI, EMV, ROC, the upper trajectory of BB and the lower trajectory of BB on day $t$ , where $n$ is the time interval of the technical indicators. The mathematical expressions of the above technical indicators are given by the following formulas:

$\displaystyle\textit{SMA}\left(n\right)_{t}=\frac{\mathop{\sum}\nolimits_{i=1}^{n}\textit{Close}_{t-i+1}}{n}$ (27)

$\displaystyle\textit{EWMA}\left(n\right)_{t}=\textit{EWMA}\left(n\right)_{t-1}+\frac{2}{n+1}\left({\textit{Close}_{t}-\textit{EWMA}\left(n\right)_{t-1}}\right)$ (28)

$\displaystyle\textit{FI}_{t}=\textit{Volume}_{t}\ast(\textit{Close}_{t}-\textit{Close}_{t-1})$ (29)

$\displaystyle\textit{CCI}\left(n\right)_{t}=\frac{\textit{TP}_{t}-\textit{MATP}_{t}}{0.015\textit{MD}_{t}}$ (30)

where in Eq. (30),

$\displaystyle\left\{{{\begin{array}[]{l}{\textit{TP}_{t}=\frac{\textit{High}_{t}+\textit{Low}_{t}+\textit{Close}_{t}}{3}}\\ {\textit{MATP}_{t}=\frac{1}{n}\mathop{\sum}\limits_{i=1}^{n}\textit{TP}_{t-i+1}}\\ {\textit{MD}_{t}=\frac{1}{n}\mathop{\sum}\limits_{i=1}^{n}\left|{\textit{TP}_{t-i+1}-\textit{MATP}_{t}}\right|}\\ \end{array}}}\right.$ (31)

$\displaystyle\textit{EMV}\left(n\right)_{t}=\frac{\textit{DM}_{t}}{\textit{BR}_{t}}$ (32)

where in Eq. (32),

$\displaystyle\left\{{{\begin{array}[]{l}{\textit{DM}_{t}=\frac{\textit{High}_{t}+\textit{Low}_{t}}{2}-\frac{\textit{High}_{t-1}+\textit{Low}_{t-1}}{2}}\\ {\textit{BR}_{t}=\frac{\textit{Volume}_{t}}{100000000\ast\left({\textit{High}_{t}-\textit{Low}_{t}}\right)}}\\ \end{array}}}\right.$ (33)

$\displaystyle\textit{ROC}\left(n\right)_{t}=\frac{\textit{Close}_{t}-\textit{Close}_{t-n}}{\textit{Close}_{t-n}}$ (34)

$\displaystyle\textit{UpperBB}\left(n\right)_{t}=\textit{MA}\left(n\right)_{t}+2\ast\textit{SD}\left(n\right)_{t}$ (35)

$\displaystyle\textit{LowerBB}\left(n\right)_{t}=\textit{MA}\left(n\right)_{t}-2\ast\textit{SD}\left(n\right)_{t}$ (36)

where in Eqs (35) and (36),

$\displaystyle\left\{{{\begin{array}[]{l}{\textit{MA}\left(n\right)_{t}=\frac{\mathop{\sum}\nolimits_{i=t-n+1}^{t}\textit{Close}_{i}}{n}}\\ {\textit{SD}\left(n\right)_{t}=\frac{\sqrt[2]{\mathop{\sum}\nolimits_{i=1}^{n}\left({\textit{Close}_{t-i+1}-\textit{MA}\left(n\right)_{t}}\right)^{2}}}{n}}\\ \end{array}}}\right.$ (37)
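Several of the indicators above translate directly into code. The following Python sketch implements Eqs (27)–(31) and (34) on plain lists indexed by day; the function names are hypothetical, and seeding the EWMA recursion with the first close price is an assumption, since Eq. (28) does not specify an initial value. The caller is responsible for choosing $t$ large enough that the look-back window exists ($t\geq n-1$ for SMA and CCI, $t\geq n$ for ROC, $t\geq 1$ for FI).

```python
def sma(close, n, t):
    """Eq. (27): mean of the n close prices up to and including day t."""
    return sum(close[t - n + 1 : t + 1]) / n


def ewma(close, n):
    """Eq. (28) for every day; seeded with close[0] (assumption of this sketch)."""
    k = 2.0 / (n + 1)
    out = [close[0]]
    for c in close[1:]:
        out.append(out[-1] + k * (c - out[-1]))
    return out


def force_index(close, volume, t):
    """Eq. (29): volume times the one-day change of the close price."""
    return volume[t] * (close[t] - close[t - 1])


def roc(close, n, t):
    """Eq. (34): relative change of the close price over n days."""
    return (close[t] - close[t - n]) / close[t - n]


def cci(high, low, close, n, t):
    """Eqs (30)-(31); undefined when the mean deviation MD_t is zero."""
    tp = [(h + l + c) / 3.0 for h, l, c in zip(high, low, close)]
    window = tp[t - n + 1 : t + 1]
    matp = sum(window) / n
    md = sum(abs(v - matp) for v in window) / n
    return (tp[t] - matp) / (0.015 * md)


# Hypothetical toy prices: the close rises by 1 per day, with high and low
# one unit above and below the close, so the typical price equals the close.
close = [10.0, 11.0, 12.0, 13.0, 14.0]
high = [c + 1.0 for c in close]
low = [c - 1.0 for c in close]
volume = [100.0] * 5
```

For example, `sma(close, 3, 4)` averages the last three closes, and `ewma(close, 3)` uses the smoothing factor $2/(n+1)=0.5$.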

4.2 Experimental settings

The model we propose in this work is a new Adversarial Unsupervised Domain Adaptation Network (AUDA-Net); therefore, we compare it with two other adversarial unsupervised domain adaptation algorithms, UDAB and ADDA. Because the stocks for trend prediction contain only a small number of samples, we also choose two classical few-shot learning algorithms for comparison, MAML [59] and Prototypical Networks [60].

Traditional statistical methods such as bootstrapping can also be used to address the Small Sample Size (SSS) problem. However, such methods generally rest on statistical assumptions, such as assuming that the parent sample distributions are normal [61]. Due to the instability and complexity of stock time series data, these assumptions may not be valid for SSS stock trend prediction problems. Consequently, traditional approaches for SSS problems are not compared in this paper.

In addition, we also compare our proposed model with the DL methods which assume that the target domain has a lot of labeled samples, including LSTM and CNN. The experimental performance of these DL algorithms using abundant labeled samples can serve as the experimental performance upper boundary of our algorithm and the other baseline algorithms using only a small number of samples. Experimental comparison with the DL algorithms can intuitively reflect the predictive performance of our proposed algorithm and the benchmark algorithms. These two DL algorithms use four stock datasets as training sets, including samples from March 12, 1986 to August 10, 2019 in ORCL dataset; samples from January 2, 1999 to August 10, 2019 in INTC dataset; samples from September 24, 1998 to August 10, 2019 in EBAY dataset; and samples from December 13, 1991 to August 10, 2019 in QCOM dataset.

The technical indicators we use in the experiments are 3-day SMA, 5-day SMA, 7-day SMA, 3-day EWMA, 5-day EWMA, 7-day EWMA, 7-day CCI, 7-day EMV, 5-day ROC, 7-day upper trajectory of BB, 7-day lower trajectory of BB and 1-day FI. In our experiments, the selected time window size is 3. Consequently, in the target domain, the training set contains 12 samples and the test set contains 31 samples.

Since the value ranges of different variables in our input data are very different, we standardize each variable. We suppose $X_{i}$ is the $i$ -th variable, and $x_{i,j}$ is the value of the $i$ -th variable on the $j$ -th day. The mathematical expression of data standardization is shown in the following formula:

$\displaystyle\overline{x_{i,j}}=\frac{x_{i,j}-X_{\textit{i,mean}}}{X_{\textit{i,std}}}$ (38)

where in Eq. (38),

$\displaystyle\left\{{{\begin{array}[]{l}{X_{\textit{i,mean}}=\frac{\mathop{\sum}\nolimits_{j=1}^{n}x_{i,j}}{n}}\\ {X_{\textit{i,std}}=\sqrt[2]{\frac{\mathop{\sum}\nolimits_{j=1}^{n}\left({x_{i,j}-X_{\textit{i,mean}}}\right)^{2}}{n}}}\\ \end{array}}}\right.$ (39)
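A minimal Python sketch of Eqs (38)–(39) for a single variable is given below. It uses the population standard deviation (division by $n$ ), as in Eq. (39). The function name is hypothetical, and which samples the mean and standard deviation are computed over (for example, the training portion only) is not specified here, so the sketch simply standardizes the list it is given.

```python
import math


def standardize(values):
    """Eqs (38)-(39): subtract the mean and divide by the population std.

    A constant variable has zero standard deviation and cannot be
    standardized this way; the sketch does not handle that case.
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


# Hypothetical close prices of one variable over three days.
z = standardize([1.0, 2.0, 3.0])
```

After standardization the variable has zero mean and unit population variance, which puts prices, volumes and indicators on a comparable scale.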

In order to compare the performance of different algorithms intuitively, we use trend prediction accuracy as the indicator of algorithm performance.

4.3 Model architectures

Our proposed AUDA-Net model consists of two parts, i.e., the sample generation part and the domain adaptation part. The sample generation part is composed of a generator and a discriminator. The generator comprises four fully-connected layers, followed by a BatchNorm layer and a LeakyReLU layer, and one fully-connected layer with tanh non-linearity ( $x\to 128\to 256\to 512\to 1024\to y$ ). The discriminator consists of two fully-connected layers with a LeakyReLU layer and one fully-connected layer, followed by a sigmoid activation function ( $y\to 512\to 256\to 1$ ). $x$ and $y$ represent the dimension of the random noise vectors and the fake target samples generated by the generator, respectively.

The second part, i.e., the domain adaptation part, consists of a feature extractor, a class classifier and a domain discriminator. The feature extractor consists of two convolution layers, followed by a BatchNorm layer, a max-pooling layer and a ReLU layer. The class classifier consists of two fully-connected layers, combined with a BatchNorm layer and a ReLU layer, and one fully-connected layer, followed by a LogSoftmax function ( $z\to 100\to 100\to 2$ ). The domain discriminator incorporates one fully-connected layer, followed by a BatchNorm layer and a ReLU layer, and one fully-connected layer, followed by a LogSoftmax function ( $z\to 100\to 2$ ). $z$ represents the feature dimension of the source and target samples.

4.4 Experimental results

Next, we will report our experimental results. As stated in Section 4.1, ORCL, INTC, EBAY and QCOM are selected as the target domains. Eight groups of experiments are planned and implemented, with their specific settings being presented in Table 1. And the trend prediction accuracies acquired by each corresponding algorithm in these eight groups of experiments are reported in Table 2.

Table 1
The specific settings of source and target domains in each group of experiments

Groups	Source domain	Target domain
Group 1	AMZN	ORCL
Group 2	AAPL	ORCL
Group 3	EBAY	INTC
Group 4	AMZN	INTC
Group 5	BABA	EBAY
Group 6	ORCL	EBAY
Group 7	INTC	QCOM
Group 8	EBAY	QCOM

Table 2

Trend predictive accuracy of each corresponding algorithm in each group of experiments

Source $\to$ Target	AUDA-Net	UDAB	ADDA	MAML	Prototypical networks	LSTM	CNN
AMZN $\to$ ORCL	90.32%	80.65%	77.42%	77.42%	74.19%	87.10%	87.10%
AAPL $\to$ ORCL	93.55%	80.65%	74.19%	80.65%	77.42%
EBAY $\to$ INTC	93.55%	74.19%	70.97%	67.74%	70.97%	83.87%	93.55%
AMZN $\to$ INTC	90.32%	80.65%	74.19%	64.52%	70.97%
BABA $\to$ EBAY	90.32%	74.19%	70.97%	77.42%	80.65%	90.32%	87.10%
ORCL $\to$ EBAY	93.55%	77.42%	70.97%	77.42%	70.97%
INTC $\to$ QCOM	93.55%	83.87%	77.42%	74.19%	80.65%	93.55%	90.32%
EBAY $\to$ QCOM	90.32%	80.65%	74.19%	70.97%	80.65%

Remark: Since there are only 31 test samples in the target test sets, it is common for the corresponding algorithms to obtain the same prediction accuracies in different groups of experiments.
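The remark above can be checked with a short calculation: with 31 test samples, the accuracy can only take the values $k/31$ for $k=0,1,\ldots,31$ . The following Python sketch, with hypothetical function and variable names, computes the accuracy of a prediction vector and lists the attainable percentages.

```python
def accuracy_percent(y_true, y_pred):
    """Share of matching labels, as a percentage rounded to two decimals."""
    correct = sum(1 for a, b in zip(y_true, y_pred) if a == b)
    return round(100.0 * correct / len(y_true), 2)


# Every accuracy attainable on a 31-sample test set, keyed by the number
# of correctly predicted samples.
attainable = {k: round(100.0 * k / 31, 2) for k in range(32)}

# Hypothetical example: 31 labels with the last three predictions wrong.
y_true = [1] * 31
y_pred = [1] * 28 + [0] * 3
acc = accuracy_percent(y_true, y_pred)
```

Under this reading, 90.32% corresponds to 28 correct predictions out of 31 and 93.55% to 29 out of 31, so a difference of one test sample changes the reported accuracy by about 3.23 percentage points.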

Figure 3.

Different algorithms verified on ORCL (AMZN as the source domain).

The real stock trends and the trend prediction results obtained by the different algorithms in the eight groups of experiments are displayed in Figs 3–10. In Figs 3–10, the x-coordinate, ranging from 0 to 30, indexes the target test samples, while the y-coordinate represents the class of each target test sample, i.e., the rise or fall of the next day’s stock price, with “1” indicating a rise and “0” a fall.

Figure 4.

Different algorithms verified on ORCL (AAPL as the source domain).

Figure 5.

Different algorithms verified on INTC (EBAY as the source domain).

Figure 6.

Different algorithms verified on INTC (AMZN as the source domain).

Figure 7.

Different algorithms verified on EBAY (BABA as the source domain).

Figure 8.

Different algorithms verified on EBAY (ORCL as the source domain).

Figure 9.

Different algorithms verified on QCOM (INTC as the source domain).

Figures 3–10 show the real stock trend and the stock trend predicted by the corresponding algorithms. The experimental results reported in Figs 3–10 and Table 2 show that, among all the algorithms suitable for SSS problems, our proposed model achieves the best trend prediction accuracy in every group, exceeding the strongest small-sample baseline by about 9.7 to 19.4 percentage points. The UDAB model achieves the second-best accuracy in all but the fifth group. In the second and sixth groups, the MAML algorithm also attains the second-best accuracy, as do the Prototypical Networks in the fifth and eighth groups. The ADDA model achieves the worst accuracy in three groups, i.e., the second, fifth and sixth groups, while the accuracy of the MAML algorithm is the worst in four groups, i.e., the third, fourth, seventh and eighth groups. The Prototypical Networks achieve the worst trend prediction accuracy in two groups of experiments, i.e., the first and sixth groups.

To test whether the prediction performance of our proposed AUDA-Net model is significantly higher than that of the other benchmark algorithms using only a small number of samples, $t$ -tests between our proposed AUDA-Net model and the other benchmark algorithms trained on a few samples are carried out at a 5% significance level. The $t$ -test results based on the stock trend prediction accuracies are reported in Table 3. They show that the predictive performance of our proposed AUDA-Net model is significantly superior to that of the other benchmark algorithms using a few samples on all eight groups of datasets, which demonstrates the advantage of our model in predicting the trend of Small Sample Size (SSS) stocks.

Table 3

$t$-test results on predictive accuracy between AUDA-Net and the other benchmark algorithms on the eight groups of stock datasets

Source $\to$ Target	Indicators	UDAB	ADDA	MAML	Prototypical networks
AMZN $\to$ ORCL	$p$ -value	6.4329e-05	4.9357e-08	4.7831e-07	2.2997e-08
	H	1	1	1	1
AAPL $\to$ ORCL	$p$ -value	6.5802e-05	2.7691e-07	8.0895e-11	6.5276e-10
	H	1	1	1	1
EBAY $\to$ INTC	$p$ -value	1.5585e-06	3.9007e-08	2.3445e-10	1.8418e-08
	H	1	1	1	1
AMZN $\to$ INTC	$p$ -value	1.1183e-05	2.0799e-07	2.1025e-07	1.4226e-06
	H	1	1	1	1
BABA $\to$ EBAY	$p$ -value	1.8376e-10	2.0748e-12	1.1933e-12	8.8204e-07
	H	1	1	1	1
ORCL $\to$ EBAY	$p$ -value	1.7208e-08	1.186e-08	7.6206e-09	2.4094e-09
	H	1	1	1	1
INTC $\to$ QCOM	$p$ -value	4.4505e-08	0.0007	1.1603e-11	0.0476
	H	1	1	1	1
EBAY $\to$ QCOM	$p$ -value	1.2475e-07	0.0072	7.6632e-12	0.0043
	H	1	1	1	1

Remark: In Table 3, H $=$ 1 indicates that, for the given group of stock datasets, the predictive performance of the proposed model is significantly superior to that of the corresponding benchmark algorithm using a few samples at the 5% significance level.

Figure 10.

Different algorithms verified on QCOM (EBAY as the source domain).

4.5 Cross validation

In this part, five-fold cross validation is employed to further evaluate the predictive performance of our proposed model and the other benchmark algorithms. Since the LSTM and CNN models are trained on a large number of labeled samples, we apply cross validation only to our proposed model and the comparative algorithms that use a few samples. As in Section 4.4, in each group of experiments, the stock samples from July 10, 2019 to October 10, 2019 are selected as the dataset of the target domain. In Section 4.4, this dataset is divided into two parts: the stock samples of the first month are used as the training set of the target domain, and the remaining samples are used as the test set. In cross validation, by contrast, the dataset of the target domain is divided into five equal-sized folds. In each repetition, four folds are selected as the training set of the target domain and the remaining fold is selected as the test set. The dataset of the source domain is the same as in Section 4.4. After five repetitions, the mean prediction accuracy over the five repetitions is computed for each method. The mean prediction accuracies of the different methods trained on a few samples are presented in Table 4. The results in Table 4 show that, in cross validation, the average accuracy of our model is higher than that of every other algorithm using a few samples in all eight groups.
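The fold procedure described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' code: the folds are taken as contiguous blocks in time order (the paper does not say whether samples are shuffled), and `fit_predict` and `majority_baseline` are hypothetical stand-ins for training and applying any of the compared models.

```python
from statistics import mean


def five_fold_indices(n_samples, n_folds=5):
    """Split indices 0..n_samples-1 into n_folds contiguous, near-equal folds."""
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for k in range(n_folds):
        size = base + (1 if k < extra else 0)  # spread the remainder over the first folds
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(samples, labels, fit_predict, n_folds=5):
    """Return the mean test accuracy over n_folds repetitions.

    fit_predict(train_x, train_y, test_x) is a hypothetical callback that
    trains a model on the training folds and returns predicted labels.
    """
    folds = five_fold_indices(len(samples), n_folds)
    accuracies = []
    for k, test_idx in enumerate(folds):
        # The other four folds form the target-domain training set.
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        train_x = [samples[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        test_x = [samples[i] for i in test_idx]
        test_y = [labels[i] for i in test_idx]
        predicted = fit_predict(train_x, train_y, test_x)
        correct = sum(p == t for p, t in zip(predicted, test_y))
        accuracies.append(correct / len(test_y))
    return mean(accuracies)


def majority_baseline(train_x, train_y, test_x):
    """Toy stand-in model: always predict the most frequent training label."""
    majority = max(set(train_y), key=train_y.count)
    return [majority] * len(test_x)
```

For example, `cross_validate(xs, ys, majority_baseline)` returns the mean accuracy of the toy baseline over the five repetitions, where `ys` holds 1 for a rise and 0 for a fall of the next day's price.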

Table 4
Experimental results of cross validation obtained by each corresponding algorithm on the eight groups of stock datasets

Source $\to$ Target	AUDA-Net	UDAB	ADDA	MAML	Prototypical networks
AMZN $\to$ ORCL	90.55%	83.09%	61.09%	77.80%	79.45%
AAPL $\to$ ORCL	92.36%	79.27%	66.36%	57.47%	79.27%
EBAY $\to$ INTC	90.55%	77.82%	75.82%	68.36%	81.45%
AMZN $\to$ INTC	90.55%	77.64%	55.64%	66.75%	85.09%
BABA $\to$ EBAY	94.55%	79.64%	78.18%	66.75%	83.64%
ORCL $\to$ EBAY	92.73%	79.64%	72.18%	75.83%	81.82%
INTC $\to$ QCOM	94.55%	77.82%	79.27%	53.80%	81.45%
EBAY $\to$ QCOM	96.36%	81.45%	74.00%	51.80%	79.64%

Table 5

The specific settings of source and target domains in each group of experiments

Groups	Source domain	Target domain
Group 1	BABA	TPG
Group 2	INTC	AMLX
Group 3	EBAY	CRDO
Group 4	AMZN	ACDC

Table 6

The listing date of target domain stocks

Target domain	Listing date
TPG	January 13, 2022
AMLX	January 7, 2022
CRDO	January 27, 2022
ACDC	May 13, 2022

4.6 Additional experiments

In this section, four groups of experiments on newly listed stocks of the U.S. stock market are added to evaluate the performance of the AUDA-Net model in the trend prediction of newly listed stocks. As in Section 4.5, the LSTM and CNN models are excluded, since the lack of training samples prevents them from being used to predict the trend of newly listed stocks. The newly listed stocks selected as the target domains are TPG Inc. (TPG), Amylyx Pharmaceuticals, Inc. (AMLX), Credo Technology Group Holding Ltd (CRDO) and ProFrac Holding Corp. (ACDC). The specific settings of the source and target domains in each group of experiments are presented in Table 5. To simulate the real-world trend prediction scenario of newly listed stocks, the target domain consists of the four months of data following the listing. Of these four months, the data for the first two months are used as the training set and the remaining data are used for testing. Table 6 shows the listing dates of the newly listed stocks used as target domains. The trend prediction accuracies obtained by each algorithm in these four groups of experiments are reported in Table 7.

Table 7
Trend predictive accuracy of each corresponding algorithm in each group of experiments

Source $\to$ Target	AUDA-Net	UDAB	ADDA	MAML	Prototypical networks
BABA $\to$ TPG	90.91%	78.79%	75.76%	66.65%	81.82%
INTC $\to$ AMLX	93.75%	71.88%	68.75%	68.75%	81.25%
EBAY $\to$ CRDO	90.91%	75.76%	81.82%	81.82%	72.73%
AMZN $\to$ ACDC	93.75%	75.00%	81.25%	71.88%	81.25%

As shown in Table 7, among all the algorithms applicable to the trend prediction problem of newly listed stocks, our proposed AUDA-Net model achieves the best trend prediction accuracy in all four groups, at least nine percentage points above the best comparison algorithm. The experimental results reported in Tables 2 and 7 indicate that the AUDA-Net model has significant advantages in predicting the trend of Small Sample Size (SSS) stocks, whether the target domain comes from large, long-established stocks or from newly listed stocks.

4.7 Analysis of experimental results

The above experimental results on the eight groups of datasets demonstrate that our proposed model achieves the best SSS trend prediction performance among all the experimental models. UDAB and ADDA, which are also adversarial unsupervised domain adaptation algorithms, have lower predictive accuracy than our proposed model and are less stable. The possible causes of these results are analyzed as follows:

Firstly, because there are too few samples in the target domain, it is difficult for the feature extractor to learn the underlying patterns of the target samples, and over-fitting occurs easily.

Secondly, the large gap between the number of source samples and the number of target samples leads to an imbalance between the domain categories. The classification results of the domain discriminator may be biased toward the source domain, so that the distributions of the source features and the target features remain different.

Thirdly, the imbalance of domain categories may lead to the situation that the class classifier trained on the source samples cannot be well applied to the target samples.

The predictive performance and stability of MAML and the Prototypical Networks, both of which are few-shot learning algorithms, are also worse than those of our proposed model. The possible problems of the MAML algorithm are as follows:

Firstly, every task contains the same combination of sample categories, i.e., the rise and fall of the next day’s stock price. Consequently, it is difficult for the MAML algorithm to obtain a model with strong generalization ability by learning a large number of similar tasks.

Secondly, few-shot learning algorithms assume that, although the target classification task does not have a large number of samples, there are many similar classification tasks with a small number of samples each. However, this assumption does not hold for the SSS stock trend prediction problems studied in this paper.

Prototypical Networks may also face the following issue:

Prototypical Networks project all the training samples of the target domain into a shallow space, and calculate the mean of the projected samples of each class as the prototype of that class. The class of a test sample is then determined by the class of its nearest prototype. However, the number of training samples in the target domain is too small, which may lead to an inaccurate estimate of the prototype of each class in the target domain.
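The prototype rule described above can be illustrated with a minimal sketch. It assumes the embeddings have already been produced by some embedding network (not shown) and uses Euclidean distance to find the nearest prototype; the function names and the toy vectors are hypothetical and carry no experimental meaning.

```python
import math


def class_prototypes(embeddings, labels):
    """Average the embedded training samples of each class into one prototype."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def nearest_prototype(query, prototypes):
    """Assign the query embedding to the class of its closest prototype."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Toy 2-D embeddings (made up for illustration only).
train_embeddings = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]]
train_labels = ["fall", "fall", "rise", "rise"]
prototypes = class_prototypes(train_embeddings, train_labels)
prediction = nearest_prototype([0.9, 0.7], prototypes)
```

With only two samples per class, as in this toy case, a single atypical sample moves its class prototype by half of its own deviation, which is the small-sample sensitivity discussed above.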

Compared with the DL algorithms that use a large number of labeled samples, our proposed model achieves essentially the same level of predictive accuracy, and even higher accuracy in some groups of experiments. The possible reasons are analyzed as follows:

Firstly, in few-shot problems, the large gap between the numbers of source and target samples may cause traditional adversarial domain adaptation algorithms to suffer from the imbalance of domain categories. Therefore, to overcome the problem of insufficient samples in the target domain, we generate massive fake samples with the same distribution as the target samples by training a GAN model.

Secondly, to minimize the difference between the source and target domains, we train a feature extractor and a domain discriminator so that the features extracted from the source samples and from the target samples have the same probability distribution. Consequently, the classifier trained on the source samples can be well applied to the target samples.

5. Conclusions and future works

In recent years, there have been many innovative attempts in the field of stock trend forecasting. However, almost all of these studies address stocks with massive samples, and few consider newly listed stocks with only a few samples. To fill this gap in stock trend prediction, we propose a novel Adversarial Unsupervised Domain Adaptation Network (AUDA-Net) based on GAN.

The main advantages of our model are summarized as follows:

Firstly, newly listed stocks only have a small number of samples. Therefore, if we seek to effectively predict the trend of newly listed stocks, the first problem to be solved is the SSS problem. To this end, we train a GAN model to generate massive fake samples with the same distribution as the target samples, and then design the AUDA-Net model for trend prediction. Our proposed model can effectively solve the problem of insufficient samples of newly listed stocks, and accurately forecast their trend.

Secondly, we embed a domain adaptation sub-procedure into the representation learning process. By minimizing the difference between the extracted source features and the extracted target features, the class classifier trained on the source samples can make reasonable and effective trend predictions for the target samples.

Thirdly, only a small number of samples can be obtained in many other real application scenarios of time series classification. Our model can also be applied to such scenarios. Moreover, it does not need any labels for the target samples, which reduces the cost of labeling.

In future work, we will improve our model in the following aspects. Firstly, in this work we employ the most basic GAN model to generate transferable samples. In future research, we will attempt to replace it with improved variants, such as Conditional Generative Adversarial Nets (CGAN) and Wasserstein GAN (WGAN). Secondly, the single-source domain adaptation paradigm studied in this work will be extended to a multi-source one, with the aim of further improving SSS stock trend prediction.

Algorithm 1: The AUDA-Net Algorithm
Input: ${\bm{X}}_{s}$: source domain with massive samples, ${\bm{X}}_{s}=\left\{\left({\bm{x}}_{1}^{s},y_{1}^{s}\right),\left({\bm{x}}_{2}^{s},y_{2}^{s}\right),\ldots,\left({\bm{x}}_{N_{s}}^{s},y_{N_{s}}^{s}\right)\right\}$.
${\bm{X}}_{t}$: target domain with a few samples, ${\bm{X}}_{t}=\left\{\left({\bm{x}}_{1}^{t},y_{1}^{t}\right),\left({\bm{x}}_{2}^{t},y_{2}^{t}\right),\ldots,\left({\bm{x}}_{N_{\textit{tr}}}^{t},y_{N_{\textit{tr}}}^{t}\right)\right\}$.
Learning rates $\beta,\gamma$; free parameter $\alpha$.
Output: Feature extractor $F$, adapted class classifier $C$
Phase 1:
1:	for each iteration do
2:	for each mini-batch do
3:	Sample $n$ samples ${\bm{R}}$ from random noise, i.e., ${\bm{R}}=\left\{{\bm{r}}_{1},{\bm{r}}_{2},\ldots,{\bm{r}}_{n}\right\}$
4:	Sample $n$ samples ${\bm{X}}$ from the target domain ${\bm{X}}_{t}$, i.e., ${\bm{X}}=\left\{{\bm{x}}_{1},{\bm{x}}_{2},\ldots,{\bm{x}}_{n}\right\},\ {\bm{x}}_{i}\in{\bm{X}}_{t},\ i=1,2,\ldots,n$
5:	Compute the gradient of Eq. (4) for $\theta_{D}$ according to Eq. (15)
6:	Compute the gradient of Eq. (5) for $\theta_{G}$ according to Eq. (16)
7:	Update the parameters $\theta_{D}$ of the discriminator $D$ according to Eq. (20)
8:	Update the parameters $\theta_{G}$ of the generator $G$ according to Eq. (21)
9:	end for
10:	end for
Phase 2:
11:	According to Phase 1, generate a large number of fake samples with the same distribution as the target samples.
12:	The fake samples and real target samples are combined to form a new target domain ${\bm{X}}_{t}^{\prime}$, and each sample in the new target domain is labeled with a domain label, i.e., ${\bm{X}}_{t}^{\prime}=\left\{\left({\bm{x}}_{1}^{t},d_{1}^{t}\right),\left({\bm{x}}_{2}^{t},d_{2}^{t}\right),\ldots,\left({\bm{x}}_{N_{t}}^{t},d_{N_{t}}^{t}\right)\right\}$
13:	Label each sample in the source domain with a domain label to form a new source domain ${\bm{X}}_{s}^{\prime}$, denoted as ${\bm{X}}_{s}^{\prime}=\left\{\left({\bm{x}}_{1}^{s},y_{1}^{s},d_{1}^{s}\right),\left({\bm{x}}_{2}^{s},y_{2}^{s},d_{2}^{s}\right),\ldots,\left({\bm{x}}_{N_{s}}^{s},y_{N_{s}}^{s},d_{N_{s}}^{s}\right)\right\}$
14:	for each iteration do
15:	for each mini-batch do
16:	Sample $m$ samples ${\bm{S}}$ from the new source domain ${\bm{X}}_{s}^{\prime}$, i.e.,
	${\bm{S}}=\left\{\left({\bm{x}}_{1}^{s},y_{1}^{s},d_{1}^{s}\right),\left({\bm{x}}_{2}^{s},y_{2}^{s},d_{2}^{s}\right),\ldots,\left({\bm{x}}_{m}^{s},y_{m}^{s},d_{m}^{s}\right)\right\},\ \left({\bm{x}}_{i}^{s},y_{i}^{s},d_{i}^{s}\right)\in{\bm{X}}_{s}^{\prime},\ i=1,2,\ldots,m$
17:	Sample $m$ samples ${\bm{T}}$ from the new target domain ${\bm{X}}_{t}^{\prime}$, i.e.,
	${\bm{T}}=\left\{\left({\bm{x}}_{1}^{t},d_{1}^{t}\right),\left({\bm{x}}_{2}^{t},d_{2}^{t}\right),\ldots,\left({\bm{x}}_{m}^{t},d_{m}^{t}\right)\right\},\ \left({\bm{x}}_{i}^{t},d_{i}^{t}\right)\in{\bm{X}}_{t}^{\prime},\ i=1,2,\ldots,m$
18:	Compute gradients of Eq. (13) for $\theta_{F}$ and $\theta_{C}$ according to Eqs (17)–(18).
19:	Compute the gradient of Eq. (14) for $\theta_{D_{d}}$ according to Eq. (19).
20:	Update the parameters $\theta_{F}$ of the feature extractor $F$ according to Eq. (22).
21:	Update the parameters $\theta_{C}$ of the class classifier $C$ according to Eq. (23).
22:	Update the parameters $\theta_{D_{d}}$ of the domain discriminator $D_{d}$ according to Eq. (24).
23:	end for
24:	end for
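The data-preparation part of Phase 2 (steps 11–13 and 16–17) can be sketched as below. This is a minimal sketch under several assumptions, not the authors' implementation: `generate_fake` is a hypothetical stand-in for the generator trained in Phase 1, the domain labels 0 (source) and 1 (target) are an assumed encoding, and the number of fake samples is chosen here so that both domains end up the same size, whereas the algorithm only states that a large number of fake samples is generated. The gradient and update steps 18–22 are not covered.

```python
import random


def build_domains(source, target_real, generate_fake, n_fake):
    """Steps 11-13: augment the target domain and attach domain labels.

    source        : list of (x, y) pairs from the source stock
    target_real   : list of x vectors from the target stock
    generate_fake : hypothetical stand-in for the trained generator G;
                    called with no arguments, returns one fake sample
    Domain labels 0 (source) and 1 (target) are an assumed encoding.
    """
    fake = [generate_fake() for _ in range(n_fake)]
    new_target = [(x, 1) for x in target_real + fake]
    new_source = [(x, y, 0) for x, y in source]
    return new_source, new_target


def sample_batches(new_source, new_target, m, rng):
    """Steps 16-17: draw m samples from each domain for one mini-batch."""
    return rng.sample(new_source, m), rng.sample(new_target, m)


rng = random.Random(0)
# Made-up toy data: 100 labeled source samples, 3 real target samples.
source = [([float(i)], i % 2) for i in range(100)]
target_real = [[0.5], [0.6], [0.7]]


def jitter_generator():
    """Placeholder for G: a real target sample plus small Gaussian noise."""
    return [rng.choice(target_real)[0] + rng.gauss(0.0, 0.01)]


new_source, new_target = build_domains(
    source, target_real, jitter_generator, n_fake=len(source) - len(target_real)
)
batch_s, batch_t = sample_batches(new_source, new_target, m=8, rng=rng)
```

Drawing the same number `m` of samples from each domain means the domain discriminator sees balanced mini-batches, which is how the augmented target domain addresses the domain imbalance discussed in Section 4.7.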
