Abstract
Accurately predicting soybean futures fluctuations can benefit various market participants such as farmers, policymakers, and speculators. This paper presents a novel approach for predicting soybean futures prices that adds sequence decomposition and feature expansion to a Long Short-Term Memory (LSTM) model with dual-stage attention. Sequence decomposition is based on the Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) method, a technique for extracting sequence patterns and eliminating noise. Feature expansion generates technical indicators that enrich the input features of the model. Dual-stage attention is finally employed to learn the spatio-temporal relationships between the input features and the target sequence. The research is based on soybean contract trading data from the Dalian Commodity Exchange. The proposed method outperforms the comparison models and establishes a new benchmark for future price forecasting research in China’s agricultural futures market.
Keywords
Introduction
A futures contract for agricultural commodities establishes the price and quantity to be delivered in the future by buyers and sellers. One of its functions is to anticipate the future market price of the commodity, known as its price discovery function [1]. Trading on futures contracts in advance can also hedge against price risk. For instance, farmers can protect themselves against losses resulting from falling soybean prices during harvest time, while food manufacturers can ensure a steady supply of soybeans at a known price. Similar to other financial products, agricultural futures can also be profitable for both speculators and arbitrageurs [2]. Accurate forecasting of agricultural futures prices or trends will have a positive impact on all parties involved in agricultural futures trading.
Compared to international futures markets, China’s agricultural futures market has experienced more policy interventions, including the “yellow box” policy in 2014 and the later “green box” and “blue box” policies [3]. China’s agricultural futures trading is also not accessible to the rest of the world due to capital controls and investment restrictions [4]. These factors make China’s agricultural futures market of special value for research. Currently, there is a limited amount of research on forecasting agricultural commodity futures prices, with the majority focused on futures markets in developed countries [5]. The forecasting of Chinese agricultural commodity futures prices is still in its early stages. This paper focuses on the price forecasting of Chinese agricultural futures, particularly soybean futures. Because soybeans are a significant agricultural product in the global supply chain, price fluctuations could considerably impact producers and consumers [6]. Owing to its rapid development, China’s soybean futures market has become the global leader in soybean futures trading [7]. Chinese soybean supply and demand rely closely on the international soybean market. Research on China’s soybean futures price prediction is therefore pertinent to both national and international soybean futures forecasting research.
Soybean futures prices are a common example of financial time series. Because of their high noise and volatility, the predictability of financial time series has been a long-standing concern of financial market participants and researchers. Random walk theory is a fundamental principle of traditional financial theory. It assumes that price changes in financial markets are random, implying that future price movements cannot be predicted [8]. Nevertheless, this theory is limited in that it disregards the behavior and information of market participants and the irrationality present in the market. Market participant behavior and information significantly impact market price fluctuations, while irrational factors can also contribute to price volatility [9, 10]. Therefore, financial time series are not entirely random and are influenced by various factors. Financial market participants use two primary methods to predict future price movements: technical analysis and fundamental analysis [11]. Technical analysis examines market data charts to predict price movements, while fundamental analysis examines economic, corporate, and industrial data. Despite their limitations, these methods have been effective in practice and have made positive contributions to financial time series forecasting [12–14]. Recent years have seen numerous studies combining several methods, such as technical analysis [15, 16], econometric methods [17], and machine and deep learning [18], to forecast financial time series. As such, forecasting is regarded as a valuable research problem. Agricultural futures prices are a typical financial time series, yet they have been studied less than other types such as stocks and therefore call for more extensive and in-depth research.
The soybean market is affected by various factors, such as weather, supply and demand, spot dynamics, policies, and trade agreements. Certain impacts, like changes in weather [19], policy announcements [20], or investor sentiment [21], may not be easily quantifiable, given their discontinuous or textual nature. Furthermore, various unknown influences cannot be detected or captured, and some data sources are unstable and prone to abrupt changes. For instance, the content of news and commentary pages covering investor sentiment could be altered by content providers. Moreover, data processing entails substantial time and labor costs. These challenges impede the creation of an end-to-end agricultural futures price forecasting system. Consequently, researchers have paid considerable attention to forecasting methods that use historical traded price data. These methods typically rely on traditional technical analysis-based signal data. Nevertheless, the high noise present in pure price time series may negatively affect model performance [22]. More precisely, it increases the randomness and uncertainty of the time series, worsens the prediction error, and masks its signals, making it challenging for the model to capture the patterns and trends accurately. Additionally, the noise makes the time series vulnerable to the influence of outliers. Signal decomposition techniques can eliminate noise and generate multiple sub-sequences with varying frequencies to extract complex features. Sequence learning models can employ these features to capture implicit patterns within sequences effectively [23]. Thus, signal decomposition techniques have widely been used as an aid for time-series forecasting.
Many studies have adopted a “divide and conquer” approach to learning the sub-series obtained after data decomposition. The main idea is to decompose the original time series signal into multiple sub-sequences, use Neural Network (NN) models to learn and predict each sub-sequence separately, and finally reconstruct all the predicted values to obtain the final prediction. Guo et al. [24] proposed a hybrid model called VMD-ARIMA-TEF for financial time series prediction. The financial series is decomposed into sub-sequences using the Variational Mode Decomposition (VMD) algorithm, an Autoregressive Integrated Moving Average (ARIMA) model is built to predict the linear component of each mode, and a Taylor Expansion Forecasting (TEF) model based on a tracking differentiator is applied to predict the nonlinear component. The predictions of all sub-sequences are then aggregated to obtain the final prediction, and the empirical results show that the model outperforms several existing hybrid models. Yang et al. [25] proposed a hybrid approach combining Long Short-Term Memory (LSTM) with Ensemble Empirical Mode Decomposition (EEMD). EEMD decomposes the original stock price time series into multiple smooth, regular, and stable sub-sequences; each sub-sequence is then trained and predicted with an LSTM, and the sub-sequence predictions are fused to obtain the final stock price prediction, which was shown experimentally to have higher accuracy. The problem with this approach is that, after the time series signal is decomposed into multiple sub-sequences, the neural network model must be trained and used for prediction on each sub-sequence separately, which is computationally intensive.
Moreover, what is most likely to affect the prediction performance is that there may be certain interactions between the components of different frequencies, and these interactions may be ignored during the model learning process, resulting in partial loss of information. For example, in financial markets, certain events may affect time series components at multiple frequencies simultaneously, and the effects of these events may be dispersed or ignored in the signal decomposition process.
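The “divide and conquer” workflow described above can be sketched in a few lines. This is a minimal illustration only, under loud assumptions: the hypothetical `decompose` function is a toy trailing moving-average split standing in for EMD or VMD, and the hypothetical `fit_ar1` forecaster stands in for the per-component ARIMA or LSTM models used in the cited studies.

```python
from typing import List


def decompose(series: List[float], window: int = 3) -> List[List[float]]:
    """Toy additive decomposition (assumption: stands in for EMD/VMD).

    Returns a trailing moving-average trend and the remainder;
    the two components sum back to the input series.
    """
    trend = []
    for t in range(len(series)):
        lo = max(0, t - window + 1)
        trend.append(sum(series[lo:t + 1]) / (t + 1 - lo))
    remainder = [x - m for x, m in zip(series, trend)]
    return [trend, remainder]


def fit_ar1(component: List[float]) -> float:
    """Least-squares AR(1) coefficient without intercept: x[t] ~ phi * x[t-1]."""
    num = sum(a * b for a, b in zip(component[1:], component[:-1]))
    den = sum(b * b for b in component[:-1])
    return num / den if den else 0.0


def divide_and_conquer_forecast(series: List[float]) -> float:
    """Forecast each component separately, then sum the component forecasts."""
    components = decompose(series)
    predictions = [fit_ar1(c) * c[-1] for c in components]
    return sum(predictions)


if __name__ == "__main__":
    prices = [100.0, 101.5, 101.0, 102.2, 103.0, 102.6, 104.1]
    print(divide_and_conquer_forecast(prices))
```

Each component is modeled in isolation here, so any interaction between components is invisible to the forecasters, which is exactly the information loss discussed above.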
Technical indicators generated from historical trading data, including opening price, closing price, high price, low price, and volume, have also been used in many studies as augmented features of price series to achieve more stable and accurate predictions [16, 26]. Shynkevich et al. [27] used technical indicators as machine learning input variables to predict stock price trends. They investigated the effect of different input window sizes on price direction prediction and found that the best results are obtained when the input window is roughly equal to the prediction horizon. Hsu et al. [28] proposed a hybrid approach based on back-propagation neural networks, genetic programming, and feature selection techniques that incorporates technical indicator generation, and verified that technical indicators improve prediction accuracy. Thus, technical indicators are an effective means of improving prediction accuracy in financial time series forecasting and can be used for feature enhancement.
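As an illustration of feature expansion with technical indicators, the sketch below derives three common indicators from a closing-price series. The choice of indicators (simple moving average, rate of change, and a simple-average RSI variant), the window lengths, and all function names are assumptions made for this example; they are not necessarily among the indicators used in the studies cited above or in this paper.

```python
from typing import Dict, List, Optional

Series = List[Optional[float]]


def sma(close: List[float], n: int) -> Series:
    """Simple moving average; None until n observations are available."""
    return [None if t + 1 < n else sum(close[t + 1 - n:t + 1]) / n
            for t in range(len(close))]


def roc(close: List[float], n: int) -> Series:
    """Rate of change over n periods, in percent."""
    return [None if t < n else 100.0 * (close[t] - close[t - n]) / close[t - n]
            for t in range(len(close))]


def rsi(close: List[float], n: int) -> Series:
    """Relative strength index from simple sums of gains and losses
    over the last n price changes (no exponential smoothing)."""
    out: Series = [None] * len(close)
    for t in range(n, len(close)):
        changes = [close[i] - close[i - 1] for i in range(t - n + 1, t + 1)]
        gain = sum(c for c in changes if c > 0)
        loss = sum(-c for c in changes if c < 0)
        # Convention chosen here: no losses in the window gives RSI = 100.
        out[t] = 100.0 if loss == 0 else 100.0 - 100.0 / (1.0 + gain / loss)
    return out


def expand_features(close: List[float]) -> Dict[str, Series]:
    """Build exogenous feature sequences aligned with the closing price."""
    return {"sma_3": sma(close, 3), "roc_1": roc(close, 1), "rsi_4": rsi(close, 4)}


if __name__ == "__main__":
    for name, values in expand_features([10.0, 11.0, 12.0, 11.0, 13.0]).items():
        print(name, values)
```

The leading `None` entries mark the warm-up period of each indicator; in practice those rows would be dropped before the features are passed to a model.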
In this paper, we propose a combined learning method, SDFE-DALSTM, for predicting soybean futures prices in China. It is based on CEEMDAN sequence decomposition, feature expansion with technical indicators, and a dual-stage attention mechanism. First, we perform feature expansion on daily futures trading data, i.e., we generate technical indicators, which are added to the dataset as exogenous sequences. We then decompose the prediction target sequence using CEEMDAN to obtain intrinsic mode functions, which also become part of the dataset. Finally, the dataset is fed into an LSTM with a dual-stage attention mechanism to produce predictions for soybean futures prices. The dual-stage attention consists of two stacked layers of spatial attention and a single layer of temporal attention. The attention layers have different input sequences and focus on different feature sequences. Spatial attention mines the relationships among the different sequences, as well as between the exogenous sequences and the target sequence, to overcome the information loss and relationship neglect of the divide and conquer approach. Temporal attention extracts temporal dependencies. The two stages work together to uncover different implicit relationships, thereby improving model performance. We hypothesize that by incorporating CEEMDAN-decomposed subsequences and technical indicators, LSTM combined with dual-stage attention can better capture the implied features of soybean futures price series and improve forecasting performance. We compare its performance with a variety of single baseline models and combined methods from previous studies, and discuss the contribution of each component of our proposed model to the forecasting performance in order to validate the hypothesis. The results of this paper can be of interest to soybean futures traders and provide a reference for quantitative trading methods.
This work also complements the research on soybean futures price forecasting in China and extends the combined forecasting method. The contributions of this study are as follows:
(1) We propose a relatively novel SDFE-DALSTM model that, for the first time, applies signal decomposition, technical indicators, and a deep learning model based on a dual-attention mechanism to the prediction of soybean futures prices in China.
(2) The proposed model generates technical indicators to enrich the feature sequences and decomposes the target sequence into subsequences of different patterns with CEEMDAN; the resulting dataset is fed into the LSTM prediction model with dual attention for prediction.
(3) We compare the model with a variety of single and combined models, where it leads in the MSE, MAE, RMSE, and R2 metrics, and perform ablation experiments to illustrate the positive contribution of each component. The prediction results are also validated using the Friedman test and the Mann-Whitney U test, which demonstrate the predictive performance advantages of the proposed model.
Our study and methodology make a positive addition to the field of soybean futures research in China. The proposed methodology can also be applied to the forecasting of other agricultural futures and extended to other financial time series forecasting problems.
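For reference, the four evaluation metrics named above can be computed as in the following sketch. It uses the standard textbook definitions; the function name `regression_metrics` is hypothetical, and the sketch makes no claim about how the metrics were implemented in the experiments.

```python
import math
from typing import Dict, List


def regression_metrics(y_true: List[float], y_pred: List[float]) -> Dict[str, float]:
    """Mean squared error, mean absolute error, root MSE, and R^2."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot  # undefined when y_true is constant
    return {"MSE": mse, "MAE": mae, "RMSE": math.sqrt(mse), "R2": r2}


if __name__ == "__main__":
    print(regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

Lower MSE, MAE, and RMSE indicate smaller errors, while R2 closer to 1 indicates that more of the variance of the target is explained.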
The rest of the paper is organized as follows. Section 2 describes work related to our research topic. Section 3 elaborates the proposed soybean futures price prediction model. In Section 4, the proposed model is applied to the daily prices of soybean futures contracts on the Dalian Commodity Exchange (DCE) in China and compared with benchmark models to illustrate and validate the performance and effectiveness of our method. Finally, Section 5 summarizes the paper and outlines the main directions for future research.
Related work
In this section, we present and evaluate prior work related to our study by category, and then position our methodology in the context of the challenges encountered in that work. Prior work can be divided into two groups. The first comprises financial time series forecasting studies that use a single model, which fall under the following categories of approaches:
(1) Traditional econometric techniques. They are mainly represented by AutoRegressive (AR) models [29, 30], ARIMA models and Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) models. These methods have relatively simple training and computational procedures and are able to adequately characterize the linear relationship between time series prices, and therefore can be referred to as traditional linear financial time series forecasting methods. These methods are widely used and have been shown to be effective in many studies. For example, Wirawan et al. [31] used an autocorrelation graph method and an ARIMA model to forecast the price of bitcoin, and finally found that the ARIMA (4,1,4) model was able to predict the price of bitcoin for one to seven days in the future with a high degree of accuracy; Nguyen et al. [32] proposed an intelligent system based on a time series model using an automated ARIMA model for short-term price forecasting for seasonal product forecasting and using Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE) as performance evaluation metrics; Karia et al. [33] investigated the problem of non-stationarity in the long-term data of Crude Palm Oil (CPO) price and proposed the use of the AutoRegressive Fractionally Integrated Moving-Average (ARFIMA) model as a solution. The study uses daily historical free-on-board CPO prices from Malaysia and compares the ARFIMA model with the ARIMA model and shows that the ARFIMA model outperforms the ARIMA model. Garcia et al. [34] propose an electricity price forecasting methodology based on the GARCH methodology to help producers and consumers in the electricity market make bidding strategies and negotiation decisions to maximize profits. The study uses empirical data from mainland Spain and California, and the GARCH model is explained and discussed in detail. 
However, the main drawback of these traditional econometric methods is that they rely on stringent assumptions, such as linear relationships and stationarity. Most financial time series, especially agricultural futures price series, are nonlinear, so these methods cannot capture the hidden nonlinear features of real-world price series [35, 36].
(2) Machine learning methods. With the continuous improvement of computing power and data volume, machine learning techniques have become a powerful tool for solving financial time series forecasting problems. Traditional machine learning methods mainly include Support Vector Machines (SVM), Decision Trees and Random Forests. These methods are able to adaptively adjust the forecasting model by learning patterns and features in the data to overcome the shortcomings of traditional linear financial time series forecasting. There are many studies that have used these techniques to solve some problems. Shen et al. [37] proposed a dynamic financial distress prediction model based on random forest and time weighting. The empirical results show that the model outperforms the SVM with high average accuracy on sample data of 41 financial and non-financial metrics from 324 Chinese listed companies. Zi et al. [38] proposed a predictive model based on weighted random forest and ant colony algorithms to effectively analyze and study the financial market and improve the expected returns. Hong et al. [39] studied the European carbon emission trading market, modeled the prediction of carbon emission prices using a variety of statistical methods, and found that the past returns of commodities and financial products have a significant impact on the prediction of current CO2 prices. The results of the study show that bagged decision trees with integrated classifiers are the most accurate in predicting CO2 price movements, which is informative for companies wishing to trade European carbon emissions. The advantages of these methods are that the models are more interpretable, require less data, and are easy to implement and interpret. 
However, these methods have some problems. First, they have a weak ability to deal with serial and long-term dependence in time series and lack mechanisms for learning temporal dependence. Second, they involve complex feature extraction, which requires sufficient domain knowledge and specialized engineering techniques and makes the method construction process more complicated [40].
(3) Deep learning and neural network methods. Deep learning techniques have made significant achievements in areas such as Computer Vision and Natural Language Processing [41, 42]. Methods such as Restricted Boltzmann Machines [43], Deep Belief Networks [44], and Autoencoders [45] can be used for financial time series forecasting. Among them, the Recurrent Neural Network (RNN) can handle time series data because it retains an internal state to model dependencies within a sequence and can process sequences of arbitrary length, and thus dynamic and variable-length inputs. Trained with the backpropagation algorithm, RNNs can capture long-term dependencies in time series data and can therefore address the problems that traditional machine learning faces in time series prediction. To cope with exploding or vanishing gradients in RNNs when dealing with long time series, variants such as LSTM and the Gated Recurrent Unit (GRU) have been proposed. They use gating mechanisms to regulate the information flow and perform linear operations on neuron states, which lets them extract more useful information from historical data while mitigating the gradient problem. Ameur et al. [46] explored the potential of deep learning algorithms for predicting commodity prices. The study used the Bloomberg Commodity Index and its five sub-indices and found that the LSTM model is an effective predictive tool, and that the Bloomberg Livestock Sub-Index and the Industrial Metals sub-index are superior in assessing other commodity indices. These results have important implications for risk management by investors and public policy adjustment by policy makers. Rodgers et al. [47] present a machine learning-based approach for generating unbiased forecasts of future price distributions in techno-economic analysis and demonstrate it using a neural network model constructed with 100 LSTM layers.
They find statistically significant differences in the corrected price distributions, which highlights the shortcomings of non-rigorous approaches. The innovation of this work is an unbiased machine learning method for predicting long-term probabilistic prices for techno-economic analysis. Ozdemir et al. [48] present deep learning algorithms based on LSTM and GRU networks for predicting changes in the price of nickel. The results show that both networks perform well, with Mean Absolute Percentage Errors (MAPE) of 7.060% and 6.986%, respectively, and that the GRU network is on average 33% faster in computation than the LSTM network.
The second category of forecasting methods is the hybrid approach, which combines different models and algorithms to fully exploit the implicit features in complex time series data for improved accuracy [49, 50]. Whereas a single model may consider only some factors and fail to utilize all kinds of information, hybrid models have shown promising results in financial time series forecasting. One common method is to combine signal decomposition techniques with deep learning models. For instance, Yang et al. [51] proposed a hybrid model that combines EMD with Back-Propagation Neural Networks (BPNN) to forecast crude oil prices. They first decomposed the crude oil price data using the EMD method into a series of independent Intrinsic Mode Functions (IMFs) and a residual sequence, and then applied the BPNN to forecast Brent and WTI crude oil prices. The experimental results demonstrate that the EMD-BPNN model achieves better prediction accuracy than the single BPNN model, as well as other models such as Least Squares Support Vector Regression (LSSVR) and EMD-LSSVR. Similarly, Shu et al. [52] proposed a hybrid model that combines EMD with Convolutional Neural Networks (CNNs) and LSTM to predict the stock prices of the Shanghai Composite Index. The model achieves better forecasting performance by modeling different frequency components. In addition, Jin et al. [53] proposed a deep learning-based model that utilizes EMD and BPNN to predict financial market trades by automatically mining the statistical patterns of the data. The EMD-based deep learning model demonstrates excellent predictive performance and can effectively predict the future trend of financial market prices. Although Empirical Mode Decomposition (EMD) [54] has advantages in time series analysis, it also has some limitations. EMD-generated mode functions may not be stable and may change with slight variations in the data, leading to unstable results.
In addition, EMD is sensitive to noise and may decompose noise into spurious modes. To address these issues, various improvements have been proposed, such as Ensemble Empirical Mode Decomposition (EEMD), Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) [55], and Variational Mode Decomposition (VMD), which aim to enhance the decomposition stability and reduce the impact of noise on the decomposition results. These improved methods have shown promising results in various applications and have been widely used in time series analysis [56–58]. However, when EMD or one of its improved variants is used, the relationship between the subsequences and the original target sequence, as well as the relationships among the subsequences themselves, should be considered, because multiple subsequences may jointly affect the price changes. This is one of the main challenges faced by existing EMD-based forecasting methods.
Deep learning time series prediction models, such as RNN, LSTM, and GRU, have limited capacity to effectively capture relevant features and inter-feature relationships from multivariate time series. Previous studies have proposed hybrid models that incorporate CNN mechanisms to address this issue. For example, Wang et al. [59] proposed the CNN-BiSLSTM model, which utilizes CNN to extract high-level features affecting stock prices and a Bidirectional Special Long Short-Term Memory neural network (BiSLSTM) to predict the closing stock prices for the Shenzhen Component Index. Similarly, Livieris [60] presented a deep learning prediction model that uses a convolutional layer to extract useful information and learn the internal representation of time series data, and an LSTM layer to identify short-term and long-term dependencies, in order to accurately predict the price and trend of gold. Experimental results demonstrate better prediction performance than other deep learning and machine learning models. Li et al. [61] addressed the difficult-to-predict fluctuation of the Bitcoin price with a hybrid neural network model based on CNN and LSTM, and showed that using CNN for feature extraction can effectively improve the accuracy of short-term Bitcoin price prediction.
However, attention mechanisms have attracted growing interest in time series prediction and are considered to yield better prediction results than CNN. The attention mechanism can adaptively select the important information in the time series and better capture the long-term and short-term dependencies between the series, improving the prediction accuracy. Moreover, it can assign different weights to different time series, thus better handling the heterogeneity between different time series and improving the generalization ability of prediction. Peng et al. [62] proposed a sequential prediction model based on an embedding mechanism and a two-stage attention long short-term memory neural network for medium- and long-term residential energy demand forecasting, which can efficiently extract key features highly correlated with the dynamics of energy consumption and outperforms other models in terms of long-term forecasting capability. Gu et al. [63] applied the attention mechanism to the prediction of cabbage and radish prices in the South Korean market. They proposed a Dual-Input Attentional Long Short-Term Memory (DIA-LSTM) model, which is trained using various variables such as meteorological data and transaction volume data. The DIA-LSTM model achieved higher prediction accuracy compared to traditional and benchmark models. Chen et al. explored stock price prediction in the Hong Kong stock market using the attention mechanism with an LSTM network. The attention mechanism greatly improved the prediction performance compared to the plain LSTM model, which verified its effectiveness in LSTM-based prediction methods.
Although attention-based methods have demonstrated promising results in time series prediction, they have limitations in capturing feature sequence relationships for multiple inputs, same-moment spatial relationships for different attributes, relationships between multiple attributes across time, and temporal relationships between different time series. To address these challenges, and inspired by the work of Liu et al. [64], we propose a soybean futures price prediction model, called SDFE-DALSTM, that combines sequence decomposition and feature expansion with a dual-stage attention based LSTM for the final prediction. The proposed model consists of three stages. First, we generate 53 technical indicators based on 5 basic indicators from historical trading data and add them to the input feature set. This approach enables us to automatically construct a high-dimensional feature set. Next, we use CEEMDAN to decompose the closing price, which is the target sequence of our forecast, and add the resulting subsequences to the input feature set of the model. Finally, we input the feature set into an LSTM model with dual attention blocks for learning. The dual-stage attention mechanism, which includes a two-layer spatial attention module and a temporal attention module, learns the temporal, spatial, and more complex spatio-temporal relationships between the sequences in the feature dataset containing the CEEMDAN subsequences and the technical indicators. The proposed SDFE-DALSTM model is expected to overcome the limitations of existing attention-based methods and effectively capture the complex relationships between the multiple inputs, attributes, and time series. By combining sequence decomposition and feature expansion methods, we can extract more meaningful information from the input feature set and improve the prediction accuracy.
The experimental results will demonstrate the effectiveness of the proposed model in soybean futures price prediction.
Methodology
Our paper presents SDFE-DALSTM, a new end-to-end method for forecasting soybean futures prices based on sequence decomposition and feature expansion. Our method combines CEEMDAN-based sequence decomposition and extended technical indicators to automatically construct a high-dimensional feature set. Meanwhile, a dual-stage attention mechanism on top of an encoder-decoder framework learns spatio-temporal relationships between input features. Specifically, we use CEEMDAN to decompose the historical values of soybean futures closing prices into simpler subsequences, each of which provides features with different physical significance. At the same time, we generate a set of extended technical indicators based on five basic indicators of historical trading prices to enrich the input feature set. In the DALSTM part, the first stage of the dual-stage attention, the stacked spatial attention block, acts on the encoder. At this stage, our method adaptively selects the sequences relevant to the target and mines the relationships between the sequences. Unlike the “divide and conquer” approach, which treats each sub-sequence as independent, our method links the sub-sequences together, which helps to capture more robust spatial relationships. At each step, the temporal attention component of the decoder, which focuses on long-term dependencies, automatically selects the appropriate encoder hidden states. By using dual-stage attention, our approach captures the spatio-temporal information of the input sequences, including the mixed relations between the exogenous and target sequences. Figure 1 illustrates the entire operational flow of our proposed SDFE-DALSTM model.
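The two attention stages can be illustrated with a heavily simplified sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the scoring vector `scorer` and the states are fixed numbers rather than learned LSTM quantities, the spatial weights are computed once per window rather than per time step, and only a single spatial layer is shown instead of the stacked block.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax; the outputs are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def spatial_attention(window: List[List[float]],
                      scorer: List[float]) -> Tuple[List[float], List[List[float]]]:
    """Weight each input series (IMF, indicator, ...) by its relevance.

    window[k] holds the T values of series k; scorer is a length-T vector
    that would be learned in a real model. Returns the weights over series
    and the reweighted inputs, indexed as weighted[t][k].
    """
    alpha = softmax([math.tanh(dot(series, scorer)) for series in window])
    steps = len(window[0])
    weighted = [[alpha[k] * window[k][t] for k in range(len(window))]
                for t in range(steps)]
    return alpha, weighted


def temporal_attention(encoder_states: List[List[float]],
                       decoder_state: List[float]) -> Tuple[List[float], List[float]]:
    """Weight encoder hidden states across time and build a context vector."""
    beta = softmax([dot(h, decoder_state) for h in encoder_states])
    dim = len(encoder_states[0])
    context = [sum(beta[t] * encoder_states[t][d] for t in range(len(encoder_states)))
               for d in range(dim)]
    return beta, context


if __name__ == "__main__":
    alpha, _ = spatial_attention([[1.0, 2.0, 3.0], [0.1, 0.0, -0.1]], [0.2, 0.2, 0.2])
    beta, context = temporal_attention([[1.0, 0.0], [0.0, 1.0]], [0.5, 1.0])
    print(alpha, beta, context)
```

The spatial stage decides which series matter, while the temporal stage decides which time steps matter; a full model would feed the reweighted inputs through an LSTM encoder and pass the context vector to the decoder.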

Figure 1. Overall framework.
EMD is a data-driven adaptive signal decomposition method that decomposes a signal into a set of IMFs, each representing a signal component at one scale. EEMD is an improvement of EMD. EEMD mitigates mode aliasing by exploiting the uniform frequency distribution of Gaussian white noise to automatically assign signals of different time scales to appropriate reference scales [54]. Because the added noise has zero mean, it cancels out after averaging over many trials, which significantly mitigates mode aliasing. Applying EEMD, we can decompose the target sequence into several accurate and meaningful IMFs as described in Eq. (1). Compared with conventional EMD, EEMD greatly reduces mode aliasing and provides more accurate and interpretable IMFs for further research.
Where r is the residue component, IMF is the intrinsic mode function component, and n is the count of IMF. The main steps in the EEMD procedure are listed below in Table 1.
The procedure of the EEMD process
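The claim that zero-mean noise cancels under ensemble averaging can be checked numerically. The following is a minimal sketch, not the EEMD algorithm itself: it skips the sifting step entirely and only averages noisy copies of a toy signal. The function names, the noise level of 0.5, and the trial counts are illustrative assumptions.

```python
import math
import random


def ensemble_average(signal, trials, noise_std, seed=0):
    """Average `trials` copies of `signal`, each perturbed by zero-mean Gaussian noise."""
    rng = random.Random(seed)
    total = [0.0] * len(signal)
    for _ in range(trials):
        for i, value in enumerate(signal):
            total[i] += value + rng.gauss(0.0, noise_std)
    return [s / trials for s in total]


def rms_error(a, b):
    """Root mean square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Toy signal: one slow sine wave sampled at 200 points.
signal = [math.sin(2 * math.pi * t / 50) for t in range(200)]

err_few = rms_error(ensemble_average(signal, trials=4, noise_std=0.5), signal)
err_many = rms_error(ensemble_average(signal, trials=400, noise_std=0.5), signal)
print(f"residual noise with 4 trials:   {err_few:.4f}")
print(f"residual noise with 400 trials: {err_many:.4f}")
```

The residual noise shrinks roughly with the square root of the number of trials, which is why an ensemble of several hundred noisy decompositions leaves little trace of the injected noise in the averaged IMFs.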
CEEMDAN is a further improvement of EEMD. It not only accounts for the effect of noise but also addresses the instability of EEMD on non-stationary signals, where the decomposition results are strongly affected by the added noise. CEEMDAN models the effect of noise by adding noise in stages and performs one decomposition for each noise level, which reduces the effect of noise on the decomposition result to some extent. By averaging over multiple decompositions, CEEMDAN improves the stability of the decomposition, and by using the complete ensemble method, it reduces mode aliasing, so that the signal is decomposed more accurately.
Applying a decomposition algorithm at the wrong point in the pipeline may lead to information leakage. If the signal is decomposed before the whole dataset is divided into training and test sets, IMFs that are not present in the training set may appear in the test set. These IMFs may contain feature information related to the test set, which distorts the accuracy and generalization ability measured on the test set. Therefore, as shown in Fig. 2, we apply CEEMDAN to the target sequences only after the dataset has been divided into training and test sets. For example, the first 80% of the dataset is used as the training set, and performing an independent CEEMDAN decomposition on the target sequence of the training set ensures that the training set does not contain any information from the test set. For the test set, we mimic practical use, in which the training data and the part of the test data observed so far are known. We therefore merge the training set and the test set (i.e., the future data that gradually becomes past data), decompose the merged sequence, and keep only the portion of each subsequence that falls in the test interval as the decomposition result of the test set. This avoids information leakage caused by incorrect use of CEEMDAN.

Sequence Decomposition Methods in SDFE-DALSTM.
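The split-then-decompose order described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: `decompose` is a hypothetical stand-in that splits a series into a trailing-mean trend and a detail part, where a real pipeline would call a CEEMDAN implementation and obtain several IMFs plus a residual.

```python
def decompose(series, window=5):
    """Hypothetical stand-in for CEEMDAN: trailing-mean trend plus detail part."""
    trend = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        trend.append(sum(chunk) / len(chunk))
    detail = [x - t for x, t in zip(series, trend)]
    return {"detail": detail, "trend": trend}


def split_then_decompose(series, train_ratio=0.8):
    """Split chronologically first, then decompose each part as described above."""
    n_train = int(len(series) * train_ratio)
    train = series[:n_train]

    # Training features: the decomposition sees the training segment only.
    train_parts = decompose(train)

    # Test features: decompose the merged history, keep only the test interval.
    merged_parts = decompose(series)
    test_parts = {name: values[n_train:] for name, values in merged_parts.items()}
    return train_parts, test_parts


# Toy price path with a mild trend and a periodic bump.
prices = [100 + 0.5 * t + (3 if t % 7 == 0 else 0) for t in range(50)]
train_parts, test_parts = split_then_decompose(prices)
print(len(train_parts["trend"]), len(test_parts["trend"]))  # 40 10
```

The stand-in happens to be causal, so the order of operations would not change its output. Ensemble decompositions use the whole input series when extracting each component, which is why the order matters for them.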
The next step is to expand the feature set by generating technical indicators. This step can run in parallel with the sequence decomposition: CEEMDAN only decomposes the target sequence (the “close” price in our study) and appends the resulting subsequences to the dataset without changing the original sequence, so it does not affect the computation of the technical indicators. We derive additional technical indicators from the five basic transaction values. In high-dimensional datasets, certain problems can be solved easily and explicitly thanks to the “blessing of dimensionality” [65], and this paper exploits that phenomenon when extending the feature set. Fifty-three new technical indicators were created from the five basic features of the soybean futures price time series: “high”, “low”, “open”, “close”, and “volume”. Each generated indicator belongs to one of six functional groups: Overlap Studies, Momentum, Volatility, Volume, Price Transform, and Cycle Indicators. The names of the specific technical indicators are given in Table 2.
All technical indicators in the feature set
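As an illustration of feature expansion, the sketch below derives two indicators from a closing-price column and drops the warm-up rows where an indicator is still undefined. It is a dependency-free stand-in: the helper names `sma` and `roc`, the periods, and the toy prices are assumptions, and the paper's 53 indicators are generated with a technical analysis library rather than hand-written functions.

```python
def sma(values, period):
    """Simple moving average; positions without a full window are None."""
    out = [None] * len(values)
    for i in range(period - 1, len(values)):
        out[i] = sum(values[i - period + 1: i + 1]) / period
    return out


def roc(values, period):
    """Rate of change in percent relative to the value `period` steps earlier."""
    out = [None] * len(values)
    for i in range(period, len(values)):
        out[i] = (values[i] / values[i - period] - 1.0) * 100.0
    return out


close = [10.0, 11.0, 12.0, 11.0, 13.0, 14.0]
features = {
    "close": close,
    "sma_3": sma(close, 3),  # overlap-study style indicator
    "roc_2": roc(close, 2),  # momentum style indicator
}

# Keep only the rows where every indicator is defined.
rows = [
    {name: column[i] for name, column in features.items()}
    for i in range(len(close))
    if all(column[i] is not None for column in features.values())
]
print(len(rows), rows[0])
```

Each indicator needs some history before it produces a value, so expanding the feature set shortens the usable sample by the longest warm-up period.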
Conventional neural networks, such as feed-forward neural networks or Convolutional Neural Networks (CNNs) [66], are not well suited to sequential problems: the current state in a sequence is correlated with previous states, and these architectures have no mechanism for carrying state across time steps. The RNN was proposed specifically to solve sequence learning problems. Its defining feature is recurrent connectivity, which lets it update the current state based on the past state and the current input. RNNs are therefore frequently applied in speech recognition, time series analysis, machine translation, and other natural language processing problems. Figure 3 illustrates the simple structure of an RNN. However, because of its simple structure and its parameter updating algorithm, backpropagation through time, the RNN is prone to vanishing and exploding gradients, which limits its ability to handle long-term dependencies. Vanishing and exploding gradients refer to gradients that shrink or grow progressively during backpropagation as a result of the chain rule, leading to slow or non-convergent training. The problem is particularly evident in RNNs because the recurrent structure propagates gradients across time steps. LSTM is an improved RNN model that introduces memory cells together with forget, input, and output gates to control the flow and loss of features (as shown in Fig. 4). This architecture greatly mitigates the gradient problems commonly experienced by traditional RNNs. The GRU merges the forget and input gates of the LSTM into a single update gate and adds a reset gate; together they specify how much past information is kept in the current prediction. Compared with the LSTM, the GRU has fewer parameters and is therefore faster to train and may require less data to generalize [67].

Simple structure of recurrent neural network.

The structure of long short-term memory cells.
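For reference, the gate mechanism of the LSTM cell is commonly written as below. This is the standard textbook formulation, not notation taken from this paper: the weight and bias symbols $W_F, W_I, W_O, W_S$ and $b_F, b_I, b_O, b_S$ are assumptions chosen to avoid clashing with the attention parameters used later, $s_t$ denotes the cell state, $\sigma$ the logistic sigmoid, and $\odot$ the element-wise product.

```latex
\begin{aligned}
f_t &= \sigma\left(W_F\,[h_{t-1};\, x_t] + b_F\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_I\,[h_{t-1};\, x_t] + b_I\right) && \text{(input gate)} \\
o_t &= \sigma\left(W_O\,[h_{t-1};\, x_t] + b_O\right) && \text{(output gate)} \\
s_t &= f_t \odot s_{t-1} + i_t \odot \tanh\left(W_S\,[h_{t-1};\, x_t] + b_S\right) && \text{(cell state)} \\
h_t &= o_t \odot \tanh(s_t) && \text{(hidden state)}
\end{aligned}
```

The cell state is updated additively, so the gradient with respect to $s_{t-1}$ passes through the forget gate $f_t$ rather than through repeated multiplication by a recurrent weight matrix, which is the usual explanation for why the LSTM handles long-term dependencies better than a plain RNN.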
Futures price series contain complex feature patterns, which may need to be extracted by mining spatio-temporal relationships. By adding a hierarchical multistage attention mechanism on top of an RNN, the dual-stage attention-based LSTM can effectively extract features from the time series, capture long-term temporal dependencies, and selectively identify the pertinent encoder hidden states at each time step.
Figure 5 illustrates the structure of the proposed SDFE-DALSTM. The left side of Fig. 5 shows the construction of the input feature set, which is represented by “SDFE” in the model name. This refers to sequence decomposition and feature expansion, as described in the previous subsections. Before entering model training, the input feature set is segmented with the sliding time window method (Fig. 6), which divides the long time series into shorter subsequences that are analyzed and modeled individually. We set the window size T to 10 and the forecasting step to 1, 3, or 5; that is, we use the model to forecast values 1, 3, or 5 days into the future, covering the range of short-term forecasts. The red and blue boxes in Fig. 5 represent the two parts of “DALSTM”: the dual-attention encoder module, which extracts spatial features, and the decoder module, which learns temporal features using the attention mechanism. The model concatenates the values of the target price sequence $y_T$ for the corresponding time steps to the input of both the encoder’s second level of attention and the decoder’s attention. In this way, the temporal and spatial relationships between the target values and the associated sequences can be learned while maintaining the temporal dependence [64]. The encoder and decoder are essentially LSTMs that encode the input sequence into a machine feature representation [68]. The multiple attention mechanisms in the model capture the most relevant input features and extend their spatio-temporal relationships to learn the implicit factors and patterns of soybean futures prices.

The SDFE-DALSTM model.

The slide time windows method.
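The sliding time window construction can be sketched as follows. The function name `make_windows` and the toy data are assumptions, and the sketch reads a forecasting step of h as "predict the single value h days after the window ends", which is one plausible interpretation of the setup described above.

```python
def make_windows(features, target, window=10, horizon=1):
    """Pair each `window`-step slice of inputs with the target `horizon` steps ahead."""
    samples = []
    last_start = len(target) - window - horizon
    for start in range(last_start + 1):
        end = start + window                # exclusive end of the input window
        x = features[start:end]             # exogenous feature rows in the window
        y_hist = target[start:end]          # past target values, also fed to the attention
        y_next = target[end + horizon - 1]  # value to forecast
        samples.append((x, y_hist, y_next))
    return samples


# Toy data: 30 days, two features per day, target equal to the day index.
target = [float(t) for t in range(30)]
features = [[t, 2.0 * t] for t in target]

for horizon in (1, 3, 5):
    samples = make_windows(features, target, window=10, horizon=horizon)
    x, y_hist, y_next = samples[0]
    print(horizon, len(samples), y_next)
# 1 20 10.0
# 3 18 12.0
# 5 16 14.0
```

Longer horizons leave fewer usable samples, because the last windows no longer have a target value far enough ahead.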
The spatial relation learning of input features is based on two sequential attention modules. First, it focuses on learning the spatial relationship between the correlation features of soybean futures prices, and then further studies the spatial correlation between soybean futures prices and the correlation features. Given the kth correlation feature $x_t^k$ at time t, the first spatial attention module in the encoder can be defined as:
where $v_f \in \mathbb{R}^{T}$, $W_f \in \mathbb{R}^{T \times 2m}$, $U_f \in \mathbb{R}^{T \times T}$ and $b_f \in$
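One form of first-stage spatial attention that is consistent with the parameter shapes listed above, following the input-attention design common to DA-RNN-style encoders, is sketched below. It is a reconstruction under assumptions rather than the authors' formula: $h_{t-1}^{f}$ and $s_{t-1}^{f}$ (previous hidden and cell state of the first encoder LSTM, each of size $m$), $x^{k} = (x_1^{k}, \ldots, x_T^{k})$ (the kth feature over the window), $n$ (number of correlation features), and the shape $b_f \in \mathbb{R}^{T}$ are all assumed.

```latex
\begin{aligned}
f_t^{k} &= v_f^{\top} \tanh\left(W_f\,[h_{t-1}^{f};\, s_{t-1}^{f}] + U_f\, x^{k} + b_f\right), \\
\alpha_t^{k} &= \frac{\exp\left(f_t^{k}\right)}{\sum_{j=1}^{n} \exp\left(f_t^{j}\right)}, \\
\tilde{x}_t &= \left(\alpha_t^{1} x_t^{1},\; \ldots,\; \alpha_t^{n} x_t^{n}\right).
\end{aligned}
```

The shapes check out term by term: $W_f\,[h_{t-1}^{f}; s_{t-1}^{f}]$ and $U_f\,x^{k}$ are both vectors in $\mathbb{R}^{T}$, and the inner product with $v_f$ gives one scalar score per feature, which the softmax turns into a weight $\alpha_t^{k}$.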
The encoder module uses an LSTM to extract the following feature representation from the input $x_t$ at time t:
Where
The second attention in the encoder combines the target price of soybean futures to be predicted with each relevant feature as input to further extract more stable and detailed spatial relationships. The combining operation is defined as
where $v_s \in \mathbb{R}^{T}$, $W_s \in \mathbb{R}^{T \times 2q}$,
This is followed by another LSTM cell. As with the LSTM in the previous part, this LSTM follows the mapping below; only the meaning of the input and output differs. All LSTM units are independent, and the update rule is the one summarized for the LSTM in the previous part.
A temporal attention decoder weights the encoder hidden states that are highly relevant to the price to be predicted, in order to learn the long-term dependence of soybean futures prices. By combining spatial and temporal attention, the model learns spatio-temporal relationships. The following attention is used to learn the temporal weight of the ith hidden state from the second-stage attention:
Unlike the input of the LSTM in the encoder, the input of the LSTM in the decoder combines the context vector $c_t$ with the target price $y$ of soybean futures in the corresponding period.
Where,
where $W_y \in \mathbb{R}^{p \times (p+q)}$, $b_y \in \mathbb{R}^{p}$, $v_y \in \mathbb{R}^{\tau \times p}$ and bias
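The temporal attention step, which scores each encoder hidden state against the current decoder state and forms a context vector, can be sketched as below. This is a generic additive-attention sketch under assumptions, not the authors' implementation: the parameter names `W_d`, `U_d`, `v_d`, the omission of bias terms and of the decoder cell state, and the toy numbers are all hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(encoder_states, decoder_state, W_d, U_d, v_d):
    """Weight encoder hidden states by their relevance to the decoder state.

    score_i = v_d . tanh(W_d @ decoder_state + U_d @ h_i)
    context = sum_i softmax(score)_i * h_i
    """
    query = matvec(W_d, decoder_state)
    scores = []
    for h in encoder_states:
        key = matvec(U_d, h)
        hidden = [math.tanh(q + k) for q, k in zip(query, key)]
        scores.append(sum(v * z for v, z in zip(v_d, hidden)))
    weights = softmax(scores)
    size = len(encoder_states[0])
    context = [
        sum(w * h[j] for w, h in zip(weights, encoder_states)) for j in range(size)
    ]
    return weights, context


# Toy example: three encoder states of size 2 and a decoder state of size 2.
encoder_states = [[0.1, 0.0], [0.5, -0.2], [0.9, 0.4]]
decoder_state = [0.3, -0.1]
identity = [[1.0, 0.0], [0.0, 1.0]]
weights, context = temporal_attention(
    encoder_states, decoder_state, W_d=identity, U_d=identity, v_d=[1.0, 1.0]
)
print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

The context vector is a convex combination of the encoder states, so the decoder can draw on any time step in the window instead of relying only on the last hidden state.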
All models were trained using the backpropagation algorithm. To reduce the mean square error (MSE) from the predicted vector
We use three common metrics, Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and the coefficient of determination (also known as $R^2$ or R-squared), to evaluate the prediction performance of the proposed dual-stage attention LSTM model on the decomposed and expanded dataset. The MAE, defined by Eq. (22), is the mean absolute difference between the actual and predicted values. The RMSE, described by Eq. (23), is the square root of the MSE, where the MSE is the mean squared difference between the actual and predicted values. The coefficient of determination $R^2$, given by Eq. (24), measures the goodness of fit of the model. Higher $R^2$ values and lower MAE and RMSE values indicate better prediction performance.
Where y,
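The three metrics can be written out directly. The sketch below uses their standard definitions, which are assumed to match Eqs. (22) to (24); the function names and the four toy observations are illustrative.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared error."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


def r2(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 11.0]
print(mae(actual, predicted), round(rmse(actual, predicted), 4), round(r2(actual, predicted), 4))
```

In this toy case the errors are 1, 0, -1 and -2, so the MAE is 1.0, the MSE is 1.5, and $R^2$ is 1 - 6/20 = 0.7. The RMSE exceeds the MAE because squaring gives the single large error more weight.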
Data source
To verify the effectiveness of the proposed end-to-end prediction model based on LSTM with a multi-attention mechanism, we obtained the historical trading data of soybean futures from the Dalian Commodity Exchange in China as experimental samples using the AKShare [69] library. AKShare is a Python-based financial data interface library that provides tools for data collection, data cleaning, and data storage, covering fundamental data, real-time and historical market data, and derivative data for financial products such as stocks, futures, options, funds, foreign exchange, bonds, indices, and cryptocurrencies. It is mainly intended for academic research. The experimental data span 4381 trading days, from January 4, 2005, to December 29, 2022. Each trading day provides five values: High, Low, Open, Close, and Volume. The daily closing price of soybean futures over these 18 years is shown in Fig. 7. Summary statistics of the experimental data are shown in Table 3.

Soybean futures historical closing price.
Statistics on the data of soybean futures
Data decomposition, which separates the data into different frequency components in order to reduce noise, was used to further improve the prediction accuracy of soybean futures prices. PyEMD [70] is a Python implementation of EMD and its variants, including CEEMDAN, one of the most popular. We use PyEMD to decompose the closing price of the soybean futures trading data. The number of noise-adding trials is set to 300, and the width of the added Gaussian white noise is set to 0.01. CEEMDAN decomposes the soybean futures closing price series into 10 IMF components and one residual, as shown in Fig. 8. The number of IMFs is determined from the raw time series.

The result of decomposition of soybean futures closing price by CEEMDAN.
Different IMFs exhibit different frequencies because they correspond to distinct time scales. The component with the highest frequency is separated from the original time series first during the sifting process. High-frequency IMFs capture volatility features that reveal the detailed structure of the time series. These features appear as short-term price volatility, a direct reflection of short-term factors acting on futures prices. As the decomposition continues, the frequency of the IMFs decreases. Low-frequency IMFs highlight slow fluctuations that represent long-term trends in financial time series, and these are helpful in extracting the volatility patterns on which long-term forecasts rely. Lastly, the residual is the mean trend of the original series. According to the examination of Chinese stock returns by Shi [71], a strategy that models each IMF separately assumes that the components are independent, which may lose some of the information shared between components, such as spatio-temporal relationships. Therefore, we add the decomposed components to the feature dataset and feed them into a dual-stage attention LSTM to learn and extract the relationships between the components and the target sequence.
TA-Lib [72] is a library widely used for technical analysis of financial market data. Based on the five features of the original transaction data (“High”, “Low”, “Open”, “Close”, and “Volume”), we generated 53 additional technical indicators using this library. As shown in Table 2, the newly generated technical indicators fall into six primary types: Overlap Studies Indicators (9), Momentum Indicators (30), Volatility Indicators (3), Volume Indicators (2), Price Transform Indicators (4), and Cycle Indicators (5). The feature dataset consists of the original historical transaction data, the decomposed components of the closing price, and the technical indicators. After expanding the feature set and eliminating trading days with null values, we retained 84 input features. We divided the entire dataset chronologically into a training set and a test set. The data between January 4, 2005, and May 23, 2019, comprising 80% of the total sample, were used as the training set for the dual-stage attention LSTM model. The test set, consisting of data from May 24, 2019, to December 29, 2022, and containing 877 valid observations, was used to assess the prediction performance. To normalize all values to the range [0, 1], we applied min-max scaling to the original values. This normalization puts features of different scales on a common footing, which facilitates faster convergence of gradient descent while preserving the relationships in the data [73–75].
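The chronological split and min-max scaling can be sketched as follows. The helper names and toy rows are assumptions. The sketch also makes one choice the text does not specify: the column minima and maxima are taken from the training rows only, in keeping with the leakage concerns discussed for the decomposition step, so scaled test values can fall slightly outside [0, 1].

```python
def chronological_split(rows, train_ratio=0.8):
    """Keep time order: the first `train_ratio` of rows train, the rest test."""
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]


def fit_min_max(train_rows):
    """Column-wise minimum and maximum, computed on the training rows only."""
    columns = list(zip(*train_rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, lows, highs):
    """Scale each column with (x - min) / (max - min); constant columns map to 0."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Toy dataset: 10 days, two features with opposite trends.
rows = [[float(t), 100.0 - t] for t in range(10)]
train_rows, test_rows = chronological_split(rows)
lows, highs = fit_min_max(train_rows)
train_scaled = apply_min_max(train_rows, lows, highs)
test_scaled = apply_min_max(test_rows, lows, highs)
print(train_scaled[0], test_scaled[0])
```

If the extrema were instead computed over the full dataset, every value would land in [0, 1], at the cost of letting test-period price levels influence the training inputs.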
Hyperparameter setting
For training, the batch size and learning rate were set to 128 and 0.001, respectively. The time window size T and the hidden state size of each attention module are two further crucial hyperparameters in our model. To ensure efficient long-term prediction, T is set to 10. Each attention module consists of a single-layer LSTM network with an identical hidden state size. In line with Liu’s prior work [64], we fixed the values of m, p, and q at 128 for most applicable models. Detailed hyperparameter settings of the comparison models are provided in Table 4.
Hyper-parameter settings of the proposed model in comparison to the reference model
As mentioned before, the proposed model is trained for 100 epochs, and the training set is shuffled at each epoch. Experiments show that the rolling-window data construction gives better predictions when the window length is 10, and that our model converges faster and performs better when the number of hidden units is 128. We train with mini-batch stochastic gradient descent. Based on the size of the dataset and computing efficiency, the batch size is set to 128 and the learning rate to 0.001. Adam is used as the optimizer, and MSE is used as the objective (loss) function minimized through backpropagation. Hyperparameters and functions that the comparison models share with our model are set to the same values as in our model. Table 5 shows the prediction performance of the models at prediction time steps of 1, 3, and 5; the best result for each experimental condition is shown in bold. Partial prediction results at each prediction time step are depicted in Figs. 9–11, respectively.
Model performance comparison

Prediction results of all models when prediction time step is 1.

Prediction results of all models when prediction time step is 3.

Prediction results of all models when prediction time step is 5.
The soybean futures price predictions show that the dual-stage attention LSTM model based on sequence decomposition and feature expansion outperforms the comparison models under the different experimental conditions. The discussion in this section does not include the last three models, which belong to the ablation experiments. At prediction time step 1, the largest MAE improvement of our model, 23.30%, is obtained over the GRU method, and the largest RMSE improvement, 79.86%, is likewise over GRU. The smallest improvements in MAE and RMSE are over the DA-RNN method, but they still amount to 41.38% and 29.51%, respectively. When the prediction time step is 3, the gap between the models narrows, and the maximum improvements of our model in MAE, RMSE, and $R^2$ are 70.96%, 68.03%, and 17.89%, respectively. With a prediction time step of 5, the proposed model also outperforms the comparison models, with a maximum MAE improvement of 69.69%. In contrast to the two-stage, single-attention mechanism of DA-RNN, the two-level attention in the first stage of the proposed model captures both temporal and spatial relationships among the multivariate inputs. Moreover, the temporal and spatial relationship between the exogenous sequences and the target sequence is obtained by concatenating the target sequence, which more effectively preserves the temporal dependence in the second-stage attention. The added sequence decomposition lets the model learn further details about the target sequence, and the dual-stage attention mechanism can extract the underlying relationship between the decomposed subsequences and the target sequence, so that the underlying pattern of sequence changes is learned more precisely. Finally, the technical indicator extension contributes the trends and cues that the indicators themselves extract from the sequence, and the high-dimensional data it constructs, combined with the automatic feature selection of the attention mechanism, further raises the prediction performance of the proposed method.
LSTM and GRU, both improved models based on the RNN and trained on historical transaction data, differ only slightly in prediction accuracy under the various conditions. Another issue is that for a one-day horizon, that is, a single prediction time step, the predicted value tends to fit the closing price of the previous day directly. As a result, both LSTM and RNN appear to perform adequately in this case, in terms of both the evaluation metrics and the plotted results. In reality, however, the prediction is only a lagged copy of the previous day’s data. The prediction accuracy of the original RNN declines significantly as the prediction time step increases. For DA-RNN and our proposed model, by contrast, temporal dependence is emphasized in the second stage, and the prediction performance remains robust at longer prediction time steps.
Analysis of ablation experiments
We performed ablation experiments to illustrate the contribution of the individual components of the proposed model. The ablation models are the plain DALSTM model, CEEMDAN-DALSTM without feature expansion, and SDFE-DALSTM (EEMD), in which CEEMDAN is replaced by EEMD. In terms of the evaluation metrics, the SDFE-DALSTM model using CEEMDAN leads the ablation models in all respects. Among the ablation models, CEEMDAN-DALSTM performs best and is ahead of the other comparison models under all conditions, except that it is slightly behind SDFE-DALSTM (EEMD) when the prediction step is 1. The improvements of CEEMDAN over EEMD thus benefit our model. CEEMDAN-DALSTM is also more robust than the other ablation models: in the experiments with 3 prediction steps, its accuracy fluctuates little. The results of the ablation experiments show a stepwise change in prediction performance according to the degree of component ablation and the benefit of the replaced component, which indicates that the individual components of our model contribute positively to the performance of the overall method.
Statistical test
To confirm the superiority of the proposed method, we use two nonparametric statistical tests: the Friedman test [79] and the Mann-Whitney U test. The Friedman test checks whether there is a significant difference among multiple samples of interest, while the Mann-Whitney U test checks whether there is a significant difference between two samples. The null hypothesis of the Friedman test is that there is no significant difference in predictive accuracy among all the compared models, while the null hypothesis of the Mann-Whitney U test is that there is no significant difference in predictive accuracy between the two models. Both tests use a significance level of α = 0.05, and the null hypothesis is rejected when the p-value is less than 0.05 [80]. Table 6 shows the results of the statistical tests. The Friedman test shows a significant difference between the proposed SDFE-DALSTM method and the comparison models for soybean futures price forecasting in China. In the Mann-Whitney U test, all pairwise comparisons except “SDFE-DALSTM (CEEMDAN) vs. SDFE-DALSTM (EEMD)” show that our forecasting performance is significantly better than that of the comparison models. SDFE-DALSTM (CEEMDAN) and SDFE-DALSTM (EEMD) differ only in the signal decomposition method. Although the CEEMDAN variant scores better than the EEMD variant on the evaluation metrics, the test results indicate that the improvement of CEEMDAN over EEMD does not contribute significantly to our model. Further exploration of these contributions is needed.
Results of Friedman test and Mann-Whitney U test
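To make the Friedman test concrete, the sketch below computes its chi-square statistic from per-case errors of several models. It is a simplified illustration: ties within a case are not handled, no p-value is computed, and the function name and toy error values are assumptions. In practice, library routines such as `scipy.stats.friedmanchisquare` and `scipy.stats.mannwhitneyu` return both the statistic and the p-value.

```python
def friedman_statistic(errors_by_model):
    """Friedman chi-square statistic for k models evaluated on the same n cases.

    `errors_by_model` is a list of k equal-length lists of errors.
    Ties within a case are not handled in this sketch.
    """
    k = len(errors_by_model)
    n = len(errors_by_model[0])
    rank_sums = [0.0] * k
    for case in range(n):
        values = [errors_by_model[m][case] for m in range(k)]
        order = sorted(range(k), key=lambda m: values[m])
        for rank, m in enumerate(order, start=1):
            rank_sums[m] += rank
    total = sum(r * r for r in rank_sums)
    return 12.0 * total / (n * k * (k + 1)) - 3.0 * n * (k + 1)


# Toy absolute errors of three models on four test cases.
model_a = [0.10, 0.20, 0.15, 0.12]  # always the smallest error
model_b = [0.30, 0.35, 0.25, 0.28]
model_c = [0.50, 0.60, 0.55, 0.45]  # always the largest error
stat = friedman_statistic([model_a, model_b, model_c])
print(stat)  # 8.0
```

With three models the statistic is compared against a chi-square distribution with two degrees of freedom, whose 5% critical value is about 5.99, so the toy value of 8.0 would lead to rejecting the null hypothesis that the models perform alike.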
In this paper, we present a new method for forecasting soybean futures prices, called SDFE-DALSTM. The method is based on sequence decomposition and feature expansion and incorporates a dual-stage attention long short-term memory (DA-LSTM) network. Unlike traditional forecasting processes and divide-and-conquer strategies, we use the CEEMDAN algorithm to decompose the soybean futures closing prices into more detailed derived variables. In addition, we construct a high-dimensional feature dataset for soybean futures price forecasting by generating technical indicators. This dataset is fed into an LSTM with dual-stage attention, which learns the spatio-temporal relationships among the features.
Through the dual-stage attention mechanism, our model learns not only the time-dependent features of the input series but also the corresponding spatial features. The first stage, spatial attention, first learns the relationships among the decomposed subsequences and the technical indicator sequences, and then learns the relationship between the target sequence and the exogenous sequences through its second level of spatial attention. The second stage, temporal attention, strengthens long-term dependence and learns cross-sequence relationships across time. This enables the model to efficiently extract spatio-temporal relationships from soybean futures price features and to select among high-dimensional features, which helps avoid overfitting and improves prediction performance.
Based on the experimental results, our SDFE-DALSTM model can be used as an end-to-end expert system for predicting soybean futures prices. It outperforms the other models on all metrics across multiple prediction steps. The ablation experiments verify that each component of the model contributes positively to the overall approach, and hypothesis testing supports its performance advantage. Our initial hypothesis is thus supported: by combining CEEMDAN-decomposed subsequences and technical indicators, an LSTM with dual-stage attention can better capture the implicit features of soybean futures price series and improve forecasting performance. Our work provides a valuable reference for price forecasting in the Chinese soybean futures market, and the method can be extended to other financial time series.
In future work, we plan to focus on adding more features and improving high-dimensional feature selection through a more sophisticated attention mechanism. With the advancement of natural language processing techniques, other factors such as investor sentiment and policy should also be taken into account, and the feasibility of quantification for non-numerical factors seems to be increasingly clear.
Footnotes
Acknowledgments
The work was supported by the Zhejiang Philosophy and Social Science Program of China (No.17NDJC262YB), Humanity and Social Science Foundation of Ministry of Education of China (No.18YJA630037, 21YJA630054), Zhejiang Provincial Natural Science Foundation of China (No.LY18G010005).
Declarations
Hongjiu Liu and Yanrong Hu are the joint first authors.
