Abstract
This paper applies recurrent neural networks (RNN) to price forecasting and financial trading. Compared with earlier neural network models, the recurrent neural network makes better use of previous information to infer subsequent events, which makes it more suitable for the analysis of price time series. Long Short-Term Memory (LSTM) changes the structure of the RNN to avoid long-term dependency problems. The empirical research uses 2010–2017 price panel data for four kinds of soybean futures in China’s futures market and confirms the model’s improved predictive ability through statistical tests. An empirical analysis of futures trading then evaluates the resulting strategies in terms of risk-adjusted return. This paper improves and extends the application of the recurrent neural network model and provides a new idea for applying artificial neural network algorithms to futures trading.
Introduction
In recent years, the emergence of artificial neural network algorithms has offered a new approach to research on price prediction. An artificial neural network model is trained iteratively to detect the relationship between the input variables and the target output variable, and it can approximate and fit any continuous function. Compared with traditional statistical methods, it has distinct advantages in predicting financial market price trends: first, the model can handle irregular data without predefined rules; second, it can handle uncertain, incomplete, and insufficient data with rapid convergence; and third, it can find patterns in the data and uncover the laws behind nonlinear data.
With regard to the application of artificial neural networks to financial market price forecasting, the early literature [1, 3] verified the predictive effect of artificial neural networks on the direction of securities price fluctuations. Subsequent research [4, 5] and later studies [6, 7] developed securities trading strategies based on the predictions of artificial neural network models, and found that buying and selling securities according to the model output yields gains far superior to a simple buy-and-hold strategy. Regarding the improvement of the model, some studies [8, 9] used multilayer perceptron (MLP) neural network models trained with the error backpropagation algorithm, and obtained relatively successful predictions. Although the MLP model is widely used in financial time series prediction, it has a notable defect: it lacks a memory function and is therefore not suited to scenarios with temporal dependencies. The literature [10] has verified that the recurrent neural network (RNN) algorithm is well suited to processing such time-dependent data. The nodes of the hidden layer are connected across time steps, and the current output depends on the previous state; that is, the network memorizes previous information and applies it to the current calculation. However, because of exploding and vanishing gradients, the parameters of the RNN model are difficult to train, which makes it difficult for an RNN to model long-term time series dependencies and limits the application of the RNN model in financial markets.
One study [11] introduced the Long Short-Term Memory (LSTM) model, which preserves historical information through the cell state and uses different gates so that the network dynamically learns when to forget historical information and when to update the cell state with new information. However, whether this improved design can enhance models used for trading remains to be studied. This paper designs a hybrid model that integrates the recurrent neural network with LSTM to improve predictions and applies it to futures trading.
In this paper, an improved recurrent neural network is used to build a predictive model, which is applied to the trend trading of soybean futures in China’s commodity market. The LSTM-RNN model is trained on historical data such as the opening price, the lowest price, the highest price, and the closing price, among other variables, and is then used for prediction and trading. The empirical results show that the recurrent neural network model is applicable to trend trading and that the model with long short-term memory performs better. Although the strategies perform differently across futures, the recurrent neural network model is feasible in China’s futures market. The contribution of this paper is to improve the recurrent neural network model with long short-term memory and to apply the model to financial market price forecasting, which provides a new idea for trend trading.
The remainder of this paper is organized as follows. Section 2 describes the data used in the empirical research. Section 3 presents the research method, mainly the recurrent neural network algorithm with long short-term memory and the trend trading strategy based on it. Section 4 analyzes the empirical results, including the statistical and trading performance of the four futures trading strategies. Section 5 presents the main conclusions.
Data
The research subject of this paper is the soybean futures listed on the Dalian Commodity Exchange of China (CDCE), which comprise four commodity futures: soybean 1, soybean 2, soy meal and soy oil. The details of the futures contracts are given in Table 1. The data set covers the period from January 4, 2010 to December 29, 2017. The in-sample data set covers the period from January 1, 2010 to December 31, 2015, and the out-of-sample data set covers the period from January 4, 2016 to December 29, 2017. These panel data contain the 11 explanatory variables required as input to the artificial neural network model, including price, transaction volume, etc. The specific information is given in Table 3.
Commodity futures contract specifications
Data segregation for the full sample period
Explanatory variables for the recurrent neural network models
A survey of artificial neural network
Artificial neural networks (ANN) have been a research focus in the field of artificial intelligence since the 1980s. An ANN abstracts the neuron network of the human brain from the perspective of information processing, establishes a simple model, and forms different networks according to different connection methods. A neural network is a computational model consisting of a large number of interconnected nodes (neurons). Each node represents a specific output function called an activation function. The connection between every two nodes carries a weighting value for the signal passing through it, called a weight, which is equivalent to the memory of the artificial neural network. The output of the network depends on the connection method, the weight values and the activation function. The network itself is usually an approximation of some algorithm or function found in nature, or it may be an expression of a logic strategy.
The basic form of a neuron is given by the following mathematical expression. The D input variables are weighted and summed together with the offset term, and the result is passed through the activation function to obtain the output y. Commonly used activation functions include the Sigmoid function, the hyperbolic tangent function, and so on. The Sigmoid function and the neuron expression are as follows, where the weight coefficients $w_d$ ($d = 1, \dots, D$) and the offset term $b$ are the model parameters:
$$\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad y = \sigma\left(\sum_{d=1}^{D} w_d x_d + b\right)$$
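The neuron described above can be sketched in a few lines of Python. This is only an illustration: the function names `sigmoid` and `neuron` and the numerical values are hypothetical and are not taken from the empirical part of the paper.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the D inputs plus the offset term, passed through the sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Hypothetical example with D = 3 inputs; the numbers carry no empirical meaning.
y = neuron([0.5, -1.0, 2.0], [0.4, 0.3, 0.1], bias=0.0)
print(round(y, 4))
```

Because the Sigmoid output always lies strictly between 0 and 1, such a neuron is a natural choice for an output that is read as a probability.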
Note: The delivery standard for the Soybean 1 futures contract is non-GM soybeans, and the delivery standard for the Soybean 2 futures contract is GM soybeans and non-GM soybeans.
A complete neural network model usually divides the nodes into several layers: an input layer, an output layer and a hidden layer, as shown in Fig. 1. The input layer holds the input features given to the model; the output layer is the content predicted by the neural network, such as the function value of the sample; the hidden layer corresponds to the intermediate state of the network. For a recurrent neural network, the number of output layer nodes is the number of variables we want to predict. $\sigma(X)$ and $h(X)$ are the activation functions of the output layer and the hidden layer, respectively. The parameters of the neural network are the network coefficients $w_{ij}$ of the layers, which can be collectively recorded as the vector W.

The neuron of artificial neural network.
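The neural network model learns by using the inputs and outputs that are already available to optimize the parameter W, so that the output given by the model is as close as possible to the true label of each sample; that is, the following prediction error (loss function) is minimized:
$$E(W) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i(W) \right)^2$$
where $y_i$ is the true label of the i-th of N samples and $\hat{y}_i(W)$ is the corresponding model output.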
The objective of the optimization is to minimize the mean square error. For the general neural network optimization problem, the parameter W can be optimized iteratively by the gradient descent method:
$$W^{(n)} = W^{(n-1)} - \alpha \, \nabla_W E\left(W^{(n-1)}\right)$$
Here E denotes the loss function and α is the learning rate, which sets the step size of each iteration. In the n-th iteration, the parameter from iteration n-1 is moved by this step along the negative gradient direction to obtain the updated parameter value.
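The gradient descent iteration can be illustrated on the simplest possible case. The sketch below assumes a single linear neuron with one weight `w` and one offset `b`, a mean square error loss, and a small synthetic data set; all names and values are hypothetical and serve only to show the update rule with learning rate `alpha`.

```python
def mse(w, b, xs, ys):
    """Mean square error of the linear model w * x + b on the sample."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_step(w, b, xs, ys, alpha):
    """One gradient descent update: move against the gradient by step size alpha."""
    n = len(xs)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return w - alpha * grad_w, b - alpha * grad_b

# Synthetic data generated from y = 2x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0
losses = []
for _ in range(2000):
    losses.append(mse(w, b, xs, ys))
    w, b = gradient_step(w, b, xs, ys, alpha=0.05)

print(round(w, 3), round(b, 3), losses[0], losses[-1])
```

A learning rate that is too large makes the iteration diverge, while one that is too small slows convergence; the value 0.05 was chosen only because it is stable for this toy problem.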

The artificial neural network.

The recurrent neural network.

The long short-term memory block.
Traditional artificial neural networks cannot use previous information to infer subsequent events. The recurrent neural network algorithm is well suited to processing such time-dependent data. Recurrent neural networks contain loops that allow information to persist. These loops pass information from the current step to the next step, so that earlier information can be used easily, although carrying information across many steps gives rise to the long-term dependency problem. In the traditional feedforward neural network, information flows from the input layer to the hidden layer and then to the output layer; adjacent layers are fully connected, but the nodes at different times are not connected. Such a network cannot deal with time series directly. In a recurrent neural network, by contrast, the hidden layer nodes at different times are connected, and the current output depends on the previous state. The network memorizes previous information and applies it to the calculation of the current output.
An RNN can be thought of as a neural network whose weights are shared along the time dimension. The model output depends on the hidden state $h_t$ at time t, which in turn depends on the input $x_t$ at time t and the hidden state $h_{t-1}$ at time t-1. Since the hidden state has this temporal dependency, the output of the RNN model is related to the input information at previous times.
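The output of the RNN model:
$$y_t = \sigma_y(W_y h_t)$$
The hidden state of the RNN model:
$$h_t = \sigma_h(W_x x_t + W_h h_{t-1})$$
where $\sigma_y$ and $\sigma_h$ are the activation functions of the output layer and the hidden layer.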
This can be used to analyze the relationship between the RNN model and the ordinary time series model. Considering the univariate case, let $W_h = \beta$, $W_x = \alpha$, $y_t = h_t$, and let $\sigma_h$ be the identity transformation; then the RNN model can be written as:
$$y_t = \beta y_{t-1} + \alpha x_t$$
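This is a first-order autoregressive model with exogenous variables. The RNN can therefore be viewed as a complex nonlinear time series model. To optimize the parameters of the above RNN model, the Back Propagation Through Time (BPTT) algorithm is usually used to compute the gradient. Since the hidden state $h_t$ of the RNN is affected by the hidden state $h_{t-1}$ of the previous time step, the gradient of the RNN depends on time. BPTT unrolls the network along the time series. In its standard form, the formula for the RNN gradient calculation is:
$$\frac{\partial E}{\partial W} = \sum_{t} \sum_{k=1}^{t} \frac{\partial E_t}{\partial h_t} \left( \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}} \right) \frac{\partial h_k}{\partial W}$$
where $E_t$ is the prediction error at time t and $\partial h_k / \partial W$ is the direct dependence of $h_k$ on W.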
However, with the above method, parameter learning suffers from vanishing and exploding gradients, which makes it difficult for an RNN to model long-period temporal dependencies.
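The link between the RNN and the first-order autoregressive model can be checked numerically. The sketch below assumes a scalar RNN without bias terms; the function names `rnn_forward` and `arx1` and the coefficient values are hypothetical. With an identity activation the hidden states coincide with the autoregressive recursion, while a nonlinear activation such as tanh gives the nonlinear time series model.

```python
import math

def rnn_forward(xs, w_x, w_h, h0=0.0, activation=math.tanh):
    """Scalar recurrent network: h_t = activation(w_x * x_t + w_h * h_{t-1})."""
    h, states = h0, []
    for x in xs:
        h = activation(w_x * x + w_h * h)
        states.append(h)
    return states

def arx1(xs, alpha, beta, y0=0.0):
    """First-order autoregressive model with an exogenous input:
    y_t = beta * y_{t-1} + alpha * x_t."""
    y, out = y0, []
    for x in xs:
        y = beta * y + alpha * x
        out.append(y)
    return out

def identity(z):
    return z

xs = [1.0, 0.5, -0.3, 0.8]  # illustrative input series
linear_rnn = rnn_forward(xs, w_x=0.7, w_h=0.4, activation=identity)
ar_model = arx1(xs, alpha=0.7, beta=0.4)
nonlinear_rnn = rnn_forward(xs, w_x=0.7, w_h=0.4)  # tanh activation

print(linear_rnn)
print(ar_model)
print(nonlinear_rnn)
```

In the linear case the influence of an early hidden state on a later one scales with repeated powers of `w_h`, which gives an intuition for why gradients shrink or grow over long horizons.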
Long Short-Term Memory (LSTM) effectively addresses the exploding and vanishing gradient problems of simple recurrent neural networks and has been successful in many machine learning fields [11]. The key to the LSTM model is the introduction of a system of gating units that preserves historical information through the cell state. It uses different gates so that the network dynamically learns when to forget historical information and when to update with new information.
LSTM uses gates to selectively filter a portion of the information. At time t, the internal memory unit records all historical information up to the current time and is controlled by three gates: the input gate determines how much new information enters the internal memory unit; the forget gate determines how much of the information from the previous time step the internal memory unit retains; and the output gate determines how much information the internal memory unit outputs.
The input gate of LSTM:
$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)$$
The activation function σ is the sigmoid function, $x_t$ is the input vector at the current time, $h_{t-1}$ is the hidden state vector of the previous time step, $W_{xi}$ and $W_{hi}$ are parameters, and $b_i$ is the bias term. When only one memory unit is considered, $i_t$ is a number between 0 and 1. When $i_t = 1$, the gate is open and all the information can enter the cell; when $i_t = 0$, the gate is closed and no information can enter the cell.
The forget gate of LSTM:
$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f)$$
$f_t$ is a number between 0 and 1. When $f_t = 1$, the gate is open and the cell state of the previous time step is carried over in full; when $f_t = 0$, the gate is closed and the cell state of the previous time step is discarded.
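The output gate of LSTM:
$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o)$$
$o_t$ is a number between 0 and 1. When $o_t = 1$, the gate is open and the cell state can be output; when $o_t = 0$, the gate is closed and the cell state cannot be output. The update formula for the state of the internal memory unit is:
$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$$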
The first term is the cell state of the previous time step after it has passed through the forget gate, and the second term is the new input information after it has passed through the input gate. The output of the LSTM unit can be expressed as follows:
$$h_t = o_t \odot \tanh(c_t)$$
where $c_t$ denotes the current cell state and $\odot$ the element-wise product.
In the RNN model, each hidden state node of the original RNN is replaced by an LSTM unit to construct an LSTM-based RNN. In the LSTM-RNN network, the output of the LSTM unit can be fed to the output layer of the RNN, used as the hidden layer state of the RNN model, or passed to a further LSTM layer to construct an RNN with multiple hidden layers.
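The gating mechanism of a single LSTM unit can be sketched as follows. This is a scalar illustration under stated assumptions, not the configuration used in the empirical study: the candidate input is assumed to pass through a tanh activation, as in the common LSTM formulation, and the parameter dictionary `params` with its key names and values is hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM unit.

    p maps hypothetical parameter names to floats, e.g. p["w_xi"] is the
    weight from the input x to the input gate.
    """
    i = sigmoid(p["w_xi"] * x + p["w_hi"] * h_prev + p["b_i"])    # input gate
    f = sigmoid(p["w_xf"] * x + p["w_hf"] * h_prev + p["b_f"])    # forget gate
    o = sigmoid(p["w_xo"] * x + p["w_ho"] * h_prev + p["b_o"])    # output gate
    g = math.tanh(p["w_xc"] * x + p["w_hc"] * h_prev + p["b_c"])  # candidate input (assumed tanh)
    c = f * c_prev + i * g   # keep part of the old state, admit part of the new input
    h = o * math.tanh(c)     # expose part of the cell state as the unit output
    return h, c

# Illustrative parameters: every weight 0.5, every bias 0.0.
names = ["w_xi", "w_hi", "w_xf", "w_hf", "w_xo", "w_ho", "w_xc", "w_hc"]
params = {name: 0.5 for name in names}
params.update({"b_i": 0.0, "b_f": 0.0, "b_o": 0.0, "b_c": 0.0})

h, c = 0.0, 0.0
for x in [0.2, -0.1, 0.4]:
    h, c = lstm_step(x, h, c, params)
    print(round(h, 4), round(c, 4))
```

When the forget gate is close to 1 and the input gate close to 0, the cell state is carried forward almost unchanged, which is how the unit retains information over long horizons.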
This paper studies the use of a neural network algorithm to predict whether a trend trading strategy will be profitable. The specific strategy is as follows. At the start of each trading day, the market data of the early session are obtained, and the LSTM-RNN model is used to predict whether the intraday trend strategy will be profitable. If the model judges that it will be profitable, trend following is carried out according to the trend of the early session; if the model judges that it will not be profitable, no position is opened that day. The predictive model needs to be trained before trading. This paper adopts the model structure “Time series input–Single label output”: the input data are a multivariate time series, and the output is a label in {0, 1}. For model training, the early-session data in the sample must be labeled in two categories: “suitable for trend trading” and “not suitable for trend trading”. Using the quoted trading data, the labels are obtained by historical back-testing, that is, the opening range breakout strategy is back-tested on each day to determine whether the trend strategy is profitable. A profitable day is recorded as “1”, suitable for trend trading; otherwise, it is recorded as “0”, not suitable for trend trading.
The LSTM-RNN model used in this paper is designed as follows. The input layer has a total of 11 variables: the opening price, closing price, highest price, lowest price, trading volume, main purchase volume, main sales volume, main buyer sales volume, rate of change in trading volume, rate of change in main purchases, and rate of change in main sales. The LSTM layer structure has a total of 100 layers, and the output layer has 1 node, which yields one of the two labels 0 and 1 to determine whether the day is suitable for trading. The activation function in the model is the Sigmoid function:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
The error function to be minimized is:
This paper uses a simple indicator to determine whether the trend strategy is profitable. The definition of the trend strategy profit indicator is:
When the R value is large, the intraday trend strategy is likely to be profitable; otherwise, it is unlikely to be profitable. In this paper, the profit status of the trend strategy on each trading date is divided into two categories according to R>0.5 and R<0.5, which serve as positive and negative samples to train the machine learning model. Once the trading strategy is set, the model is used to predict the trend strategy profit indicator of the day and obtain the profit probability p, and the mean MA30(p) of this probability at time T over the past 30 trading days is calculated. If p>MA30(p), the probability that the day’s trend strategy is profitable is relatively high, and trend trading can be conducted. After a position is opened, it is held until the close of the day, and it is closed immediately if the stop loss is triggered. In the simulated trading, we set the futures margin to 50%, which corresponds to 2x leverage. The transaction cost is 0.002% on each side, and the stop loss is a fixed loss of 25%. In addition, to verify the improvement brought by the model, we empirically compare the traditional RNN model with the LSTM-RNN model.
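The labeling rule and the entry filter described above can be sketched as follows. Two points are assumptions made only for this illustration: the moving average is computed over the 30 trading days preceding the current day, and no trade is taken until 30 days of history are available. The function names `label_day` and `trade_signals` are hypothetical.

```python
def label_day(r, threshold=0.5):
    """Training label: 1 if the profit indicator R exceeds the threshold, else 0."""
    return 1 if r > threshold else 0

def trade_signals(probs, window=30):
    """One boolean per day: True when the predicted profit probability p
    exceeds its mean over the previous `window` trading days."""
    signals = []
    for t, p in enumerate(probs):
        if t < window:
            signals.append(False)  # assumption: stay flat until enough history exists
            continue
        moving_average = sum(probs[t - window:t]) / window
        signals.append(p > moving_average)
    return signals

# Illustrative probabilities: 30 flat days, then one high and one low prediction.
probs = [0.5] * 30 + [0.6, 0.4]
signals = trade_signals(probs)
print(signals[30], signals[31])
print(label_day(0.7), label_day(0.3))
```

Comparing p with its own recent average, rather than with a fixed cut-off, makes the filter adapt to periods in which the model's predicted probabilities are systematically high or low.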
The model is evaluated in two respects: the predictive effect of the model, and the performance of the model when applied to trading. For model predictions, the most common evaluation indicators include root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), which measure the difference between the predicted value and the true value of a variable. These measures are explained in Table 4. Table 5 lists the measures of trading performance. They cover, on the one hand, the absolute return, including the annualized return and the cumulative return, and on the other hand, the return volatility, including the annualized volatility and the maximum drawdown. On this basis, there are also measures of risk-adjusted return, for which this paper uses the Sharpe ratio and the Calmar ratio.
Statistical performance measures
Trading performance measures
Statistical performance
Before simulating trades, we perform a quantitative analysis of the model fit to confirm the predictive effect of the neural network model. The statistical performance test relies mainly on the following indicators: mean absolute error, mean absolute percentage error, and root mean square error. All of them measure the difference between the predicted and true values of a variable. In line with the previous literature [9, 12], the smaller the value, the more accurate the prediction.
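The three error measures can be computed as in the sketch below. It assumes the standard textbook definitions, which may differ in detail from the definitions in Table 4, and the sample values are purely illustrative.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error; squaring gives large errors more weight."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Illustrative values only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(mae(actual, predicted), mape(actual, predicted), rmse(actual, predicted))
```

Since the RMSE squares each deviation before averaging, it can never be smaller than the MAE on the same data, and the gap between the two widens when a few errors are much larger than the rest.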
The data in Tables 6 and 7 show that the overall predictive effect of LSTM-RNN is better than that of T-RNN, which indicates the improvement brought by long short-term memory. The comparison also shows that the in-sample predictive effect is stronger than the out-of-sample predictive effect, which is in line with previous research [10]; the in-sample prediction is better, but it may reflect some over-fitting. From the perspective of commodities, soy meal and soy oil are predicted better than soybean 1 and soybean 2, which may be closely related to their larger trading volume and more flexible price volatility. In terms of specific indicators, the MAE is the mean of the absolute deviations of the individual predicted values from the true values, and the MAPE expresses these absolute deviations as a percentage of the true values. These two indicators accurately reflect the actual predictive errors, and they also show that the LSTM-RNN model is better than the T-RNN model. The RMSE is sensitive to very large or very small errors in a set of measurements and is a good reflection of the precision of the model. On this indicator, the difference between the LSTM-RNN and T-RNN predictions is less obvious than on the former indicators.
In-sample statistical performance
Out-of-sample statistical performance
In this section, we examine the effects of the two models, LSTM-RNN and T-RNN, when applied to trend trading. To measure trading performance, we must consider not only the absolute value of the return but also its volatility. The annualized return and the cumulative return therefore measure the performance of the strategy, while the annualized volatility measures the volatility of the strategy’s return; in particular, the extreme tail risk of futures trading needs to be measured specifically. Risk-adjusted return is also an important criterion for comparing the strategies.
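The trading performance measures can be computed from a series of daily strategy returns as sketched below. The definitions are common conventions rather than necessarily those of Table 5; the assumptions are 252 trading days per year, compounded returns, a zero risk-free rate by default, and the hypothetical function names shown.

```python
import math

TRADING_DAYS = 252  # assumed number of trading days per year

def cumulative_return(daily):
    """Compounded return over the whole period."""
    growth = 1.0
    for r in daily:
        growth *= 1.0 + r
    return growth - 1.0

def annualized_return(daily):
    """Compounded return rescaled to a one-year horizon."""
    return (1.0 + cumulative_return(daily)) ** (TRADING_DAYS / len(daily)) - 1.0

def annualized_volatility(daily):
    """Sample standard deviation of daily returns, scaled by sqrt(252)."""
    mean = sum(daily) / len(daily)
    variance = sum((r - mean) ** 2 for r in daily) / (len(daily) - 1)
    return math.sqrt(variance) * math.sqrt(TRADING_DAYS)

def max_drawdown(daily):
    """Largest peak-to-trough loss of the equity curve, as a positive fraction."""
    peak, equity, worst = 1.0, 1.0, 0.0
    for r in daily:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst

def sharpe_ratio(daily, risk_free=0.0):
    """Annualized excess return per unit of annualized volatility."""
    return (annualized_return(daily) - risk_free) / annualized_volatility(daily)

def calmar_ratio(daily):
    """Annualized return per unit of maximum drawdown (undefined if no drawdown)."""
    return annualized_return(daily) / max_drawdown(daily)

# Illustrative daily returns only.
daily = [0.10, -0.50, 0.20]
print(cumulative_return(daily), max_drawdown(daily))
print(sharpe_ratio(daily), calmar_ratio(daily))
```

The Sharpe ratio penalizes all volatility, whereas the Calmar ratio penalizes only the worst loss from a peak, so the two can rank strategies differently when losses are concentrated in a few episodes.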
The data in Tables 8 and 9 show that the LSTM-RNN model performs better than the T-RNN, but performance differs considerably across futures. In soybean oil and soybean meal trend trading, the improvement from adding long short-term memory to the neural network is particularly obvious. This may be consistent with the previous analysis: soy meal and soy oil futures have large trading volumes, so their prices are quoted densely and the price data are continuously distributed. There are no price gaps caused by small transaction volumes, which would otherwise bias the model training. Comparing the in-sample and out-of-sample empirical results, the model fits better with more training data, so it performs better in-sample than out-of-sample. From the perspective of specific trading performance metrics, the absolute returns of the two models are not significantly different, but in terms of annualized volatility and maximum drawdown, the LSTM-RNN model is better than the T-RNN model. In terms of risk-adjusted return metrics, the LSTM-RNN model is also better than the T-RNN model, which is particularly evident in the Calmar ratio. This also shows that LSTM improves the robustness of the RNN model, especially in futures trading.
In-sample trading performance
Out-of-sample trading performance
In this paper, we have designed a recurrent neural network model improved with long short-term memory and applied it to futures price forecasting and futures trend trading. Taking historical futures price panel data as input, the model predicts whether the price trend is upward or downward, and simulated trading is then conducted on that basis. The empirical results show that the model predicts soundly and is effective in the futures trend trading strategy. This paper illustrates, for the first time, the application of the improved LSTM-RNN model to futures trend trading. The model’s application is nevertheless subject to over-fitting, which is a common problem in artificial neural networks. In future work, a self-generating method for hidden layer nodes could be used to shorten network training, reduce the network structure, and improve the model’s accuracy.
