Abstract
Financial time series prediction and trading decision-making are priorities of computational intelligence for researchers in academia and the finance industry due to their broad application areas and substantial impact. However, these tasks remain challenging because financial time series exhibit various complex statistical properties, and the mechanism behind the underlying processes is largely unknown. A significant number of machine-learning-based methods, especially deep-learning-based models, have been proposed and demonstrate impressive results. Nevertheless, due to the high complexity of massive, nonlinear, and nonindependent data and the difficulty and time cost of training complicated deep learning models, the performance of online trading decisions is still inadequate for practical application. This paper proposes the Integrated Framework of Forecasting Based Online Trading Strategy (IFF-BOTS) to achieve better prediction performance and dynamic decisions for real-world online trading systems. Our method adopts a novel isomorphic convolutional neural network (CNN)-based forecaster-classifier-executor architecture to exploit CNN-based price and trend integrated prediction and direct-reinforcement-learning-based trading decision-making. IFF-BOTS can also achieve better real-time performance for online trading. We empirically compare the proposed approach with state-of-the-art prediction and trading methods on real-world S&P and DJI datasets. The results show that IFF-BOTS outperforms its competitors in prediction metrics, trading profits, and real-time performance.
Introduction
Financial time series analysis and associated applications have been studied extensively for many years. Among them, algorithmic trading, i.e., quantitative trading, is a time-honored topic widely discussed in modern artificial intelligence [22] and time series analysis. Specifically, algorithmic trading refers to executing trading orders using automated preprogrammed trading instructions accounting for variables such as time, price, and volume to generate profits at high speed and frequency. As shown in Fig. 1, we first get market data for signal research and perform backtesting. Then, with signals and their aggregation, we can establish position and risk models and finally provide execution logic based on various decision-making methods. Algorithmic trading has many advantages over human traders, e.g., execution at the best possible prices, avoidance of significant price changes, and reduced risk of manual errors and of human traders’ emotional and psychological factors. Essentially, the process of trading is well depicted as an online continuous decision-making problem involving two critical challenges: financial market representation and optimal trading decision-making. First, financial market representation is usually considered one of the most challenging issues in time series analysis due to the noisy and volatile nature of the data. Financial data contain a significant amount of noise and jumps, resulting in nonstationary, high-noise, nonlinear, and chaotic time series data [7]. To mitigate such complexity, handcrafted features, e.g., the relative strength index (RSI) and other stochastic technical indicators, have been extensively explored for technical analysis in quantitative finance [20]. However, a widely known limitation of technical analysis is its poor generalization ability. Rather than exploiting predefined handcrafted features, can we learn more robust feature representations directly from data?
Second, due to the high dynamics of the trading process, trading decision-making is a systematic task that should take several practical factors into account. Changing trading positions too frequently leads to significant losses due to transaction costs and slippage. Effective modeling of current market conditions and combining historical actions and the corresponding positions are crucial for trading strategy and policy learning. However, although various empirically universal properties are observed, the mechanism behind financial market dynamics remains largely unknown, which makes modeling nearly intractable. How can we incorporate such optimal dynamic decision-making into a real-world trading system without affecting real-time performance? As a result, algorithmic trading has become much more challenging and has attracted significant attention in recent decades. Moreover, conventional methods, including stochastic processes and model-based approaches, are inadequate to capture the complex temporal dependencies of financial markets or to make sound decisions in a timely manner.
General model of algorithmic trading system.
Recently, with the rapid development and impressive achievements in machine learning, leveraging machine learning approaches in algorithmic trading has become a highly researched topic. Although time series forecasting has been an active topic for several decades, financial time series prediction remains quite challenging due to its complexity. Therefore, numerous studies have been published on machine-learning-based models that perform better than classical techniques. Specifically, in recent years, deep learning has demonstrated a tremendous capability for learning feature representations of complex data such as high-dimensional nonlinear data, temporal data, spatial data, and graph data, and has achieved great success in various applications. Financial time series forecasting is no exception. As such, an increasing number of prediction models based on deep learning, especially LSTM-based approaches, have been introduced and have achieved state-of-the-art performance in recent years. On the other hand, unlike conventional machine learning methods, reinforcement learning learns behavior, i.e., which actions an agent should take to achieve the highest reward in an environment. Such characteristics make reinforcement learning suitable for trading, especially when reinforcement learning and deep learning are combined, i.e., deep reinforcement learning. In practical applications, the successes of deep reinforcement learning have been shown extensively in many tasks, including robot navigation and helicopter control. The application of deep learning and reinforcement learning in algorithmic trading has also produced a good number of impressive results.
However, we still face difficulties applying reinforcement learning in highly dynamic and nonindependent financial markets. First, forecasting performance is low for noisy, nonstationary, and nonlinear financial time series. Second, convergence issues and mode collapse occur frequently during model training. Last, neural network training is time-consuming, which may not be acceptable for real-time online trading. This leads to an interesting question in the context of the financial market: how can we leverage both deep learning and reinforcement learning to design an efficient trading approach for financial markets?
Inspired by deep learning’s strong capability of financial market feature representation and reinforcement learning’s significant achievement in decision-making, we are motivated to propose the Integrated Framework of Forecasting Based Online Trading Strategy (IFF-BOTS). IFF-BOTS can satisfy better prediction performance and dynamic decisions for real-world online trading systems. Our method adopts a novel isomorphic convolutional neural network (CNN) based forecaster-classifier-executor architecture, which can empower temporal feature capture and make the best use of the strong pattern classification capability of convolutional neural networks. Furthermore, we adopt an integrated prediction indicator to leverage forecasting and corresponding classification precision. With such quantitative prediction, we define an improved trading strategy based on an effective combination of direct reinforcement learning and a predictive heuristic strategy. Unlike most existing methods that mainly adopt deep neural networks for feature extraction and representation of financial time series, the proposed forecaster-classifier-executor architecture can unify prediction together with trading decisions. The main contributions of the proposed IFF-BOTS are as follows:
i) Propose a forecaster-classifier-executor framework that exploits deep-learning-based financial time series prediction and reinforcement-learning-based decision-making to achieve a better Sharpe Ratio and real-time performance.
ii) Capture the regularities of financial time series data for forecasting by providing a computationally efficient isomorphic CNN-based architecture and a novel integrated prediction indicator that makes the best use of forecasting and uncertainty for the trading strategy.
iii) Present an improved direct-reinforcement-learning-based strategy that benefits from deep feature extraction and a prediction-based heuristic strategy.
iv) Empirically evaluate the proposed framework on two real-world stock datasets with widely used measurements: Total Profits and Sharpe Ratio.
The rest of this paper is organized as follows. Section 2 presents a brief review of the related works. Section 3 introduces the proposed IFF-BOTS architecture and trading strategy. In Section 4, we introduce the experimental settings and compare the results of the proposed IFF-BOTS with those of three state-of-the-art methods on the test datasets. Finally, Section 5 summarizes the whole paper and suggests possible future work.
We briefly review existing work on deep-learning-based and reinforcement-learning-based financial time series forecasting and trading strategies. More comprehensive literature reviews can be found in recent surveys [24, 26].
Deep-learning-based price prediction
Price prediction of any given stock is the most studied financial application. We observed the same trend within deep learning implementations. Depending on the prediction time horizon, different input parameters are chosen, varying from high-frequency trading (HFT) and intraday price movements to daily, weekly, or even monthly stock close prices. Additionally, technical, fundamental analysis, and social media feeds or sentiments are among the different parameters used for the prediction models. In [8], deep neural networks and lagged stock returns were used to predict the Korea Composite Stock Price Index (KOSPI). [6] applied raw price data as the input to LSTM models. Meanwhile, some studies implement multiple deep learning models for performance comparison using raw price (OCHLV) data for forecasting. Among the noteworthy studies, [12, 25] compared RNN, stacked RNN, MLP, LSTM, CNN, GRU, and ARIMA. [5] used cooperative neuro-evolution, RNN (Elman network), and DFNN to predict stock prices in NASDAQ. [16] applied CNN
There were also multiple hybrid models that used primarily technical analysis features as their inputs to the deep learning model. Recently, [14] used market microstructure-based trade indicators as inputs into an RNN with Graves LSTM detecting the buy-sell pressure of movements in the Istanbul Stock Exchange Index (BIST) to perform price prediction for intelligent stock trading. Meanwhile, some papers prefer CNN models. [2] used 250 features, including order details, to predict a private brokerage company’s real data on risky transactions. They used CNN and LSTM for stock price forecasting.
Reinforcement learning-based trading decision-making
The nature of trading requires counting the profits in an online manner. Not all reinforcement learning methods are ideal for such online decision-making. While value-function-based RL methods are plausible for offline scheduling problems, the actor-based framework is more suitable for dynamic online trading [19] due to two advantages: i) flexible objectives for optimization and ii) continuous descriptions of market conditions. [18] proposed the direct reinforcement learning trading model based on recurrent reinforcement learning, which, however, does not address feature learning. Robust feature representation is vital to machine learning performance. In the context of stock data learning, various feature representation strategies have been proposed from multiple views [4, 10]. Failure to extract robust features may adversely affect the performance of a trading system in handling market data with high uncertainty. Given its demonstrated strong feature representation capability, deep learning has been adopted for feature extraction and combined with reinforcement learning for online trading. [9, 21] exploited deep reinforcement learning approaches in stock and commodity futures markets using RNN and LSTM for feature engineering. [13] introduced CNN, RNN, and LSTM to improve the policy gradient function of actor-critic reinforcement learning for cryptocurrency trading.
In summary, there are impressive achievements in deep-learning-based or reinforcement-learning-based financial time series prediction and trading approaches. However, there are still three difficulties: i) the prediction performance still cannot satisfy real-world applications, especially for predictive trading decision making; ii) the real-time performance suffers from multiple problems during neural network training, such as failure or slowness to converge and mode collapse; and iii) forecasting and dynamic decision-making are not well harmonized together.
Integrated framework of forecasting-based online trading strategy
Problem formulation
CNN-based time series prediction
Consider a one-dimensional time series
The CNN [11] is a deep neural network consisting of convolutional layers based on the convolutional operation. It is the most common model used for vision and image processing-based classification problems such as image classification, object detection, and image segmentation. The advantage of CNN compared to conventional deep learning models is its strong pattern recognition capability. Furthermore, filtering with the kernel window function gives the benefit of data processing to CNN architectures with fewer parameters, which is beneficial for computing and storage. In typical CNN architectures, there are different layers: convolutional, max-pooling, dropout, and fully connected layers. It can be formulated as:
where
where
The backpropagation process is used for CNN model learning. The most commonly used optimizers, e.g., SGD and RMSProp, are used to find the optimal CNN parameters. The hyperparameters of CNN are similar to the hyperparameters of other deep learning models. CNN-based time series prediction can be implemented as shown in Fig. 2.
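The layer types named above (convolution, activation, pooling, fully connected) can be illustrated with a minimal pure-Python sketch of one forward pass over a sliding window of prices. This is not the paper's implementation: the function names, the kernel, and the dense weights are hand-picked assumptions for illustration only, and nothing here is trained.

```python
def sliding_windows(series, window):
    """Split a 1-D series into (input window, next value) pairs."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]


def conv1d(x, kernel, bias=0.0):
    """'Valid' 1-D convolution, computed as cross-correlation."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]


def relu(x):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in x]


def max_pool1d(x, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def dense(x, weights, bias=0.0):
    """Single-output fully connected layer with linear activation."""
    return sum(v * w for v, w in zip(x, weights)) + bias


# Hypothetical toy prices; a real model would learn kernel and weights.
prices = [10.0, 10.5, 10.2, 10.8, 11.0, 10.9, 11.3, 11.6]
pairs = sliding_windows(prices, window=5)
window, target = pairs[0]

# A [-1, 1] kernel extracts one-step price differences as features.
features = max_pool1d(relu(conv1d(window, kernel=[-1.0, 1.0])), size=2)
# Assumed linear head: last price plus a weighted sum of pooled features.
forecast = dense(features, weights=[0.5, 0.5], bias=window[-1])
```

Because the kernel slides over the window with shared weights, the number of parameters does not grow with the window length, which is the source of the computing and storage benefit mentioned above.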
Reinforcement learning (RL) [22] is a type of learning that differs from supervised and unsupervised learning models. It does not require a preliminary dataset that has been labeled or clustered before. There are different areas in which it is used: game theory, control theory, multiagent systems, operations research, robotics, information theory, investment portfolio management, simulation-based optimization, playing Atari games, and statistics. This learning method mimics the basics of how humans learn.
CNN-based time series prediction.
Discrete-time Markov decision process.
As shown in Fig. 3, reinforcement learning is mainly based on a Markov decision process (MDP). The objective is to choose a policy (a sequence of actions) to maximize the cumulative value function. An MDP is used to formalize the RL environment. An MDP is a five-tuple: state (a finite set of states), action (a finite set of actions), reward function (scalar feedback signal), state transition probability matrix
Based on the above classic model, we formulate the trading decision in Moody’s direct reinforcement learning trading model [18]. We define
where the first term is the profit/loss made from the market fluctuations and the second term is the transaction cost
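The two-term reward described here can be sketched as follows. This is a minimal illustration that assumes the form commonly used in the direct reinforcement learning literature, namely the previously held position times the price change, minus a cost proportional to the change in position. The function name, the position encoding (-1 short, 0 flat, 1 long), and the cost value are assumptions for illustration, not the paper's notation.

```python
def step_rewards(prices, positions, cost):
    """Per-step reward: market P&L of the held position minus transaction cost.

    positions[t] is the position decided at time t (-1 short, 0 flat, 1 long).
    The cost of entering the very first position is ignored for brevity.
    """
    rewards = []
    for t in range(1, len(prices)):
        price_change = prices[t] - prices[t - 1]
        # First term: profit/loss from market fluctuation.
        pnl = positions[t - 1] * price_change
        # Second term: cost charged on the size of the position change.
        turnover = abs(positions[t] - positions[t - 1])
        rewards.append(pnl - cost * turnover)
    return rewards


# Hypothetical toy prices and an assumed cost of 0.5 per unit of turnover.
prices = [100.0, 101.0, 100.0, 102.0]
steady = step_rewards(prices, [1, 1, 1, 1], cost=0.5)
churny = step_rewards(prices, [1, -1, 1, -1], cost=0.5)
```

In this toy case the second strategy is on the right side of every price move, yet its cumulative reward is lower than that of simply staying long, because each reversal pays the transaction cost twice.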
where UT
Where
To address the first and most fundamental problem, i.e., how to exploit a prediction-decision framework to combine the advantages of the two machine learning models, we first need to examine the gaps between deep-learning-based prediction and reinforcement-learning-based trading decisions. In brief, there are two main issues to be addressed: i) in most existing deep reinforcement learning methods for algorithmic trading, deep learning is adopted mainly for feature extraction rather than as part of a harmonized prediction-decision methodology, and reinforcement learning is likewise used for decision-making in isolation from prediction; ii) dynamic trading decisions depend on critical information available at the time, e.g., predicted prices, risks, trends, or sentiments. However, the performance of predictions is not static, and it varies greatly due to market dynamics and complexity. Therefore, how can we make the best use of such uncertain predictions and take optimal trading actions?
Architecture of IFF-BOTS framework.
As shown in Fig. 4, on the basis of CNN-based time series prediction and direct reinforcement learning-based dynamic decision models, we propose an integrated framework of forecasting-based online trading strategy (IFF-BOTS) to exploit and unify both models into a single framework for algorithmic trading in real-world online trading systems, which automatically provides continuous trading decisions without the involvement of domain experts. Specifically, the framework contains three main components: the forecaster (F), the classifier (C), and the executor (E). The left part contains the isomorphic CNN-based forecaster F and classifier C, which share all neural layers except for the last layer. Convolutional neural networks can capture temporal dependencies and extract feature representations from the input financial time series. Based on different neural layers and activation functions, forecaster F and classifier C can achieve direct price forecasting and multiclass classification of the trends with corresponding likelihood. The trends obtained from multiclassifier C align with trading decisions, i.e.,
Furthermore, the extracted feature representations can also contribute to executor E as hidden features for conventional reinforcement-learning-based trading decision-making methods. Such integrated prediction and feature extraction can offer more details to help executor E make better trade decisions. The integrated prediction indicator
where
However, the decision-making mechanism should tackle inaccurate predictions or even uncertainties, i.e., predictions with poor likelihood. Trade decision-making is a discrete-time Markov decision process, and reinforcement learning demonstrates impressive state-of-the-art results for such continuous trading decision scenarios. Therefore, as shown in the right part, the IFF-BOTS presents an improved direct-reinforcement-learning-based trading strategy. Unlike conventional direct reinforcement learning, the proposed trading strategy leverages integrated predictions and deep feature learning to combine prediction-based heuristic trading and the dynamics of direct reinforcement learning. Therefore, the trading decision function
where
However, considering that trading decisions should be made online for real-world trading systems, we must take real-time performance into account, which is the second key problem to be handled.
The second problem is how precise and computationally efficient predictions can be achieved to support online trading decisions. Although LSTM is adopted for time series prediction in most situations, here we must treat real-time performance seriously because it is critical for actual trading. Therefore, we adopt an isomorphic CNN-based forecaster-classifier-executor structure, as shown in Fig. 4, which can lead to i) effective calculation for the forecasted price, type of trend, and corresponding likelihood, which could be exploited to refine the prediction. According to Eq. (10), we compute the integrated prediction indicator
Dynamic trading decision
To reasonably address the third key problem, i.e., how to define an effective trading strategy based on the prediction results, especially in highly dynamic scenarios, we improve classic direct reinforcement learning by introducing the prediction-based heuristic trading strategy. Most existing approaches take advantage of reinforcement-learning-based continuous decision-making to address the complicated dynamics of realistic financial markets. Moreover, some of them also adopt deep learning for financial market feature extraction. However, the proposed framework leverages both deep-learning-based prediction and deep feature representation instead of only feature representation and adopts more proactive trading decisions, i.e., predictive decisions. On the one hand, IFF-BOTS better uses CNN’s demonstrated strong capability of nonlinear feature learning and classification for time series prediction, which is more advantageous than feature engineering for further trading execution. On the other hand, the predictive decision would contribute to a better decision for a highly dynamic financial market due to a combination of prediction and the Markov decision process. Given a decent forecast indicated by an integrated prediction indicator, a heuristic strategy could be easily selected. However, in regard to an uncertain prediction measured by the integrated prediction indicator
Evaluation
Experimental settings
Datasets
To verify the proposed algorithm trading method, we conduct experiments on real-world financial data:1
Datasets of S&P and DJI.
General information of two real-world datasets
The main goal of algorithmic trading is to gain more profits and take fewer risks. Thus, the most popular metrics, Total Profits and Sharpe Ratio, are adopted for comparison. Total Profits (TP) gained by trading is defined as follows:
where
Moreover, in modern portfolio theory, risk-adjusted profits are more widely used to evaluate a trading system’s performance than Total Profits. This paper will also consider the Sharpe Ratio (SR), commonly used in many trading-related works. The Sharpe Ratio is the ratio of the average return to the standard deviation of the returns calculated in period
where
Since the proposed IFF-BOTS is a prediction-based trading framework, we also evaluate financial time series prediction. In the framework, both direct price forecasting and trend classification are adopted. As a result, we follow generic forecasting and classification measurements, i.e., root mean square error (RMSE) and mean absolute percentage error (MAPE) for forecasting, and the F1-score, a composite indicator of precision and recall, for classification.
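The measurements named in this subsection can be sketched in a few lines of standard-library Python. The sketch makes several assumptions that the text does not specify: the multiclass F1-score is macro-averaged, the Sharpe Ratio uses the population standard deviation with no risk-free rate, and all function names and sample values are illustrative.

```python
import math
import statistics


def rmse(actual, predicted):
    """Root mean square error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def macro_f1(true_labels, predicted_labels):
    """Unweighted mean of per-class F1 scores (macro averaging is assumed)."""
    pairs = list(zip(true_labels, predicted_labels))
    scores = []
    for label in sorted(set(true_labels)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


def sharpe_ratio(returns):
    """Average return divided by the standard deviation of the returns."""
    return statistics.mean(returns) / statistics.pstdev(returns)


# Hypothetical sample values for illustration.
price_rmse = rmse([100.0, 200.0], [110.0, 190.0])
price_mape = mape([100.0, 200.0], [110.0, 190.0])
trend_f1 = macro_f1(["up", "up", "down", "flat"], ["up", "down", "down", "flat"])
sr = sharpe_ratio([1.0, 3.0])
```

Lower RMSE and MAPE indicate better forecasts, while higher F1 and Sharpe Ratio indicate better classification and better risk-adjusted returns, respectively.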
Another important measurement is the time cost, including training duration, measured in seconds, especially for real-world applications. It is of great importance that both prediction and trading decision-making be carried out in a real-time manner.
We evaluate the performance of the proposed IFF-BOTS on the datasets mentioned earlier. First, to demonstrate the performance of the isomorphic CNN-based forecaster-classifier-executor structure, we compare its prediction performance with three state-of-the-art methods: LSTM recurrent neural networks (LSTM-RNN) [6], dilated convolutions long short-term memory networks (DC-LSTM) [27], and convolutional LSTM (CNN-LSTM) [16] based on RMSE, MAPE, and F1-score. Then, we continue comparing the overall trading results to evaluate the decision-making performance based on the Sharpe Ratio (SR) and Total Profits (TP). In practical implementation, we compare IFF-BOTS with two sets of trading methods: i) Prediction-based heuristic trading methods, including LSTM-RNN-H, DC-LSTM-H, and CNN-LSTM-H. These approaches only trust the trading signal that has a high likelihood. This signal was identified if the predicted likelihood for one direction was higher than a certain threshold
Experimental environments
We perform all the experiments on a single server with a 6-core Intel Core (Coffee Lake) CPU and an NVIDIA GeForce 2070s GPU. The software environment includes Anaconda 3, Python 3.8, CUDA 11.1, and the necessary libraries such as TensorFlow-GPU 2.2.0 and cuDNN 7.6.5. The implementation of all methods is based on Keras.
To better compare the hyperparameters of the different methods, we use relatively stringent parameter settings: (i) No more than four layers of neural networks are adopted for all these models to ensure acceptable real-time performance; (ii) The number of hidden neurons is no more than 64; (iii) We adopt the Softmax activation function for the output layer of classifier C, the linear activation function for the output layer of forecaster F, and the Tanh activation function for the output layer of executor E; (iv) The training epochs are less than 100; (v) We use the RMSProp optimizer for the forecaster F and the classifier C, and the Adam optimizer for the executor E; (vi) a mini-batch size m
To mitigate experimental randomness, we ran each method on each dataset ten times with different random seeds.
Experimental results
This section presents the results intending to demonstrate the overall performance of IFF-BOTS in both prediction and trading on real-world financial time series datasets and provides some insights into the performance comparison among different algorithmic trading methods.
To make the result clear, we show the best performance among the compared methods in bold. The last row of the tables indicates the ranking of each approach within each dataset.
Prediction metrics
Datasets of S&P and DJI.
As shown in Fig. 6 and Table 2, IFF-BOTS’s isomorphic CNN-based architecture demonstrates clear advantages over LSTM-RNN, DC-LSTM, and CNN-LSTM. Two observations can be drawn from the experimental results: i) Although performance on the training sets looks similarly good in Fig. 6a and b, IFF-BOTS outperforms its competitors on the test sets in Fig. 6c and d. Its RMSE, MAPE, and F1 scores are consistently strong. More precisely, for S&P, the metrics are better by 44.46%, 53.90%, and 6.97% than those of the second-best method. For DJI, the scores are better by 35.95%, 41.77%, and 9.70%; ii) Training duration is also much shorter, i.e., 10.3 seconds and 9.66 seconds for S&P and DJI, compared with other methods, especially LSTM-RNN.
Trading results and analysis
A. Comparisons with prediction-based heuristic trading methods
Overall trading results in the test set.
Figure 7c and d show the trading results of the compared prediction-based heuristic methods, including IFF-BOTS, LSTM-RNN-H, DC-LSTM-H, and CNN-LSTM-H. In addition, the prices of the test sets are shown in Fig. 7a and b, which fluctuate in significantly different manners. In brief, IFF-BOTS outperforms the compared approaches for both S&P and DJI, which can be seen from the profit and loss (P&L) curves in Fig. 7c and d. For S&P, the P&L curve of IFF-BOTS shows much more stable profits than the others, while for DJI, IFF-BOTS has apparent advantages, although it struggles in a few intervals. More specifically, the quantitative evaluations are summarized in Table 3. IFF-BOTS performs impressively on TP and SR. Compared with the runner-up, the ratios are 499.39:100.16 for TP on S&P, 1.321:0.283 for SR on S&P, 7794.06:3333.9 for TP on DJI, and 1.51:0.8229 for SR on DJI.
Overall TP and SR results of prediction-based heuristic trading methods
Overall TP and SR results of DDRL-based trading methods
B. Comparisons with DDRL-based methods
For deep-direct-reinforcement-learning-based categories, Fig. 7e and f present the trading comparison among IFF-BOTS, LSTM-RNN-DDRL, DC-LSTM-DDRL, and CNN-LSTM-DDRL. Similarly, the prices of the test sets are shown in Fig. 7a and b.
In summary, IFF-BOTS outperforms the compared approaches for both S&P and DJI. For S&P, the P&L curve of IFF-BOTS achieves better profitability than the other methods. For DJI, IFF-BOTS still leads the P&L curves, although CNN-LSTM-DDRL performs well. As shown in Table 4, IFF-BOTS beats its main competitor on S&P, DC-LSTM-DDRL: 499.39:133.98 for TP and 1.321:0.269 for SR. Moreover, on DJI, IFF-BOTS has an advantage over CNN-LSTM-DDRL: 7794.06:2524.8 for TP and 1.51:0.529 for SR.
The robustness of IFF-BOTS is an interesting issue to explore further. Following the previous experiments on the S&P dataset, we conduct more experiments with different hyperparameters, as IFF-BOTS is based on CNN layers and RNN nodes, which potentially impact the overall performance. To examine the representation learning ability of the neural networks, we adjust the number of layers from 2 to 4 and the number of RNN nodes from 1 to 4. As shown in Table 5, more hidden layers lead to a slightly better TP and SR because improved feature representation can directly benefit forecaster F, classifier C, and executor E. Moreover, additional RNN nodes in the direct reinforcement learning model have little impact on model performance, as discussed for the direct reinforcement learning model. As expected, with more CNN layers and RNN nodes, the runtime increases correspondingly. Therefore, we can conclude that the proposed IFF-BOTS is generally robust to different CNN layer and RNN node settings.
Robustness results
This paper proposes a novel algorithmic trading framework, IFF-BOTS, to achieve better financial time series prediction performance and dynamic trading decisions for real-world online trading systems. IFF-BOTS features a novel isomorphic CNN-based forecaster-classifier-executor architecture for feature representation of nonstationary, nonlinear, and highly noisy financial data. The framework adopts an integrated prediction to leverage both price forecasting and trend classification. Moreover, it exploits the CNN’s strong capability of pattern classification and nonlinear feature learning. Furthermore, we present an improved direct-reinforcement-learning-based strategy to benefit from deep feature extraction and a prediction-based heuristic strategy based on quantitative prediction. Although training neural networks is involved, the real-time performance of IFF-BOTS is suitable for most online real-world trading systems because of the isomorphic structure and shallow layers. In practice, IFF-BOTS achieves the best metrics and ranking on the real-world S&P and DJI datasets compared to several state-of-the-art prediction-based heuristic trading and deep-direct-reinforcement-learning-based trading methods. For future work, we plan to conduct the following research: introduce anomaly detection and classification for financial time series so that turning points and jumps can be detected and predicted, which would be of crucial importance for algorithmic trading; take dynamic modeling of slippage and transaction execution, such as trade orders based on the execution status, into account; and, in terms of applications, extend IFF-BOTS to high-frequency trading (HFT) markets and orchestrate it as software-as-a-service in cloud environments.
Acknowledgments
This work has been partially supported by the China Scholarship Council, the National Natural Science Foundation of China under Grant No. 71971174, the Science and Technology Program of Sichuan Province under Grant No. 2021JDR0222 and No. 2020YFG0326, and the Talent Program of Xihua University under Grant No. Z202047.
