Abstract
This document outlines the analysis, design, implementation, and benchmarking of various neural network architectures in a short-term frequency prediction system for the FOREX market. Our objective is to emulate the decision-making process of a human expert (technical analyst) through a system that swiftly adapts to changes in market conditions, thereby optimizing short-term trading strategies. We have designed and implemented a series of LSTM neural network architectures that take exchange rate values as input to generate short-term market trend forecasts. Additionally, we developed a custom ANN architecture based on simulators of technical analysis indicators. We performed a comparative analysis of the results and drew useful conclusions regarding the suitability of each architecture and its cost in terms of time and computational power. The custom ANN architecture produces higher-quality predictions with greater sensitivity, while using fewer resources and less time than the LSTM architectures. It therefore appears well suited to low-power computing systems and to use cases that need fast decisions at the lowest possible computational cost.
Introduction
The bulk of profits in the foreign exchange market (FOREX) [1] is generated through substantial leverage using margin [2]. Leverage ratios as high as 1:200 (where an initial capital of €1,000 allows positions of up to €200,000 to be taken) present considerable risk for low-volatility investments, particularly those executed intraday and sometimes within mere minutes. Consequently, it has been argued that forecasting models [3] should be grounded in short time periods.
In markets characterized by substantial depth and volume, such as FOREX, capitalizing on micro-volatility in the short term holds paramount importance and can be accomplished through analogous short-term forecasts [4].
Over the past few decades, economists have endeavored to construct models capable of successfully predicting trends, giving rise to the field known as technical analysis. Despite extensive and prolonged efforts, there is still no universally applicable index or model that can reliably forecast financial market trends. The primary obstacle stems from technical analysis neglecting the most recent shifts in fundamentals, which remain unrecorded, as well as the impact of breaking news on investor psychology.
The goal of this paper is to evaluate the short-term trend predictions generated by various artificial neural network architectures, aiming to draw insightful conclusions regarding the suitability of each architecture. Specifically, we compare the quality of the prediction of six different parameterizations of vanilla LSTM, bidirectional LSTM and convolutional LSTM networks with a prototype artificial neural network architecture based on simple error backpropagation networks.
This paper is organized into four sections. Initially, we provide a concise overview of related studies on exchange rate forecasting employing computational intelligence. Next, we outline the architectures of the various forecasting systems used in our experiments. This is followed by a presentation and analysis of the experimental results. Finally, we conclude with a discussion of the findings and propose potential future research directions.
A concise overview of exchange rate prediction using computational intelligence
As discussed earlier, participants in the FOREX market employ technical analysis tools [5] to predict currency movements. Nonetheless, automated trading systems frequently generate greater profits by executing large-scale trades according to predictive models. The effectiveness of technical analysis methods can be inconsistent, often failing due to overlooked shifts in fundamental values and market sentiment. The accuracy of these predictions generally diminishes with shorter time frames [6].
The study in [7] demonstrates how automated trading can be tackled efficiently with a comprehensive portfolio strategy that continuously processes data streams across diverse markets. It presents a scalable trading model that learns to generate profit from multiple inter-market price predictions and market correlation structures.
Forecasting methods can be broadly divided into traditional and non-traditional approaches. Traditional methods employ static algorithms that are unaffected by the input data and serve as econometric models for interpreting results and testing hypotheses, a standard practice in technical analysis [8].
In contrast, non-traditional methods include data-driven approaches that adapt based on the input data [8]. These methods, such as fuzzy logic [9], Artificial Neural Networks (ANN) [10], neuro-fuzzy architectures (hybrid systems) [11], and genetic algorithms [12], can rival econometric methods due to their generalized operations [13]. Machine-learning-based methods, particularly those utilizing historical trading data, are regarded as robust for predicting trading patterns in FOREX [10].
Neural networks, especially those with hidden layers, provide an internal representation of variable relationships and excel in handling sparse data and complex phenomena [14]. Genetic algorithms have been used to learn trading rules and, when combined with echo-state networks, have shown superior performance in predicting market trends in both bullish and bearish markets compared to conventional strategies [15].
We will now briefly review some key contributions to the field:
Cavalcante et al. [16] offered a comprehensive overview of primary studies from 2009 to 2015, highlighting techniques for preprocessing, clustering financial data, forecasting market movements, and mining financial information. Patel et al. [17] focused on predicting stock market index prices using a two-stage fusion approach with Support Vector Regression (SVR) and Artificial Neural Networks (ANN). Yıldırım et al. [18] used LSTM networks for directional predictions in Forex, achieving success with a hybrid model that incorporates macroeconomic and technical indicator data. Fischer et al. [19] applied LSTM networks to predict directional movements in S&P 500 constituent stocks, observing varying profitability over time. Xiong et al. [20] utilized Long Short-Term Memory neural networks to model S&P 500 volatility, outperforming linear benchmarks. Galeshchuk and Mukherjee [21] explored deep convolutional neural networks for predicting exchange rate direction with satisfactory accuracy.
In earlier work [22], we developed an ANN to predict market signals in the FOREX market, leveraging the advantages of both technical analysis and ANN in causal modeling and case control. In a subsequent study [23], we introduced an ultra-short-term frequency trading system for FOREX, incorporating artificial intelligence techniques for pre-trade analysis, trend forecasting, and trade execution. This system aimed to emulate human expert judgment and decision-making, achieving superior performance compared to individual or combined technical indicators across various automated trading engines.
In this paper we experiment with several LSTM network architectures and compare their performance with the performance of an improved version of the aforementioned architecture, drawing useful conclusions about their suitability for FOREX time series prediction.
An in-depth system description
This section outlines our methodology for analyzing, designing, and implementing ultra-short-term trend prediction. The system involves critical phases such as Pre-trade Analysis and Transaction Signal Production (Trend Forecasting) [24].
Our goal is to replicate the decision-making process of a human expert, whether a technical analyst or broker, using an artificial intelligence system that efficiently adapts to market condition changes. This adaptability is essential for maximizing the effectiveness of short-term trades.
The analysis phase begins with data mining, where pertinent data for the subsequent stages is meticulously selected. Following this, in the trend forecasting phase, various Artificial Neural Network (ANN) architectures are developed and applied to produce trend forecasting signals. The final phase consists of a comparative analysis of the different sources of trend forecasting, specifically focusing on the various ANN architectures utilized.
Selection of the exchange rate and experimental data source
For our research, we chose to concentrate on the EUR/USD exchange rate, given its prominence as the world's most traded currency pair. The substantial market depth of this pair deters entities from attempting price manipulations that could distort its accurate representation.
Our selected data sources for experimentation include Truefx [25], a widely recognized exchange rate data server in the industry, and American Integral [26]. Integral is utilized by major institutional FOREX service providers globally as a reference for pricing.
The experimental dataset covers the tick-to-tick EUR/USD exchange rate for the months of October, November, and December 2021. Initially, the dataset contains over 10 million values, which undergo preprocessing to eliminate stagnant periods where the exchange rate remains unchanged.
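The paper does not state how stagnant periods are detected. A minimal Python sketch, assuming that ticks arrive as (timestamp, rate) pairs and that a tick is "stagnant" when its rate equals the rate of the last tick kept; `drop_stagnant_ticks` and the sample rates are hypothetical:

```python
from typing import Iterable, List, Tuple

Tick = Tuple[str, float]  # (timestamp, EUR/USD rate)


def drop_stagnant_ticks(ticks: Iterable[Tick]) -> List[Tick]:
    """Keep a tick only if its rate differs from the last tick kept."""
    kept: List[Tick] = []
    for timestamp, rate in ticks:
        # A repeated rate carries no new price information, so skip it
        if kept and rate == kept[-1][1]:
            continue
        kept.append((timestamp, rate))
    return kept


# Hypothetical sample: two of the five ticks repeat the previous rate
raw = [
    ("2021-10-01T10:00:00.100", 1.16010),
    ("2021-10-01T10:00:00.350", 1.16010),
    ("2021-10-01T10:00:01.020", 1.16012),
    ("2021-10-01T10:00:01.400", 1.16012),
    ("2021-10-01T10:00:02.000", 1.16009),
]
clean = drop_stagnant_ticks(raw)
print(len(raw), "->", len(clean))
```

Exact equality is adequate here because an unchanged quote parses to an identical float; a tolerance would only be needed if the rates had passed through arithmetic first.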
Selected LSTM networks
Recurrent networks use feedback connections to retain information about recent input events, which then influences subsequent activations and gives the network a form of short-term memory. Although networks of this type are effective for several applications (e.g. speech recognition), they have weaknesses when there is a non-trivial time lag between an input and the expected output.
Long Short-Term Memory (LSTM) networks are recurrent networks specifically designed to address the rapid decay of short-term memory when information must be retained over longer sequences. The LSTM model preserves selected information in long-term memory, which is stored in the cell state, while short-term information is captured in the hidden state.
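To make the cell-state/hidden-state distinction concrete, the sketch below computes one step of a standard LSTM cell with forget, input and output gates, written for a single-unit cell with scalar weights so that it needs only the standard library. This is the textbook formulation, not the Keras implementation; `GateWeights`, `lstm_step` and all weight values are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class GateWeights:
    w: float  # weight on the current input x_t
    u: float  # weight on the previous hidden state h_{t-1}
    b: float  # bias

    def pre_activation(self, x: float, h: float) -> float:
        return self.w * x + self.u * h + self.b


def lstm_step(x: float, h: float, c: float,
              forget: GateWeights, inp: GateWeights,
              out: GateWeights, cand: GateWeights) -> Tuple[float, float]:
    """Return (h_t, c_t) for one time step of a single-unit LSTM cell."""
    f = sigmoid(forget.pre_activation(x, h))  # share of old cell state to keep
    i = sigmoid(inp.pre_activation(x, h))     # share of the candidate to write
    o = sigmoid(out.pre_activation(x, h))     # share of the cell state to expose
    g = math.tanh(cand.pre_activation(x, h))  # candidate new content
    c_new = f * c + i * g                     # long-term memory (cell state)
    h_new = o * math.tanh(c_new)              # short-term memory (hidden state)
    return h_new, c_new


# Run the cell over a short, hypothetical sequence of normalised rate changes
gates = dict(forget=GateWeights(0.5, 0.1, 1.0), inp=GateWeights(0.8, 0.2, 0.0),
             out=GateWeights(0.6, 0.1, 0.0), cand=GateWeights(1.0, 0.3, 0.0))
h, c = 0.0, 0.0
for x in [0.2, -0.1, 0.4, 0.3]:
    h, c = lstm_step(x, h, c, **gates)
print(round(h, 4), round(c, 4))
```

The forget gate decides how much of `c` survives each step, which is what lets the cell carry information across long lags, while `h` is recomputed from `c` at every step.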
For the implementation of the chosen LSTM architectures, we used Keras and TensorFlow 2. Keras is a deep learning API written in Python that runs on the TensorFlow machine learning platform and is designed to facilitate rapid experimentation [27]. Known for its performance and scalability, Keras is widely adopted by organizations such as NASA, YouTube, and Waymo, as well as by research institutes and universities.
We selected eight different LSTM architectures for our experiments, with parameters as shown in Table 1. All of these LSTM architectures are built with the Keras Sequential model and use the ReLU activation function.
Selected LSTM architectures
*The number of input sequences on which the LSTM trains before generating an output.
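Table 1 is not reproduced here, so the exact meaning of each name suffix cannot be confirmed. As a sketch, assuming the footnoted parameter is the length of the input window and that each window is used to predict the following value(s), a rate series can be cut into supervised samples as follows (`make_windows`, `n_steps` and `n_ahead` are hypothetical names):

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], n_steps: int,
                 n_ahead: int = 1) -> Tuple[List[List[float]], List[List[float]]]:
    """Split a series into (input window, target) pairs, preserving time order."""
    X: List[List[float]] = []
    y: List[List[float]] = []
    for start in range(len(series) - n_steps - n_ahead + 1):
        # n_steps consecutive past values form one input sequence
        X.append(list(series[start:start + n_steps]))
        # the n_ahead values that immediately follow are the target
        y.append(list(series[start + n_steps:start + n_steps + n_ahead]))
    return X, y


rates = [1.1601, 1.1602, 1.1600, 1.1603, 1.1604, 1.1602]
X, y = make_windows(rates, n_steps=3)
print(len(X), X[0], y[0])
```

Keras LSTM layers expect 3-D input of shape (samples, time steps, features), so for a univariate series `X` would be reshaped to `(len(X), n_steps, 1)` before training.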
In the following figures we show the various LSTM architectures.
sLSTM-1-1 and sLSTM-15- architectures.
sLSTM-15-1 and sLSTM-15-1,15 architectures.
biLSTM-1-1 and biLSTM-15- architectures.
biLSTM-15-1 architecture.
convLSTM-1-1 and convLSTM-1-1,15 architectures.
Following initial experimentation, we identified and adapted certain technical indicator algorithms for our study [23], aligning them with short-term forecasting objectives (see Fig. 6). These include arithmetic moving averages (MAs) of 300, 600, and 900 prices, the RSI-300 oscillator, the CCI-300 oscillator, the Williams-300 oscillator, and the Price Oscillator (MA-300, MA-600, MA-900). These technical indicators generate forecasts as described in Annex I.
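The adapted indicator definitions are given in Annex I, which is not part of this excerpt. For orientation only, the sketch below computes the standard textbook versions of four of these indicators over the last `period` ticks. It assumes that, because tick data carry a single rate rather than separate high, low and close prices, that one rate stands in for all three; the RSI uses simple averages (Cutler's variant) rather than Wilder's smoothing. All function names are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def sma(prices: Sequence[float], period: int) -> float:
    """Arithmetic moving average of the last `period` prices."""
    return mean(prices[-period:])


def rsi(prices: Sequence[float], period: int) -> float:
    """Relative Strength Index using simple averages of gains and losses."""
    window = prices[-(period + 1):]
    changes = [b - a for a, b in zip(window, window[1:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)


def williams_r(prices: Sequence[float], period: int) -> Optional[float]:
    """Williams %R in [-100, 0]; undefined (None) for a flat window."""
    window = prices[-period:]
    high, low = max(window), min(window)
    if high == low:
        return None
    return -100.0 * (high - prices[-1]) / (high - low)


def cci(prices: Sequence[float], period: int) -> Optional[float]:
    """Commodity Channel Index with the conventional 0.015 constant."""
    window = prices[-period:]
    avg = mean(window)
    mean_dev = mean(abs(p - avg) for p in window)
    if mean_dev == 0:
        return None
    return (prices[-1] - avg) / (0.015 * mean_dev)
```

With the periods listed above one would call, for example, `sma(rates, 900)` or `rsi(rates, 300)`; each call needs at least `period` rates (`period + 1` for the RSI).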
The system takes the exchange rate, time, and date as inputs (see Fig. 6) and, using the predicted trend signal and the configuration of its auto-trading agents, simulates short-term trading while generating performance logs that record the simulated profit or loss.
The data serves as input to the customized technical indicator simulators (see Fig. 6) [28]. Each technical indicator simulator produces an output value from the set of values outlined in Table 2. These outputs from the technical indicator simulators are then fed into the input neurons of the ANN system, as detailed in prior research [23].
The prediction system consists of two sets of Artificial Neural Networks (ANNs) that operate in pairs. Within each pair, one ANN receives the outputs of the technical indicator simulators as inputs and is trained in conventional error back-propagation mode to learn the actual trend; it uses past values to calculate its prediction error. The weights learned by this ANN are then transferred to its paired ANN, which operates exclusively in feed-forward mode on present values. Consequently, one ANN is trained on historical data while its counterpart generates predictions on current data. All feed-forward ANNs are aggregated in an ensemble to produce the final trend forecast [23]. This architecture is a modification of the core concept of Generative Adversarial Networks [29].
Customized technical indicators are generated, and their anticipated patterns at time t-M(x)-1 are relayed to the input layer of each back-propagation ANN. Each technical indicator is mapped to one input neuron of the ANN, whose value reflects the indicator's state at time t-M(x)-1. Here, t-M(x) denotes the moment at which the neural network with index x last operated (for instance, M(1) = 30 means that network 1 validates the technical indicators' forecasts over a 30-second horizon). The hidden layer uses a tanh-type sigmoid activation function to yield output values within the interval [-1, 1].
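The learn/use pairing is described above only in prose; the network size, learning rate and other settings are given in Tables 4 and 5, and the original system is written in Java. As a minimal Python sketch under those unknowns, the code below pairs a one-hidden-layer tanh network trained by back-propagation on past indicator outputs with a feed-forward-only twin that receives a copy of its weights and forecasts from current indicator outputs. `TanhNet`, `NetworkPair`, the layer sizes, the learning rate and the input encoding are all hypothetical.

```python
import math
import random
from typing import List, Tuple


class TanhNet:
    """Fully connected net: inputs -> tanh hidden layer -> single tanh output."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0
        self.h: List[float] = []

    def forward(self, x: List[float]) -> float:
        self.h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return math.tanh(sum(w * hj for w, hj in zip(self.w2, self.h)) + self.b2)

    def train_step(self, x: List[float], target: float, lr: float = 0.1) -> float:
        """One back-propagation step on squared error; returns |error| before the update."""
        y = self.forward(x)
        err = y - target
        d_out = err * (1.0 - y * y)  # gradient of 0.5*err**2 w.r.t. output pre-activation
        for j, hj in enumerate(self.h):
            d_hid = d_out * self.w2[j] * (1.0 - hj * hj)  # uses w2[j] before its update
            self.w2[j] -= lr * d_out * hj
            for k, xk in enumerate(x):
                self.w1[j][k] -= lr * d_hid * xk
            self.b1[j] -= lr * d_hid
        self.b2 -= lr * d_out
        return abs(err)

    def copy_weights_from(self, other: "TanhNet") -> None:
        self.w1 = [row[:] for row in other.w1]
        self.b1 = other.b1[:]
        self.w2 = other.w2[:]
        self.b2 = other.b2


class NetworkPair:
    """Learn-only net trained on past inputs; use-only twin forecasts on current inputs."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        self.learner = TanhNet(n_in, n_hidden, seed)
        self.predictor = TanhNet(n_in, n_hidden, seed)

    def step(self, past_inputs: List[float], actual_trend: float,
             current_inputs: List[float]) -> Tuple[float, float]:
        error = self.learner.train_step(past_inputs, actual_trend)
        self.predictor.copy_weights_from(self.learner)  # weight transfer, no training
        return self.predictor.forward(current_inputs), error


# Hypothetical outputs of three indicator simulators
pair = NetworkPair(n_in=3, n_hidden=4)
past = [1.0, 1.0, -1.0]   # indicator signals as seen at t - M(x) - 1
now = [1.0, 0.0, -1.0]    # indicator signals at t
for _ in range(300):
    forecast, error = pair.step(past, actual_trend=0.8, current_inputs=now)
print(round(forecast, 3), round(error, 4))
```

Because the predictor never trains, its output at time t depends only on weights fitted to data that were already observable, which is the separation between historical training and current prediction described above.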
Additionally, the algorithm that determines the actual trend is updated using data from the previous time point (t-1) and from the point t-1-M(x) (see Table 3). This method produces a normalized estimate of the actual trend. The output value of the final node is then compared with the actual trend to train the neural network. The criteria for the actual-trend conditions (see Table 3) were chosen based on preliminary tests.
Each back-propagation ANN in the series is identified by the period it operates in the past (t-M(x)), with the number of these ANNs being adjustable. The number of feed-forward ANNs matches the number of back-propagation ANNs, as each back-propagation ANN transfers its neuron weights to a corresponding feed-forward ANN [23].
Custom technical indicators are created, and their forecasted trends for time t are provided to the input layer of each feed-forward ANN. The hidden layer uses a tanh-type sigmoid activation function to generate output values within the range [-1, 1].
Mapping of numerical values to trends
An overview of the system – architecture.
Each Forecasting Trend (FT (x)) from the ANN feedforward series contributes a certain proportion to the final Forecasting Trend (FFT) of the system (Fig. 7). This algorithm essentially determines the contribution weight of each ANN feedforward to the ultimate forecast. The contribution of each ANN to the final prediction is calculated as the inverse of its absolute error divided by the sum of the inverses of the absolute errors of both ANNs for time t-K, where K
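The weighting rule in the preceding paragraph can be written as w_x = (1/|e_x|) / Σ_j (1/|e_j|), where e_x is the error of feed-forward network x at time t-K (the value of K is not given in this excerpt). A minimal sketch, assuming the Final Forecasting Trend is the weighted sum Σ_x w_x·FT(x); the small `eps` guard against a zero error is our addition, and the function names and sample values are hypothetical:

```python
from typing import List, Sequence


def contribution_weights(abs_errors: Sequence[float], eps: float = 1e-12) -> List[float]:
    """w_x = (1/|e_x|) / sum_j (1/|e_j|); eps avoids division by zero (assumption)."""
    inverses = [1.0 / (abs(e) + eps) for e in abs_errors]
    total = sum(inverses)
    return [inv / total for inv in inverses]


def final_forecast(forecasts: Sequence[float], abs_errors: Sequence[float]) -> float:
    """Final Forecasting Trend as the error-weighted sum of the individual FT(x)."""
    weights = contribution_weights(abs_errors)
    return sum(w * ft for w, ft in zip(weights, forecasts))


# Two feed-forward networks; their absolute errors at t-K are hypothetical
weights = contribution_weights([0.1, 0.3])
fft = final_forecast([0.6, -0.2], [0.1, 0.3])
print([round(w, 3) for w in weights], round(fft, 3))
```

The network whose recent error is three times smaller receives three times the weight (0.75 versus 0.25), so the ensemble leans towards whichever pair has tracked the market best in the latest window.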
The criteria for determining the actual trend in the current ANN forecasting system (rules are presented in descending order of priority) are consistent with ultra-short-term trading practices
Parameterization of the Artificial Neural Network (ANN)
An overview of the calculation of the final forecasting trend (for two networks).
The parameter values for all neural networks (both back-propagation and feed-forward series) were chosen based on our previous work to ensure comparability (Table 4). Similar to our prior work, each series of back-propagation and feed-forward ANNs consists of three ANNs (three pairs of ANNs). Additionally, each back-propagation ANN has five parameters, as outlined in Table 5. The parameter values for the technical analysis simulators align with those in our previous work (Table 6). The Predicted Trend Value defines the upward or downward multiplier of the exchange rate required for the neural network to trigger the corresponding trend (
Parameterization of back-propagation ANNs
Parameterization of technical indicators simulators
The custom ANN was developed in Java using Apache NetBeans IDE 13.0 [30]; the application is fully configurable through a clearly labeled parameter file. The LSTM architectures were developed in Python using Google Colab [31].
Experimentation and results
In this section, we evaluate and compare the LSTM architectures outlined in Table 1 with the custom ANN architecture we developed (Section 3.3). We compare them on the same experimental dataset in terms of forecasting success, sensitivity (the ability to generate forecasting signals), and resource consumption.
Our experimentation utilizes tick-by-tick data for the EUR/USD exchange rate covering the months of October, November, and December 2021.
We used two metrics to compare the architectures in our experiments: success in terms of trend across all prediction signals (STA) and success in terms of trend across strong prediction signals only (STS). A strong prediction signal is considered to be a signal with an intensity
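The exact success conditions are listed in Table 7, and the intensity threshold for a strong signal is cut off in this text. As a sketch only, assuming that a signal succeeds when its sign matches the sign of the subsequent rate move, that a zero signal means "no signal", and using a hypothetical strong-signal threshold of 0.5, STA and STS could be computed as follows (`is_hit` and `sta_sts` are hypothetical names):

```python
from typing import Sequence, Tuple


def is_hit(signal: float, move: float) -> bool:
    """Assumed success rule: the signal's sign matches the sign of the later rate move."""
    return (signal > 0 and move > 0) or (signal < 0 and move < 0)


def sta_sts(signals: Sequence[float], moves: Sequence[float],
            strong_threshold: float = 0.5) -> Tuple[float, float]:
    """Return (STA, STS) as fractions; a zero signal counts as 'no signal'."""
    issued = [(s, m) for s, m in zip(signals, moves) if s != 0]
    strong = [(s, m) for s, m in issued if abs(s) >= strong_threshold]

    def rate(pairs: Sequence[Tuple[float, float]]) -> float:
        if not pairs:
            return float("nan")
        return sum(is_hit(s, m) for s, m in pairs) / len(pairs)

    return rate(issued), rate(strong)


signals = [0.8, -0.3, 0.2, -0.9, 0.0]
moves = [0.0002, 0.0001, 0.0003, -0.0004, 0.0001]  # hypothetical later rate changes
print(sta_sts(signals, moves))
```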
Condition of success, in terms of trend, for each signal value
The data were fed to the eight (8) LSTM architectures (Table 1) and to our architecture described in Section 3.3. For the LSTM architectures, the first 50% of each month's data was used for training and the remaining 50% for trend forecasting. Our architecture is retrained serially with each new value, so it requires no training dataset larger than the period of the longest technical indicator used (in this case 900 exchange rate values, about 15 min of data). To make the results of our architecture and the LSTM architectures comparable, we report trend forecasts only for the period in which the LSTM architectures also forecast the trend (the second half of each month).
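A minimal sketch of the per-month chronological 50/50 split used for the LSTM architectures, assuming ISO-8601 timestamp strings; the function name and sample ticks are hypothetical. Splitting by time rather than at random keeps later ticks out of the training data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Tick = Tuple[str, float]  # (ISO-8601 timestamp, rate)


def monthly_train_test_split(ticks: Iterable[Tick], train_fraction: float = 0.5
                             ) -> Dict[str, Tuple[List[Tick], List[Tick]]]:
    """Split each calendar month chronologically: first part train, rest test."""
    by_month: Dict[str, List[Tick]] = defaultdict(list)
    for ts, rate in ticks:
        by_month[ts[:7]].append((ts, rate))  # "2021-10-01T..." -> "2021-10"
    splits = {}
    for month, rows in by_month.items():
        rows.sort(key=lambda tick: tick[0])  # ISO timestamps sort chronologically
        cut = int(len(rows) * train_fraction)
        splits[month] = (rows[:cut], rows[cut:])
    return splits


ticks = [("2021-10-01T09:00:00", 1.1601), ("2021-10-01T09:00:01", 1.1602),
         ("2021-10-15T12:00:00", 1.1605), ("2021-10-29T17:00:00", 1.1558),
         ("2021-11-01T09:00:00", 1.1560), ("2021-11-30T17:00:00", 1.1338)]
for month, (train, test) in monthly_train_test_split(ticks).items():
    print(month, len(train), len(test))
```

The custom ANN needs no such split: after a warm-up covering its longest indicator period (900 values), it is retrained with every new tick.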
Table 8 shows the aggregate results of the experiment for the different LSTM architectures (Section 3.2) and our ANN architecture.
Aggregated results of experimentation
We see that, in all three months and for both the STA and STS indices, the custom ANN architecture achieves higher success rates than the LSTM architectures. Furthermore, the absolute number of forecasting signals produced by the custom ANN architecture is always more than twice the number produced by the LSTM architectures, suggesting significantly higher sensitivity and better forecasting ability.
Cumulative time series of the percentage of successful predictions – STA.
Cumulative time series of the percentage of successful predictions – STS.
Figures 8 and 9 show the cumulative time series of successful STA and STS predictions over the entire experimentation.
Throughout the experiment, the custom ANN architecture clearly outperforms all alternative LSTM architectures: there is no time window in which its predictions are of inferior quality to those of the LSTM architectures. This superiority is even more significant given that it generates several times as many forecasting signals as the LSTM architectures.
The difference becomes increasingly clear as time passes.
At the end of the experiment, the custom ANN architecture had produced 31,701 forecasting signals, 25,720 (81.13%) of which were successful. Correspondingly, it produced 2,070 strong forecasting signals, of which 1,627 (78.6%) were confirmed.
All LSTM architectures performed similarly. The basic LSTM architecture sLSTM-1-1 (Table 1) had the best relative performance, producing 4,145 forecasting signals, of which 3,011 (72.64%) were successful. Correspondingly, it produced 822 strong forecasting signals, of which 515 (62.7%) were confirmed.
Therefore, the custom ANN architecture produced 7.6 times as many forecasting signals in total as the best LSTM architecture, and 8.5 times as many successful forecasting signals.
Resource consumption is a separate but increasingly important aspect: when selecting an artificial neural network architecture, one must also consider the resources it consumes to train and produce predictions.
For the medium-complexity LSTM architecture in our experiments (biLSTM-15-1,15), run on Google Colab with a Python 3 Google Compute Engine backend and GPU acceleration, training and prediction for December 2021 took 1,175 seconds.
For the custom ANN architecture, running on local resources (a laptop with a Ryzen 5 7520U processor, without GPU acceleration), the same experiment took 44 seconds.
Therefore, the custom ANN architecture needed roughly 27 times less time (1,175 s versus 44 s), and fewer resources, to perform the same experiment as the LSTM architecture. This can be an important selection criterion for users who cannot afford the processing and communication overheads of some modern cloud services.
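The percentages and ratios quoted in this section follow directly from the reported counts and timings. The short Python check below recomputes them; the only inputs are the figures given in the text (515/822 is 62.65%, which the text rounds to 62.7%, and 1,175 s / 44 s is about 26.7, rounded to 27).

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Counts and timings reported in this section
custom = {"signals": 31_701, "hits": 25_720, "strong": 2_070, "strong_hits": 1_627}
lstm = {"signals": 4_145, "hits": 3_011, "strong": 822, "strong_hits": 515}  # sLSTM-1-1
lstm_seconds, custom_seconds = 1_175, 44  # medium-complexity biLSTM vs custom ANN, Dec 2021

print(pct(custom["hits"], custom["signals"]))          # 81.13
print(pct(custom["strong_hits"], custom["strong"]))    # 78.6
print(pct(lstm["hits"], lstm["signals"]))              # 72.64
print(pct(lstm["strong_hits"], lstm["strong"]))        # 62.65
print(round(custom["signals"] / lstm["signals"], 1))   # 7.6
print(round(custom["hits"] / lstm["hits"], 1))         # 8.5
print(round(lstm_seconds / custom_seconds, 1))         # 26.7
```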
We have designed and built a custom ANN architecture that combines machine learning and technical analysis.
Specifically, a set of modified technical indicators is fed to the input neurons of an ANN architecture consisting of a series of back-propagation-trained ANNs and a series of feed-forward-only ANNs, all working in pairs. In each pair, a back-propagation neural network (learn-only network) passes its weights to an artificial neural network (use-only network) that works only in feed-forward mode. The final prediction is based on a weighting algorithm that takes into account the prediction quality of each pair of neural networks (learn-only NN and use-only NN) in the previous time window.
The prediction quality of the custom architecture was compared with that of 9 different LSTM architectures, in terms of both the absolute number of successful forecasts and the success rate. In all cases, the custom ANN architecture outperformed the LSTM architectures, producing a total of 31,701 forecasting signals, 25,720 (81.13%) of which were successful. The best-performing LSTM architecture (sLSTM-1-1) produced a total of 4,145 forecasting signals, 3,011 (72.64%) of which were successful. The custom ANN architecture therefore produces better-quality forecasts while also being more sensitive, i.e. it produces more trend signals, and of better quality.
It is also important to note that our custom architecture trains and generates signals serially throughout the experiment, requiring only enough initial calibration data to cover the longest period of the modified technical indicators. By contrast, all LSTM architectures must be trained on the first 50% of the experimental data before they can generate a reasonable forecast for the remaining 50%.
An increasingly important issue in selecting an artificial neural network architecture is the resources it consumes to train and produce predictions. Our indicative estimate is that the custom ANN architecture requires roughly 1/27th of the time, and far fewer resources, to perform the same experiment as the LSTM architectures. This makes it suitable for real-time devices with low computational resources, lowering the entry threshold for stakeholders who might want to join the FOREX trading market, as well as for other applications that rely on near-real-time data processing.
