Gold Price Prediction Using Image Based ForCNN Model: A Deep Learning Approach

Abstract

The volatility of gold prices significantly influences global financial stability, necessitating the development of reliable models capable of producing precise forecasts to minimize investment risks and maximize profitability. Recently, both machine learning and deep learning approaches have gained significant traction for time series forecasting across scientific and industrial domains. In this paper, we propose the ForCNN model, which utilizes grayscale image-based input rather than traditional numerical data. This algorithm integrates the advantages of visual image representation of a time series and deep 2D convolution neural network to analyze and extract important features and generate accurate forecasts. We carried out extensive experiments on two real-world gold closing price datasets and showed ForCNN outperformed most of the state-of-the-art deep learning techniques such as MLP, CNN1D, LSTM, CNN-LSTM, BiLSTM, CNN-BiLSTM in terms of accuracy measures. Furthermore, portfolio performance evaluation using Cumulative Return, Average Daily Return, and Sharpe Ratio indicates that ForCNN achieves superior profitability and stronger risk-adjusted performance, thereby underscoring its effectiveness in practical financial forecasting applications.

Keywords

CNN, financial forecasting, deep learning, time series, hyperparameter optimization

Introduction

Gold plays a pivotal role in the global economy as one of the most preferred investment commodities worldwide. However, the gold market exhibits pronounced volatility and sharp fluctuations driven by inflation, supply-demand imbalances, and political disruption. Extensive research has already gone into developing gold price prediction algorithms. Traditional time series techniques such as ARIMA (Autoregressive Integrated Moving Average) modeling (Guha and Bandyopadhyay, 2016) and multilinear regression have been used to forecast future gold prices; however, these classic algorithms rely on assumptions such as stationarity and a linear relationship between past observations. In addition to these statistical methods, various machine learning algorithms such as MLP (Multilayer Perceptron), SVM (Support Vector Machine), KNN (K-Nearest Neighbor), DT (Decision Trees), and SVR (Support Vector Regression) have been deployed to predict the price of gold (Das et al., 2022). In recent years, researchers have used deep learning models such as ANN, CNN, LSTM, and BiLSTM to produce more accurate gold price forecasts (Alameer et al., 2019; Livieris et al., 2020; Lu et al., 2021; Tuncer et al., 2022). Most of these deep learning techniques take a numeric representation of the time series as input. Although CNNs were originally designed to extract features from images, time series are usually fed to them as 1D numeric vectors. Although images have been used in some time series analysis projects, to our knowledge no prior study has explored grayscale images as input to neural networks for generating point forecasts of gold prices. The spatial structure encoded in a visual representation of the time series can offer a more insightful perspective, even though the same information is embedded in the numerical values. 
Researchers found that the combined use of visual time series representations and deep 2D convolutional neural network (CNN) can extrapolate complex patterns, and hence outperform other state-of-the-art deep learning models. Inspired by the aforementioned facts, the contributions of our research are encapsulated as follows:

Proposed a new approach based on the image-based ForCNN model to predict the gold price for the next time step along with hyperparameter optimization.

Investigated the performance of other benchmark models such as MLP, CNN1D, LSTM, CNN-LSTM, BiLSTM, and CNN-BiLSTM with hyperparameter optimization and compared their prediction accuracy with that of the ForCNN model.

Examined the performance of the proposed approach through an empirical evaluation using two different datasets of closing gold price in GBP (Great British Pound) and USD (United States Dollar) and further assessed its investment potential using portfolio performance metrics such as Cumulative Return (CR), Average Daily Return (ADR), Annualized Return (AR), Annualized Volatility (AV), Sharpe Ratio (SR), and Maximum Drawdown (MD).

The outline of the paper is as follows: Sections 1 and 2 provide a brief introduction and a review of the literature on the use of deep learning and statistical models in gold price forecasting. Sections 3 and 4 describe the architecture of the ForCNN model, the datasets, the accuracy measures, and the benchmark models. The results of the current work and the conclusions are discussed in Sections 5 and 6.

Literature Review

The challenging task of predicting and modeling the volatility of financial series (Gezici and Sefer, 2024; Pala and Sefer, 2024; Uygun and Sefer, 2025) has attracted considerable attention from both researchers and practitioners. Banhi Guha and Gautam Bandyopadhyay published an ARIMA-based study of gold price prediction in 2016 (Guha and Bandyopadhyay, 2016). Thereafter, in 2017 Sandya N. Kumari et al. demonstrated the strength of GARCH models for modeling and forecasting the volatility and non-linearity of gold prices (Kumari and Tan, 2018). Despite the effectiveness of ARIMA and GARCH models in gold price forecasting, machine learning and deep learning algorithms (Pirani et al., 2022) have recently gained popularity owing to their ability to analyze complex trends and non-linear patterns and to handle larger datasets. Arash Tashakkori et al. established the efficacy of the MLP (Multilayer Perceptron) for forecasting gold prices and showed that MLP neural networks can capture the complex relationships behind gold price fluctuations (Tashakkori et al., 2024). Furthermore, R. Hafezi and A. N. Akhavan built an intelligent network equipped with a meta-heuristic known as the BAT algorithm, designed to enable the ANN to manage fluctuations in gold prices efficiently (Hafezi and Akhavan, 2018). In another study, Zakaria Alameer et al. trained a Multilayer Perceptron with the WOA (Whale Optimization Algorithm) for forecasting gold prices and achieved higher precision than other baseline models (Alameer et al., 2019). Moreover, Andres Vidal and Werner Kristjanpoller developed a hybrid model integrating LSTM and VGG19 in 2019 to improve the precision of gold price volatility forecasts. Since the network was fed with images as input, it was capable of preserving both the stationary and dynamic characteristics of the time series (Vidal and Kristjanpoller, 2020). 
In 2021, Yu-Chen Chen and Wen-Chen Huang utilized CNN and LSTM models to analyze market behavior and generate trading strategies for the S&P 500, using commodity prices such as gold and crude oil as indicators (Chen and Huang, 2021). Wenjie Lu et al. presented a CNN-BiLSTM-AM model to forecast the next day's stock closing price. The CNN extracted intricate features from the input, and the BiLSTM used those features to forecast the closing price. Thereafter, to increase forecast accuracy, an attention mechanism (AM) was applied to capture the impact of feature states at multiple past time steps on the closing price (Lu et al., 2021). On the other hand, Margustin Salim and Arif Djunaidy created a hybrid CNN-LSTM model in 2023 in which images were provided as input; the time series data were converted to images using the Gramian Angular Field (GAF) technique to train the neural network (Salim and Djunaidy, 2024). A hybrid CNN-LSTM model was also developed by Ioannis E. Livieris et al. to forecast future gold prices and movements. Two distinct versions of their model were built, each with two convolutional layers and a different number of filters. The first model showed improved forecasting ability, while the second performed better on the classification problem of estimating gold price movement (Livieris et al., 2020). Expanding the image-based approach to general time series, Semenoglou et al. introduced the ForCNN model in 2022 and tested it on the time series of the M3 and M4 forecasting competitions. Their results showed that ForCNN outperformed ResNet, VGG-19 and other baseline models in terms of accuracy (Semenoglou et al., 2023). Nevertheless, the ForCNN model has not yet been applied to forecasting gold prices, which exhibit volatility and non-linear fluctuations. 
We have conducted a comprehensive performance evaluation of the ForCNN model and have compared it to other benchmark models to validate the efficiency of ForCNN model for forecasting gold time series.

Methodology

A time series is defined as an ordered sequence of observations taken over equally spaced time intervals. A time series Y of length n can be represented as follows: $Y = \{y_{t} \in \mathbb{R} \mid t = 1, 2, \dots, n\}$ where $y_{t}$ is the value of the time series at time t.

Time Series Preprocessing

The methodological approach followed in this paper to forecast the gold time series consists of two steps. In the first step, the $1 D$ numeric time series data are pre-processed and exported as $2 D$ images. Each image is constructed from a particular window of the in-sample data of the time series and is used as input to the ForCNN model. In the second step, the images are used to train the model and generate out-of-sample forecasts. Initially, the in-sample data of the time series are partitioned into equal-sized windows of $w = 24$ observations each; that is, the gold price trend over the last 24 h is used to forecast the next value. Subsequently, min-max scaling is applied to bring the values into the range [0,1]. Let ${\tilde{y}}_{i}$ denote the normalized value of $y_{i}$ for $i = n - w + 1, \dots, n$, such that ${\tilde{y}}_{i} = \frac{y_{i} - \min (y)}{\max (y) - \min (y)}$, where the minimum and maximum are taken over the observations in the window. Then, the 1D numeric vectors are converted to 2D images using line plots, where the horizontal X axis shows the time of each observation and the vertical Y axis represents its scaled value. Since the line itself carries the information most relevant for generating a forecast, it is thickened and drawn in white on a black background. Other visual elements such as axes and legends are removed to keep the image simple. Finally, all images are resized to $64 \times 64$ pixels.
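The pre-processing steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the plotting pipeline used in the paper: the function names (`min_max_scale`, `window_to_image`, `make_samples`), the linear interpolation between observations, and the `thickness` parameter are all hypothetical choices, and the line is rasterized directly at 64 x 64 instead of being plotted and then resized.

```python
def min_max_scale(window):
    """Scale a window of prices to [0, 1]; a flat window maps to all zeros."""
    lo, hi = min(window), max(window)
    if hi == lo:
        return [0.0] * len(window)
    return [(v - lo) / (hi - lo) for v in window]


def window_to_image(window, size=64, thickness=1):
    """Rasterize the scaled line plot as a size x size grayscale image.

    Pixel value 1.0 is the white line and 0.0 is the black background.
    """
    scaled = min_max_scale(window)
    image = [[0.0] * size for _ in range(size)]
    n = len(scaled)
    prev_row = None
    for col in range(size):
        # Map the pixel column to a fractional position in the window and
        # interpolate linearly between the two neighbouring observations.
        pos = col * (n - 1) / (size - 1)
        left = int(pos)
        right = min(left + 1, n - 1)
        value = scaled[left] + (pos - left) * (scaled[right] - scaled[left])
        # Row 0 is the top of the image, so large values sit near row 0.
        row = round((1.0 - value) * (size - 1))
        # Fill from the previous column's row to this one so the line stays
        # connected on steep segments, then thicken it.
        top = row if prev_row is None else min(row, prev_row)
        bottom = row if prev_row is None else max(row, prev_row)
        for r in range(max(0, top - thickness), min(size, bottom + thickness + 1)):
            image[r][col] = 1.0
        prev_row = row
    return image


def make_samples(series, window=24):
    """Pair each 24-observation window with the next observation as target."""
    return [(window_to_image(series[i:i + window]), series[i + window])
            for i in range(len(series) - window)]
```

Each sample is an (image, next value) pair; how the target itself is scaled before training is not specified here and is left out of the sketch.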

Model Architecture

Let $X \in \mathbb{R}^{64 \times 64 \times 1}$ denote the input image produced in the pre-processing stage, which is fed to the ForCNN model to train the deep learning network and generate point forecasts for the gold time series. As shown in Figure 1, the architecture of the ForCNN model consists of two modules, namely an encoder and a regressor. The encoder converts each image X into a vector W that contains a latent representation of X. Although adding more layers can help the network detect more specific patterns, issues such as vanishing gradients and degrading training accuracy may arise. To mitigate this problem, shortcut connections are used between layers so that the input of a block can be passed straight to its output. Each convolution layer of the encoder applies $2 D$ convolutions to its input using $3 \times 3$ filters, with zero padding employed to preserve the input dimensions. Afterwards, batch normalization and the ReLU (Rectified Linear Unit) activation function are applied in turn. The layers are assembled into blocks, each consisting of three convolution layers and an identity shortcut from input to output. The outputs of the block's primary path and shortcut are added before the ReLU activation function is applied. The residual convolution blocks are in turn organized into stacks, where each successive stack doubles the number of convolutional filters while halving the spatial dimensions of the feature maps. Instead of fixed pooling operations such as max or average pooling, downsampling is achieved through $2 D$ convolutions with $2 \times 2$ kernels and a stride of 2. After the final stack, all feature maps are concatenated and flattened to construct the embedding vector, which captures the hierarchical spatio-temporal representation of the input. 
The second module of this architecture is the regressor, which produces point forecasts F from the embedding vector W generated by the encoder. The number of nodes in the output layer of the network equals the forecasting horizon, which is set to 1 here. The mathematical equations for the proposed ForCNN model are as follows:

Let $Z^{(0)} = X$, where $X \in \mathbb{R}^{64 \times 64 \times 1}$ denotes the input image, and let $Z^{(s, l, k)}$ denote the intermediate feature map produced by the $k^{th}$ convolutional layer within the $l^{th}$ block of the $s^{th}$ stack.

$$U^{(s, l, k)} = \mathrm{Conv}_{l, k}^{(s)} (Z^{(s, l, k - 1)}) = W^{(s, l, k)} * Z^{(s, l, k - 1)} + b^{(s, l, k)},$$

where $*$ denotes the 2D convolution, $W^{(s, l, k)} \in \mathbb{R}^{f_{l, k} \times c_{l - 1} \times 3 \times 3}$ are the learnable kernels, and $b^{(s, l, k)} \in \mathbb{R}^{f_{l, k}}$ is the bias vector. Here, $f_{l, k}$ and $c_{l - 1}$ represent the number of output filters and the number of input channels received from the previous layer, respectively, and $U^{(s, l, k)}$ denotes the pre-activation feature map.

$$\begin{aligned} \tilde{Z}^{(s, l, k)} &= \mathrm{BN} (U^{(s, l, k)}), \\ Z^{(s, l, k)} &= \mathrm{ReLU} (\tilde{Z}^{(s, l, k)}), \quad k = 1, 2. \end{aligned}$$

The activation is applied after the first two convolutions, while the third convolution omits ReLU before residual addition to preserve gradient flow.

$$F^{(s, l)} (Z^{(s, l - 1)}) = \mathrm{BN} (\mathrm{Conv}_{l, 3}^{(s)} (Z^{(s, l, 2)})).$$

$$S^{(s, l)} (Z^{(s, l - 1)}) = \begin{cases} Z^{(s, l - 1)}, & \text{if channel and spatial dimensions match}, \\ \mathrm{BN} (\mathrm{Conv}_{proj, l, s}^{1 \times 1} (Z^{(s, l - 1)})), & \text{otherwise}. \end{cases}$$

$$Z^{(s, l)} = \mathrm{ReLU} (F^{(s, l)} (Z^{(s, l - 1)}) \oplus S^{(s, l)} (Z^{(s, l - 1)})), \quad l = 1, \dots, L.$$

$$Z_{stack - out}^{(s)} = P^{(s)} (Z^{(s, L)}), \quad P^{(s)} (U) = \mathrm{BN} (\mathrm{Conv}_{down}^{(s)} (U)),$$

where $\mathrm{Conv}_{down}^{(s)} (\cdot)$ uses a $2 \times 2$ kernel with stride $2$ to reduce the spatial resolution while doubling the number of filters.
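To make the effect of the downsampling step concrete, the following sketch traces the feature-map shape after each stack. It is an illustration only: the function name `trace_encoder_shapes` and the default of three stacks are assumptions, while the starting value of 32 filters is taken from the configuration listed in Table 4. It uses the standard output-size rule for an unpadded convolution, floor((n - k) / stride) + 1.

```python
def trace_encoder_shapes(input_size=64, start_filters=32, num_stacks=3):
    """Return (height, width, channels) after each stack's downsampling step."""
    shapes = []
    size, filters = input_size, start_filters
    for _ in range(num_stacks):
        # The 3x3 convolutions use zero padding and keep the spatial size,
        # so only the 2x2, stride-2 convolution changes the resolution.
        size = (size - 2) // 2 + 1
        # The same downsampling convolution doubles the number of filters.
        filters *= 2
        shapes.append((size, size, filters))
    return shapes


if __name__ == "__main__":
    for stack, shape in enumerate(trace_encoder_shapes(), start=1):
        print(f"after stack {stack}: {shape}")
```

With these assumed defaults, a 64 x 64 input is reduced to 32 x 32, 16 x 16, and 8 x 8 while the channel count grows from 64 to 256.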

$$V = \mathrm{Cat} (Z_{stack - out}^{(1)}, Z_{stack - out}^{(2)}, \dots, Z_{stack - out}^{(s)}),$$

where $\mathrm{Cat} (\cdot)$ denotes concatenation along the channel dimension.

$$W = \mathrm{Dense}_{d} (\mathrm{Flatten} (V)), \quad W \in \mathbb{R}^{d}.$$

Here, W represents the latent embedding vector containing the compressed temporal–spatial representation of the input image X.

$$\begin{aligned} Z_{fc}^{(1)} &= \mathrm{ReLU} (\mathrm{Dense}_{m_{1}} (W)), \\ Z_{fc}^{(2)} &= \mathrm{ReLU} (\mathrm{Dense}_{m_{2}} (Z_{fc}^{(1)})), \\ F (X) &= \mathrm{Dense}_{h} (Z_{fc}^{(2)}), \quad F (X) \in \mathbb{R}^{h}. \end{aligned}$$

Here, $m_{1}$ , $m_{2}$ and h denote the number of neurons in the first and second layers and the output size, respectively. For one-step-ahead forecasting, $h = 1$ , and thus $F (X)$ outputs a scalar value representing the predicted future gold price.

$$L_{MAE} (\Theta) = \frac{1}{N} \sum_{i = 1}^{N} | F (X_{i}; \Theta) - y_{i, true} |,$$

where $\Theta$ denotes the collection of all trainable parameters ($W^{(s, l, k)}$, $b^{(s, l, k)}$, batch normalization, and dense layer weights). The model is optimized using the Adam algorithm.

Experimental Setup

Dataset

This analysis uses two gold price datasets to validate the usefulness of the ForCNN model. The first dataset comprises per-minute gold price data in GBP from January 2014 to December 2023 (HistData, 2025), which we converted to hourly data for our research objectives. The data were divided into a training set, containing gold closing prices from January 2014 to December 2020, and a testing set, containing the subsequent data from January 2021 to December 2023. The second dataset consists of hourly gold closing prices in USD (MetaTrader 5, 2024). Its training set spans January 2020 to December 2023, and its testing set spans January 2024 to April 2025.
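The conversion from per-minute to hourly data and the chronological split can be illustrated as follows. The text does not state how the hourly value was derived, so taking the last per-minute close within each clock hour is an assumption of this sketch, as are the function names `to_hourly_close` and `chronological_split`.

```python
from datetime import datetime


def to_hourly_close(minute_bars):
    """Keep the last per-minute close seen in each clock hour.

    minute_bars: iterable of (datetime, close) pairs sorted by time.
    """
    hourly = {}
    for stamp, close in minute_bars:
        hour = stamp.replace(minute=0, second=0, microsecond=0)
        hourly[hour] = close  # later minutes overwrite earlier ones
    return sorted(hourly.items())


def chronological_split(hourly, test_start):
    """Split by date so the test set lies strictly after the training set."""
    train = [(t, p) for t, p in hourly if t < test_start]
    test = [(t, p) for t, p in hourly if t >= test_start]
    return train, test
```

For the GBP dataset, `test_start` would correspond to the beginning of January 2021; splitting by date rather than at random keeps future observations out of the training set.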

Forecasting Accuracy Measures

It is important to enable direct comparisons between our findings and other benchmarks in the literature. Consequently, we opted to use the standard metric previously used for the ForCNN model to analyze and compare the predictive accuracy of the examined models. Specifically, the measure used for evaluating the ForCNN model was the symmetric mean absolute percentage error (sMAPE); additionally, we calculated the RMSE (Root Mean Square Error) and MAE (Mean Absolute Error). The formulae for these error metrics are as follows:

$$\mathrm{sMAPE} = \frac{2}{h} \sum_{t = n + 1}^{n + h} \frac{| y_{t} - f_{t} |}{| y_{t} | + | f_{t} |} \times 100$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t = 1}^{n} {(y_{t} - f_{t})}^{2}}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{t = 1}^{n} | y_{t} - f_{t} |$$

where, at point t, $f_{t}$ represents the method's forecast, $y_{t}$ indicates the actual value of the series at that time, n denotes the number of historical observations, and h is the forecasting horizon.
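As a small worked illustration of the three error measures, the functions below compute them over paired lists of actual and forecast values. The function names are hypothetical, and each function simply averages over however many pairs it is given.

```python
import math


def smape(actual, forecast):
    """Symmetric MAPE in percent; undefined if an actual/forecast pair is 0, 0."""
    terms = [abs(y - f) / (abs(y) + abs(f)) for y, f in zip(actual, forecast)]
    return 2.0 * sum(terms) / len(terms) * 100.0


def rmse(actual, forecast):
    """Root mean square error."""
    squared = [(y - f) ** 2 for y, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, forecast):
    """Mean absolute error."""
    errors = [abs(y - f) for y, f in zip(actual, forecast)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    y, f = [100.0, 200.0], [110.0, 190.0]
    print(smape(y, f), rmse(y, f), mae(y, f))
```

RMSE and MAE are expressed in price units, whereas sMAPE is a scale-free percentage, which is why their magnitudes differ so much in the result tables.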

In addition to conventional forecasting accuracy metrics, several portfolio-based performance measures (Uygun and Sefer, 2025) were computed to evaluate the effectiveness of the forecasting models from a trading perspective, including Cumulative Return (CR), Average Daily Return (ADR), Annualized Return (AR), Annualized Volatility (AV), Sharpe Ratio (SR), and Maximum Drawdown (MD). Let the actual daily prices be denoted by $\{P_{t}\}_{t = 1}^{T}$ and the predicted prices by $\{\hat{P}_{t}\}_{t = 1}^{T}$ .

The daily return from the actual prices is computed as:

$$r_{t} = \frac{P_{t} - P_{t - 1}}{P_{t - 1}}, \quad t = 2, 3, \dots, T$$

The trading signal $s_{t}$ is generated based on the forecasting model:

$$s_{t} = \begin{cases} 1, & \text{if } \hat{P}_{t} > P_{t - 1} \text{ (Buy)} \\ - 1, & \text{otherwise (Sell)} \end{cases}$$

The corresponding strategy return is then given by:

$$r_{t}^{(s)} = s_{t} \times r_{t}$$

Using these strategy returns, the following performance metrics are derived:

$$\text{Cumulative Return (CR) [\%]} = \left( \prod_{t = 1}^{T} (1 + r_{t}^{(s)}) - 1 \right) \times 100$$

$$\text{Average Daily Return (ADR) [\%]} = \frac{\sum_{t = 1}^{T} r_{t}^{(s)}}{T} \times 100$$

$$\text{Annualized Return (AR) [\%]} = \bar{r}^{(s)} \times 252 \times 100$$

$$\text{Annualized Volatility (AV) [\%]} = \sigma_{r^{(s)}} \times \sqrt{252} \times 100$$

$$\text{Sharpe Ratio (SR)} = \frac{AR / 100 - r_{f}}{AV / 100}$$

$$\text{Maximum Drawdown (MD) [\%]} = \max_{t \in [1, T]} \left( \frac{\text{PeakValue}_{t} - \text{LowestValue}_{t}}{\text{PeakValue}_{t}} \right) \times 100$$

where $\bar{r}^{(s)}$ and $\sigma_{r^{(s)}}$ represent the mean and standard deviation of the daily strategy returns, respectively, $r_{f}$ is the annual risk-free rate (taken as 4.14% according to the US 10-year treasury rate), and 252 denotes the number of trading days in a year.
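The signal rule and the portfolio measures can be put together in a short sketch. Several details are assumptions rather than statements from the text: the function names, the use of the sample standard deviation, and the reading of Maximum Drawdown as the largest drop of the cumulative strategy value below its running peak.

```python
import math
import statistics


def strategy_returns(actual, predicted):
    """Signed returns s_t * r_t for t = 2..T (list indices 1..T-1)."""
    out = []
    for t in range(1, len(actual)):
        r_t = (actual[t] - actual[t - 1]) / actual[t - 1]
        s_t = 1 if predicted[t] > actual[t - 1] else -1  # buy or sell
        out.append(s_t * r_t)
    return out


def sharpe_ratio(ar_percent, av_percent, risk_free=0.0414):
    """Sharpe Ratio from annualized return and volatility given in percent."""
    return (ar_percent / 100.0 - risk_free) / (av_percent / 100.0)


def portfolio_metrics(returns, risk_free=0.0414, periods=252):
    """Compute CR, ADR, AR, AV, SR and MD (all but SR in percent)."""
    value, peak, max_dd = 1.0, 1.0, 0.0
    for r in returns:
        value *= 1.0 + r                      # cumulative strategy value
        peak = max(peak, value)               # running peak
        max_dd = max(max_dd, (peak - value) / peak)
    mean_r = statistics.fmean(returns)
    ar = mean_r * periods * 100.0
    av = statistics.stdev(returns) * math.sqrt(periods) * 100.0
    return {
        "CR": (value - 1.0) * 100.0,
        "ADR": mean_r * 100.0,
        "AR": ar,
        "AV": av,
        "SR": sharpe_ratio(ar, av, risk_free),
        "MD": max_dd * 100.0,
    }
```

As a check on the Sharpe Ratio formula, `sharpe_ratio(8.4122, 3.0604)` gives about 1.396, in line with the value reported for ForCNN in Table 3.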

Benchmark Models

We considered six benchmark models to evaluate the relative forecast performance of the proposed ForCNN model. For each model, the gold price trend over the last 24 h is used as the input sequence to forecast the next hour. A sequence length of 24 was chosen so that the models using image data and those using numerical data can be compared directly. The network architecture and the corresponding optimal hyperparameter values for each model are described in detail in the appendix.

MLP

The Multilayer Perceptron, or MLP (Tashakkori et al., 2024), is a fully connected feedforward network that is commonly used for data analysis tasks such as regression and classification. An MLP has one or more hidden layers, each made up of neurons that apply non-linear activation functions to their inputs.

CNN1D

Artificial neural networks that apply convolutional filters along one dimension are known as 1D CNNs (Hu et al., 2023). CNN-1D models are very effective and successful for time series forecasting applications. To enhance feature learning and reduce computational complexity, they make use of shared weights and sparse connections.
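The idea of shared weights can be seen in a minimal sketch of a single 1D convolution filter. The function name `conv1d_valid` is hypothetical, and the operation is written without padding and with stride 1; like most deep learning frameworks, it slides the kernel without flipping it.

```python
def conv1d_valid(sequence, kernel, bias=0.0):
    """Slide one shared kernel over the sequence (no padding, stride 1)."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, sequence[i:i + k])) + bias
        for i in range(len(sequence) - k + 1)
    ]


if __name__ == "__main__":
    # The same three weights are reused at every position of the window.
    print(conv1d_valid([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0]))
```

Applied to a 24-observation window, a kernel of length 3 produces 22 outputs while using only three weights and one bias.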

LSTM

In 1997, S. Hochreiter and J. Schmidhuber proposed the LSTM (Hochreiter and Schmidhuber, 1997), a particular kind of recurrent neural network that helps to address vanishing and exploding gradients. An LSTM architecture contains multiple LSTM layers, and each layer contains multiple LSTM units operating in parallel. Each unit consists of an LSTM cell composed of a cell state and three gates: the forget gate ( $f_{t}$ ), the input gate ( $i_{t}$ ), and the output gate ( $o_{t}$ ), together with activation functions, weights, and biases. With this configuration, the LSTM controls the flow of information by discarding unnecessary data through the forget gate and adding potentially useful new information through the input gate. The output gate then determines which information is used for the output of the memory cell. The equations for the LSTM gates, cell state, and hidden state are as follows:

$$\begin{aligned} f_{t} &= \sigma (U_{f} x_{t} + W_{f} h_{t - 1} + b_{f}) \\ i_{t} &= \sigma (U_{i} x_{t} + W_{i} h_{t - 1} + b_{i}) \\ o_{t} &= \sigma (U_{o} x_{t} + W_{o} h_{t - 1} + b_{o}) \\ \tilde{c}_{t} &= \tanh (U_{c} x_{t} + W_{c} h_{t - 1} + b_{c}) \\ c_{t} &= f_{t} \odot c_{t - 1} + i_{t} \odot \tilde{c}_{t} \\ h_{t} &= o_{t} \odot \tanh (c_{t}) \end{aligned}$$

where $\sigma$ is the sigmoid activation function, $\odot$ denotes element-wise multiplication, $U_{f}, U_{i}, U_{o}, U_{c}$ are the weight matrices for input connections, $W_{f}, W_{i}, W_{o}, W_{c}$ are the weight matrices for hidden state connections, and $b_{f}, b_{i}, b_{o}, b_{c}$ are the corresponding bias terms. Moreover, $\tilde{c}_{t}$ is the candidate cell state, $c_{t}$ is the cell state and $h_{t}$ is the hidden state at time step t.

Finally, cell state ( $c_{t}$ ) and hidden state ( $h_{t}$ ) both are forwarded to the subsequent LSTM unit as input (Selvin et al., 2017).
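The gate equations can be traced with a scalar sketch of a single LSTM unit, in which every weight is one number rather than a matrix. The function names and the parameter dictionary keys (written here as `U_f`, `W_f`, `b_f` for the forget gate, and likewise for the input gate, output gate, and candidate state) are assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One time step of a single scalar LSTM unit; p holds the parameters."""
    f_t = sigmoid(p["U_f"] * x_t + p["W_f"] * h_prev + p["b_f"])  # forget gate
    i_t = sigmoid(p["U_i"] * x_t + p["W_i"] * h_prev + p["b_i"])  # input gate
    o_t = sigmoid(p["U_o"] * x_t + p["W_o"] * h_prev + p["b_o"])  # output gate
    c_tilde = math.tanh(p["U_c"] * x_t + p["W_c"] * h_prev + p["b_c"])
    c_t = f_t * c_prev + i_t * c_tilde   # keep part of the old state, add new
    h_t = o_t * math.tanh(c_t)           # expose part of the cell state
    return h_t, c_t


def run_lstm(window, p):
    """Feed a window of observations through the unit, one step at a time."""
    h, c = 0.0, 0.0
    for x_t in window:
        h, c = lstm_step(x_t, h, c, p)
    return h
```

With all parameters set to zero, each gate evaluates to 0.5, so one step halves the previous cell state; this makes a convenient sanity check on the update rule.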

CNN-LSTM

The CNN, introduced by LeCun et al. in 1998, is a multilayer deep neural network that can process both numerical and image data. Recently, the combination of CNN and LSTM, the CNN-LSTM model (Livieris et al., 2020; Widiputra et al., 2021), has been widely applied in areas such as time series forecasting, signal and natural language processing, and image recognition. The CNN-LSTM architecture uses CNN layers to extract complex features from the input and an LSTM layer for sequence prediction.

BiLSTM

Bidirectional LSTM, often known as BiLSTM (Stankovic et al., 2023), is a sequence-processing model that combines two LSTMs, one reading the sequence forward and one reading it in reverse. By processing the sequence in both directions, the network can draw on both preceding and following context at each time step and so form a more complete representation.

CNN-BiLSTM

The CNN-BiLSTM (Lu et al., 2021) model integrates the advantages of CNNs and Bidirectional Long Short-Term Memory (BiLSTM) networks (Zhang et al., 2023).

Results and Discussion

Forecasting Accuracy and Visual Comparison of Forecasts

Table 1 reports the forecast accuracy of the benchmark models and the proposed ForCNN model for the gold time series in GBP. The second, third, and fourth columns of the table give the scores for the sMAPE, RMSE, and MAE error measures. These results indicate that training ForCNN on grayscale images can substantially improve forecasting accuracy, most visibly in RMSE and MAE. ForCNN outperforms the MLP benchmark by 0.77%, 72.48%, and 71.45% with respect to sMAPE, RMSE, and MAE, respectively. Compared with CNN-1D, ForCNN improves by 0.75%, 72.44%, and 71.38% on the same metrics. It surpasses LSTM by 0.77%, 72.36%, and 71.33% in sMAPE, RMSE, and MAE. Furthermore, ForCNN outperforms the CNN-LSTM model by 0.78% for sMAPE, 72.51% for RMSE, and 71.44% for MAE (the latter two computed from the values in Table 1). It also performs better than BiLSTM by 0.76%, 72.35%, and 71.32%, and exceeds CNN-BiLSTM by 0.76%, 72.48%, and 71.42% across the same measures.

Table 1.

Error Comparison Between ForCNN and Benchmark Models for the Gold Price Dataset in GBP.

Models Used	sMAPE	RMSE	MAE
MLP	4.9744	44.96	30.34
CNN-1D	4.9737	44.89	30.26
LSTM	4.9743	44.76	30.21
CNN-LSTM	4.9738	45.00	30.32
BiLSTM	4.9742	44.74	30.20
CNN-BiLSTM	4.9742	44.95	30.31
ForCNN	4.9359	12.37	8.66

The predictive accuracy of the benchmark models alongside the proposed ForCNN model for the gold time series in USD is shown in Table 2. Compared with MLP, ForCNN is 3.13% better in sMAPE, 90.84% in RMSE, and 90.74% in MAE. It also delivers 2.90%, 90.29%, and 90.16% better results than CNN-1D, and shows improvements of 3.016%, 90.55%, and 90.53% over LSTM on the same error metrics, respectively. ForCNN also outperforms CNN-LSTM by 2.37%, 88.88%, and 88.56% in sMAPE, RMSE, and MAE, respectively. It performs better than BiLSTM by 2.82%, 90.11%, and 90.03%, and improves on CNN-BiLSTM by 2.15%, 88.07% (computed from the RMSE values in Table 2), and 88.31% with respect to the same error metrics. Figures 2–8 and Figures 9–15 illustrate the forecasting plots for the GBP and USD gold price time series, respectively.

Table 2.

Error Comparison Between ForCNN and Benchmark Models for the Gold Price Data in USD.

Models Used	sMAPE	RMSE	MAE
MLP	15.035	57.76	36.94
CNN-1D	14.999	54.51	34.76
LSTM	15.016	56.03	36.14
CNN-LSTM	14.919	47.58	29.90
BiLSTM	14.987	53.50	34.31
CNN-BiLSTM	14.885	44.34	29.26
ForCNN	14.564	5.29	3.42
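The text does not state the formula behind the quoted improvement percentages; they appear consistent with the relative reduction in error with respect to each benchmark. The sketch below shows that calculation, with the function name `improvement` being a hypothetical choice.

```python
def improvement(benchmark_error, model_error):
    """Relative error reduction of the model over a benchmark, in percent."""
    return (benchmark_error - model_error) / benchmark_error * 100.0


if __name__ == "__main__":
    # RMSE of MLP versus ForCNN, taken from Table 1 (GBP) and Table 2 (USD).
    print(round(improvement(44.96, 12.37), 2))
    print(round(improvement(57.76, 5.29), 2))
```

Because the tabulated errors are themselves rounded, percentages recomputed this way can differ from the quoted ones in the last decimal place.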

Financial Performance Evaluation on USD Dataset

The results of the portfolio evaluation in Table 3 reveal that the proposed ForCNN model consistently outperforms all benchmark approaches, achieving the highest cumulative and annualized returns with a superior Sharpe Ratio of 1.3959. Its strong profitability, coupled with moderate volatility and minimal drawdown, demonstrates the robustness of the model in capturing market dynamics and maintaining stable risk-adjusted performance.

Table 3.

Performance Comparison of Models Based on Portfolio Metrics.

Model	Cumulative Return (CR) [%]	Average Daily Return (ADR) [%]	Annualized Return (AR) [%]	Annualized Volatility (AV) [%]	Sharpe Ratio (SR)	Max Drawdown (MD) [%]
ForCNN	1274.5690	0.0334	8.4122	3.0604	1.3959	−2.9195
MLP	220.7455	0.0150	3.7678	3.0969	−0.1202	−8.7890
CNN1D	232.2389	0.0154	3.8802	3.0963	−0.0839	−8.7890
LSTM	227.0327	0.0152	3.8298	3.0966	−0.1002	−8.7890
CNN-LSTM	291.7830	0.0175	4.4063	3.0935	0.0861	−8.2637
BiLSTM	233.2068	0.0154	3.8895	3.0963	−0.0809	−8.2637
CNN-BiLSTM	265.7511	0.0166	4.1869	3.0948	0.0151	−8.2637

Table 4.

Ablation Study of the ForCNN Model on USD Dataset.

Experiment	Changes in Configuration	MAE	RMSE	sMAPE
Proposed Model	5 blocks, 2 layers, residual = False, 32 filters	3.426	5.294	14.564
Ablation 1	Number of blocks = 3	3.431	5.318	14.564
Ablation 2	Number of layers = 1	3.402	5.289	14.565
Ablation 3	Starting filters = 16	3.475	5.341	14.564
Ablation 4	Residual = True	3.458	5.339	14.566
Ablation 5	Bottleneck = 512	3.439	5.352	14.563

Hyperparameter Optimization

The hyperparameters for all the models were optimized using the Tree-structured Parzen Estimator (TPE) implemented in the HyperOpt library. TPE is a Bayesian optimization approach that models $p (x ∣ y)$ instead of $p (y ∣ x)$ , where x represents the hyperparameter configuration and y the corresponding validation loss. It partitions prior trials into “good” and “bad” configurations based on a quantile threshold. The probability density of the “good” configurations is denoted as $l (x) = p (x ∣ y < y^{*})$ , and that of the “bad” configurations as $g (x) = p (x ∣ y \geq y^{*})$ , where $y^{*}$ is the loss value at the quantile threshold. New hyperparameter samples are then drawn so as to maximize the likelihood ratio $\frac{l (x)}{g (x)}$ , thus efficiently exploring promising regions of the search space. In the experiments, each configuration was trained for up to 50 epochs with a validation split of 0.2. The MAE on the validation set was used as the optimization objective. A total of 30 trials were evaluated using SparkTrials with a parallelism of 4 to accelerate the search. The selected configuration provided a good balance between model accuracy, ability to generalize to new data, and computational efficiency. The hyperparameters of all the benchmark models were found in the same way and are described in the appendix. This systematic, data-driven hyperparameter selection ensured that model performance was not biased by arbitrary manual choices.
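The selection rule based on the ratio of the two densities can be illustrated with a toy, one-dimensional version written in plain Python. This is not the HyperOpt implementation and does not reproduce the search used in the experiments: the Gaussian kernel density estimate, the fixed bandwidth, the quantile `gamma = 0.25`, and all function names are assumptions chosen for readability.

```python
import math
import random


def kde(points, bandwidth):
    """Return a Gaussian kernel density estimate built from `points`."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm

    return density


def tpe_suggest(trials, low, high, rng, gamma=0.25, n_candidates=24, bandwidth=0.5):
    """Pick the candidate with the largest ratio of good to bad density."""
    ordered = sorted(trials, key=lambda t: t[1])            # best loss first
    n_good = max(1, math.ceil(gamma * len(ordered)))
    good = [x for x, _ in ordered[:n_good]]
    bad = [x for x, _ in ordered[n_good:]]
    if not bad:                                             # too few trials yet
        return rng.uniform(low, high)
    good_density, bad_density = kde(good, bandwidth), kde(bad, bandwidth)
    # Draw candidates around the good configurations, clipped to the bounds.
    candidates = [min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
                  for _ in range(n_candidates)]
    return max(candidates, key=lambda x: good_density(x) / (bad_density(x) + 1e-12))


def minimize(objective, low, high, n_startup=10, n_trials=30, seed=0):
    """Run random start-up trials, then TPE-style suggestions; return the best."""
    rng = random.Random(seed)
    trials = [(x, objective(x)) for x in (rng.uniform(low, high) for _ in range(n_startup))]
    while len(trials) < n_trials:
        x = tpe_suggest(trials, low, high, rng)
        trials.append((x, objective(x)))
    return min(trials, key=lambda t: t[1])
```

In the actual experiments the objective would be the validation MAE of a trained network and the search space would span several hyperparameters; here a call such as `minimize(lambda x: (x - 2.0) ** 2, -5.0, 5.0)` stands in for that expensive evaluation.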

Ablation Study

As presented in Table 4, we examined the effect of varying architectural parameters such as the number of blocks, the number of layers, residual connections, and the number of filters. The configuration obtained through hyperparameter optimization (5 blocks, 2 layers, residual = False, 32 filters) yielded the most consistent and stable results. A model variant with one layer showed a marginal reduction in error (from 3.426 to 3.402 in MAE and from 5.294 to 5.289 in RMSE); however, this difference was not statistically significant. Therefore, the final model in the paper retains the optimized configuration for forecasting. The proposed ForCNN model and all the benchmark models used in this work were trained on a Lenovo ThinkSystem SR650 with the following configuration: two Intel(R) Xeon(R) Gold 6248 CPUs @ 2.50 GHz, 64 cores, 192 GB RAM, and a 2 TB HDD. The code was executed in Python 3.8.9.

Conclusion

Deep learning and machine learning algorithms in finance are continually refined to handle the large volumes of data now available. This paper investigated the use of a visual time series representation together with a deep 2D convolutional network for time series forecasting. Because the proposed ForCNN model combines convolutional and dense layers in a single neural network, it can leverage the strengths of deep CNNs in recognizing patterns and analyzing images. ForCNN was trained on two large real-world gold datasets, with hyperparameter optimization used to select the configuration that minimizes the forecast error. To verify its performance, several benchmark models, namely MLP, CNN1D, LSTM, CNN-LSTM, BiLSTM and CNN-BiLSTM, were trained using their best optimized hyperparameters. Because MLP and LSTM-based models operate solely on sequential numeric data, they fail to capture spatial correlations or localized fluctuations in the gold price dynamics. CNN1D and the hybrid CNN-LSTM and CNN-BiLSTM architectures can extract local temporal dependencies, but they lack residual pathways, which limits their depth and causes performance to degrade as layers are added. In contrast, ForCNN transforms the time series into visual form, allowing convolutional filters to learn geometric and structural temporal patterns, such as sharp price jumps or smooth upward trends, directly from images. Furthermore, the residual (shortcut) connections in ForCNN preserve information and facilitate stable gradient propagation through deeper layers. Hence, ForCNN outperforms the benchmark models in forecast accuracy, and our study lays a foundation for future work on forecasting financial time series with grayscale images as input.

One limitation of the proposed model is that its computational time is higher than that of the benchmark models; reducing this cost is therefore a goal for future work. Moreover, the methodology could be applied to other types of time series to assess its performance in a broader range of domains. In addition, colored plots could be considered instead of grayscale images, as they may carry additional information from the time series and yield more accurate forecasts.
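To make the idea of a visual time series representation concrete, the following minimal sketch rasterizes a window of closing prices into a small grayscale image, with one column per time step and pixel intensity 255 marking the scaled price level. It is an illustration under stated assumptions and not the exact preprocessing used for ForCNN: the function name `series_to_grayscale`, the image height, the min-max scaling, and the one-pixel-per-column rendering are all hypothetical choices made here for clarity.

```python
def series_to_grayscale(window, height=8):
    """Rasterize a numeric window into a height x len(window) grayscale image.

    Hypothetical helper for illustration only. Each column is one time step;
    the pixel in the row matching the min-max-scaled value is set to 255
    (white) on a 0 (black) background.
    """
    if not window:
        raise ValueError("window must be non-empty")
    if height < 2:
        raise ValueError("height must be at least 2")

    lo, hi = min(window), max(window)
    span = hi - lo
    image = [[0] * len(window) for _ in range(height)]

    for col, value in enumerate(window):
        # Scale to [0, 1]; a flat window is drawn near the middle of the image.
        scaled = (value - lo) / span if span > 0 else 0.5
        # Row 0 is the top of the image, so larger values get smaller row indices.
        row = (height - 1) - round(scaled * (height - 1))
        image[row][col] = 255
    return image


# Illustrative values only, not taken from the datasets used in this study.
demo = series_to_grayscale([1800.0, 1810.0, 1805.0, 1820.0], height=5)
for pixel_row in demo:
    print("".join("#" if p == 255 else "." for p in pixel_row))
```

An image of this kind, stored as a single-channel array, is the type of input a 2D convolutional network can consume; a denser rendering, such as a line plot with connected points, would preserve more of the geometric structure than this one-pixel-per-column sketch.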

Figure 1.

Architecture of the ForCNN Model.

Figure 2.

Plot of time series and the corresponding forecast given by ForCNN for the gold price data in GBP.

Figure 3.

Plot of time series and the corresponding forecast given by MLP for the gold price data in GBP.

Figure 4.

Plot of time series and the corresponding forecast given by CNN1D for the gold price data in GBP.

Figure 5.

Plot of time series and the corresponding forecast given by LSTM for the gold price data in GBP.

Figure 6.

Plot of time series and the corresponding forecast given by CNN-LSTM for the gold price data in GBP.

Figure 7.

Plot of time series and the corresponding forecast given by BiLSTM for the gold price data in GBP.

Figure 8.

Plot of time series and the corresponding forecast given by CNN-BiLSTM for the gold price data in GBP.

Figure 9.

Plot of time series and the corresponding forecast given by ForCNN for the gold price data in USD.

Figure 10.

Plot of time series and the corresponding forecast given by MLP for the gold price data in USD.

Figure 11.

Plot of time series and the corresponding forecast given by CNN1D for the gold price data in USD.

Figure 12.

Plot of time series and the corresponding forecast given by LSTM for the gold price data in USD.

Figure 13.

Plot of time series and the corresponding forecast given by CNN-LSTM for the gold price data in USD.

Figure 14.

Plot of time series and the corresponding forecast given by BiLSTM for the gold price data in USD.

Figure 15.

Plot of time series and the corresponding forecast given by CNN-BiLSTM for the gold price data in USD.

Footnotes

Acknowledgments

This work was financially supported by the Department of Science and Technology (DST), Government of India, through the INSPIRE Fellowship (no. DST/INSPIRE Fellowship/2022/IF220134). The Centre for Computational Modeling and Simulation, National Institute of Technology Calicut, provided support for all computational work.

Credit Authorship Contribution Statement

First Author: Conceptualization, Methodology, Software, Coding, Writing – original draft, Writing – review & editing. Second Author: Methodology, Software, Coding. Last Author: Supervision, Editing, Validation.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Department of Science and Technology, INSPIRE, Government of India (grant number DST/INSPIRE Fellowship/2022/IF220134).

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data and Code Availability

The datasets in GBP and USD are obtained from the websites https://www.histdata.com and https://www.metatrader5.com, respectively, and the code is available at .

ORCID iD

Soumi Mahato
