Abstract
Artificial neural network (ANN)-based methods belong to one of the fastest-growing research fields within the artificial intelligence ecosystem, and many novel contributions have been developed in recent years. They are applied in many contexts, although some “influencing factors” such as the number of neurons, the number of hidden layers, and the learning rate can impact the performance of the resulting ANN-based applications. This paper provides an in-depth analysis of ANN performance as a function of such factors for real-world temperature forecasting applications. An improved back propagation algorithm for such applications is also presented. Using the results of this paper, researchers and practitioners can analyse the issues encountered when applying ANN-based models to their own specific applications, with the aim of achieving better performance indexes.
Introduction
In recent years, weather forecasting has been playing a vital role in day-to-day life. Weather warnings are important forecasts because they are used to protect life and property [1]. Forecasts can be used for the smart home, load forecasting, renewable energy forecasting, fire hazard prevention, weather reporting and so on [24]. Weather forecasts have a variety of end users. For instance, utility companies use them to estimate demand over the coming days and to improve the reliability, availability, serviceability, and usability of the provided services [2]. Among these, it is worth mentioning the services provided over communication networks, such as real-time communication and multimedia services (VoIP, online games, audio and video streaming), solutions for fully distributed architectures (clouds and grids), evolutionary network services, and storage area networking solutions.
Within the weather forecasting scope, temperature forecasting with high accuracy is a challenging task because of the influence of different atmospheric parameters [23]. Many researchers have proposed temperature forecasting models. In this paper, we propose a temperature forecasting model using artificial neural networks (ANN). ANN-based methods try to mimic the functionality of the human brain to solve complex problems, and are characterized by massively parallel, fully interconnected structures [6]. They can be composed of different layers: the input layer, one or more hidden layers, and the output layer. The input layer consists of input neurons, which transfer the data to the next layer (a hidden layer) via weighted connections and activation functions. The same happens for the subsequent layers. The model parameters, such as weights and biases, are modified during the training process by using the so-called training data set. Moreover, before starting the training process, it is important to set the hyperparameters so that the training and testing data sets can be handled with ease. The hyperparameters include the number of hidden neurons, the number of hidden layers, and the activation functions. The error is computed as the difference between the forecasted and the target outputs, and a back propagation process is used to reduce it. In neural network-based applications, weight adjustment, hyperparameter selection, minimization of the cost function, regularization, generalization and convergence are the major elements to take into account for achieving satisfactory stability, robustness, and performance indexes.
In this paper, we address the problem of identifying adequate ANN model parameter values (such as the number of hidden layers, the number of hidden neurons, the weights) for temperature forecasting applications. All these elements are referred to as “influencing factors”, since they impact the overall model performance. The outcome of this paper can be used to implement both specific temperature forecasting applications and Application Programming Interface (API) libraries for, e.g., Android-based devices.
The paper has been organized as follows. Section 2 provides some preliminaries on ANN-based models and their influencing factors. Section 3 describes the ANN model and design procedures for temperature prediction applications. It also presents the performance indexes used in the paper. Section 4 analyses how the influencing factors affect the performance indexes of the resulting ANN-based model. Section 5 concludes the paper.
Background on the ANN influencing factors
ANN learning methods can be classified as follows: (i) supervised learning [22], in which the neural network is trained with known targets; (ii) unsupervised learning [7], in which the targets are unknown; (iii) reinforcement learning [8], in which the neural network aims at maximizing expected returns. In this paper, we focus on the first type of learning. Once the hyperparameters have been defined, the neural network output errors can be minimized by a weight optimization process, known as neural network training. The training process usually starts with small variance and large bias, since the neural network output is far away from the desired targets and the influence of the training data set is low. On the other hand, the training process ends with smaller bias (which can solve the underfitting problem) and larger variance (which can give rise to poor generalization of the resulting model). Different algorithms can be adopted for the ANN training process: the delta rule [20], the perceptron learning algorithm [19], the back propagation algorithm [21], the Kohonen learning rule [10], and meta-heuristic optimization algorithms [9]. In this paper, a feed-forward ANN with the back-propagation learning algorithm is used: the error of the output is computed and propagated back to the input layer through the hidden layers. During this back-propagation phase, the ANN weights are adjusted by means of the gradient descent method. The logistic or sigmoid function is usually adopted in back-propagation learning thanks to its relevant features such as continuity, shape, and differentiability.
The selection of an appropriate learning rate is fundamental for the ANN training process. If the learning rate is too large, each update takes a large step that can overshoot a minimum, so the training oscillates or settles in a poor local minimum. If the learning rate is too small, the gradient descent can be very slow. In ANNs, different activation functions can be used to compute the output signals from the applied total net input signals. The usage of such activation functions is motivated by the following main needs: (i) to improve the computation power and performance of the resulting trained model; (ii) to solve complex problems characterized by non-linear dynamics; (iii) to provide the output activation with respect to the weighted input neurons. Different types of activation functions can be used, in particular: the linear activation function, the step activation function, the logistic or sigmoidal activation function (sig), the hyperbolic tangent activation function (tanh), and the rectified linear unit activation function (ReLU), see [16,18]. The selection of proper activation functions depends on the addressed layer as well as on the required performance indexes.
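The activation functions listed above can be written in a few lines. The following is a minimal Python sketch using only the standard library; the function names are illustrative assumptions and are not taken from the paper. The two derivative helpers are included because back-propagation needs the slope of the activation function.

```python
import math


def linear(x: float) -> float:
    """Linear (identity) activation."""
    return x


def step(x: float) -> float:
    """Step activation with the threshold assumed at zero."""
    return 1.0 if x >= 0.0 else 0.0


def sigmoid(x: float) -> float:
    """Logistic (sigmoidal) activation, output in [0, 1]."""
    # Two branches avoid overflow in math.exp for large |x|.
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def tanh(x: float) -> float:
    """Hyperbolic tangent activation, output in [-1, 1]."""
    return math.tanh(x)


def relu(x: float) -> float:
    """Rectified linear unit activation."""
    return max(0.0, x)


def sigmoid_derivative(x: float) -> float:
    """Slope of the logistic function: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_derivative(x: float) -> float:
    """Slope of the hyperbolic tangent: 1 - tanh(x)^2."""
    t = math.tanh(x)
    return 1.0 - t * t
```

The sigmoid and tanh functions are continuous and differentiable everywhere, which is why they combine naturally with gradient descent; the step function has zero slope almost everywhere and therefore cannot be trained this way.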
Besides the learning rate, the momentum factor can be used to improve the network stability and reduce the training time. It speeds up convergence by taking past weight changes into account when the ANN weights are updated during the training process. Moreover, the momentum factor can damp the oscillations caused by a high learning rate and reduce the risk of the training converging to a local minimum.
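The effect of the momentum factor on convergence can be illustrated on a toy problem. The sketch below minimizes the one-dimensional quadratic f(w) = 0.5 * curvature * w^2 with plain gradient descent and with momentum; the function, its name and all numeric values are assumptions chosen for illustration and are unrelated to the temperature data set.

```python
def minimize_quadratic(learning_rate, momentum, steps, w0=1.0, curvature=1.0):
    """Minimize f(w) = 0.5 * curvature * w**2 by gradient descent with momentum.

    With momentum = 0.0 this reduces to plain gradient descent.
    """
    w = w0
    velocity = 0.0  # previous weight change, the "memory" of past updates
    for _ in range(steps):
        gradient = curvature * w
        # New change = fraction of the previous change minus the gradient step.
        velocity = momentum * velocity - learning_rate * gradient
        w += velocity
    return w


# Same learning rate and number of steps, with and without momentum.
w_plain = minimize_quadratic(learning_rate=0.01, momentum=0.0, steps=100)
w_momentum = minimize_quadratic(learning_rate=0.01, momentum=0.9, steps=100)
print(abs(w_plain), abs(w_momentum))
```

In this setting the run with momentum ends much closer to the minimum at w = 0 than the plain run after the same number of steps, which is the faster convergence described above.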
Finally, we mention the importance of dimensioning the hidden layer by selecting a proper number of hidden neurons. Underfitting can happen in case of too few hidden neurons, whereas a high number of hidden neurons can cause overfitting and slow convergence during the ANN training process. There is no general rule for deciding the number of hidden layers and the number of hidden neurons in each selected hidden layer within a neural network. Researchers can decide the number of hidden layers and the number of hidden neurons in each hidden layer by means of a trial and error method, specific mathematical formulations, and heuristic techniques, see [4,5,12–15,17,27]. An improper number of hidden layers and hidden neurons in each hidden layer can cause different issues like no stability guarantee, poor generalization, slow convergence rate, convergence into local minima, underfitting and overfitting.
The proposed ANN model and design procedure for temperature forecasting applications
In recent years, weather forecasting, and temperature forecasting in particular, has been playing a vital role in day-to-day life. Many researchers have proposed temperature forecasting models, see [3,11,25,26]. Temperature forecasting is used for the smart home, load forecasting, renewable energy forecasting, fire hazard prevention, weather reporting and so on. Forecasting temperature with high accuracy is a challenging task because of the influence of different atmospheric parameters.
The proposed ANN model for temperature forecasting applications processes six meteorological parameters as inputs. The procedure used to set up the ANN model involves three main steps: the ANN design and training approach definition, the data set collection and normalization, and finally, the ANN model training and testing.
ANN design and training approach definition
We apply the ANN back propagation algorithm with the momentum factor to train the ANN model for temperature forecasting applications. The algorithm is composed of three main steps: the feed-forward step, the error computation step, and the ANN weight update step. The momentum factor can help achieve a faster convergence rate during the ANN training process.
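The three steps named above can be sketched for a network with one hidden layer and one output neuron. This is a minimal pure-Python illustration and not the authors' implementation: the tanh activations, the squared-error loss, the dictionary layout, the weight initialization range and the sample values are all assumptions, while the default learning rate (0.01) and momentum factor (0.9) are the values the paper reports as best.

```python
import math
import random


def init_network(n_inputs, n_hidden, seed=0):
    """Create small random weights for an n_inputs -> n_hidden -> 1 network."""
    rng = random.Random(seed)

    def small():
        return rng.uniform(-0.5, 0.5)

    return {
        "w_hidden": [[small() for _ in range(n_inputs)] for _ in range(n_hidden)],
        "b_hidden": [small() for _ in range(n_hidden)],
        "w_out": [small() for _ in range(n_hidden)],
        "b_out": small(),
    }


def zero_velocity(n_inputs, n_hidden):
    """Previous weight changes (the momentum memory), initially zero."""
    return {
        "w_hidden": [[0.0] * n_inputs for _ in range(n_hidden)],
        "b_hidden": [0.0] * n_hidden,
        "w_out": [0.0] * n_hidden,
        "b_out": 0.0,
    }


def forward(net, x):
    """Feed-forward step: return the hidden activations and the output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(net["w_hidden"], net["b_hidden"])
    ]
    output = math.tanh(sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"])
    return hidden, output


def train_step(net, velocity, x, target, learning_rate=0.01, momentum=0.9):
    """One back-propagation update; returns the loss measured before the update."""
    # Step 1: feed-forward.
    hidden, output = forward(net, x)
    # Step 2: error computation (output error, then error sent back to hidden layer).
    error = target - output
    delta_out = error * (1.0 - output ** 2)  # tanh'(z) = 1 - tanh(z)^2
    delta_hidden = [
        delta_out * w * (1.0 - h ** 2) for w, h in zip(net["w_out"], hidden)
    ]
    # Step 3: weight update, each change reusing a fraction of the previous one.
    for j, h in enumerate(hidden):
        velocity["w_out"][j] = momentum * velocity["w_out"][j] + learning_rate * delta_out * h
        net["w_out"][j] += velocity["w_out"][j]
    velocity["b_out"] = momentum * velocity["b_out"] + learning_rate * delta_out
    net["b_out"] += velocity["b_out"]
    for j, dj in enumerate(delta_hidden):
        for i, xi in enumerate(x):
            velocity["w_hidden"][j][i] = (
                momentum * velocity["w_hidden"][j][i] + learning_rate * dj * xi
            )
            net["w_hidden"][j][i] += velocity["w_hidden"][j][i]
        velocity["b_hidden"][j] = momentum * velocity["b_hidden"][j] + learning_rate * dj
        net["b_hidden"][j] += velocity["b_hidden"][j]
    return 0.5 * error ** 2


# Usage with made-up, already normalized inputs (six values, one target).
net = init_network(n_inputs=6, n_hidden=6)
velocity = zero_velocity(6, 6)
sample = [0.2, 0.4, 0.1, 0.7, 0.5, 0.3]
for _ in range(100):
    loss = train_step(net, velocity, sample, target=0.25)
```

The hidden-layer errors are computed before the output weights are changed, so the error is propagated back through the same weights that produced the forecast.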
The input layer consists of the input parameters related to the temperature forecasting application, while the output layer has one output neuron, namely the forecasted temperature. The ANN output error is obtained by calculating the difference between the computed and the target temperature values, the latter being given in the available data set. In the following section, an in-depth analysis of the ANN output error and convergence rate as a function of the ANN model influencing factors is performed. The ANN architecture is depicted in Fig. 1, while the procedural steps of the ANN design and back-propagation algorithm are shown in Algorithm 1.

Algorithm 1. Procedural steps of the proposed algorithm
As for the ANN architecture shown in Fig. 1, the input layer consists of the following data
The output layer provides the forecasted temperature

Fig. 1. The proposed ANN architecture.
The synaptic weight vectors of input layer to the hidden layer are the following
The net input to the hidden layer is
The synaptic weight vectors of hidden layer to the output layer can be defined as follows
As a consequence, the net input of the output layer becomes
The computed error in the output layer,
The error in hidden layer can be computed as follows
The synaptic weight updating process is described by the following expressions
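The quantities introduced above follow the usual pattern of a back-propagation network with a single hidden layer. A sketch in standard notation is given below. The symbols are assumptions chosen for illustration and are not necessarily the notation of the original equations: $x_i$ are the $n$ inputs, $w_{ij}$ and $b_j$ the input-to-hidden weights and biases of the $m$ hidden neurons, $v_j$ and $c$ the hidden-to-output weights and bias, $y$ the target temperature, $\hat{y}$ the forecasted temperature, $\eta$ the learning rate, $\alpha$ the momentum factor, and tanh is assumed as the activation function in both layers.

```latex
\begin{align}
  \mathrm{net}_j &= \sum_{i=1}^{n} w_{ij}\, x_i + b_j,
  & h_j &= \tanh(\mathrm{net}_j), \quad j = 1,\dots,m \\
  \mathrm{net}_o &= \sum_{j=1}^{m} v_j\, h_j + c,
  & \hat{y} &= \tanh(\mathrm{net}_o) \\
  e &= y - \hat{y},
  & \delta_o &= e\,\bigl(1 - \hat{y}^{2}\bigr) \\
  \delta_j &= \delta_o\, v_j\,\bigl(1 - h_j^{2}\bigr) \\
  \Delta v_j(t) &= \eta\, \delta_o\, h_j + \alpha\, \Delta v_j(t-1),
  & v_j &\leftarrow v_j + \Delta v_j(t) \\
  \Delta w_{ij}(t) &= \eta\, \delta_j\, x_i + \alpha\, \Delta w_{ij}(t-1),
  & w_{ij} &\leftarrow w_{ij} + \Delta w_{ij}(t)
\end{align}
```

The biases $b_j$ and $c$ are updated in the same way, with the corresponding input replaced by 1. The factors $1 - \hat{y}^{2}$ and $1 - h_j^{2}$ are the derivatives of tanh evaluated at the respective net inputs.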
The real-time data (acting as the ANN input parameters and target temperatures) were collected from the National Oceanic and Atmospheric Administration (United States of America) from January 2016 to December 2018. The input parameters are the temperature (°C), the wind speed (m/s), the solar irradiance (W/m²), the relative humidity (%), the dew point (%), and the precipitation of water contents (%). The target temperature (contained in the data set) is used to compute the error with respect to the forecasted temperature. The data set spans over two years, with data sampled every 10 minutes. The resulting data set contains 1051200 data samples.
For the considered application, data set normalization is needed to mitigate some training process issues (e.g., high value inputs can reduce the effect of small value inputs) and to enhance the performance of the resulting trained ANN model. In particular, min-max normalization is adopted for the proposed neural network. The scaled input is calculated as follows
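Min-max normalization maps each input parameter linearly onto a fixed interval using its observed minimum and maximum. The following Python sketch shows the standard formula; the function name, the default target interval [0, 1] and the handling of a constant series are assumptions, since the exact scaling range used in the paper is not stated here.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Scale a sequence linearly so that min(values) -> new_min, max(values) -> new_max."""
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        # Constant series: every value maps to the lower bound (assumed convention).
        return [new_min for _ in values]
    return [new_min + (v - lowest) / span * (new_max - new_min) for v in values]


# Example with made-up wind speed readings in m/s.
wind_speed = [2.0, 4.0, 6.0, 10.0]
print(min_max_scale(wind_speed))
print(min_max_scale(wind_speed, -1.0, 1.0))
```

Each of the six input parameters would be scaled separately, so that a quantity with large raw values, such as solar irradiance, does not dominate one with small raw values. An interval of [-1, 1] matches the output range of the hyperbolic tangent function.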
Training and testing of the ANN model
The design and training parameters of the ANN model for temperature forecasting applications are given in Table 1.
Table 1. Design parameters of the ANN model
The hyperbolic tangent sigmoid activation function has been used in both the hidden and output layers. As shown in the next section, the number of hidden neurons in the hidden layer has been selected via a trial and error approach. The temperature forecasting model is constructed by using the training data set, while the corresponding performance indexes are assessed via the testing data set.
The following indexes are used for the performance evaluation: correlation coefficient (R), Mean Absolute Percentage Error (MAPE), Mean Square Error (MSE), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Mean Relative Error (MRE), and the time in minutes needed to process the given training and testing data sets and compute such error metrics for the given ANN model. The latter index shows the convergence time during the ANN model training process. More specifically, the formulation of the error metrics is
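The error metrics named above have widely used textbook definitions, which the following Python sketch implements. It is an illustration under stated assumptions rather than the paper's exact formulation: R is taken as the Pearson correlation coefficient, MRE as the mean of |error| / |target|, and MAPE as 100 times the MRE. The function name and dictionary keys are hypothetical.

```python
import math


def forecast_error_metrics(targets, forecasts):
    """Return R, MAPE, MSE, MAE, RMSE and MRE for paired target/forecast values.

    Assumes every target is non-zero (needed by MRE and MAPE) and that neither
    series is constant (needed by R).
    """
    n = len(targets)
    if n == 0 or n != len(forecasts):
        raise ValueError("targets and forecasts must be non-empty and equally long")

    errors = [t - f for t, f in zip(targets, forecasts)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mre = sum(abs(e) / abs(t) for e, t in zip(errors, targets)) / n
    mape = 100.0 * mre

    # Pearson correlation between targets and forecasts.
    mean_t = sum(targets) / n
    mean_f = sum(forecasts) / n
    cov = sum((t - mean_t) * (f - mean_f) for t, f in zip(targets, forecasts))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_f = sum((f - mean_f) ** 2 for f in forecasts)
    r = cov / math.sqrt(var_t * var_f)

    return {"R": r, "MAPE": mape, "MSE": mse, "MAE": mae, "RMSE": rmse, "MRE": mre}


# Example with made-up temperatures in degrees Celsius.
metrics = forecast_error_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
print(metrics)
```

Because temperatures in °C can be zero or close to zero, the relative metrics (MRE and MAPE) can become undefined or very large on real data, which is one reason to report the absolute metrics alongside them.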
In this section, the performance indexes of the ANN model for temperature forecasting applications are assessed by varying the different influencing factors. The evaluation has been performed by using the MATLAB platform version 2013 executed on an Acer personal computer with a Pentium (R) Dual Core processor running at 2.30 GHz and 2 GB of RAM.
ANN model performance analysis with various hidden neurons
The ANN model has only one hidden layer. Tables 2 and 3 show the performance index values when the number of hidden neurons varies from 1 to 30. As shown in bold in Table 2, six hidden neurons in the hidden layer achieve the best performance. For this analysis, the data set has been split as follows: 70% for training and 30% for testing.
Table 2. Performance analysis with number of hidden neurons 1–15
Table 3. Performance analysis with number of hidden neurons 16–30
The figures highlight the following aspects:
Fig. 2 shows the temperature data input as a function of the data samples.
Fig. 3 compares the target temperature with the one forecasted by the proposed ANN model.
Fig. 4 shows the error metrics (MSE) as a function of the data samples.
Fig. 5 shows that the proposed temperature forecasting model achieves the regression
Fig. 6 shows the error metrics as a function of the number of neurons within the hidden layer.
Fig. 7 shows the time in minutes to process the training and testing data sets as a function of the number of neurons within the hidden layer.

Fig. 2. Portion of original temperature vs data samples.

Fig. 3. Portion of original target temperature against forecasted temperature.

Fig. 4. Evaluation error metric (MSE) vs number of data samples.

Fig. 5. Regression graph.

Fig. 6. Error metrics vs hidden neurons.

Fig. 7. Time in minutes vs hidden neurons.
Table 4 shows the performance analysis with different momentum factor values. In this respect, the ANN model with 6 hidden neurons has been used. The best performance indexes are achieved with a momentum factor equal to 0.9. As expected, it can be noticed that the momentum factor only affects the convergence time during the ANN model training process. The other performance indexes remain unchanged.
Table 4. Performance analysis with different momentum factor values
Table 5 shows the performance analysis with different learning rate values. As before, the ANN model with 6 hidden neurons and the momentum factor set to 0.9 has been used. The best performance indexes are achieved with a learning rate equal to 0.01. As with the momentum factor, it can be noticed that the learning rate only affects the convergence time during the ANN model training process. The other performance indexes remain unchanged.
Table 5. Performance analysis with different learning rates
Table 6 shows the performance analysis with different splits of the available data between the training and testing data sets. As before, the ANN model with 6 hidden neurons has been used. The best performance indexes are achieved with the following data partitioning: 70% for the training data set and 30% for the testing one.
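The data partitioning discussed above can be sketched with a small helper. The paper does not state whether the samples were split in time order or at random; the sketch below assumes a chronological split (earliest samples for training, latest for testing), which is a common choice for time series, and the function name is hypothetical.

```python
def split_train_test(samples, train_fraction=0.7):
    """Split an ordered sequence into a leading training part and a trailing testing part."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = int(round(len(samples) * train_fraction))
    return samples[:cut], samples[cut:]


# Example with ten made-up, time-ordered sample indices.
train, test = split_train_test(list(range(10)), train_fraction=0.7)
print(len(train), len(test))
```

Other ratios can be compared by changing `train_fraction`, for example 0.6 or 0.8, while keeping the rest of the training procedure fixed.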
Table 6. Performance analysis with various training and testing data sets
Finally, we have assessed the performance indexes of the ANN model for temperature forecasting applications with one, two and three hidden layers. As shown in Table 7, the best performance indexes are achieved with one hidden layer. It can be noticed that, as the number of hidden layers increases, the convergence speed and stability are significantly impacted.
Table 7. ANN model performance analysis with various hidden layers
From this analysis, we can conclude that the proposed ANN model for temperature forecasting applications consists of 6 inputs, one single hidden layer, and 6 hidden neurons, with the momentum factor set to 0.9 and the learning rate set to 0.01. The data set is split into 70% for training and 30% for testing. This way, it achieves the following performance indexes:
This paper has presented an ANN-based model for real-world temperature forecasting applications. It has investigated how the ANN model influencing factors (e.g., the number of neurons of the hidden layer, the learning rate, the momentum factor) and the partitioning of the available data into training and testing data sets can affect the resulting model performance indexes. Prediction error metrics, convergence rate, generalization, and stability have been considered in this analysis, which can be summarized as follows:
As for the ANN model underfitting and overfitting issues, a proper number of hidden neurons has to be determined.
The activation function selection is important to capture the non-linear/complex aspects of the application at hand.
Both the learning rate and the momentum factor affect the convergence rate during the ANN model training process.
An improper split between the training and testing data sets can affect the ANN model prediction performance.
The ANN model generalization can be achieved by examining the various ANN model influencing factors.
The results and the remarks of this paper can be used by both researchers and engineers when applying ANN-based models to their own specific applications. The setting of such influencing factors is a key element for the resulting ANN-based model to work. The presented optimization procedure has addressed the optimal value of each influencing factor individually. As future work, we plan to improve the neural network performance using meta-heuristic optimization algorithms. Moreover, the results of this manuscript can be extended to other weather aspects (such as humidity). Such results can be used to implement weather forecasting applications for, e.g., wireless, cellular, and mobile communication services with the aim of improving their reliability, availability, serviceability, and usability.
Acknowledgement
The authors gratefully acknowledge the contribution of the National Oceanic and Atmospheric Administration (United States of America) in providing the data set used to train and test the proposed ANN-based temperature forecasting model.
