Abstract
Fuel prices, which are of broad concern to the general public, are always seen as a challenging research topic. This paper proposes a variational Bayesian structural time-series model (STM) to effectively process complex fuel sales data online and provide real-time forecasting of fuel sales. While a traditional STM normally uses a probability model and the Markov chain Monte Carlo (MCMC) method to process change points, using the MCMC method to train the online model can be difficult given a relatively heavy computing load and time consumption. We thus consider the variational Bayesian STM, which uses variational Bayesian inference to make a reliable judgment of the trend change points without relying on artificial prior information, for our prediction method. With the inferences being driven by the data, our model passes the quantitative uncertainties to the forecast stage of the time series, which improves the robustness and reliability of the model. After conducting several experiments by using a self-collected dataset, we show that compared with a traditional STM, the proposed model has significantly shorter computing times for approximate forecast precision. Moreover, our model improves the forecast efficiency for fuel sales and the synergy of the distributed forecast platform based on an architecture of network.
Keywords
Introduction
Due to the gradual opening of China’s fuel market, the sales model of the traditional regional market is no longer effective. Fuel retail terminals are facing increasingly fierce competition. To improve competitiveness in the fuel market, it is necessary for these terminals to promptly formulate effective sales strategies. Traditional sales strategies depend on the judgments of terminal sellers regarding the market situation to a large extent and are highly subjective and uncertain. Currently, there are many widely distributed fuel terminal stations, which leads to complex fuel sales data and makes it difficult to conduct a real-time analysis. To address these issues, we design an efficient distributed platform of network architecture, to support real-time data acquisition and analysis, those fuel terminal stations’ large-scale data intensive upload and distributed storage, distributed real-time model training and predict, the network architecture includes 5G mobile network, VPN network based on internet, local intranet. Therefore, the focus of this research is learning how to develop an efficient online forecast using complex fuel sales data and provide support for the timely adjustment of sales strategies, based on this platform. The fuel sales data used in this study are a typical economic time series. An important part of the analysis of the time series is the selection of a suitable probability model (or class of models) for the data.
In the past, traditional statistical models, such as autoregressive (AR) models, autoregressive integrated moving average (ARIMA) models and generalized autoregressive conditional heteroscedastic (GARCH) models, were generally adopted to forecast fuel market information. AR models are most widely studied because of their flexibility to model many stationary processes. However, AR coefficients change over time and when an input variable switches smoothly between two regimes. In practice, many parameters have to be optimized to build a consistent time-varying smooth transition autoregressive (STAR) model, and no efficient analytical optimization method is available. ARIMA models are used in many areas of science. Conejo et al. [5] applied this type of model for electricity price forecasting, with good results. ARIMA models are largely limited to capturing first-order nonstationarity and second-order (moment) nonstationarity, i.e., time-varying conditional variance or volatility. The subsequently developed GARCH model is used when variances in the time series differ. Garcia et al. [9] dealt with the heteroscedasticity problem with electricity price time-series data by using a GARCH model.
These linear models provide only a coarse approximation to real-world complex systems and generally fail to accurately predict the evolution of nonlinear and nonstationary processes.
Neural networks (NNs) and other nonlinear operator-based models have been used for nonlinear time-series forecasting in manufacturing systems and finance. For oil price prediction research, Kristjanpoller and Minutolo proposed a novel hybrid model, comprising an artificial neural network and a GARCH model, to analyze and predict oil price return volatility, where the artificial neural network better improves accuracy compared to several traditional oil price prediction models, such as the GARCH and ARFIMA models [19]. Mahdiani and Khamehchi [22] developed an oil price prediction method based on a neural network model using the genetic algorithm to optimize the model’s parameters, and the results showed that the model has significantly improved performance, especially when there is a small amount of training data or the variables show high fluctuations. Baruník and Malinská [1] used a generalized regression framework-based method in addressing oil price forecasting problems, while neural networks are also being used in such frameworks. By using this method with a real-time dataset, results show that the new method consistently has the smallest error [1]. Gupta and Nigam [12] proposed another novel method of predicting crude oil prices using the artificial neural network (ANN) that delivers significantly better results than other models by modifying time delay changes on the closed optimized results, which are then evaluated by the root mean square error. However, these models are relatively simple, consider fewer market factors and involve strict constraints. Therefore, it is impossible for them to accurately and effectively reflect the complexity of fuel sales data.
To address the abovementioned problem, a structural time-series model (STM)-based market forecast method was proposed by Harvey and Fernandes [14] and Harvey and Shephard [15] as continuous-time structural models for use in forecasting lead time demand for inventory control. The focus of STM is not to accurately represent the data-generation process but to decompose the series into different components and recreate the series. Compared with a traditional time-series model, an STM appears to be more accurate, more explicit and flexible in terms of describing economic variables. In particular, it is much easier for an STM to accurately replicate the characteristics of changes in the series. Therefore, using an STM, time series may be decomposed into components with multiple factors, and different methods can be used to process different factors [13]. The advantage of this model lies in the fact that it can process the data better than other models if there are missing values in the sample data or the data involve more complex changes such as trend changes and unexpected changes. Due to the flexibility of the STM, it has been used in research on sustainable development: Brodersen et al. [2] of Google captured the trend, seasonal and similar components of the target series by developing a Bayesian structural time-series (BSTS) model, and other studies [3,6,31,32,36] demonstrated how the BSTS model could be applied given different events. Recently, Taylor and Letham [34] of Facebook developed a state decomposition model with adjustable parameters. When forecasting fuel sales, data are often missing due to differences in how fuel enterprises develop their information systems. Therefore, strict requirements are proposed regarding the change-point processing ability of the forecast model in terms of obtaining a good forecast result with limited fuel sales data. A traditional STM uses the Markov chain Monte Carlo (MCMC) method to establish a change-point recognition model. The MCMC method is more accurate but time-consuming. It is impossible for such a method to allow the front-end and the back-end to collaborate for real-time forecasting of fuel sales.
To efficiently forecast fuel sales in real time, this paper proposes an STM based on variational Bayesian inference. The major contribution of this paper is that it establishes a model that identifies the change points in data by using variational Bayesian inference. Through experimental verification, this study shows that this method is more efficient than the traditional MCMC method. For approximate forecast precision, the modeling time of the former is reduced to approximately 1/200 of that of the latter. In theory, the MCMC method can be used to find the optimal solution with a sufficient number of iterations. However, in practice, the results obtained by using the MCMC method are problematic due to an excessively large variance, which greatly decreases the efficiency of convergence and prevents the immediate ending of MCMC convergence. By overcoming the shortcomings of the MCMC method, the variational method addresses the problem and can be used for optimization; this model uses the gradient descent algorithm for fast iterations, thus making real-time fast iterations possible for the time-series forecast model. Therefore, the variational method provides an efficient and feasible means for the extensive deployment and implementation of a collaborative front-end and back-end forecast of fuel sales data.
This paper contributes in the following two ways:
A variational BSTS model that uses variational Bayesian inference to make a reliable judgment on the trend change points without relying on artificial prior information is proposed for forecasting.
To improve the robustness and reliability of the model, data-driven inferences are implemented, and the quantitative uncertainties are passed to the forecast stage of the time series.
This study proves through experimental verification that compared with a traditional STM, the proposed model has a significantly shorter computing time for approximate forecast precision. The model can improve the efficiency of forecasting fuel sales and the synergy of the forecast platform.
Structural time-series forecast model for fuel sales
The STM was first proposed by Harvey and Fernandes [14]. An STM can approximate the state space and address unobserved aspects of the data, including trends, cycles, seasonality and irregularities. As these factors are entered into the model as unobservable variables, it is impossible to estimate the model parameters by using the traditional regression analysis method. Therefore, these factors are expressed in the form of a state space in statistical processing. An STM can better interpret the characteristics of economic variables and can deal with complex time series that occur in real business activities. This paper forecasts complex fuel sales information by using an STM. In consideration of the economic indicators of fuel sales, the established model includes three major components, namely, trends, seasonality and promotion shocks. The STM model can be decomposed as follows (1):
Establishment of the trend growth model
The trend growth factor reflects real changes in the time series and is the most basic factor in the time series. The instability of the time series is caused by the instability of the trend growth factor. Taylor and Letham [34] claimed that the trend model is mainly a combination of two trend models, the saturated growth model and the piecewise linear model. The development of the market and human society follow unique laws. Due to the impacts of resources and the environment, it is impossible for the development the market and human society to continue indefinitely. When development reaches its limit for a certain stage, saturation will undoubtedly occur. Hutchinson [16] claimed that market coverage growth will reach saturation. As the sales volume of the fuel market is constrained by market demand and production capacity, the trend model must be a combination of two trend models, namely, a saturated growth model and piecewise linear model. Because the corresponding professional knowledge and experience are required to establish a saturation model for the fuel market, the saturated growth model is omitted, and only a piecewise linear model is incorporated here. The specific form of the piecewise linear model is expressed as follows (2):
Here,
A vector created by (3) and (4) is used to define the total variation of each change point. The growth rate at time t can be expressed as follows (5):
Thus, all states at time t can be described by the following matrix (6):
Equations (2), (4) and (5) can be combined to obtain the trend growth model for the STM for forecasting fuel sales according to (7):
Establishment of the seasonal cycle model
Due to the impacts of mankind’s commercial behaviors, fuel sales fluctuations tend to have a seasonal cycle, with multiple periods. To fit and forecast this seasonality, the cycle function of
When fitting the seasonal cycle factor using Fourier series,
By using (12), the main component of the seasonal factor of the fuel sales forecast STM can be expressed as (13):
Impulse response function at the time of promotion
Due to the characteristics of fuel sales, fuel sales promotion and other behaviors will impact the sales time series to a large extent, and these shocks are periodic. Therefore, there is no way to model the impacts of these factors using a stale cycle function. Only the impacts of fuel promotion behaviors are considered in this paper. For each promotion period i,
To reflect the impact of the promotion on the model, a linear pulse response function is adopted. Assuming that the prior ρ obeys a normal distribution of
Variational Bayesian change-point testing model
To establish the fuel sales forecast STM, a fuel sales growth trend model is required to forecast fuel sales. The change point
The first step is establishing hierarchical prior information; the second step is inferring the posterior distribution using variational inference; and the third step is forecasting future data points using the posterior distribution.
Hierarchical Bayesian prior information modeling
In this paper, the change points can be automatically selected through the sparse prior δ. There are S change points among previous T data points. To make the change points sparse, the prior distribution of the growth rate and growth points is selected as
To prevent the parameter in (16) from being restricted by prior information, a vector containing a series of hyperparameters,
To establish the hierarchical Bayesian model, it is necessary to parameterize each Bayesian layer. It is generally assumed that these parameters obey a multivariate normal distribution and are expressed as follows (18):
In the first layer,
The discussion above describes the method used to construct the hierarchical Bayesian model. Assume that the problem to be solved is a linear regression. Then, let
Equation (22) is obtained by assuming the prior distribution of the regression coefficients obeys the Laplace prior:
Expressing (22) as a mixed normal distribution with a mixed exponential distribution gives (23):
The prior is set. For any given adjusting parameter, the posterior distribution
Then, the posterior distribution of the unknown parameters is given by the following (29):
The hierarchal Bayesian framework for the trend model capable of automatic change-point detection is shown in Fig. 1 below.

Hierarchal Bayesian framework for the trend model capable of automatic change-point detection. The value of sigma has a sparse prior distributed as a Laplace density with parameter tau.
After obtaining the hierarchical prior distribution of a probability model, either variational inference will be used to infer the posterior distribution, or the iterative prior is used to update the parameter. Under the Bayesian framework, the joint posterior distribution is expressed as follows [18]:
By removing the integral of (30), the conditional distribution for the marginal distribution can be further simplified as
When there is a large parameter space but limited samples, it is not easy to obtain (31). However, variational inference can be used for processing, and ξ is parameterized to approximate the posterior distribution [23,25] as follows (32):
Variational inference tries to make
To minimize the KL divergence, it is converted into the ELBO that maximizes the following (34):
The mean field variational inference (MFVI) method is used to control the computing complexity of the second item in (35). The variational distribution is expressed by using a factor distribution subject to the decomposition layer by layer, where each factor is determined by its own variation parameters, as given by (36):
Equations (35) and (36) are substituted into (34). Then, the ELBO target function is transformed into the following (37):
When (37) satisfies the convergence conditions, the iterations are terminated, and the final optimization result is obtained. The variational algorithm can be used for online detection of the change points in the fuel sales growth trend model. Then, the fuel sales forecast STM is established for the efficient online detection of fuel sales [10,17,35,38].
Experimental verification
Fuel sales data acquisition for testing and descriptions
To verify the efficiency and effect of the fuel sales forecast using the variational BSTS, an experiment is conducted with previous fuel sales data. The experiment uses the granularity data of a fuel sales company, which is the branch of the China National Petroleum Corporation (CNPC), from January 2015 to December 2018. The data for the last month or December 2018 (30 days) are set as the testing data. The preliminary data are training data, which are used to forecast the fuel sales of the company. The data include information on the time, fuel sales and relevant promotion time. As some data are missing, data need to be interpolated based on the experiences of experts in relevant fields. Fuel sales data can be acquired and analyzed by using the online fuel sales data acquisition platform shown in Fig. 2. The function modules of the management platform mainly include comprehensive management platforms, collaborative offices, mobile applications and other main modules. Among them, the comprehensive management platform is capable of marketing data asset management and analysis.

Online fuel sales data acquisition distributed platform.
The experiment conducted in this paper considers only the positions of the change points. When assuming the number of change points, one change point is designated at a certain time interval in advance, and then, the change rate is realized through the sparse prior based on the variation in the change points and through hierarchical Bayesian to reduce the variation to 0 and automatically select the change points. Here, the number of change points is set to 15, and the variation in the sparse change points is shown in Fig. 3. Through sparse variation, the change points automatically selected are obtained, as shown in Fig. 4 below. Figures 3 and 4 show that among the 15 change points that are designated on the time axis and uniformly distributed, eight are automatically selected through the regularization of the sparse prior. Figure 4 also shows that there are change points automatically selected by the algorithm near each inflection point, and the long-term trend change points are around the change points with large variations.

Sparse change-point variation. Fuel sales have suffered a persistent depression since the fourth quarter of 2018.

Locations of the sparse change points. The first plot shows the result of the whole trend function as a reasonable continuous trend. The middle plot shows the growth rate at time t. The last plot shows the offset at time t. These can adjust the growth rate’s height.
To verify the estimated efficiency of the fuel forecast method used with the variational BSTS for the variation, the MCMC method and variational inference are used to estimate the parameter posterior of the model.
Figure 5 shows the results when simulating 1,000 samples (2,000 iterations) by using the MCMC method for 39 minutes. Figure 6 shows the results when simulating 10,000 samples (2,000 iterations) by using the MCMC method for 480 minutes (8 h).

Simulation of 1,000 samples by using the Markov chain Monte Carlo (MCMC) method. (Left: kernel estimate posterior density plots; right: sampling trace plots). The chain exhibits very high autocorrelation according to the trace plot. An issue of high autocorrelation between posterior samples can be raised. Postprocessing algorithms require samples to be independent.
A comparison of Figs 5 and 6 shows that the distribution function curves of the latter appear much smoother. Additionally, the distribution of the sample values indicate that the latter has a random state and no significant correlation and trend. Therefore, the convergence effect is much better when simulating 10,000 samples using the MCMC method. However, using the MCMC method for 10,000 samples took 480 minutes. Despite its better convergence effect, its efficiency fails to satisfy the requirements of the existing massive stream data for the model to forecast online training in a real-time manner.

Simulation of 10,000 samples by using the Markov chain Monte Carlo (MCMC) method. (Left: kernel estimate posterior density plots, right: sampling trace plots). The chain is exploring the sample space well, and autocorrelation between posterior samples is no longer significant.
Variational inference is used in the experiment for parameter inference, and 10,000 samples are simulated, as shown in Fig. 7. In only 2 minutes and 40 seconds, results similar to those obtained using the MCMC method with 10,000 samples are obtained. The distribution curves in the figure show that the distribution shape obtained through variational inference is close to that obtained with its counterpart using the MCMC method.

Simulation of 10,000 samples by using variational inference. (Left: kernel estimate posterior density plots; right: sampling trace plots). As shown in Fig. 6, autocorrelation between the samples is not significant, and this figure shows similar parameter estimations.
Figure 8 shows that the variables estimated by using the variational method converge earlier, and the estimated variable mean is generally close to that obtained by using the MCMC method. Figures 9 and 10 show the sample distribution of the estimated variables obtained by using both methods. The covariances of the estimated variables of the two methods are basically similar. Obviously, the simulated sample distribution obtained by using the variational inference method is more concentrated, which increases the speed of attaining convergence. The results of the experiment show that the variational inference after processing the Bayesian posterior distribution not only obtains excellent results but also has obvious computing advantages when using large amounts of sample data.

Estimated variable means obtained by using the Markov chain Monte Carlo (MCMC) method (left) and variational inference (right). Y-axis: number of samples; X-axis: iterations.

Distribution of the estimated variable samples by using the Markov chain Monte Carlo (MCMC) method. Five parameters are shown (sigma, tau, m, k, and delta). The diagonal cells are the marginal distributions of the individual parameters; the other cells are the joint distribution of the two parameters. There are negative correlations between m and k and between m and delta. There is a positive correlation between k and delta.

Distribution of the estimated variable samples by using variational inference. Five parameters are shown (sigma, tau, m, k, and delta). Similar to Fig. 9, the joint distribution of m, k and delta exhibits a correlation, but its shape is narrower than the shape in Fig. 9, and this explains the quick convergence of variational inference.
The variational BSTS model and the traditional STM are now used to forecast the experimental data. The forecast results for three gas stations are shown in Figs 11 and 12. The results of the experiment are shown in Table 1. In terms of forecast precision, the performance of the variational inference structural time-series model (VI-STM) model is similar to that of the MCMC-STM model. By contrast, the computing time of the VI-STM model is significantly shorter than that of MCMC-STM and can ensure accurate online forecasting of complex fuel sales data.

Forecast by using the Markov chain Monte Carlo structural time-series model (MCMC-STM). The red line represents the actual values, and the blue line represents the forecasted values. The gray band around the blue line is called the “uncertainty interval”, and similar to the confidence interval, it indicates the range of forecasted values produced by the model.

Forecast by using the variational Bayesian structural time-series (BSTS) model. The red line represents the actual values, and the blue line represents the forecasted values. The gray band around the blue line is called the “uncertainty interval”, and similar to the confidence interval, it indicates the range of forecasted values produced by the model.
Forecast precision of the different time-series models
MAPE-Mean absolute percentage error, MAE-Mean absolute error, RMSE-Root mean squared error, MCMC-STM-Markov chain Monte Carlo structural time-series model, VI-STM- Variational inference structural time-series model, ARIMA-Autoregressive integrated moving average, GARCH-Generalized autoregressive conditional heteroscedastic, LSTM-Long and short-term memory.
To comprehensively evaluate the forecasting precision of the variational STM in forecasting fuel sales, the experimental data were assessed with the ARIMA time-series model, the GARCH time-series model and an LSTM model. In this section, we also set the last month of 2018 as the forecasting period, and the data from other periods were used for training.
Different
To efficiently forecast fuel sales data online, a variational Bayesian structural time-series fuel forecast model is established considering the characteristics of the data, variational Bayesian inference theory and a structural time-series. The model is also used for the real-time forecasting of fuel sales time-series data. The model uses variational inference to process the change points. Compared with the traditional STM, this model greatly reduces the computing time. Furthermore, we design an efficient distributed platform of network architecture to support time-series fuel forecast model, more effectively forecasts fuel sales, has greater synergy with the forecast platform and satisfies the current demand for large-scale computing and online model training. Finally, it needs to be mentioned that variational Bayesian just get approximate results, through the experiment we found that VI-STM’s forecasting performance is unstable for different fuel terminal stations, particularly when time series includes much missing data and much outliers, it’s performance of prediction is poor, our next work will be focused on solving these problems.
