Abstract
Traffic flow prediction can not only make managers' decision-making and road planning more reasonable, but also provide helpful suggestions for travelers to avoid traffic congestion. In order to further improve the prediction accuracy of traffic flow, this study presents a data-driven hybrid model for short-term traffic flow prediction. This hybrid model first extracts the periodicity pattern from the traffic flow data, then constructs a functionally weighted single-input-rule-modules connected fuzzy inference system (FWSIRM-FIS) for the residual data obtained by removing the periodicity pattern from the original data, and finally generates the prediction results by integrating the periodicity pattern with the output of the FWSIRM-FIS model. The partial autocorrelation function (PACF) method is adopted to determine the optimal inputs for the data-driven FWSIRM-FIS model, and the iterative least square method is utilized to train the parameters of the FWSIRM-FIS. Furthermore, three detailed experiments on traffic flow prediction are made, and comprehensive comparisons with three popular artificial intelligence methods are done to verify the effectiveness and advantages of the proposed hybrid model. According to five comparison indices, the proposed hybrid model achieves the best prediction performance while using far fewer fuzzy rules. The error histograms also verify that the proposed hybrid model has the smallest prediction errors compared to the three comparative methods. The hybrid approach proposed in this study can also be extended to other applications that have periodicity patterns, e.g. travel time estimation and electricity load forecasting.
Introduction
With the rapid development of cities, more and more urban problems, such as air pollution and traffic congestion, have come into public view [1–3]. To solve such problems, urban computing was presented as a new theory [4–6]. It is a comprehensive discipline that uses data mining methods to process the big data generated during urban development [4–6]. Urban computing plays an important role in improving the living standards and quality of life of urban residents. Many studies have been done in this research area. For example, in [7, 8], traffic congestion and congestion propagation were identified and analyzed. The road usage patterns that are important for understanding traffic characteristics were presented in [9]. In [10], an on-line model selection method based on cooperative parallel particle filters was proposed and applied to urban mobility. As another important aspect of urban computing, traffic flow prediction can provide decision support to officers and urban managers, and increase people's traveling efficiency.
In the past several decades, a variety of methods have been used to predict traffic flow, such as statistical methods, the neural network (NN) method, and the fuzzy inference system (FIS). The statistical methods apply statistical regression on historical traffic data to predict future values of the traffic flow. In [11], a multivariate linear model was built by Dang et al. for traffic flow forecasting and compared with the autoregressive (AR) model and the autoregressive moving average (ARMA) model to verify its effectiveness. In [12], the seasonal ARIMA (SARIMA) model with the exponential smoothing strategy was applied to predict single-interval traffic flow on urban freeways. In [13], it was shown that the AR model and the Back Propagation NN (BPNN) have different effects on traffic flow prediction at different time periods. In [14–17], several extended ARMA models were proposed and used in urban traffic flow prediction.
Compared with the statistical methods, artificial intelligence techniques can respond more reasonably in uncertain and dynamic environments. In [18], an artificial NN was used to learn the characteristics of traffic flow at one major intersection in Istanbul for accurate traffic flow prediction. In [19], the BPNN, one of the most popular kinds of NNs, was applied to short-term traffic flow prediction. In [20], the NN approach combined with the Bayesian method was utilized for forecasting short-term traffic flow on freeways. In [21], the Kalman filter model and the NN model were combined by the fuzzy method into one hybrid intelligent model for short-term traffic flow prediction. In [22], a supervised weighting-online learning SVR, a variant of the traditional support-vector regression (SVR), was given for short-term traffic flow forecasting. Recently, deep learning methods have also been applied to this problem; for example, a deep belief network (DBN) was examined for traffic flow prediction in [23], and a stacked autoencoder (SAE) model was used for short-term traffic flow prediction in [24]. In addition to the NN methods, FISs also have a strong ability to process nonlinear traffic flow data for accurate prediction. In [25], the Mamdani and Sugeno FISs were designed based on historical traffic flow data to realize the prediction. In [26], the fuzzy neural network (FNN), which combines the merits of both the NN and the FIS, was applied to traffic flow prediction based on data collected from on-road sensors. In [27], an urban street traffic flow prediction model was given by combining the FNN, the gate network (GN) and the expert network (EN). In [28], a pruned fast learning FIS (PFLFIS) was proposed for the data-driven prediction of traffic flow. In [29], a type-2 FNN, in which type-2 fuzzy sets replace the conventional type-1 fuzzy sets, was proposed for traffic flow prediction.
The aforementioned studies are all data driven. However, many prediction models become complex and difficult to construct when the number of input variables becomes large [30–34], especially fuzzy methods, whose rule bases face the rule explosion problem [35]. In order to alleviate the design difficulty and reduce the number of fuzzy rules, Yi et al. [36, 37] proposed the single-input-rule-module (SIRM) connected FIS (SIRM-FIS). Since its appearance, the SIRM-FIS has found many applications, such as the stabilization control of inverted pendulum systems [37] and the position control of the overhead crane system [38]. The SIRM-FIS first assigns each input variable one SIRM, which is a single-input-single-output FIS, and then aggregates the outputs of all the SIRMs after multiplying them by their corresponding importance degrees [39, 40]. In the traditional SIRM-FIS, the importance degrees of all SIRMs are crisp values. Generally, for modeling and prediction applications, the performance of traditional SIRM-FISs is limited by their simple input-output mappings. Recently, in order to enhance the approximation ability of the SIRM method, the crisp importance degrees (crisp weights) of all SIRMs were replaced by functional weights [41]. This new kind of SIRM-FIS is named the functionally weighted SIRM-FIS (FWSIRM-FIS). It has been proved that the FWSIRM-FIS is more powerful for modeling and prediction problems than the traditional SIRM-FISs. Simulation results in [41] also verified the approximation ability and design simplicity of the FWSIRM-FIS.
On the other hand, in addition to the observed traffic flow data, some kinds of prior knowledge exist about the traffic flow. For example, the traffic flows on different weekdays have similar trends, i.e. the weekday traffic flow has a daily periodic pattern. If such prior knowledge can be used, the constructed prediction models can be expected to perform better. Despite its importance, only a limited number of studies consider periodic features in traffic prediction. In [42], spectral analysis was used to extract the periodic pattern of the short-term traffic flow, while in [43], the Fourier series was used to model the periodic component. In [44–47], other kinds of prior knowledge, e.g. monotonicity, continuity and convexity, have been encoded into data-driven models. Their results also verified the usefulness of prior knowledge.
From the discussion above, it can be concluded that encoding prior knowledge into a data-driven model can improve its performance. Thus, in order to further enhance the prediction accuracy, this study presents a hybrid model and applies it to short-term traffic flow prediction. The main contributions and novelties of this study are as follows:
(1) A hybrid model is proposed for traffic flow prediction. This hybrid model combines the periodicity knowledge model and the residual data driven model to generate more accurate predicted results.
(2) The residual data driven model is realized by the FWSIRM-FIS. As shown in [41], the FWSIRM-FIS is easy to design and train, and can greatly reduce the number of fuzzy rules. In this study, the residual data driven FWSIRM-FIS model is also trained by the iterative learning method proposed in [41].
(3) The partial autocorrelation function (PACF) method is adopted to determine the optimal input variables for the data-driven FWSIRM-FIS. Existing studies usually choose the input variables manually, while the PACF method provides an effective way to determine the input variables automatically.
(4) Three detailed experiments are done, and comprehensive comparisons are made with three popular methods, namely the adaptive-network-based fuzzy inference system (ANFIS) [48, 49], the PFLFIS [28], and the BPNN [50, 51]. Experimental results demonstrate that the proposed hybrid model gives predicted results with small errors and performs much better than the three comparative methods.
The rest of this paper is organized as follows. In Section 2, the FWSIRM-FIS is introduced. In Section 3, the hybrid model is presented. In Section 4, the experiments are reported. Finally, the conclusions are given in Section 5.
The FWSIRM-FIS
The FWSIRM-FIS was proposed in [41] to enhance the approximation ability of the traditional SIRM-FIS [36, 37] which can efficiently solve the fuzzy rule explosion problem. The number of fuzzy rules in the FWSIRM-FIS still increases linearly with respect to the number of the inputs. Thus, the FWSIRM-FIS can not only deal with the rule explosion phenomenon efficiently, but also have much better approximation performance than the traditional SIRM-FIS.
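The difference between exponential and linear growth of the rule base can be made concrete with a short calculation. The sketch below is only illustrative: it assumes that a conventional FIS uses a rule for every combination of fuzzy sets and that every input is partitioned into the same number of fuzzy sets (the value 5 is a hypothetical choice, not one taken from this paper).

```python
def full_grid_rule_count(n_inputs: int, sets_per_input: int) -> int:
    """Rules in a conventional FIS whose rule base covers every combination of fuzzy sets."""
    return sets_per_input ** n_inputs


def sirm_rule_count(n_inputs: int, sets_per_input: int) -> int:
    """Rules in a SIRM-connected FIS: one single-input rule module per input."""
    return n_inputs * sets_per_input


if __name__ == "__main__":
    # Hypothetical partition: 5 fuzzy sets per input.
    for n in (2, 4, 8):
        print(n, full_grid_rule_count(n, 5), sirm_rule_count(n, 5))
```

Under these assumptions, an 8-input system needs 390625 rules on a full grid but only 40 rules when organized as SIRMs.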
The FWSIRM-FIS with $n$ input variables $x_1, x_2, \ldots, x_n$ is composed of $n$ SIRMs. Each SIRM can be seen as one single-input-single-output FIS [36, 37, 41]. In the FWSIRM-FIS, each SIRM is assigned one functional weight. In the inference process of the FWSIRM-FIS, the outputs of all the SIRMs are first computed; these output values are then multiplied by their corresponding functional weights and combined to generate the final predicted result [36, 37, 41].
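The inference process described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the formulation of [41]: the antecedent fuzzy sets are assumed to be triangular with peaks at given centers, each SIRM output is the membership-weighted average of its consequents, and each functional weight is assumed to be linear in its own input, `w_i = p_i0 + p_i1 * x_i`. All function and parameter names are hypothetical.

```python
import numpy as np


def triangular_memberships(x, centers):
    """Membership grades of x in triangular fuzzy sets peaked at the sorted `centers`.

    Inputs outside the covered range are clipped to the boundary sets.
    """
    x = float(np.clip(x, centers[0], centers[-1]))
    mu = np.zeros(len(centers))
    j = int(np.searchsorted(centers, x, side="right")) - 1
    j = min(j, len(centers) - 2)          # index of the left neighbouring center
    left, right = centers[j], centers[j + 1]
    mu[j + 1] = (x - left) / (right - left)
    mu[j] = 1.0 - mu[j + 1]
    return mu


def sirm_output(x, centers, consequents):
    """Crisp output of one SIRM: membership-weighted average of its consequents."""
    mu = triangular_memberships(x, centers)
    return float(mu @ consequents / mu.sum())


def fwsirm_predict(x, centers, consequents, weight_params):
    """Sum of SIRM outputs, each multiplied by its functional weight.

    weight_params[i] = (p_i0, p_i1) defines the ASSUMED linear weight
    w_i(x_i) = p_i0 + p_i1 * x_i.
    """
    y = 0.0
    for i, xi in enumerate(x):
        y_i = sirm_output(xi, centers[i], consequents[i])
        w_i = weight_params[i][0] + weight_params[i][1] * xi
        y += w_i * y_i
    return y


if __name__ == "__main__":
    grid = np.array([-1.0, 0.0, 1.0])
    cons = np.array([-1.0, 0.0, 1.0])     # with this choice each SIRM returns x_i itself
    print(fwsirm_predict([0.4, -0.2], [grid, grid], [cons, cons], [(0.5, 0.0), (0.5, 0.0)]))
```

With constant weights (`p_i1 = 0`) the sketch reduces to a traditional SIRM-FIS with crisp importance degrees; a nonzero `p_i1` makes the contribution of each module depend on the input, which is the idea behind the functional weights.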
The SIRM for the input $x_i$ ($i = 1, 2, \cdots, n$) can be expressed as [41]
Choosing the singleton fuzzifier and the center-of-sets defuzzifier, the crisp fuzzy inference result of SIRM-i can be computed as
Suppose that the functional weight of SIRM-i is
Then, the final input-output mapping of the FWSIRM-FIS can be computed as [41]
For simplicity, we first introduce some notation. We denote the parameters in the consequent parts of the fuzzy rules in all SIRMs and the parameters in the functional weights, respectively, as the following vectors
Then, the input-output mapping of the FWSIRM-FIS can be rewritten in the vector form as [41]
In another way, the input-output mapping of the FWSIRM-FIS can also be calculated as [41]
In this section, the proposed hybrid model will be presented firstly. Then, how to generate the periodicity knowledge model will be discussed. At last, the training of the residual data driven FWSIRM-FIS will be given.
The Structure of the Hybrid Model
As previously discussed, the short-term traffic flow has a periodic characteristic. This periodicity knowledge can complement the uncertain observed data. Thus, in this paper, the hybrid model shown in Fig. 1 is proposed to obtain better accuracy for the prediction of the short-term traffic flow.

Fig. 1. The structure of the proposed hybrid model for traffic flow prediction.
This hybrid model combines the knowledge model (the model of the periodicity knowledge) and the residual data driven model (the model constructed by observed residual data using the FWSIRM-FIS). In more detail, we utilize the following steps to construct this hybrid model.
In this hybrid model, how to extract the traffic flow pattern and to design the residual data driven model are crucial issues to be solved. In the following subsections, we will give detailed discussions on these problems.
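The decomposition behind the hybrid model can be sketched as follows. The sketch assumes that the daily-periodic pattern is the per-time-slot average over the available days (the text later refers to this step as Equation (Average)) and that the final prediction is the predicted residual plus the pattern value of the corresponding time slot. The function names are hypothetical.

```python
import numpy as np


def extract_daily_pattern(flow, points_per_day):
    """Average the flow over days at each time-of-day slot.

    `flow` holds M consecutive days with `points_per_day` samples each.
    """
    days = np.asarray(flow, dtype=float).reshape(-1, points_per_day)  # shape (M, T)
    return days.mean(axis=0)


def remove_pattern(flow, pattern):
    """Residual series: original flow minus the repeated daily pattern."""
    flow = np.asarray(flow, dtype=float)
    return flow - np.tile(pattern, len(flow) // len(pattern))


def restore_pattern(residual_pred, pattern, start_index=0):
    """Final prediction: predicted residual plus the pattern at the matching time slot."""
    slots = (start_index + np.arange(len(residual_pred))) % len(pattern)
    return np.asarray(residual_pred, dtype=float) + pattern[slots]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_pattern = 100.0 + 50.0 * np.sin(np.linspace(0.0, 2.0 * np.pi, 288, endpoint=False))
    flow = np.tile(true_pattern, 21) + rng.normal(scale=5.0, size=21 * 288)  # synthetic data
    pattern = extract_daily_pattern(flow, 288)
    residual = remove_pattern(flow, pattern)
    print(pattern.shape, residual.shape)
```

In this arrangement any regressor can be trained on `residual`; the FWSIRM-FIS plays that role in this paper.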
To begin, assume that the sampling data of the traffic flow have been collected for M weekdays, and in each day, T data points have been collected. As a result, the sampled time series of the traffic flow data can be written as a series of one dimensional vectors as
As is well known, the traffic flow on weekdays has a daily-periodic characteristic. This daily-periodic pattern can be extracted from the original time series as follows
Consequently, through removing this daily-periodic pattern, the residual time series
For simplicity, we denote this residual time series of the traffic flow as
In order to design a satisfactory data driven model by the FWSIRM-FIS, we should determine the fuzzy rules in all SIRMs and the parameters of all the functional weights. For the fuzzy rules, their antecedent fuzzy sets are usually generated by partitioning the input domains while their consequent parameters need to be tuned by learning algorithms. For the functional weights, all their parameters need to be optimized by learning algorithms. In conclusion, once the fuzzy sets are generated by fuzzy partitions, we still need to optimize the consequent parameters in fuzzy rules and the parameters of the functional weights. Below, we will give an iterative learning algorithm to tune such parameters.
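One way to read the iterative least square idea is as an alternating scheme: with the functional-weight parameters fixed, the model output is linear in the consequent parameters, and with the consequents fixed it is linear in the weight parameters, so each group can be updated by ordinary least squares in turn. The sketch below illustrates this reading only; it is not the algorithm of [41]. It assumes the rule firing strengths `Phi` and the weight basis `Psi` (for example `[1, x_i]`) have been precomputed, and all names are hypothetical.

```python
import numpy as np


def fwsirm_output(Phi, Psi, c, p):
    """Model output sum_i w_i * y_i for every sample.

    Phi: (N, n, m) firing strengths of the m rules in each of the n SIRMs.
    Psi: (N, n, q) basis of each functional weight; first entry assumed to be 1.
    c:   (n, m) consequent parameters.  p: (n, q) functional-weight parameters.
    """
    y_i = np.einsum("kim,im->ki", Phi, c)   # SIRM outputs, shape (N, n)
    w_i = np.einsum("kiq,iq->ki", Psi, p)   # functional weights, shape (N, n)
    return np.sum(w_i * y_i, axis=1)


def alternating_least_squares(Phi, Psi, y, n_iters=20):
    """Alternate two linear least-squares solves; returns c, p and the training RMSE per sweep."""
    N, n, m = Phi.shape
    q = Psi.shape[2]
    p = np.zeros((n, q))
    p[:, 0] = 1.0 / n                       # start from equal constant weights
    rmse_history = []
    for _ in range(n_iters):
        # Step 1: weights fixed, solve for the consequents.
        w_i = np.einsum("kiq,iq->ki", Psi, p)
        A_c = (w_i[:, :, None] * Phi).reshape(N, n * m)
        c = np.linalg.lstsq(A_c, y, rcond=None)[0].reshape(n, m)
        # Step 2: consequents fixed, solve for the weight parameters.
        y_i = np.einsum("kim,im->ki", Phi, c)
        A_p = (y_i[:, :, None] * Psi).reshape(N, n * q)
        p = np.linalg.lstsq(A_p, y, rcond=None)[0].reshape(n, q)
        err = fwsirm_output(Phi, Psi, c, p) - y
        rmse_history.append(float(np.sqrt(np.mean(err ** 2))))
    return c, p, rmse_history
```

Because each step minimizes the same squared error over one parameter group while the other is held fixed, the training error cannot increase from one sweep to the next, although convergence to a global optimum is not guaranteed for this bilinear problem.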
Suppose that the traffic flow of the next sampling time can be affected by the traffic flows of the n sampling times before it. In other words, the data-driven model constructed by the residual data has n inputs and one output, which are respectively denoted as
Considering these N pairs of training data {
It is not an easy thing to determine
In this section, the data sets used for experiments will be given firstly. Then, comparison indices will be provided. Furthermore, in this section, detailed experiments on traffic flow prediction will be presented. At last, comprehensive analysis and discussions on the experimental results and comparisons with the ANFIS, BPNN and the PFLFIS will be made.
Applied Data Sets
In this study, the traffic flow data for experiments were downloaded from the PeMS (California Performance Measurement System) traffic flow dataset [24, 28]. In the PeMS project, the traffic flow data are collected every 30 s by the loop detectors and then sent to the computer workstation in the University of California, Berkeley. The traffic flow dataset used in this study was collected by the Detector 1006210 (NB 99 Milgeo Ave) which is located at north bound freeway SR99, Ripon city, California. This freeway has three lanes under surveillance [24, 28].
In this paper, we select the traffic flow data collected on the weekdays from October 1, 2009 to November 30, 2009 for training and testing. We consider three experiments, namely the 5-minute, the 10-minute and the 15-minute prediction experiments, in which the collected data are aggregated into 5-, 10- and 15-minute intervals, respectively.
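The aggregation step can be sketched as below. The sketch assumes that the series holds vehicle counts per 5-minute interval and that coarser intervals are obtained by summing consecutive counts; if the data were flow rates instead, averaging would be the natural choice. The function name is hypothetical.

```python
import numpy as np


def aggregate_flow(flow_5min, factor):
    """Sum every `factor` consecutive 5-minute counts (factor=2 -> 10 min, factor=3 -> 15 min)."""
    flow_5min = np.asarray(flow_5min, dtype=float)
    if len(flow_5min) % factor != 0:
        raise ValueError("series length must be a multiple of the aggregation factor")
    return flow_5min.reshape(-1, factor).sum(axis=1)


if __name__ == "__main__":
    one_day = np.ones(288)                      # 288 five-minute samples per day
    print(len(aggregate_flow(one_day, 2)))      # 144 ten-minute samples
    print(len(aggregate_flow(one_day, 3)))      # 96 fifteen-minute samples
```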
Comparison Indices
To make a quantitative comparison of the proposed hybrid model and the comparative methods, comparison indices are needed. This subsection will introduce five commonly used indices for our comparison.
The first kind of the comparison indices considers the root mean square error (RMSE), the mean of the absolute errors (MAE), and the average percentage error (APE), which can be computed respectively as
We also consider a second kind of prediction performance measure, consisting of two statistical indices: the Pearson correlation coefficient and the coefficient of determination. These two indices are denoted as $r$ and $R^2$, respectively, and can be calculated as
The index $r$ ranges from -1 to 1, where 1 represents total positive linear correlation and -1 total negative linear correlation. The index $R^2$ ranges from 0 to 1. For both indices, larger values imply better forecasting performance of the prediction model.
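For reference, the five indices can be computed as in the sketch below. RMSE, MAE and the Pearson coefficient follow their standard definitions. Two definitions are assumptions made here for illustration: APE is taken as the total absolute error relative to the total observed flow, in percent, and $R^2$ is taken as one minus the ratio of the residual sum of squares to the total sum of squares (a form that can fall below 0 for a very poor predictor).

```python
import numpy as np


def comparison_indices(y_true, y_pred):
    """Return RMSE, MAE, APE (assumed definition), Pearson r and R^2 (assumed definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    ape = float(100.0 * np.sum(np.abs(err)) / np.sum(np.abs(y_true)))   # assumption
    r = float(np.corrcoef(y_true, y_pred)[0, 1])
    r2 = float(1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2))  # assumption
    return {"RMSE": rmse, "MAE": mae, "APE": ape, "r": r, "R2": r2}


if __name__ == "__main__":
    print(comparison_indices([1, 2, 3, 4], [2, 3, 4, 5]))
```

The example shows why $r$ alone is not sufficient: a prediction that is offset by a constant still has $r = 1$, while RMSE, MAE, APE and $R^2$ all register the error.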
Five-Minute Prediction
In this case, we consider traffic flow prediction at 5-minute intervals. As mentioned in subsection 4.1, we have traffic flow data for 39 weekdays in total, and each day has 288 five-minute data points. The data of the first 21 weekdays are chosen for training while the data of the last 18 weekdays are used for testing. In other words, 21 * 288 = 6048 data points in the weekday traffic flow time series are for training and 18 * 288 = 5184 data points are for testing.
The initial training and testing data are shown in Fig. 2(a). Then, the daily-periodic pattern is extracted by Equation (Average). The extracted daily-periodic pattern for this 5-minute experiment is shown in Fig. 2(b). After removing the daily-periodic pattern, the residual time series is given in Fig. 2(c).
The residual time series will be used to train the FWSIRM-FIS based data-driven model. Before this, we need to determine the input variables for the FWSIRM-FIS, i.e. we should determine which values before sampling time $t$ affect the value at sampling time $t$. To this end, we adopt the partial autocorrelation function (PACF) method [52] to obtain the partial autocorrelation between $s_R(t-k)$ and $s_R(t)$, where $k = 1, 2, \cdots$. A lagged value $s_R(t-k)$ with a larger partial autocorrelation coefficient has a greater influence on $s_R(t)$. Existing studies usually choose the input variables manually, while the PACF method provides an effective way to determine the input variables automatically, which is more reasonable and convenient than the manual way.
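The lag selection can be sketched with a small PACF routine. The version below uses the Durbin-Levinson recursion on the sample autocorrelations, which is one standard estimator and not necessarily the one used in [52]; the rule of keeping every lag whose absolute PACF value exceeds a threshold is likewise an assumption about how the threshold is applied. Function names are hypothetical.

```python
import numpy as np


def sample_acf(x, nlags):
    """Sample autocorrelations rho_0, ..., rho_nlags of a series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(nlags + 1)])


def pacf_durbin_levinson(x, nlags):
    """Partial autocorrelations at lags 0..nlags via the Durbin-Levinson recursion."""
    rho = sample_acf(x, nlags)
    pacf = np.zeros(nlags + 1)
    pacf[0] = 1.0
    phi_prev = np.zeros(0)                       # coefficients of the order-(k-1) AR fit
    for k in range(1, nlags + 1):
        num = rho[k] - np.dot(phi_prev, rho[k - 1:0:-1])
        den = 1.0 - np.dot(phi_prev, rho[1:k])
        phi_kk = num / den
        phi_prev = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        pacf[k] = phi_kk
    return pacf


def select_lags(pacf, threshold=0.05):
    """Lags (>= 1) whose absolute partial autocorrelation exceeds the threshold."""
    return [k for k in range(1, len(pacf)) if abs(pacf[k]) > threshold]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.normal(size=5000)
    x = np.zeros(5000)
    for t in range(1, 5000):                     # synthetic AR(1) series for illustration
        x[t] = 0.7 * x[t - 1] + noise[t]
    print(select_lags(pacf_durbin_levinson(x, 10), threshold=0.3))
```

For an AR(1) series, only the first lag should stand out, which makes it a convenient sanity check before applying the routine to the residual traffic flow series.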
The PACF of the traffic flow time series for the 5-minute experiment with 100 time lags is shown in Fig. 3. With the threshold set to 0.05, we can observe from Fig. 3 that 8 time lags have an obvious influence on the value of $s_R(t)$. As a result, the determined optimal input variables with respect to $s_R(t)$ are $x_1 = s_R(t-1)$, $x_2 = s_R(t-2)$, $x_3 = s_R(t-3)$, $x_4 = s_R(t-4)$, $x_5 = s_R(t-5)$, $x_6 = s_R(t-6)$, $x_7 = s_R(t-7)$ and $x_8 = s_R(t-8)$. Thus, there are 6040 input-output data pairs for training and 5176 input-output data pairs for testing.
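The pair counts follow from the fact that the first $n$ samples of a series have no complete set of $n$ predecessors. A minimal sketch of the pair construction, with a hypothetical function name, is:

```python
import numpy as np


def make_lagged_pairs(series, n_lags):
    """Build inputs [s(t-1), ..., s(t-n_lags)] and targets s(t) from one series."""
    series = np.asarray(series, dtype=float)
    # Column k-1 holds s(t-k) for every t = n_lags, ..., len(series) - 1.
    X = np.column_stack(
        [series[n_lags - k: len(series) - k] for k in range(1, n_lags + 1)]
    )
    y = series[n_lags:]
    return X, y


if __name__ == "__main__":
    X_train, y_train = make_lagged_pairs(np.arange(6048), 8)
    X_test, y_test = make_lagged_pairs(np.arange(5184), 8)
    print(X_train.shape, X_test.shape)
```

Applied separately to the training and testing series of the 5-minute experiment with 8 lags, this gives 6048 - 8 = 6040 and 5184 - 8 = 5176 pairs, matching the counts in the text.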
After training, the predicted values of the proposed hybrid model for the testing data are shown in Fig. 4. From this figure, we can observe that the proposed hybrid model captures the characteristics of the traffic flow and provides satisfactory performance.
For comparison, the five indices of the proposed hybrid model, the ANFIS, the PFLFIS and the BPNN are listed in Table 1. We also plot the prediction error histograms of the four predictors in Fig. 5. From both the table and the figure, we can see that the hybrid model performs best compared to the ANFIS, the PFLFIS and the BPNN.
Table 1. Comparisons of different methods in the 5-min experiment
In this experiment, the 5-minute traffic flow data are aggregated into 10-minute intervals. For the 39 weekdays, there are 39 * 144 = 5616 data points in the traffic flow time series in total. Again, we use the data of the first 21 days for training and of the last 18 days for testing, which gives 144 * 21 = 3024 training data points and 144 * 18 = 2592 testing data points. In this case, the training and testing data points of the initial time series are shown in Fig. 6(a).

Fig. 2. The training and testing data in the 5-min case: (a) the original data, (b) the extracted periodicity, (c) the residual data.
In this case, the periodic pattern extracted by Equation (Average) is shown in Fig. 6(b), and the residual time series data used for training and testing are plotted in Fig. 6(c). Again, we use the PACF [52] to determine the optimal input variables. The PACF of the traffic flow time series for this experiment with 100 time lags is shown in Fig. 7. From this figure, we can observe that the best input variables for predicting $y = s_R(t)$ are $x_1 = s_R(t-1)$, $x_2 = s_R(t-2)$, $x_3 = s_R(t-3)$, $x_4 = s_R(t-4)$, $x_5 = s_R(t-5)$, $x_6 = s_R(t-6)$, $x_7 = s_R(t-7)$, $x_8 = s_R(t-8)$ and $x_9 = s_R(t-9)$, i.e. there are 9 input variables in this case. This leaves 3015 input-output data pairs for training and 2583 input-output data pairs for testing.
For the testing data, the predicted results of the proposed hybrid model are shown in Fig. 8. The comparison results of the four predictors are listed in Table 2, and the prediction error histograms of the four predictors in this experiment are plotted in Fig. 9. From the figures and the table, we can observe that the proposed hybrid model again has the best performance. Detailed comparisons and analysis will be given in Subsection 4.4.
Table 2. Comparisons of different methods in the 10-min experiment
This prediction experiment uses the traffic flow time series of 15-minute data aggregated from the 5-minute data of the 39 weekdays. Thus, in this case, there are 2016 data points in the traffic flow time series for training and 1728 data points for testing. The original training and testing data of the time series are shown in Fig. 10(a), and the periodic pattern extracted from this time series is depicted in Fig. 10(b). After removing the periodic pattern, the residual time series, which will be used to train the four prediction models, is shown in Fig. 10(c).

Fig. 3. The PACF of the 5-min experiment with 100 time lags.
Again, the PACF of this 15-minute experiment is computed within 100 time lags and shown in Fig. 11. From this figure, we can conclude that, in order to predict $y = s_R(t)$, the optimal input variables should be $x_1 = s_R(t-1)$, $x_2 = s_R(t-2)$, $x_3 = s_R(t-3)$, $x_4 = s_R(t-4)$, $x_5 = s_R(t-5)$ and $x_6 = s_R(t-6)$. Consequently, in this experiment, there are 2010 input-output data pairs for training and 1722 input-output data pairs for testing.
For the testing data, the comparison results of the four prediction models are listed in Table 3, and the prediction results of the hybrid model are shown in Fig. 12. Again, in order to better display the prediction errors of the four prediction models, their prediction error histograms are shown in Fig. 13. The proposed hybrid model is still the best one in this case.
Table 3. Comparisons of different methods in the 15-min experiment
Through considering the comparison results in the three experiments, we can make the following conclusions.
For the RMSE, MAE and APE indices, smaller values correspond to better prediction performance. From Tables 1–3, we can observe that the proposed hybrid model performs best according to the RMSE, MAE, and APE. In experiment one, compared with the ANFIS, the PFLFIS and the BPNN, the proposed hybrid model improves these three indices by at least 7%, 4% and 11%, respectively. In the second and third experiments, the improvements in these three indices reach about 40%, 40% and 45%, respectively.
For the indices r and R2, the larger their values are, the better the prediction performance will be. The results in Tables 1–3 also verified the performance and the advantages of the proposed hybrid model.
The error histograms in Figs. 5, 9 and 13 reflect the error distributions of the four predictors. The less the center of an error histogram deviates from zero, the better the prediction performance. From Figs. 5, 9 and 13, we can also see that the proposed hybrid model performs best and gives the smallest prediction errors in the three experiments.
As mentioned previously, the FWSIRM-FIS can reduce the number of fuzzy rules. For the three experiments, the numbers of fuzzy rules or neural nodes of the four predictors are listed in Table 4. The results in this table again verify the ability of the FWSIRM-FIS to reduce the number of fuzzy rules and alleviate the design difficulty.
Table 4. The number of fuzzy rules or neural nodes

Fig. 4. Prediction results of the proposed hybrid model in the 5-min experiment.

Fig. 5. Prediction error histograms of the four predictors in the 5-min experiment: (a) the proposed hybrid model, (b) ANFIS, (c) PFLFIS, and (d) BPNN.

Fig. 6. The data in the 10-min case: (a) original data, (b) extracted periodicity, (c) the residual data.

Fig. 7. The PACF of the 10-min experiment with 100 time lags.

Fig. 8. Prediction results of the proposed hybrid model in the 10-min experiment.

Fig. 9. Prediction error histograms of the four predictors in the 10-min experiment: (a) the proposed hybrid model, (b) ANFIS, (c) PFLFIS, and (d) BPNN.

Fig. 10. The data in the 15-min case: (a) original data, (b) extracted periodicity, (c) the residual data.

Fig. 11. The PACF of the 15-min experiment with 100 time lags.

Fig. 12. Prediction results of the proposed hybrid model in the 15-min experiment.

Fig. 13. Prediction error histograms of the four predictors in the 15-min experiment: (a) the proposed hybrid model, (b) ANFIS, (c) PFLFIS, and (d) BPNN.
This paper combined the knowledge model and the residual data driven FWSIRM-FIS model for the accurate prediction of short-term traffic flow. In this proposed hybrid model, the knowledge model was constructed by the periodicity pattern extracted from the traffic flow time series, while the data driven FWSIRM-FIS model was designed by the residual data through removing the periodicity pattern from the time series. In addition, the partial autocorrelation method was presented to determine the optimal inputs for the data driven FWSIRM-FIS model. And, detailed experiments have demonstrated the effectiveness and advantages of the proposed hybrid model, through comparing it with three popular methods.
On the other hand, in the past few decades, many mathematical models have been built for forecasting traffic flow. Such mathematical models describe the mechanism of the traffic flow and can supplement the data-driven model. In our future study, we will try to combine mathematical models with data-driven models to further improve the prediction accuracy.
Acknowledgments
This work is supported by National Natural Science Foundation of China (61473176, 61573225), the Taishan Scholar Project of Shandong Province, and the Colleges and Universities Independent Innovation Program of Jinan City (201303008).
