Support Vector Machine for Short-Term Traffic Flow Prediction and Improvement of Its Model Training using Nearest Neighbor Approach

Abstract

Short-term prediction of traffic flow is essential for the deployment of intelligent transportation systems. In this paper we present an efficient method for short-term traffic flow prediction using a Support Vector Machine (SVM) and compare it with baseline methods, including the historical average, the current time-based, and the double exponential smoothing predictors. To demonstrate the efficiency and accuracy of the SVM method, we used one month of time-series traffic flow data from a segment of the Pan Island Expressway in Singapore to train and test the model. The results show that, for a rolling horizon of 30 min, the SVM method significantly outperforms the baseline methods for most prediction intervals and under various traffic conditions. In investigating the effect of the input-data dimension on prediction accuracy, we found that the rolling horizon has a clear effect on the SVM's prediction accuracy: for rolling horizons of 30–60 min, the longer the rolling horizon, the more accurate the SVM prediction. To improve the SVM's training performance, we investigate the application of the k-Nearest Neighbor method to SVM training using both actual data and simulated incident data. The results show that the k-Nearest Neighbor method allows a substantial reduction of the SVM training size, accelerating training without compromising predictive performance.

Short-term prediction of traffic flow is crucial for Advanced Traveler Information Systems (ATIS) and Advanced Traffic Management Systems (ATMS). Reliable traffic flow forecasting is important for developing real-time traffic control systems. For these reasons, timely and accurate traffic flow prediction has long been a subject of intensive research. Traffic flow prediction approaches can be broadly classified into the time-series analysis approach and the machine learning approach ( 1 ). Techniques associated with the time-series analysis approach range from the historical average ( 2 ) to Kalman Filtering ( 3 – 6 ), Exponential Smoothing ( 7 , 8 ), and Autoregressive Integrated Moving Average (ARIMA) models ( 7 , 9 – 11 ). Techniques associated with the machine learning approach include k-Nearest Neighbor ( 12 – 15 ), Artificial Neural Network ( 16 – 18 ), and Support Vector Machine (SVM) ( 19 – 21 ).

Although a considerable number of traffic flow prediction methods have been deployed, each has certain advantages and limitations. The time-series analysis approach is simple and easy to understand, but models following this approach typically capture only linear relationships. Consequently, their performance deteriorates under non-linear conditions ( 22 , 23 ). For example, the ARIMA model cannot capture traffic dynamics given its underlying assumption that the time-series data are stationary, with a constant mean ( 22 , 24 ). The seasonal ARIMA model, an extension of ARIMA, can address the dynamics of traffic conditions, but outlier detection and parameter estimation for seasonal ARIMA models are time-consuming ( 25 ).

The second category, the machine learning approach, can deal with complex non-linear data and is thus widely used in traffic flow forecasting. The Artificial Neural Network (ANN) method ( 26 – 28 ) is a natural methodological candidate for forecasting with multiple inputs and outputs. With its parallel structure and learning capability, the ANN model is suitable for solving complex problems such as the prediction of traffic parameters ( 29 ). However, the method can be prone to poor generalization, performing poorly on previously unseen data during the test phase ( 30 ). In addition, the ANN model requires a large amount of training data, so its prediction accuracy depends on the sample size.

Support Vector Machines (SVM), introduced by ( 31 ), are a family of machine learning algorithms. SVM has good generalization capability and computational efficiency and is robust in high dimensions. SVM can address the shortcomings of ANN because its training minimizes not only the empirical risk but also the structural risk, that is, an upper bound of the generalization error ( 30 ). Through the Kernel function, SVM maps a non-linear problem in the low-dimensional input space to a linear problem in a high-dimensional feature space ( 23 ). Because it performs well relative to many other learning algorithms, it has been successfully applied in fields such as biology ( 32 ) and financial time-series analysis ( 33 ). In traffic engineering, SVM has been widely used in domains such as traffic incident detection ( 34 ), traffic safety ( 35 , 36 ), and travel time prediction ( 29 , 37 , 38 ).

In the area of traffic flow prediction, SVM has been the subject of a large body of research. ( 39 ) proposed an accurate multi-step traffic flow prediction model based on SVM in which the input vectors consisted of actual traffic volumes and different types of input vectors, including combinations of space-time data and historical pattern data; the test results showed that the proposed SVM model had a mean relative error of 12.8%. ( 40 ) introduced a hybrid PSO-SVR forecasting method that uses particle swarm optimization (PSO) to search for optimal support vector regression (SVR) parameters, achieving higher precision with less learning time; extensive comparison experiments indicated that the proposed model achieved better forecasting accuracy than the other algorithms compared, including the conventional SVM, ARIMA, and the Back Propagation Neural Network (BPNN). ( 41 ) proposed a hybrid optimization algorithm that combines PSO with a genetic algorithm to search for the optimal parameters of a least squares support vector machine (LSSVM); the experimental results showed that the hybrid-LSSVM model yields better prediction ability and relatively high computational efficiency compared with non-heuristic and heuristic algorithms. ( 42 ) proposed a traffic flow prediction model based on LSSVM that uses the Fruit Fly Optimization Algorithm (FOA) to set the two LSSVM parameters automatically. The experimental results showed that LSSVM combined with FOA (LSSVM-FOA) has clear advantages in traffic flow forecasting accuracy: the LSSVM-FOA model reaches the global optimum quickly and provides better accuracy than the single LSSVM model, the RBF neural network (RBFNN), and LSSVM combined with particle swarm optimization (LSSVM-PSO). Other studies applying SVM techniques to traffic flow prediction include ( 21 , 23 , 43 ).

In reviewing the literature, we found that most previous research had focused on optimization algorithms for selecting the parameters of the SVR model, such as GA-LSSVM, PSO-LSSVM, and FOA-LSSVM. These works used relatively short prediction intervals and relatively small amounts of data for SVM training. Little research has addressed critical concerns such as the effect of the rolling horizon (input vector dimension) on prediction accuracy, how SVM prediction performance changes over a wider range of prediction intervals, how the similarity of patterns between training and testing sets affects SVM prediction accuracy, and what can be done, apart from optimal parameter selection, to improve SVM prediction performance. Some of these concerns can be explored using simpler tools such as the k-Nearest Neighbor algorithm.

The k-Nearest Neighbor (k-NN) algorithm is a non-parametric technique that measures the Euclidean distances between members of the learning sets and the target data ( 12 , 14 , 44 ). To estimate the predicted data, the k-NN algorithm uses only the nearest data rather than all samples, so it obtains results quickly and is suitable for applications that require short computational time, such as traffic flow forecasting ( 13 ). The k-NN algorithm has been widely used for traffic flow forecasting. ( 13 ) proposed a two-tier k-NN for traffic flow forecasting and found that the algorithm can meet real-time requirements for the accuracy and reliability of short-term traffic flow forecasting. ( 12 ) applied the k-NN method to form the training dataset for a local linear wavelet neural network (LLWNN) in a kNN-LLWNN model for the online short-term prediction of traffic volumes; the results showed that kNN-LLWNN performs comparably with LLWNN and SVM while its running time is much lower than theirs. ( 45 ) used a hybrid prediction model (kNN-SVM) for short-term traffic flow forecasting and found that the forecasting accuracy of the kNN-SVM model is better than that of other traditional prediction models, including k-NN, SVR, and neural networks.
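As a rough illustration of the generic mechanism described above (not the day-level selection procedure used later in this paper), the sketch below forecasts the next volume from the k historical state vectors closest to the current one in Euclidean distance. The function name `knn_forecast`, the data layout, and the plain averaging of successors are assumptions made for illustration only.

```python
import math
from typing import Sequence


def knn_forecast(history_states: Sequence[Sequence[float]],
                 history_next: Sequence[float],
                 current_state: Sequence[float],
                 k: int = 3) -> float:
    """Generic k-NN forecast: average the successors of the k nearest states.

    history_states[i] holds recent volumes at some past time and
    history_next[i] holds the volume observed right after them
    (a hypothetical data layout used only for this sketch).
    """
    if len(history_states) != len(history_next):
        raise ValueError("each historical state needs exactly one successor")
    if not 1 <= k <= len(history_states):
        raise ValueError("k must lie between 1 and the number of historical states")
    # Euclidean distance between the current state and every stored state.
    scored = [(math.dist(state, current_state), nxt)
              for state, nxt in zip(history_states, history_next)]
    # Only the k closest states contribute; the rest of the database is ignored.
    scored.sort(key=lambda pair: pair[0])
    return sum(nxt for _, nxt in scored[:k]) / k
```

For example, `knn_forecast([[100, 110], [200, 210], [105, 115], [300, 290]], [120, 220, 125, 280], [102, 112], k=2)` averages the successors of the two closest states, giving 122.5.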

In this paper we present an investigation into the application of the SVM model to short-term time-series traffic flow prediction and the application of the k-NN method to improve SVM training. We evaluate the overall performance of SVM prediction using field data collected from a segment of the Pan Island Expressway of Singapore over a wide range of prediction intervals. We investigate the effect of rolling horizon (input vector dimension) on prediction accuracy. We explore the effects of pattern similarity on prediction performance using the k-NN method under various traffic flow conditions.

The remainder of this paper is structured as follows. The next section introduces the theoretical background of SVM and SVR relevant to traffic flow forecasting. We then explain the basic notions used in this study, the SVM model and baseline predictors, and the data and performance measures. This is followed by the prediction results of the SVM method in comparison with the baseline predictors and a discussion of the effect of rolling horizon on prediction performance. In this section we also present the effect of the k-NN method on SVM training efficiency using both actual and simulated data, and address some issues for online applications. Finally, we summarize the essential conclusions and findings of this research.

Support Vector Machines

SVM is a supervised learning algorithm in the field of machine learning, widely used for classification and regression problems. The basic idea of SVM for classification is to find a hyperplane in the feature space that separates the classes so that the classified data are as far from the plane as possible (maximum margin) ( 30 ). The SVM theory for classification problems can be extended to non-linear regression problems (SVR) such as traffic flow forecasting. In the following paragraphs, we briefly introduce the essence of the SVR theoretical background following ( 30 ) and ( 46 ), with reference to ( 23 ).

Given a group of traffic flow data in a training dataset of a specific location:

S = \{(\vec{x}_{i}, y_{i}) \in \mathbb{R}^{m} \times \mathbb{R},\ i = 1, \ldots, N\}

(1)

where

the inputs $\vec{x}_{i} \in \mathbb{R}^{m}$ are m-dimensional vectors whose components are the traffic flows observed in the rolling horizon preceding the ith time interval,

the responses $y_{i}$ are continuous values, and

N is the number of training data.

The predicted traffic flow for the ith time interval is denoted $y^{*}(\vec{x}_{i})$, and $y_{i}$ is the corresponding ground truth.

The basic idea of SVR is to map the input data $\vec{x}$ into a high-dimensional feature space through a non-linear mapping function $φ$. The relationship between $\vec{x}_{i}$ and $y_{i}$ in the high-dimensional space is defined as:

f(\vec{x}, \vec{w}) = \vec{w}^{T} φ(\vec{x}) + b

(2)

where

the functions $φ (\vec{x})$ are called features,

$\vec{w}$ is the weight vector and is the subject of learning, and

b is a bias value.

The coefficients w and b are obtained by minimizing the objective function:

R_{emp} = \frac{1}{2} \|w\|^{2} + C \frac{1}{N} \sum_{i = 1}^{N} L_{ε}(y_{i}, f(x_{i}))

(3)

where C is a constant and $L_{ε}(y_{i}, f(x_{i}))$ is the $ε$-insensitive loss function, defined as:

L_{ε}(y, f(x)) = \begin{cases} |f(x) - y| - ε, & |f(x) - y| \geq ε \\ 0, & \text{otherwise} \end{cases}

(4)
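A minimal sketch of Equations 3 and 4 in plain Python may help make the loss concrete: deviations within ε cost nothing, and larger ones are penalized linearly. The function names are illustrative, and `w` is assumed to be an explicit weight vector, which in practice is only available for a linear feature map.

```python
from typing import Sequence, Tuple


def epsilon_insensitive_loss(y_true: float, y_pred: float, epsilon: float) -> float:
    """Equation 4: zero inside the epsilon-tube, |f(x) - y| - epsilon outside it."""
    deviation = abs(y_pred - y_true)
    return deviation - epsilon if deviation >= epsilon else 0.0


def regularized_risk(w: Sequence[float],
                     pairs: Sequence[Tuple[float, float]],
                     C: float,
                     epsilon: float) -> float:
    """Equation 3: 0.5 * ||w||^2 + C * (1/N) * sum of epsilon-insensitive losses.

    pairs holds (y_i, f(x_i)) tuples for the N training points.
    """
    if not pairs:
        raise ValueError("at least one training pair is required")
    norm_sq = sum(wj * wj for wj in w)
    mean_loss = sum(epsilon_insensitive_loss(y, f, epsilon) for y, f in pairs) / len(pairs)
    return 0.5 * norm_sq + C * mean_loss
```

With ε = 5, a prediction that misses by 3 vehicles incurs no loss, while one that misses by 8 incurs a loss of 3.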

The $ε$-insensitive loss function is used to find the optimal hyperplane by solving the following problem:

\min \ \frac{1}{2} \|w\|^{2} + C \sum_{i = 1}^{N} (ξ_{i}^{-} + ξ_{i}^{+})

\text{s.t.} \quad y_{i} - f(x_{i}) \leq ε + ξ_{i}^{+}

f(x_{i}) - y_{i} \leq ε + ξ_{i}^{-}

ξ_{i}^{+} \geq 0, \ ξ_{i}^{-} \geq 0, \quad i = 1, 2, \ldots, N

(5)

where C > 0 is a pre-specified penalty constant and $ξ_{i}^{+}$ and $ξ_{i}^{-}$ are slack variables measuring how far $y_{i}$ lies above or below the $ε$-tube around $f$, respectively.

The Lagrange multipliers $λ_{i}$ and $λ_{i}^{*}$ are found by solving the dual problem. With the optimal multipliers $\bar{λ}_{i}$ and $\bar{λ}_{i}^{*}$, the solution is given by:

f(x) = \sum_{i = 1}^{N} (\bar{λ}_{i} - \bar{λ}_{i}^{*}) K(x_{i}, x) + \bar{b}

(6)

where

\bar{b} = -\frac{1}{2} \sum_{i = 1}^{N} (\bar{λ}_{i} - \bar{λ}_{i}^{*}) (K(x_{i}, x_{r}) + K(x_{i}, x_{s}))

(7)

and $x_{r}$ and $x_{s}$ are support vectors.

The equality constraint of the dual problem can be dropped if the Kernel already contains a bias term, so that b is accounted for in the Kernel function. In this case, the regression function reduces to:

f(x) = \sum_{i = 1}^{N} (\bar{λ}_{i} - \bar{λ}_{i}^{*}) K(x_{i}, x)

(8)
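Equations 6 and 8 can be evaluated directly once the multipliers are known. The sketch below assumes the optimal multipliers $\bar{λ}_{i}$ and $\bar{λ}_{i}^{*}$ have already been obtained from the dual problem (the numbers in the example are made up) and takes the Kernel as a parameter. Training points whose two multipliers are equal contribute nothing, so only support vectors matter. The names `svr_predict` and `linear_kernel` are hypothetical.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def svr_predict(x: Vector,
                train_points: Sequence[Vector],
                lam: Sequence[float],
                lam_star: Sequence[float],
                kernel: Kernel,
                b: float = 0.0) -> float:
    """Equation 6: f(x) = sum_i (lam_i - lam_i*) K(x_i, x) + b (Equation 8 when b = 0)."""
    total = 0.0
    for x_i, l_i, l_star_i in zip(train_points, lam, lam_star):
        coef = l_i - l_star_i
        if coef != 0.0:  # non-support vectors have zero net weight and are skipped
            total += coef * kernel(x_i, x)
    return total + b


def linear_kernel(u: Vector, v: Vector) -> float:
    """Plain dot product, used only to keep the example easy to check by hand."""
    return sum(a * c for a, c in zip(u, v))
```

In the usage below, the third training point has equal multipliers and is therefore ignored: `svr_predict([2.0, 4.0], [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]], [0.5, 0.0, 0.3], [0.0, 0.25, 0.3], linear_kernel, b=0.1)` evaluates to 0.5·2 − 0.25·4 + 0.1 = 0.1.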

The Kernel function $K(\vec{x}, \vec{x}') = φ(\vec{x}) \cdot φ(\vec{x}')$ is introduced to map the input data implicitly into the feature space. Popular kernels include the Polynomial, Radial Basis Function, and Sigmoid kernels ( 46 ). There is no general rule for choosing the type of Kernel function for specific data. However, the Radial Basis Function (RBF) is considered the most efficient because it requires only a few parameters yet has a powerful non-linear learning ability ( 41 ). Thus, we use the RBF as the Kernel function in this study.

The RBF Kernel function has the form:

K(\vec{x}, \vec{x}') = \exp\left(-\frac{\|\vec{x} - \vec{x}'\|^{2}}{2γ^{2}}\right)

(9)

where $γ$ is a Kernel parameter.
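Equation 9 is easy to code directly. One caution when reusing tools: LIBSVM (and scikit-learn, which wraps it) writes the RBF kernel as $\exp(-g\|\vec{x} - \vec{x}'\|^{2})$, so its `gamma` corresponds to $1/(2γ^{2})$ in Equation 9 rather than to $γ$ itself. The helper names below are hypothetical.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], x_prime: Sequence[float], gamma: float) -> float:
    """Equation 9: exp(-||x - x'||^2 / (2 * gamma^2)); gamma acts as a kernel width."""
    if len(x) != len(x_prime):
        raise ValueError("vectors must have the same dimension")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-sq_dist / (2.0 * gamma ** 2))


def libsvm_gamma_from_width(gamma_width: float) -> float:
    """Convert the width gamma of Equation 9 into LIBSVM's exp(-g * ||x - x'||^2) form."""
    return 1.0 / (2.0 * gamma_width ** 2)
```

For instance, a width of γ = 1 in Equation 9 corresponds to `g = 0.5` in the LIBSVM parameterization, and both give $e^{-1}$ for two points at squared distance 2.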

Methodology

Basic Notations

The basic notations used in this study include:

Prediction interval $(Δ)$: the look-ahead time from the current time, at the end of which the traffic volume is predicted. This study explores a wide spectrum of prediction intervals: $Δ$ = 5, 10, 15, 20, 25, 30, and 60 min.

Rolling horizon ($Ω$): the look-back time from the current time, within which data are used for prediction. In the first stage of this experiment, $Ω$ = 30 min is used. In the study of the effect of rolling horizon on prediction accuracy, $Ω$ is subsequently extended to 45 and 60 min. We write $Ω \to Δ$ for a prediction with rolling horizon $Ω$ and prediction interval $Δ$, for example $30 \to 5$, $30 \to 10$, and so forth.

Rolling step $(δ)$: the time interval by which the rolling horizon is advanced to the next prediction stage. The rolling step should be selected in correspondence with the granularity of the collected data.

Time-series prediction: from the definitions of $Δ$ and $Ω$, we formalize time-series traffic volume prediction (Figure 1) as follows. Let T be the time window in which prediction is considered (24 h or any other time span), decomposed into sub-windows of length $Ω$. Let $δ$ denote the length of a time interval, V(t) denote the traffic volume at time t, and n denote the number of intervals in $Ω$ (so n = $Ω/δ$). The objective of time-series prediction is to find a function $f : \mathbb{R}^{n + 1} \to \mathbb{R}$ such that:

f(V(t - Ω), \ldots, V(t - δ), V(t)) = V(t + Δ), \quad \forall t \in \{Ω, Ω + δ, \ldots, T - Δ\}.
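The formulation above amounts to a sliding-window transformation of the volume series. The sketch below, which assumes a single uninterrupted series of 5-min counts and uses a hypothetical function name, builds the training pairs; with $δ$ = 5 min it yields input vectors of n + 1 = 7, 10, and 13 values for $Ω$ = 30, 45, and 60 min.

```python
from typing import List, Sequence, Tuple


def make_rolling_dataset(volumes: Sequence[float],
                         omega_min: int,
                         delta_min: int,
                         step_min: int = 5) -> Tuple[List[List[float]], List[float]]:
    """Pair each input (V(t - Omega), ..., V(t)) with the target V(t + Delta)."""
    if omega_min % step_min or delta_min % step_min:
        raise ValueError("Omega and Delta must be multiples of the rolling step")
    n = omega_min // step_min   # intervals in the rolling horizon
    m = delta_min // step_min   # intervals in the prediction interval
    inputs, targets = [], []
    # t runs from Omega to T - Delta, expressed in interval indices.
    for t in range(n, len(volumes) - m):
        inputs.append(list(volumes[t - n:t + 1]))  # n + 1 values ending at V(t)
        targets.append(volumes[t + m])
    return inputs, targets
```

For a $30 \to 10$ prediction, the first pair built from the series 0, 1, 2, … is the input [0, 1, 2, 3, 4, 5, 6] with target 8.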

Figure 1.

The rolling horizon approach for time-series traffic flow prediction.

Time-Series Traffic Flow Prediction: Mode of Operation

This study uses the rolling horizon approach proposed by ( 47 ) for short-term traffic volume prediction. In this approach (Figure 1), the time window (T) is divided into several stages, each consisting of short intervals corresponding to the data granularity. In each stage, the data in the rolling horizon $Ω$ are available and are used to predict the traffic volume $Δ$ ahead. The prediction interval $Δ$ may range from a few minutes to 60 min.

The predicted data (in stage n) can be used for traffic control or assignment applications; the horizon is then advanced by the rolling step $(δ)$ to the next stage. The rolling step is selected in correspondence with the granularity of the collected data; it equals 5 min in this study. The prediction module is linked with the historical database so that data can be continually retrieved for its operation. Thus, in stage n+1 the rolling horizon is updated with newly retrieved data from the traffic surveillance system. The cycle is repeated until the operation ends.
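The stage-by-stage operation can be sketched as a loop that appends each newly retrieved count to a fixed-length horizon and re-runs the predictor. This is only an illustration: the generator name is hypothetical, the predictor is any callable (for example, a trained SVR wrapped in a function), and the link to the historical database is reduced to an iterable of counts.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, Tuple

Predictor = Callable[[List[float]], float]


def rolling_horizon_predictions(stream: Iterable[float],
                                n: int,
                                predictor: Predictor) -> Iterator[Tuple[int, float]]:
    """Yield (stage, prediction) once the horizon holds n + 1 counts.

    Each new count advances the horizon by one rolling step: the oldest
    count drops out automatically because the deque has a fixed length.
    """
    horizon: Deque[float] = deque(maxlen=n + 1)
    for stage, volume in enumerate(stream):
        horizon.append(volume)           # update with newly retrieved data
        if len(horizon) == n + 1:        # horizon is full: predict for this stage
            yield stage, predictor(list(horizon))
```

Passing `lambda h: h[-1]` as the predictor reproduces the current time-based predictor described later; with the counts 10, 20, 30, 40 and n = 2 it yields (2, 30) and (3, 40).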

The SVM Prediction Model

In this study, the SVR is implemented in the Python programming language using the Keras library. To avoid numerical difficulties during the calculation, the prediction starts by scaling the data in both the training and testing sets from large numeric ranges into the smaller range $[-1, +1]$. As explained previously, we use the RBF Kernel for model training since it has fewer numerical difficulties and is efficient for traffic flow prediction ( 23 , 41 , 48 , 49 ).
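A minimal sketch of the $[-1, +1]$ scaling step follows. It assumes a single min-max range learned from the training volumes and reused unchanged for the test set (so test values outside the training range map slightly beyond ±1); the inverse map converts SVR outputs back to vehicle counts. The function name is hypothetical.

```python
from typing import Callable, Sequence, Tuple


def fit_minmax_scaler(train: Sequence[float],
                      low: float = -1.0,
                      high: float = 1.0) -> Tuple[Callable[[float], float], Callable[[float], float]]:
    """Return (scale, unscale) functions fitted on the training volumes only."""
    v_min, v_max = min(train), max(train)
    if v_max == v_min:
        raise ValueError("training data must not be constant")
    span = v_max - v_min

    def scale(v: float) -> float:
        # Map [v_min, v_max] linearly onto [low, high].
        return low + (v - v_min) * (high - low) / span

    def unscale(s: float) -> float:
        # Inverse map, used to turn scaled predictions back into volumes.
        return v_min + (s - low) * span / (high - low)

    return scale, unscale
```

Fitting on the training set alone keeps information from the test days out of the preprocessing step.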

Parameter selection: for the RBF Kernel, we first determine the model parameters C and $γ$. C is a penalty parameter that is assigned before SVM model training; it strikes a balance between margin maximization and error minimization ( 39 ). $γ$ is the parameter of the RBF Kernel function. The parameters are calibrated using the grid-search method introduced in LIBSVM ( 48 ): during the search process, all combinations of parameters (C, $γ$) in the grid are tested. For each parameter setting, LIBSVM obtains the cross-validation (CV) accuracy, and the parameters that produce the highest CV accuracy are selected. We do not consider more advanced parameter selection methods because, with only two parameters, the number of grid points is not too large. Furthermore, because the SVM problems associated with different parameters (C, $γ$) are independent, LIBSVM allows them to run in parallel.

In this experiment, the search space comprised $γ$ values of 0.001, 0.01, 0.1, 1, 10, and 100 and C values of 1, 10, 100, and 1,000. For each combination of parameters (C, $γ$), we used 10-fold CV to avoid overfitting: the dataset was partitioned into 10 subsets of equal size, and each subset in turn was used to validate the model trained on the remaining nine subsets. The process was repeated 10 times, and the error was averaged across all 10 trials. Finally, the best parameters $C^{*}$ = 10 and $γ^{*}$ = 1, which provided the "optimal" CV accuracy of 93.77%, were selected. The model was then retrained with ($C^{*}$, $γ^{*}$) on the whole training set and used to predict the test sets.
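The grid search with 10-fold CV can be sketched as follows. To keep the example self-contained with NumPy only, the model inside the loop is RBF kernel ridge regression with ridge penalty 1/C, a stand-in rather than SVR, and folds are scored by mean absolute error instead of the "CV accuracy" reported above. The Kernel uses the LIBSVM convention $\exp(-g\|\vec{x} - \vec{x}'\|^{2})$. All function names are hypothetical.

```python
import itertools

import numpy as np


def rbf_gram(A: np.ndarray, B: np.ndarray, g: float) -> np.ndarray:
    """Kernel matrix exp(-g * ||a - b||^2) between the rows of A and B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-g * np.maximum(sq, 0.0))


def fit_predict(X_tr, y_tr, X_te, C: float, g: float) -> np.ndarray:
    """Stand-in model (kernel ridge regression, not SVR): larger C = weaker penalty."""
    K = rbf_gram(X_tr, X_tr, g)
    coef = np.linalg.solve(K + np.eye(len(X_tr)) / C, y_tr)
    return rbf_gram(X_te, X_tr, g) @ coef


def grid_search_cv(X, y, Cs, gs, k: int = 10, seed: int = 0):
    """Return (mean CV error, best C, best g) over every (C, g) pair."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(X)), k)
    best = None
    for C, g in itertools.product(Cs, gs):
        errors = []
        for i, val_idx in enumerate(folds):
            # Train on the other k - 1 folds, validate on fold i.
            tr_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
            pred = fit_predict(X[tr_idx], y[tr_idx], X[val_idx], C, g)
            errors.append(np.mean(np.abs(pred - y[val_idx])))
        score = float(np.mean(errors))
        if best is None or score < best[0]:
            best = (score, C, g)
    return best
```

In practice, scikit-learn's `SVR` estimator combined with `GridSearchCV(..., cv=10)` performs the same exhaustive search with a true ε-SVR, and its `n_jobs` argument runs the independent grid points in parallel.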

For SVM training, the empirical NN method (see the section "Nearest Neighbor Method for SVM Training (NN-SVM)" below) can be used to locate the most similar traffic patterns in the historical database (HDB) within the desired rolling horizon. We denote the volume predicted by the SVM method at time $t + Δ$ as $V_{SVM}^{p} (t + Δ)$. Since SVR is the extension of SVM to regression problems, the terms SVM and SVR are used interchangeably in this research.

Baseline Predictors

Historical Mean Predictor (HMP)

The historical profile is simply the series of historical mean volumes in successive intervals. The use of the historical average rests on the method's simplicity and on the observation that traffic counts on different weekdays in the segment are highly correlated ( 50 ). Since the test set mainly includes weekday data, the historical average is calculated from weekday traffic volumes only in $S_{1}^{L}$. Let $S_{1 - H}^{L}$ denote the learning data for the historical profiling method, $D_{H}$ denote the number of days in $S_{1 - H}^{L}$, and $\bar{V}_{H}^{p} (t + Δ | S_{1 - H}^{L})$ denote the historical volume at time $t + Δ$, given $S_{1 - H}^{L}$.

\bar{V}_{H}^{p} (t + Δ | S_{1 - H}^{L}) = \frac{1}{D_{H}} \sum_{d \in S_{1 - H}^{L}} V(t + Δ, d)

(10)

In the first stage of this experiment, $S_{1 - H}^{L}$ = $S_{1}^{L}$, that is, the learning (training) data for the experiment (see Section "Data" below).
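Equation 10, with the weekday restriction described above, can be sketched as below. The sketch assumes (hypothetically) that the learning data are stored as a mapping from calendar date to a list of 288 five-minute volumes, and that `interval_index` is the 5-min slot corresponding to $t + Δ$; the function name is also hypothetical.

```python
from datetime import date
from statistics import fmean
from typing import Dict, List


def historical_mean_predictor(volumes_by_day: Dict[date, List[float]],
                              interval_index: int,
                              weekdays_only: bool = True) -> float:
    """Equation 10: mean volume in one interval over the learning days."""
    # date.weekday() returns 0-4 for Monday-Friday and 5-6 for the weekend.
    days = [d for d in volumes_by_day if not weekdays_only or d.weekday() < 5]
    if not days:
        raise ValueError("no learning days match the selection")
    # D_H = len(days); the average runs over the selected days only.
    return fmean(volumes_by_day[d][interval_index] for d in days)
```

For example, with Tuesday 1 October and Wednesday 2 October 2013 recording 200 and 220 vehicles in a slot, and Sunday 6 October recording 20, the weekday-only historical mean for that slot is 210.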

Current Time-Based Predictor (CTP)

The current time-based predictor predicts traffic volumes by projecting the traffic volume at the current time t. Letting $V_{R}^{p} (t + Δ)$ denote the predicted volume at time $t + Δ$, we have:

V_{R}^{p} (t + Δ) = V (t)

(11)

Since the CTP simply projects the current traffic volume to the future flow, it is likely that the CTP incurs high errors for long prediction intervals, especially under dynamic conditions.

Double Exponential Smoothing Predictor (ESP)

The exponential smoothing method weights past observations using exponentially decreasing weights: recent observations are given higher weights in forecasting than the older observations ( 51 ).

V_{SM}^{p}(t + 1) = 2S'(t) - S''(t) + \left(\frac{α}{1 - α}\right)(S'(t) - S''(t))

(12)

S'(t) = αV(t) + (1 - α)S'(t - 1)

(13)

S''(t) = αS'(t) + (1 - α)S''(t - 1)

(14)

(14)

where

$V_{SM}^{p} (t + 1)$ = predicted volume at the next interval,

$V (t)$ = volume at the current time t,

$S'(t)$ = single exponentially smoothed value at time t,

$S''(t)$ = double exponentially smoothed value at time t, and

$α$ = smoothing coefficient, 0< $α$ <1.

Note that in Equations 12–14, t indicates the index of the current interval, not the calendar time as in the other predictors. For a prediction interval $Δ$ equal to m intervals, the prediction is made recursively m times, from t+1 to t+m.
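The recursive use of Equations 12–14 can be sketched as follows. The sketch assumes that both smoothed series are initialized with the first observation (the text does not state an initialization) and that each one-step forecast is fed back as if it were observed before the next step; the function name is hypothetical.

```python
from typing import Sequence


def des_forecast(volumes: Sequence[float], alpha: float, m: int) -> float:
    """Predict m intervals ahead by applying Equation 12 recursively."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    if m < 1 or not volumes:
        raise ValueError("need at least one observation and m >= 1")
    s1 = s2 = float(volumes[0])  # assumed initialization S'(0) = S''(0) = V(0)
    for v in volumes[1:]:
        s1 = alpha * v + (1.0 - alpha) * s1   # Equation 13
        s2 = alpha * s1 + (1.0 - alpha) * s2  # Equation 14
    forecast = s1
    for _ in range(m):
        # Equation 12: one-step-ahead forecast from the current smoothed values.
        forecast = 2.0 * s1 - s2 + (alpha / (1.0 - alpha)) * (s1 - s2)
        # Treat the forecast as the next observation and update both series.
        s1 = alpha * forecast + (1.0 - alpha) * s1
        s2 = alpha * s1 + (1.0 - alpha) * s2
    return forecast
```

With $α$ = 0.5 and the counts 10, 20, 30, 40, the one-step forecast is 45 and the two-step forecast is 51.875; the initialization bias keeps these below a pure linear extrapolation (50 and 60).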

Data

The data used for prediction are field traffic volumes obtained on a 400-m, four-lane segment (ID.80007766) between Adam and Kheam Hock roads along the Pan Island Expressway (Figure 2). The data are collected by traffic detectors, aggregated for the whole direction every 5 min, and retrieved and stored in an HDB. Each day comprises 288 intervals of 5 min. From the HDB, traffic volume data for the whole month of October 2013 were selected and screened for training and testing.

Figure 2.

The study segment in the Singapore expressway system.

The experiment employs several datasets. Let $S_{1}$ denote the first set of traffic volumes for the whole month, partitioned into the learning (training) data $S_{1}^{L}$ and the test set $S_{1}^{T}$. $S_{1}^{L}$ includes 24-h traffic volumes for 21 days from October 1 to 26 (days with bad data are excluded), and $S_{1}^{T}$ includes five separate test sets, one for each day from October 27 to 31.

S_{1} = S_{1}^{L} \cup S_{1}^{T}

S_{1}^{L} = \{\vec{V}_{l},\ l = 1, \ldots, 26\}

S_{1}^{T} = \{\vec{V}_{t},\ 27 \leq t \leq 31\}

Performance Measures

The Mean Absolute Percentage Error (MAPE) is the primary performance measure. Let $V^{a} (t + Δ)$ denote the actual volume and $V_{(\cdot)}^{p} (t + Δ)$ denote the volume predicted at time $t + Δ$ by any of the predictors, that is, $V_{SVM}^{p} (t + Δ)$, $\bar{V}_{H}^{p} (t + Δ)$, $V_{R}^{p} (t + Δ)$, or $V_{SM}^{p} (t + Δ)$. By definition:

MAPE = \frac{1}{N} \sum_{i = 1}^{N} \frac{|V^{a}(t + Δ) - V_{(\cdot)}^{p}(t + Δ)|}{V^{a}(t + Δ)} \times 100

(15)

where N is the number of observations. MAPE is calculated for each day separately, and the quantity of interest is the mean MAPE ($\overline{MAPE}$) over all days in the test sets. Hereafter, MAPE refers to this mean value $\overline{MAPE}$.
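Equation 15 and the day-averaged MAPE can be computed as sketched below. The sketch assumes that no interval has zero actual volume (such intervals would need to be excluded, since the relative error is undefined there), and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Equation 15 for one test day, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need equally long, non-empty sequences")
    if any(a == 0 for a in actual):
        raise ValueError("zero actual volume makes the relative error undefined")
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def mean_daily_mape(days: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average of the per-day MAPEs over all test days (the MAPE-bar above)."""
    if not days:
        raise ValueError("need at least one test day")
    return sum(mape(a, p) for a, p in days) / len(days)
```

Here `days` would hold one (actual, predicted) pair per test day, so the five test days of this study give five per-day MAPEs that are then averaged.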

Results

Overall Predictive Performance

The prediction with training set $S_{1}^{L}$ uses the rolling horizon $Ω$ = 30 min to predict volumes for prediction intervals $Δ$ = 5, 10, 15, 20, 25, 30, and 60 min. Figure 3 presents the prediction errors of the different methods with $S_{1}^{L}$.

Figure 3.

Errors of the overall prediction by different methods ( $Ω$ = 30 min).

Figure 3 shows that the predictive performance of all predictors deteriorates as $Δ$ increases, except for the HMP, whose prediction does not depend on the current volume. This may be because, the longer the prediction interval, the less the current volume reflects future conditions, so predictability declines as errors accumulate. In general, the SVM performance is significantly better than that of the baseline predictors, except for $Δ$ = 60 min, where its MAPE is greater than that of the HMP. The MAPEs by SVM for $Δ$ up to 15 min are less than 10%. The CTP works reasonably well for small $Δ$ but deteriorates sharply as $Δ$ increases. Surprisingly, the ESP errors are acceptable only for $Δ$ less than 10 min and are worse than those of the HMP for $Δ$ greater than 25 min. The HMP performs worst for short prediction intervals; its advantage only emerges when $Δ$ > 30 min.

The SVM offers excellent performance for small $Δ$ values. In Figure 4, the predicted values by SVM with prediction $30 \to 5$ are plotted against the actual values. The figure shows that the SVM predicted pattern and the actual pattern are almost indistinguishable.

Figure 4.

Support vector machine ( $30 \to 5$ ) predicted profile in comparison with actual data.

Table 1 shows the prediction errors of the SVM method for individual test days. The errors fall within a narrow range, especially for $Δ$ < 25 min. Unlike October 28 to 31, which are working days, October 27 was a Sunday, whose traffic pattern varied more widely; its MAPEs are noticeably higher than those of the other days for $Δ$ ≥ 20 min.

Table 1.

Mean Absolute Percentage Error (%) by Support Vector Machine ($Ω$ = 30 min) for Different Test Days

$Δ$ (min)	27 Oct.	28 Oct.	29 Oct.	30 Oct.	31 Oct.	Average
5	2.45	2.39	2.37	2.49	2.24	2.39
10	5.29	4.75	5.24	4.93	4.79	5.00
15	9.18	9.05	8.86	8.44	9.85	9.08
20	12.24	10.09	11.37	10.88	10.14	10.94
25	13.93	11.11	12.67	11.98	11.43	12.23
30	15.62	12.30	13.87	12.61	12.39	13.36
60	22.95	17.89	20.15	17.95	17.79	19.34

Effect of Rolling Horizon

To explore how SVM prediction accuracy changes with $Ω$, we return to the first dataset ($S_{1}^{L}$ and $S_{1}^{T}$): SVM predictions are made for the same set of $Δ$ = 5, 10, 15, 20, 25, 30, and 60 min, while $Ω$ is increased from 30 to 45 and 60 min.

Figure 5 plots the prediction errors for three values of $Ω$ over different values of $Δ$. The figure shows that MAPEs increase monotonically with $Δ$. The rates of increase are higher for $Δ$ < 15 min, decrease slightly for $15 \leq Δ \leq 30$ min, and then rise rapidly as $Δ$ increases from 30 to 60 min. Importantly, MAPEs decrease as $Ω$ increases. The improvement in accuracy with $Ω$ is marginal for small $Δ$ but becomes considerable for $Δ$ = 30 and 60 min (it is most beneficial when $Δ$ = $Ω$). This finding contrasts with ( 52 ), who studied the effect of rolling horizon on relative travel time prediction error with a non-linear time-series model (for rolling horizons of less than 30 min) and found that MAPEs tended to rise with $Ω$: "The longer horizon may have adverse impact on prediction performance." A closer look at SVM theory may help explain why SVM prediction improves with higher input dimensions. SVM is a learning technique grounded in statistical learning theory and structural risk minimization. The prediction error of the SVM method, known as the generalization error $E_{gen}$, comprises the approximation error $E_{app}$ and the estimation error $E_{est}$ ( 30 ).

E_{gen} = E_{app} + E_{est}

(16)

Figure 5.

Effect of rolling horizon on prediction accuracy.

The magnitudes of the two errors depend on the model complexity. A simple model (small $Ω$) may not have enough representational power; it may therefore be biased and have a high $E_{app}$. However, such a model is more robust and less dependent on the training data, so it has a lower $E_{est}$. By contrast, a complex model (large $Ω$) has more power to fit the data accurately; it therefore has a lower $E_{app}$ but a higher $E_{est}$. Thus, there is a trade-off between the two error components. In this experiment, when $Ω$ increases from 30 to 45 and 60 min, the model dimension (n + 1 = $Ω/δ$ + 1 with $δ$ = 5 min) increases from 7 to 10 and 13 accordingly; the decrease in $E_{app}$ outweighs the increase in $E_{est}$, resulting in higher prediction accuracy. Given this trade-off, there may be an optimal dimension ($Ω^{*}$) for a given training set.

Nearest Neighbor Method for SVM Training (NN-SVM)

SVM provided good performance for various combinations of $Ω$ and $Δ$ and under various traffic conditions. However, the training size is large (> 5,000 instances for $S_{1}^{L}$ and $S_{2}^{L}$), requiring a long training time (up to several minutes). An important question is how to make training efficient enough for real-time, online applications. One possible solution is to train the model offline and then use the trained model for online prediction. This sounds attractive, but the offline-trained model may perform poorly online if the training set contains only a few patterns similar to real-time conditions. An essential requirement for online application is that the prediction be flexible enough to respond promptly and effectively to the current situation. Motivated by this, we propose introducing a Nearest Neighbor method (k-NN algorithm) to improve the training task. The procedure presented below is for offline training; we address some issues concerning online training in the subsequent discussion.

The k-Nearest Neighbor (k-NN) algorithm is a non-parametric technique that searches a database for data similar to the test data. The k-NN typically measures distances between members of the learning set and the test data using Euclidean distance. In this study, we instead consider the relative distance between each instance in the learning data and the corresponding instance in the test data. Given a learning set, the Nearest Neighbor method seeks the days whose patterns are most similar to the test day. Let $d^{L}$ denote a day in the learning set, $d^{T}$ denote a day in the test set, $V(t, d^{L})$ denote the volume at time t in day $d^{L}$, $V(t, d^{T})$ denote the volume at time t in day $d^{T}$, and $D(d^{T}, d^{L})$ denote the average distance between $d^{T}$ and $d^{L}$. We define the distance $D(d^{T}, d^{L})$ as follows:

D(d^{T}, d^{L}) = \frac{1}{N_{d^{T}}} \sum_{t = 1}^{N_{d^{T}}} \frac{|V(t, d^{T}) - V(t, d^{L})|}{V(t, d^{T})}, \quad d^{T} \in S_{1}^{T},\ d^{L} \in S_{1}^{L}

(17)

where $N_{d^{T}}$ denotes the number of time intervals in day $d^{T}$ .

The calculated distances are then sorted in ascending order, and the historical days are ranked accordingly. Finally, the k nearest neighbors are found based on the selection criteria. In this experiment, we aim to obtain the five closest matches and the five farthest matches as follows. We use the aggregated data for the whole month of October, denoted as dataset $S_{3}$ and partitioned into learning and test sets in the same way as $S_{1}$: the learning data $S_{3}^{L}$ include data from October 1 to 26, and the test data $S_{3}^{T}$ include five test sets for the 5 days from October 27 to 31. Let $S_{3 - all}^{L}$ denote the training set that includes all the training data, $S_{3 - 5 NN}^{L}$ denote the five closest matches, and $S_{3 - 5 FN}^{L}$ denote the five farthest matches in $S_{3}^{L}$:

S_{3 - all}^{L} \equiv S_{3}^{L}; \quad S_{3 - 5 NN}^{L} \subset S_{3}^{L}; \quad S_{3 - 5 FN}^{L} \subset S_{3}^{L}.
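The distance of Equation 17 and the selection of the five closest and five farthest learning days can be sketched as follows. The sketch assumes that each day is stored as an equally long list of 5-min volumes with no zero counts on the test day and that the ranking is computed for one test day at a time; the dictionary layout and function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def day_distance(test_day: Sequence[float], learn_day: Sequence[float]) -> float:
    """Equation 17: mean relative absolute difference over the test day's intervals."""
    if len(test_day) != len(learn_day):
        raise ValueError("days must have the same number of intervals")
    return sum(abs(vt - vl) / vt for vt, vl in zip(test_day, learn_day)) / len(test_day)


def nearest_and_farthest(test_day: Sequence[float],
                         learning_set: Dict[str, Sequence[float]],
                         k: int = 5) -> Tuple[List[str], List[str]]:
    """Rank learning days by distance; return the k closest and the k farthest."""
    ranked = sorted(learning_set, key=lambda d: day_distance(test_day, learning_set[d]))
    return ranked[:k], ranked[-k:]
```

With k = 5, the two returned lists play the roles of $S_{3 - 5 NN}^{L}$ and $S_{3 - 5 FN}^{L}$ for the given test day.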

SVM models were trained on each of the three learning sets and tested on the same test sets in $S_{3}^{T}$. Results are presented in Figure 6.

Figure 6.

Mean absolute percentage error (MAPE) of the nearest neighbor-support vector machine (NN-SVM) for $60 \to 15$ prediction.

As can be seen from Figure 6, the prediction with $S_{3 - 5 NN}^{L}$ performs very close to the standard SVM using $S_{3 - all}^{L}$ while allowing a dramatic reduction of the training size (5 days compared with 21 days). The average MAPEs over the five test days are 11.78% and 12.12% for $S_{3 - all}^{L}$ and $S_{3 - 5 NN}^{L}$, respectively. By contrast, the prediction made with $S_{3 - 5 FN}^{L}$ is considerably worse than its counterparts. This indicates that, when the training data are selected in this "supervised" way, a substantial decrease in training size leads only to a marginal decline in prediction accuracy. It follows that, given a reasonable training size, the similarity of patterns between the training and test data is the critical factor governing prediction accuracy. This is consistent with the finding in ( 29 ) that if the training data contain sufficient good data (support vectors) to construct hyperplanes and represent the input properly, the other data can be discarded.
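For reference, the MAPE behind Figure 6 can be written as below, with the daily values averaged over the five test days as the text describes. This is a sketch of the standard definition rather than the authors' code, and the function names are hypothetical. Comparing it with Equation 17 shows that $D (d^{T}, d^{L})$ equals the MAPE, expressed as a fraction, obtained by using day $d^{L}$ as a forecast of day $d^{T}$.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def average_mape(daily_mapes):
    """Mean of the per-day MAPE values, e.g. over the five test days."""
    return sum(daily_mapes) / len(daily_mapes)


# Using a learning day as a naive forecast of a test day gives 100 x Equation 17.
print(mape([100, 200, 400], [110, 180, 400]))
```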

We further explored the NN method by asking whether the NN-SVM remains workable under more dynamic and unexpected conditions such as incidents, and whether the similarity effect helps to improve prediction during incidents. Ideally, these questions should be explored and verified with actual data. However, since the availability and quality of the incident data are insufficient for a prediction that requires good data resolution and incident attributes, we investigated these issues under simulated incident conditions. The simulated segment is geometrically similar to the actual site (segment 80007766), with four lanes in each direction, except that its length is extended from 400 m to 1,000 m. Three levels of traffic demand, representing the volumes during the night time, the day time, and the a.m. peak and derived from a typical day in the HDB, are defined in $S_{traffic}$:

S_{traffic} = \{\text{low\_volume}, \text{medium\_volume}, \text{high\_volume}\}

(18)

where the low, medium, and high volumes are specified at 1,000, 4,000, and 7,000 vph, respectively.

Three incident scenarios are defined in $S_{incident}$:

S_{incident} = \{\text{no\_incident}, \text{one\_lane\_closure}, \text{two\_lane\_closure}\}

(19)

Combining the sets $S_{traffic}$ and $S_{incident}$ gives nine scenarios:

S_{1} = \{\text{low\_volume}, \text{no\_incident}\}

S_{2} = \{\text{low\_volume}, \text{one\_lane\_closure}\}

S_{3} = \{\text{low\_volume}, \text{two\_lane\_closure}\}

S_{4} = \{\text{medium\_volume}, \text{no\_incident}\}

S_{5} = \{\text{medium\_volume}, \text{one\_lane\_closure}\}

S_{6} = \{\text{medium\_volume}, \text{two\_lane\_closure}\}

S_{7} = \{\text{high\_volume}, \text{no\_incident}\}

S_{8} = \{\text{high\_volume}, \text{one\_lane\_closure}\}

S_{9} = \{\text{high\_volume}, \text{two\_lane\_closure}\}

Each scenario is simulated for 180 min, divided into the following periods:

from the start to the 60th min: warm-up period

from the 61st to the 120th min: no incident

from the 121st to the 150th min: incident

from the 151st to the 180th min: no incident.
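The nine scenarios are the Cartesian product of $S_{traffic}$ and $S_{incident}$, and each simulation minute falls into one of the four periods listed above. The sketch below encodes both as plain Python data; the names `TRAFFIC_VPH`, `scenarios`, and `period` are hypothetical and do not correspond to any simulator API.

```python
from itertools import product

# Demand levels (vph) and incident types as defined in Equations 18 and 19.
TRAFFIC_VPH = {"low_volume": 1000, "medium_volume": 4000, "high_volume": 7000}
INCIDENTS = ["no_incident", "one_lane_closure", "two_lane_closure"]


def scenarios():
    """Map S1..S9 to (traffic level, incident type), in the order listed above."""
    combos = product(TRAFFIC_VPH, INCIDENTS)  # traffic level varies slowest
    return {f"S{i}": combo for i, combo in enumerate(combos, start=1)}


def period(minute):
    """Label a simulation minute (1..180) with its analysis period."""
    if not 1 <= minute <= 180:
        raise ValueError("simulation runs from minute 1 to minute 180")
    if minute <= 60:
        return "warm-up"
    if minute <= 120:
        return "no incident"
    if minute <= 150:
        return "incident"
    return "no incident"


print(scenarios()["S6"], period(135))  # ('medium_volume', 'two_lane_closure') incident
```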

Strictly, the necessary number of repetitions should be estimated through an iterative process. However, because of the limited time available for the large number of scenarios, we empirically set the number of repetitions to five runs per scenario, with random seeds of 20, 30, 40, 50, and 60. The five runs share the same traffic demand and incident scenario, but vehicles are released randomly by the simulation generator, so the traffic patterns are similar but not identical.

To examine the similarity effect of the NN-SVM method, we used the data from one random seed as the test set, and the data from the other four random seeds, together with the remaining eight scenarios, for training. Traffic volume data are collected every 2 min. The parameters of the prediction model are set as follows: a rolling horizon of $Ω$ = 10 min is used to predict volumes for prediction intervals of $Δ$ = 2, 6, and 10 min.
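With 2-min data, a 10-min rolling horizon gives $n = 5$ lagged volumes per input vector, and $Δ$ = 2, 6, and 10 min correspond to 1, 3, and 5 steps ahead. The sketch below builds such (input, target) pairs from one volume series. It assumes, as an interpretation rather than a statement of the authors' exact procedure, that the target is the single volume observed $Δ$ minutes after the end of the window; `make_samples` is a hypothetical name.

```python
def make_samples(volumes, interval=2, horizon=10, lead=2):
    """Build (input window, target) pairs from an evenly spaced volume series.

    interval: data interval δ in minutes; horizon: rolling horizon Ω;
    lead: prediction interval Δ (assumed to be the lead time of the target).
    """
    if horizon % interval or lead % interval:
        raise ValueError("horizon and lead must be multiples of the data interval")
    n = horizon // interval        # number of lagged inputs in Ω
    steps_ahead = lead // interval
    samples = []
    for end in range(n - 1, len(volumes) - steps_ahead):
        window = volumes[end - n + 1 : end + 1]   # last n volumes up to time t
        target = volumes[end + steps_ahead]       # volume Δ minutes after t
        samples.append((window, target))
    return samples


series = list(range(10))  # ten hypothetical 2-min volumes
for lead in (2, 6, 10):
    print(lead, len(make_samples(series, lead=lead)))  # 2 -> 5, 6 -> 3, 10 -> 1
```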

We applied the distance metric defined above to measure the closeness between the test data and the training data, sorted the training data by distance, and then divided them into three training sets: Train-All for the whole training set, Train-NN for the smallest distances (six datasets), and Train-farNN for the largest distances (six datasets). As expected, the runs with the other four random seeds of the test scenario are included in Train-NN, with relative distances from 0.04 to 0.06, while the other scenarios have higher distance values, ranging from 0.27 to 0.52. Figure 7 shows the results of the NN-SVM prediction. Again, the predictions with Train-NN are very close to those with Train-All for all prediction intervals. By contrast, the prediction with Train-farNN incurs high errors, even with $Δ$ = 2 min. These results clearly demonstrate the effect of the k-NN method on SVM training in the NN-SVM model.
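The leave-one-seed-out design and the three training sets can be sketched as follows. Two points are assumptions rather than details given in the text: that every run of the other eight scenarios (all five seeds) enters the candidate pool, and that the distances between the test run and each candidate run are already available, for example from a relative-distance measure in the style of Equation 17. The function names are hypothetical.

```python
from itertools import product

SCENARIOS = [f"S{i}" for i in range(1, 10)]
SEEDS = [20, 30, 40, 50, 60]


def candidate_pool(test_scenario, test_seed):
    """All (scenario, seed) runs except the one held out for testing."""
    return [run for run in product(SCENARIOS, SEEDS) if run != (test_scenario, test_seed)]


def three_training_sets(distances, size=6):
    """Train-All, Train-NN (size nearest) and Train-farNN (size farthest)."""
    ranked = sorted(distances, key=distances.get)  # smallest distance first
    return {"Train-All": ranked, "Train-NN": ranked[:size], "Train-farNN": ranked[-size:]}


pool = candidate_pool("S5", 20)
print(len(pool))  # 44 candidate runs under the stated assumption
```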

Figure 7.

Nearest neighbor-support vector machine (NN-SVM) prediction with simulated incident data.

Discussion

By using actual and simulated data, we have explored the merits of applying the k-NN method to improve SVM prediction. An appealing property of the method is that it allows very fast training. For example, for the $60 \to 15$ prediction, training with $S_{3 - 5 NN}^{L}$ (1,125 instances) is 60 times faster (3 s versus 3 min) than training with $S_{3 - all}^{L}$ (4,725 instances). For larger training sets, this factor can be even greater, since the training time tends to increase substantially with the training size. The pre-processing time is short and, even when included, the total training time is still much less than that for the complete training set. This is an attractive feature, especially for online implementation, since it saves time without compromising prediction quality.
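The reported figures also allow a rough estimate of how training time grows with training size. Assuming a power law $t \propto n^{p}$ and fitting it through the two reported points (an assumption; the timings are rounded and come from a single experiment), the exponent follows from the ratio of times and the ratio of sizes:

```python
import math

n_all, n_nn = 4725, 1125     # training instances: S_3-all vs S_3-5NN
t_all, t_nn = 3 * 60.0, 3.0  # training times in seconds: 3 min vs 3 s

speedup = t_all / t_nn       # 60.0
size_ratio = n_all / n_nn    # 4.2
# Exponent p of an assumed power law t = c * n**p through the two points.
p = math.log(speedup) / math.log(size_ratio)
print(f"speed-up {speedup:.0f}x for a {size_ratio:.1f}x smaller set, p = {p:.2f}")
```

An exponent near 3 is consistent with the strongly superlinear growth described above and lies between the quadratic and cubic scaling often quoted for kernel SVM solvers, which is why the speed-up grows quickly as the full training set becomes larger.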

It should be noted that there is a tradeoff between the distance threshold, training time, and accuracy: a high distance threshold cuts off less data, which tends to improve prediction accuracy at the cost of longer training. Conversely, a low distance threshold enables fast training, but the data may become too sparse to provide enough representation power, leading to poorer performance. No universal rule exists for selecting the distance threshold, since it depends on the data and the application. However, a suitable threshold can be determined empirically to achieve a desired balance between accuracy and training speed.
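The tradeoff can be explored empirically by selecting training data with a distance threshold instead of a fixed k. In the sketch below, `select_by_threshold` and its `min_size` fallback are hypothetical additions, not part of the method described here: the fallback simply guards against the overly sparse training sets that a very low threshold can produce.

```python
def select_by_threshold(distances, threshold, min_size=1):
    """Keep datasets whose distance is at most `threshold`, nearest first.

    If fewer than `min_size` datasets qualify, fall back to the `min_size`
    nearest ones so that the SVM still receives some training data.
    """
    ranked = sorted(distances, key=distances.get)
    kept = [name for name in ranked if distances[name] <= threshold]
    return kept if len(kept) >= min_size else ranked[:min_size]


dists = {"a": 0.05, "b": 0.30, "c": 0.04, "d": 0.52}
for thr in (0.01, 0.1, 0.6):
    print(thr, select_by_threshold(dists, thr, min_size=2))
```

In practice, one would sweep candidate thresholds, train the SVM on each selection, and record MAPE and training time to pick a threshold that meets the desired accuracy and speed.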

The k-NN method presented above is intended for offline training and is applied over whole 24-h days. Offline training uses a simple approach that directly computes the average of the k-NNs, whereas more sophisticated approaches in the literature generate forecasts by weighting the k-NNs according to their distances to the current state vector ( 14 ). Weighting is particularly relevant to online forecasting, where different instances in the rolling horizon have different relevance to the current time, represented by relative weights. Therefore, for online application the method should be modified to meet real-time requirements.
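To illustrate the difference between the two forecasting styles, the sketch below computes a k-NN forecast either as a plain average of the neighbors' values or with inverse-distance weights. Inverse-distance weighting is one common choice and is used here as an assumption; it is not necessarily the scheme of ( 14 ). The function name and the small `eps` guard against zero distances are hypothetical.

```python
def knn_forecast(neighbors, weighted=True, eps=1e-9):
    """Forecast from the k nearest neighbors.

    neighbors: list of (distance, value) pairs, where value is the quantity
    each neighbor offers as a forecast (e.g. its next-interval volume).
    weighted=False gives the plain average; weighted=True uses weights 1/(d + eps).
    """
    if not neighbors:
        raise ValueError("at least one neighbor is required")
    if not weighted:
        return sum(value for _, value in neighbors) / len(neighbors)
    weights = [1.0 / (dist + eps) for dist, _ in neighbors]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, neighbors))
    return weighted_sum / sum(weights)


nbrs = [(0.1, 100.0), (0.3, 200.0)]
print(knn_forecast(nbrs, weighted=False), knn_forecast(nbrs))  # 150.0 and about 125
```

The closer neighbor pulls the weighted forecast toward its own value, which is the behavior one would want when recent, similar patterns are more informative.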

Recall that $Ω$ denotes the rolling horizon; the NN method should therefore aim at retrieving the patterns that are most similar to the data in $Ω$. Let ${\vec{Z}}^{T} (t, Ω)$ and ${\vec{Z}}^{L} (t, Ω)$ respectively denote the vectors of volumes in the test data and the learning data within the time window $(t - Ω, t]$, $δ$ the length of a time interval, n the number of intervals in $Ω$ (i.e., $n = Ω / δ$), $V_{Z^{T}} (t)$ the traffic volume at time t in the test set, $V_{Z^{L}} (t)$ the traffic volume at time t in the learning set, and $D ({\vec{Z}}^{T} (t, Ω), {\vec{Z}}^{L} (t, Ω))$ the weighted sum of the relative volume differences between the test set and the learning set, averaged over the n intervals. Their relationship is expressed in Equation 20:

D ({\vec{Z}}^{T} (t, Ω), {\vec{Z}}^{L} (t, Ω)) = \frac{1}{n} \sum_{i = 0}^{n - 1} w_{i} \frac{| V_{Z^{T}} (t - δ \times i) - V_{Z^{L}} (t - δ \times i) |}{V_{Z^{T}} (t - δ \times i)} .

(20)

where $w_{i}$ is the relative "weight" with which interval i of the rolling horizon contributes to the total distance. Since different instances in $Ω$ have different relevance to the current time, the weights should strictly decrease with i, that is, $w_{0} > w_{1} > \dots > w_{n - 1}$, so that older intervals count less. Once the actual data for the predicted interval are collected, they are added to the test set, and the process is "rolled" forward by one rolling step, following the rolling horizon procedure ( 47 ). Note that the learning set in the HDB can now be extended to a reasonable extent.
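Equation 20 and the rolling update can be sketched as follows. Windows are stored newest first, so index i holds the volume at $t - δ \times i$. The default weights $w_{i} = (n - i) / n$ are a hypothetical choice that satisfies the strictly decreasing requirement; the text does not prescribe a particular weighting. The names `window_distance` and `roll` are likewise illustrative.

```python
def window_distance(test_window, learn_window, weights=None):
    """Weighted relative distance of Equation 20 (illustrative sketch).

    Windows are newest first: element i is the volume at t - δ*i, i = 0..n-1.
    Test volumes must be non-zero. Default weights w_i = (n - i) / n decrease linearly.
    """
    n = len(test_window)
    if n == 0 or len(learn_window) != n:
        raise ValueError("windows must be non-empty and of equal length")
    if weights is None:
        weights = [(n - i) / n for i in range(n)]
    if len(weights) != n or any(a <= b for a, b in zip(weights, weights[1:])):
        raise ValueError("need n weights, strictly decreasing in i")
    total = sum(w * abs(v_t - v_l) / v_t
                for w, v_t, v_l in zip(weights, test_window, learn_window))
    return total / n


def roll(window, new_volume):
    """Roll the horizon forward one step: add the newest volume, drop the oldest."""
    return [new_volume] + window[:-1]


# The same 10% deviation counts twice as much at the current interval (i = 0).
print(window_distance([100, 100], [110, 100]), window_distance([100, 100], [100, 110]))
```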

The k-NN distance estimation presented in Equations 17 and 20 is empirical and heuristic in nature. It follows the form of the MAPE, which targets the lowest prediction error. Alternative dissimilarity metrics commonly used in the literature include the Euclidean distance, the square distance ( 44 ), the absolute distance, and the windowed Nearest Neighbor ( 53 ). The term "similarity," the inverse of "distance," can also be used. In essence, all of these techniques attempt to extract relevant information from the data preceding the current time.
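The sketch below places the relative (MAPE-style) distance next to the Euclidean, squared, and absolute alternatives for two short volume vectors, and adds one possible similarity mapping. The mapping $1 / (1 + D)$ is an assumption chosen to stay finite at zero distance, and the windowed Nearest Neighbor of ( 53 ) is not reproduced.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def squared(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def absolute(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def relative(x, y):
    """Equation 17 form: mean relative deviation, with x the (non-zero) test vector."""
    return sum(abs(a - b) / a for a, b in zip(x, y)) / len(x)


def similarity(distance):
    """One possible inverse mapping from distance to similarity (an assumption)."""
    return 1.0 / (1.0 + distance)


x, y = [100, 200], [110, 180]
x10, y10 = [10 * v for v in x], [10 * v for v in y]
for f in (euclidean, squared, absolute, relative):
    print(f.__name__, f(x, y), f(x10, y10))
```

Scaling both vectors by 10 multiplies the absolute and Euclidean distances by 10 and the squared distance by 100, but leaves the relative distance unchanged, so days with very different overall volumes remain comparable under the relative form.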

Conclusion

This paper presents an investigation into the performance of the SVM method for short-term traffic flow prediction in comparison with traditional time-series traffic flow prediction methods, including the Historical Mean predictor, the Current Time Based predictor, and the Double Exponential Smoothing predictor. To improve SVM training, we investigated the application of the k-NN method for training using actual data, and further reinforced the merit of the k-NN approach with extensive simulated data. The main findings are summarized as follows:

The performance of the SVM-based predictor is significantly better than that of the baseline predictors for prediction intervals of less than 30 min. Its performance is particularly good for small values of $Δ$.

The rolling horizon has a clear effect on SVM's prediction accuracy: within the tested range of 30–60 min, the longer the rolling horizon, the more accurate the prediction, which we attribute to the capability of SVM to solve complex classification and regression problems in a high-dimensional space.

In SVM training, the similarity of patterns between the training and test sets governs the prediction accuracy. By contrast, the training size is not critical: given a reasonable training size, if the training data contain sufficient support vectors to construct hyperplanes that represent the input properly, the other data can be discarded.

The k-NN method is an attractive tool for SVM training. By retrieving the most similar patterns in the learning data for SVM training, it allows a substantial reduction of the training size to accelerate training without compromising prediction quality.

The attractive features of combining k-NN with SVM should be exploited in online applications: a hybrid kNN-SVM model can be established in which k-NN acts as a pre-processing component that searches for the most similar patterns, with similarity estimated by assigning relative "weights" to the different intervals in the rolling horizon. This conceptual methodology will be elaborated into a system architecture.

Furthermore, we note that the first finding stated above concerns the comparison between SVM and relatively simple prediction tools. The comparative performance of SVM against other advanced machine learning algorithms, such as the ANN method, is worth exploring and is a subject of our future research.


Acknowledgements

The authors would like to gratefully acknowledge the Land Transport Authority of Singapore for its provision of data used in this study.

Author Contributions

The authors confirm contribution to the paper as follows: study conception and design: Toan and Hung; data collection: Toan; analysis and interpretation of results: Toan and Hung; draft manuscript preparation: Toan. All authors reviewed the results and approved the final version of the manuscript.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Data Accessibility Statement

The data used in this research were provided by the Land Transport Authority of Singapore upon request for research purposes.

References

1. Yan; Liu; Wang. A Short-Term Traffic Flow Forecasting Method Based on the Hybrid PSO-SVM. Neural Processing Letters, Vol. 43, No. 1, 2015, pp. 155–172.

2. Smith, B. L.; Demetsky, M. J. Short-Term Traffic Flow Prediction: Neural Network Approach. Transportation Research Record: Journal of the Transportation Research Board, No. 1453, 1994, pp. 98–104.

3. Okutani; Stephanedes, Y. J. Dynamic Prediction of Traffic Volume through Kalman Filtering Theory. Transportation Research Part B, Vol. 18, No. 1, 1984, pp. 1–11.

4. Whittaker; Garside; Lindveld. Tracking and Predicting a Network Traffic Process. International Journal of Forecasting, Vol. 13, 1997, pp. 51–61.

5. Guo; Huang; Williams, B. M. Adaptive Kalman Filter Approach for Stochastic Short-Term Traffic Flow Rate Prediction and Uncertainty Quantification. Transportation Research Part C: Emerging Technologies, Vol. 43, 2014, pp. 50–64.

6. Cai; Zhang; Yang; Zhou; Qin. A Noise-Immune Kalman Filter for Short-Term Traffic Flow Forecasting. Physica A: Statistical Mechanics and its Applications, Vol. 536, 2019, pp. 1–9.

7. Williams, B. M.; Durvasula, P. K.; Brown, D. E. Urban Freeway Traffic Flow Prediction: Application of Seasonal Autoregressive Integrated Moving Average and Exponential Smoothing Models. Transportation Research Record: Journal of the Transportation Research Board, No. 1644, 1998, pp. 132–141.

8. Attanayake, A. M. C. H.; Perera, S. S. N.; Liyanage, U. P. Combining Forecasts of ARIMA and Exponential Smoothing Models. Advances and Applications in Statistics, Vol. 59, No. 2, 2019, pp. 199–208.

9. Ahmed, M. S.; Cook, A. R. Analysis of Freeway Traffic Time-Series Data by Using Box-Jenkins Techniques. Transportation Research Record: Journal of the Transportation Research Board, No. 773, 1979, pp. 1–9.

10. Williams, B. M. Multivariate Vehicular Traffic Flow Prediction: An Evaluation of ARIMAX. Transportation Research Record: Journal of the Transportation Research Board, No. 1776, 2001, pp. 194–200.

11. Williams, B. M.; Hoel, L. A. Modeling and Forecasting Vehicular Traffic Flow as a Seasonal ARIMA Process: Theoretical Basis and Empirical Results. Journal of Transportation Engineering, Vol. 129, No. 6, 2003, pp. 664–672.

12. Lin; Sadek. A K Nearest Neighbor Based Local Linear Wavelet Neural Network Model for On-Line Short-Term Traffic Volume Prediction. Procedia - Social and Behavioral Sciences, Vol. 96, 2013, pp. 2066–2077.

13. Xiaoyu; Yisheng; Siyu. Short-Term Traffic Flow Forecasting Based on Two-Tier K-Nearest Neighbor Algorithm. Procedia - Social and Behavioral Sciences, Vol. 96, 2013, pp. 2529–2536.

14. Zheng. Short-Term Traffic Volume Forecasting: A K-Nearest Neighbor Approach Enhanced by Constrained Linearly Sewing Principle Component Algorithm. Transportation Research Part C: Emerging Technologies, Vol. 43, 2014, pp. 143–157.

15. Song, X.; Guan; Yang; Yao. K-Nearest Neighbor Model for Multiple-Time-Step Prediction of Short-Term Traffic Condition. Journal of Transportation Engineering, Vol. 142, No. 6, 2016.

16. Park; Messer, C. J.; Urbanik, T., II. Short-Term Freeway Traffic Volume Forecasting Using Radial Basis Function Neural Network. Transportation Research Record: Journal of the Transportation Research Board, No. 1651, 1998, pp. 39–47.

17. Dia. An Object-Oriented Neural Network Approach to Short-Term Traffic Forecasting. European Journal of Operational Research, Vol. 131, 2001, pp. 253–261.

18. Abdi; Moshiri; Abdulhai; Sedigh, A. K. Forecasting of Short-Term Traffic Flow Based on Improved Neurofuzzy Models via Emotional Temporal Difference Learning Algorithm. Journal of Engineering Applications of Artificial Intelligence, Vol. 25, No. 5, 2012, pp. 1022–1042.

19. Zhang. Seasonal Autoregressive Integrated Moving Average and Support Vector Machine Models: Prediction of Short-Term Traffic Flow on Freeways. Transportation Research Record: Journal of the Transportation Research Board, No. 2215, 2011, pp. 85–92.

20. Zhang; Xie. Forecasting of Short-Term Freeway Volume with v-Support Vector Machines. Transportation Research Record: Journal of the Transportation Research Board, No. 2024, 2007, pp. 92–99.

21. Zhang; Hou. Short Term Traffic Flow Prediction Based on Improved Support Vector Machines. Journal of Applied Science and Engineering, Vol. 21, No. 1, 2018, 2532.

22. Gebresilassie, M. A. Temporal Traffic Flow Prediction. Master's thesis. KTH Royal Institute of Technology, Stockholm, Sweden, 2017.

23. Cai; Chen; Cai; Zhou; Qin. SVMGSA: Hybrid Learning Based Model for Short-Term Traffic Flow Forecasting. IET Intelligent Transport Systems, Vol. 13, No. 9, 2019, 1348.

24. Yisheng; Duan; Kang; Wang, F. Y. Traffic Flow Prediction with Big Data: A Deep Learning Approach. IEEE Transactions on Intelligent Transportation Systems, Vol. 16, No. 2, 2015, pp. 865–873.

25. Smith, B. L.; Williams, B. M.; Oswald. Comparison of Parametric and Nonparametric Models for Traffic Flow Forecasting. Transportation Research Part C: Emerging Technologies, Vol. 10, No. 4, 2002, pp. 303–321.

26. Zheng; Lee, D. H. Short-Term Freeway Traffic Flow Prediction: Bayesian Combined Neural Network Approach. Journal of Transportation Engineering, Vol. 132, No. 2, 2006.

27. Zhu; Cao; Fan. Short-Term Traffic Flow Prediction Based on Flocking Theory and RBF Neural Network. Proc., 2nd International Conference on Transportation Information and Safety (ICTIS), 2013.

28. Cicek, Z. I. E.; Ozturk, Z. K. Combination Methods for Traffic Flow Forecasting. Proc., 10th International Symposium on Intelligent Manufacturing and Service Systems, Sakarya, Turkey, 2019, pp. 992–1000.

29. Vanajakshi; Rilett, L. R. A Comparison of the Performance of Artificial Neural Network and Support Vector Machines for the Prediction of Traffic Speed. Proc., IEEE Intelligent Vehicles Symposium, University of Parma, Italy, 2004.

30. Kecman. Learning and Soft Computing: Support Vector Machines, Neural Networks, and Fuzzy Logic Models. The MIT Press, Cambridge, MA, 2001.

31. Vapnik. The Nature of Statistical Learning Theory. Springer Verlag, New York, 1995.

32. Furey, T. S.; Cristianini, N. D.; Bednarski, D. W.; Schummer; Haussler. Support Vector Machine Classification and Validation of Cancer Tissue Samples Using Microarray Expression Data. Bioinformatics, Vol. 16, No. 10, 2000, pp. 906–914.

33. Kim, K. J. Financial Time Series Forecasting Using Support Vector Machines. Neurocomputing, Vol. 55, 2003, pp. 307–319.

34. Xiao; Liu. Traffic Incident Detection Using Multiple-Kernel Support Vector Machine. Transportation Research Record: Journal of the Transportation Research Board, No. 2324, 2012, pp. 44–52.

35. Sun; Chen. Use of Support Vector Machine Models for Real-Time Prediction of Crash Risk on Urban Expressways. Transportation Research Record: Journal of the Transportation Research Board, No. 2432, 2014, pp. 91–98.

36. Mokhtarimousavi; Anderson, J. C.; Azizinamini. Improved Support Vector Machine Models for Work Zone Crash Injury Severity Prediction and Analysis. Transportation Research Record: Journal of the Transportation Research Board, Vol. 2673, 2019, pp. 680–692.

37. Lam, S. H.; Toan, T. D. Short-Term Travel Time Prediction Using Support Vector Regression. Presented at 87th Annual Meeting of the Transportation Research Board, Washington, D.C., 2008.

38. Akter; Huda; Nahar; Akter. Travel Time Prediction Using Support Vector Machine (SVM) and Weighted Moving Average (WMA). International Journal of Engineering Research & Technology (IJERT), Vol. 4, No. 12, 2015, pp. 496–502.

39. Mingheng; Yaobao; Ganglong; Gang. Accurate Multi-steps Traffic Flow Prediction Based on SVM. Mathematical Problems in Engineering, Vol. 2013, pp. 1–8.

40. W., H.; Yan; Liu; Wang. A Short-Term Traffic Flow Forecasting Method Based on the Hybrid PSO-SVR. Neural Processing Letters, Vol. 43, 2016, pp. 155–172.

41. Luo; Huang; Cao; Huang; Guo; Wei. Short-Term Traffic Flow Prediction Based on Least Square Support Vector Machine with Hybrid Optimization Algorithm. Neural Processing Letters, Vol. 50, 2019, pp. 2305–2322.

42. Cong; Wang; Lia. Traffic Flow Forecasting by a Least Squares Support Vector Machine with a Fruit Fly Optimization Algorithm. Procedia Engineering, Vol. 137, 2016, pp. 59–68.

43. Yuanyuan; Weixiang. Short-Term Traffic Flow Forecasting Based on SVR. Advances in Engineering Research, Vol. 166, 2018, pp. 57–61.

44. Oswald, R. K.; Scherer, W. T.; Smith, B. L. Traffic Flow Forecasting Using Approximate Nearest Neighbour Nonparametric Regression. Research Project Report No. UVACTS-15-13-7, University of Virginia, 2001.

45. Zhao; Wei; Dong-mei; Gan; Jian-hua. Short-Term Traffic Flow Forecast Based on Combination of K-Nearest Neighbor Algorithm and Support Vector Regression. Journal of Highway and Transportation Research and Development, Vol. 12, No. 1, 2018, pp. 122–129.

46. Gunn, S. R. Support Vector Machine for Classification and Regression. Technical Report, University of Southampton, UK, 1998.

47. Peeta; Mahmassani, H. S. Multiple User Classes Real-Time Traffic Assignment for Online Operations: A Rolling Horizon Solution Framework. Transportation Research Part C: Emerging Technologies, Vol. 3, No. 2, 1995, pp. 83–98.

48. Chang, C. C.; Lin, C. J. LIBSVM: A Library for Support Vector Machines, 2001. Accessed March 22, 2020. http://www.csie.ntu.edu.tw/cjlin/libsvm

49. Tyagi; Kalyanaraman; Krishnapuram. Vehicular Traffic Density State Estimation Based on Cumulative Road Acoustics. IEEE Transactions on Intelligent Transportation Systems, Vol. 13, No. 3, 2012, pp. 1156–1166.

50. Toan, T. D.; Lam, S. H. Development of a Rule-Based System for Congestion Management. Presented at 84th Annual Meeting of the Transportation Research Board, Washington, D.C., 2005.

51. Makridakis; Wheelwright, S. C.; McGee, V. E. Forecasting: Methods and Applications. Wiley, New York, 1983.

52. Ishak; Al-Deek. Performance Evaluation of Short-Term Time-Series Traffic Prediction Model. Journal of Transportation Engineering, Vol. 128, No. 6, 2002.

53. Rice; Zwet, E. V. A Simple and Effective Method for Predicting Travel Times on Freeways. IEEE Transactions on Intelligent Transportation Systems, Vol. 5, No. 3, 2001, pp. 200–207.