Abstract
To overcome the low accuracy and long computation time of traditional short-term forecasting methods for dynamic traffic flow, this paper proposes a short-term forecasting method based on the stochastic forest algorithm. The method first selects the acquisition equipment for dynamic traffic flow, removes invalid data from the collected data, and normalizes the remaining data to complete preprocessing. A combined forecasting model is then established to optimize the output of the preprocessing results and forecast the dynamic traffic flow rate. On this basis, the stochastic forest algorithm is introduced to train the sampling set of the flow-rate decision trees and generate short-term flow decision trees, realizing short-term forecasting of dynamic traffic flow. Experimental results show that the forecasting time of the proposed method is always less than 0.5 s and its forecasting accuracy is above 97%, which demonstrates its feasibility.
Keywords
Introduction
With the growth of the urban population, urban traffic problems have become increasingly serious and have attracted widespread attention, while people's daily travel options continue to multiply. Traffic has thus become a key problem that urgently needs to be solved. To maintain normal traffic order and ensure safe road operation, short-term forecasting of traffic flow is necessary; based on the forecasting results, traffic rules and road construction and maintenance plans can be adjusted appropriately. Traffic flow is highly non-linear and uncertain, and forecasting results must be timely and time-dependent, which poses great challenges to traffic flow forecasting [1]. With the rapid development of the national economy and of China's basic transportation infrastructure, traffic flow forecasting is no longer limited to classical, purely mathematical methods: intelligent forecasting algorithms can produce forecasting results quickly.
In recent years, scholars at home and abroad have reported research results on intelligent traffic flow forecasting. Reference [2] proposes a short-term traffic flow forecasting method based on exponential smoothing and Markov chain. To address the low accuracy of exponential smoothing when fitting measured data, the method combines exponential smoothing theory with the Markov chain. With the help of the Markov chain, it narrows the forecasting interval and improves the calculation of the weighted state centers and state transition probability matrices, so as to improve the accuracy of future state forecasting. However, this method still suffers from low forecasting accuracy in practical applications. Reference [3] proposes a short-term traffic flow forecasting method based on the GA-Elman neural network. Taking single-section traffic flow as the research object, it optimizes the weights and thresholds of the Elman neural network with a genetic algorithm, overcomes the tendency of the Elman network to fall into local minima, and improves its generalization ability and forecasting accuracy. However, this method is time-consuming, which makes short-term traffic flow forecasting difficult to achieve. Reference [4] proposes a short-term traffic flow forecasting method based on the support vector regression machine.
In this method, traffic parameters such as traffic flow, occupancy and average speed, collected by the traffic detector in the preceding several periods at the target location and at the upstream and downstream locations, are taken as input, and the traffic flow in the corresponding period is taken as output; a kernel function is selected to train the support vector regression machine. The trained machine then takes traffic flow, occupancy and average speed as input to predict the traffic flow in the next period. Combined with the upstream and downstream traffic flow, short-term traffic flow forecasting is realized. However, the accuracy of this method is low.
Because these methods yield only static forecasting results, they cannot meet the demands of dynamic traffic forecasting. To solve the problems of the traditional methods, this paper therefore builds on them and uses the stochastic forest algorithm to optimize short-term forecasting of dynamic traffic flow. The stochastic forest algorithm treats dynamic traffic data as decision tree samples and uses multiple decision trees to train on and forecast the traffic flow samples. Applying this algorithm reduces the time consumed by short-term forecasting of dynamic traffic flow while preserving the quality of the results. The overall scheme of the method is as follows:
(1) Short-term forecasting equipment for dynamic traffic flow is chosen, including a dynamic traffic flow monitor, a flowmeter, a dynamic table, a non-leaf node controller and so on. Invalid data are removed from the collected data, and the available data are normalized to complete the data pre-processing before traffic flow forecasting.
(2) Dynamic traffic is divided into four states: low-energy, high-energy, state failure and interruption. Based on this, a combined forecasting model is established to optimize the output of the pre-processing results and complete the forecasting of the dynamic traffic flow rate.
(3) The meaning and characteristics of the stochastic forest algorithm and its advantages in short-term forecasting of dynamic traffic flow are expounded.
(4) The stochastic forest algorithm is introduced to train the sample set of the flow rate decision tree, generate the short-term flow decision tree, and combine the training results of the decision tree table and the decision tree sample set to obtain the short-term forecasting results of dynamic traffic flow.
(5) Experiments are carried out to compare the accuracy and time consumption of different research methods.
(6) Conclusions are drawn and future work is outlined.
Analysis of short-term forecasting method for dynamic traffic flow
The design and implementation of the short-term forecasting method for dynamic traffic flow based on the Stochastic Forest algorithm is divided into two main steps. The first step uses time series and wavelet transform technology to predict the dynamic traffic flow rate. The second step uses the stochastic forest algorithm to obtain the short-term forecasting results of dynamic traffic flow. The process of short-term dynamic traffic flow forecasting is shown in Fig. 1.

Fig. 1. Flow chart of short-term forecasting for dynamic traffic flow.
Selection of short-term forecasting equipment for dynamic traffic flow
From the point of view of stochastic probability, the forecasting results for dynamic traffic flow depend on the mode, the degree of machine learning and the controlled area of the dynamic traffic flow acquisition equipment; from the point of view of the traffic flow itself, they depend on time, key points, the decision trees and the stochastic forest machine learning algorithm. The relevant parameters must therefore be established from both perspectives to ensure the accuracy of dynamic traffic flow forecasting. The general structure of the dynamic traffic flow acquisition equipment is shown in Fig. 2.

Fig. 2. Structure of dynamic traffic flow acquisition equipment.
Dynamic traffic flow acquisition equipment includes a dynamic traffic flow monitor, a flow meter, a dynamic table, a non-leaf node controller, and so on. Flow collection devices such as the electric dynamic meter and the non-leaf node controller are installed at the dynamic traffic intersection, while the dynamic traffic flow monitor is installed at a higher position above the intersection, which maximizes the accuracy of data collection without affecting traffic operation.
During acquisition, environmental influences or equipment settings can cause collected data to be lost or invalid, so the original data must be pre-processed. The data to be processed include the sample set data, basic traffic data, the average traffic flow change and time of dynamic traffic, and the traffic change [5]. Generally, the data collected from the sample set are integers and uniformly distributed. The reasonable range of variation of each measurement and the conditions for judging data invalid are shown in Table 1.
Table 1. Elimination condition of data acquisition
Acquired data that meet the rejection conditions are judged invalid and deleted. In addition, during actual traffic operation, road maintenance and other emergencies can cause data loss, so abnormal and missing values in the collected data must be repaired [6]. The collected data are grouped into 2-minute intervals by collection time, and anomaly recognition is applied to each group to detect abnormal and missing values. Following the principle of continuity and integrity of data acquisition, the data are completed and corrected by reference to neighbouring data.
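The grouping and repair step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes readings arrive as (timestamp in seconds, value) pairs and that a missing value is repaired from the nearest valid neighbours on either side. The function names and the averaging rule are hypothetical.

```python
from statistics import mean


def group_by_window(readings, window_s=120):
    """Group (timestamp_seconds, value) readings into consecutive 2-minute windows."""
    groups = {}
    for ts, value in readings:
        # Integer window index: 0 for [0, 120), 1 for [120, 240), ...
        groups.setdefault(int(ts // window_s), []).append(value)
    return groups


def fill_missing(series):
    """Replace None entries with the mean of the nearest valid neighbours.

    Assumption: averaging the closest valid values on each side is one
    plausible repair rule; the source does not spell out the exact rule.
    """
    filled = list(series)
    for i, value in enumerate(series):
        if value is None:
            left = next((series[j] for j in range(i - 1, -1, -1)
                         if series[j] is not None), None)
            right = next((series[j] for j in range(i + 1, len(series))
                          if series[j] is not None), None)
            neighbours = [v for v in (left, right) if v is not None]
            filled[i] = mean(neighbours) if neighbours else None
    return filled
```

For example, `fill_missing([10, None, 20])` fills the gap with the average of its two neighbours, while a gap at the start of a series simply copies the first valid value that follows it.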
By adjusting the weights of the filtered data, the difference between two predictive probability distributions can be measured: the smaller the result, the more similar the probability distributions of the two learned decision trees. The original weights of the collected data are defined as:
The parameter x in Formula (1) represents any collected data item, q(x) is the known probability distribution and p(x) is the predicted probability distribution. Formula (2) is used to fine-tune the original weights.
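Formula (1) is not reproduced in this text, so the measure the authors use is not known. One standard measure that matches the description (a value that shrinks as the predicted distribution p(x) approaches the known distribution q(x)) is the Kullback-Leibler divergence. The sketch below uses it purely as an assumption; the function name and the `eps` guard are illustrative.

```python
import math


def kl_divergence(q, p, eps=1e-12):
    """Kullback-Leibler divergence D(q || p) for two discrete distributions.

    q: known distribution, p: predicted distribution, as equal-length lists
    of probabilities. eps guards against log(0) when p assigns zero mass.
    """
    if len(q) != len(p):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for qi, pi in zip(q, p):
        if qi > 0:  # terms with q(x) = 0 contribute nothing
            total += qi * math.log(qi / max(pi, eps))
    return total
```

Identical distributions give a divergence of zero, and the value grows as the predicted distribution moves away from the known one, which is the behaviour the text describes.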
The weight-adjusted traffic flow acquisition data are then normalized: the maximum and minimum values of the acquired data are determined and used to normalize the parameters.
Here, $v_t$ denotes the traffic flow used to limit the forecasting.
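Normalization by the maximum and minimum of the acquired data is commonly implemented as min-max scaling. The sketch below assumes that form, since Formula (3) itself is not reproduced here; the function name is hypothetical.

```python
def min_max_normalize(values):
    """Scale values linearly to [0, 1] using their minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All readings identical: no spread to scale, map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

With this scaling the smallest reading maps to 0 and the largest to 1, so flows measured at sites with very different volumes become comparable before forecasting.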
To distinguish the dynamic states of traffic, so that traffic under different states can be treated separately when time differences are allocated, and thereby realize short-term forecasting of dynamic traffic flow, dynamic traffic is divided into four states: low-energy, high-energy, state failure and interruption. The classification of dynamic traffic is based on:
Formula (4) gives the category of dynamic traffic. Since dynamic traffic is divided into four categories, the value of S is 0–4. $A_i$, $B_i$, $C_i$ and $D_i$ represent the situation, the current status, the predicted traffic flow parameters and the minimum set traffic flow, respectively [7]. Once the dynamic traffic state has been obtained, a combined forecasting model is built by incorporating traditional forecasting techniques. The resulting combined forecasting model is shown in Fig. 3.

Fig. 3. Forecasting model.
The influence parameters of dynamic traffic flow forecasting are calculated separately, and then the forecasting value of dynamic traffic flow is obtained. The specific forecasting calculation process is shown in Fig. 4.

Fig. 4. Flow chart of dynamic traffic flow forecasting.
The dynamic traffic flow value is calculated from the data provided by the dynamic traffic flow monitor, mainly as the number of vehicles passing per unit time. This quantity depends on the location of the monitor and on the chosen time interval. The calculation formula can be expressed as follows.
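As an illustration of a per-unit-time count (a sketch under stated assumptions, not the paper's Formula (5), which is not reproduced in this text), a vehicle count observed over an interval can be converted to an hourly flow. The function name and the choice of vehicles per hour as the unit are assumptions.

```python
def flow_rate(vehicle_count, interval_s):
    """Convert a vehicle count over interval_s seconds to vehicles per hour."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return vehicle_count * 3600.0 / interval_s
```

Expressing every count on a common hourly basis makes readings taken with different monitoring intervals directly comparable.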
The average traffic flow per hour is taken as the reference value for calculation, and the magnitude of the reference value directly affects the forecasting results of traffic flow. Because the dynamic traffic flow scale has the characteristics of fluctuation and discontinuity, it is necessary to calculate the average occurrence probability distribution of vehicle under traffic flow. The dynamic traffic flow characteristics are shown in Fig. 5.

Fig. 5. Characteristics of dynamic traffic flow.
In Fig. 5, a is the amplitude of dynamic traffic flow, b is the time consumed by changes in dynamic traffic flow, c is the maximum offset of dynamic traffic flow, and d is the disappearance time of dynamic traffic flow [8]. Through probability analysis of the dynamic traffic flow in Fig. 5, the flow density function under dynamic traffic conditions can be obtained, as shown in Formula (6).
In Formula (6), v is the average traffic flow, c and k represent the two parameters of the forecasting model respectively, and the values are constant.
The change of traffic flow is mainly determined by time and emergency events. Emergency events directly affect the change of traffic flow. Therefore, it is necessary to determine the possibility of emergency events when calculating dynamic traffic flow [9]. The calculation method is shown in Formula (7).
In Formula (7), ρ is the density of dynamic traffic flow, V is the width of dynamic traffic, t is the monitoring time interval of dynamic traffic, and S is the effective monitoring flow of dynamic traffic [10]. The effective flow is calculated from the vehicles ahead of and behind the selected section; vehicles that have not entered the section, or have only partly entered it, are excluded from the calculation, as shown in Fig. 6.

Fig. 6. Diagram of effective dynamic traffic flow identification.
Assuming that the effective length of dynamic traffic is expressed by R, the area calculation result of effective flow of dynamic traffic can be obtained by Formula (8).
Here, A represents the length of the selected pavement. Integrating the calculation results of all the parameters and substituting them into Formula (9) gives:
From this, we can get the effective flow of dynamic traffic under parameter limitation [11]. If the traffic efficiency of dynamic traffic is χ, the predicted results of dynamic traffic flow can be calculated by Formula (10).
By synthesizing the real-time traffic flow of dynamic traffic, the relationship between sample x and the predicted traffic flow curve can be obtained as shown in Fig. 7.

Fig. 7. Relation between sample and predicted flow curve.
Stochastic Forest (more widely known as random forest) is a machine learning algorithm for classification and regression, and a very flexible and practical method. It uses bootstrap resampling to draw multiple samples from the original sample set, builds a relatively independent decision tree on each sample, and combines the trees into a “forest” whose “vote” determines the final forecasting result.
Because the volume of dynamic traffic flow data is huge, the Stochastic Forest algorithm, which can process large data sets independently, improves the efficiency of data processing. Compared with other current short-term forecasting algorithms for dynamic traffic flow, it offers excellent accuracy. It can also process input samples with high-dimensional features without dimensionality reduction, which reduces data redundancy and shortens data processing. While the Stochastic Forest is being generated, an unbiased estimate of the internally generated error can be obtained, and good results can be achieved on missing-value problems, especially in estimating and inferring mappings. Unlike SVM, it does not require many parameters to be tuned, and it tolerates outliers well. Applied to dynamic traffic flow forecasting, it can therefore improve forecasting performance, and it trains and learns particularly well on high-dimensional traffic volume data. The process of building a Stochastic Forest is shown in Fig. 8.
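The bootstrap-and-combine idea behind the algorithm can be sketched in a few lines. This is a deliberately simplified illustration and not the authors' model: it assumes a single input feature, uses one-split regression trees (stumps) instead of full decision trees, and averages the tree outputs, which is the regression counterpart of voting. All function names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single feature."""
    best = None
    # Candidate thresholds exclude the largest x so both sides are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # every x is identical: predict the overall mean
        overall = mean(ys)
        return (xs[0], overall, overall)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """Combine the trees by averaging their individual predictions."""
    return mean(predict_stump(stump, x) for stump in forest)
```

Because every tree sees a different resample, individual trees disagree slightly, and averaging them smooths out the errors of any single tree.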
Short-term forecasting for dynamic traffic flow based on stochastic forest algorithm
In view of the above advantages, this algorithm is introduced into short-term forecasting of dynamic traffic flow. Based on the results of dynamic traffic flow forecasting, the Stochastic Forest algorithm is used to predict and analyze the short-term dynamic traffic flow, and then the accurate short-term dynamic traffic flow results are obtained [12]. The specific forecasting process is shown in Fig. 9.

Fig. 8. Stochastic Forest construction process.

Fig. 9. Flow chart of short-term forecasting for dynamic traffic flow.
The results of dynamic traffic flow forecasting show that the data contain some abnormal values. Let the population contain N samples, from which n individuals are drawn as samples without replacement. Each individual in each period of dynamic traffic has an equal chance of being drawn [13]. Each dynamic traffic sample is therefore taken as a sample of the initial data set, and the traffic volume of each dynamic traffic training subset is about three-quarters of the original training set. The process of training the decision tree sampling set is shown in Fig. 10.

Fig. 10. Flow chart of training decision tree sampling set.
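The sampling step described above (equal-probability draws without replacement, with each training subset holding about three-quarters of the original set) can be sketched as follows. The function name and the `fraction` and `seed` parameters are hypothetical.

```python
import random


def draw_training_subset(samples, fraction=0.75, seed=None):
    """Draw a training subset without replacement, each sample equally likely."""
    rng = random.Random(seed)
    n = max(1, round(len(samples) * fraction))
    # random.Random.sample never returns the same position twice.
    return rng.sample(samples, n)
```

Calling the function once per tree with different seeds gives each decision tree its own subset, which keeps the trees relatively independent of one another.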
During training of the decision tree sampling set, a complete decision tree can be generated by node splitting. Some of the attributes compared when choosing the split attribute of a node contain empty or non-numeric data. Such abnormal data are mainly caused by failures of the data transmission terminal during dynamic traffic flow acquisition [14]. Deviation data are random feature variables that do not conform to the Stochastic Forest algorithm and that deviate seriously from the median monitored traffic. The random features sampled for the training decision tree are linearly combined, and the result is taken as the input variable with which the Stochastic Forest is constructed, so that the data do not go out of range while each sub-tree is generated. Data exhibiting over-fitting and local convergence are inserted into a single decision tree until the output is 0 [15], and this process is repeated to form a forest of multiple classifiers. With i as the loop variable in the program, all the data in the sample set of forecasting results are checked for over-fitting and local defects, and all abnormal data are classified and output to obtain the final training decision tree sampling set [16].
The decision tree sampling set is used to select node attributes: multiple attributes are randomly selected from the whole sampling set, and the nodes of the original data subsets are determined. The test attributes of the regression tree are collected by the bagging method and serve as the training samples of the meta-decision tree within each decision tree; branches are generated in parallel at different numbers of nodes. Each meta-decision tree is described by the selected feature rules, and each decision tree is constructed as shown in Fig. 11.

Fig. 11. Decision tree generation process.
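Random attribute selection at a node can be sketched as below. The sketch assumes a regression setting in which a split is scored by the weighted variance of the targets on each side; the source does not state its splitting criterion, so this criterion and all names are assumptions.

```python
import random


def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def best_split(rows, targets, n_candidates, rng):
    """Find the best (score, feature, threshold) among randomly chosen features.

    rows: list of feature vectors; targets: one numeric target per row.
    Returns None when no candidate feature admits a split.
    """
    n_features = len(rows[0])
    candidates = rng.sample(range(n_features), n_candidates)
    best = None
    for f in candidates:
        # Exclude the largest value so the right-hand side is never empty.
        for threshold in sorted({row[f] for row in rows})[:-1]:
            left = [t for row, t in zip(rows, targets) if row[f] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[f] > threshold]
            score = (len(left) * variance(left)
                     + len(right) * variance(right)) / len(targets)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best
```

Restricting each node to a random subset of attributes is what decorrelates the trees: two trees grown on similar data can still split on different attributes.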
According to the gain reduction in the complexity of the decision tree sampling set defined in the figure, when a node is split, the sample spaces received by nodes on the same layer do not overlap, which ensures that the optimal attribute under the splitting rule is passed on as the nodes split [17]. Unknown hierarchical partitions extend well in the Stochastic Forest algorithm, so the mathematical definition of the stochastic forest is expressed as follows:
The sample set is a forest consisting of a limited set of n decision trees. Let the pattern vector of a positive decision tree value in the sample set be represented by Formula (12).
In the formula, $x_{kj}$ represents the initial data set corresponding to the Stochastic Forest algorithm, and $x_k$ represents the initial data value at the j-th level of the decision tree [18]. Then the general expression of each decision tree can be obtained by Formula (13) as follows:
Here, the parameter $d_{ik}$ is the vector distance between the sample $x_k$ of the i-th layer in the Stochastic Forest and the average value of the extracted data set. The resulting $P_j$ is the expression of each decision tree.
Combining the training results of decision tree table and decision tree sampling set, the short-term forecasting results of dynamic traffic flow are obtained. The calculation of short-term dynamic traffic flow is mainly to bring the selected data of decision tree sampling set into the table of decision tree. The flow constant data of different time periods are expressed as C1, C2, C3 and C4, which represent the normal time, overflow time, low peak time and stopping time, respectively. First of all, it is necessary to confirm whether or not each time is in the table. In the case of data overlap or cross-correlation, it is necessary to avoid repetitive calculation resulting in poor forecasting accuracy [19]. The expression of overlapping or cross-correlation test of data is shown in Formula (14).
Formula (14) mainly tests the correlation among data C1, C2, C3 and C4. Similarly, all overlaps in the data can be calculated, and the overlaps can be deleted. The complete short-term traffic forecasting results can be expressed as
In the formula, $P_j$ is the intensity of correlation obtained by the Stochastic Forest algorithm, C is the flow in the selected time of dynamic traffic, and $t_e$ is the result of voting. The short-term flow of dynamic traffic can be predicted by the calculation of the above formula.
To verify the overall performance of the short-term forecasting method for dynamic traffic flow based on the Stochastic Forest algorithm, a performance test is designed. The experimental scheme is as follows. A simulation experiment environment is designed, and traffic flow information acquired over 24 hours by data acquisition equipment in a local city is used as the experimental data. Three methods are selected for comparison: the short-term traffic flow forecasting method based on exponential smoothing and Markov chain in reference [2], the method based on the GA-Elman neural network in reference [3], and the method based on the support vector regression machine in reference [4]. First, the forecasting time of the different methods is compared: the shorter the forecasting time, the shorter the forecasting delay and the better the timeliness. Then the forecasting accuracy of the different methods is compared: the higher the forecasting accuracy, the more accurately the method realizes short-term forecasting of dynamic traffic flow. The specific experimental process is as follows:
Constrained by hardware conditions, a simulation test environment is built to reproduce the random situations and environment of actual dynamic traffic; a randomization program is written, and the environment is controlled by adjusting the program parameters. The experimental simulation environment is shown in Fig. 12.

Fig. 12. Simulation diagram of experimental environment.
In the performance test, the traditional short-term flow forecasting methods serve as the comparison methods. The data acquisition equipment is installed on the dynamic traffic road surface in the simulation environment, and the real-time acquisition results are input into each forecasting method. When the forecasting program is started, a standard counting mode is started at the same time. The actual flow on the dynamic traffic pavement and the forecasting results of the different methods are then recorded and compared statistically.
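The two quantities compared in the experiments, forecasting time and forecasting accuracy, can be measured with a small harness such as the one below. The paper does not define its accuracy metric, so the sketch assumes accuracy = 100 × (1 − mean absolute percentage error); that definition and both function names are assumptions.

```python
import time


def forecasting_accuracy(actual, predicted):
    """Accuracy in percent, assumed here to be 100 * (1 - MAPE)."""
    errors = [abs(a - p) / abs(a)
              for a, p in zip(actual, predicted) if a != 0]
    if not errors:
        raise ValueError("need at least one non-zero actual value")
    return 100.0 * (1.0 - sum(errors) / len(errors))


def timed_forecast(forecast_fn, inputs):
    """Run forecast_fn(inputs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = forecast_fn(inputs)
    return result, time.perf_counter() - start
```

Running every method through the same harness on the same inputs keeps the timing and accuracy figures comparable across methods.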
(1) Comparison of forecasting time
To compare the time consumed by the different methods in short-term forecasting of dynamic traffic flow, a comparative timing test is carried out. The test results are shown in Fig. 13.

Fig. 13. Time-consuming comparison of different research methods.
Fig. 13 shows that the forecasting time of the method in reference [3] is more than 4 s, the longest of the four methods, while the forecasting time of the proposed method is always less than 0.5 s, the shortest of the four. This shows that the proposed method has a short forecasting delay and high timeliness when performing short-term dynamic traffic flow forecasting. The reason is that the proposed method, in combination with the Stochastic Forest algorithm, selects the short-term forecasting equipment for dynamic traffic flow, eliminates invalid data from the data acquired by the equipment, and normalizes the usable data, which effectively shortens the forecasting time.
(2) Comparison of forecasting accuracy
After the time consumption of the different methods has been compared, their forecasting accuracy is compared in order to assess overall performance. The results are shown in Fig. 14.

Fig. 14. Comparison of forecasting accuracy of different research methods.
The curves in Fig. 14 show that, among the four methods, the short-term forecasting method for dynamic traffic flow based on the Stochastic Forest algorithm has the highest accuracy, which is always above 97%. The reason is that the method establishes a combined forecasting model, optimizes the output of the pre-processing results, and completes the rate forecasting of dynamic traffic flow. The Stochastic Forest algorithm is then introduced to train the sampling set of the flow rate decision tree and generate the short-term flow decision tree, which improves the short-term forecasting accuracy of dynamic traffic flow.
To solve the problem of low forecasting accuracy in traditional short-term forecasting methods for dynamic traffic flow, the stochastic forest algorithm is introduced to forecast short-term dynamic traffic flow. The Stochastic Forest algorithm includes a classifier that can effectively handle both unbalanced data and continuous data. Short-term dynamic traffic flow data are strongly random, so they are well suited to the Stochastic Forest algorithm. The experimental results show that the forecasting time of the proposed method is always less than 0.5 s and the forecasting accuracy is over 97%, which confirms that the theory holds in practice. It is hoped that this study can provide technical support for traffic flow statistics. In the future, an effective urban traffic guidance plan should be designed on the basis of this method to alleviate serious traffic congestion and achieve green traffic.
Acknowledgments
This work was supported by Open fund project of intelligent transportation technology Key Laboratory of transportation industry, “Research on basic database of traffic sign information”.
