Abstract
To overcome the low accuracy and long prediction time of traditional short-term urban traffic flow prediction methods, a prediction method based on a multiple linear regression model is proposed. The data attributes of short-term urban traffic flow are selected according to the traffic operation status and used as the original data for traffic flow prediction. According to the selected attributes, spatial static attribute data and traffic flow dynamic attribute data are collected, and fault data are identified and repaired. A multiple linear regression model is then constructed to predict short-term urban traffic flow. The experimental results show that the average prediction accuracy of the proposed method reaches 98.48% and that its prediction time is always less than 0.7 s, both better than those of the compared methods.
Keywords
Introduction
With the improvement of living standards and urban modernization, the problems of energy shortage, environmental pollution, traffic safety and traffic congestion are becoming increasingly serious worldwide, and China is no exception. The domestic transportation system suffers from problems such as an imperfect management and maintenance system for traffic facilities, functional deficiencies in the road system and unreasonable construction of the traffic network [1]. The solutions to traffic problems in China mainly include strengthening the construction of transportation infrastructure, expanding road capacity, applying intelligent transportation systems and improving the utilization rate of the transportation network. However, factors such as unreasonable urban road planning, difficulties in urban road reconstruction, limited space for urban road expansion and the rapid growth of the number of motor vehicles create further obstacles to solving traffic problems. The intelligent transportation system (ITS) has attracted much attention in recent years and has become a research hotspot for solving traffic problems [2]. Under the constraints of these practical factors, ITS is widely regarded as the best way to solve urban traffic problems.
ITS is a new type of transportation management system that integrates computer networks, control, communication, information and other advanced technologies with traditional transportation. The system combines socialization, intelligence, information and data. It can improve the utilization rate of the transportation network and give full play to the functions of infrastructure [3]. ITS treats the collection, processing, analysis, exchange, publication and use of information as the main line of traffic management and control, and its basic goal is to provide systematic travel services for travelers. Through the coordination and unification of environment, moving target, road, automobile and human factors, it can achieve the comprehensive objectives of reducing traffic-related environmental pollution, reducing traffic energy consumption, reducing the probability of accidents and alleviating road congestion.
The core mechanism of traffic management in ITS is to predict the short-term traffic flow. Because of the change of time, density, flow, speed and other factors, a variety of traffic conditions will appear on the road. Only by accurately predicting the short-term traffic flow according to the changeable traffic conditions, can the safe, orderly and efficient operation of ITS be ensured [4].
Reference [5] presents a prediction method for short-term traffic flow on urban roads under mixed conditions. That work first analyzes the temporal and spatial distribution characteristics of traffic flow parameters on urban expressways under three traffic states, including free flow and congestion flow. Based on the traffic flow conservation equation and the velocity dynamic model, prediction models for short-term traffic flow under the three states are established by spatio-temporal discretization of the partial differential equations. Considering the influence of ramps, changes in the number of lanes and road gradient, the prediction model is transformed into a state space prediction model to predict short-term traffic flow under mixed states. However, the method suffers from low accuracy. Reference [6] proposes a prediction method for short-term urban traffic flow based on decision trees. In that work, a decision tree is used to analyze the relationship between the short-term traffic flow upstream and downstream of urban roads, the corresponding rule base is established, and the decision tree is then used to predict short-term traffic flow. However, this method has a long prediction time, and it is difficult to achieve the desired effect in practice. In reference [7], a prediction method for short-term traffic flow on urban roads based on a grey system and a neural network is proposed. The grey model (GM) is used to fit and predict the monitored data, yielding a prediction value and a prediction residual. The prediction residual is input into the neural network model, which learns, simulates and predicts the residual. The sum of the residual prediction value and the GM prediction value is taken as the final prediction result. Although this method has a shorter prediction time, it suffers from low accuracy.
Reference [8] proposes a prediction method for short-term traffic flow on urban roads based on a multi-machine learning competition strategy. According to the characteristics of traffic flow data in the urban core area, a multi-dimensional traffic flow data model of multiple related roads in the same area is constructed. On this basis, a prediction algorithm based on the multi-machine learning competition strategy is proposed and used to predict short-term urban traffic flow. However, the method suffers from low accuracy and a long prediction time.
Traditional methods neither filter the collected traffic flow data nor identify and repair fault data, which leads to low prediction accuracy and long prediction time. To address these problems, this paper collects spatial static attribute data and traffic flow dynamic attribute data in a targeted manner, and identifies and repairs the fault data in the collected data. To achieve short-term urban traffic flow prediction with high accuracy and low time consumption, a multiple linear regression model is introduced to handle the highly complex linear causal relationships among multiple sets of variables. Based on this analysis, a short-term urban traffic flow prediction method based on a multiple linear regression model is proposed, which is of great significance for the application of intelligent transportation systems, for improving urban traffic safety and for realizing intelligent urban traffic. The overall framework of the method is as follows. The data attributes of short-term traffic flow are selected according to the traffic operation state and used as the original data for traffic flow prediction. The corresponding data, including spatial static attributes and traffic flow dynamic attributes, are collected according to the selected attributes. The fault data in the collected data are identified and repaired, and a multiple linear regression model is built to predict short-term urban traffic flow. Finally, the accuracy and prediction time of different prediction methods are compared.
Short-term urban traffic flow forecast
Traffic flow data properties
Firstly, the attributes of the corresponding short-term traffic flow data are selected according to the traffic operation status, and used as the original data of the traffic flow prediction to ensure the accuracy of the prediction results of short-term traffic flow.
The data attributes of short-term urban traffic flow include spatial static attributes and traffic flow dynamic attributes. The specific classification is shown in Table 1.
Specific classification of data attributes of short-term traffic flow in urban
The specific content of traffic flow dynamic attributes is shown in Table 2.
Specific contents of traffic flow dynamic attributes
The data attributes of short-term urban traffic flow collected by the monitor include phase period, green light time, traffic flow, monitor number, collection date, collection time and intersection number. The specific forms of the collected data attributes are shown in Table 3.
Specific forms of data attributes of short-term traffic flow in urban
Phase A represents the East-West straight line, including west-south, east-north, west-east, east-west; phase B represents the North-South straight line, including north-west, south-east, north-south, south-north.
The green signal ratio G and the road saturation S are important quantities in short-term urban traffic flow prediction. They are calculated as follows.
Green signal ratio G:
$$G = \frac{GT}{T}$$
where GT represents the effective green light time contained in a phase period, and T represents a phase period.
In the formula for road saturation S, SPT represents the idle time; PT represents the time spent by each vehicle passing through the monitor when the road traffic flow reaches saturation; MQ represents the maximum traffic flow per unit time of the lane; and Q represents the collected value of the initial traffic flow.
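As a small illustration, the green signal ratio can be computed directly from the two quantities defined above. The sketch below assumes the usual definition, effective green time GT divided by phase period T; the function name and the sample values are hypothetical and are not taken from the paper's data.

```python
def green_signal_ratio(effective_green_s: float, phase_period_s: float) -> float:
    """Return G = GT / T, the share of a phase period with effective green."""
    if phase_period_s <= 0:
        raise ValueError("phase period T must be positive")
    if not 0 <= effective_green_s <= phase_period_s:
        raise ValueError("effective green time GT must lie in [0, T]")
    return effective_green_s / phase_period_s


# Hypothetical example: 45 s of effective green in a 120 s phase period.
ratio = green_signal_ratio(45.0, 120.0)
```

The road saturation S is not sketched here because its formula is not recoverable from the text.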
According to Section 2.1, the data attributes of short-term urban traffic flow to be collected include spatial static attributes and traffic flow dynamic attributes [9]. The data are collected by combining automatic and non-automatic acquisition technologies, with automatic acquisition as the main means and non-automatic acquisition as the auxiliary means [10]. Automatic acquisition technology includes mobile data acquisition technology and fixed data acquisition technology. The specific data acquisition methods of these two technologies are shown in Table 4.
Specific data acquisition methods of two kinds of acquisition technologies
Traffic flow fault data identification
The fault data in the collected short-term urban traffic flow data must be identified and repaired, that is, the data must be cleaned. Data cleaning includes missing information completion, redundant information reduction, error information correction, noise elimination and other processes. The specific process of data cleaning is shown in Fig. 1 [11].

Data cleaning process.
First, the fault data, including missing data and error data, need to be identified [12]. The expectation-maximization (EM) algorithm is used to identify and fill missing data. The identification of error data requires the detection of outliers in the original short-term traffic flow data: the outlier algorithm is used to obtain pseudo-error data, and normal data are then extracted by combining the traffic flow mechanism method with the threshold method.
The detection method of the outlier algorithm is interval detection. Before time interval $t$ is evaluated, the variance $\sigma_t$ and the mean value of the traffic flow data are calculated.
The recognition of error data requires full consideration of the relationship among traffic flow, the type of detector equipment and the related road grade, so as to obtain the threshold of short-term traffic flow data for each traffic detection device [13]. The specific method of identifying pseudo-error data by combining the traffic flow mechanism method with the threshold method is shown in Table 5.
Specific methods of identifying pseudo-error data by combining traffic flow mechanism method with threshold method
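The interval detection and threshold checks described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the interval is taken as the mean plus or minus k standard deviations of the preceding readings, and the window length, the factor k and the capacity threshold `max_flow` are hypothetical values, not the thresholds of Table 5.

```python
from statistics import mean, pstdev


def flag_suspect_flows(flows, window=6, k=3.0, max_flow=None):
    """Return indices of readings that fail the threshold or interval check.

    A reading is suspect when it lies outside [0, max_flow], or outside
    mean +/- k * std of the preceding `window` clean readings.
    All parameter values are illustrative assumptions.
    """
    suspects = []
    for i, value in enumerate(flows):
        # Threshold check: flow cannot be negative or exceed lane capacity.
        if value < 0 or (max_flow is not None and value > max_flow):
            suspects.append(i)
            continue
        # Interval check against recent readings not already flagged.
        history = [flows[j] for j in range(max(0, i - window), i) if j not in suspects]
        if len(history) < 2:
            continue  # not enough clean history to form an interval
        mu, sigma = mean(history), pstdev(history)
        if sigma > 0 and abs(value - mu) > k * sigma:
            suspects.append(i)
    return suspects


# Hypothetical 5-minute counts (vehicles) with one spike and one negative value.
counts = [52, 55, 50, 53, 54, 51, 240, 52, -3, 55]
suspect_indices = flag_suspect_flows(counts, max_flow=300)
```

In this example the spike is caught by the interval check and the negative count by the threshold check, so the flagged positions are 6 and 8.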
Repairing fault data means restoring the fault data to the original values or to close approximations of them, and includes smoothing filtering, missing data repair and error data repair [14]. The fault data are repaired by the historical estimation method, and the repair process is shown in Fig. 2.

Error data repair process.
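A minimal sketch of repair by historical estimation is given below. It assumes the simplest variant, in which a faulty reading is replaced by the average of the readings in the same time slot on earlier days; the function name and the sample data are hypothetical, and the case-specific formulas of Fig. 2 are not reproduced.

```python
def repair_with_history(today, history_days, fault_indices):
    """Replace faulty readings with the mean of the same slot on earlier days.

    today         -- list of flow readings for the current day
    history_days  -- list of earlier days, each a list aligned with `today`
    fault_indices -- positions in `today` identified as fault data
    """
    repaired = list(today)
    for i in fault_indices:
        same_slot = [day[i] for day in history_days if day[i] is not None]
        if same_slot:  # leave the value untouched when no history exists
            repaired[i] = sum(same_slot) / len(same_slot)
    return repaired


# Hypothetical data: slot 2 of today's series was flagged as faulty.
today = [50, 54, 240, 53]
previous_days = [[48, 52, 57, 51], [51, 55, 59, 54], [49, 53, 61, 52]]
fixed = repair_with_history(today, previous_days, fault_indices=[2])
```

Only the flagged slot changes; the other readings are copied through unchanged.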
In the second case, the calculation formula of traffic flow estimation is as follows:
Where,
In the third case, the formula for calculating the results of data restoration is as follows:
Where
The specific process of missing data completion is shown in Fig. 3.

Specific flow of missing data completion.
The missing data completion structure of EM algorithm is shown in Fig. 4 [15].

Missing data completion structure of EM algorithm.
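To make the idea of EM-based completion concrete, the following sketch fills missing flow readings at one detector using a fully observed neighbouring detector. It assumes a bivariate Gaussian model, which is a simplification chosen for illustration and not necessarily the structure of Fig. 4; the function name and the data are hypothetical.

```python
def em_complete(x, y, iterations=50):
    """Fill None entries of y via EM for a bivariate Gaussian (x fully observed)."""
    n = len(x)
    mu_x = sum(x) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n
    observed = [v for v in y if v is not None]
    # Initialise y's parameters from the observed values only.
    mu_y = sum(observed) / len(observed)
    var_y = sum((v - mu_y) ** 2 for v in observed) / len(observed)
    cov_xy = 0.0
    ey = []
    for _ in range(iterations):
        # E-step: expected y and y^2 for each row given current parameters.
        slope = cov_xy / var_x
        resid_var = var_y - cov_xy ** 2 / var_x
        ey, ey2 = [], []
        for xi, yi in zip(x, y):
            if yi is None:
                m = mu_y + slope * (xi - mu_x)
                ey.append(m)
                ey2.append(m * m + resid_var)
            else:
                ey.append(yi)
                ey2.append(yi * yi)
        # M-step: re-estimate the parameters from the expected statistics.
        mu_y = sum(ey) / n
        var_y = sum(ey2) / n - mu_y ** 2
        cov_xy = sum(xi * m for xi, m in zip(x, ey)) / n - mu_x * mu_y
    return ey


# Hypothetical flows: upstream detector complete, downstream with two gaps.
upstream = [40, 50, 60, 70, 80]
downstream = [45, None, 65, 75, None]
filled = em_complete(upstream, downstream)
```

Because the observed pairs in this toy example follow an exact linear relation, the two gaps converge to values on that line, while observed readings are returned unchanged.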
Then the adaptive exponential smoothing algorithm is used for data smoothing filtering. The specific formulas are as follows:
Where,
The multiple linear regression model is used to predict short-term traffic flow because it can establish the linear causal relationship among multiple variables. When the predicted quantity is affected by multiple factors, the model gives accurate prediction results and is simple and convenient to apply. Because urban traffic flow prediction involves many variables, the model is well suited to this task. Multivariate regression analysis can measure the degree of correlation and the goodness of fit between the various factors, and thereby improves the accuracy of the prediction results.
According to the results of data preprocessing, a prediction model for short-term urban traffic flow is established based on the multiple linear regression model [17]. The prediction model consists of three modules, and its specific workflow is shown in Fig. 5 [18].

Specific workflow of predicting model for short-term traffic flow in urban.
The three modules of the predicting model for short-term traffic flow are the predicting module for traffic flow in upstream section, the predicting module for traffic flow in the predicting section and the calculation module for historical average traffic flow in the predicting section [19]. The concrete formulas for predicting short-term traffic flow in urban through these three modules are as follows:
Where, T represents the calculation results of the traffic flow prediction module in the upstream section; R represents the calculation results of the traffic flow prediction module in the predicting section; F represents the calculation results of the historical average traffic flow calculation module in the predicting section; a represents the prediction results of the short-term traffic flow; ω1, ω2 and ω3 represent the prediction coefficients respectively; a1 - a3 and q1 - q3 represent the average traffic flow at time t on the k-th day; b represents the average traffic flow at time t - 1 on the k-th day; and q0 and h0 represent the average traffic flow at the upstream section and the historical traffic flow at the upstream section [20].
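The combination step can be illustrated with a small least-squares fit. The sketch assumes that the final prediction is a weighted sum of the three module outputs, a = ω1·T + ω2·R + ω3·F, with no intercept, and that the coefficients are estimated from historical samples by ordinary least squares; both the functional form and the sample data are assumptions, since the formulas themselves are not reproduced in the text.

```python
def solve_linear_system(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w


def fit_weights(features, targets):
    """Least-squares weights via the normal equations (X^T X) w = X^T y."""
    k = len(features[0])
    xtx = [[sum(row[p] * row[q] for row in features) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * t for row, t in zip(features, targets)) for p in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, t_up, r_sec, f_hist):
    """Weighted sum of the three module outputs (assumed model form)."""
    return weights[0] * t_up + weights[1] * r_sec + weights[2] * f_hist


# Hypothetical samples: columns are the module outputs T, R and F.
samples = [[100, 90, 95], [120, 110, 100], [80, 85, 90], [150, 140, 120], [60, 70, 75]]
observed_flow = [96.0, 113.0, 83.5, 141.0, 66.0]
omega = fit_weights(samples, observed_flow)
forecast = predict(omega, 110, 100, 98)
```

The toy targets were generated with weights 0.5, 0.3 and 0.2, so the fit recovers those values; with real data the coefficients would be whatever best fits the history.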
To test the performance of the proposed short-term urban traffic flow prediction method based on the multiple linear regression model, a comparative experiment is designed. The experimental scheme is as follows. First, the experimental location and environment are designed. To improve the accuracy and authenticity of the experiment, urban traffic flow data from a section of a local city are obtained and taken as the experimental sample data. The length of the experimental data conforms to the standard of simulation experiments. The initial parameter values are set according to the values of the variables in the urban traffic flow data, so that the experimental parameters are as close as possible to the actual traffic flow data. The methods in references [5], [6], [7] and [8] are used as comparison methods. The prediction accuracy of the different methods is compared first; the higher the accuracy, the better the prediction performance. The prediction time of the different methods is then compared; the shorter the prediction time, the faster the method obtains short-term urban traffic flow prediction results. The specific experimental process is shown below.
Experimental location and data
The location of the experiment is shown in Fig. 6, where intersection A and B are two adjacent intersections. Short-term traffic flow prediction is made for the northern section of intersection A, while the west, north and east sections of intersection B are all related sections of the northern section of intersection A.

Selection of experimental sites.
The data from 15:30 to 18:30 including peak flow period are selected as the historical data of this experiment. The specific data are shown in Table 6. Firstly, the collected data are identified and repaired to ensure the integrity of the experimental data.
Experimental data
(1) Comparison of prediction accuracy
The accuracy data of predicting short-term traffic flow for the northern section of intersection A by different methods are shown in Table 7.
Comparison of prediction accuracy
According to the prediction accuracy data for the northern section of intersection A in Table 7, the average prediction accuracy is 76.88% for the method in reference [5], 77.52% for the method in reference [6], 64.05% for the method in reference [7], 82.38% for the method in reference [8], and 98.48% for the proposed method based on the multiple linear regression model. The main reason for the higher accuracy of the proposed method is that spatial static attribute data and traffic flow dynamic attribute data are collected according to the selected attributes, and the fault data in the collected data are identified and repaired. A multiple linear regression model for short-term urban traffic flow prediction is then constructed on the cleaned data, which improves the prediction accuracy.
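The text does not state how prediction accuracy is computed. One common choice, shown here purely as an assumption, is the mean of one minus the relative absolute error over all predicted intervals; the function name and the numbers are hypothetical and unrelated to Table 7.

```python
def average_accuracy(predicted, actual):
    """Mean of 1 - |predicted - actual| / actual, as a percentage.

    Intervals with zero actual flow are skipped to avoid division by zero.
    """
    scores = [1 - abs(p - a) / a for p, a in zip(predicted, actual) if a != 0]
    return 100 * sum(scores) / len(scores)


# Hypothetical predicted and measured flows for three intervals.
accuracy_pct = average_accuracy([98, 205, 150], [100, 200, 150])
```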
(2) Comparison of prediction time
To further validate the advantages of the proposed method, the prediction time is also tested. The test results are shown in Fig. 7.

Comparison of prediction time.
Fig. 7 shows that the prediction time of the method in reference [5] is between 2.8 s and 4.2 s; that of the method in reference [6] is between 3.7 s and 4.5 s, the longest of the five methods; that of the method in reference [7] is between 1.9 s and 2.9 s; and that of the method in reference [8] fluctuates between 1.1 s and 2.9 s. The prediction time of the proposed method stays below 0.7 s, the shortest of the five methods. The comparison therefore shows that the proposed method takes the least time to complete the short-term urban traffic flow prediction task. The reason is that the method collects static spatial attribute data and dynamic traffic flow attribute data according to the selected attributes, identifies and repairs the fault data in the collected data, and builds a multiple linear regression model for short-term urban traffic flow prediction, which reduces the prediction time and improves the prediction efficiency.
With the acceleration of urban construction and the gradual development of intelligent transportation systems, intelligent traffic management and advanced urban traffic prediction systems are playing an increasingly important role in urban traffic management. Short-term urban traffic flow prediction is essential to realizing intelligent urban traffic. To handle the highly complex linear causal relationships among multiple variables, a multiple linear regression model is introduced, and a short-term urban traffic flow prediction method based on this model is proposed. Spatial static attribute data and traffic flow dynamic attribute data are collected, and fault data are identified and repaired. The short-term urban traffic flow is then predicted by constructing the multiple linear regression model. The experimental results show that the average prediction accuracy of this method reaches 98.48% and that the prediction time is always below 0.7 s, that is, the method achieves high prediction accuracy with a short prediction time. The method is of great significance to the application of intelligent transportation systems, and the resulting intelligent transportation provides an important guarantee of driving safety. Further in-depth research on short-term urban traffic flow prediction methods is still needed. With the collaborative development of information technology and mobile technology, future work will design an app for short-term urban traffic flow prediction that enables people to obtain urban traffic flow prediction information at any time and helps realize intelligent urban traffic.
Acknowledgments
This work was supported by National Natural Science Fund under grant no. 61672304.
