Abstract
Road traffic safety has long been a concern for countries around the world. With changing weather conditions and a significant increase in the number of highways and vehicles, traffic accidents occur frequently. To predict traffic accidents effectively and reduce their number, a support vector machine based traffic accident prediction model for extreme weather conditions is proposed. First, the relationship between different extreme weather conditions and road traffic events is analyzed. Then, a genetic algorithm is used to optimize the accident data, and a support vector machine is integrated for accident prediction. The results show that on the AVOID dataset the loss function value of the research method reaches its minimum, approximately 10⁻⁵, at the 32nd iteration, while the loss function values of the other algorithms are significantly larger. When the precision of the research method is 80% and 90%, the corresponding recall rates are 87.89% and 79.98%, respectively. Among the prediction errors of the different algorithms, the maximum relative error of the research method is 1.214%, and the minimum relative error is 0.0213%. Compared with the IGA-SVM model, the maximum relative error of the research method is lower by 2.021 percentage points. In predicting non-severe accidents, the research method has the highest prediction accuracy, 94.53%. These results indicate that the proposed accident prediction model can accurately predict different types of accidents under extreme weather conditions, providing technical support for the prevention of traffic accidents.
Introduction
Road traffic issues have always been of great concern to the general public. While China’s rapid economic development has brought convenience, it also conceals substantial safety risks. The higher speed of vehicles on expressways, coupled with an increase in the number of vehicles, significantly increases the probability of traffic accidents.1,2 The World Health Organization’s “Global Status Report on Road Safety 2015” stated that although road safety has improved, about 1.25 million people still die in road traffic accidents every year. On May 5, 2017, Xinhua News Agency reported from Manila that the World Health Organization’s Western Pacific Regional Office, based in the Philippine capital, had issued a report pointing out that in the Western Pacific Region, which includes 37 countries and regions including China, Japan, South Korea, and Australia, an average of about 330,000 people die in traffic accidents every year. The main factors that can trigger road traffic accidents fall into two types: one is the external environment and the vehicle being driven, and the other is the driver. Among these, the environment is an important cause of traffic accidents. Changes in external weather conditions can directly or indirectly affect the friction coefficient between vehicles and the ground and can affect the physical and psychological condition of drivers, ultimately leading to traffic accidents.3,4 To reduce and prevent traffic accidents effectively, it is necessary to know in advance the probability of accidents occurring within a specific time frame so that timely and effective preventive measures can be taken. Predicting the possibility of traffic accidents in advance under given conditions helps ensure personal safety and reduce property losses. Therefore, accurate prediction of expressway traffic accidents is particularly important.
How to establish accurate traffic accident prediction (TAP) models has become a topic of concern for many scholars. Based on big data analysis, a feasible approach is to examine the various factors that cause traffic accidents and to apply different machine learning algorithms to analyze the relationship between extreme weather and traffic accidents, thereby predicting accidents.5 Therefore, with extreme weather as the background, an expressway TAP model based on the support vector machine (SVM) is proposed to further reduce the probability of accidents. The article consists of four parts. The first part reviews the current domestic and international work related to SVM algorithms and TAP models. The second part analyzes the correlation between extreme weather and traffic accidents and proposes a TAP model combining a genetic algorithm (GA) and SVM. The third part verifies the performance and practical application effect of the expressway TAP model. The fourth part summarizes the entire article.
Related works
SVM is a kind of generalized linear classifier that performs binary classification of data through supervised learning. This method has good robustness and excellent generalization ability, so it has been widely applied by scholars to typical regression and classification problems. Essam et al. adopted machine learning (ML) models based on SVM, artificial neural network (ANN), and long short-term memory (LSTM) algorithms when studying the prediction of suspended sediment load (SSL). The research results indicated that the ANN3 prediction model had the highest reliability and high prediction accuracy.6 Li et al. adopted a diagnosis model based on the wavelet packet decomposition method and SVM when studying how to improve the accuracy of diagnosing mechanical faults of high-voltage circuit breakers. The research findings indicated that the improved fault diagnosis model had higher diagnostic efficiency and accuracy than the traditional methods.7 Zhu et al. adopted a magnitude estimation model combining transfer learning and single-station recorded SVM (TLSVM-M) when studying methods to improve the accuracy and speed of rapid magnitude estimation. The model was trained and tested on datasets from the Sichuan Yunnan region and showed excellent performance. The research results indicated that the prediction error of the magnitude estimation model fluctuated within 0.3, and that the model could quickly estimate the magnitude of small and medium-sized events in the Sichuan Yunnan region.8 Zhu, Li, et al. used the SVM-based magnitude estimation (SVM-M) model to improve the speed of earthquake magnitude prediction. The SVM-M prediction model showed good performance after being trained and tested with strong motion data from K-NET in Japan. The research outcomes showed that the prediction error of the magnitude prediction model decreased with time, and that the prediction speed also improved.9 Lei et al. used an SVM-based detection model to improve the accuracy of structural damage detection in steel frame models. The experimental findings indicated that the damage detection model performed well in recognizing damage features from high-dimensional input feature vectors, and that it still exhibited high detection accuracy even with incomplete measurements. The detection method also had excellent noise resistance and robustness.10
Traffic accidents cause nearly 100,000 casualties annually, some of them caused by extreme weather. Traffic accidents not only bring large economic losses but also devastate countless families. Therefore, how to reduce the occurrence of traffic accidents and the resulting casualties has always been a concern for many scholars. Catalano et al. used a network-based simulation platform with a microeconomic model to predict the frequency of road accidents when studying the main factors that affect them. The experimental findings showed that the model had high accuracy in geographical blocks at the national level.11 When studying short-term traffic flow prediction, Sun et al. combined the K-means clustering algorithm with the gated recurrent unit (GRU) to establish a short-term traffic flow prediction model. After training and testing on the California traffic flow dataset, the prediction model showed good performance. The research results indicated that the traffic flow prediction model took into account the diversity of the data while maintaining accuracy.12 Ospina Matteus et al. used a safety performance function (SPF) based on negative binomial regression combined with Bayesian empirical methods to predict the risk of short-term motorcycle accidents. The research outcomes showed that the SPF had high accuracy in predicting the risk of motorcycle accidents.13 Wang et al. used the convolutional model based on the spatiotemporal graph of changes (VSTGC) when studying short-term traffic prediction. The research results indicated that, after repeated testing on real data, the VSTGC prediction model exhibited very high accuracy and was competitive with other strong methods.14 Chung et al. combined data statistics and data analysis when studying driver risk perception. Based on the self-reported survey results of Korean professional drivers (n = 388) and their official traffic accident records, that study analyzed the factors that affect driver risk perception. The research outcomes showed a significant correlation between risk perception and professional pride, job satisfaction, and active driving.15
Literature review classification table.
Prediction model for expressway traffic accidents under extreme weather conditions based on SVM
With global climate change and the frequent occurrence of extreme weather events, traffic safety issues on highways are becoming increasingly prominent. Extreme weather conditions, such as rainstorms, haze, strong wind, and low temperature, greatly affect drivers’ behavior and visual perception, thus increasing the risk of traffic accidents. To reduce this risk, the experiment integrates GA and SVM to construct an accurate and reliable prediction model, so that timely preventive measures can be taken, the occurrence of traffic accidents can be reduced, and the lives and property of road users can be protected.
The correlation between different meteorological factors and highway traffic accidents under extreme weather conditions
Traffic accidents on highways cannot be avoided entirely. They are caused by the interaction of multiple factors, including driver behavior, road conditions, vehicle condition, traffic rules and regulations, and environmental factors. Under extreme environmental conditions, such as rain, snow, fog, ice, and strong winds, road safety is affected. These conditions can affect driver visibility, vehicle handling, and adhesion to the road surface, thereby increasing the risk of an accident. For example, slippery or icy roads can cause a vehicle to lose control, and fog can reduce visibility and increase the likelihood of a collision. If the probability of traffic accidents can be predicted in advance under different circumstances, accidents and casualties can be reduced effectively; this requires a comprehensive grasp of the probability of expressway traffic accidents under extreme weather conditions.16
The experiment analyzes the correlation between weather factors and the occurrence of expressway traffic accidents. Large traffic accident datasets often contain noisy records, which have a significant impact on the prediction of road traffic accidents. Therefore, the experiment uses data preprocessing to delete useless data. The data preprocessing process is shown in Figure 1.
Figure 1. Preprocessing process.
Normalization based on the maximum and minimum values is called dispersion (min-max) normalization; it is a linear transformation that maps the original data values onto [0, 1]. Zero-mean normalization is also known as standard deviation normalization. Decimal scaling normalization maps the attribute values of the dataset onto [−1, 1] by moving the decimal point of the attribute values. The three calculation methods are shown in equations (1) to (3).
In equation (1),
In equation (2),
In equation (3),
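The three normalization schemes described above can be illustrated with a minimal Python sketch. The function names and the sample visibility values are hypothetical, the population standard deviation is assumed for the zero-mean case, and the forms used are the common textbook definitions, which may differ in detail from the authors' equations (1) to (3).

```python
import math

def min_max_normalize(values):
    """Dispersion (min-max) normalization: linear map onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]

def z_score_normalize(values):
    """Standard deviation normalization: zero mean, unit population std."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

def decimal_scaling_normalize(values):
    """Decimal scaling: divide by 10**j, with j the smallest integer
    such that every scaled absolute value is below 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in values]

# Hypothetical visibility readings in metres, for illustration only.
visibility_m = [50.0, 200.0, 350.0, 500.0]
scaled_min_max = min_max_normalize(visibility_m)
scaled_z = z_score_normalize(visibility_m)
scaled_decimal = decimal_scaling_normalize(visibility_m)
```

Min-max scaling is sensitive to outliers because a single extreme reading fixes the range, whereas the zero-mean form is less affected; which one suits the accident data is a modelling choice the sketch does not settle.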
Research on optimization of highway traffic accident characteristics based on improved GA
The main purpose of modeling the occurrence rate of expressway traffic accidents is to summarize and process the data. The higher the classification accuracy of the processed data, the higher the classification accuracy of traffic accidents. The selection of feature subsets and the setting of parameters affect the effectiveness of the classifier; at the same time, redundant raw data reduce the classification accuracy. To handle the redundant information in the initial features, the experiment adopts an artificial intelligence method. The GA is an optimization algorithm that simulates the evolutionary mechanism of nature. GA has good global search capability and can search the solution space comprehensively to find the global optimum or a solution close to it. It is especially suitable for complex, nonlinear, multi-modal, and high-dimensional problems such as predicting traffic accident incidence under extreme weather conditions. It also has good adaptability and robustness and can adapt to dynamically changing environments and different types of optimization problems. Therefore, the study first uses an improved GA to analyze the characteristics of highway traffic accidents and optimize feature selection.19
The GA is shown in Figure 2.
Figure 2. Flow chart of GA.
The incomplete records in the traffic accident data are cleaned first. Related attributes are merged, and then useless attributes and attributes with more than half of their values missing are deleted so that the data can be used effectively. Records with a small amount of missing data are filled in; records with incomplete information, duplicate records, and records that are logically inconsistent are deleted; the processed data are then discretized and normalized. After preprocessing, the valid data need to be classified, so the K-means clustering method is introduced in the experiment, which divides the training samples into
In equation (4),
In equation (5),
In equation (6),
Figure 3. Crossover, mutation, and evolution process.
As Figure 3 shows, the parent generation produces new offspring through single-point crossover and mutation. The offspring then replace the parent generation as the chromosomes for the next round of evolution, and their fitness values are calculated to select the optimal value. First, the chromosomes are encoded; the study uses binary strings to represent chromosomes in GA. The binary string is then converted to a decimal value that represents the actual value of the parameter. The expression of the chromosome decimal string is shown in equation (7).
In equation (7),
In equation (8),
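The chromosome handling described above (binary encoding, decoding to a decimal parameter value, single-point crossover, and mutation) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the 8-bit chromosome length, the mutation rate, and the linear decoding onto a range [lower, upper] are all assumptions.

```python
import random

def decode(bits, lower, upper):
    """Convert a binary chromosome to a decimal parameter value in [lower, upper]."""
    as_int = int("".join(str(b) for b in bits), 2)
    return lower + (upper - lower) * as_int / (2 ** len(bits) - 1)

def single_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two parents at one random cut point."""
    cut = rng.randint(1, len(parent_a) - 1)
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

rng = random.Random(0)   # fixed seed so the sketch is repeatable
parent_a = [0] * 8       # hypothetical 8-bit chromosomes
parent_b = [1] * 8
child_a, child_b = single_point_crossover(parent_a, parent_b, rng)
mutant = mutate(child_a, rate=0.1, rng=rng)
```

In a full GA loop the offspring would be decoded, scored with the fitness function, and then used to replace the parent generation, as the text describes.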
Design of highway traffic accident prediction model integrating GA-SVM
Based on the optimized parameters and features of expressway traffic accidents obtained by GA, the experiment integrates SVM to map the obtained feature vectors and introduces the radial basis function (RBF) kernel. The RBF kernel, also called the Gaussian kernel, can map nonlinear problems into a high-dimensional space in which they become linear problems, and every entry of the resulting kernel matrix lies in (0, 1], which is convenient for calculation.20
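The bounded range of the RBF kernel values can be checked with a short sketch. It assumes the common parameterization k(x, z) = exp(−γ‖x − z‖²); the function names, the value of γ, and the sample feature vectors are hypothetical and are not taken from the experiment.

```python
import math

def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

def kernel_matrix(samples, gamma):
    """Pairwise kernel values for a list of feature vectors."""
    return [[rbf_kernel(x, z, gamma) for z in samples] for x in samples]

# Hypothetical, already normalized feature vectors.
samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = kernel_matrix(samples, gamma=0.5)
```

Each diagonal entry equals 1 because a sample has zero distance to itself, and off-diagonal entries shrink toward 0 as samples move apart, so the matrix stays well scaled whatever the original feature magnitudes were.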
The representation of kernel functions in the SVM method is shown in Figure 4.
Figure 4. Nonlinear partition of hyperplane.
In Figure 4, the addition of kernel functions makes it possible to map the original input variables into the high-dimensional space and simplify the problems that need to be calculated. The calculation method for the improved SVM obtained is shown in equation (9).
In equation (9),
Specific process of SVM prediction model.
Because the training data may contain noise, slack variables are added to the constraint conditions and a penalty factor is added to the objective function, in combination with the Karush-Kuhn-Tucker (KKT) conditions. The specific calculation form is shown in equation (10).
In equation (10),
In equation (11),
In equation (12),
In equation (13),
In equation (14),
In equation (15),
Improved SVM model overall framework.
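The role of the slack variables and the penalty factor introduced above can be made concrete with a small sketch of the standard soft-margin primal objective, ½‖w‖² + C·Σξᵢ with ξᵢ = max(0, 1 − yᵢ(w·xᵢ + b)). This is the textbook linear form, shown only as an illustration; it is not claimed to reproduce the authors' equations (10) to (15), and the weights, samples, and value of C are hypothetical.

```python
def slack(w, b, x, y):
    """Slack variable of one sample: how far it falls short of the unit margin."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return max(0.0, 1.0 - margin)

def soft_margin_objective(w, b, samples, labels, penalty):
    """Primal soft-margin objective: 0.5 * ||w||^2 + C * sum of slacks."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    total_slack = sum(slack(w, b, x, y) for x, y in zip(samples, labels))
    return regularizer + penalty * total_slack

# Hypothetical two-feature samples with labels +1 (accident) and -1 (no accident).
samples = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
labels = [1, -1, 1]
w, b = [1.0, 0.0], 0.0

slacks = [slack(w, b, x, y) for x, y in zip(samples, labels)]
objective = soft_margin_objective(w, b, samples, labels, penalty=10.0)
```

Only the third sample lies inside the margin and receives a nonzero slack; a larger penalty factor makes such violations more costly, which is the trade-off between margin width and training error that the parameter search has to tune.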
Performance test and application effect of highway traffic accident prediction model
In recent years, with the continuous increase in traffic flow, the incidence of highway traffic accidents has also risen, which has attracted widespread attention. To predict and prevent traffic accidents more accurately, various traffic accident prediction models have emerged. This section analyzes the performance and application effect of the constructed highway traffic accident prediction model, to provide theoretical support and practical guidance for improving highway traffic safety.
Performance comparison of different traffic accident prediction models
Basic hardware environment of the experiment.
To ensure the smooth progress of the experiment, this study sets the number of iterations for all algorithms to 150 and the population size to 100, and runs each of the basic functions obtained 30 times. AVOID and NGSIM were selected as experimental datasets for the performance tests of the different algorithms. The raw data of the AVOID dataset come from the National Highway Traffic Safety Administration (NHTSA), California Department of Motor Vehicles (CADMV), Global News and Social Media, Google Maps, OpenStreetMap (OSM), and OpenWeatherAPI. The data pass through cross-modal processing, automatic processing, manual verification, and other steps to ensure their accuracy; a total of 2000 highway traffic accident records from this dataset were taken as the verification set. The NGSIM dataset was collected in the United States and contains data from two highways (US-101, I-80) and two urban roads (Lankershim and Peachtree), each recorded for 45 min. A total of 2000 relevant records from this dataset were used as the training set. The AVOID dataset and the NGSIM dataset each contain 1200 relevant highway traffic accident records, a 1:1 ratio, so the two datasets selected for the experiment are balanced. The fitness changes of the four algorithms are shown in Figure 7.
Figure 7. Changes in fitness functions of four methods.
Figure 7(a) shows the change of the loss function value on the AVOID dataset. The research method reached its minimum loss value at the 32nd iteration, where the value approached 10⁻⁵. In contrast, the loss function values of BP-SVM, IGA-SVM, and the method in reference [21] only stabilized after 40 iterations and remained far greater than 10⁻⁵. Figure 7(b) shows the changes in the loss function values on the NGSIM dataset. After 28 training iterations, the loss of the research method began to approach 10⁻⁷, whereas the values of the other algorithms were still fluctuating and were far greater than that of the research method. These comparisons show that the research method had smaller loss function values and converged stably in a short time. The test results on two independent datasets show that the research method has stronger generalization ability and can predict the occurrence of accidents more accurately. Next, the AVOID dataset was used as the main experimental dataset to compare the precision and recall of the four algorithms. The specific results are shown in Figure 8.
Figure 8. PR curve changes for the AVOID dataset.
The PR curves in Figure 8 show that when the precision of the research method was 80% and 90%, the corresponding recall rates were 87.89% and 79.98%, respectively. For the method in reference [21], the recall rates at the same precision levels were 80.56% and 68.47%, respectively. When the precision was 90%, the corresponding recall rates for BP-SVM and IGA-SVM were 57.74% and 72.21%, respectively. These results indicated that, in the same experimental environment, the precision and recall of BP-SVM and IGA-SVM were significantly lower than those of the research method. The method in reference [21] was slightly superior to these two methods but still fell short of the research method, which indicated that the research method had higher prediction precision in TAP systems and could predict the probability of accidents more precisely, thereby helping to ensure the safety of drivers. In addition, the research method was compared with the reference [21] and IGA-SVM models, which had the better PR curves among the baselines, to analyze the prediction error of traffic accidents. The results are shown in Figure 9.
Figure 9. Changes in error values of different algorithms for TAP.
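For readers interpreting the precision and recall values quoted from the PR curves, the following minimal sketch shows how the two quantities are computed from binary predictions. The function name and the label vectors are hypothetical, with 1 standing for an accident and 0 for no accident; the numbers are unrelated to the AVOID results.

```python
def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical labels: four accident records and four non-accident records.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
```

A PR curve is obtained by sweeping the decision threshold of the classifier and recomputing this pair at each threshold, which is why a higher precision level is generally paired with a lower recall, as in the figures reported above.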
Figure 9 shows the prediction errors of three different models. The overall error of the research method in predicting traffic accidents was relatively small; over the entire run, its maximum relative error was 1.214% and its minimum relative error was 0.0213%. The prediction error of the IGA-SVM model fluctuated greatly, with a maximum error of 3.235%, which is 2.021 percentage points higher than the maximum of the research method. This might be because the research method could fully utilize the computational power of the SVM algorithm on nodes and significantly accelerate prediction. The method in reference [21] showed a maximum prediction error of −1.956% and a minimum relative error of 0.0637% during prediction. Overall, compared with the other models, the research method had the smallest system prediction error, with higher accuracy, a lower error rate, and excellent computing power.
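The error figures above can be related to one another with a short sketch. It assumes the usual definition of relative error, |predicted − actual| / |actual| × 100%, taken in absolute value (the source also reports a signed value for reference [21], so its exact convention is not certain). The function name and the sample values are hypothetical; only the two maxima, 3.235% and 1.214%, come from the text.

```python
def relative_errors_percent(actual, predicted):
    """Absolute relative error of each prediction, in percent."""
    return [abs(p - a) * 100.0 / abs(a) for a, p in zip(actual, predicted)]

# Hypothetical accident counts and predictions, for illustration only.
actual = [100.0, 200.0, 50.0]
predicted = [101.0, 199.0, 50.5]
errors = relative_errors_percent(actual, predicted)
max_error, min_error = max(errors), min(errors)

# Gap between the reported maximum relative errors of IGA-SVM and the
# research method, in percentage points.
gap = round(3.235 - 1.214, 3)
```

The gap evaluates to 2.021, so the reported 2.021% difference is a difference between the two maximum relative errors expressed in percentage points.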
Application effects of different traffic accident prediction models
To further evaluate the calculation and prediction effects of different algorithms, the probability of traffic accident risk on a certain road section was analyzed as a function of the number of system iterations, as shown in Figure 10.
Figure 10. Change in probability of accident risk occurrence with iteration number under different algorithms.
In Figure 10, the probability of accident risk obtained by the prediction system varied significantly across the four algorithms. The fitting degree between the research method and the actual value was as high as 98.87%, and its predicted results were the closest to the actual ones. The fitting degree between the algorithm in reference [21] and the actual value was 98.22%. The fitting degrees of BP-SVM and IGA-SVM were only 95.34% and 92.78%, indicating a significant difference between their predicted results and the actual data. These results indicated that the prediction ability of the proposed method was superior to that of the other algorithms and that it could effectively calculate the probability of traffic accidents, with the best overall performance. The research model also combines several techniques, which makes it more flexible and adaptable when solving complex problems. At the end of the experiment, the four algorithms were applied to predict traffic accident types under extreme weather conditions, and their accuracy was compared. The specific results are shown in Figure 11.
Figure 11. Accident prediction accuracy of different models under different input characteristics.
Figure 11 shows the accident prediction accuracy of the different models under three different input features: non-severe, severe, and overall accidents. There are significant differences in classification accuracy among the four models. Because the number of samples is sufficient, all four models have relatively high accuracy in predicting non-severe accidents, in the order research method > reference [21] > IGA-SVM > BP-SVM. The prediction accuracy of the research method is the highest, at 94.53%; the maximum difference, between the research method and BP-SVM, which has the lowest prediction accuracy, is 23.38 percentage points. The research method also has the highest prediction accuracy for both severe accidents and overall accidents, far surpassing the other three algorithms. This indicates that the research method can be applied to predict different types of traffic accidents and provide reasonable judgments, which demonstrates its generality. Because the model is versatile and adjustable, it can be transferred to different fields and tasks. Across the experiments, the model outperformed the other methods on the evaluation criteria used.
Conclusion
Over the years, changing weather conditions have led to frequent accidents on expressways, and national attention to traffic accidents has gradually increased. To predict the occurrence of traffic accidents effectively and to support intervention and prevention, this study proposed an expressway TAP method based on SVM. The relationship between the accident occurrence rate and different extreme weather conditions was first analyzed. Then, GA was used to extract the characteristics of accident occurrence, and the experimental parameters were optimized. Finally, SVM was introduced to establish a prediction model, and a system for analyzing and predicting the impact of extreme weather conditions on traffic accidents was constructed. The data showed that on the NGSIM dataset, after 28 training iterations, the loss function of the research method approached 10⁻⁷. When the precision of the research method was 80% and 90%, the corresponding recall rates were 87.89% and 79.98%, respectively. In the fitting analysis of the different algorithms, the fitting degree between the research method and the actual value was as high as 98.87%, and its predicted results were the closest to the actual results. The research method also had the highest prediction accuracy for different types of accidents, with a prediction accuracy of 94.53% for non-severe accidents. These results indicated that the proposed method could accurately predict expressway traffic accidents and had better application effects for different types of accidents. Using the constructed model to predict the occurrence of road traffic accidents can improve road safety to a certain extent; it can help the national traffic control department formulate traffic rules and indirectly discourage irregular behavior by drivers and pedestrians, ultimately reducing traffic accidents and improving highway safety.
However, the study did not apply the constructed method to experiments under extreme weather conditions in different regions, which may result in regional differences in the results obtained. Further and more detailed experiments are needed in the future.
Statements and declarations
Footnotes
Conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research is supported by 2017 Sichuan Provincial Department of Education Research Project: “Investigation, Statistics and Analysis of Transportation Volume in the Express Delivery Industry” (Project No: 17ZB0290).
