Abstract
Autonomous vehicles offer the potential to drastically decrease the number and severity of road accidents. Most accidents occur due to human inattention or wrong decisions, factors that autonomous vehicles can eliminate. However, not all accidents are avoidable through automation. Complying with the law is not always enough: environmental problems (bad weather, poor road surface, etc.) can cause accidents, and other actors (human drivers, pedestrians) make mistakes. These situations are unexpected, and the real-time sensors of vehicles are currently limited in their ability to predict them in time (a slippery road surface, for example) and to deliver a programmed response to a dangerous situation. This paper presents a method based on the analysis of historical accident records to find danger zones of public road networks. A further statistical approach is used to find the significant risk factors of these zones; this data can be built into the control algorithms of autonomous vehicles to prepare for these situations and to avoid potential incidents or at least decrease their seriousness. It is concluded that the proposed method can find the black spots of a given road section and suggest the main local risk factors.
Introduction
A great deal of scientific and engineering attention has been directed towards the rapidly developing area of automated driving. One of the great promises of this novel technology is the elimination of road accidents. According to the International Organization for Road Accident Prevention [1], 90% of road accidents are caused by human error (drivers or pedestrians). The introduction of self-driven vehicles offers the opportunity to significantly decrease the number of accidents, or at least the number and severity of personal injuries. We can assume that autonomous vehicles of the near future will be able to drive without any human assistance and, as expected from a well-designed machine, without making mistakes. Such vehicles always comply with the law of the land and cannot become tired or careless.
Nevertheless, complying with the law is not enough. There are (and for a long time will be) environmental problems such as bad weather or a poor road surface. For example, an unexpectedly slippery road surface can cause road tracking accidents regardless of the speed limit. Careless pedestrians (or animals crossing the road) can also cause problems. It is also necessary to prepare for the long period during which automated cars and conventional cars driven by humans co-exist in traffic. As an example, poor light conditions (sun glare, darkness, etc.) may not cause a problem for the self-driven car; however, they increase the probability of frontal accidents caused by conventional vehicles. It would be worthwhile to prepare the self-driven car to avoid these kinds of accidents caused by other participants.
The control of autonomous vehicles relies on a combination of several sensors and actuators [2, 3, 4]. The basic principle of operation is that real-time data obtained from the sensors is processed by sophisticated algorithms, which then operate the actuators. However, not all accidents are avoidable through real-time automation. For example, this process flow cannot fully handle unexpected situations such as a child running out in front of the vehicle. Automated vehicles have the advantage of faster interpretation of the situation and faster reaction times, which help them take action and decrease the severity of a collision [5]. However, these may not be enough to avoid the accident.
Human drivers with local knowledge can often perform better in these situations [6], because they have not only real-time sensory information but also some historical knowledge about specific locations (in the previous example, an unfenced playground near the road). If they know that multiple pedestrian collisions have happened somewhere, they will decrease their speed and try to be more attentive. Several other examples could be given (a road section that is usually extremely slippery on rainy days, etc.).
Autonomous vehicles also have several options to decrease the probability of these accidents. Obviously, decreasing the speed near hazardous locations is the most widely applicable action to avoid dangerous situations, or at least to decrease the severity of an accident. However, several more specific actions are available if there is additional information about the nature of the potential problem. For example, if the autonomous vehicle knows that the road surface quality (unexpectedly slippery surface, potholes, etc.) caused several accidents in the past, it should increase the following distance, choose a safer path (the trajectory at curved road segments), change the suspension settings, and postpone dangerous actions like overtaking. Near hotspots where the number of accidents caused by careless pedestrians is high, the car should take several preventive steps, like increasing the volume of the artificial engine sound (in the case of electric cars) or using the headlights instead of the daytime running lights. As a further example, any vehicle has the freedom to choose its position inside its lane based on the potentially risky situations (to avoid a frontal crash, hitting bicycles, etc.). The list could go on, and it contains some difficult situations. For example, in some cases the autonomous car should assign a higher probability to the event that a car approaching from the minor leg of a junction will not stop, if the historical accident database shows that several accidents happened there in the past for this reason. These actions should be invisible to the passengers, or at least cause as little inconvenience as possible. However, they are not negligible and can be very effective in unexpected dangerous situations.
This knowledge should be integrated into the control of autonomous devices, using the following consecutive steps:
1. Use data-mining methods to localise accident hot spot candidates in the accident database.
2. Identify the reasons for these accidents with statistical or pattern matching techniques.
3. Recommend preventive steps based on these results to decrease the probability and seriousness of further accidents.
This paper focuses on the first two steps. The last preventive steps can then be conducted using further procedures. The organization of the paper is as follows. The next section gives an overview of the already existing developments from the fields of both accident hot spot localization and run-time safety systems. Section 3 presents the novel methodology used to find the accident hot spot candidates and statistically analyze them to find the common accident reasons. The next section contains the evaluation based on a real-world example. Finally, the conclusions are drawn from the results and further developments.
Black spot identification
In road safety management, an accident hot spot (also known as an accident black spot) is a well-defined location of a public road network where road accidents have historically been concentrated [7, 8, 9, 10, 11]. The identification of these hazardous sections is one of the most important steps in preventing further accidents or decreasing their seriousness. From a theoretical point of view, the number of accidents is usually higher at these black spot locations than at other similar sections of the road network. However, this alone is not a sufficient condition. There must also be one or more risk factors causing these accidents; without them, the higher frequency of accidents may be purely coincidental (the number of accidents in a given road interval can be modelled by a Poisson distribution).
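The remark about coincidence can be made concrete with a small calculation. The following is a minimal sketch, assuming that accident counts on a section follow a Poisson distribution with a known expected value; the function name `poisson_tail` and the numbers in the usage note are illustrative and do not come from the paper.

```python
import math


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam).

    k   : observed accident count on the section (assumed input)
    lam : expected accident count for comparable sections (assumed input)
    """
    # P(X >= k) = 1 - P(X <= k - 1)
    cdf_below = sum(
        math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k)
    )
    return 1.0 - cdf_below
```

For instance, if comparable sections are expected to have 2 accidents, `poisson_tail(6, 2.0)` is roughly 0.017, so six or more accidents would be unusual but not impossible by chance alone. This is why a high count only yields a candidate, not a confirmed black spot.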
Accordingly, the results of the various black spot searching methods are just a set of black spot candidates, and further analysis of the risk factors is needed to decide whether a candidate is a real black spot or not. The objective of this paper is to help the autonomous vehicle take the appropriate preventive actions to avoid accidents. The main steps of this process are similar:
1. Identify black spot candidates using historical data and make assumptions about the risk factors.
2. Based on these, the self-driven vehicle will know where the dangerous areas are and what actions to take.
The extensive literature contains various methods for finding accident black spot candidates. All of these are based on historical road accident data. Nowadays, there are several large, reliable and detailed databases of public road network accidents, maintained by national governments. Thanks to this, road safety experts can use ideas from related fields (statistics, data mining, pattern recognition) to find interesting patterns in these databases.
In its basic form, black spot searching can be treated as a simple road accident density calculation with some limits, but there are several more advanced statistical and data mining procedures. Some statistical methods are based on classical confidence intervals, comparing the observed crash count to that of a group of comparison sites [12]. It is worth highlighting the Empirical Bayes method [13], which combines the information contained in accident counts with the information contained in knowing the safety of “similar” entities. It gives precise results in the case of short time intervals, and it is not sensitive to the regression-to-the-mean effect. On the other hand, it needs a good accident estimation function and additional data to determine the “similarity” of sites. Because black spots can be considered as places where the density of accidents is higher than average, there are several data-mining based approaches to find them. Geographical Information Systems (GIS) make it possible to run spatial clustering algorithms like Kernel Density Estimation, K-means [14] or DBSCAN [15]. The benefits of these procedures include the possibility of finding black spots covering multiple roads (roundabouts or junctions) and the easy visualization of the results. However, critics point out that spatial methods treat discrete events as a continuous surface, and that some derived parameters are hard to interpret (for example, how to define accident density in a spatial environment). Each method has its own area of use; a method well suited to highways may not be efficient in built-up areas.
One of the most widely used techniques is the sliding window method [16, 7, 17] based on the following steps:
1. Set the window position to the beginning of the examined road.
2. Count the number of accidents in the section covered by the window.
3. Weight the accidents with constant factors, if necessary.
4. Save the segment as a black spot candidate if the weighted number of accidents is higher than a given threshold.
5. Move the window to the next position (to the location of the next accident).
6. Repeat steps 2–5 until the window reaches the end of the road.
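The steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in the cited works: accidents are given as hypothetical `(position_m, severity)` pairs along a single road, the window has a fixed length, the first window starts at the first accident rather than at the start of the road, and overlapping candidates are not merged.

```python
from bisect import bisect_right


def sliding_window_candidates(accidents, window_m, threshold, weights=None):
    """Return (start_m, end_m, weighted_count) for each window over threshold.

    accidents : list of (position_m, severity) tuples (hypothetical layout)
    window_m  : fixed window length in metres
    threshold : a window is kept if its weighted count is strictly higher
    weights   : optional dict severity -> weight; missing severities count as 1
    """
    weights = weights or {}
    accidents = sorted(accidents)
    positions = [pos for pos, _ in accidents]
    candidates = []
    # Step 5: the window start jumps from one accident location to the next.
    for i, (start, _) in enumerate(accidents):
        # Steps 2-3: weighted count of accidents inside [start, start + window_m].
        end_index = bisect_right(positions, start + window_m)
        score = sum(weights.get(sev, 1) for _, sev in accidents[i:end_index])
        # Step 4: keep the window if it exceeds the threshold.
        if score > threshold:
            candidates.append((start, start + window_m, score))
    return candidates
```

With four accidents at 100 m, 150 m, 180 m and 900 m, a 100 m window and a threshold of 2 unweighted accidents, only the window starting at 100 m is reported. Giving serious accidents a larger weight makes the overlapping windows at 150 m and 180 m exceed the threshold as well, which illustrates why overlapping candidates usually need to be merged afterwards.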
The input parameters of the process are the accident database, section length, weighting factors, and the threshold. A small section length results in a set of very dangerous local areas, while a larger section length results in more global problematic areas. The ratio of the weighting factors makes it possible to choose the significance of seriousness. It is usual to give fatal accidents a weight several times greater than accidents with no or light personal injury. However, some experts disagree because, in black spot management, the number of accidents at each black spot is usually too low to permit a meaningful consideration of accident severity [18]. Lastly, the threshold parameter cannot be clearly defined without knowing the exact purpose of the examination. A lower threshold results in many less dangerous places; in contrast, a higher threshold leads to fewer areas, but the most dangerous ones.
There are many variants and improvements of the basic sliding window method [14, 19]. The most essential modification is the use of a variable window length. This has several advantages: experts do not have to set the window length, since it is enough to set a minimal and a maximal value. Moreover, the method can adapt the length of the sliding window to the patterns of accidents, so it is able to find both small local black spot candidates and larger ones. Another question is whether the sliding window moves with overlapping or not; if it does, it is necessary to manage the overlapping black spot candidates (as one large candidate or as multiple distinct ones).
Some researchers have proposed empirical Bayesian methods, which combine the benefits of the historical and the predicted accident frequencies. These are weighted in a statistical model based on the reliability level of the prediction, which is derived from a safety performance function developed from historical accident data [20].
There are several Kernel Density Estimation (KDE) based methods for the same purpose [21, 22, 14, 23]. In this case, the location of the black spot candidate is represented by a virtual point. Its density is determined according to the distances of the road accidents from this point.
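As an illustration of the idea of a virtual point whose density depends on its distances from the accidents, the following is a minimal one-dimensional sketch. It assumes a Gaussian kernel, positions measured in metres along a single road, and a hand-picked bandwidth; the cited KDE methods may use other kernels, planar coordinates, or network distances, and the function name is hypothetical.

```python
import math


def kernel_density(point_m, accident_positions_m, bandwidth_m):
    """Gaussian kernel density estimate at a virtual point along a road.

    point_m              : position of the virtual point (metres)
    accident_positions_m : positions of recorded accidents (metres)
    bandwidth_m          : kernel bandwidth, an assumed tuning parameter
    """
    n = len(accident_positions_m)
    if n == 0:
        return 0.0
    norm = 1.0 / (n * bandwidth_m * math.sqrt(2.0 * math.pi))
    # Each accident contributes less the farther it is from the point.
    return norm * sum(
        math.exp(-0.5 * ((point_m - x) / bandwidth_m) ** 2)
        for x in accident_positions_m
    )
```

Evaluating this function on a grid of points along the road and keeping the local maxima above some limit would give black spot candidates; the choice of bandwidth plays a role similar to the window length in the sliding window method.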
The already presented micro-level road crash prediction models are traditionally reactive in nature [24]. It is also worth mentioning the related work on the field of macro-level safety performance functions [25]. These macroscopic methods have high efficiency at integrating zone-level features into crash prediction models to identify hot zones in large areas [26].
Several methods are based on clustering techniques. In data mining, a cluster is a group of objects that belong to the same class (they are similar to each other and differ from items outside the cluster). This concept corresponds to the black spot definition: similar accidents (where similarity is based on the location of the accidents and on similar risk factors) are considered as one cluster. The most popular clustering method in the field is K-means, but the authors have obtained several promising results with the DBSCAN data-mining algorithm [15].
Although the sliding window method faces considerable criticism because of its limitations, it can outperform several clustering techniques on some types of roads. It is road section based, which is ideal for autonomous vehicle control, because it is enough to analyze the actual (or planned) route rather than the whole country. Most data-mining based clustering methods are not applicable when the number of accidents is small. The sliding window can solve this issue using appropriate window length and threshold parameters. The method has the additional advantages that it has a low computational demand and is based only on the historical accident database.
Accident prevention by autonomous vehicles is a widely discussed topic. Unfortunately, most papers in the field deal with the far future (when most vehicles will be self-driven and all of them will be part of a densely connected network). Today, implementation-level developments are far from this, but several technologies and research results already exist. These are not closely related to autonomous vehicles: the accident prevention technologies currently in operation are typically built into traditional vehicles (braking assistants, etc.). Nevertheless, it is worth considering these, because most of the presented methods are applicable to self-driven vehicles too.
There are two main classes of autonomous accident prevention systems: passive methods, which send notifications to the driver but do not perform any operation, and active methods, which are able to intervene to avoid accidents. The benefits of passive methods are already proven [27], and these have a large impact on accident prevention. As an example, forward collision/mitigation systems could potentially prevent or mitigate up to 1.2 million crashes in the USA each year [27]. Harpen [28] shows that the cost-benefit ratio is also positive.
In the field of active methods, one of the most researched topics is the development of brake assist systems (BAS), where the potential benefits are a lower risk of injury and less serious injuries for pedestrians [5]. Forward-looking crash avoidance systems continuously scan the road in front of the vehicle and, if any other vehicle or pedestrian is detected, take the appropriate action (brake enforcement or autonomous emergency braking). Bálint et al. [29] present a test-based methodology for the assessment of pre-crash warning and braking systems with very promising results. Most developments in this topic are based on real-time data from the sensors of the vehicle (typically radars and cameras) and do not use historical accident records to fine-tune the mechanisms.
There are also papers from the sub-field of run-time crash prediction models. Hossain et al. [30] conduct a comprehensive review of existing real-time crash prediction models. These are based on the hypothesis that the probability of a crash occurring on a specific road section within a very short time window is predictable using real-time sensor data. The already existing methods typically use several sensors and make real-time decisions based on these. However, the methods usually do not use the already existing accident databases as an input.
Lenard et al. [31] present a method for identifying the common accident scenarios to help the development of autonomous emergency braking protocols. They applied data-mining tools (hierarchical ascending method) to the accidents of two British databases, filtered by given preconditions (urban area, in daylight and fine weather, where pedestrians walk across a road). Two test configurations were defined for each dataset to represent the conditions of the most common accident scenarios. They were able to define some major accident scenarios and classify pedestrian accidents into these groups. A major limitation was that the results did not contain any location information. These scenarios would therefore be useful in the general training phase of a self-driven vehicle, but they cannot be used for real-time accident prediction.
The objective of Nitsche et al. [32] was similar: the development of a data analysis method to identify the critical pre-crash situations at T- and four-legged intersections, as a basis for testing the safety of self-driven vehicles. They used a k-medoids based method for clustering crash data into distinct partitions and then applied the association rules algorithm to each cluster to specify the driving scenarios. The dataset consists of one thousand junction crashes in the UK. As a result, the paper presents thirteen crash clusters describing the main pre-accident situations.
Methodology
Accident reason determination
The localization of road safety hotspot candidates must be followed by the determination of the main accident reasons. Road safety experts use these examinations to design the necessary action plans to decrease the probability of further accidents. The outcome of the analysis can be one of the following options:
1. There is no significant unexpected pattern in the accident attributes, which means that the hot spot candidate is not really an accident black spot; the increase in the number of accidents is just a coincidence.
2. It is possible to recognize a special pattern in the accident attributes, which leads to the identification of one or more reasons. These reasons can be environmental (slippery road, poor visibility), weather related (bad lighting conditions, heavy rain) or human errors (wrong speed, insufficient following distance). Some of these factors can be improved by direct actions (repairing the road surface, improving lighting conditions), and some by indirect actions (placing warning signs, educating drivers).
From the viewpoint of autonomous vehicles (and of this paper), the actions required to eliminate accident risk factors are not important, because the vehicle itself has no way to solve these problems. However, the main reasons identified by the previous steps are useful for avoiding crashes. These avoidance methods are useful for two reasons:
1. Directly, the self-driven vehicle is able to avoid accidents caused by itself. For example, if the historical data series show that, in a given location, the number of accidents caused by a slippery road is significantly higher than expected, it should decrease its speed to reduce the probability of this event.
2. Indirectly, the self-driven vehicle is able to avoid accidents caused by other drivers or other participants. For example, there are numerous locations in the public road network where the number of accidents caused by careless pedestrians is significantly higher than expected. These accidents are caused by independent participants, but the autonomous vehicle can proactively try to decrease this risk. Using visual or auditory warnings or decreasing speed should be a good way to avoid these accidents or decrease their seriousness.
While the localization of hotspot candidates is a well-automated area, with several methods and readily available applications, the further analysis is not as well developed. This phase is usually done manually by road safety experts. They have to travel to the scene and check the environmental conditions to decide on actions. This process is supported by some rules, but it is mostly manual and requires the pattern matching capability of the human mind. Obviously, these methods are not applicable to an autonomous vehicle; it is necessary to find a fully automated process.
This paper presents a more advanced method to find unexpected patterns in black spot candidates. This method is based on the following consecutive steps:
1. Analyse all known accidents with respect to all possible accident reasons, and assign each accident a score value showing how much it is affected by the given factor.
2. Calculate the distribution of these score values for the given black spot candidate and for all known accidents.
3. Use a selected statistical test to check whether the distribution of this variable is significantly different for the black spot and for the whole population.
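The first two steps can be illustrated with a small sketch. It is only one possible reading of the scoring idea: the weight table, the field names (`road_surface`, `weather`) and the additive scoring rule below are hypothetical and are not taken from the paper's own tables or formulas.

```python
from statistics import mean, variance

# Hypothetical weights for one accident reason ("slippery road"):
# (attribute name, attribute value) -> contribution to the score.
SLIPPERY_WEIGHTS = {
    ("road_surface", "icy"): 1.0,
    ("road_surface", "wet"): 0.5,
    ("weather", "snow"): 0.5,
}


def reason_score(accident, weight_table):
    """Step 1: score one accident record (a dict) for one accident reason."""
    return sum(
        weight
        for (field, value), weight in weight_table.items()
        if accident.get(field) == value
    )


def summarize(scores):
    """Step 2: describe a sample of scores by item count, mean and variance."""
    return len(scores), mean(scores), variance(scores)
```

Under these assumptions, the scores of the accidents inside a black spot candidate form one sample and the scores of all known accidents form the other; `summarize` yields the three quantities that a two-sample test needs for each of them.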
It is possible to define several independent accident reasons, like “slippery road”, “bad visibility”, “careless pedestrians”, etc. These are named as
Using this table, it is possible to calculate an overall score, of a given
where
Based on these functions, it is possible to calculate the score values for all reasons and all accidents at the examined hotspot candidate. The result of this process is
where
It can be effective to make decisions using some threshold values. For example, a
where
The disadvantage of the previously presented threshold-based approach is that it is hard to determine the
It may be better to directly compare the distributions of
The objective of the test is to determine if the mean value of the accident reason score in a given black spot candidate is higher than the same value for all accidents or not. According to this, the alternative hypothesis Eq. (5) states that the mean score of the black spot candidate minus the mean score of the whole population is greater than zero. The null hypothesis Eq. (4) covers all other possible outcomes.
where:
Welch’s
The statistic
where:
The degree of freedom is calculated by Eq. (7)
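For reference, the textbook form of the one-sided Welch test described above is given here. The symbols are assumptions chosen for this note and may differ from the paper's own notation: the subscript B denotes the black spot candidate sample and A the sample of all accidents, with sample means \bar{x}, sample variances s^2 and sample sizes n.

```latex
% Hypotheses: population mean score of the black spot vs. all accidents
H_0:\ \mu_B - \mu_A \le 0, \qquad H_1:\ \mu_B - \mu_A > 0

% Welch's test statistic
t = \frac{\bar{x}_B - \bar{x}_A}{\sqrt{\dfrac{s_B^2}{n_B} + \dfrac{s_A^2}{n_A}}}

% Welch-Satterthwaite approximation of the degrees of freedom
\nu \approx \frac{\left(\dfrac{s_B^2}{n_B} + \dfrac{s_A^2}{n_A}\right)^{2}}
                 {\dfrac{\left(s_B^2 / n_B\right)^{2}}{n_B - 1} + \dfrac{\left(s_A^2 / n_A\right)^{2}}{n_A - 1}}
```

The null hypothesis is rejected at significance level \alpha when t exceeds the upper \alpha quantile of Student's t distribution with \nu degrees of freedom.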
Based on these
In the case of rejection, there is evidence that the examined accident reason is related to the accidents. If the null hypothesis cannot be rejected, there is no evidence for that statement.
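The test can be sketched with the Python standard library alone. This is a minimal sketch of the standard Welch statistic and Welch-Satterthwaite degrees of freedom, not the authors' implementation; the function names are hypothetical, and the critical value is assumed to be supplied by the caller (for example from a t-table for the computed degrees of freedom).

```python
import math
from statistics import mean, variance


def welch_t(spot_scores, all_scores):
    """Return (t, df) for Welch's unequal-variance test.

    spot_scores : reason scores of accidents in the black spot candidate
    all_scores  : reason scores of all known accidents
    Both samples need at least two items.
    """
    n1, n2 = len(spot_scores), len(all_scores)
    a = variance(spot_scores) / n1  # squared standard error, black spot
    b = variance(all_scores) / n2   # squared standard error, population
    if a + b == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(spot_scores) - mean(all_scores)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


def reject_null(t, critical_value):
    """One-sided decision: reject H0 if t exceeds the critical value."""
    return t > critical_value
```

If SciPy is available, `scipy.stats.ttest_ind(spot_scores, all_scores, equal_var=False, alternative="greater")` should give the same statistic together with a p-value, which avoids looking up critical values by hand.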
Database
Public road accidents of the Hungarian road network from 1 January 2010 to 31 December 2018 (305,114 accidents in total) were used to test the proposed method. The database is maintained by the Hungarian Central Statistics Department (HCSD) based on the data recorded by accident scene investigators. It contains all accidents with personal injury, classified into three categories: fatal, serious, and light. Accidents without personal injury are not included in the database.
This practical evaluation focuses on one accident reason (
Hot spot searching
In the hot spot candidate search phase, the following sliding window parameters were used:
- Minimum window size: 100 m.
- Maximum window size: 1000 m.
- Minimum accident count: 5.
- Minimum accident density: 0.01 weighted accidents/m.
The accident weights that were used:
- Fatal accidents: 10.
- Seriously injured: 3.
- Lightly injured: 1.
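To show how these parameters and weights interact, the following sketch checks a single window against them. It rests on assumptions that the text does not spell out: the minimum accident count is taken as the unweighted number of accidents, the density as the weighted sum divided by the window length, and both limits as inclusive. The names are hypothetical.

```python
SEVERITY_WEIGHTS = {"fatal": 10, "serious": 3, "light": 1}
MIN_WINDOW_M, MAX_WINDOW_M = 100, 1000
MIN_COUNT = 5        # accidents per window (assumed to be unweighted)
MIN_DENSITY = 0.01   # weighted accidents per metre


def is_candidate(severities, window_m):
    """Check one window against the parameters listed above.

    severities : severity labels of the accidents inside the window
    window_m   : window length in metres
    """
    if not MIN_WINDOW_M <= window_m <= MAX_WINDOW_M:
        return False
    weighted = sum(SEVERITY_WEIGHTS[s] for s in severities)
    return len(severities) >= MIN_COUNT and weighted / window_m >= MIN_DENSITY
```

Under these assumptions, five light accidents within 400 m give a density of 0.0125 and qualify, while the same five accidents spread over 1000 m give 0.005 and do not. One fatal and four light accidents over 1000 m give 0.014 and qualify because of the higher weight of the fatal accident.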
The result of the sliding window method is a set of 146 black spot candidates.
The HCSD database contains more than two hundred fields, in four categories:
- Accident attributes (time, location, nature, seriousness, etc.).
- Participant attributes (vehicle/pedestrian, speed, direction, age, etc.).
- Casualty attributes (location, age, etc.).
- Environmental attributes (lighting conditions, road surface, weather, etc.).
Most of these are not related to a slippery road surface reason, therefore, the
Three of the attributes have notable
“Slippery Road Surface” (
“Slippery Road Surface” (
“Slippery Road Surface” (
To apply the Welch-test, the next step is the generation of sample sets. The first sample contains the
Number of items ( Mean ( Variance (
It is necessary to iterate over all black spot candidates and calculate these values for all of them. Based on these, it is possible to apply the Welch-test resulting in the statistical
Table 4 shows the black spot candidates where the null hypothesis was rejected. This means that the mean of the
Table 4. Accident black spots where the null hypothesis was rejected
As an example, it is worth checking the first black spot on the list. The detailed information concerning the accidents is presented in Table 5.
Table 5. Accidents of road no. 8 from section 24
It is evident that all accidents are affected by one or more slippery road related attributes; however, no special environmental factor (such as a sharp turn of the road) is visible on the map. This pattern differs significantly from the expected one; therefore, there should be some general problem at this location on the road. The examination and elimination of these reasons is the task of road safety experts [34]. Until then, it is worth taking preventive steps to decrease the chance of further accidents. Using this data, an autonomous vehicle can adapt its control to this situation, for example, by decreasing its speed.
This paper presents a novel automated method, which enables autonomous vehicles to use historical accident data, to quickly process information regarding the potential road risks. The standard sliding window technique is used to find the black spot candidates, where the number of accidents is greater than expected.
The proposed method has additional processing steps that make assumptions pertaining to the main accident reasons. The possible reasons are checked, one-by-one, assigning a score to all accidents. If the distribution of these values significantly differs from the expected distribution (calculated by statistical methods based on the entire accident database), the black spot candidate can be considered as affected by the given factor.
The output of this process is a list of dangerous locations on the public road network and a prediction concerning the reasons. These results can be the basis of a further process in which autonomous vehicles take automatic preventive steps. The vehicles can use this database in the route planning phase (avoid black spots if possible) and in the driving phase (take the appropriate preventive steps) [35]. It is expected that this knowledge will decrease the number and seriousness of road accidents [35].
Limitations and future research
A known limitation of the presented method is that, owing to the capabilities of the standard sliding window method, it is not able to find hot spots in junctions. This is especially disadvantageous in urban environments, where most accidents occur at crossroads [36]. As a further development, it is worth extending the research scope with other black spot search methods, for instance a 2D sliding window [37] or other data-mining techniques such as DBSCAN [38].
This data should be built into the control process of the self-driven vehicle to fine-tune its movement strategy and avoid risky situations. For example, in the case of pedestrian accidents, the self-driven vehicle should increase the volume of the engine sound; to avoid frontal accidents, it is worth increasing the brightness of the headlights; and, of course, decreasing the speed may decrease the seriousness of almost all accident types. Building an expert system that gives such advice based on the historical data will be the next step of our research project.
Acknowledgments
The research presented in this paper was carried out as part of the EFOP-3.6.2-16-2017-00016 project in the framework of the New Széchenyi Plan. The completion of this project is funded by the European Union and co-financed by the European Social Fund. Sándor Szénási would like to thank EPAM Systems for their support.
