Explainable Hidden Markov Model for road safety: a case of road closure recommendations in extreme weather conditions

Abstract

Road safety recommendations require decision-making tools that accommodate weather uncertainties. Operation and maintenance of transport infrastructure is one of the sub-areas that requires attention because of its importance for road quality. Several investigations have proposed artificial neural networks and Bayesian networks to assess road risk. These methods use historical accident records to generate useful road safety metrics; however, there is less information on how climatic factors and road surface conditions affect the models that generate recommendations for safe traffic. In this research, a Bayesian network in the form of a Hidden Markov Model, together with the Apriori method, is proposed to evaluate the open and closed states of the road. The weather and road surface conditions are explicitly written as a sequence of latent variables from observed data. Different weather variables were studied in order to evaluate both road states (open or closed). The results showed that the Hidden Markov Model provides explicit insight into the sequential nature of the road safety conditions but does not provide a directly interpretable result for human decision making. We therefore complement the study with the Apriori algorithm using categorical variables. The experimental results show that combining the Hidden Markov Model and the Apriori algorithm provides interpretable rules for deciding whether to open or close the road in extreme weather conditions, with a confidence higher than 90%.

Keywords

Road safety analysis, hidden Markov models, Apriori methods

1 Introduction

Climate change poses a significant threat to several economic activities in developing and developed countries. Most studies have focused on the long-term impacts of climate change and severe weather events on the transportation sector. However, despite the importance of roads and public infrastructure, less attention has been paid to the potential consequences of such events for the operation and maintenance of transportation infrastructure [1, 2].

Experts have been warning about the systematic changes in the frequency of extreme climate and weather events. These events may have a direct or indirect impact on the roads. Indirect impacts can be related to traffic loading as a result of demographic changes [3]. Direct impacts are mainly due to weather conditions such as temperature, rain, wind and snow. Yuan and Li wrote a review of studies on traffic prediction. The authors focused on the spatio-temporal data layer and its application to intelligent transportation. In particular, the authors discussed pre-processing, prediction and traffic application based on the existing techniques for addressing these challenges [4].

Conversely, a changing pattern in the frequency of extreme events requires incorporating such uncertainties into the operational and emergency management of roads and highways. Generating accurate early warnings for safer driving and for the operational maintenance of transportation infrastructure in extreme weather requires decision-making tools that accommodate weather uncertainties. Previous works have used binary logit/probit models [5], spatial and temporal correlations [6], Markov switching models [7], genetic algorithms [8], artificial neural networks and Bayesian networks, among other machine learning methods, to analyse road safety and provide recommendations for driving [9–11]. In particular, Sharma et al. proposed efficient road surface monitoring using an ultrasonic sensor and an image processing technique to improve the classification and detection accuracy of road surface conditions [12]. Dongyao et al. [10] proposed a novel method based on the BA-BP (Bat-Back Propagation) algorithm, which is applied to five kinds of road features for safety condition evaluation (roughness, curvature, obstacle width to height ratio, obstacle effective area ratio and obstacle coefficient). On the other hand, Del Vecchio et al. developed an algorithm to assess the occurrence of road ice and its potential application to determine the surface temperature of agricultural land. The algorithm works operationally every day in real time and has been tested for different weather conditions [13]. The basic components of a probabilistic assessment tool include random variables associated with hazards and assets as well as the utility of decisions. In the context of climate change, hazards represent changing weather conditions and extreme events. Conversely, assets represent the infrastructure and its economic value in terms of the transportation system and the cost of replacement. Probabilistic safety analysis is a well-known technique for evaluating the safety of a system.
Bayesian networks have been proposed for the probabilistic evaluation of road and safety analysis [14]. The authors developed a probabilistic safety analysis (PSA) methodology for highways and roads. The main idea consists of identifying and reproducing all the elements encountered when travelling the road, identifying the direct dependencies among variables to reproduce the qualitative structure of the Bayesian network, and using human error to model how the driver’s behaviour evolves with driving. Bayesian networks are suitable for this problem because they allow the multi-dimensional variables to be expressed as several factors that follow conditional independence properties. Furthermore, Bayesian networks provide a simple causal model that surpasses the limitations of other competing frameworks for risk assessment [9]. Bayesian networks have also been used to analyse the safety of other transportation infrastructure such as railway lines [15], road accidents [16, 17] and causation analysis of road accidents [18], among other applications. Most research on probabilistic road safety analysis has considered explanatory variables such as driver attention, road geometry, weather conditions and traffic volume [19]. On the other hand, weather variables such as temperature and rainfall can be associated with driver behaviour and road conditions.

In particular, significant correlations between the weather variables and the aggregate number of injury conditions have been found [20, 21]. Relevant effects of adverse weather conditions on traffic crashes have also been reported. The authors compensated the number of crashes with the low traffic volumes observed during adverse weather conditions (which to a greater extent can be used to explain the number of crashes). The authors reported increased crash rates for wet pavement (more than 300%), rain (71%) and snow (84%). However, considering the effect of a single variable in isolation might not reveal the interaction with other weather variables (low temperature, fog, etc.) and their effects on visibility and road conditions.

2 Related works

Building accurate and explainable models for extreme weather events is a complex task. An approach to reveal patterns that can deliver meaningful information for decision making was developed in [22]. The authors used several data mining techniques such as Artificial Neural Networks and Self-Organizing Maps in conjunction with rule-based systems such as C5.0 and CART. The results of their work indicate that the data mining algorithms can be used to explain the weather patterns that lead to wind gusts at a specific location.

Traditional weather forecasting methods use continuous data as an input. Instead, a fuzzy inference engine transforms the crisp data into fuzzy sets (linguistic variables) that can be used to generate explainable rules. Conversely, neuro-fuzzy inference systems were proposed in [23, 24]. However, fuzzy sets can handle ambiguity in the definition of the categorical variables (such as low-medium-high) rather than the uncertainty of the event.

An accident prevention system for urban environments using Hidden Markov Models (HMM) has been previously studied in [25]. The system makes use of injury statistics and related weather conditions obtained from nearby stations to estimate a latent variable that represents the risk of a crash. The authors performed evaluations on simulation results and demonstrated the effectiveness and robustness of their approach. A large-scale road safety evaluation was proposed in [26]. The authors considered extreme value theory to estimate the probability of extreme events and used microwave Doppler radar data for calibration.

Road safety models that produce interpretable predictions have received less attention. Yu et al. studied factors that influence the patterns of road crashes using association rules [27]. The study shows how the Apriori algorithm is used to mine the significant association rules between the severity and the factors influencing the occurrence of crash accidents. Mazouri et al. propose a data mining approach combined with multi-criteria decision analysis for analyzing road accident data. The authors show significant correlations between the conditions that led to accidents using multi-criteria decision methods [28]. In addition, Zou et al. explore the impact of climate and extreme weather on fatal traffic accidents. A negative binomial model and a log-change model are proposed to analyze the impact of various factors on fatal traffic accidents, showing that both models can provide accurate fitting results and that the climate variables can significantly affect the frequency of fatal traffic accidents [29].

In this paper we propose an unsupervised approach to obtaining explainable rules to issue early warnings for the open or closed state of the road. In the proposed approach, we assume that there are historical records generated by in-situ weather stations. A Hidden Markov Model (HMM) with continuous meteorological data is used along with the Apriori algorithm with categorical data. The combined method provides a predictive model for the open or closed state of the road, while also delivering explainable rules for decision-making. The main contributions of this approach can be summarized as follows:

Early warnings for the open or closed state of the road are treated as a latent binary indicator variable that can be estimated using a continuous HMM from the observed weather variables.

The Apriori algorithm is used to build frequent itemsets from the categorical weather variables (which can be derived from the continuous data) along with the indicators of early warnings for the open or closed state of the road.

The resulting itemsets achieve strong confidence in a real-world database and provide explainable rules for decision making.

The contributions of this article aim to provide rules for decision-making. The rules can be used proactively to help decide the opening or closing of the customs border between Chile and Argentina, which should operate every day of the year. From May onwards, this border crossing experiences regular storms and snowfalls that interrupt vehicular and cargo traffic between both countries, endangering business, communications and integration between the two countries [30].

3 Materials and methods

3.1 Energy balance in a terrestrial surface layer

Under extreme weather conditions and sudden changes in the weather, the control and safety of road traffic is a task of great difficulty. Prior knowledge of the state of the roads is of utmost importance to ensure maintenance, provide safety for drivers, and sustain economic activity. Several studies in this regard have been developed previously; however, the large number of variables included in the different studies makes it difficult to develop a model that effectively describes the state of the roads. The earth’s surface constitutes a highly influential interface for processes in the atmosphere and for exchanges between the surface and the atmosphere. Likewise, climatic and meteorological models that help describe the surface condition of roads must be based on the conservation or balance of energy on the earth’s surface under different atmospheric conditions, water availability and land use, among others, which is also useful when calibrating and evaluating their performance [31, 32]. The energy balance is based on a one-dimensional formulation that considers the four main forms of energy transfer through a surface layer [33, 34], namely radiation, conduction, sensible heat flux, and latent heat flux, and can be expressed as: $R_{n} - L_{e} E - H + L_{p} F_{p} - G + A_{h} = \frac{\partial W}{\partial t}$ (1)

where the incoming energy flow is taken as positive and the outgoing as negative. R_n is the net radiation on the surface, L_eE is the latent heat flux, H is the sensible heat flux, L_p is the thermal conversion factor for fixation of carbon dioxide, F_p the specific flux of CO₂, G the energy flux leaving the lower boundary layer, A_h the energy advection into the layer, and $\frac{\partial W}{\partial t}$ the rate of energy storage per unit of area in the layer. In the special case where a snow or ice layer is present, the last term may include the energy consumed by fusion, and L_e may be replaced by the latent heat of sublimation L_s. However, for practical purposes several of these terms may be omitted and Eq. (1) then assumes a simpler form.

The net radiation can be broken down into: $R_{n} = R_{s} (1 - α_{s}) + ɛ_{s} R_{ld} - R_{lu}$ (2) where R_s is the short-wave global radiation or radiant flux resulting directly from the solar radiation, α_s the albedo of the surface (ratio of the global short-wave reflected radiative flux and the flux of the corresponding incident radiation), ɛ_s the emissivity of the surface, R_ld the downward long-wave radiation (radiant flux downward resulting from the emission of the atmospheric gases and the land and water surfaces of the earth) and R_lu the upward long-wave radiation (radiant flux resulting from the emission of the atmospheric gases and the water surfaces of the earth). The upward term is subtracted because it leaves the surface layer.

The short-wave radiation can be estimated as $R_{s} = R_{sc} \left[ a + (1 - a) \frac{n}{N} \right]$ (3) where a is a constant of the order of 0.235 at Stockholm [33], R_sc is the short-wave radiation under clear skies, and $\frac{n}{N}$ the fraction of sunshine hours. The two components of the terrestrial radiation at the earth’s surface can be considered separately. The downward radiation from the atmosphere, R_ld, is $R_{ld} = ɛ_{ac} σ T_{a}^{4} (1 + a m_{c}^{b})$ (4) and the upward radiation from the surface, R_lu, is written in terms of the surface temperature T_s and the Stefan-Boltzmann constant σ as $R_{lu} = ɛ_{s} σ T_{s}^{4}$ (5) where ɛ_ac is the atmospheric emissivity under clear skies, T_a is the air temperature near the ground, m_c is the fractional cloud cover of the sky and a and b are constants. In general, many case studies use the energy balance on the earth’s surface to obtain a model of temperature fluctuations.

Fig. 1

Energy balance schema in a terrestrial surface layer.
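The radiation components of Eqs. (3) to (5) can be evaluated directly. The following is a minimal Python sketch, not part of the original study. The function names and all numerical inputs are hypothetical, and the cloud constants passed as a_cloud and b_cloud are placeholders because the text does not give their values. Only a = 0.235 in Eq. (3) is taken from the text.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def shortwave_radiation(r_sc, sunshine_fraction, a=0.235):
    """Eq. (3): R_s = R_sc * [a + (1 - a) * n/N]."""
    return r_sc * (a + (1.0 - a) * sunshine_fraction)


def downward_longwave(t_air_k, cloud_fraction, eps_ac, a_cloud, b_cloud):
    """Eq. (4): R_ld = eps_ac * sigma * T_a^4 * (1 + a * m_c^b)."""
    clear_sky = eps_ac * SIGMA * t_air_k ** 4
    return clear_sky * (1.0 + a_cloud * cloud_fraction ** b_cloud)


def upward_longwave(t_surf_k, eps_s):
    """Eq. (5): R_lu = eps_s * sigma * T_s^4."""
    return eps_s * SIGMA * t_surf_k ** 4


# Hypothetical inputs: clear-sky short-wave radiation of 600 W/m^2,
# half of the possible sunshine hours, surface at 0 degrees C.
r_s = shortwave_radiation(600.0, 0.5)
r_lu = upward_longwave(273.15, 1.0)
r_ld_clear = downward_longwave(273.15, 0.0, 0.7, 0.2, 2.0)
```

Temperatures are expected in kelvin. With a cloud fraction of zero the cloud correction in Eq. (4) vanishes, so the placeholder constants have no effect on r_ld_clear.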

3.2 Hidden Markov Models

Hidden Markov Models (HMMs) are probabilistic graphical models for sequential data [35]. In this context, weather data arrives in batches and Markov models are well suited for this task since they allow a factored distribution for a sequence of latent variables to be written explicitly from observed data. The HMM consists of a discrete-time Markov chain with state x_t representing the hidden road conditions and an observation model with state z_t representing the weather variables. The factored distribution can be written as follows:

$\begin{matrix} p (x_{1}, x_{2}, \dots, x_{T}, z_{1}, z_{2}, \dots, z_{T}) = \\ p (x_{1}) \prod_{t = 2}^{T} p (x_{t} | x_{t - 1}) \prod_{t = 1}^{T} p (z_{t} | x_{t}) \end{matrix}$ (6)

The hidden and observed states form a state space model, which is a generative representation for the noisy observations. The state space model consists of a transition probability p (x_t|x_{t-1}) for the hidden states, an initial probability p (x₁) and an emission probability p (z_t|x_t) that accounts for the conditional probability of observing z_t given x_t. The hidden states x_t are discrete and the emission distributions are continuous. Figure 2 shows the Bayesian network representation for the HMM.

Fig. 2

Bayesian network representation. The hidden states of the road x₁, …, x_T are observed through the observed weather variables z₁, …, z_T
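To make the factorisation concrete, the sketch below evaluates the joint log-probability of a state sequence and an observation sequence, using the standard HMM form in which every observation z₁, …, z_T contributes one emission term. It is an illustration only: it assumes a single observed variable with a univariate Gaussian emission per state, and the function names and parameters are hypothetical rather than taken from the study.

```python
import math


def gaussian_logpdf(z, mean, std):
    """Log-density of a univariate Gaussian emission p(z_t | x_t)."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - (z - mean) ** 2 / (2.0 * std ** 2)


def safe_log(p):
    """log(p) that returns -inf for p = 0 instead of raising."""
    return math.log(p) if p > 0.0 else float("-inf")


def joint_log_prob(states, observations, start, trans, means, stds):
    """log p(x_1..x_T, z_1..z_T) for a discrete-state HMM with Gaussian emissions."""
    # Initial term p(x_1).
    logp = safe_log(start[states[0]])
    # Transition terms p(x_t | x_{t-1}) for t = 2..T.
    for prev, cur in zip(states[:-1], states[1:]):
        logp += safe_log(trans[prev][cur])
    # Emission terms p(z_t | x_t), one per observation.
    for x, z in zip(states, observations):
        logp += gaussian_logpdf(z, means[x], stds[x])
    return logp
```

Working in the log domain avoids numerical underflow when T is large, which is the usual reason for summing logarithms instead of multiplying probabilities.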

3.3 Association rules

The study of associations is fundamental to different tasks that involve correlations, classifiers, associations, and groupings, to name a few [36, 37]. Association rules are one of the most important branches of data mining. The Apriori algorithm was developed by Agrawal and Srikant in 1994 [38]. This technique can be used to find correlations and patterns within a set of elements [39], with applications such as the identification of pathological conditions [40] and recommendation systems [41], among others. The algorithm starts by mapping the database to find frequent elements from the set that appear together several times (item-sets). Then, the algorithm reduces the number of item-sets by eliminating rare elements and combining item-sets with a single element.

Several strategies for database scanning have been proposed. The Candidate Generation-based (CGB) strategy is breadth-first [38] and the Pattern Growth-based (PGB) strategy is depth-first [42]. Also, various efforts have been made to improve and optimize the Apriori algorithm [36, 43–46]. In [43] a novel and efficient algorithm for mining frequent patterns in large databases was presented. More recently, [36] presented a new scheme that overcomes some of the disadvantages of the original Apriori algorithm. The improved method takes into account the time complexity, number of database scans, memory consumption, and the pattern evaluation rules. [44] also proposed an improvised Apriori algorithm using an FP-tree data structure. The method reduces the memory space and time complexity of the algorithm by considering a frequent pattern tree.

3.3.1 Apriori algorithm

Apriori is a simple algorithm and can be summarized as follows:

Scan the database to find the candidate k-itemsets and their support.

Verify that the support of each k-itemset is greater than or equal to the minimum support threshold.

If an itemset is not frequent, prune it from the search space.

Repeat the procedure for (k + 1)-itemsets until no more frequent patterns are found.
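The steps above can be sketched in plain Python. This is a minimal level-wise implementation written for illustration, not the apyori code used in the study. The toy transactions are hypothetical and only borrow category names from the weather tables.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # k = 1: every distinct item is a candidate.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep the candidates whose support reaches the threshold.
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        # Join frequent k-itemsets into (k + 1)-candidates and prune any
        # candidate that has an infrequent k-subset (the Apriori property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical toy data: categorical weather labels plus a road state.
toy = [
    {"snow m.", "with snow", "Closed"},
    {"snow m.", "wet", "Closed"},
    {"none", "dry", "Open"},
    {"none", "dry", "Open"},
    {"light", "wet", "Open"},
]
frequent = apriori(toy, min_support=0.4)
```

Each pass rescans all transactions to count support, which is simple but costly. The FP-tree variants cited above avoid most of these scans.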

3.4 Explainable Machine Learning

An interpretable machine learning system can be defined in terms of the model’s ability to explain or to present its outcomes in a form that can be easily understood by a human counterpart [47]. This requirement is not necessarily met by the traditional metrics used to evaluate model performance, and therefore this topic has remained elusive in the machine learning literature [48].

Examples of interpretable systems are simple models such as association rules, that are intrinsically able to provide explanations along with their predictions. When the model does not provide such insights, post-hoc techniques can be applied after training and thus can be considered as being model-agnostic.

LIME (Local Interpretable Model-Agnostic Explanations) is one example of post-hoc processing [49]. In this case, the authors propose local surrogate models that are used to explain individual predictions. Another method named SHAP (SHapley Additive exPlanations) is also inspired by local surrogate models [50]. SHAP provides explainable predictions using additive feature importance measures.

Recently, Chakraborty et al. [51] applied an explainable machine learning framework to quantify the importance of hydro-climatic variables to impute hourly reference evapotranspiration. Also, Chaibi et al. [52] proposed an interpretable machine learning model for daily global solar radiation prediction. The authors used the SHAP method to identify the influence of each one of the features and their interactions using the period of time as one of the features. Parsa et al. [53] developed a methodology for real time accident detection using Gradient Boosting and SHAP for feature analysis. Due to the class imbalance (accidents versus non-accidents records), the authors used oversampling techniques in order to obtain accurate predictions.

These model-agnostic techniques do not take into account the time domain, which is necessary for evaluating road safety. Post-hoc methods can be applied to temporally aware models such as recurrent neural networks (e.g. using model distillation). However, these techniques are usually used in the context of supervised learning, which is not directly applicable to the road safety analysis case. To the best of the authors’ knowledge, the method proposed in this paper is the first approach that combines temporal Bayesian networks such as HMMs with Apriori rules to provide explainable decision making for road safety recommendations in extreme weather conditions.

3.4.1 Proposed model

The proposed approach was implemented using Python 3.7 and R. In particular, the following combination of software libraries was used:

HMM training and inference: hmmlearn 0.2.4 https://hmmlearn.readthedocs.io/en/latest/

Data manipulation: pandas 0.2.4 https://pandas.pydata.org/.

Data analysis: R project 3.6.3 https://www.r-project.org/.

Apriori algorithm https://github.com/ymoch/apyori.

The model consists of a continuous HMM fitted to the observed weather variables. The latent states of the HMM represent the road safety indicators, which are then used along with the categorical weather variables to build frequent itemsets. The time complexity for training the HMM is O (K²T), with T being the number of data points and K = 2 the number of road safety indicators. On the other hand, the time complexity of the Apriori algorithm is O (2^D+1), with D = 9 being the number of categorical variables (see Table 2). The overall complexity of the proposed approach remains exponential with respect to the number of categorical variables; therefore, model refinement could make use of a distributed FP-growth representation [54].

The HMM model is used to produce latent factors associated with the current road conditions (road safety indicators). These factors are then used to produce explainable recommendations by means of the Apriori algorithm. Figure 3 shows a depiction of the proposed approach for explainable decision making.

Fig. 3

Decision making is based on the categorical weather variables and the estimated road safety indicators obtained from the HMM.
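The hand-off between the two stages can be illustrated with a short sketch: each time step contributes one transaction made of its categorical weather labels plus the road state decoded by the HMM. The helper name, the example rows and the mapping of state index 0 to "Open" and 1 to "Closed" are assumptions made for illustration. The text does not specify how the transactions were assembled.

```python
def build_transactions(categorical_rows, decoded_states, state_names=("Open", "Closed")):
    """Attach the decoded road state to each row of categorical weather labels."""
    if len(categorical_rows) != len(decoded_states):
        raise ValueError("one decoded state is needed per observation")
    transactions = []
    for row, state in zip(categorical_rows, decoded_states):
        # Missing labels (None) are dropped so they never become items.
        items = [label for label in row if label is not None]
        items.append(state_names[state])
        transactions.append(items)
    return transactions


# Hypothetical example: two time steps with labels in the style of Table 2.
rows = [
    ["snow m.", "with snow", "ice w."],
    ["none", "dry", None],
]
states = [1, 0]  # assumed output of the HMM decoding step
transactions = build_transactions(rows, states)
```

A list of item lists in this shape is the usual input for association-rule mining, so the result can be passed to an Apriori implementation without further conversion.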

3.5 Data

3.5.1 Accidents

Accident records were obtained from the data observatory of the Chilean National Committee of Transportation Security (CONASET). The dataset contains N = 704 records from the years 2015-2019 along the CH-115 route in the Region of Maule, Chile. The accident types were collisions, injuries and rollovers, while the causes vary from alcohol ingestion, mechanical failures, driver’s attention and loss of control, among others. Figure 4 shows the distribution of the number of accidents given the year, month and weather conditions. Most accidents occur in summertime when there is more traffic congestion. Consequently, accidents tend to occur under clear sky conditions during the summer period of the southern hemisphere.

Fig. 4

Distribution of number of accidents per year/month/weather conditions

In winter time, the local authorities send road closure notifications in order to prevent accidents. The observed records therefore do not reflect the effect of adverse weather on the number of accidents. Figure 5 shows the location and weather conditions of the accidents on the CH-115 route.

Fig. 5

Accidents in CH-115 route.

3.5.2 Weather

Weather and road conditions data were recorded from 2017 to 2020 for the winter period in the southern hemisphere (May - October) and obtained from two weather stations located at kilometre 121 (Lo Aguirre, 1230 meters above sea level) and kilometre 138 (Los Condores, 2000 meters above sea level) on the CH-115 route (Ministerio de Obras Públicas MOP, Pehuenche international road, Región del Maule - Chile). The probabilistic assessment of risk is modelled as a hidden state and the observed variables are the combined information from both weather stations. Figure 6 shows the location of the weather stations along the CH-115 route.

Fig. 6

CH-115 route in the Region of Maule in Chile.

The observed weather data contain numerical and categorical variables, which are shown in Table 1 and Table 2 respectively.

Table 1

Numerical observed data for meteorological conditions and road surface state

Observed data	Unit
Surface temperature Site (i), i = 1, 2	°C
Thermal conductivity Site (i), i = 1, 2	-
Chemical product Site (i), i = 1, 2	g/m²
Liquid Freezing temperature Site (i), i = 1, 2	°C
Surface temperature Deep 6cm Site (i), i = 1, 2	°C
Solid freezing temperature Site (i), i = 1, 2	°C
Water layer thickness Site (i), i = 1, 2	mm
Snow layer thickness	mm
Air temperature	°C
Dew point temperature	°C
Relative air humidity	%
Intensity of precipitation	mm/h
Wind speed	m/s
Wind direction	Deg
Visibility	m
Accumulated precipitation	mm
Global radiation	W/m²
Maximal Wind speed	m/s

Table 2

Categorical observed data for meteorological conditions and road surface state

Observed data	Categories
Surface state Site (i), i=1,2	(with ice, with snow, with frost, hum chim, damp, wet, dry)
Alarm state Site (i), i=1,2	(frost w., ice w., rain w., ice alarm, non w.)
Rain status	(light, moderate, snow I., snow i., snow m., none)
Current time	(hiGrL, rain, drizzle, snow, rainfall, non rainfall)
Visibility sensor status	(accept, Rd1, Rd2)
Rain	(off, on)
Atmosphere State	(no OK, all OK)

4 Results

This article proposes an explainable Hidden Markov Model as a method to provide safety recommendations in extreme weather conditions. In order to evaluate the feasibility of the approach, weather and road surface data are used to train the model and provide interpretable rules. As previously mentioned, extreme weather due to climate change has a direct impact on the transportation and infrastructure sectors. Therefore, decision-making tools that accommodate the uncertainty due to a changing pattern in the frequency of extreme events are required. In particular, accident records, weather and road surface conditions from the CH-115 route that connects Chile and Argentina are studied and the resulting Apriori rules are shown as evidence of the proposed approach.

The daily fluctuations of temperature and seasonal changes of weather are mostly due to the earth’s energy balance (see Sec. 3.1 for a brief description). Accordingly, predictive models for surface temperature and road conditions are based on the energy balance, depending on the weather variables involved and the conditions imposed. Recently developed models include the effects of a superficial layer of snow or ice on the pavement, hydrological surface condition, surface friction, wind conditions, snow melting, and thermal conductivity due to the density of these superficial layers [55–59]. The database used includes several variables that appear directly in the energy balance model: global radiation (R_s), surface temperature (T_s), and air temperature (T_a), to name a few. On the other hand, variables such as the accumulated snow amount or the presence of an ice layer on the surface can be indirectly related to terms such as the latent heat flux (L_eE) or the rate of energy storage per unit of area ( $\frac{\partial W}{\partial t}$ ). Figure 7 shows the surface temperature variable across different states.

Fig. 7

Surface temperatures and states for the observed sites

Using the HMM, the road condition is modelled as a binary latent variable x_t ∈ {Open, Closed} and the numerical observed meteorological variables z_t follow a multivariate Gaussian distribution with a spherical covariance matrix. The forward-backward algorithm is used to estimate the transition and emission probabilities. The estimated parameters of the HMM can be found in Table 3.

Table 3

Estimated HMM parameters

Parameter	Value
Log-likelihood	-1908361.65
Transition matrix	$(\begin{matrix} 0.98 & 0.02 \\ 0.22 & 0.78 \end{matrix})$
Initial probability	[1, 0]
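
The transition matrix in Table 3 can be read more easily through two derived quantities: the long-run share of time spent in each state and the expected number of consecutive steps spent in a state once it is entered. The sketch below computes both. It assumes a time-homogeneous chain, and the function names are hypothetical. Which row corresponds to Open and which to Closed is not stated in Table 3, and the length of one time step is not given in this section.

```python
def stationary_two_state(trans):
    """Stationary distribution of a 2-state Markov chain."""
    p01 = trans[0][1]  # probability of leaving state 0
    p10 = trans[1][0]  # probability of leaving state 1
    total = p01 + p10
    return (p10 / total, p01 / total)


def expected_dwell(trans):
    """Expected number of consecutive steps spent in each state."""
    return tuple(1.0 / (1.0 - trans[i][i]) for i in range(len(trans)))


transition = [[0.98, 0.02], [0.22, 0.78]]  # values from Table 3
pi = stationary_two_state(transition)
dwell = expected_dwell(transition)
```

With these values the first state is occupied roughly 92% of the time and lasts about 50 steps on average, while the second state is occupied roughly 8% of the time and lasts about 4.5 steps.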

Once the parameters of the HMM have been found, the most likely sequence of latent road conditions can be estimated using the Viterbi algorithm. This algorithm uses a dynamic programming approach to find the latent sequence of road conditions (each of which can be either {open} or {closed}). Figure 8 shows the estimated road conditions and the associated climate variables.

Fig. 8

Climate variables associated with road conditions
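The dynamic programming recursion behind Viterbi decoding can be sketched in a few lines of plain Python. This is an illustration and not the hmmlearn routine used in the study. It uses the initial and transition probabilities from Table 3, but it simplifies the emissions to a single variable with hypothetical Gaussian parameters, and the temperature series is invented.

```python
import math


def log_gauss(z, mean, std):
    """Log-density of a univariate Gaussian."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - (z - mean) ** 2 / (2.0 * std * std)


def safe_log(p):
    """log(p) that returns -inf for p = 0 instead of raising."""
    return math.log(p) if p > 0.0 else float("-inf")


def viterbi(obs, start, trans, means, stds):
    """Most likely state sequence for a Gaussian-emission HMM (log domain)."""
    n_states = len(start)
    # delta[i]: best log-probability of any path ending in state i.
    delta = [safe_log(start[i]) + log_gauss(obs[0], means[i], stds[i])
             for i in range(n_states)]
    backptr = []
    for z in obs[1:]:
        step, new_delta = [], []
        for j in range(n_states):
            scores = [delta[i] + safe_log(trans[i][j]) for i in range(n_states)]
            best = max(range(n_states), key=scores.__getitem__)
            step.append(best)
            new_delta.append(scores[best] + log_gauss(z, means[j], stds[j]))
        backptr.append(step)
        delta = new_delta
    # Backtrack from the best final state.
    state = max(range(n_states), key=delta.__getitem__)
    path = [state]
    for step in reversed(backptr):
        state = step[state]
        path.append(state)
    return path[::-1]


start = [1.0, 0.0]                        # Table 3
trans = [[0.98, 0.02], [0.22, 0.78]]      # Table 3
means, stds = [5.0, -5.0], [3.0, 3.0]     # hypothetical emission parameters
temps = [6.0, 4.0, -6.0, -7.0, -5.0, 5.0, 7.0]  # invented air temperatures
path = viterbi(temps, start, trans, means, stds)
```

Because the transition matrix strongly favours staying in the current state, the decoded path changes state only when several consecutive observations support the change, which is what makes the sequence smoother than a per-observation classification.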

We are now interested in observing differences in the weather variables given the predicted labels. Figure 9 shows the box-plots for each of these weather variables related to the open and closed state of the road. An analysis of the statistical relevance of the weather variables was also conducted (see Table 7). Several variables such as global radiation, relative air humidity, air temperature, surface temperature 1, water layer thickness and snow layer thickness, among others, showed statistical significance (p < 0.001).

Fig. 9

Conditional distribution of the climate variables given the road conditions

So far, we have assumed the number of hidden states to be known. This assumption is based on the utility of the latent variable used for the road conditions given the observed climate variables. The number of hidden regimes, namely the order of the HMM, cannot be estimated consistently from the observed data [60]. Also, standard methods such as likelihood-ratio tests do not satisfy the required regularity conditions when applied to non-i.i.d. samples. Nevertheless, likelihood-ratio tests for multivariate Gaussian HMMs have been used for the purpose of performing model comparisons between switching regimes [61]. Table 4 shows the result of the statistical tests for the climatic variables under two alternating conditions.

Table 4

p-values of the Wilcoxon and T test for numerical observed data for meteorological conditions and road surface state

Variable	Wilcoxon test (p value)	T test (p value)
Surface temperature 2 (*)	6.47E-34	2.53E-233
Air temperature (*)	2.85E-160	3.17E-247
Dew point temperature (*)	4.02E-74	3.59E-93
Relative air humidity (*)	0	0
Intensity of precipitation (*)	0	3.51E-158
Wind speed (*)	4.66E-72	1.61E-58
Wind direction (*)	2.86E-59	5.76E-22
Atmospheric visibility (*)	0	0
Accumulated precipitation (*)	0	5.25E-113
Thermal conductivity 1 (*)	1.16E-159	1.49E-66
Thermal conductivity 2 (*)	1.07E-182	4.97E-66
Chemical product surf 1 (*)	1.07E-105	3.41E-09
Chemical product surf 2 (*)	7.24E-95	8.02E-11
Global radiation (*)	2.17E-05	2.37E-70
Snow layer thickness (*)	0	8.62E-115
Liquid Freezing temperature 1	0.376972778	0.278969177
Liquid Freezing temperature 2 (*)	0.001155354	0.006889695
Surface temperature Deep 6cm 1 (*)	6.12E-78	1.52E-262
Surface temperature Deep 6cm 2 (*)	8.15E-60	8.54E-274
Solid freezing temperature 1 (*)	2.57E-13	0.009105553
Solid freezing temperature 2 (*)	2.67E-29	1.48E-05
Maximal Wind speed (*)	4.81E-67	1.87E-53
Water layer thickness 1 (*)	0	4.40E-172
Water layer thickness 2 (*)	0	1.02E-152

Table 5

Summary of Apriori rules, Case 1, for the road condition forecast (min support = 0.01, min confidence = 0.70, min lift = 3, min length = 2)

Weather conditions	Road status	Support	Confidence	Lift
{snow m.}	Closed	0.05	0.90	9.38
{snow m., accept}	Closed	0.04	0.90	9.35
{snow m., ice w.}	Closed	0.02	0.95	9.86
{snow m., rain w.}	Closed	0.03	0.92	9.58
{snow m., ice alarm}	Closed	0.01	0.92	9.49
{snow m., with snow}	Closed	0.02	0.95	9.84
{snow m., wet}	Closed	0.02	0.90	9.36
{snow m., off}	Closed	0.01	0.93	9.63
{snow m., all OK}	Closed	0.05	0.90	9.38
{with snow, on}	Closed	0.02	0.70	7.30
{snow m., accept, ice w.}	Closed	0.02	0.96	9.91
{snow m., accept, rain w.}	Closed	0.02	0.91	9.43
{snow m., accept, with snow}	Closed	0.01	0.94	9.77
{snow m., accept, wet}	Closed	0.02	0.90	9.35
{snow m., accept, off}	Closed	0.01	0.93	9.59
{snow m., all OK, accept}	Closed	0.04	0.90	9.35
{snow m., all OK, ice w.}	Closed	0.02	0.95	9.86
{snow m., all OK, rain w.}	Closed	0.03	0.92	9.58
{snow m., all OK, ice alarm}	Closed	0.01	0.92	9.49
{snow m., all OK, with snow}	Closed	0.02	0.95	9.84
{snow m., all OK, wet}	Closed	0.02	0.90	9.36
{snow m., all OK, off}	Closed	0.01	0.93	9.63
{snow m., with snow, rain w.}	Closed	0.01	0.98	10.15
{snow m., with snow, ice w.}	Closed	0.01	0.95	9.89
{snow m., with snow, on}	Closed	0.01	0.94	9.79
{snow m., ice w., rain w.}	Closed	0.01	0.96	9.90
{with snow, all OK, on}	Closed	0.02	0.70	7.29
{with snow, on, rain w.}	Closed	0.01	0.83	8.55
{snow m., all OK, accept, ice w.}	Closed	0.02	0.96	9.91
{snow m., all OK, accept, rain w.}	Closed	0.02	0.91	9.43
{snow m., all OK, accept, with snow}	Closed	0.01	0.94	9.77
{snow m., all OK, accept, wet}	Closed	0.02	0.90	9.35
{snow m., all OK, accept, off}	Closed	0.01	0.93	9.59
{snow m., all OK, ice w., rain w.}	Closed	0.01	0.96	9.90
{snow m., all OK, with snow, ice w.}	Closed	0.01	0.95	9.89
{snow m., all OK, with snow, rain w.}	Closed	0.01	0.98	10.15
{with snow, all OK, on, rain w.}	Closed	0.01	0.83	8.55
{with snow, all OK, on, snow m.}	Closed	0.01	0.94	9.79

4.1 Apriori on weather conditions

The Apriori algorithm was applied to the categorical data set to study the variables and conditions associated with a closed road state (see Table 2). Two cases with different minimum support are considered:

Case 1: min support = 0.01, min confidence = 0.70, min lift = 3, min length = 2

Case 2: min support = 0.02, min confidence = 0.70, min lift = 3, min length = 2
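
The role of the four thresholds can be illustrated with a minimal, library-free sketch of a level-wise Apriori search restricted to rules whose consequent is the closed road state. This is not the implementation used in the study: the function name `apriori_rules`, the toy transactions, and the reading of min length as the total number of items in a rule (antecedent plus consequent) are assumptions made for illustration.

```python
from itertools import combinations


def apriori_rules(transactions, target, min_support=0.01,
                  min_confidence=0.70, min_lift=3.0, min_length=2):
    """Mine rules `antecedent -> target` with a level-wise Apriori search.

    Returns tuples (sorted antecedent, support, confidence, lift).
    """
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(itemset <= t for t in sets) / n

    target_support = support(frozenset([target]))
    # Level 1: frequent single items.
    items = {i for t in sets for i in t}
    frequent = {frozenset([i]) for i in items
                if support(frozenset([i])) >= min_support}
    rules = []
    k = 1
    while frequent:
        for itemset in frequent:
            # A rule needs a non-empty antecedent, hence at least 2 items.
            if target in itemset and len(itemset) >= max(min_length, 2):
                antecedent = itemset - {target}
                supp = support(itemset)
                conf = supp / support(antecedent)
                lift = conf / target_support
                if conf >= min_confidence and lift >= min_lift:
                    rules.append((sorted(antecedent), supp, conf, lift))
        # Join frequent k-itemsets into (k+1)-candidates and prune with the
        # Apriori property: every k-subset of a candidate must be frequent.
        k += 1
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k}
        frequent = {c for c in candidates
                    if all(frozenset(s) in frequent
                           for s in combinations(c, k - 1))
                    and support(c) >= min_support}
    return rules


# Hypothetical records: each one lists categorical items plus the road state.
toy = ([["snow m.", "all OK", "closed"]] * 2
       + [["rain w.", "all OK", "open"]] * 8
       + [["dry", "all OK", "open"]] * 10)

rules = apriori_rules(toy, "closed", min_support=0.01)
```

On this toy data only the rules with antecedents {snow m.} and {all OK, snow m.} pass the thresholds; {all OK} alone is frequent but fails the confidence filter because it also appears in every open record.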

In this work, the focus is on determining the variables that lead to a {closed} road condition. The results for the {closed} road state, for both cases studied, are shown in Tables 5 and 6. The first and second columns show that the {snow m.} item, which belongs to the categorical observed variable Rain status, occurs together with the {closed} road status with a confidence higher than 90%. This makes sense, since in all rules that contain the {snow m.} item the road state is closed. Likewise, Rain status:{snow m.} is always accompanied by all or some of the following items: {all OK}, {accept} and/or {with snow}, which belong to the observed variables Atmosphere State, Visibility sensor status and Surface state Site (1,2), respectively. Of special interest are the cases in which the confidence is less than 90% and Rain status:{snow m.} is not present; this suggests that Rain status:{snow m.} is decisive for a closed road state. Finally, the lift tells us that the road status {closed} is over 9 times more likely to occur when the weather condition {snow m.} is present.
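
The reading of the lift given above follows from the standard definitions of the three rule measures. For a rule with antecedent A (the weather conditions) and consequent C (the road status), and with P denoting relative frequencies in the data set, a short worked example for the rule {snow m.} to {closed} is sketched below. The derived frequencies are only approximate because they are computed from the rounded values reported in the tables.

```latex
\begin{align*}
\operatorname{supp}(A \Rightarrow C) &= P(A \cap C), \\
\operatorname{conf}(A \Rightarrow C) &= \frac{P(A \cap C)}{P(A)} = P(C \mid A), \\
\operatorname{lift}(A \Rightarrow C) &= \frac{\operatorname{conf}(A \Rightarrow C)}{P(C)}.
\end{align*}
% Worked example: A = {snow m.}, C = {closed}, using supp = 0.05, conf = 0.90, lift = 9.38
\begin{align*}
P(C) &= \frac{\operatorname{conf}}{\operatorname{lift}} \approx \frac{0.90}{9.38} \approx 0.096, \\
P(A) &= \frac{\operatorname{supp}}{\operatorname{conf}} \approx \frac{0.05}{0.90} \approx 0.056.
\end{align*}
```

Under these definitions, the road is closed in roughly 10% of all records but in about 90% of the records that contain {snow m.}, which is the ninefold increase expressed by the lift.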

Table 6

Summary of Apriori rules, Case 2, for the road condition forecast (min support = 0.02, min confidence = 0.70, min lift = 3, min length = 2)

Weather conditions	Road status	Support	Confidence	Lift
{snow m.}	Closed	0.05	0.90	9.38
{with snow, on}	Closed	0.02	0.70	7.30
{snow m., accept}	Closed	0.04	0.90	9.35
{snow m., ice w.}	Closed	0.02	0.95	9.86
{snow m., rain w.}	Closed	0.03	0.92	9.58
{snow m., wet}	Closed	0.02	0.90	9.36
{snow m., all OK}	Closed	0.05	0.90	9.38
{snow m., all OK, accept}	Closed	0.04	0.90	9.35
{snow m., all OK, ice w.}	Closed	0.02	0.95	9.86
{snow m., all OK, rain w.}	Closed	0.03	0.92	9.58
{snow m., all OK, wet}	Closed	0.02	0.90	9.36
{snow m., rain w., accept}	Closed	0.02	0.91	9.43
{all OK, with snow, on}	Closed	0.02	0.70	7.29
{snow m., all OK, rain w., accept}	Closed	0.02	0.91	9.43

5 Conclusions

In this paper, we presented an approach for early warnings about the open or closed state of the road, based on a Hidden Markov Model and the Apriori algorithm. The weather-related road condition was modelled as a time-dependent binary random variable that is not observed directly. Instead, the causal relationship between the current weather conditions and the early warnings was modelled as a Hidden Markov Model. The model provides a probabilistic measure of the operational risk related to the road. Furthermore, the model can also be extended to include decision variables for the given road safety recommendations.

Most machine learning models are treated as black boxes, where only the output of the model is used to make decisions. Markov models provide explicit insight into the sequential nature of the road safety conditions. However, their results are not directly interpretable for human decision making.

To provide interpretable rules for decision making, the Apriori algorithm was applied to the categorical variables. The two cases presented in this study, with different minimum support, showed that the {snow m.} item occurs together with the {closed} road status with a confidence higher than 90%, whereas in the cases where Rain status:{snow m.} is not present the confidence is less than 90%; Rain status:{snow m.} is therefore decisive for a closed road state. In this way, the HMM provides the predictive power to determine the open or closed state of the road from the different weather variables studied, and the rule learning method provides a structural assessment and informative intuition about road safety conditions.

The Apriori algorithm allows us to understand which weather variables were directly involved in the open and closed states of the road. In this respect, this work constitutes a complete exploration of the use of machine learning methods for decision making on roads where different weather variables are involved. In all rules that contain the {snow m.} item, the road state is closed. Likewise, Rain status:{snow m.} is always accompanied by all or some of the following items: {all OK}, {accept} and/or {with snow}, which belong to the observed variables Atmosphere State, Visibility sensor status and Surface state Site (1,2), respectively. Of special interest are the cases in which the confidence is less than 90% and Rain status:{snow m.} is not present; this suggests that Rain status:{snow m.} is decisive for a closed road state. Finally, the lift value tells us that the road status {closed} is over 9 times more likely to occur when the weather condition {snow m.} is present. In this way, the combination of the HMM and the Apriori algorithm allowed us to obtain an interpretable rule, with a confidence higher than 90%, for decision making in road safety management. Further research will include evaluation measures for comparing different interpretable models for road safety recommendations and their use in real-world scenarios.

Acknowledgments

The authors are grateful to the Ministerio de Obras Públicas (MOP, Región del Maule, Chile), Dr. Juan Figueroa Meriño and Dr. Luis Morales-Salinas for providing the figure of the study region, and to Mario Guiachetti Valenzuela for providing the database used in this research.

Appendix A: Additional weather variables

References

1. Transportation Research Board and National Research Council, Potential Impacts of Climate Change on U.S. Transportation: Special Report 290. The National Academies Press, Washington, DC, 2008.

2. Shen, Yu, Shi, Doan, Yang and Xu, Transportation routes evaluation: A Delphi and CFPR approach, Journal of Intelligent & Fuzzy Systems, pp. 1–14, 05 2021.

3. Qiao, Dawson A.R., Parry and Flintsch G.W., Evaluating the effects of climate change on road maintenance intervention strategies and life-cycle costs, Transportation Research Part D: Transport and Environment 41 (2015), 492–503.

4. Yuan and Li, A survey of traffic prediction: from spatio-temporal data to intelligent transportation, Data Science and Engineering 6 (2021), 03.

5. Moudon, Lin, Jiao, Hurvitz and Reeves, The risk of pedestrian injury and fatality in collisions with motor vehicles, a social ecological study of state routes and city streets in King County, Washington, Accident Analysis and Prevention 43 (2011), 11–24.

6. Castro, Paleti and Bhat, A spatial generalized ordered response model to examine highway crash injury severity, Accident Analysis and Prevention 52C (2013), 188–203.

7. Xiong, Tobias and Mannering, The analysis of vehicle crash injury-severity data: A Markov switching approach with road-segment heterogeneity, Transportation Research Part B: Methodological 67 (2014), 109–128.

8. Martín, Rosete-Suárez, Alcala-Fdez and Herrera, A new multiobjective evolutionary algorithm for mining a reduced set of interesting positive and negative quantitative association rules, IEEE Transactions on Evolutionary Computation 18 (2014), 54–69.

9. Fenton and Neil, Risk Assessment and Decision Analysis with Bayesian Networks. Chapman and Hall/CRC, 2nd edition, 2018.

10. Dongyao, Zhang and Lv, Evaluation of road condition based on BA-BP algorithm, Journal of Intelligent & Fuzzy Systems 40 (2020), 1–18.

11. Mor, Sood and Goyal, Application of machine learning technique for prediction of road accidents in Haryana - a novel approach, Journal of Intelligent & Fuzzy Systems 38 (2020), 1–10.

12. Sharma, Phan and Lee, An application study on road surface monitoring using DTW based image processing and ultrasonic sensors, Applied Sciences 10 (2020), 4490.

13. Salerno, Ceppi, Mancini, Corbari, Ravazzani, Perotto, Francesco Spada, Enrico Maggioni and Maria Vecchio, A study of an algorithm for the surface temperature forecast: From road ice risk to farmland application, Applied Sciences 10 (2020), 4952.

14. Grande, Castillo, Mora and Lo H.K., Highway and road probabilistic safety assessment based on Bayesian network models, Computer-Aided Civil and Infrastructure Engineering 32(5) (2017), 379–396.

15. Castillo, Grande and Calviño, Bayesian networks-based probabilistic safety analysis for railway lines, Computer-Aided Civil and Infrastructure Engineering 31(9) (2016), 681–700.

16. Karimnezhad and Moradi, Road accident data analysis using Bayesian networks, Transportation Letters 9(1) (2017), 12–19.

17. Deublein, Schubert, Adey B.T., Köhler and Faber M.H., Prediction of road accidents: A Bayesian hierarchical approach, Accident Analysis and Prevention 51 (2013), 274–291.

18. Zou and Yue W.L., A Bayesian network approach to causation analysis of road accidents using Netica, Journal of Advanced Transportation 2017, 2017.

19. Schlögl and Stütz, Methodological considerations with data uncertainty in road safety analysis, Accident Analysis and Prevention 130 (2019), 136–150. Road Safety Data Considerations.

20. Bergel-Hayat, Debbarh, Antoniou and Yannis, Explaining the road accident risk: Weather effects, Accident Analysis and Prevention 60 (2013), 456–465.

21. Qiu and Nixon W.A., Effects of adverse weather on traffic crashes: Systematic review and meta-analysis, Transportation Research Record 2055(1) (2008), 139–146.

22. Shanmuganathan and Sallis, Data mining methods to generate severe wind gust models, Atmosphere 5 (2013), 12.

23. Al-Matarneh, Sheta, Bani-Ahmad, Alshaer and Al-oqily, Development of temperature-based weather forecasting models using neural networks and fuzzy logic, International Journal of Multimedia and Ubiquitous Engineering 9 (2014), 343–366.

24. [...], Xue, Zhang and Lu, Neural fuzzy inference system-based weather prediction model and its precipitation predicting experiment, Atmosphere 5 (2014), 788–805.

25. Aung, Zhang, Dhelim and Ai, Accident prediction system based on hidden Markov model for vehicular ad-hoc network in urban environments, Information (Switzerland) 9 (2018), 12.

26. Orsini, Gecchele, Gastaldi and Rossi, Large-scale road safety evaluation using extreme value theory, IET Intelligent Transport Systems 14 (2020), 01.

27. [...], Jia and Sun, Identifying factors that influence the patterns of road crashes using association rules: A case study from Wisconsin, United States, Sustainability 11 (2019), 1925.

28. Zahra el Mazouri, Chaouki Abounaima and Zenkouar, Data mining combined to the multicriteria decision analysis for the improvement of road safety: case of France, Journal of Big Data 6 (2019), 01.

29. Zou, Zhang and Cheng, Exploring the impact of climate and extreme weather on fatal traffic accidents, Sustainability 13(1), 2021.

30. López J.L., Hernández, Urrutia, López-Cortés X.A., Araya and Morales-Salinas, Effect of missing data on short time series and their application in the characterization of surface temperature by detrended fluctuation analysis, Computers & Geosciences 153 (2021), 104794.

31. Foken, The energy balance closure problem: An overview, Ecological Applications 18(6) (2008), 1351–1367.

32. Williams, Richardson A.D., Reichstein, Stoy P.C., Peylin, Verbeeck, Carvalhais, Jung, Hollinger D.Y., Kattge, Leuning, Luo, Tomelleri, Trudinger C.M. and Wang Y.-P., Improving land surface models with FLUXNET data, Biogeosciences 6(7) (2009), 1341–1359.

33. Monteith J.L., Evaporation into the atmosphere: Theory, history and applications, Wilfried H. Brutsaert, D. Reidel Publishing Co., 1982. No. of pages: vii-299. Price: $34.95, Journal of Climatology 3(2) (1983), 213–213.

34. Weston K.J., Boundary layer climates (second edition). By T. R. Oke. Methuen. 1987. Pp. 435 + xvi. £39.95 hardback; £14.95 paperback, Quarterly Journal of the Royal Meteorological Society 114(484) (1988), 1568–1568.

35. Cappé, Moulines and Ryden, Inference in Hidden Markov Models (Springer Series in Statistics). Springer-Verlag, Berlin, Heidelberg, 2005.

36. Rao and Gupta, Implementing improved algorithm over Apriori data mining association rule algorithm, IJCST 3 (2012), 489–493.

37. Guo Anfu and Yu, Method for mid-long-term prediction of landslides movements based on optimized Apriori algorithm, Applied Sciences 9 (2019), 3819.

38. Agrawal and Srikant, Fast algorithms for mining association rules in large databases. In Proceedings of the 20th International Conference on Very Large Data Bases, VLDB ’94, pp. 487–499, San Francisco, CA, USA, 1994. Morgan Kaufmann Publishers Inc.

39. Karimi-Majd A.-M. and Mahootchi, A new data mining methodology for generating new service ideas, Information Systems and e-Business Management 13 (2015), 421–443.

40. Wang, Li, Huang and Su, Association rules mining based analysis of consequential alarm sequences in chemical processes, Journal of Loss Prevention in the Process Industries 41 (2016), 03.

41. Varzaneh, Neysiani B.S., Ziafat and Soltani, Recommendation systems based on association rule mining for a target object by evolutionary algorithms. 2, 05 2018.

42. Han, Mining frequent patterns without candidate generation: A frequent-pattern tree approach, Data Mining and Knowledge Discovery 8 (2004), 53–87.

43. Agrawal and Srikant, Fast algorithms for mining association rules. Proc. 20th Int. Conf. Very Large Data Bases VLDB, 1215, 08 2000.

44. Bhandari, Gupta and Das, Improvised Apriori algorithm using frequent pattern tree for real time applications in data mining, Procedia Computer Science 46, 11 2014.

45. Tank, Improved Apriori algorithm for mining association rules, International Journal of Information Technology and Computer Science 6 (2014), 15–23.

46. Al-Maolegi and Arkok, An improved Apriori algorithm for association rules, International Journal on Natural Language Computing 3, 03 2014.

47. Doshi-Velez and Kim, Towards a rigorous science of interpretable machine learning. arXiv preprint. http://arxiv.org/abs/1702.08608, 2017.

48. Molnar, Interpretable machine learning. https://christophm.github.io/interpretable-ml-book/, 2019.

49. Ribeiro M.T., Singh and Guestrin, “Why should I trust you?”: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, pp. 1135–1144, New York, NY, USA, 2016. Association for Computing Machinery.

50. Lundberg S.M. and Lee S-I., A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS ’17, pp. 4768–4777, Red Hook, NY, USA, 2017. Curran Associates Inc.

51. Chakraborty, Başağaoğlu and Winterle, Interpretable vs. noninterpretable machine learning models for data-driven hydro-climatological process modeling, Expert Systems with Applications 170 (2021), 114498.

52. Chaibi, Benghoulam E.M., Tarik, Berrada and El Hmaidi, An interpretable machine learning model for daily global solar radiation prediction, Energies 14(21), 2021.

53. Parsa A.B., Movahedi, Taghipour, Derrible and Abolfazl (Kouros) Mohammadian, Toward safer highways, application of XGBoost and SHAP for real-time accident detection and feature analysis, Accident Analysis & Prevention 136 (2020), 105405.

54. Apache Spark frequent pattern mining. https://spark.apache.org/docs/latest/ml-frequent-pattern-mining.html.

55. Kangas, Heikinheimo and Hippi, RoadSurf: A modelling system for predicting road weather and road surface conditions, Meteorological Applications 22 (2015), 544–553.

56. Williams, Design heat requirements for snow melting systems. 01 2010.

57. [...] and Tan, Development and testing of heat- and mass-coupled model of snow melting for hydronically heated pavement, Transportation Research Record 2282(1) (2012), 14–21.

58. Liu, Rees and Spitler, Modeling snow melting on heated pavement surfaces. Part I: Model development, Applied Thermal Engineering 27 (2007), 1115–1124.

59. Rees, Spitler and Xiao, Transient analysis of snow-melting system performance, ASHRAE Transactions 108 (2002), 406–423.

60. Ryden, Estimating the order of hidden Markov models, Statistics 26(4) (1995), 345–354.

61. Giudici, Ryden and Vandekerkhove, Likelihood-ratio tests for hidden Markov models, Biometrics 56(3) (2000), 742–747.