Abstract
Road safety recommendations require decision-making tools that accommodate weather uncertainties. The operation and maintenance of transport infrastructure is one of the sub-areas that requires attention due to its importance for road quality. Several studies have proposed artificial neural networks and Bayesian networks to assess road risk. These methods use historical accident records to generate useful road safety metrics; however, less is known about how climatic factors and road surface conditions affect the models that generate recommendations for safe traffic. In this research, a Bayesian network in the form of a Hidden Markov Model (HMM) and the Apriori algorithm are proposed to evaluate the open and closed states of the road. The road state is explicitly modelled as a sequence of latent variables inferred from the observed weather and road surface data. Different weather variables were studied in order to evaluate both road states (open or closed), and the results showed that the HMM provides explicit insight into the sequential nature of road safety conditions but does not provide a directly interpretable result for human decision making. We therefore complement the study with the Apriori algorithm applied to categorical variables. The experimental results show that combining the HMM and the Apriori algorithm provides interpretable rules for road safety decision making, supporting the decision to open or close the road in extreme weather conditions with a confidence higher than 90%.
Introduction
Climate change poses a significant threat to several economic activities in both developing and developed countries. Most studies have focused on the long-term impacts of climate change and severe weather events on the transportation sector. However, despite the importance of roads and public infrastructure, less attention has been paid to the potential consequences of such events for the operation and maintenance of transportation infrastructure [1, 2].
Experts have been warning about systematic changes in the frequency of extreme climate and weather events. These events may have a direct or indirect impact on roads. Indirect impacts can be related to traffic loading resulting from demographic changes [3]. Direct impacts are mainly due to weather conditions such as temperature, rain, wind and snow. Yuan and Li reviewed studies on traffic prediction, focusing on the spatio-temporal data layer and its application to intelligent transportation. In particular, they discussed pre-processing, prediction and traffic applications based on existing techniques for addressing these challenges [4].
Consequently, a changing pattern in the frequency of extreme events requires incorporating such uncertainties into the operational and emergency management of roads and highways. Generating accurate early warnings for safer driving and for the operational maintenance of transportation infrastructure under extreme weather requires decision-making tools that accommodate weather uncertainties. Previous works have used binary logit/probit models [5], spatial and temporal correlations [6], Markov switching models [7], genetic algorithms [8], artificial neural networks and Bayesian networks, among other machine learning methods, for road safety analysis and driving recommendations [9–11]. In particular, Sharma et al. proposed an efficient road surface monitoring system using an ultrasonic sensor and image processing to improve the classification accuracy of road surface conditions [12]. Dongyao et al. [10] proposed a novel method based on the BA-BP (Bat-Back Propagation) algorithm, which is applied to five road features (roughness, curvature, obstacle width-to-height ratio, obstacle effective area ratio and obstacle coefficient) to evaluate safety conditions. Del Vecchio et al. developed an algorithm to assess the occurrence of road ice, with potential application to determining the surface temperature of agricultural land. The algorithm runs operationally every day in real time and has been tested under different weather conditions [13]. The basic components of a probabilistic assessment tool include random variables associated with hazards and assets, as well as the utility of decisions. In the context of climate change, hazards represent changing weather conditions and extreme events, whereas assets represent the infrastructure and its economic value in terms of the transportation system and the cost of replacement. Probabilistic safety analysis is a well-known technique for evaluating the safety of a system.
Bayesian networks have been proposed for the probabilistic safety analysis of roads [14]. The authors developed a probabilistic safety analysis (PSA) methodology for highways and roads. The main idea consists of identifying and reproducing all the elements encountered when travelling the road and identifying the direct dependencies among variables to build the qualitative structure of the Bayesian network, where human error is used to model how the driver’s behaviour evolves with driving. Bayesian networks are suitable for this problem because they allow a multi-dimensional distribution to be decomposed into several factors that follow conditional independence properties. Furthermore, Bayesian networks provide a simple causal model that overcomes the limitations of other competing frameworks for risk assessment [9]. Bayesian networks have also been used to analyse the safety of other transportation infrastructure such as railway lines [15], road accidents [16, 17] and the causation of road accidents [18], among other applications. Most research on probabilistic road safety analysis has considered explanatory variables such as driver attention, road geometry, weather conditions and traffic volume [19]. In addition, weather variables such as temperature and rainfall can be associated with driver behaviour and road conditions.
In particular, significant correlations between weather variables and the aggregate number of injury conditions have been found [20, 21]. Adverse weather conditions also have relevant effects on traffic crashes. The authors adjusted the number of crashes for the low traffic volumes observed during adverse weather conditions (which to a large extent can explain the number of crashes), and reported increased crash rates for wet pavement (more than 300%), rain (71%) and snow (84%). However, considering the effect of a single variable in isolation might not reveal its interaction with other weather variables (low temperature, fog, etc.) and the resulting effects on visibility and road conditions.
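Why the adjustment for traffic volume matters can be shown with a short calculation: raw crash counts may be lower in bad weather simply because fewer vehicles travel. The sketch below divides crash counts by an exposure measure and reports the relative change in the crash rate. All numbers and function names are hypothetical and illustrative; they are not taken from [20, 21].

```python
def crash_rate(crashes: int, exposure: float) -> float:
    """Crashes per unit of exposure (hypothetical unit, e.g. million vehicle-km)."""
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return crashes / exposure


def relative_rate_increase(crashes_adverse, exposure_adverse, crashes_normal, exposure_normal):
    """Percentage change of the adverse-weather crash rate relative to normal weather."""
    adverse = crash_rate(crashes_adverse, exposure_adverse)
    normal = crash_rate(crashes_normal, exposure_normal)
    return 100.0 * (adverse - normal) / normal


# Hypothetical example: fewer crashes in snow, but far less traffic.
raw_ratio = 30 / 100  # raw counts alone would suggest snow is "safer"
increase = relative_rate_increase(30, 2.0, 100, 12.0)
print(f"raw count ratio: {raw_ratio:.2f}, crash rate increase: {increase:.0f}%")
```

In this toy case the snow period has 70% fewer crashes in absolute terms, yet its exposure-adjusted crash rate is 80% higher.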
Related works
Building accurate and explainable models for extreme weather events is a complex task. An approach to reveal patterns that can deliver meaningful information for decision making was developed in [22]. The authors used several data mining techniques such as Artificial Neural Networks and Self-Organizing Maps in conjunction with rule-based systems such as C5.0 and CART. The results of their work indicate that the data mining algorithms can be used to explain the weather patterns that lead to wind gusts at a specific location.
Traditional weather forecasting methods use continuous data as input. In contrast, a fuzzy inference engine transforms crisp data into fuzzy sets (linguistic variables) that can be used to generate explainable rules. Neuro-fuzzy inference systems were proposed in [23, 24]. However, fuzzy sets handle ambiguity in the definition of categorical variables (such as low-medium-high) rather than uncertainty about the occurrence of the event.
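The distinction between ambiguity and uncertainty can be made concrete with a small fuzzification sketch. It maps a crisp air temperature to membership degrees in the linguistic sets low, medium and high using trapezoidal membership functions, where a triangle is the special case with a single peak. The breakpoints (in °C) are hypothetical and are not taken from [23, 24].

```python
def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal membership: 0 outside [a, d], 1 on [b, c], linear on the slopes."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)   # rising slope
    return (d - x) / (d - c)       # falling slope


# Hypothetical breakpoints for air temperature in degrees Celsius.
FUZZY_SETS = {
    "low": (-50.0, -50.0, -10.0, 5.0),   # fully "low" below -10 C
    "medium": (-5.0, 5.0, 5.0, 15.0),    # triangle peaking at 5 C
    "high": (5.0, 20.0, 50.0, 50.0),     # fully "high" above 20 C
}


def fuzzify(temperature: float) -> dict:
    """Map a crisp temperature to its membership degree in each linguistic set."""
    return {label: trapezoid(temperature, *abcd) for label, abcd in FUZZY_SETS.items()}


print(fuzzify(0.0))
```

A reading of 0 °C is partly "low" and partly "medium" at the same time: the fuzzy sets express how ambiguous the category is, not how likely a road closure is.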
An accident prevention system for urban environments using Hidden Markov Models (HMMs) has been studied previously [25]. The system uses injury statistics and related weather conditions obtained from nearby stations to estimate a latent variable that represents the risk of a crash. The authors evaluated the approach on simulation results and demonstrated its effectiveness and robustness. A large-scale road safety evaluation was proposed in [26]. The authors used extreme value theory to estimate the probability of extreme events and microwave Doppler radar data for calibration.
Road safety models that produce interpretable predictions have received less attention. Yu et al. studied factors that influence the patterns of road crashes using association rules [27]. The study shows how the Apriori algorithm can be used to mine significant association rules between crash severity and the factors influencing crash occurrence. Mazouri et al. proposed combining data mining with multi-criteria decision analysis to analyse road accident data, and identified significant correlations among the conditions that led to accidents using multi-criteria decision methods [28]. In addition, Zou et al. explored the impact of climate and extreme weather on fatal traffic accidents. They proposed a negative binomial model and a log-change model to analyse the impact of various factors on fatal traffic accidents, showing that both models provide accurate fits and that climate variables can significantly affect the frequency of fatal traffic accidents [29].
In this paper, we propose an unsupervised approach for obtaining explainable rules to issue early warnings about the open or closed state of the road. The proposed approach assumes that historical records generated by in-situ weather stations are available. A Hidden Markov Model (HMM) with continuous meteorological data is used together with the Apriori algorithm with categorical data. The combined method provides a predictive model for the open or closed state of the road, while also delivering explainable rules for decision making. The main contributions of this approach can be summarized as follows: (i) early warnings for the open or closed state of the road are treated as a latent binary indicator variable that can be estimated from the observed weather variables using a continuous HMM; (ii) the Apriori algorithm is used to build frequent itemsets from the categorical weather variables (which can be derived from the continuous data) together with the early-warning indicators for the open or closed state of the road; and (iii) the resulting itemsets achieve strong confidence on a real-world database and provide explainable rules for decision making.
The contributions of this article aim to provide rules for decision making. The rules can be used proactively to help decide the opening or closing of the Chile-Argentina customs border, which should operate every day of the year. From May onwards, this border crossing is subject to regular storms and snowfall that interrupt vehicle and cargo traffic between the two countries, endangering trade, communications and integration between them [30].
Materials and methods
Energy balance in a terrestrial surface layer
Under extreme weather conditions and sudden changes in the weather, controlling road traffic and ensuring its safety is a very difficult task. Prior knowledge of the state of the roads is of utmost importance to plan maintenance, keep drivers safe and sustain economic activity. Several studies have addressed this problem; however, the large number of variables included in the different studies makes it difficult to develop a model that effectively describes the state of the roads. In this respect, the earth’s surface is a highly influential interface for atmospheric processes and for exchanges between the surface and the atmosphere. Accordingly, climatic and meteorological models that help describe the surface condition of roads must be based on the conservation (balance) of energy at the earth’s surface under different atmospheric conditions, water availability and land use, among others; this balance is also useful when calibrating such models and evaluating their performance [31, 32]. The energy balance is based on a one-dimensional formulation that considers the four main forms of energy transfer through a surface layer [33, 34], namely radiation, conduction, sensible heat flux and latent heat flux, and can be expressed as:
where the incoming energy flow is taken as positive and outgoing as negative; $R_n$ is the net radiation on the surface, $L_e E$ is the latent heat flux, $H$ is the sensible heat flux, $L_p$ is the thermal conversion factor for the fixation of carbon dioxide, $F_p$ is the specific flux of CO2, $G$ is the energy flux leaving the lower boundary layer, $A_h$ is the energy advection into the layer, and the remaining term is the rate of energy storage per unit area in the layer.
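Under the sign convention stated above (incoming fluxes positive, outgoing fluxes negative), the listed terms combine into a zero-sum balance. The display below is a sketch reconstructed from these definitions, not the authors' typeset equation; the storage symbol $\Delta S/\Delta t$ is our own notation.

```latex
R_n + L_e E + H + L_p F_p + G + A_h + \frac{\Delta S}{\Delta t} = 0
```

On a paved road surface, where there is no photosynthesis, the term $L_p F_p$ can reasonably be neglected.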
The net radiation can be broken down into:
The short-wave radiation
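A common textbook decomposition of the net radiation into short-wave and long-wave components is $R_n = (1-\alpha)R_s^{\downarrow} + \varepsilon R_l^{\downarrow} - \varepsilon\sigma T_s^4$. Here $\alpha$ is the surface albedo, $\varepsilon$ the surface emissivity and $\sigma$ the Stefan-Boltzmann constant, and the surface reflects the fraction $(1-\varepsilon)$ of the incoming long-wave radiation. The original expression is not recoverable here, so the sketch below assumes this form. The input values are hypothetical, and the comparison shows how a high-albedo snow layer reduces the net radiation available at the road surface.

```python
STEFAN_BOLTZMANN = 5.670374419e-8  # W m^-2 K^-4


def net_radiation(shortwave_in, longwave_in, surface_temp_c, albedo, emissivity=0.95):
    """Net radiation (W m^-2), assuming
    R_n = (1 - albedo) * S_in + emissivity * L_in - emissivity * sigma * T_s^4.
    """
    t_kelvin = surface_temp_c + 273.15
    absorbed_shortwave = (1.0 - albedo) * shortwave_in
    absorbed_longwave = emissivity * longwave_in
    emitted_longwave = emissivity * STEFAN_BOLTZMANN * t_kelvin ** 4
    return absorbed_shortwave + absorbed_longwave - emitted_longwave


# Hypothetical values: same sky conditions, dry asphalt versus fresh snow cover.
asphalt = net_radiation(600.0, 280.0, surface_temp_c=5.0, albedo=0.1)
snow = net_radiation(600.0, 280.0, surface_temp_c=0.0, albedo=0.8)
print(f"asphalt: {asphalt:.1f} W/m2, snow: {snow:.1f} W/m2")
```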

Energy balance schema in a terrestrial surface layer.
Hidden Markov Models (HMMs) are probabilistic graphical models for sequential data [35]. In this context, weather data arrive in batches, and Markov models are well suited to this task because they allow a factored distribution to be written explicitly for a sequence of latent variables given the observed data. The HMM consists of a discrete-time Markov chain with state $x_t$ representing the hidden road conditions and an observation model with state $z_t$ representing the weather variables. The factored distribution can be written as follows:

$$p(x_{1:T}, z_{1:T}) = p(x_1) \prod_{t=2}^{T} p(x_t \mid x_{t-1}) \prod_{t=1}^{T} p(z_t \mid x_t)$$
The hidden and observed states form a state space model, which is a generative representation of the noisy observations. The state space model consists of a transition probability $p(x_t \mid x_{t-1})$ for the hidden states, an initial probability $p(x_1)$ and an emission probability $p(z_t \mid x_t)$, i.e., the conditional probability of observing $z_t$ given $x_t$. The hidden states $x_t$ are discrete and the emission densities are continuous. Figure 2 shows the Bayesian network representation of the HMM.
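Because the state space model is generative, it can be simulated directly. First $x_1$ is drawn from the initial distribution, then each $x_t$ from the transition probabilities given $x_{t-1}$, and each $z_t$ from the emission density given $x_t$. The sketch below does this for a two-state chain with one-dimensional Gaussian emissions. The parameter values and the function name `sample_hmm` are hypothetical and are not the estimates of Table 3.

```python
import random


def sample_hmm(start_prob, trans_prob, means, stds, length, seed=None):
    """Draw (states, observations) from an HMM with 1-D Gaussian emissions.

    start_prob[k]    = p(x_1 = k)
    trans_prob[j][k] = p(x_t = k | x_{t-1} = j)
    z_t ~ Normal(means[x_t], stds[x_t])
    """
    rng = random.Random(seed)
    states = list(range(len(start_prob)))
    x = rng.choices(states, weights=start_prob)[0]        # x_1 ~ p(x_1)
    xs, zs = [], []
    for _ in range(length):
        xs.append(x)
        zs.append(rng.gauss(means[x], stds[x]))            # z_t ~ p(z_t | x_t)
        x = rng.choices(states, weights=trans_prob[x])[0]  # x_{t+1} ~ p(. | x_t)
    return xs, zs


# Hypothetical parameters: state 0 = open, state 1 = closed;
# the observation is a surface temperature in degrees Celsius.
xs, zs = sample_hmm(
    start_prob=[0.9, 0.1],
    trans_prob=[[0.95, 0.05], [0.20, 0.80]],
    means=[6.0, -3.0],
    stds=[3.0, 2.0],
    length=10,
    seed=42,
)
print(xs)
print([round(z, 1) for z in zs])
```

Sticky transition probabilities (0.95 and 0.80 on the diagonal) produce runs of consecutive open or closed days, which is the sequential behaviour the HMM is meant to capture.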

Bayesian network representation. The hidden road states $x_1, \ldots, x_T$ are inferred from the observed weather variables $z_1, \ldots, z_T$.
The study of associations underlies many tasks that involve correlations, classifiers, associations and groupings, to name a few [36, 37]. Association rules are one of the most important branches of data mining. The Apriori algorithm was developed by Agrawal and Srikant in 1994 [38]. This technique can be used to find correlations and patterns within a set of elements [39], with applications such as the identification of pathological conditions [40] and recommendation systems [41], among others. The algorithm starts by scanning the database to find elements that frequently appear together (itemsets). Then, it reduces the number of candidate itemsets by eliminating infrequent ones and extends the remaining frequent itemsets one element at a time.
Several strategies for database scanning have been proposed. Candidate Generation-based (CGB) mining is a breadth-first strategy [38], whereas Pattern Growth-based (PGB) mining is a depth-first strategy [42]. Various efforts have also been made to improve and optimize the Apriori algorithm [36, 43–46]. In [43], a novel and efficient algorithm for mining frequent patterns in large databases was presented. More recently, [36] presented a new scheme that overcomes some of the disadvantages of the original Apriori algorithm; the improved method takes into account time complexity, the number of database scans, memory consumption and the pattern evaluation rules. The authors of [44] also proposed an improved Apriori algorithm based on an FP-tree data structure, which reduces the memory space and time complexity of the algorithm by means of a frequent pattern tree.
Apriori algorithm
Apriori is a simple algorithm that can be summarized as follows: (1) scan the database to count the support of the candidate k-itemsets; (2) keep only the k-itemsets whose support is greater than or equal to the minimum support threshold; (3) prune infrequent itemsets from the search space, since no superset of an infrequent itemset can be frequent; and (4) generate candidate (k + 1)-itemsets from the frequent k-itemsets and repeat the procedure until no more frequent patterns are found.
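The four steps above can be sketched in plain Python as a level-wise search. This is a didactic implementation, not the apyori library used in this work. The transactions are hypothetical, with items encoded as `variable:value` strings.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets.

    Support is the fraction of transactions that contain the itemset.
    """
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def support_of(candidates):
        # One scan of the database per level k.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

    # Level 1: frequent single items.
    frequent = support_of({frozenset([i]) for t in transactions for i in t})
    result = dict(frequent)
    k = 1
    while frequent:
        # Join step: combine frequent k-itemsets into (k+1)-candidates.
        prev = list(frequent)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k))
        }
        frequent = support_of(candidates)
        result.update(frequent)
        k += 1
    return result


baskets = [
    {"rain:snow", "surface:snow", "road:closed"},
    {"rain:snow", "road:closed"},
    {"rain:none", "road:open"},
    {"rain:none", "surface:dry", "road:open"},
]
freq = apriori(baskets, min_support=0.5)
for itemset, s in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step applies the downward-closure property from step (3). It discards any candidate with an infrequent subset before the database is scanned again.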
Explainable Machine Learning
An interpretable machine learning system can be defined in terms of the model's ability to explain or present its outcomes in a form that can be easily understood by a human counterpart [47]. This requirement is not necessarily captured by the traditional metrics used to evaluate model performance, and therefore the topic has remained elusive in the machine learning literature [48].
Examples of interpretable systems are simple models, such as association rules, that are intrinsically able to provide explanations along with their predictions. When a model does not provide such insights, post-hoc techniques can be applied after training; such techniques can therefore be model-agnostic.
LIME (Local Interpretable Model-Agnostic Explanations) is one example of post-hoc processing [49]. In this case, the authors propose local surrogate models that are used to explain individual predictions. Another method named SHAP (SHapley Additive exPlanations) is also inspired by local surrogate models [50]. SHAP provides explainable predictions using additive feature importance measures.
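The additive attributions behind SHAP rest on Shapley values. For a model with only a few features, they can be computed exactly by enumerating feature coalitions, as in the sketch below. Features absent from a coalition are replaced by baseline values, which is one of several possible conventions, and the toy linear model and feature names are hypothetical. This brute-force computation is exponential in the number of features and is not how the SHAP library approximates the values in practice.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attributions of model(x) relative to model(baseline).

    The value of a coalition S is the model output when features in S take
    their values from x and all other features take baseline values.
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical toy model: a linear risk score over three weather features.
def risk_score(z):
    snow_cm, wind_ms, temp_c = z
    return 0.5 * snow_cm + 0.2 * wind_ms - 0.1 * temp_c


x = [10.0, 15.0, -4.0]
baseline = [0.0, 5.0, 8.0]
phi = shapley_values(risk_score, x, baseline)
print([round(p, 3) for p in phi])
```

The attributions always add up to `risk_score(x) - risk_score(baseline)`, which is the additivity property that SHAP builds on.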
Recently, Chakraborty et al. [51] applied an explainable machine learning framework to quantify the importance of hydro-climatic variables for imputing hourly reference evapotranspiration. Chaibi et al. [52] proposed an interpretable machine learning model for daily global solar radiation prediction; the authors used the SHAP method to identify the influence of each feature and their interactions, with the time period included as one of the features. Parsa et al. [53] developed a methodology for real-time accident detection using Gradient Boosting, with SHAP for feature analysis. Due to class imbalance (accident versus non-accident records), the authors used oversampling techniques to obtain accurate predictions.
These model-agnostic techniques do not take into account the time domain, which is necessary for evaluating road safety. Post-hoc methods can be applied to time-aware models such as recurrent neural networks (e.g., using model distillation). However, these techniques are usually used in the context of supervised learning, which is not directly applicable to the road safety analysis case considered here, since the road state is treated as a latent (unlabelled) variable. To the best of the authors’ knowledge, the method proposed in this paper is the first approach that combines temporal Bayesian networks such as HMMs with Apriori rules to provide explainable decision making for road safety recommendations in extreme weather conditions.
Proposed model
The proposed approach was implemented using Python 3.7 and R. In particular, the following software libraries were used: HMM training and inference, hmmlearn 0.2.4 (https://hmmlearn.readthedocs.io/en/latest/); data manipulation, pandas 0.2.4 (https://pandas.pydata.org/); data analysis, R project 3.6.3 (https://www.r-project.org/); and the Apriori algorithm, apyori (https://github.com/ymoch/apyori).
The model consists of a continuous HMM over the observed weather variables. The latent states of the HMM represent the road safety indicators, which are then used along with the categorical weather variables to build frequent itemsets. The time complexity for training the HMM is $O(K^2T)$ per iteration of the estimation algorithm, with $T$ being the number of data points and $K = 2$ the number of road safety indicators. On the other hand, the time complexity of the Apriori algorithm is $O(2^{D+1})$, with $D = 9$ being the number of categorical variables (see Table 2). The overall complexity of the proposed approach therefore remains exponential with respect to the number of categorical variables, so model refinement could make use of a distributed FP-growth representation [54].
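The two bounds can be evaluated with the values stated above ($K = 2$, $D = 9$). We read the HMM bound as a per-iteration cost, which is our interpretation rather than a statement of the original analysis.

```latex
K^2 T = 2^2\,T = 4T, \qquad 2^{D+1} = 2^{9+1} = 1024
```

The HMM cost therefore grows linearly with the number of records $T$, whereas the Apriori bound doubles with every additional categorical variable.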
The HMM is used to produce latent factors associated with the current road conditions (road safety indicators). These factors are then used to produce explainable recommendations by means of the Apriori algorithm. Figure 3 depicts the proposed approach for explainable decision making.
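The hand-off between the two stages can be sketched as follows. For each time step, the categorical weather observations and the road state decoded by the HMM are encoded as one transaction of `variable:value` items, which is the input format an Apriori implementation expects. The column names and values are hypothetical placeholders patterned on the variables of Table 2, and the function name `build_transactions` is ours.

```python
def build_transactions(categorical_rows, road_states, state_names=("open", "closed")):
    """Combine categorical observations with decoded HMM states into transactions.

    categorical_rows: list of dicts, one per time step (hypothetical column names)
    road_states: list of integer HMM states of the same length (e.g. Viterbi output)
    Note: HMM state indices are arbitrary; mapping 0/1 to open/closed is an
    assumption that must be checked against the fitted emission means.
    """
    if len(categorical_rows) != len(road_states):
        raise ValueError("one road state is required per observation")
    transactions = []
    for row, state in zip(categorical_rows, road_states):
        # Missing values (None) are skipped rather than encoded as items.
        items = [f"{var}:{val}" for var, val in sorted(row.items()) if val is not None]
        items.append(f"road:{state_names[state]}")
        transactions.append(items)
    return transactions


rows = [
    {"rain_status": "snow m.", "surface_state_1": "with snow", "visibility": "accept"},
    {"rain_status": "none", "surface_state_1": "dry", "visibility": None},
]
print(build_transactions(rows, [1, 0]))
```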

Decision making is based on the categorical weather variables and the estimated road safety indicators obtained from the HMM.
Accidents
Accident records were obtained from the data observatory of the Chilean National Committee of Transportation Security (CONASET). The dataset contains N = 704 records from 2015 to 2019 along the CH-115 route in the Maule Region, Chile. The accident types were collisions, injuries and rollovers, while the causes include alcohol consumption, mechanical failures, driver inattention and loss of control, among others. Figure 4 shows the distribution of the number of accidents by year, month and weather conditions. Most accidents occur in summer, when there is more traffic congestion. Accordingly, accidents tend to occur under clear sky conditions during the southern hemisphere summer.

Distribution of number of accidents per year/month/weather conditions
In winter, the local authorities send road closure notifications in order to prevent accidents. Consequently, the observed records do not reflect the effect of adverse weather on the number of accidents. Figure 5 shows the location and weather conditions of the accidents on the CH-115 route.

Accidents in CH-115 route.
Weather and road condition data were recorded from 2017 to 2020 during the southern hemisphere winter period (May to October) and obtained from two weather stations located at kilometre 121 (Lo Aguirre, 1230 metres above sea level) and kilometre 138 (Los Condores, 2000 metres above sea level) on the CH-115 route (Ministerio de Obras Públicas MOP, Pehuenche international road, Región del Maule, Chile). The road risk is modelled as a hidden state, and the observed variables combine the information from both weather stations. Figure 6 shows the location of the weather stations along the CH-115 route.

CH-115 route in the Region of Maule in Chile.
The observed weather data contain numerical and categorical variables, which are listed in Table 1 and Table 2, respectively.
Numerical observed data for meteorological conditions and road surface state
Categorical observed data for meteorological conditions and road surface state
This article proposes an explainable Hidden Markov Model as a method to provide safety recommendations in extreme weather conditions. To evaluate the feasibility of the approach, weather and road surface data are used to train the model and provide interpretable rules. As previously mentioned, extreme weather due to climate change has a direct impact on the transportation and infrastructure sectors. Therefore, decision-making tools that accommodate the uncertainty arising from a changing pattern in the frequency of extreme events are required. In particular, accident records and weather and road surface conditions from the CH-115 route, which connects Chile and Argentina, are studied, and the resulting Apriori rules are presented as evidence for the proposed approach.
The daily fluctuations of temperature and the seasonal changes of weather are mostly due to the earth’s energy balance (see Sec. 3.1 for a brief description). Accordingly, predictive models for surface temperature and road conditions are based on the energy balance, depending on the weather variables involved and the conditions imposed. Recently developed models include the effects of a superficial layer of snow or ice on the pavement, the hydrological surface condition, surface friction, wind conditions, snow melting, and thermal conductivity due to the density of these superficial layers [55–59]. The database used includes several variables that appear directly in the energy balance model, such as global radiation ($R_n$), surface temperature ($T_s$) and air temperature ($T_a$). On the other hand, variables such as the accumulated snow amount or the presence of an ice layer on the surface can be related indirectly to terms such as the latent heat flux ($L_e E$) or the rate of energy storage per unit area.

Surface temperatures and states for the observed sites
Using the HMM, the road condition is modelled as a binary latent variable $x_t \in \{\text{Open}, \text{Closed}\}$, and the numerical observed meteorological variables $z_t$ follow a multivariate Gaussian distribution with a spherical covariance matrix. The transition and emission probabilities are estimated with the Baum-Welch (expectation-maximization) algorithm, whose expectation step relies on the forward-backward recursions. The estimated parameters of the HMM are given in Table 3.
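The forward-backward recursions at the core of this estimation step can be sketched as below. The emissions are spherical Gaussians, meaning a single variance per state shared across all observed variables. The code returns the smoothed posteriors $p(x_t \mid z_{1:T})$ and the log-likelihood, normalizing each step to avoid numerical underflow. It is a didactic sketch with hypothetical parameters, not the hmmlearn implementation, and it omits the maximization step that re-estimates the parameters.

```python
import math


def spherical_gaussian_logpdf(z, mean, var):
    """log N(z; mean, var * I) for a d-dimensional observation z."""
    d = len(z)
    sq = sum((zi - mi) ** 2 for zi, mi in zip(z, mean))
    return -0.5 * (d * math.log(2 * math.pi * var) + sq / var)


def forward_backward(obs, start_prob, trans_prob, means, variances):
    """Return (posteriors, log_likelihood) for an HMM with spherical Gaussian emissions."""
    K, T = len(start_prob), len(obs)
    # Emission likelihoods, shifted per time step by their maximum for stability.
    log_b = [[spherical_gaussian_logpdf(z, means[k], variances[k]) for k in range(K)] for z in obs]
    shift = [max(row) for row in log_b]
    b = [[math.exp(lb - s) for lb in row] for row, s in zip(log_b, shift)]

    # Forward pass, normalizing alpha at each step by the constant c[t].
    alpha, c = [], []
    for t in range(T):
        if t == 0:
            a = [start_prob[k] * b[0][k] for k in range(K)]
        else:
            a = [sum(alpha[t - 1][j] * trans_prob[j][k] for j in range(K)) * b[t][k]
                 for k in range(K)]
        c.append(sum(a))
        alpha.append([v / c[t] for v in a])

    # Backward pass reusing the same normalization constants.
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            sum(trans_prob[j][k] * b[t + 1][k] * beta[t + 1][k] for k in range(K)) / c[t + 1]
            for j in range(K)
        ]

    posteriors = [[alpha[t][k] * beta[t][k] for k in range(K)] for t in range(T)]
    log_likelihood = sum(math.log(ct) for ct in c) + sum(shift)
    return posteriors, log_likelihood


# Hypothetical 1-D example: state 0 centred at 5, state 1 centred at -3.
obs = [[5.0], [5.2], [-3.0]]
post, ll = forward_backward(obs, [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[5.0], [-3.0]], [1.0, 1.0])
print([[round(p, 3) for p in row] for row in post], round(ll, 2))
```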
Estimated HMM parameters
Once the parameters of the HMM have been found, the most likely sequence of latent road conditions can be estimated using the Viterbi algorithm. This algorithm uses dynamic programming to find the latent sequence of road conditions, each of which is either {open} or {closed}. Figure 8 shows the estimated road conditions and the associated climate variables.
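The dynamic program can be sketched in log space as follows. It takes per-step emission log-likelihoods, for example Gaussian log-densities of the weather observations under each road state, and returns the most likely state sequence. The toy probabilities in the usage lines are hypothetical.

```python
import math


def viterbi(log_start, log_trans, log_emis):
    """Most likely hidden-state path of an HMM, computed in log space.

    log_start[k]    = log p(x_1 = k)
    log_trans[j][k] = log p(x_t = k | x_{t-1} = j)
    log_emis[t][k]  = log p(z_t | x_t = k)
    """
    K, T = len(log_start), len(log_emis)
    delta = [log_start[k] + log_emis[0][k] for k in range(K)]
    backpointers = []
    for t in range(1, T):
        step_ptr, new_delta = [], []
        for k in range(K):
            # Best predecessor j for reaching state k at time t.
            j_best = max(range(K), key=lambda j: delta[j] + log_trans[j][k])
            step_ptr.append(j_best)
            new_delta.append(delta[j_best] + log_trans[j_best][k] + log_emis[t][k])
        backpointers.append(step_ptr)
        delta = new_delta
    # Trace back from the best final state.
    path = [max(range(K), key=lambda k: delta[k])]
    for step_ptr in reversed(backpointers):
        path.append(step_ptr[path[-1]])
    return path[::-1]


log = math.log
# Hypothetical two-state example: 0 = open, 1 = closed.
start = [log(0.8), log(0.2)]
trans = [[log(0.9), log(0.1)], [log(0.2), log(0.8)]]
emis = [[log(0.7), log(0.3)], [log(0.6), log(0.4)], [log(0.05), log(0.95)], [log(0.2), log(0.8)]]
print(viterbi(start, trans, emis))
```

Working with sums of logarithms instead of products of probabilities keeps the computation stable over long winter seasons of hourly records.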

Climate variables associated with road conditions
We are now interested in the differences in the weather variables given the predicted labels. Figure 9 shows box plots of each weather variable for the open and closed states of the road. An analysis of the statistical relevance of the weather variables was also conducted (see Table 7). Several variables, such as global radiation, relative air humidity, air temperature, surface temperature 1, water layer thickness and snow layer thickness, among others, showed statistical significance (p < 0.001).

Conditional distribution of the climate variables given the road conditions
So far, we have assumed the number of hidden states to be known. This assumption is based on the utility of the latent variable used for the road conditions given the observed climate variables. The number of hidden regimes, namely the order of the HMM, cannot be estimated consistently from the observed data [60]. Moreover, standard methods such as likelihood-ratio tests do not satisfy the required regularity conditions when applied to non-i.i.d. samples. Nevertheless, likelihood-ratio tests for multivariate Gaussian HMMs have been used to compare models with different numbers of switching regimes [61]. Table 4 shows the results of the statistical tests for the climatic variables under the two alternating road states.
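For a single weather variable, the two-sample comparison reported in Table 4 can be sketched as follows. The code computes Welch's t statistic and its Welch-Satterthwaite degrees of freedom for the observations labelled open and closed. The original analysis does not state whether equal variances were assumed, so the unequal-variance form is our assumption, and the sample values are hypothetical. In practice, p-values can be obtained with `scipy.stats.ttest_ind(a, b, equal_var=False)` and, assuming the Wilcoxon test is the two-sample rank-sum variant, `scipy.stats.ranksums(a, b)`.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, dof


# Hypothetical surface temperatures (degrees C) under each decoded road state.
open_temps = [4.1, 6.3, 5.0, 7.2, 3.8, 5.9]
closed_temps = [-1.5, -3.2, 0.4, -2.1, -0.8]
t, dof = welch_t(open_temps, closed_temps)
print(f"t = {t:.2f}, dof = {dof:.1f}")
```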
p-values of the Wilcoxon and t-tests for the numerical observed data for meteorological conditions and road surface state
Summary of Apriori rules, Case 1, for the road condition forecast (min support = 0.01, min confidence = 0.70, min lift = 3, min length = 2)
The Apriori algorithm was applied to the categorical dataset (see Table 2) to study the variables and conditions associated with a closed road state. Two cases with different minimum support are shown in this study: Case 1: min support = 0.01, min confidence = 0.70, min lift = 3, min length = 2; Case 2: min support = 0.02, min confidence = 0.70, min lift = 3, min length = 2.
In this work, the focus is on determining the variables that lead to a {closed} road condition. The results for the {closed} road state, for both cases studied, are shown in Tables 6 and 7. The first and second columns show that the {snow m.} item of the categorical variable Rain status occurs together with the {closed} road status with a confidence higher than 90%. This makes sense, since in all cases where the {snow m.} item is present, the road state is closed. Likewise, Rain status:{snow m.} is always accompanied by all or some of the following items: {all OK}, {accept} and/or {with snow}, belonging to the observed variables Atmosphere State, Visibility sensor status and Surface state Site (1,2), respectively. Of special interest are the cases in which the confidence is below 90% and Rain status:{snow m.} is not present; this suggests that Rain status:{snow m.} is decisive for a closed road state. Finally, the lift indicates that the probability of a {closed} road status when {snow m.} is present is more than 9 times its overall (baseline) probability.
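The measures used to filter the rules can be computed directly from the transactions. For a rule $A \Rightarrow B$:

- support is the fraction of transactions containing $A \cup B$;
- confidence is $\mathrm{supp}(A \cup B)/\mathrm{supp}(A)$;
- lift is the confidence divided by $\mathrm{supp}(B)$.

A lift above 9 therefore means that the consequent is more than nine times as frequent among transactions containing the antecedent as in the data overall. The sketch below uses hypothetical transactions and thresholds modelled on Case 1, and the function names are ours. It is not the apyori output reported in the tables.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent => consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    n_a = sum(1 for t in transactions if a <= set(t))
    n_c = sum(1 for t in transactions if c <= set(t))
    n_ac = sum(1 for t in transactions if (a | c) <= set(t))
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


def keep_rule(metrics, min_support=0.01, min_confidence=0.70, min_lift=3.0):
    """Apply support/confidence/lift thresholds like those of Case 1."""
    support, confidence, lift = metrics
    return support >= min_support and confidence >= min_confidence and lift >= min_lift


# Hypothetical data: 100 time steps, 10 of them closed, 8 of those with snow.
transactions = (
    [["rain:snow m.", "road:closed"]] * 8
    + [["rain:none", "road:closed"]] * 2
    + [["rain:snow m.", "road:open"]] * 1
    + [["rain:none", "road:open"]] * 89
)
m = rule_metrics(transactions, ["rain:snow m."], ["road:closed"])
print([round(v, 3) for v in m], keep_rule(m))
```

Here the rule {snow m.} ⇒ {closed} has a confidence of 8/9 and a lift of about 8.9, because closures make up only 10% of this toy dataset.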
Summary of Apriori rules, Case 2, for the road condition forecast min support = 0.02, min confidence = 0.70, min lift = 3, min length = 2
In this paper, we presented an approach for early warnings about the open or closed state of the road based on a Hidden Markov Model and the Apriori algorithm. The road condition was modelled as a time-dependent binary random variable that is not observed directly. Instead, the causal relationship between the current weather conditions and the early warnings was modelled as a Hidden Markov Model. The model provides a probabilistic measure of the operational risk related to the road. Furthermore, the model can be extended to include decision variables for the given road safety recommendations.
Most machine learning models are treated as black-boxes, where the output of the model is used to make decisions. Markov models provide explicit insight into the sequential nature of the road safety conditions. However, the results are not directly interpretable for human decision making.
To provide interpretable rules for decision making, the Apriori algorithm was applied to the categorical variables. In the two cases presented in this study, with different minimum support, the {snow m.} item occurred together with the {closed} road status with a confidence higher than 90%, whereas in the cases where Rain status:{snow m.} was not present the confidence was below 90%, indicating that Rain status:{snow m.} is decisive for a closed road state. In this way, the HMM provides the predictive power to estimate the open or closed state of the road from the different weather variables studied, and the rule learning method provides a structural assessment and informative intuition about road safety conditions.
The Apriori algorithm allows us to understand which weather variables were directly involved in the open and closed states of the road. In this respect, this work explores the use of machine learning methods to support decision making on roads where multiple weather variables are involved. In all cases where the {snow m.} item is present, the road state is closed, and Rain status:{snow m.} is always accompanied by all or some of the items {all OK}, {accept} and/or {with snow}, belonging to the observed variables Atmosphere State, Visibility sensor status and Surface state Site (1,2), respectively. The cases with confidence below 90% are those in which Rain status:{snow m.} is not present, which suggests that this item is decisive for a closed road state, and the lift shows that the probability of a {closed} road status when {snow m.} is present is more than 9 times its baseline probability. Overall, the combination of the HMM and the Apriori algorithm yielded interpretable rules for decision making in road safety management with a confidence higher than 90%. Further research will include evaluation measures for comparing different interpretable models for road safety recommendations, as well as their use in real-world scenarios.
Acknowledgments
The authors are grateful to the Ministerio de Obras Públicas (MOP, Región del Maule, Chile), Dr. Juan Figueroa Meriño and Dr. Luis Morales-Salinas for providing the figure of the study region, and to Mario Guiachetti Valenzuela for providing the database used in this research.
Appendix A: Additional weather variables
