Application of machine learning technique for prediction of road accidents in Haryana-A novel approach

Abstract

Over the last few years, road accidents in developing countries have been increasing at an alarming rate. In India, almost 3% of GDP is lost to road accidents, which not only cause social problems but also impose a huge burden on the Indian economy. Various studies have analyzed this situation using different methods and techniques on different stretches and intersections. This paper makes one of the first attempts to develop an Accident Prediction Model (APM) for the Indian state of Haryana. The study describes the procedure for collecting and analyzing accident data, as well as the detailed methodology used to develop the APMs. The models were developed using one of the most common machine learning algorithms, linear regression. Results obtained from the APM for Haryana were compared with those of well-established APMs such as Smeed’s model and Valli’s model, and the comparisons are discussed to identify the most efficient model. The proposed model was observed to predict road accidents in Haryana with high accuracy. The output of this work can be used for theoretical as well as practical applications, such as road safety management to improve existing conditions of the road network in Haryana and the framing of new traffic safety policies in the future.

Keywords

Accident prediction model; linear regression; road safety; accidents

1 Introduction

Accident Prediction Models (APMs) are powerful tools that are widely used for determining the factors associated with road accidents and their patterns over time [1]. Because the prediction of road accidents is a complex problem, it poses serious challenges to road safety management teams, planners and engineers. The present investigation was carried out in Haryana, a northern state of India lying on the fringe of the national capital, New Delhi. Over two recent decades (1991–2011), road crashes in Haryana increased six times, while road fatalities increased 11 times [2]. Although Haryana accounts for 1.34% of India’s geographical area and 2.2% of its population, the state contributes 3.5% of fatalities and 2.2% of road accidents, placing it among the 13 most unsafe states of India (MORTH, 2016). Fig. 1 shows that the values of all road safety indicators in the state are worse than the national average. Fig. 1 also shows the value of the Accident Severity Index (ASI) for the country. The ASI is an index of the seriousness of accidents: the higher the ASI, the more severe the accidents. Owing to the heterogeneous nature of traffic in Haryana, road accident prediction has become more complex.

Fig. 1

Road safety indicators in Haryana 2016.

Haryana has been experiencing very fast and unprecedented growth of its road network and vehicle fleet [3]. In the last five years, the growth rate of registered motor vehicles in Haryana has been above 10%. To meet this demand, existing roads are constantly being upgraded, constructed and widened, yet despite these efforts traffic accidents are not decreasing. According to the long-term profile of accidents in Haryana, during the period 1996–2016 there were a total of 190403 Road Traffic Accidents (RTAs) in the state, in which 76044 people died [4]. The analysis also shows that 29.6% of those killed were female and 70.4% male. During this period, motor vehicle occupants, i.e. passengers, motorcyclists and drivers, accounted for an average of 83% of fatalities, while cyclists and pedestrians accounted for 17%. Since the bicycle is not widely used in Haryana, the share of pedestrian and cyclist fatalities declined from 27% in 1999 to 18% in 2016.

Although there is no definite pattern of accidents in Haryana, overall the average numbers of fatalities, accidents, and injuries in the state are 4235, 10299 and 9474 respectively. This is despite the fact that Haryana was the first Indian state to launch a “Vision Zero” road safety initiative in 2016, an idea adopted by only a few developed nations to reduce road fatalities; the issue is yet to be resolved.

1.1 Machine learning and its applications in civil engineering

The world is moving towards newer and more advanced technology. With the increasing number of road accidents, there is an urgent need to study the factors responsible, and analyzing such large volumes of data requires a modern and accurate technique. Machine learning is one of the most recent methods for making predictions from data, and it is being used by different agencies to obtain advanced and precise predictions. In brief, its applications in civil engineering include hydraulic problems, stability of structures, prediction of the strength of concrete, analysis for different kinds of loads, determination of energy consumption and carbon emissions, design of structures, construction management, and prediction of road accidents for sustainable road infrastructure. In this study, we focus primarily on the application of machine learning to the prediction of road accidents using the linear regression technique.

1.2 Linear regression technique

Linear regression is possibly one of the most well-known and well-understood algorithms in machine learning and statistics. This technique is being widely used to develop a relationship between the dependent variable and the independent variables. Some of the basic requirements of a good linear regression model in machine learning are:

The dependent variable should have a high correlation coefficient (preferably >0.5) with the independent variable and the independent variables should have low correlation with each other.

The goodness of fit, i.e. the R² value, should preferably be greater than 0.7.

The standard error of estimates should be minimum.

The signs of the coefficients should be logical.

In the present study, we used this concept of machine learning for the estimation of road accidents in the state of Haryana.
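For reference, the quantities behind the requirements listed above can be written in standard textbook form. The symbols used here ($y_i$, $x_{ij}$, $\beta_j$, $n$, $k$) are notation assumed for illustration and are not taken from the original text.

$$y_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \varepsilon_i$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \qquad s_e = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - k - 1}}$$

Here $\hat{y}_i$ is the fitted value for observation $i$, $\bar{y}$ is the mean of the dependent variable, $n$ is the number of observations, $k$ is the number of independent variables, and $s_e$ is the standard error of the estimate.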

1.3 Objectives of the present study

To identify the factors responsible for the occurrence of accidents in the state of Haryana.

To propose an efficient and safe transportation model for reducing the number of road accidents in the state of Haryana.

To validate the proposed model.

To compare the proposed model with some of the most successful models to find the best one.

2 Literature review

“The first successful APM was given by Smeed in 1949 for predicting road accidents in 20 developing countries including India. The equation was used for many years by many countries and the same is given in Equation (1) (Table 1)”.

Table 1
Smeed’s, Andreassen’s, and Valli’s equations for prediction of road accidents

Model name	Equation	Eq. No.	Nomenclature
Smeed’s model	$D/N = 0.0003\,(N/P)^{-0.67}$	1	D = total deaths, C = total accidents, P = population of the area, F = fatalities, I = injuries, N = number of registered vehicles
Andreassen model	$C = e^{21.26}\, N^{0.495}\, P^{-0.83}$	2
	$F = e^{13.47}\, N^{0.613}\, P^{-0.63}$	3
	$I = e^{13.47}\, N^{0.604}\, P^{-0.54}$	4
Valli’s equation	$C/N = 0.000817\,(N/P)^{-0.75}$	5
	$F/N = 0.000315\,(N/P)^{-0.58}$	6
	$I/N = 0.001453\,(N/P)^{-0.57}$	7

In 1991, Andreassen introduced one more independent variable, road length, and modified Smeed’s model for the same purpose; the resulting equations are presented in Equations (2–4).

In 1991, “P. Valli further studied various responsible factors for road accidents in 23 large metropolitan cities of India. The author concluded that there is no single model that could predict road accidents in all areas as parameters like population, road length, vehicle density varies from place to place”. The equations given by the author are presented in Equations (5–7).
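To make the form of these classical models concrete, the following minimal Python sketch evaluates Smeed’s equation (Equation 1) and Valli’s equations (Equations 5–7) as listed in Table 1. The function names and the sample inputs are hypothetical and chosen only for illustration; N and P must be supplied in the same units the original models assume.

```python
def smeed_deaths_per_vehicle(vehicles, population):
    """Smeed's model (Eq. 1 in Table 1): D/N = 0.0003 * (N/P)^-0.67."""
    return 0.0003 * (vehicles / population) ** -0.67


def valli_rates(vehicles, population):
    """Valli's equations (Eqs. 5-7 in Table 1) for C/N, F/N and I/N."""
    ratio = vehicles / population
    return {
        "C/N": 0.000817 * ratio ** -0.75,
        "F/N": 0.000315 * ratio ** -0.58,
        "I/N": 0.001453 * ratio ** -0.57,
    }


# Illustrative (made-up) inputs: both models predict a lower per-vehicle
# rate as the number of vehicles per person (N/P) rises.
low_motorisation = smeed_deaths_per_vehicle(100.0, 1000.0)
high_motorisation = smeed_deaths_per_vehicle(500.0, 1000.0)
```

Because the exponents are negative, each per-vehicle rate falls as motorisation (N/P) grows, which is the behaviour these models were built to capture.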

William et al. (2011) found that traffic flow, length of the segment, junction density, and type of terrain are the significant factors responsible for the occurrence of accidents for two-lane highways in Ashanti region.

S. Jaglan et al. [5] identified the factors responsible for the occurrence of accidents on selected highways of Haryana. The authors found that vulnerable users are more affected in the state and detailed study is required to explore the reason behind the accidents.

Singh et al. (2016) [6] carried out a study to predict road accidents on non-urban sections of Haryana using the M5 regression tree and negative binomial techniques. The authors found that minor accesses to the highways need to be properly designed and that service roads need to be made functional.

Singh et al. (2015) sought to identify the factors responsible for accidents on NH-1, SH-11, and SH-20 of Haryana using accident and traffic volume data. The authors concluded that cars and trucks are the main contributors to accidents on the selected stretches and that vulnerable users are highly affected in this zone [8].

Mor and Sood (2017) examined the correlation between accidents and traffic volume on NH-1 of Haryana and concluded that there is a significant relationship between the two variables.

However, there is no study on predicting road accidents for the state of Haryana. The primary objective of the research is to develop an APM to identify the factors that are likely to affect injuries in the state of Haryana. Furthermore, the models were to be used to identify those significant variables which are affecting the safety of users, i.e. land use, geometry, registered vehicles, etc.

3 Choice of the explanatory variable

Normally, dependent and independent variables are chosen on the basis of the accessibility and availability of data and of which variables were found significant in past studies [8]. For the present study, population and registered motor vehicles were chosen because most researchers have found these two variables to be significant [9]. A highly significant APM was introduced by Pramada et al. by adding road length as an additional parameter. Road length has been considered by many researchers since then [7, 11], and the other parameters were selected according to the time available for data collection. Based on previous research, five variables were taken as independent variables, whereas accident frequency, in accidents per year, was taken as the dependent variable.

4 Dataset

Accident data covering a period of 21 years (1996–2016 inclusive) for the selected state were obtained from the police, the traffic department and the National Crime Records Bureau (NCRB). The data contained information such as the number of people killed, the number injured and the total number of accidents. Inevitably, the database is subject to some degree of under-reporting; however, since no extensive investigations have been carried out to assess its scale, it is difficult to account for it in any systematic way in the present study. Other data included GDP at current and constant prices (in crores) from the statistical department, road length (in km) from the PWD department, population (in thousands), registered vehicles (in thousands) from the transport department and employed people (in thousands) from the employment exchange department of Haryana. All the data were compiled in MS Excel 2013 for further analysis.

5 Methodology of the research work

To carry out the research, the available relevant literature was first studied to identify the factors responsible for the causation of accidents. Data were then collected from different departments of Haryana and compiled in MS Excel. To select significant variables, scatter plots were drawn between each variable and the number of crashes, and the values of R² were calculated. Significant variables with higher R² values were selected for development of the model. The equations were developed in SPSS software using the linear regression algorithm of machine learning. The detailed methodology adopted to carry out the research is shown in Fig. 2.

Fig. 2

Methodology of the proposed work.

5.1 Procedure for development of accident prediction model

For the development of the models, we used the linear regression algorithm of machine learning to predict accidents, fatalities, and injuries in the state. The data were split into two parts: training and testing. The training part is used to develop the model, while the testing part is used to check its predictive power. The data for the years 1996–2015 were taken for training and the data for the years 2015–2017 for testing. Factors were added one at a time to the Statistical Package for the Social Sciences (SPSS) software, in order of priority based on their R² values, and a linear regression equation was developed. The value of R² indicates the proportion of the variation in the dependent variable that is explained by the model. In the output of the model, if the significance value of a factor is less than 0.05 (i.e. at the 95% confidence level), the factor is incorporated into the model; if the significance value is greater than 0.05, the factor is excluded as unsuitable for developing the model. After a number of trials, the equations for predicting road accidents in Haryana were obtained as Equations (8), (9), and (10).
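The train-then-test procedure described above can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the study itself used SPSS, the data below are synthetic placeholders rather than the Haryana dataset, the cut-off year is an arbitrary choice for the sketch, and the helper names (`fit_ols`, `predict`, `r_squared`) are hypothetical. The significance test on each coefficient, which SPSS reports directly, is not reproduced here.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients to a predictor matrix."""
    return coef[0] + np.asarray(X) @ coef[1:]


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic placeholder data: 21 yearly rows and two made-up predictors.
rng = np.random.default_rng(0)
years = np.arange(1996, 2017)
X = rng.normal(size=(len(years), 2))
y = 5.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=len(years))

# Chronological split: earlier years train the model, later years test it.
train = years <= 2012  # arbitrary cut-off for this sketch
test = ~train

coef = fit_ols(X[train], y[train])
r2_train = r_squared(y[train], predict(coef, X[train]))
r2_test = r_squared(y[test], predict(coef, X[test]))
```

Keeping the test years strictly later than, and disjoint from, the training years means `r2_test` measures how well the fitted equation extrapolates rather than how well it memorises the fitted period.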

For the present study, total accidents, fatal accidents and injured persons are taken as the dependent variables, and other parameters such as GDP, population and road length are taken as independent variables. The methods of least squares and maximum likelihood are used to estimate the variable coefficients. The t-test is used to check the distribution of the variables (which should be normally distributed). The Variance Inflation Factor (VIF) (Tables 2, 3 and 4, column 10) indicates the amount of multicollinearity (correlation between predictors) present in the regression analysis, while the tolerance (Tables 2, 3 and 4, column 9) is checked to measure the effect of one independent variable on all other independent variables. All of the above tests are used to check the significance of the incorporated variables. The significant variables and their coefficients, as obtained from SPSS, are given in Tables 2, 3, and 4.
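The two collinearity statistics mentioned above are linked by the standard definitions VIF = 1 / (1 − R²ⱼ) and tolerance = 1 / VIF, where R²ⱼ comes from regressing predictor j on all the other predictors. The NumPy sketch below computes them; the function name `vif` and the synthetic data are assumptions made for illustration and are not the SPSS procedure used in the study.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of every column of X.

    Each column is regressed on the remaining columns (with an intercept)
    and VIF_j = 1 / (1 - R_j^2) is returned for each column j.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)


# Synthetic example: x3 is almost a copy of x1, x2 is unrelated to both.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
x3 = x1 + 0.05 * rng.normal(size=50)

factors = vif(np.column_stack([x1, x2, x3]))
tolerance = 1.0 / factors  # tolerance is the reciprocal of VIF
```

As a quick consistency check on the reciprocal relationship, a tolerance of 0.036 corresponds to a VIF of about 27.8, which is the pair reported for road length per 1000 RV in Table 2.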

Table 2
Significant Coefficients and their values for the Accident Prediction model

Model 1	Unstandardized Coefficients		Standardized Coefficients	T	Sig.	95.0% Confidence Interval for B		Collinearity Statistics
	B	Std. Error	Beta			Lower Bound	Upper Bound	Tolerance	VIF
(Constant)	44.787	9.575	–	4.68	0.01	24.48	65.08	–	–
Road length per 1000 RV	0.100	0.047	0.657	2.1	0.05	0.000	0.200	0.036	27.8
Road length per 1000 population	–19.981	12.20	–0.358	–1.6	0.02	–45.85	5.891	0.073	13.8
GDP per 1000 population	–0.003	0.004	–0.327	–0.9	0.07	–0.011	0.004	0.027	36.7
RV per 1000 population	–0.332	0.43	–0.359	–0.7	0.05	–1.240	0.576	0.016	61.9

Table 3

Significant Coefficients and their values for the Fatality Prediction model

Model 2	Unstandardized Coefficients		Standardized Coefficients	T	Sig.	95.0% Confidence Interval for B		Collinearity Statistics
	B	Std. Error	Beta			Lower Bound	Upper Bound	Tolerance	VIF
(Constant)	20.178	7.313	–	2.75	0.014	4.675	35.68	–	–
Employed persons per 1000 population	0.181	0.058	0.534	3.14	0.006	0.059	0.303	0.31	3.1
Road length per 1000 population	–2.798	5.430	–0.167	–0.51	0.013	–14.30	8.714	0.8	11.5
Road length per 1000 RV	–0.064	0.014	–1.412	–4.67	0.001	–0.094	–0.035	0.1	10
RV per 1000 population	–0.478	0.077	–1.723	–6.18	0.001	–0.642	–0.314	0.11	8.5

Table 4

Significant Coefficients and their values for Injury Prediction model

Model 3	Unstandardized Coefficients		Standardized Coefficients	T	Sig.	95.0% Confidence Interval for B		Collinearity Statistics
	B	Std. Error	Beta				Lower Bound	Upper Bound	Tolerance	VIF
(Constant)	12.3	13.9	–	0.87	0.039	–17.31	41.9	–	–
RV per 1000 population	–1.6	0.39	–0.917	–4.1	0.001	–2.42	–0.78	0.016	62.0
Road length per 1000 RV	0.08	0.04	0.274	1.8	0.013	–0.012	0.169	0.036	27.8
Road length per 1000 population	28.49	13.1	0.270	2.1	0.017	0.480	56.5	0.050	19.8
GDP per 1000 population	0.01	0.003	0.488	2.86	0.012	0.002	0.017	0.027	37.2
Employed persons per 1000 population	–0.05	0.107	–0.024	–0.47	0.042	–0.279	0.177	0.311	3.22

Tables 2, 3, and 4 show the effect of each predictor (independent variable), with its weight, on the dependent variable. All these variables are significant at the 0.05 significance level. The significant variables common to all three models, and hence primarily responsible for the occurrence of accidents in the state, are road length per 1000 registered vehicles, road length per 1000 population, and registered vehicles per 1000 population. The remaining significant variables are GDP per 1000 population for Model 1, employed persons per 1000 population for Model 2, and both GDP per 1000 population and employed persons per 1000 population for Model 3.

Using all these variables and their coefficients, three models were developed using SPSS software for accident prediction in the state and the same are given in Equation 8, Equation 9, and Equation 10.

Model 1: $A/R = 0.1 \times (L/R) - 19.981 \times (L/P) - 0.003 \times (G/P) - 0.332 \times (R/P) + 44.787$ (8)

Model 2: $F/R = 0.181 \times E - 2.798 \times (L/P) - 0.064 \times (L/R) - 0.478 \times (R/P) + 20.178$ (9)

Model 3: $I/R = 0.01 \times (G/P) + 28.493 \times (L/P) + 0.079 \times (L/R) - 0.051 \times (E/P) - 1.601 \times (R/P) + 12.302$ (10)

Where A/R = accidents per 1000 registered vehicles, F/R = fatalities per 1000 registered vehicles, I/R = injuries per 1000 registered vehicles, L/R = road length per 1000 registered vehicles, L/P = road length per 1000 population, E = number of employed people, R/P = registered vehicles per 1000 population, G/P = GDP per 1000 population, and E/P = employed persons per 1000 population.
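As an illustration of how such an equation is applied, the sketch below evaluates Model 1 using the unstandardized coefficients reported in Table 2. The dictionary layout, the function name `predict_rate` and the input values are hypothetical and chosen only to show the arithmetic; Models 2 and 3 could be evaluated the same way with the coefficients from Tables 3 and 4.

```python
# Model 1 coefficients as reported in Table 2 (accidents per 1000 RV).
MODEL_1 = {
    "intercept": 44.787,
    "L/R": 0.100,    # road length per 1000 registered vehicles
    "L/P": -19.981,  # road length per 1000 population
    "G/P": -0.003,   # GDP per 1000 population
    "R/P": -0.332,   # registered vehicles per 1000 population
}


def predict_rate(model, inputs):
    """Evaluate a linear model: intercept + sum(coefficient * input)."""
    total = model["intercept"]
    for name, coefficient in model.items():
        if name != "intercept":
            total += coefficient * inputs[name]
    return total


# Made-up inputs, used only to demonstrate the calculation.
example_inputs = {"L/R": 100.0, "L/P": 1.0, "G/P": 1000.0, "R/P": 50.0}
example_rate = predict_rate(MODEL_1, example_inputs)
# 44.787 + 10.0 - 19.981 - 3.0 - 16.6 = 15.206 accidents per 1000 RV
```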

5.2 Details of accident prediction model

Usually, the goodness of fit of a model is measured by the coefficient of determination (R²), i.e. explained variation divided by total variation. From Table 5 and Fig. 3, the coefficient of determination is 0.945, 0.854 and 0.988 for Model 1 (accident prediction), Model 2 (fatality prediction), and Model 3 (injury prediction) respectively. The higher the value of R², the higher the predictive power of the model. This means that the independent variables incorporated in the models explain 94.5% (Model 1), 85.4% (Model 2), and 98.8% (Model 3) of the variation in the dependent variable, and the remaining 5.5%, 14.6% and 1.2% is attributable to variables not incorporated in the models. Adjusted R² needs to be checked when there is a large number of independent variables, and the difference between R² and adjusted R² should be as small as possible. Since this difference is very small here, the variables selected for the development of the models are important and significant enough to explain the causes of accidents.
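The relationship between R² and adjusted R² follows the standard formula adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. The short sketch below applies it under the assumption that n = 21 (the yearly observations) and that p equals the number of predictors listed in Tables 2, 3 and 4 (4, 4 and 5); with those assumptions the results agree with the adjusted values in Table 5 to within rounding. The function name is hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (excluding intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Assumed n = 21; p taken from the number of predictors in Tables 2-4.
model_1 = adjusted_r2(0.945, 21, 4)
model_2 = adjusted_r2(0.854, 21, 4)
model_3 = adjusted_r2(0.988, 21, 5)
```

The penalty grows with p, which is why Model 3, with five predictors, loses slightly more relative to its R² than a four-predictor model with the same fit would.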

Table 5
Model Summary

Sr. No.	Model Summary	Model 1 (Accident Prediction)	Model 2 (Fatalities Prediction)	Model 3 (Injuries Prediction)
1	R	0.972	0.924	0.994
2	R Square	0.945	0.854	0.988
3	Adjusted R Square	0.931	0.818	0.984
4	Std. Error of the Estimate	1.585	0.771	1.423

Fig. 3

Model summary.

5.3 Correlation among data

Generally, we examine the relationship between the dependent and independent variables, but the relationships among the independent variables, i.e. how they affect each other, also need to be checked. For this purpose, the Pearson correlation matrix is presented in Table 6. The Pearson correlation coefficient (r) (Table 6, row 2) measures the strength of the linear relationship between two variables and varies from –1 to 1. A value of –1 indicates a perfect negative linear relationship, a value of 0 indicates no linear association, and a value of 1 indicates a perfect positive linear relationship. The larger the absolute value of the Pearson correlation, the stronger the dependency between the variables. For example, from Table 6, a strong relationship can be seen between accidents/1000 RV and registered vehicles/1000 population and road length/1000 registered vehicles. Parameters with higher values were considered for the development of the model. N = 21 indicates that there were 21 entries in total for each parameter in the dataset. For the given dataset, owing to the strong correlation among variables, the truly significant variables were expected to be identified.
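The Pearson coefficient described above can be computed directly from its definition, the covariance of two variables divided by the product of their standard deviations. The following sketch uses only the Python standard library; the function names and the small demonstration columns are hypothetical and are not the study data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise Pearson r for a dict mapping column name -> list of values."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Made-up columns: v rises with u, w falls as u rises.
demo = {
    "u": [1.0, 2.0, 3.0, 4.0],
    "v": [2.0, 4.0, 6.0, 8.0],
    "w": [8.0, 6.0, 4.0, 2.0],
}
matrix = correlation_matrix(demo)
```

The resulting matrix is symmetric with ones on the diagonal, which is a useful sanity check when reading a table such as Table 6.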

Table 6
Pearson-Correlation matrix for the variables in SPSS

	Parameters	RV/1000 P	RL/1000 RV	RL/1000 P	GDP/1000 P	EP/1000 P
RV/1000 P	Pearson Correlation	1	–0.920	–0.886	0.960	0.559
	Sig. (2-tailed)	–	0.000	0.000	0.000	0.008
	Covariance	42.39	–236.91	–0.622	3630.6	19.4
	N	21	21	21	21	21
RL/1000 RV	Pearson Correlation	–0.920	1	0.921	–0.820	–0.677
	Sig. (2-tailed)	0.000	–	0.000	0.000	0.001
	Covariance	–236.90	1565.383	3.930	–18854.3	–142.67
	N	21	21	21	21	21
RL/1000P	Pearson Correlation	–0.886	0.921	1	–0.868	–0.776
	Sig. (2-tailed)	0.000	0.000	–	0.000	0.000
	Covariance	–0.622	3.930	0.012	–54.387	–0.446
	N	21	21	21	21	21
GDP/1000 P	Pearson Correlation	0.960	–0.820	–0.868	1	0.531
	Sig. (2-tailed)	0.000	0.000	0.000	–	0.013
	Covariance	3630.6	–18854.3	–54.38	337515.4	1645.6
	N	21	21	21	21	21
EP/1000 P	Pearson Correlation	0.559	–0.677	–0.776	0.531	1
	Sig. (2-tailed)	0.008	0.001	0.000	0.013	–
	Covariance	19.40	–142.67	–0.446	1645.6	28.4
	N	21	21	21	21	21

*Here, A = total accidents, FA = fatal accidents, IP = injured persons, RV = registered vehicles, RL = road length, P = population, N = number of values for each parameter, and EP = employed persons in Haryana.

5.4 Results and discussions

Model 1, Model 2 and Model 3 are expected to give results close to the actual values in order to be of maximum use. To identify the most accurate model, the actual values of accidents per 1000 registered vehicles are compared with the values predicted by the Smeed, Andreassen and Valli equations (Equations 1–7) and by the proposed models (Equations 8–10), and the differences are noted in Tables 7, 8, and 9. The results show that the accidents predicted by the proposed model are close to the actual values of accidents in the state. Hence, the model can be considered consistent with existing conditions for predicting the actual number of accidents.
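The chi-square column in the comparison tables is not defined in the text. The tabulated values are, however, consistent with the usual per-observation term (actual − predicted)² / predicted, computed against the proposed (Haryana) model. The sketch below rests on that inferred assumption; the function names are hypothetical, and the two rows used are the 1996 and 1997 entries of Table 7.

```python
def chi_square_term(actual, predicted):
    """Per-year chi-square contribution: (actual - predicted)^2 / predicted."""
    return (actual - predicted) ** 2 / predicted


def chi_square_total(actuals, predictions):
    """Sum of the per-year contributions over all years."""
    return sum(chi_square_term(a, p) for a, p in zip(actuals, predictions))


# Actual vs Haryana-model accidents per 1000 RV for 1996 and 1997 (Table 7).
term_1996 = chi_square_term(34.8759, 33.83296)
term_1997 = chi_square_term(29.7248, 31.29822)
```

A smaller total indicates predictions that sit closer to the observed values, so the same statistic can be used to rank competing models on a common set of years.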

Table 7
Actual and Predicted values of accidents using different models

Year	Actual accidents/1000 RV	Predicted values of accidents per 1000 RV by					Chi-square value
		Smeed’s model	Andreassen equation	Valli equation	Haryana model	Difference between actual and predicted values
1996	34.8759	0.0007886	3.63431	0.0001793	33.83296	1.04293	0.0321
1997	29.7248	0.0007141	3.32233	0.0001604	31.29822	–1.57347	0.0791
1998	25.9532	0.0005969	2.99339	0.0001313	27.32066	–1.36750	0.0684
1999	27.2104	0.0006182	3.30360	0.0001365	28.27854	–1.06816	0.0403
2000	31.1708	0.0006456	3.47106	0.0001433	29.46205	1.70879	0.0991
2001	29.0276	0.0006318	3.36825	0.0001399	29.30559	–0.27803	0.0026
2002	27.6044	0.0006163	3.31797	0.0001360	28.90522	–1.30086	0.0585
2003	26.0846	0.0005841	3.22226	0.0001281	28.06070	–1.97606	0.1392
2004	27.6700	0.0005498	3.32035	0.0001197	27.00422	0.66577	0.0164
2005	24.2269	0.0004902	3.10130	0.0001053	25.06313	–0.83627	0.0279
2006	24.3962	0.0004470	1.86120	0.0000950	23.39534	1.00087	0.0428
2007	24.3740	0.0004385	1.85946	0.0000930	22.42219	1.95180	0.1699
2008	24.0062	0.0004587	1.92213	0.0000978	22.22505	1.78110	0.1427
2009	20.2215	0.0004195	2.07622	0.0000885	19.85721	0.36427	0.0067
2010	18.5674	0.0003782	2.11933	0.0000788	18.37335	0.19406	0.0020
2011	16.9803	0.0003628	2.10627	0.0000752	17.62653	–0.64621	0.0237
2012	16.9618	0.0003564	3.19935	0.0000737	17.07235	–0.11055	0.0007
2013	16.5482	0.0003556	3.40566	0.0000735	17.72162	–1.17346	0.0777
2014	15.6034	0.0003412	3.15831	0.0000702	16.88116	–1.27774	0.0967
2015	15.6817	0.0003354	2.76421	0.0000689	16.57193	–0.89027	0.0478
2016	15.5344	0.0003354	3.89001	0.0000688	16.13714	–0.60269	0.0225

Table 8

Actual and Predicted values of fatalities using different models

Year	Actual fatalities/1000 RV	Predicted values of fatalities per 1000 RV by					Chi-square value
		Smeed’s model	Andreassen equation	Valli equation	Haryana model	Difference between actual and predicted values
1996	4.3309	0.00037	3.5251	0.0009436	5.32166	–0.9907	0.0347
1997	12.0050	0.00035	3.5534	0.0008660	6.45195	5.5530	0.7408
1998	8.6541	0.00035	3.6135	0.0007415	7.55087	1.1032	0.0213
1999	11.1628	0.00032	3.5883	0.0007643	7.66402	3.4987	0.2084
2000	13.8572	0.00028	3.5596	0.0007936	10.5924	3.2647	0.0950
2001	11.6967	0.00032	3.5567	0.0007789	10.7937	0.9029	0.0070
2002	11.8261	0.00032	3.5590	0.0007623	10.9926	0.8335	0.0057
2003	11.3748	0.00029	3.5716	0.0007277	11.2338	0.1410	0.0002
2004	11.3822	0.00027	2.2871	0.0006905	11.2434	0.1387	0.0002
2005	9.4960	0.00024	3.6243	0.0006253	10.8892	–1.3932	0.0164
2006	10.9385	0.00022	3.6529	0.0005772	10.6166	0.3219	0.0009
2007	10.3393	0.00019	3.6525	0.0005678	10.1992	0.1401	0.0002
2008	9.9946	0.00019	3.6669	0.0005903	11.0870	–1.0925	0.0097
2009	8.7535	0.00018	3.7004	0.0005464	10.2281	–1.4747	0.0208
2010	8.3280	0.00018	2.7094	0.0004995	9.20057	–0.8726	0.0090
2011	7.5320	0.00017	3.7067	0.0004819	8.80437	–1.2724	0.0209
2012	7.3984	0.00016	2.9984	0.0004745	8.51131	–1.1129	0.0171
2013	7.1185	0.00014	3.1130	0.0004736	8.96939	–1.8509	0.0426
2014	6.5521	0.00014	3.2209	0.0004569	8.65588	–2.1038	0.0591
2015	6.8472	0.00013	3.2300	0.0004502	8.62793	–1.7807	0.0426
2016	6.9472	0.00012	3.2156	0.0004502	8.63352	–1.6863	0.0381

Table 9

Actual and Predicted values of injuries using different models

Year	Actual injuries/1000 RV	Predicted values of injuries per 1000 RV by					Chi-square value
		Smeed’s model	Andreassen equation	Valli equation	Haryana model	Difference between actual and predicted values
1996	51.2506	2.8636	0.0057	0.00045	50.4205	0.8301	0.0137
1997	44.9171	2.8770	0.0058	0.00041	45.7142	–0.7971	0.0139
1998	38.0344	2.8913	0.0058	0.00035	38.0162	0.0182	0.0000
1999	39.1579	2.9057	0.0858	0.00037	38.5940	0.5639	0.0082
2000	41.3398	2.9221	0.1258	0.00038	39.2622	2.0775	0.1099
2001	34.6891	2.9435	0.3352	0.00037	37.2007	–2.5116	0.1696
2002	35.5345	2.9542	0.4252	0.00036	35.8112	–0.2767	0.0021
2003	31.7619	2.9723	0.8359	0.00035	32.9992	–1.2373	0.0464
2004	29.1851	2.9844	0.7160	0.00033	30.3890	–1.2040	0.0477
2005	25.9680	2.9968	0.3368	0.00030	25.3883	0.5797	0.0132
2006	20.9639	3.0039	0.1963	0.00028	21.5400	–0.5762	0.0154
2007	24.3078	2.9868	0.0060	0.00027	22.4776	1.8302	0.1490
2008	26.2394	2.9966	0.6104	0.00028	24.5274	1.7120	0.1195
2009	22.0649	2.9815	0.0060	0.00026	22.5884	–0.5235	0.0121
2010	17.7930	2.9905	0.3060	0.00024	18.1472	–0.3542	0.0069
2011	15.8508	2.9973	0.0060	0.00023	16.7424	–0.8917	0.0475
2012	14.8676	3.0046	0.7262	0.00023	17.1856	–2.3179	0.3126
2013	14.8116	3.0567	0.5861	0.00023	16.1337	–1.3221	0.1083
2014	14.3918	3.0673	0.2161	0.00022	14.1379	0.2539	0.0046
2015	14.5239	3.0878	0.2962	0.00022	13.5472	0.9767	0.0704
2016	14.5623	3.0956	0.3461	0.00022	15.1786	–0.6163	0.0250

Fig. 4 clearly indicates that the Smeed, Valli, and Andreassen equations are not applicable for predicting the road accident scenario in the state of Haryana, while the proposed model is highly suitable and accurate for the selected state. As the predicted values of the proposed model follow the trend line, the predicted numbers of accidents are close to the actual numbers of accidents in the state of Haryana.

Fig. 4

Actual and predicted values of Accidents using different models.

Fig. 5 indicates that the equation given by Andreassen is comparatively better than the Smeed and Valli equations for predicting fatalities in Haryana, but in this case too the proposed model shows the highest accuracy among all four models.

Fig. 5

Actual and Predicted values of Fatalities using different models.

Fig. 6 indicates that none of the three models other than the proposed model is applicable for predicting injuries in Haryana, whereas the injuries predicted by the proposed model are almost equal to the actual values for the State. In addition, the predicted values in this case follow the trend line most accurately among all three cases.

Fig. 6

Comparison of Actual and predicted values of Injuries using different models.

6 Conclusions and recommendations

To support a sustainable and safe transport system, a more practical, planning-level approach to road safety is needed. The primary objective of this study was to provide a proactive, feasible, detailed planning-level tool that can help reduce accidents and improve the safety of road users, complementing traditional road safety methods. The proposed model is the first successful model of this kind that we have developed for the state of Haryana. The results of the analysis show that it is possible to develop a macro-level relationship between accident-related characteristics and crashes using machine learning techniques.

The study confirmed that road traffic accidents (RTAs) are a major and increasing cause of social, health and economic problems in Haryana. From the data analysis and observation of road accidents in the state of Haryana, several conclusions can be drawn:

1. The models for predicting road accidents in the state are formulated as: $\begin{aligned} \text{Accident prediction } (A/R) &= 0.1\,(L/R) - 19.981\,(L/P) - 0.003\,(G/P) - 0.332\,(R/P) + 44.787 \\ \text{Fatality prediction } (F/R) &= 0.181\,E - 2.798\,(L/P) - 0.064\,(L/R) - 0.478\,(R/P) + 20.178 \\ \text{Injuries prediction } (I/R) &= 0.01\,(G/P) + 28.493\,(L/P) + 0.079\,(L/R) - 0.51\,(E/P) - 1.601\,(R/P) + 12.302 \end{aligned}$

2. The Smeed model, the Andreassen equation and the Valli equation show poor results in predicting road accidents in Haryana; hence, these equations cannot be used to predict the road accident scenario in the state, whereas the proposed model shows highly accurate results. The proposed model can therefore be used for theoretical as well as practical applications, such as road safety management, in Haryana and in any state with similar accident and socio-economic conditions.

3. The main factors responsible for the occurrence of accidents in the state are population, road length and registered vehicles, but employment and GDP also contribute significantly. This may be because the high GDP and employment rate of Haryana give people a high capacity to purchase vehicles.

4. Accidents can be reduced by enforcing strict traffic rules. There is also a need to introduce new policies that encourage the use of public transport and, at the same time, curb the growth of personal vehicles.

5. Apart from engineering efforts, it is also a public duty to promote and contribute to a safe road transportation system by educating others about road safety, including the use of seat belts and helmets and compliance with traffic rules.

6. The use of computer science techniques such as machine learning for the prediction of road accidents was found to be effective and highly accurate; hence, these techniques can be used in the planning phase for improving road safety.
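
As a minimal sketch, the three regression equations in conclusion 1 can be written as plain Python functions. The coefficients are copied from the equations as printed; the function and argument names (for example `l_r` for the ratio L/R) are hypothetical, and the inputs are assumed to be the same ratios, in the same units, as those used to fit the models earlier in the paper.

```python
def predict_accidents(l_r, l_p, g_p, r_p):
    """Accident prediction (A/R) from the ratios L/R, L/P, G/P and R/P."""
    return 0.1 * l_r - 19.981 * l_p - 0.003 * g_p - 0.332 * r_p + 44.787


def predict_fatalities(e, l_p, l_r, r_p):
    """Fatality prediction (F/R) from E and the ratios L/P, L/R and R/P."""
    return 0.181 * e - 2.798 * l_p - 0.064 * l_r - 0.478 * r_p + 20.178


def predict_injuries(g_p, l_p, l_r, e_p, r_p):
    """Injuries prediction (I/R) from the ratios G/P, L/P, L/R, E/P and R/P."""
    return (0.01 * g_p + 28.493 * l_p + 0.079 * l_r
            - 0.51 * e_p - 1.601 * r_p + 12.302)


if __name__ == "__main__":
    # Sanity check only: with every input set to zero, each model
    # returns its intercept. These are not real Haryana data.
    print(predict_accidents(0, 0, 0, 0))
    print(predict_fatalities(0, 0, 0, 0))
    print(predict_injuries(0, 0, 0, 0, 0))
```

Keeping each model as a separate function mirrors the fact that the three equations use different sets of predictors; real predictions require the yearly input ratios from the study's data set.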

References

1. Mor, Sood and Goyal, Development and Corroboration of Crash Prediction Model, International Journal of Pure and Applied Mathematics 119(15) (2018), 413–421.

2. Chetna, Mor and Sood, Black Spots Identification on Pinjore to Baddi Road, International Journal of Pure and Applied Mathematics 120(6) (2018), 6473–6488.

3. MORTH, Road Accidents in India, 2017. Available at: http://morth.nic.in/showfile.asp?lid=3369 [Accessed: 10 May 2019].

4. Mor, Sood and Goyal, Development of a Model for Accident Prediction, Proceedings of International Conference on Urban Sustainability: Emerging Trends, Themes, Concepts and Practices (ICUS), Jaipur, India, 2018.

5. Dass and Jaglan, Identification and Management of Traffic Accidents on Selected Stretch of NH73A, Journal of General Management Research 4(1) (2017), 18–32.

6. Singh, Sachdeva S.N. and Pal, M5 model tree based predictive modeling of road accidents on non-urban sections of highways in India, Accident Analysis and Prevention 96 (2016), 108–117.

7. Singh, Sachdeva S.N. and Pal, Analysis of Causal Factors of Accidents on Highways in Haryana, IOSR Journal of Mechanical and Civil Engineering 4(1) (2015).

8. Mor and Sood, Correlation of Accident with Traffic Volume of NH-1, International Journal of Engineering Technology Science and Research 4(7) (2017), 948–950.

9. Eenink, Martine Elvik, Sofia and Stefan, Accident prediction models and road safety impact assessment: recommendations for using these tools, Institute for Road Safety Research, Leidschendam (2008).

10. Dinu and Veeraragavan, Random parameter models for accident prediction on two-lane undivided highways in India, Journal of Safety Research 42(1) (2011), 39–42.

11. Chikkakrishna, Kumar, Manoranjan and Sukhvir, Crash prediction for multilane highway stretch in India, Proceedings of the Eastern Asia Society for Transportation Studies (2013).