Abstract
BACKGROUND:
The data mining of construction accidents based on a robust modeling process can be used as a practical technique for reducing the frequency of construction accidents.
OBJECTIVE:
This study was designed to data-mine construction accidents.
METHODS:
This study was conducted in 2020 on construction accidents in Iran for ten years (2009–2018). The instruments to collect the required data were the checklists and descriptive reports of the accidents. The dependent variables of the study included reactive safety indicators related to construction accidents (lost working days (LWD) and total accident costs (TAC)). The independent variables consisted of four latent factors: personal variables, organizational variables, unsafe working conditions, and unsafe acts. The data were collected based on the conceptual model designed for data mining. The data mining process was carried out based on the structural equation modeling by IBM AMOS V. 23.0.
RESULTS:
A total of 5742 construction accidents occurring over 10 years were analyzed. The means of the LWD and TAC indicators were estimated to be 248.20±52.60 days and 1893.10±152.22 USD, respectively. These two indicators were directly related to the two latent factors of unsafe conditions and unsafe acts and their related variables, and were indirectly influenced by the latent personal and organizational factors. The relationship between unsafe conditions and unsafe acts was significantly positive. The relationship between latent personal and organizational factors and the two construction accident indicators was significantly negative (p < 0.05).
CONCLUSION:
The model results showed that personal and organizational variables could, directly and indirectly, affect reactive safety indicators in construction projects. Thus, these findings can be used to design and improve safety strategies to prevent and decrease construction accidents and incidents.
Introduction
With rapid economic development, the construction industry continues to rank among the most dangerous industries in the world [1–3]. Occupational safety in general, and construction safety in particular, is a complex phenomenon. Hence, construction safety remains a substantial concern for practitioners and researchers [4, 5]. Construction is one of the industrial sectors that consistently shows many shortcomings in safety performance [6, 7]. Studies have shown that safety in the construction sector is weak compared to other industrial areas, and various incidents, from minor accidents to fatal accidents, occur in this sector [8]. Construction workshops can be considered the most challenging part of construction for various reasons. Various reports have shown that a high percentage of accidents and deaths in the construction sector are related to construction workshops [9]. Some studies have shown that construction workers account for the highest number of deaths in construction projects [10]. On construction sites, every worker is directly exposed to safety risks. In addition to causing all kinds of harm to workers, safety risks can affect the cost, schedule, and quality of construction projects [11]. Risk factors that create high-risk conditions in these projects include constant changes in the project location, use of many different resources, poor working conditions, unstable employment, unsafe and risky environmental conditions (such as noise, vibration, dust and heat strain), unsafe acts, and personal, occupational, managerial and organizational factors [12–17].
In Europe, construction projects account for 30% of fatal occupational accidents yet employ merely 10% of the working population. In the United States of America (USA), the incidence rate of accidents in construction projects is twice the industrial average. According to the USA National Safety Council (NSC), there are an estimated 2200 deaths and 220,000 disabling injuries each year due to construction project accidents [18]. Almost 37 percent of all occupational accidents in Iran occur in construction projects, yet the sector employs merely 10% of the country’s working population [19]. Generally, the construction sector employs about 7% of the world’s workforce but is responsible for 30–40% of occupational injuries [20]. Furthermore, construction injuries cause considerable economic losses. Globally, the direct and indirect costs of construction injuries have been estimated at over 10 billion USD per year [21].
Despite the importance of investigating occupational safety in Iran’s construction projects, this subject has not been studied well, and there is a significant research gap for comprehensive analysis in this setting. Besides, each of the existing studies presented only some of the causes of construction accidents, and the direct/indirect and hidden/overt roles of these risk factors have not been specified [12, 23]. Although some hazard identification and mitigation efforts have been made to improve workplace safety in the construction industry, accidents still occur all over the world because of the dynamic, complex, and unpredictable nature of construction sites and their various operations [24].
Therefore, a comprehensive approach to modeling accidents on construction sites that identifies the most critical risk factors can provide insight into the key risk items and ultimately support the design of safety strategies to prevent these accidents and reduce the resulting damage. In addition, modeling software makes it possible to estimate the relationships among the identified risk factors and the effect of each risk factor on occupational accidents more realistically. The Structural Equation Modeling (SEM) approach is a mathematical and software-based modeling tool that can analyze complex direct/indirect relationships and interactions among parameters, account for hidden (latent) factors, and determine the impact of each element on a phenomenon, which makes it well suited to the present study [25, 26].
Therefore, given the points mentioned above and the lack of a comprehensive and fundamental study with a sufficient sample size, the present study was designed and conducted for the first time in Iran. Accordingly, this study used the SEM approach to data-mine construction accidents and model the risk factors affecting such accidents on the one hand, and to investigate the relationships among variables influencing reactive safety indicators in construction projects on the other.
Methods
This retrospective and descriptive-analytical study was conducted in 2019–2020, covering construction accidents over a 10-year period (2009–2018). The study included all the accidents in construction projects in the specified period. The statistical samples consisted of those construction accidents that resulted in at least one lost working day and incurred direct or indirect financial losses.
The instruments used to collect the required data were an accident checklist corroborated by the Inspection Office of the Ministry of Roads and Urban Developments in Iran and a comprehensive report of such accidents (Appendix 1). Also, the researchers referred to the registered documents and records of construction projects. The checklists of accidents comprised information regarding construction projects, the location of the accidents, the type of the accidents, affected workers’ knowledge, the consequences and results of the accidents, the parameters of activities at the time of accidents, the time and place of the accidents, work conditions at the time of accidents and unsafe acts leading to the accidents. The study was carried out in five steps (Fig. 1).
Data gathering

Fig. 1. Study steps of the construction accidents data mining.
Initially, official sources and documents were consulted to collect the checklists of construction accidents and their relevant reports. A total of 6087 accidents were extracted in this step.
In the present study, the most critical risk factors affecting the occurrence of accidents were investigated using a researcher-designed checklist. The classification of these variables is presented below:
Personal factors
This item includes age, work experience, education level, health status, smoking, drug use, and average daily sleeping hours.
Organizational factors
This item includes working hours per day, income, contract type, type of job, risk assessment programs, training programs, personal protective equipment (PPE), toolbox meetings (TBM), workload and safety climate. For items such as safety climate and workload, this information was obtained by calculating each item’s score from a standard questionnaire.
2.1.2.1. Safety climate measurement. To calculate safety climate, a standard occupational safety climate questionnaire with 37 questions in eight components was used, including management commitment to safety (10 questions), employees’ knowledge of and compliance with safety rules (7 questions), employees’ mindset regarding safety (4 questions), workers’ collaboration and commitment to pursuing safety (5 questions), workplace safety (4 questions), the priority of safety over products (2 questions), and ignoring dangers (2 questions). Scoring was performed on a five-point Likert scale (1 = strongly disagree, 2 = disagree, 3 = no idea, 4 = agree, 5 = strongly agree). The validity and reliability of this questionnaire have been confirmed in previous studies [27].
2.1.2.2. Mental workload measurement. The NASA-TLX is a multi-dimensional procedure that provides an overall workload score based on a weighted average of six dimensions: mental demand, physical demand, temporal demand, effort, overall performance, and frustration level. The participant scores each of the six dimensions from zero to one hundred based on their working situation. Using the hierarchical analytical method, the importance of each dimension relative to the other dimensions is examined through pairwise comparisons. In each comparison, the person chooses which of the two dimensions is more relevant to the activity, and each selection adds to the weight of the chosen dimension. By multiplying the weight of each dimension of the workload (ranging from 0 to 1) by the scale score of that dimension (ranging from 0 to 100) and summing the products, the total workload of the individual is calculated on a scale from 0 to 100. The overall score is expressed as a weighted workload. According to the questionnaire, the risk level is low if the overall workload score is less than 50 and high if it is above 50. The validity and reliability of this questionnaire have been confirmed in previous examinations (Cronbach’s alpha 0.897) [28].
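The weighted workload calculation described above can be sketched as follows. This is a minimal illustration, not the authors' scoring script: the function and variable names are hypothetical, the weights are assumed to come from pairwise-comparison tallies normalized to sum to 1, and treating a score of exactly 50 as low risk is an assumption, since the text only covers scores below and above 50.

```python
from typing import Dict

# The six NASA-TLX dimensions named in the text (key names are illustrative).
DIMENSIONS = ("mental", "physical", "temporal",
              "performance", "effort", "frustration")


def weighted_workload(ratings: Dict[str, float], tallies: Dict[str, int]) -> float:
    """Return the overall weighted workload on a 0-100 scale.

    ratings: score of each dimension, from 0 to 100.
    tallies: number of times each dimension was chosen in the pairwise
             comparisons; dividing by the total gives a weight in 0-1.
    """
    total = sum(tallies[d] for d in DIMENSIONS)
    if total <= 0:
        raise ValueError("at least one pairwise selection is required")
    for d in DIMENSIONS:
        if not 0 <= ratings[d] <= 100:
            raise ValueError(f"rating for {d!r} must lie in 0-100")
    # Weight times rating, summed over the six dimensions.
    return sum(ratings[d] * tallies[d] / total for d in DIMENSIONS)


def risk_level(score: float, cutoff: float = 50.0) -> str:
    """Classify workload risk; a score exactly at the cutoff is treated as
    low here, which is an assumption not stated in the text."""
    return "high" if score > cutoff else "low"
```

For instance, hypothetical ratings of 80, 90, 70, 40, 85 and 60 with tallies of 3, 5, 2, 1, 3 and 1 (in the dimension order above) give a weighted workload of 79, which falls in the high-risk band.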
Unsafe act factors
This item includes perceptual error, skill-based error, decision error, routine error, and exceptional error.
Unsafe acts measurement
Direct observation was employed to record unsafe acts while employees performed their duties. For this purpose, a checklist of unsafe acts was prepared based on the risk factors of accidents and near-miss events on construction sites over the past 20 years and then approved by experts as a suitable tool for measuring unsafe acts. The checklist consisted of 15 questions and was completed by 20 certified safety experts who worked on 15 construction sites. In this step, each employee was observed unobtrusively for 30–45 minutes to record their unsafe acts during working time. Unsafe act scores ranged from 0 to 100% (Appendix 2).
Unsafe conditions factors
This item includes light, noise, thermal strain, ergonomics, equipment, working procedures, and housekeeping factors. The mentioned parameters were recorded as a checklist (presence/absence) during the workplace inspection. Also, all reported measurements of harmful factors in the workplace were reviewed.
Safety performance indicators
This item includes lost working days and total accident costs.
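2.1.6.1. Lost working days (LWD). The LWD index is the total number of workdays lost multiplied by 200,000, divided by the total number of hours worked by all workers within a given period [29]. The equation for calculating the LWD index is given below:
\[ \mathrm{LWD} = \frac{\text{total number of workdays lost} \times 200{,}000}{\text{total number of hours worked by all workers}} \]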
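As a worked illustration of the definition above, the index can be computed as in the following sketch. The function name and the example figures are hypothetical; only the 200,000-hour base and the ratio itself come from the text.

```python
def lwd_index(lost_workdays: float, hours_worked: float,
              base_hours: float = 200_000) -> float:
    """Lost workdays normalized to a 200,000-hour exposure base.

    lost_workdays: total number of workdays lost in the period.
    hours_worked:  total hours worked by all workers in the same period.
    """
    if lost_workdays < 0:
        raise ValueError("lost_workdays cannot be negative")
    if hours_worked <= 0:
        raise ValueError("hours_worked must be positive")
    return lost_workdays * base_hours / hours_worked
```

With hypothetical inputs of 50 lost workdays over 400,000 hours worked, the index is 50 × 200,000 / 400,000 = 25.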
2.1.6.2. Total accident costs (TAC). The TAC is the sum of direct accident costs (DAC) and indirect accident costs (IAC). Direct incident costs include visible and payable costs, such as the cost of fire and compensation for staff illness. Indirect costs include costs that are not easily and instantly visible and can be much higher than direct costs, such as reduced productivity due to employee injury, damage to company reputation and loss of skilled human resources. Some studies, such as the iceberg theory, have stated that the direct to indirect cost ratio is 1 : 4 [30].
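The cost definition above amounts to a simple sum, sketched below. The function name is hypothetical, and falling back on the 1 : 4 iceberg ratio when no indirect cost is recorded is an assumption made for illustration; the text does not state that indirect costs were estimated this way in the study.

```python
from typing import Optional


def total_accident_cost(direct_cost: float,
                        indirect_cost: Optional[float] = None,
                        indirect_ratio: float = 4.0) -> float:
    """Return TAC = DAC + IAC.

    If no indirect cost is supplied, it is approximated as
    indirect_ratio * direct_cost (1 : 4 direct-to-indirect by default),
    which is an illustrative assumption rather than the authors' method.
    """
    if direct_cost < 0:
        raise ValueError("direct_cost cannot be negative")
    if indirect_cost is None:
        indirect_cost = direct_cost * indirect_ratio
    elif indirect_cost < 0:
        raise ValueError("indirect_cost cannot be negative")
    return direct_cost + indirect_cost
```

Under the 1 : 4 assumption, a hypothetical direct cost of 100 USD implies an indirect cost of 400 USD and a total of 500 USD.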
Data screening
After gathering the data, the screening process was done based on the inclusion criteria: checklists had to be complete with all necessary information, and only accidents with at least one lost working day and direct or indirect financial losses were included. Accidents that failed to meet any of these criteria were excluded. Accordingly, 345 accidents were excluded, and 5,742 accidents remained to be investigated in the study.
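The screening rule can be expressed as a simple filter, sketched below. The record fields and function names are hypothetical stand-ins for the checklist data, not the authors' data schema.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class AccidentRecord:
    """Hypothetical, simplified view of one accident checklist."""
    checklist_complete: bool
    lost_working_days: float
    direct_cost: float
    indirect_cost: float


def meets_inclusion_criteria(record: AccidentRecord) -> bool:
    """Complete checklist, at least one lost working day, and a direct
    or indirect financial loss."""
    has_lost_day = record.lost_working_days >= 1
    has_financial_loss = record.direct_cost > 0 or record.indirect_cost > 0
    return record.checklist_complete and has_lost_day and has_financial_loss


def screen(records: Iterable[AccidentRecord]) -> List[AccidentRecord]:
    """Keep only the records that satisfy every inclusion criterion."""
    return [r for r in records if meets_inclusion_criteria(r)]
```

Applied to the 6087 extracted accidents, a filter of this kind that rejects 345 records leaves 6087 − 345 = 5742, matching the sample size reported in the study.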
Data entry
The data of construction accidents gathered in the first and second steps were entered in the IBM SPSS V.23 software (SPSS Inc., Chicago, IL, USA). Four experts evaluated the accuracy and precision of the entered data in two phases.
Conceptual model design
The researchers designed the study’s conceptual model based on the study’s variables and data and on different occupational accident analysis algorithms. The cause-effect relationships of variables and factors were investigated to develop the conceptual model.
Descriptive analysis
In the last step, the data mining process of construction accidents was conducted using a structural equation model (SEM) to determine the type and strength of relationships, interactions, and mutual effects of all risk factors.
Data analysis
Data mining was used in this study to analyze the construction accident data. The data analysis was based on the SEM approach, a robust, comprehensive multivariable analytical technique from the multivariable regression family. This technique can determine complex relationships among different variables. It can investigate internal and external variables simultaneously and incorporate latent variables into the model. SEM analysis is an efficient way to understand the complex relationships among variables and factors that directly/indirectly and manifestly/latently affect accidents. The IBM SPSS AMOS V.23 software was employed in this study to run the SEM analysis. Moreover, the model’s goodness of fit was assessed using the general indices χ2/df (2–3), root mean square error of approximation (RMSEA) (0.05–0.08), Comparative Fit Index (CFI) (0.95–1.0), Normed Fit Index (NFI) (0.95–1.0), and Non-Normed Fit Index (NNFI) or Tucker-Lewis Index (TLI) (0.95–1.0) [31]. The statistical tests were two-tailed with the significance level set at 0.05.
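For readers unfamiliar with SEM, the general structure of such a model can be written in the standard textbook (LISREL-style) notation below. This is a generic sketch, not the fitted model of this study: all symbols are assumptions introduced here, with \(\xi\) denoting exogenous latent factors, \(\eta\) endogenous latent factors, \(x\) and \(y\) their observed indicators, and \(\delta\), \(\varepsilon\), \(\zeta\) error terms.

```latex
\begin{aligned}
x &= \Lambda_x \xi + \delta            && \text{(measurement model, exogenous indicators)} \\
y &= \Lambda_y \eta + \varepsilon      && \text{(measurement model, endogenous indicators)} \\
\eta &= B\eta + \Gamma\xi + \zeta      && \text{(structural model)}
\end{aligned}
```

In a simple chain in which a factor affects a mediator with coefficient \(a\) and the mediator affects an outcome with coefficient \(b\), the indirect effect is the product \(ab\), and the total effect is the direct coefficient plus \(ab\).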
The study used a mix of variable types, including continuous, binary, and Likert-scale variables. A correlation matrix was used to examine these data. The polycor package in the R environment was used to estimate the correlation matrix of the variables, which are both binary and continuous.
The SEM approach was used because there was a correlation among the independent variables, to control for collinearity among the independent variables, to create a component or combination from the set of independent variables, and to improve the predictive model. The use of SEM in this study is also justified by the fact that there are many variables to consider in the analysis of occupational accidents. Therefore, the variables were examined in 5 groups, and factor analysis was used to characterize the correlations between the causes and factors affecting the incidents.
Results
Descriptive results
The means of LWD and TAC indicators for 5742 investigated accidents were estimated to be 248.20±52.60 days and 1893.10±152.22 dollars (1 dollar = 250,000 Rials), respectively. Additionally, the evaluation of the consequences of these accidents showed that 6,132 workers were influenced by the accidents: 312 cases (5.4%) died, 420 cases (7.3%) were mutilated, 384 cases (6.7%) were maimed and 5,109 (88.9%) suffered from different types of traumas. The data was captured from 3,840 construction projects.
The descriptive results of personal factors revealed that the means of affected workers’ age and work experience were 32.11±12.58 and 6.8±5.92 years, respectively. The mean of affected workers’ sleep time at the time of the accident was 5.72±2.25 hours. Less than 5% of these workers had an academic education. More than one-third of them were in good physical health, and about one-fourth were smokers. Besides, less than one-fifth of the affected workers had a history of using illegal substances (e.g., drugs and alcohol) (Table 1).
Table 1. Descriptive results of personal factors
The results of organizational factors demonstrated that the workers’ mean working hours/day was 10.27±1.85 hours, and the mean income/month was 180.20±65.55 dollars. More than four-fifths of affected workers were contracting workers, and approximately 90% of the accidents happened to construction workers. Only about one-fourth (24%) of construction projects underwent risk assessments, and less than one-fifth (18.7%) had run safety training for their staff. Personal protective equipment (PPE) was provided to the workers in 29.7% of these projects, and toolbox meetings (TBM) were held in only 3.5% of the construction projects. The mental/physical workload was reported to be high in 95.1% of the projects, and their safety climate was estimated to be unsuitably low (Table 2).
Table 2. Descriptive results of organizational factors
Note: PPE: Personal Protective Equipment, TBM: Tool Box Meeting.
The results for unsafe act factors showed that perceptual errors, decision errors, and skill-based errors contributed to 78.1%, 38.2%, and 34.6% of accidents, respectively. The findings also revealed that unsafe acts involving routine and exceptional violations played a role in 18.3% and 8.5% of the accidents, respectively (Fig. 2). Furthermore, the results for unsafe conditions factors suggested that inappropriate housekeeping was the main reason behind 46% of construction accidents. Other unsafe conditions factors were also involved in construction accidents: improper working procedures (36.2%), inappropriate equipment (31.8%), and thermal stress (20.3%). The least influential unsafe conditions factors were inappropriate light (12.7%), noise (14.2%), and inappropriate ergonomics (14.8%) (Fig. 2).

Fig. 2. Descriptive results of unsafe acts and conditions factors.
Data mining results showed that personal and organizational factors and their related variables directly and indirectly affected reactive safety indicators (TAC and LWD). There was a significant negative correlation between these two factors and the indicators (p < 0.05). Likewise, the unsafe conditions and unsafe acts factors, together with their related variables, directly affected reactive safety indicators (TAC and LWD), yet their relationship with these indicators was significantly positive (p < 0.05) (Fig. 3). The SEM model results also demonstrated that both the strongest and the weakest indirect correlation coefficients were observed among the organizational factors. Risk assessment, training, and safety climate had the strongest indirect correlation coefficients with the safety indicators, whereas job title and income had the weakest.

Fig. 3. Data mining results of the construction accidents.
The results also revealed that the strongest direct correlation coefficients of TAC and LWD safety indicators were observed in perceptual errors, skill-based errors, and decision errors (unsafe act factors). Similarly, the weakest direct correlation coefficients were related to light, ergonomics, and noise (unsafe conditions factor). The model’s goodness of fit suggested that the designed model was a suitable conceptual model. The goodness-of-fit indices were measured as follows: χ2/df = 2.89, RMSEA = 0.060, CFI = 0.972, NFI = 0.974 and NNFI (TLI) = 0.985.
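The reported goodness-of-fit indices can be checked against the acceptance ranges given in the data analysis section, as in the sketch below. The dictionary keys and function name are hypothetical, and treating each range as inclusive on both ends is an assumption; the numeric values are those stated in the text.

```python
from typing import Dict, Tuple

# Acceptance ranges listed in the data analysis section.
ACCEPTABLE_RANGES: Dict[str, Tuple[float, float]] = {
    "chi2_df": (2.0, 3.0),
    "RMSEA": (0.05, 0.08),
    "CFI": (0.95, 1.0),
    "NFI": (0.95, 1.0),
    "TLI": (0.95, 1.0),
}

# Indices reported for the fitted model.
REPORTED: Dict[str, float] = {
    "chi2_df": 2.89,
    "RMSEA": 0.060,
    "CFI": 0.972,
    "NFI": 0.974,
    "TLI": 0.985,
}


def check_fit(indices: Dict[str, float],
              ranges: Dict[str, Tuple[float, float]] = ACCEPTABLE_RANGES
              ) -> Dict[str, bool]:
    """Map each index name to True if its value lies in the stated range
    (bounds treated as inclusive, which is an assumption)."""
    return {name: ranges[name][0] <= value <= ranges[name][1]
            for name, value in indices.items()}
```

Calling `check_fit(REPORTED)` marks every index as within its range, which is consistent with the authors' statement that the designed model was suitable.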
Discussion
The present study’s findings confirmed that the construction industry is one of the most hazardous and challenging industries and that various risk factors can affect the frequency and severity of accidents. The multifactorial nature of the consequences of construction accidents has been emphasized in various studies [2]. The findings of the data mining of construction accidents based on the SEM approach reported in this study also demonstrated that a group of factors and variables could be considered the main risk factors of construction accidents: (a) personal factors (age, experience, sleeping, education, health status, smoking, and drug use) [22, 32], (b) organizational factors (working hours per day, income per month, contract type, job, risk assessment, training, PPE, TBM, workload and safety climate) [32], (c) unsafe conditions factors (light, noise, thermal stress, ergonomics, equipment, procedures and housekeeping) [22, 34] and (d) unsafe acts factors (perceptual error, skill-based error, decision error, routine error and exceptional error) [35, 36]. The results revealed that these risk factors have a significant relationship with the reactive safety indicators (TAC and LWD) investigated in this study [32, 38]. In agreement with the findings of previous research, these results demonstrated that the environment of a construction project is extremely accident-prone and hence can easily result in accidents or disastrous emergencies [33, 39]. One of the most important reasons for the high frequency and severity of accidents in the construction industry is the high variability of environmental conditions, which can severely affect workers’ performance.
Data mining results also revealed that various factors and variables could directly or indirectly influence the occurrence of accidents and safety indicators in construction projects. Personal and organizational factors and their related variables have direct or indirect relationships with safety indicators, and the results revealed that these relationships are significant and negative. Moreover, it was observed that unsafe work conditions and unsafe acts factors and their variables have a significant and positive relationship with safety indicators in construction projects, which is consistent with the results of previous studies [40].
Different studies have emphasized the role and effect of personal risk factors on occupational accidents, such as age, work experience, education, and other individual habits. For instance, Soltanzadeh et al. reported that occupational accidents are significantly related to personal factors such as age, work experience, and education [32]. Likewise, Biswas et al. found that occupational accidents can be influenced by the health status and smoking habits of workers [41]. On the other hand, Mohamadfam et al. concluded that organizational factors could also heavily affect construction projects’ safety indicators. They showed that the number of workers in a project, their job and construction duties at the time of the accident, contract type (main or subsidiary contractor), risk assessment, safety training programs, TBM, and PPE could impact the occurrence of accidents and safety indicators [22, 25]. The results of Filho et al.’s study showed that organizational factors, proper planning of workers’ activities, and workers’ familiarity with the technologies used in the industry and with safe working practices are among the critical parameters in reducing accidents in this industry [42]. Moreover, Rodrigues et al.’s study demonstrated that effective risk prevention in the construction industry could only be accomplished by a correlation of causal factors, including production and client needs, financial climate, design group competence, risk management, and health and safety procedure [23].
It must be mentioned that workers play a crucial role in accidents in construction projects. Workers’ unsafe acts can be influenced by several risk factors such as personal, organizational, and even environmental variables as well as safety training and risk assessment programs; in addition, these risk factors can directly affect the incidence and severity rate of safety indicators [8, 43]. Data mining results revealed that the strongest indirect correlation coefficients of safety indicators belonged to the risk assessment, training programs, and safety climate variables. Mohamadfam et al. reported that reactive indicators of accidents, such as the accident frequency rate (AFR) and accident severity rate (ASR), have a significant relationship with safety training and safety management variables. A lack of appropriate and sufficient training can lead to carelessness, dangerous acts, and different types of human errors in construction projects [22, 44]. Similarly, some studies have reported that the process of hazard identification and risk assessment can be enhanced by training and educational interventions [45]. Thus, it can be concluded that paying attention to safety training programs and raising their standards can improve the understanding of risks in construction projects, eventually improving safety conditions and reducing the frequency of occupational accidents. Embarking on regular and systematic activities aiming to identify hazardous and dangerous conditions in construction projects (using safety checklists), conducting risk assessments before the onset of a project, codifying a practical and comprehensive guideline for recording and reporting different unsafe conditions and anomalies, maintaining the appropriate quality of PPE and monitoring the observance of rules and regulations in the workplace can all result in the reduction of accidents in construction projects [46, 47].
The results of this study also suggested that inappropriate housekeeping is the most influential variable among the unsafe conditions factors affecting safety indicators.
Strengths and limitations
Finally, the model’s goodness of fit showed that this study’s findings could be a practical instrument for making significant decisions to provide insight into the key risk items and prevent construction accidents as the most critical type of occupational accidents in Iran. Thus, drawing on this study’s findings, future studies can investigate the effectiveness and practicality of these findings in decreasing reactive safety indicators in construction projects. The present study results can create a novel scientific insight into the field of construction safety and various risk factors affecting it.
Although this study included 5742 accidents occurring over 10 years in 3840 construction projects and data-mined the relationships of 29 variables and four independent factors with two reactive safety indicators, it also faced some limitations. The most noticeable limitations, which future studies can address, are the need to consider more risk factors and variables, to pay more attention to reactive safety indicators in construction projects, and to design a longitudinal, prospective study.
Conclusion
The results revealed that safety improvement in construction projects needs practical and logical planning in which different risk factors are considered. The main priorities in construction projects comprise practical training, paying more attention to personal (e.g., experience and health status) and organizational (work conditions, training, and risk assessment programs) risk factors, and implementing control measures such as housekeeping via regular screening. Moreover, the study results demonstrated that adopting a multilevel analytical approach and using a software-based method such as SEM can prove extremely useful for the data mining of safety data. The present study results can be a practical tool to determine the various risk factors affecting the occurrence of accidents on construction sites and to develop a proactive safety management algorithm.
Conflict of interest
The authors declare that there is no conflict of interest with respect to the research, authorship, and/or publication of this article.
Ethical approval
Not applicable.
Funding
Not applicable.
Supplementary materials
The appendix is available from https://dx.doi.org/10.3233/WOR-220128.
