Abstract
As a basic industry of the national economy, the iron and steel industry has long provided an essential supply of raw materials. However, its production process contains a large number of hazard sources that are prone to causing safety accidents. Safety evaluation of the production system is therefore needed to prevent accidents in iron and steel enterprises effectively. We introduce a method based on a deep learning model to evaluate the safety of such enterprises. First, the risk factors and casualties in the production process are investigated, and a safety evaluation index system is constructed. Second, because deep neural networks combine strong feature extraction ability with a simple model structure, we design a safety evaluation model based on a deep neural network. The 25-dimensional vector of evaluation index values is the input of the network, and the network output corresponds to five risk levels. On this basis, an optimization algorithm for the deep neural network model is designed to explore the mapping between risk characteristics and safety level. The TensorFlow deep learning framework is used to build the evaluation model, and a classification loss function and a network optimization method are designed to train it. Finally, through experiments, the optimal model structure is determined by comparing the influence of different parameter optimization strategies, hidden layer structures, and activation functions on safety evaluation performance. A structure with two hidden layers (25-24-16-5), trained with the Adam optimizer and the LeakyReLU activation function, is adopted to obtain higher accuracy and a faster convergence rate. The experiments show that our evaluation model provides an operational method for evaluating the safety management status of iron and steel enterprises.
Introduction
In recent years, there have been many fatal accidents in iron and steel enterprises due to neglect of safety issues, and the negative social impact of these accidents should not be underestimated. These accidents caused not only serious casualties but also huge property losses, which places higher requirements on the safety management of iron and steel enterprises. Fig. 1 shows statistics on the number of accidents and deaths in steel enterprises from 2010 to 2019. The number of deaths decreased year by year from 2012 to 2017 but increased sharply in 2018, indicating that the overall safety situation of China’s iron and steel enterprises is not stable and deserves more attention.

Statistics on the number of accidents and deaths in steel enterprises from 2010 to 2019.
Nowadays, as the production process becomes more and more complex, safety evaluation of iron and steel enterprises can be regarded as a complex nonlinear problem that cannot be solved by simple mathematical modeling methods. In addition, the risk factors of an enterprise production system are often uncertain and fuzzy. Neural networks, however, offer parallel processing, dynamic processing and adaptive learning, which provides a novel idea for solving the evaluation problem. Neural networks also have a strong learning ability: they can represent a rich class of functions and produce a model that approximates the target with high precision. Therefore, in recent years, neural networks have been widely used in safety evaluation methods [1–3]. Deep learning [4] provides a relatively reliable quantitative tool for safety evaluation. As one of the main platforms for deep learning, TensorFlow runs seamlessly on CPU and GPU. In addition, TensorFlow makes it convenient to implement the structure and algorithm design of the network model.
We introduce a method based on deep learning to evaluate the safety of iron and steel enterprises. The contributions of our work are:
First, by analyzing the dangerous factors and casualty accidents with accident-causing theory in enterprise production process, we construct a set of safety evaluation index system which is convenient for statistics and assignment. Therefore, we determine the characterization method suitable for network structure design and result evaluation.
Second, we explain the forward topology of network, the design of loss function, and the training method of back propagation. Neural network is data-driven and easy to find hidden information in high-dimensional data. We introduce the evaluation index data set X for network input and the security level label data set Y for target fitting. The problem of safety evaluation of iron and steel enterprises is to find the mapping relationship between input value X and target value Y. At the same time, we also discuss the design, optimization and evaluation methods of safety evaluation model.
Third, through experiments, we verify that the performance of the model trained by the Adam algorithm is better than that of the model trained by the small batch stochastic gradient descent algorithm. The activation function, the number of nodes in each hidden layer and the number of hidden layers are also verified and determined. In addition, compared with the nearest neighbor classification algorithm from traditional machine learning, the neural network model proves to be more effective in dealing with safety evaluation problems. Finally, our safety evaluation model provides a basis for safety management in iron and steel enterprises.
With the continuous development of the basic theory of safety assessment in China, safety assessment [5–7] has been successfully practiced in many industries. The combination of mathematical model and computer technology provides a new way for the development of safety assessment methods, and also expands the application scope for safety assessment. For chemical industry, transportation, coal mine, machinery, construction and other industries, safety evaluation system plays an important role in ensuring safety operation procedure. In recent years, safety assessment [30–34] based on neural network has been applied more and more widely, such as special equipment safety assessment, subway safety assessment, oil pipeline safety assessment, occupational hazard assessment, human cause reliability assessment, etc.
Gu et al. [8] construct an index system for risk evaluation of overseas mining investment with quantitative evaluation criteria and build the risk evaluation model with a deep learning method. Based on actual coal mine production monitoring data, Zhang et al. [9] propose a long short-term memory (LSTM) [10, 11] neural network to improve the accuracy of gas concentration prediction, which has important guiding significance for improving coal mine safety management. In order to classify the sampling points in an acoustic waveform as signal or noise, Guo et al. [12] establish an automatic acoustic first arrival picking method based on a convolutional neural network (CNN) [13, 14] to pick the first arrivals of massive acoustic emission waveforms effectively and accurately. By accurately processing the monitoring signals, this technology can contribute to characterizing the evolution process of rock mass damage. Xu et al. [15] propose a temporal attention convolutional neural network for blade icing detection in wind turbines; the framework can effectively process highly unbalanced sensor data and identify icing conditions. Yang et al. [16] establish a crop disease early warning system based on deep transfer learning. In general, these studies apply deep learning to risk evaluation, which not only expands the application boundary of deep learning but also provides a scientific solution for risk evaluation and analysis.
In addition to deep learning methods, Izvekov et al. [35] adopt an evaluation method based on random functions and process theory to analyze the risk of accidents or disasters of bridge crane load-bearing structures, so as to enable people to make correct management decisions. Ersz et al. [36] use k-means clustering algorithm to determine the similarities and differences of risk perception levels of employees in terms of demographic characteristics. The study finds that education level, service hours, working status and age are important index factors determining risk perception levels of workers. Nitin et al. [37] use the risk assessment technology to identify the chemical hazard factors existing in the stainless steel factory, and put forward relevant engineering control measures according to the assessment results as the initiative measures of follow-up management. At present, the safety evaluation of iron and steel enterprises based on neural network has not been fully studied. Because the safety evaluation of iron and steel enterprises is very complex and comprehensive, it is difficult to use the traditional criterion method to model and solve this problem. Here, we apply deep learning to solve the safety evaluation problem of iron and steel enterprises to a large extent.
Approach
There are many uncertain and dangerous factors in the production process, which bring a lot of negative effects to the safety production of iron and steel enterprises. We study the factors and their internal relations, realize the safety evaluation through the safety theory and determine the current safety state of iron and steel enterprises. Furthermore, managers can take timely and effective measures in the production process of enterprises to reduce the occurrence of safety accidents. Finally, the safety management of iron and steel enterprises is based on scientific and systematic safety theories.
Deep neural network as the basic model
Safety assessment [17–19] is a tool for analyzing risk factors and determining the degree of risk. Safety evaluation of iron and steel enterprises is in practice a complex nonlinear problem. The core issue is to determine the weights of the evaluation indexes, because the credibility of the evaluation results depends directly on how reasonable these weights are.
Neural network is a complex network which is composed of a large number of simple components connected with each other. It is highly nonlinear and can realize complex logic operation. Each node in the network is interconnected to form a topology structure, which includes input layer nodes, output layer nodes and hidden layer nodes. The information is transmitted to the hidden layer through the input layer. After the action of activation function, output information of the hidden layer is transmitted to the output layer and finally the output result is obtained.
When a neural network is used to solve a problem, data are fed into the trained network, and deduction and reasoning are carried out according to the knowledge stored in the network to obtain the solution. A neural network has strong fault tolerance and can process information in parallel on a large scale. It discovers the internal relationship between input and output through training rather than relying on empirical knowledge of the problem, so it is self-adaptive and self-learning.
Construction of safety evaluation index system
The establishment of index system is the core content of safety evaluation. The elements of the index system are very important to the evaluation process. If there are too many factors, the key influencing factors may be weakened, and the complexity of evaluation may be increased. If there are too few index factors, although the evaluation process is easy to operate, it is difficult to comprehensively and objectively represent the safety status of the system. Therefore, it is of great significance to scientifically construct an objective and comprehensive index system.

Safety evaluation index system of iron and steel enterprises.
Safety state of the evaluation index
We screened the evaluation indicators from four aspects: human, material, environment and management. Material factors are decomposed into facilities, hazardous substances and process parameters. Environmental factors are decomposed into the workplace and the working environment. Based on these, the criterion layer of the index system is first constructed, and then each criterion layer is refined and decomposed to obtain several indexes. As shown in Fig. 2, the safety evaluation index system includes 6 criterion layers, corresponding to 25 indicators.
The indexes are mainly extracted from the factors that have high influence on production safety. Qualitative index and quantitative index have different ways of obtaining information. The qualitative index is obtained by consulting the relevant data and field investigation, and it has a certain fuzziness. Quantitative indicators are obtained through detection, statistical analysis and other methods, which are highly contingent, but all have profound reasons.
A classification method is adopted to divide the safety state of the evaluation indexes into five levels, namely Level 1, Level 2, Level 3, Level 4 and Level 5, which respectively represent safe, relatively safe, general safe, unsafe and very unsafe, as shown in Table 1.
The training and testing of deep neural networks are driven by data. Data is indispensable as the fuel that drives the network model. Networks are very good at extracting complex structures and hidden information from high-dimensional input data. In this section, we define the input value (evaluation index data set) X and the target value (security level label data set) Y of the security evaluation network. We assume that there is a mapping between X and Y.
According to their corresponding characteristics and attributes, these factors can be quantified into a series of continuous or discrete data indexes, the matrix of input data X set can be expressed as:
In matrix X, D represents the number of all risk indicators quantified from various risk factors. T is the amount of data in the training set, V is the amount of data in the validation set, and S is the amount of data in the test set. In each row of X, each entry containing D index items can be treated as a vector in space.
As shown in the following formula, Y is the label value of security level corresponding to the network input value in the training set, which is also the target value of the network model.
In the formula, C is the total number of categories. In the safety evaluation of iron and steel enterprises, specifically, C represents five safety levels: safe, relatively safe, general safe, unsafe and very unsafe (corresponding to level 1, Level 2, level 3, level 4 and Level 5). Five safety levels correspond to five category labels.
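The matrices themselves are not reproduced in this text. A plausible layout, offered only as a sketch, is given below. It assumes one row per sample, a total of N = T + V + S rows, and one-hot rows in Y; the symbol N and the row-wise stacking are assumptions, not taken from the source.

```latex
X =
\begin{bmatrix}
x_{1,1} & x_{1,2} & \cdots & x_{1,D} \\
x_{2,1} & x_{2,2} & \cdots & x_{2,D} \\
\vdots  & \vdots  & \ddots & \vdots  \\
x_{N,1} & x_{N,2} & \cdots & x_{N,D}
\end{bmatrix},
\qquad
Y =
\begin{bmatrix}
y_{1,1} & \cdots & y_{1,C} \\
\vdots  & \ddots & \vdots  \\
y_{N,1} & \cdots & y_{N,C}
\end{bmatrix},
\qquad
y_{n,c} \in \{0, 1\}, \quad \sum_{c=1}^{C} y_{n,c} = 1
```

With the values used in this work, D = 25 and C = 5.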
To evaluate the safety of iron and steel enterprises is to solve the mapping problem between X and Y. Since the input data contains many evaluation indicators, the dimension of the input data is relatively high. At the same time, each evaluation index may also affect each other, so the classification of safety level is essentially a high-dimensional feature classification problem with high complexity, and the deep learning method based on neural network model is very good at solving this kind of problem.
The loss function defines both the quality of the neural network model and the objective of optimization. During network training, the goal is to minimize the loss function. Choosing a proper loss function is very important; for classification problems, the cross-entropy loss function can be used.
Classification solves the problem of grouping different samples into predefined categories. The most common way to solve a multi-class problem with a neural network is to set n output nodes, where n is the number of categories. For each input sample, the network produces an n-dimensional vector as output. Each dimension in the vector (that is, each output node) corresponds to a category. Ideally, if a sample belongs to category K, the output node for this category should be 1, while the output of all other nodes is 0. Cross entropy reflects the distance between two probability distributions. It is defined in the following formula, where p represents the target distribution, q represents the output probability distribution, n represents the vector dimension, and $x_i$ represents the value of the vector in the ith dimension.
$$H(p, q) = -\sum_{i=1}^{n} p(x_i) \log q(x_i)$$
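As a minimal sketch of how cross entropy behaves for a one-hot target, the snippet below compares a confident correct prediction with a poor one. The function name `cross_entropy`, the `eps` guard against log(0), and the example probabilities are illustrative assumptions, not values from the experiments.

```python
import math

def cross_entropy(p, q, eps=1e-12):
    """Cross entropy -sum_i p_i * log(q_i) between target p and prediction q."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    # eps (an assumption) avoids log(0) when a predicted probability is zero
    return -sum(p_i * math.log(max(q_i, eps)) for p_i, q_i in zip(p, q))

# Hypothetical example: the true category is Level 2 out of five levels
target = [0.0, 1.0, 0.0, 0.0, 0.0]
good_prediction = [0.05, 0.80, 0.05, 0.05, 0.05]
poor_prediction = [0.40, 0.10, 0.20, 0.20, 0.10]

loss_good = cross_entropy(target, good_prediction)
loss_poor = cross_entropy(target, poor_prediction)
print(loss_good, loss_poor)
```

With a one-hot target only the term of the true category survives, so the loss reduces to the negative log of the probability assigned to that category.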
The raw output of a neural network may not form a probability distribution (the values across dimensions need not sum to 1). Softmax regression converts the output of the neural network into a probability distribution. Its formula is shown below, where $y_1, y_2, \cdots, y_n$ are the outputs of the neural network.
$$\mathrm{softmax}(y)_i = \frac{e^{y_i}}{\sum_{j=1}^{n} e^{y_j}}$$
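The softmax step can be sketched in a few lines of plain Python. Subtracting the maximum before exponentiation is a common numerical-stability device added here as an assumption; it does not change the result. The example outputs are hypothetical.

```python
import math

def softmax(outputs):
    """Convert raw network outputs into a probability distribution."""
    # Shifting by the maximum leaves the result unchanged but avoids overflow
    shift = max(outputs)
    exps = [math.exp(y - shift) for y in outputs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of the five output nodes
raw_outputs = [2.0, 1.0, 0.1, -1.0, 0.5]
probs = softmax(raw_outputs)
print(probs, sum(probs))
```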
The gradient descent algorithm computes the gradient over all training data in each iteration, which makes training time-consuming. Stochastic gradient descent, which uses only one sample per iteration, speeds up training, but the parameters may not move toward the true minimum at every step, so it is difficult for the network to reach a global or even a local optimum. In practice, the loss on a small part of the data (one batch) is usually used for each weight update; this is small batch (mini-batch) stochastic gradient descent.
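For the stochastic gradient descent algorithm, the update rule for parameter θ at step i is:
$$\theta_i = \theta_{i-1} - \eta_i \nabla_\theta J(\theta_{i-1})$$
The gradient $\nabla_\theta J(\theta_{i-1})$ of the loss $J$ is computed on the sample or batch drawn at that step, and $\eta_i$ is the learning rate.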
A further optimization is to add momentum. Momentum causes parameters to be updated not only based on the current gradient, but also on the previous gradient, which is beneficial for network parameters to jump out of local minimum points.
For parameter $\theta_i$ at time step $i$, the learning rate of the model is $\eta_i$. At the start of training, the velocity variable $v_0$ is initialized to 0. At each subsequent step $i > 0$, the momentum algorithm updates the parameters through the iteration shown below, where the momentum hyperparameter $\gamma$ takes values in $[0, 1)$ and $g_i$ denotes the gradient of the loss with respect to the parameters, evaluated at $\theta_{i-1}$.
$$v_i = \gamma v_{i-1} + \eta_i g_i, \qquad \theta_i = \theta_{i-1} - v_i$$
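To make the momentum update concrete, here is a minimal sketch on a one-dimensional quadratic loss. The loss $(\theta - 3)^2$, the constant learning rate, the step count and the function name `sgd_momentum` are all illustrative assumptions; only the velocity-then-parameter update pattern follows the description above.

```python
def sgd_momentum(grad_fn, theta0, lr=0.1, gamma=0.9, steps=300):
    """Gradient descent with momentum; the velocity v starts at 0."""
    theta, v = theta0, 0.0
    for _ in range(steps):
        # The velocity accumulates past gradients, weighted by gamma in [0, 1)
        v = gamma * v + lr * grad_fn(theta)
        theta = theta - v
    return theta

# Hypothetical loss (theta - 3)^2 with gradient 2 * (theta - 3)
theta_final = sgd_momentum(lambda t: 2.0 * (t - 3.0), theta0=0.0)
print(theta_final)
```

Setting `gamma=0.0` recovers plain gradient descent, which is a quick way to compare the two behaviours on the same loss.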
In addition, in the process of network training, learning rate defines the updating speed of network parameters. If learning rate is set too large, the parameters will jump back and forth on both sides of the optimal value, while if learning rate is set too small, convergence speed of the parameters will be greatly reduced. In reality, exponential decay method is often used to set the learning rate. At the beginning, a larger learning rate is set so that the network can get an optimal solution at a fast speed. With the increase of iterations, the learning rate will gradually decrease. Finally, the model will be more stable and the optimal solution will be obtained gradually in the later training period. Many effective optimizers, such as Adam [20] optimizer, combine multiple weight optimization algorithms such as momentum and adjustable learning rate.
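One common form of exponential decay is sketched below. The formula `initial_lr * decay_rate ** (step / decay_steps)` and the numbers used are assumptions chosen for illustration; the text does not state the exact schedule or hyperparameters used in the experiments.

```python
def exponential_decay(initial_lr, decay_rate, decay_steps, step):
    """Learning rate after `step` updates under exponential decay."""
    return initial_lr * decay_rate ** (step / decay_steps)

# Hypothetical settings: start at 0.1 and multiply by 0.96 every 100 steps
schedule = [exponential_decay(0.1, 0.96, 100, s) for s in (0, 100, 1000)]
print(schedule)
```

The rate starts large for fast early progress and shrinks as iterations accumulate, which matches the behaviour described above.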
The flow of network training is shown in Fig. 3. Multiple network layers are linked together to form a neural network, and the input data is mapped to predicted values after passing through the network model. Then loss function calculates the difference between predicted value and the real target value. The obtained loss value is used to measure the matching degree between the two values. The optimizer is designed to update the weights of the network according to the loss value.

Network training process.

Structure diagram of fully connected neural network.
We implement the prediction algorithm with a deep neural network, covering model structure design, model training, model optimization, and prediction. In minimizing the objective function, the Adam optimization algorithm is used to update the weights of the neural network continuously, and the number of hidden layer nodes and the batch size are tuned as hyperparameters. Using this model, we predict the security level of an enterprise, which helps steel enterprises avoid risks.
Preprocessing of index data
Training samples largely determine the "knowledge quantity" of neural network model and the credibility of evaluation results. In order to ensure the accuracy of model evaluation, these samples should reflect various safety states as much as possible. Our original data is provided by a safety evaluation service company in Beijing, and the data comes from safety standardization material of a non-ferrous metal smelting enterprise.
Since the value ranges of the indicators differ widely, the indicators should be mapped to the same interval to facilitate neural network training. Therefore, the original sample data are normalized according to the following formula, which converts index values in any range to values in the range [0,1]:
$$x_{\text{newvalue}} = \frac{x_{\text{oldvalue}} - x_{\min}}{x_{\max} - x_{\min}}$$
Here, $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of each indicator in the original data, and $x_{\text{oldvalue}}$ and $x_{\text{newvalue}}$ are the original and normalized values of an index value, respectively.
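The normalization step can be sketched as a per-indicator (column-wise) min-max scaling. The function name, the toy data, and the choice to map a constant column to 0.0 are assumptions made for illustration.

```python
def min_max_normalize(samples):
    """Scale each indicator (column) of `samples` into the range [0, 1]."""
    n_features = len(samples[0])
    mins = [min(row[j] for row in samples) for j in range(n_features)]
    maxs = [max(row[j] for row in samples) for j in range(n_features)]
    normalized = []
    for row in samples:
        new_row = []
        for j, value in enumerate(row):
            span = maxs[j] - mins[j]
            # A constant column has no spread; mapping it to 0.0 is an assumption
            new_row.append((value - mins[j]) / span if span else 0.0)
        normalized.append(new_row)
    return normalized

# Hypothetical raw data: three samples, three indicators on different scales
raw = [[10.0, 200.0, 3.0], [20.0, 400.0, 3.0], [15.0, 300.0, 3.0]]
scaled = min_max_normalize(raw)
print(scaled)
```

In practice the minimum and maximum would usually be taken from the training set only and then reused for validation and test samples; that is a common convention rather than something stated in the text.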
We use Python language and Tensorflow deep learning framework to build DNN-based [21, 22] evaluation model. The classification loss function and back propagation method are designed to train the model. We compare the influence of different parameter optimization strategies, hidden layer structure and activation functions on evaluation performance of the model, then determine the network structure as a solution for the safety evaluation of iron and steel enterprises. Topological structure of the safety evaluation network model is shown in Fig. 4.
We start the experiment with a three-layer DNN model, that is, a model with one hidden layer. The labels of the input data in the experiment are one-hot vectors. In a one-hot vector, every dimension is 0 except one, which is 1. For example, security level 1 is represented as ([1, 0, 0, 0, 0]), security level 2 is represented as ([0, 1, 0, 0, 0]), security level 3 is represented as ([0, 0, 1, 0, 0]), and so on.
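A short sketch of the one-hot encoding for the five security levels follows. Each label has five entries, one per level; the helper name `one_hot` is an illustrative assumption.

```python
def one_hot(level, num_classes=5):
    """Encode a security level in 1..num_classes as a one-hot vector."""
    if not 1 <= level <= num_classes:
        raise ValueError("level must be between 1 and num_classes")
    vector = [0] * num_classes
    vector[level - 1] = 1  # levels are 1-based, list indices are 0-based
    return vector

labels = [one_hot(level) for level in (1, 2, 3, 4, 5)]
print(labels)
```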
The choice of network optimization method
Stochastic gradient descent method updates all parameters in a way that the learning rate remains unchanged in training process. However, there are many parameters in deep learning model, and the updating frequency of different parameters is different. The adjustable learning rate sets different update step sizes for different parameters as the number of iterative training increases. For parameters that update frequently, it is appropriate to set smaller update step sizes for them in order to make their training stable. For parameters that are not updated frequently, it is appropriate to set a larger update step in order to enable them to master more knowledge. As mentioned above, Adam optimization method combines additional momentum and adjustable learning rate. In the experiment, we use Adam and small batch stochastic gradient descent algorithm to train the model, and observe the influence of different optimization methods on the model performance. Initial topological structure of the model is 25-16-5.
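The way Adam combines momentum with a per-parameter step size can be sketched for a single scalar parameter. This follows the commonly published Adam update with bias correction; the hyperparameter values, the quadratic test loss and the function name `adam_minimize` are assumptions and are not the settings used in the experiments.

```python
import math

def adam_minimize(grad_fn, theta0, lr=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    """Minimize a scalar function with the Adam update rule."""
    theta, m, v = theta0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g        # momentum-like first moment
        v = beta2 * v + (1 - beta2) * g * g    # second moment scales the step
        m_hat = m / (1 - beta1 ** t)           # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta

# Hypothetical loss (theta - 3)^2 with gradient 2 * (theta - 3)
theta_adam = adam_minimize(lambda t: 2.0 * (t - 3.0), theta0=0.0)
print(theta_adam)
```

Dividing by the square root of the second moment gives parameters with consistently large gradients smaller effective steps, and rarely updated parameters larger ones.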
Meanwhile, Sigmoid is used as the basic activation function during the experiment. As shown in Figs. 5 and 6, the curves of loss value and accuracy show that, as training proceeds, the loss of the model decreases faster and the accuracy increases more markedly when the Adam algorithm is used for parameter optimization. The small batch stochastic gradient descent algorithm converges slowly and is easily trapped in a local optimum. When a network model is trained on the same data set, different optimization methods therefore produce different results. The model trained with additional momentum and an adjustable learning rate performs better than the one trained with the small batch stochastic gradient descent algorithm.

Loss variation curve of mini batch SGD and Adam trained models in the training process.

The accuracy of mini batch SGD and Adam trained models on validation sample data.

The accuracy comparison of models with different activation functions in validation samples.

The accuracy comparison of models with different activation functions in validation samples.
Having established that the back propagation training algorithm with additional momentum is more conducive to optimizing the model, we next compare the performance of the DNN safety evaluation network model with different activation functions. We apply the Tanh [23, 24], ReLU [25, 26], Sigmoid [27] and LeakyReLU [28, 29] activation functions to train the model in turn and verify the accuracy of each trained model on the test set. First, as in the previous section, we adopt the network topology 25-16-5. Fig. 7 shows the accuracy curve on the validation set during model training. Compared with ReLU and LeakyReLU, models with the Sigmoid or Tanh activation function show a significant decrease in accuracy.
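For reference, the four activation functions compared here can be written as scalar functions. The negative slope `alpha=0.1` of the leaky variant is an illustrative assumption; the text does not state the value used.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.1):
    # Unlike ReLU, negative inputs keep a small slope alpha (assumed value)
    return x if x > 0 else alpha * x

for x in (-2.0, 0.0, 2.0):
    print(x, sigmoid(x), tanh(x), relu(x), leaky_relu(x))
```

Sigmoid and Tanh saturate for large positive or negative inputs, whereas ReLU and its leaky variant stay linear for positive inputs.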

Loss variation curve of mini batch SGD and Adam trained models in the training process.
At present, there is no clear formula for choosing the number of hidden layer neurons, which often requires intuitive judgment and domain knowledge. We therefore test the influence of different numbers of hidden layer neurons on evaluation accuracy experimentally. Fig. 8 shows the accuracy curve on the verification set when the number of neurons in the hidden layer is increased to 24. The accuracy of the models with different activation functions improves to some extent, and there is little difference in accuracy between the models. In addition, a further comparison of the accuracy curves in Figs. 7 and 8 shows that the accuracy trend is more stable with LeakyReLU as the activation function, and the accuracy is highest when the model converges. Therefore, we choose the LeakyReLU activation function and 24 nodes as the structure of the first hidden layer.
We also observed the change of the loss value during training for different numbers of hidden layer nodes. As shown in Fig. 9, taking the model with the Sigmoid activation function as an example, the loss value decreases more rapidly when the number of nodes in the hidden layer is increased from 16 to 24. Sufficient hidden layer nodes allow the model to learn more knowledge, which is conducive to convergence. As the number of hidden layer nodes is increased further (up to 30), the experimental results show that the accuracy does not improve further and the loss value does not continue to decline. Considering the stability of model training and the complexity of calculation, we finally determine that the number of nodes in the hidden layer is 24.
After determining the weight optimization method, the activation function and the number of neurons in a hidden layer, we continued to explore the number of hidden layers. The experiment builds the security evaluation model with a one hidden layer structure (25-24-5), a two hidden layer structure (25-24-16-5), and a three hidden layer structure (25-24-16-8-5). Fig. 10 shows the accuracy curve on the verification set when the number of hidden layers increases from 1 to 3. When the number of hidden layers increased to 2, the accuracy of the evaluation model improved to a certain extent, with a highest value of 0.92. However, when the number of hidden layers increased to 3, the accuracy did not continue to improve but decreased slightly, with a maximum value of 0.90. At present, there is no definite method for choosing the structure of the hidden layers, and the network structure is mostly determined by trial and error. Therefore, based on the experimental results, we adopt the structure with two hidden layers, that is, the topology of the model is 25-24-16-5. Table 2 shows the precision rate, recall rate and F1-score calculated by the DNN model in all categories on the test set.
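The forward pass through a 25-24-16-5 fully connected network can be sketched in plain Python. The weights below are random and untrained, so the predicted level is meaningless; the initialization scheme, the LeakyReLU slope `alpha=0.1`, and all helper names are assumptions made for illustration, not the TensorFlow implementation used in the experiments.

```python
import math
import random

def leaky_relu(x, alpha=0.1):
    return x if x > 0 else alpha * x

def softmax(z):
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (assumed scheme)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(x, layers):
    """Hidden layers use LeakyReLU; the output layer uses softmax."""
    h = x
    for idx, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        is_output = idx == len(layers) - 1
        h = softmax(z) if is_output else [leaky_relu(v) for v in z]
    return h

rng = random.Random(0)
sizes = [25, 24, 16, 5]  # input, two hidden layers, five safety levels
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
sample = [rng.random() for _ in range(25)]  # stand-in for normalized indexes
level_probs = forward(sample, layers)
predicted_level = level_probs.index(max(level_probs)) + 1
print(level_probs, predicted_level)
```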

Change curve of accuracy on the verification set when the number of hidden layers changes.
Evaluation results of DNN model on test set

Schematic diagram of K-Nearest Neighbor algorithm.
Evaluation results of K-Nearest Neighbors on test set
Test results of inspection samples
In addition, considering the uniform distribution of various samples in the data set, we adopted K-Nearest Neighbor (KNN) algorithm in machine learning to evaluate (classify) the indicators, so as to verify the superiority of the neural network model. In the experiment, the nearest neighbor algorithm adopts the same training set and test set as the neural network model.
The nearest neighbor algorithm is a classification operator based on distance and instance. The larger the distance, the smaller the similarity. The algorithm stores training samples and uses them to predict the categories of test samples. As shown in Fig. 11, the nearest neighbor algorithm can be used to classify a test sample in three steps: 1) Calculate the distance between the test sample (orange ellipse) and other training samples (green triangle, blue diamond and red square); 2) Find the K nearest training samples of a test sample; 3) Determine the category of a test sample according to the label with largest number in K neighbors.
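The three steps above can be sketched as follows. Euclidean distance, the toy two-dimensional data and the function name `knn_predict` are assumptions; the text does not specify the distance metric used in the experiments.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=5):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Step 1: distance from the query to every training sample
    distances = sorted((math.dist(x, query), y)
                       for x, y in zip(train_x, train_y))
    # Step 2: labels of the k nearest neighbours
    nearest = [label for _, label in distances[:k]]
    # Step 3: the most frequent label among them
    return Counter(nearest).most_common(1)[0][0]

# Hypothetical toy data with two well-separated safety levels
train_x = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.0],
           [1.0, 1.0], [0.9, 1.0], [1.0, 0.8]]
train_y = [1, 1, 1, 4, 4, 4]

pred_low = knn_predict(train_x, train_y, [0.15, 0.05], k=3)
pred_high = knn_predict(train_x, train_y, [0.95, 0.90], k=3)
print(pred_low, pred_high)
```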
The nearest neighbor algorithm does not build an explicit model, so classifying unknown samples requires a large amount of computation, but the method is simple and effective. In the experiment, K = 3 and K = 5 are each used for classification. The setting K = 5 gives the slightly higher accuracy of 0.58 on the test set and is therefore selected. Table 3 shows the precision rate, recall rate and F1-score calculated for all categories of the test set. Because the samples are evenly distributed across categories, the mean recall rate equals the accuracy on the test set. Experimental results show that the safety evaluation algorithm based on a deep neural network is clearly better than the KNN algorithm.
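The per-category precision rate, recall rate and F1-score can be computed as in the sketch below, which also checks that the mean recall equals the accuracy when every category has the same number of samples. The labels and predictions are made-up toy values, not results from the experiments, and the function name is an assumption.

```python
def per_class_metrics(y_true, y_pred, classes):
    """Return {class: (precision, recall, f1)} from true and predicted labels."""
    metrics = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = (precision, recall, f1)
    return metrics

# Hypothetical balanced toy data: two samples per category
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 2, 2, 2, 3, 1]

metrics = per_class_metrics(y_true, y_pred, classes=[1, 2, 3])
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro_recall = sum(recall for _, recall, _ in metrics.values()) / len(metrics)
print(metrics, accuracy, macro_recall)
```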
Usability analysis of safety evaluation model
After training and verification, the steel enterprise safety evaluation network model has mastered the non-linear mapping relationship from the 25-dimensional safety evaluation index to the safety evaluation levels. Next, we select several sets of index data provided by a non-ferrous metal manufacturing company in Beijing for qualitative analysis.
Table 4 shows that evaluation results output by the safety evaluation model for the 5 sets of safety index data are consistent with label values of the 5 sets of data. To a large extent, it shows that the evaluation model has practical feasibility, and the model can meet the requirements of safety evaluation performance of steel enterprises.
Conclusion
With the help of deep learning technology, which is widely studied today, we integrate the safety evaluation method with a neural network to build an efficient model. First, we analyze the factors influencing safety production in iron and steel enterprises, establish an index system, and sort out typical sample data. Next, we design the safety evaluation model based on a DNN and determine the input and output structure of the safety evaluation network. The number of neurons in the input layer reflects the various factors affecting safety production in iron and steel enterprises, and the number of neurons in the output layer corresponds to the safety evaluation levels. Finally, we use a deep learning framework to implement the safety evaluation model. During the experiment, the optimal model structure is determined by comparing the influence of different parameter optimization strategies, hidden layer structures, and activation functions on the safety evaluation performance of the model. In addition, the experiments show that the deep neural network model outperforms the nearest neighbor algorithm in classification for safety evaluation. Safety evaluation helps enterprises understand their own safety state more systematically and helps each department adjust the production links it is responsible for.
Acknowledgment
This work is supported by Research Foundation, and the Artificial Intelligence Research Institute of Yantai University.
