Abstract
With the rapid growth of intelligent manufacturing, the reliability management of intelligent manufacturing systems is becoming increasingly important. Failure mode and effects analysis (FMEA) is a prospective reliability management instrument extensively utilized to manage failure modes of systems, products, processes, and services in various industries. However, the conventional FMEA method has been criticized for its inherent limitations. Machine learning can handle large amounts of data and has merits in reliability analysis and prediction, which can help in failure mode classification and risk management under limited resources. Therefore, this paper devises a method for complex systems based on an improved FMEA model combined with machine learning and applies it to the reliability management of intelligent manufacturing systems. First, the structured network of failure modes is constructed based on the knowledge graph for intelligent manufacturing systems. Then, grey relation analysis (GRA) is applied to determine the risk prioritization of failure modes. Next, the k-means algorithm, an unsupervised machine learning method, is employed to cluster failure modes into priority classes. Finally, a case study and further comparative analysis are implemented. The results demonstrate that failure modes in system security, production quality, and information integration are high-risk and require more resources for prevention. In addition, recommendations for risk prevention and monitoring of intelligent manufacturing systems are given based on the clustering results. Compared with the conventional FMEA method, the proposed method can more precisely capture the coupling relationship between the failure modes. This research provides significant support for the reliability and risk management of complex systems such as intelligent manufacturing systems.
Keywords
Introduction
The upsurge of the global intelligent manufacturing revolution has promoted the continuous deepening of the integration of advanced manufacturing technology and new-generation information technology [1]. Countries around the world are actively participating in this revolution and have formulated relevant strategic plans, such as the National Strategic Plan for Advanced Manufacturing in the US, Industry 4.0 in Germany, New Industrial France, and The Future of Manufacturing: A New Era of Opportunity and Challenge for the UK. In response to the reshaping of the international industrial pattern and the challenges confronted by the Chinese traditional manufacturing industry, China put forward the Made in China 2025 plan in 2015. In recent years, China has continuously issued relevant policies, laws, and regulations to vigorously support manufacturing enterprises and promote their transformation and upgrading, so as to accelerate the transition from a massive manufacturing country to a manufacturing powerhouse. Intelligent manufacturing systems are crucial to realizing the digitalization, networking, and automation of the manufacturing industry. They break the limitations of traditional industries by connecting industrial production and manufacturing processes with a new generation of information technology. However, while they bring a high degree of convenience, they also increase risks. Risk evaluation and early warning of intelligent manufacturing systems are important means of ensuring the safety and stability of industrial production. Due to the complexity of these systems, it is difficult to manage their risks. Therefore, the reliability analysis of intelligent manufacturing systems is of great significance to the high-quality development of intelligent manufacturing, and many scholars have done related research. For instance, He et al. [2] proposed an integrated predictive maintenance (PdM) strategy to improve the mission reliability of manufacturing systems and the quality of products. Chen et al. [3] established a reliability evaluation model for multistate intelligent manufacturing systems based on operational quality data. Yang et al. [4] developed an integrated mission reliability evaluation approach based on the extended quality state task network (EQSTN) for intelligent multistate manufacturing systems to assure the predictability of the reliability of finished products. Dahiya et al. [5] proposed a novel heuristic algorithm applied to the pharmaceutical manufacturing system to maximize system reliability. Nevertheless, the identification, classification, and management of failure modes in intelligent manufacturing systems are still in the exploratory stage.
Identification and evaluation of the risks of failures are essential to improve the maintenance strategy and management, and they become even more important in complex engineering systems [6]. To achieve this, several techniques such as failure mode and effects analysis (FMEA), fault tree analysis (FTA), event tree analysis (ETA), the reliability block diagram (RBD), Monte-Carlo simulation (MCS), Markov analysis (MA), and Bayesian networks (BN) have been developed and applied [7–11]. Among them, FMEA is a structured and proactive reliability management technology utilized to enhance the safety and reliability of systems, products, processes, and services [12–14].
So far, FMEA is still one of the most valuable and effective reliability analysis methods used in various industries [15]. Geramian et al. [16] proposed a new approach integrating the Fuzzy Logic-based FMEA and collective process capability analysis to investigate the production quality of an electrical-equipment-manufacturing company. Wang et al. [17] verified that the FMEA method combined with the extended MULTIMOORA approach based on the entropy method can be used for risk analysis of equipment failure in a certain oil and gas industry. Hassan et al. [18] proposed a new FMEA model based on the fuzzy rule-based system and the grey theory to identify hazards of pipeline systems. Yu et al. [19] developed an improved FMEA model for submarine pipeline risk analysis by combining ExpTODIM and PROMETHEE-II methods. Zha et al. [20] proposed an FMEA approach based on the Avoidance of Aggregation Discrepancy utilized for reliability and security analysis in engineering management fields.
In the conventional FMEA method, the risk ranking of each failure mode is determined by the risk priority number (RPN), which is produced by multiplying the values of the risk factors severity (S), occurrence (O), and detection (D) [21]. However, the conventional FMEA method has inherent limitations, such as ignoring the weight of risk factors, obtaining identical RPNs that may represent different meanings of the failure mode risks, and lacking scientific bases for the calculation of the RPN [15, 22–24]. In order to overcome the deficiencies associated with the conventional FMEA method, numerous techniques have been employed to enhance the FMEA model. On the one hand, the weight of risk factors is considered in the risk analysis. Risk factors are given varying weights by subjective weight methods like the analytic hierarchy process (AHP) [25] and analytic network process (ANP) [26], objective weight methods like the entropy weight method [27], or comprehensive weight methods [28, 29]. On the other hand, a variety of strategies have been applied to increase the reliability and rationality of risk prioritization. Baghery et al. [30] prioritized the manufacturing processes based on the process failure mode and effects analysis (PFMEA), interval data envelopment analysis, and grey relational analysis. Shafiee and Animah [31] proposed an integrated risk management framework based on the FMEA model and a hybrid Multi-Criteria Decision Analysis (MCDA) model for supporting the life extension of subsea facilities in high-pressure–high-temperature environments. Ouyang et al. [32] established an FMEA approach based on the ensemble learning technique and considered risk indexes of six single FMEA approaches: the conventional FMEA, interval probability-based FMEA, the fuzzy FMEA, the exponential method, TOPSIS-FMEA model, and grey relation analysis (GRA)-FMEA method. Huang et al. [15] integrated probabilistic linguistic term sets and the TODIM (an acronym in Portuguese for interactive multi-criteria decision-making) method to evaluate and prioritize the risk of failure modes. Spasenic [33] proposed an FMEA methodology improved by the Dempster-Shafer theory and applied it to the credit risk assessment of financing a small hydropower plant project in Serbia. Behnia et al. [26] introduced a fuzzy optimized model based on FMEA, Decision-making Trial and Evaluation Laboratory (DEMATEL), ANP, and goal programming approach for solving maintenance strategy selection problems in the paper industry. Park et al. [11] utilized FMEA with a Rule-based Bayesian Network for maritime cybersecurity risk analysis.
Machine learning models are data-driven learning approaches that can capture the complexity of reliability by training software to make generalized predictions from previous data [34]. Because of its powerful data processing capabilities, machine learning has also become one of the improvement directions of the conventional FMEA method. For example, Jomthanachai et al. [35] integrated DEA and machine learning for risk assessment to address the limitations of FMEA. Duan et al. [36] developed a new FMEA model based on double hierarchy hesitant fuzzy linguistic term sets and k-means clustering for evaluating and clustering risks of failure modes. Samast et al. [37] proposed a decision support system (DSS) to analyze data using the supervised machine learning approach and utilized the FMEA method to determine the severity of the DSS model defects. Among unsupervised machine learning algorithms, the k-means algorithm is the most popular partition-based clustering algorithm; it seeks to make the data points within a cluster as similar as possible while keeping the clusters as dissimilar as possible [38]. It enables risk-prioritized classification of failure modes for failure prediction [39].
Due to the numerous failure modes and intricate relationships present in large-scale systems, it is worthwhile to construct a structured network of failure modes when performing reliability analysis. As a valuable tool for visualization and information reasoning, the knowledge graph is a structured semantic network used to depict the relationships between entities. It offers information that is semantically structured and computer-interpretable, which is regarded as an important ingredient for creating more intelligent machines [40–42]. Applying the knowledge graph to failure mode and effects analysis is highly advantageous for making reliability management more intelligent. Compared with the conventional FMEA method, the FMEA method based on the knowledge graph offers substantial advantages in forming a failure mode knowledge base, fault reasoning ability, fault range analysis, and fault multi-level analysis.
Inspired by the aforementioned discussions, the lack of research on failure mode identification and management of intelligent manufacturing systems can be addressed by constructing a knowledge graph. The shortcomings of conventional FMEA models can be overcome by considering risk factor weights and combining them with the GRA ranking model. The shortcomings of existing research on failure mode classification and prediction for intelligent manufacturing systems can be addressed by machine learning algorithms. Thus, this paper focuses on the following research objectives: (1) mining the major failure modes faced by intelligent manufacturing systems and constructing a knowledge graph; (2) obtaining the risk prioritization ranking of failure modes through an improved FMEA method; (3) predicting and classifying the failure modes through machine learning algorithms to manage the failure modes; (4) providing targeted recommendations for the prevention and monitoring of failure modes in intelligent manufacturing systems. To achieve these objectives, this paper explores a reliability analysis method based on improved FMEA combined with machine learning for intelligent manufacturing systems. The main contributions of the paper are as follows.
(1) The knowledge graph of failure modes of intelligent manufacturing systems is constructed, which is important for establishing the structured network of failure modes and the knowledge base of intelligent manufacturing systems. In addition, the knowledge reasoning and knowledge retrieval abilities of the knowledge graph offer valuable guidance and reference for evaluating and preventing failure modes of intelligent manufacturing systems.
(2) Combining GRA and k-means clustering, an improved FMEA model is established to classify and manage failure modes. The proposed model can more reasonably reflect the risk prioritization of failure modes compared with the conventional FMEA method and provides the theoretical basis for the prevention and monitoring of failure modes of complex systems such as intelligent manufacturing systems.
(3) The knowledge graph and the clustering results are combined to propose risk prevention suggestions for failure modes of intelligent manufacturing systems. The methodology of this paper can be extended to solve reliability management problems of large and complex systems.
The remainder of this paper is organized as follows. In Section 2, an improved FMEA model that prioritizes and categorizes failure modes is proposed by utilizing knowledge graph theory, GRA, and machine learning. Section 3 provides the application of the proposed FMEA approach to the reliability analysis of intelligent manufacturing systems. In Section 4, the results are discussed, and suggestions are made for the risk prevention and monitoring of intelligent manufacturing systems. Section 5 presents threats to validity. Finally, concluding remarks and further research proposals are presented in Section 6.
The proposed method
In this section, we propose a new reliability analysis method for FMEA based on the knowledge graph, GRA, and machine learning. The proposed method mainly consists of three phases: evaluating the risk of failure modes, determining the risk prioritization of failure modes, and extracting the features of failure modes. The method is described in detail in the subsequent sections.
Evaluate the risk of failure modes
In order to evaluate the risk of failure modes, it is necessary to effectively identify the potential failure modes, determine the standard of evaluation linguistic terms, and organize an FMEA team for evaluation. Therefore, a structured network of failure modes based on the knowledge graph is proposed in this phase.
The construction process of the knowledge graph includes five key technical modules: knowledge extraction, knowledge representation, knowledge fusion, knowledge reasoning, and knowledge storage, which structure scattered data and integrate them into a complete knowledge base. In the early stage, the knowledge graph of failure modes is, to a certain extent, dependent on experts’ subjective judgment. Consequently, experts are required to operate manually to form training sets in information extraction, information processing, and information fusion. The architecture of the knowledge graph of failure modes is shown in Fig. 1.

Fig. 1. The construction process of the knowledge graph.
Suppose a general FMEA problem includes $m$ failure modes $FM_i$ ($i = 1, 2, \dots, m$) based on $n$ risk factors, which are evaluated by $l$ FMEA team members $D_k$ ($k = 1, 2, \dots, l$). In order to reflect the relative importance of experts in the evaluation process, each team member should be assigned a weight $\lambda_k > 0$ ($k = 1, 2, \dots, l$) satisfying $\sum_{k=1}^{l} \lambda_k = 1$.
Table 1. Linguistic terms for rating failure modes
Determine the risk prioritization of failure modes
In this phase, the weights of the risk factors are taken into consideration, and GRA is utilized to rank the risk of failure modes. GRA is a multi-factor statistical analysis method that usually takes an uncertain system as the research object and quantitatively describes the changing trend of the system. It can greatly reduce the analysis difficulties caused by unclear and missing information and is often used to improve the ranking accuracy in FMEA.
The weights of the risk factors are determined by the AHP method [43], whose concrete procedure is summarized as follows.
(1) Establish the judgment matrix
Each expert of the FMEA team compares the importance of the three risk factors S, O, and D, and establishes a judgment matrix. The judgment matrix of the k-th expert is $H_k = (H_{ij})_{n \times n}$ ($k = 1, 2, \dots, l$), in which each pair of factors is compared using the numerical rating. $H_{ij}$ represents the relative importance of the i-th risk factor over the j-th risk factor, and, as is standard for AHP judgment matrices, $H_{ij} > 0$, $H_{ji} = 1/H_{ij}$, and $H_{ii} = 1$.
(2) Calculate the consistency ratio (CR)
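CR is the ratio between the matrix’s consistency index (CI) and the random index (RI). It is used to indicate the probability that the matrix judgments were randomly generated, and in general ranges from 0 to 1. A CR of 0.1 or less is considered acceptable. Otherwise, the judgments are untrustworthy and need to be reconstructed. CR is defined as [44]: $CR = CI / RI$, where, in the standard AHP formulation, $CI = (\lambda_{\max} - n)/(n - 1)$, $\lambda_{\max}$ is the largest eigenvalue of the judgment matrix, and RI is the random index for a matrix of order $n$.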
(3) Obtain the weight vector
Through normalization, the weights of factors based on the k-th expert’s opinion are obtained as:
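The three AHP steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the judgment matrix `H_example` is hypothetical, the weights are assumed to come from the normalized principal eigenvector (one common AHP choice, which may differ from the normalization used in this paper), and the random index values are the commonly tabulated ones attributed to Saaty.

```python
import numpy as np

# Commonly tabulated random index (RI) values for matrix orders 1..9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(judgment):
    """Return (weights, CR) for a reciprocal pairwise-comparison matrix.

    Assumption: weights are the normalized principal eigenvector.
    """
    H = np.asarray(judgment, dtype=float)
    n = H.shape[0]
    eigvals, eigvecs = np.linalg.eig(H)
    idx = np.argmax(eigvals.real)          # principal (largest) eigenvalue
    lam_max = eigvals[idx].real
    w = np.abs(eigvecs[:, idx].real)
    w = w / w.sum()                        # normalize so the weights sum to 1
    ci = (lam_max - n) / (n - 1)           # consistency index
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0        # consistency ratio
    return w, cr


# Hypothetical judgment matrix of one expert for the factors (S, O, D).
H_example = [[1.0, 3.0, 5.0],
             [1 / 3, 1.0, 3.0],
             [1 / 5, 1 / 3, 1.0]]

weights, cr = ahp_weights(H_example)
print("weights (S, O, D):", np.round(weights, 3))
print("CR acceptable:", cr <= 0.1)
```

If the CR exceeded 0.1, the expert would be asked to revise the pairwise comparisons before the weights are used.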
GRA is adopted as a tool for risk prioritization, and the specific description is given below.
(1) Set the reference sequences and the comparative sequences
In the first stage, values in the FMEA evaluation matrix for each failure mode are processed into comparability sequences. The reference sequence which indicates the ideal state is set as
In addition, the matrix entries should first be normalized to make them dimensionless. The normalization equation is defined as:
(2) Calculate the grey relational coefficient for each failure mode
Based on the normalized matrix, the relational coefficient is constructed using the following equation [45]:
(3) Calculate the grey relational grade
The grey relational grade can be calculated by Equation (8). The larger the value of $\gamma_{0i}$, the higher the failure mode risk priority. The grey relational grade for the i-th failure mode is:
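As a hedged illustration of steps (2) and (3), the sketch below uses the widely used form of the grey relational coefficient, $(\Delta_{\min} + \zeta \Delta_{\max}) / (\Delta_i(j) + \zeta \Delta_{\max})$ with distinguishing coefficient $\zeta = 0.5$, and a weighted sum for the grade. These choices, the function name, and all numbers are assumptions for illustration; they may differ in detail from the equations used in this paper. The input is assumed to be already normalized.

```python
import numpy as np


def grey_relational_grades(normalized, reference, weights, zeta=0.5):
    """Weighted grey relational grade of each row against the reference.

    normalized: (m, n) array, one row per failure mode (already normalized)
    reference:  length-n reference sequence
    weights:    length-n risk-factor weights summing to 1
    zeta:       distinguishing coefficient (0.5 is a common default)
    """
    x = np.asarray(normalized, dtype=float)
    x0 = np.asarray(reference, dtype=float)
    w = np.asarray(weights, dtype=float)
    delta = np.abs(x - x0)                     # deviation from the reference
    d_min, d_max = delta.min(), delta.max()
    if d_max == 0.0:                           # every row equals the reference
        coeff = np.ones_like(delta)
    else:
        coeff = (d_min + zeta * d_max) / (delta + zeta * d_max)
    return coeff @ w                           # weighted sum over risk factors


# Hypothetical normalized values for three failure modes and factors (S, O, D).
X = np.array([[0.2, 0.1, 0.3],
              [0.5, 0.4, 0.3],
              [0.9, 0.8, 0.7]])
reference = X.min(axis=0)      # column-wise minimum, as in the case study
grades = grey_relational_grades(X, reference, weights=[0.5, 0.3, 0.2])
ranking = np.argsort(-grades)  # indices from highest to lowest risk priority
print(np.round(grades, 3), ranking)
```

A failure mode whose row coincides with the reference sequence receives a grade of 1, the highest possible priority under these assumptions.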
Extract the features of failure modes
Failure modes arise from different causes. In addition to identifying failure modes, we hope to carry out targeted maintenance according to different types of quality problems. Therefore, the classification and feature extraction of various failure modes is the basis of this research. In this phase, we use the k-means clustering algorithm to classify failure modes based on the RPN calculated by the conventional FMEA and the grey relational grade obtained in the previous step.
Cluster analysis is a method utilized in machine learning and attempts to find clusters in a dataset [46, 47]. The k-means is one of the well-known unsupervised machine learning algorithms for discovering the cluster structure in data sets, where data in the same cluster have the greatest similarity [48]. The k-means method is often applied to the cluster analysis of scattered points in the two-dimensional coordinate system, which is suitable for the classification of failure modes and provides a theoretical basis for fault maintenance and continuous improvement.
(1) Normalize the RPN calculated by the conventional FMEA and the grey relational grade obtained in the previous step to form $m$ two-dimensional data points $\theta_i$ ($i = 1, 2, \dots, m$) representing the failure modes.
(2) Divide the failure modes into $s$ clusters and let $u_i$ ($i = 1, 2, \dots, s$) denote the centroids of the clusters.
(3) Repeat the following two steps until the optimal centroids are found, that is, until the clusters no longer change. First, calculate the squared distance between each data point and each centroid, and assign each data point $\theta_i$ to the nearest cluster. Second, re-compute each centroid $u_i$ ($i = 1, 2, \dots, s$) as the average of all data points in that cluster.
(4) K-means terminates once the centroids converge and no longer change.
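The clustering steps can be sketched in plain NumPy as follows. This is an illustrative sketch, not the MATLAB program used in the case study: min-max scaling is assumed for the normalization, the RPN and grade values are hypothetical, and the initial centroids are either passed in explicitly or, by assumption, drawn at random from the data points.

```python
import numpy as np


def min_max(values):
    """Scale a 1-D sequence to [0, 1] (assumed normalization)."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)


def k_means(points, s, init=None, max_iter=100, seed=0):
    """Cluster 2-D points into s groups; return (labels, centroids)."""
    pts = np.asarray(points, dtype=float)
    if init is not None:
        centroids = np.asarray(init, dtype=float).copy()
    else:
        rng = np.random.default_rng(seed)
        centroids = pts[rng.choice(len(pts), size=s, replace=False)]
    for _ in range(max_iter):
        # Squared distance from every point to every centroid.
        dist = ((pts[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)           # assign to nearest centroid
        new_centroids = np.array([
            pts[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(s)                  # keep old centroid if cluster is empty
        ])
        if np.allclose(new_centroids, centroids):
            break                              # centroids converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical RPNs and grey relational grades for six failure modes.
rpn = [20, 30, 150, 160, 400, 420]
grade = [0.40, 0.42, 0.60, 0.62, 0.90, 0.95]
points = np.column_stack([min_max(rpn), min_max(grade)])

labels, centroids = k_means(points, s=3, init=points[[0, 2, 4]])
print(labels.tolist())  # [0, 0, 1, 1, 2, 2]
```

Because k-means can converge to different partitions from different starting centroids, it is prudent to run it from several initializations and compare the results.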
According to the k-means clustering results, failure modes are divided into s categories, and the corresponding measures for monitoring and preventing failure modes can be put forward based on the features of different categories.
Application in reliability analysis of intelligent manufacturing systems
In this section, the model proposed in the previous part is utilized to analyze the reliability of the intelligent manufacturing systems, including constructing the knowledge graph and evaluating failure modes, grey relational ranking of failure modes, and k-means clustering analysis.
Generally, a complete intelligent manufacturing system consists of multiple subsystems, which belong to different system dimension layers. Referring to the intelligent manufacturing standardization systems, a 5-tier architecture is established, consisting of the network layer, enterprise layer, management layer, control layer, and equipment layer. The network layer refers to the data information network based on Ethernet, which can realize the information interaction between enterprises and the data transmission and storage within the enterprise. The enterprise layer refers to the management and operation system built by the enterprise itself under the network layer. It is the most comprehensive and core functional layer in the enterprise, including subsystems such as enterprise resource planning (ERP), supply chain management (SCM), and customer relationship management (CRM) systems. The management layer, as a connecting link between the preceding and the following layers, realizes the transition from enterprise management to workshop production. It is mainly composed of subsystems that control the overall production of the enterprise, including the manufacturing execution system (MES), product lifecycle management (PLM), etc. The control layer is the functional layer that realizes the production of specific workshops. It is also one of the most distinctive characteristics of intelligent manufacturing systems compared with the conventional manufacturing field. It includes subsystems such as supervisory control and data acquisition (SCADA), distributed control system (DCS), programmable logic controller (PLC), etc. The equipment layer refers to the frontline production workshop units, including a series of intelligent production equipment, which can most intuitively reflect the intelligence and informatization of the production process.
Based on the division of functional layers mentioned above, combined with relevant literature and actual investigations, a total of 26 failure modes of the intelligent manufacturing systems are determined in this paper. The specific failure modes and their causes are shown in Table 2.
Table 2. The FMEA of the intelligent manufacturing systems
In addition, in intelligent manufacturing systems, each failure mode may result from a combination of causes across different subsystems. Therefore, combining literature and actual conditions, this paper summarizes four types of failure causes, including human error, design defect, configuration defect, and force majeure. These four failure causes cover most of the defects and problems that may occur in the actual production process. Among them, force majeure includes not only natural accidents and disasters but also some inevitable situations, such as changes in the relationship between supply and demand. As long as an enterprise operates in the macro environment of the market, the relation between supply and demand will inevitably exist and continue to change, which is difficult to avoid and eliminate. The classification of failure causes is shown in Table 3.
Table 3. The classification of failure causes
As the knowledge graph of failure modes for intelligent manufacturing systems is in its preliminary development stage, it mainly relies on the experience and judgment of experts and the real practice situation to sort out the failure modes and their correlations. On this basis, a computer-understandable knowledge graph of failure modes of intelligent manufacturing systems is established using the editing tool Prot

Fig. 2. Knowledge graph of failure modes of the intelligent manufacturing systems.
Among them, the orange line indicates the diffusion of failure mode effects from one system dimension layer to another, and the arrow direction is the diffusion direction. If there are two orange lines between system dimension layers, it indicates the bidirectional diffusion. It means that the failure modes of the network layer may cause the failure modes of the control layer and the equipment layer, that is, the possible superior level failure modes of the control layer and the equipment layer can be tracked in the network layer. As the link between the enterprise layer and the control layer, the management layer is critical, and its failure mode effects diffuse in both directions with those of the enterprise layer and the control layer.
Through the knowledge graph, the location and relevant information of each failure mode in the entire intelligent manufacturing system can be traced. Furthermore, the knowledge graph provides the functions of semantic search and semantic inference. In the knowledge base, users can retrieve failure modes according to keywords and infer the possible coupling relationship with other failure modes that are superior, subordinate, or peer. When the knowledge base is broader and more accurate, the reasoning ability of the knowledge graph will be stronger. The failure mode network can be used as the basis of the knowledge base and greatly improve the efficiency of knowledge reuse. For instance, for FM21 inaccurate equipment condition monitoring, the most likely failure cause comes from the testing equipment or the system itself. According to the knowledge graph of failure modes, we can realize that some failure modes at the network layer, such as FM5 transmission failure of the communication line, may appear as the superior level failure modes of FM21. Similarly, FM21 is also the superior level failure mode of FM19 inaccurate product quality control.
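The tracing of superior-level failure modes described above can be sketched as a traversal of a directed graph. The snippet is only an illustration: it encodes just the two relations named in the text (FM5 to FM21, and FM21 to FM19), and the data structure and the function name `superior_failure_modes` are hypothetical rather than part of the knowledge graph tool used in this paper.

```python
from collections import deque

# Directed edges: superior-level failure mode -> subordinate failure modes.
# Only the two relations named in the text are included here.
EDGES = {
    "FM5": ["FM21"],   # transmission failure of the communication line
    "FM21": ["FM19"],  # inaccurate equipment condition monitoring
}


def superior_failure_modes(target, edges):
    """Return all failure modes from which `target` can be reached,
    ordered from the nearest superior level outward."""
    # Build the reverse adjacency (child -> parents).
    parents = {}
    for src, dsts in edges.items():
        for dst in dsts:
            parents.setdefault(dst, []).append(src)

    found, queue = [], deque([target])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in found:
                found.append(parent)
                queue.append(parent)
    return found


print(superior_failure_modes("FM19", EDGES))  # ['FM21', 'FM5']
```

The same traversal run on the forward edges would list the subordinate failure modes that a given failure mode may trigger.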
At present, the application of the knowledge graph in intelligent manufacturing systems still faces the following obstacles.
(1) There is a lack of relevant databases in the field of intelligent manufacturing. Knowledge graph technology requires a large amount of labeled data to construct training sets. However, intelligent manufacturing is an emerging field. Due to the confidentiality of the databases in the intelligent manufacturing field, the particularity of intelligent manufacturing enterprises, and the lack of relevant industry standards, the databases often cannot be effectively integrated, which makes it difficult to provide a good training database for the application of the knowledge graph.
(2) The advantages of knowledge graph theory applied to intelligent manufacturing systems are not clear enough. The knowledge graph has powerful processing capabilities for huge data sets, which reflects its potential for applications in the field of intelligent manufacturing. However, it is still unclear how to apply knowledge graph technology to the design, assembly, manufacturing, and other processes of intelligent manufacturing systems and how effective it is for the process improvement of intelligent manufacturing systems.
The FMEA team was organized to undertake the risk evaluation, which consisted of five experts: A, B, C, D, and E. A is an intelligent manufacturing system implementation consultant, B is an enterprise informatization consultant, C is an intelligent manufacturing system reliability expert at the university, D is a senior manager of enterprise intelligence projects, and E is a technical manager of enterprise intelligence projects. In view of the experts’ different knowledge backgrounds and professional fields, distinct weights are allocated to them to reflect their importance in the FMEA process, i.e.
Table 4. Linguistic evaluations on failure modes by the FMEA team members
The linguistic evaluation of each expert in Table 4 can be converted into specific values so that the evaluation matrices
Table 5. The FMEA evaluation matrix of failure modes
The matrix is normalized using Equation (6), for example,
When carrying out FMEA, the smaller the risk factor value in the normalized matrix, the larger the risk of failure mode. Thus, the failure mode reference sequence consists of the minimum value of each risk factor in the normalized matrix, which is (0.1161, 0.0748, 0.1121). The grey relational grades are calculated using Equations (7)-(8). The results and the comparison of risk rankings between the RPN method and the GRA method are shown in Table 6.
Table 6. Comparison of risk rankings
In practice, failure modes are generally divided into three categories: low risk, medium risk, and high risk. Therefore, let s = 3. In order to avoid excessive errors in clustering the data obtained using a single method, we consider the results from both methods simultaneously. On the basis of the above data, we normalize the RPNs and grey relational grades to form 26 two-dimensional data points representing failure modes of intelligent manufacturing systems. Through k-means clustering analysis by MATLAB programming, the clustering results are obtained and shown in Fig. 3 and Table 7.

Fig. 3. The results of k-means clustering.
Table 7. K-means clustering of failure modes
As shown in Fig. 3 and Table 7, 26 failure modes are divided into three clusters. Cluster 1 marked in blue includes FM5, FM9, FM14, FM19, FM21, FM22, FM24, FM25, and FM26. Cluster 2 marked in red includes FM1, FM2, FM4, FM6, FM8, FM11, FM17, and FM23. Cluster 3 marked in yellow includes FM3, FM7, FM10, FM12, FM13, FM15, FM16, FM18, and FM20. After communication and discussion with experts, it is considered that the clustering results are in line with the actual situation and reasonable.
Comparison and discussion
According to Table 6, the comparison of failure mode ranking between the conventional FMEA method and the improved FMEA method is shown in Fig. 4, where the horizontal coordinate denotes the serial number of failure modes and the vertical coordinate denotes the corresponding risk priorities. It can be seen that the risk priority rankings of FM1 (malicious attacks on software systems), FM5 (transmission failure of the communication line), and FM7 (illegally accessed and tampered with data) obtained by the improved methodology are significantly higher than the rankings obtained by the conventional FMEA methodology. In real practice, the risks of these three failure modes spread from the network layer to the control and equipment layers. Therefore, the risk prioritization of failure modes obtained by the improved FMEA method combining AHP, GRA, and k-means clustering proposed in this paper is more in line with the practical situation, because it takes into account the weights of the risk factors and assigns a higher risk priority to the failure modes that can radiate to the other functional layers.

Fig. 4. Comparison of failure mode ranking between the conventional FMEA method and the improved FMEA method.
From the perspective of risk priority, cluster 1 contains the failure modes with low values of RPN and grey relational grades, that is, whether in the conventional FMEA method or the improved FMEA method, the risk priority of these failure modes is low. From the perspective of risk factors, on the whole, the probability of occurrence and detection of failure modes in cluster 1 is relatively low. From the perspective of the structured network of failure modes, the failure modes in cluster 1 are widely distributed in all functional layers of the intelligent manufacturing system and mainly distributed in the downstream functional layers. From the perspective of risk causes, FM5, FM14, FM19, FM21, and FM22 are caused by design defects and configuration defects, and FM14, FM19, FM24 contain the element of human error. For intelligent manufacturing systems, human error is avoidable. With the dynamic development of intelligent manufacturing systems, human error will show a decreasing trend. FM9, FM25, and FM26 are only caused by force majeure, and they can be prevented by taking precautions in advance. In general, this type of failure mode has a relatively small impact on the reliability of intelligent manufacturing systems.
In cluster 2, from the perspective of risk priority, the rankings by RPN and by grey relational grade are both at the medium level, or one value is larger and the other is smaller. From the perspective of risk factors, two of the three risk factors, severity (S), occurrence (O), and detection (D), are generally higher for the failure modes in cluster 2, and the remaining one is lower, which is why they do not rank very high on the risk priority scale. From the perspective of the structured network of failure modes, the failure modes in cluster 2 are basically in the upstream position of the failure mode network. They not only affect their own functional layer but also affect their downstream functional layers, causing the overall function loss and downtime of the intelligent manufacturing systems. This is difficult to reflect in the conventional FMEA method and is the main reason for the difference in risk priority between the conventional FMEA method and the improved FMEA method.
From the perspective of risk priority, cluster 3 contains the failure modes with high values of RPN and grey relational grades. From the perspective of risk factors, the failure modes in cluster 3 have high values of S, O, and D, indicating that they have a strong impact on intelligent manufacturing systems. By their inherent nature, they mainly involve three dimensions: system security, production quality, and information integration. Meanwhile, they all come from the network, enterprise, management, and control layers. Among them, the impact of failure modes FM3 and FM7 from the network layer can spread to the control and equipment layers. FM10, FM12, FM13, FM15, FM16, FM18, and FM20 from the enterprise, management, and control layers are prone to the bi-directional spreading of failure modes. Thus, these failure modes require more resources for prevention and monitoring. These results correspond to the failure mode relationships reflected in the knowledge graph, confirming the scientific validity of the methodology of this paper.
In conclusion, the improved FMEA method can reflect the characteristic that the impact degree of failure modes changes dynamically with the development of intelligent manufacturing systems. In addition, the improved FMEA method assigns a higher risk priority to the failure modes that affect other functional layers and thereby cause subsequent failures. The improved FMEA method is therefore more suitable for the reliability analysis of intelligent manufacturing systems than the conventional FMEA method, and its results are more reasonable and in line with reality.
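The upstream and downstream reasoning used for clusters 2 and 3 can be illustrated with a small sketch that treats the structured network of failure modes as a directed graph and counts how many other failure modes each one can reach. The node names and edges below are hypothetical placeholders, not the relationships of the knowledge graph in this paper, and reachability is only one possible way to express how far a failure mode radiates.

```python
from collections import deque

# Hypothetical directed edges: u -> v means failure mode u can trigger v.
edges = {
    "FM_net": ["FM_ctrl_1", "FM_ctrl_2"],
    "FM_ctrl_1": ["FM_equip_1"],
    "FM_ctrl_2": ["FM_equip_1", "FM_equip_2"],
    "FM_equip_1": [],
    "FM_equip_2": [],
}


def downstream(graph, start):
    """Return all failure modes reachable from `start` (breadth-first search)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            # Skip nodes already visited and never count the start node itself,
            # so bi-directional links (cycles) are handled safely.
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


reach = {fm: len(downstream(edges, fm)) for fm in edges}
for fm, n in sorted(reach.items(), key=lambda item: -item[1]):
    print(fm, "can reach", n, "other failure modes")
```

In this toy network the network-layer node reaches every other node, while the equipment-layer nodes reach none, which mirrors why upstream failure modes deserve a higher priority than their individual S, O, and D scores alone would suggest.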
Failure modes in cluster 1
The risk priority of failure modes in cluster 1 is not high, and they can be detected in time, and a series of control measures can be taken. Such failure modes are often in the scope of the existing monitoring system and have a relatively sound prevention mechanism. For enterprises, they only need to improve the existing prevention and control mechanism, and there is no need to invest additional resources to prevent and control failure modes in cluster 1.
Among them, the failure causes of FM14, FM19, and FM24 all include human error. Therefore, it is necessary to strengthen the process review mechanism and the relevant business training of employees in various departments and adopt a more comprehensive staff management mechanism. For intelligent manufacturing systems, the management level and operation level are bound to continue to improve, the requirements for relevant personnel will also increase, and the severity, occurrence, and detection of such failure modes will continue to decline. In addition, FM5, FM14, FM19, FM21, and FM22 are mainly caused by design defects and configuration defects. It is necessary to improve the software and hardware of the intelligent manufacturing systems, which can greatly reduce the occurrence of failure modes in cluster 1 and is the inevitable trend of the development of intelligent manufacturing systems. Since FM9, FM25, and FM26 are uncontrollable and difficult to predict, intelligent manufacturing enterprises need to formulate complete emergency plans and always pay attention to relevant early warning information so that the emergency plans can be activated and the failure modes can be restored in an orderly manner to ensure production.
Failure modes in cluster 2
A significant common feature of failure modes in cluster 2 is that most of them are upstream of the structured network of failure modes. This type of failure mode often leads to the occurrence of a series of subsequent failure modes and has a serious impact on the integrity of intelligent manufacturing systems. The suggestion for such failure modes is to focus more on prevention and control, carry out feedback treatment as soon as possible, and minimize the impact in time before the failure modes radiate to the whole system. Therefore, it is necessary to establish a complete predictive monitoring system and a maintenance system that can respond to such failure modes in a timely manner. In the future, as the field of intelligent manufacturing matures, this type of failure mode will gradually be transferred to cluster 1, that is, become controllable risks that intelligent manufacturing systems can supervise automatically.
Among them, FM1, FM2, FM4, and FM6 are all caused by system design defects. It is necessary to optimize the relevant system design and improve the control system, which will help enterprises minimize the impact of failure modes in time. FM8 are affected by the macro environment of the market. The characteristics of real-time changes in the market will make these two failure modes unavoidable and difficult to predict, so it is difficult to establish a monitoring mechanism. The better strategy is then to enhance the flexibility and resilience of the enterprise itself so that the enterprise can make corresponding production adjustments in time to adapt to the changes in market supply and demand. FM6, FM11, FM17, and FM23 are related to human errors, and the impact can be reduced by establishing a complete information verification system.
Failure modes in cluster 3
Failure modes in cluster 3 have the greatest impact on intelligent manufacturing systems, and enterprises should invest the most resources to prevent them. Intelligent manufacturing systems contain a huge number and variety of related equipment, and all pieces of equipment operate in close coordination. Once a certain link goes wrong, it will at least stall current production and, at worst, affect the function of the whole intelligent manufacturing system. FM3, FM7, FM10, FM12, FM13, FM15, FM16, FM18, and FM20 all involve important links in the internal operation of intelligent manufacturing systems. For these failure modes, it is necessary to establish an early warning and monitoring system to find failures in time and formulate a complete feedback mechanism. When a fault occurs, it should be repaired in time to restore function before irreparable losses are caused, so as to ensure the continuous operation of the whole intelligent manufacturing system.
Threats to validity
Internal validity
First, the construction of the knowledge graph for intelligent manufacturing systems’ failure modes mainly relies on the subjective experience and judgment of experts in the early stage, and the identification of failure modes may be threatened. To eliminate the deviation, we selected five authoritative experts related to intelligent manufacturing systems through background checks. In this paper, we refer to the intelligent manufacturing standardization system in the stage of identifying failure modes so that the experts uniformly consider five dimensions, namely, the network layer, the enterprise layer, the management layer, the control layer, and the equipment layer, and finally determine the failure modes of intelligent manufacturing systems by combining relevant literature and practical investigations. Since we have selected several experts in the field and considered the failure modes of each dimension of the intelligent manufacturing system more comprehensively, the identification results are authoritative and effective.
The second threat to internal validity comes from the risk prioritization methodology for failure modes. On the one hand, during the risk level evaluation of risk factors, there may be a situation in which experts have different evaluation criteria due to differences in personal knowledge or background. Therefore, the experts in this paper uniformly follow the Likert five-level scoring method, with varied weights assigned based on their background, to generate the scores of each risk factor for each failure mode, ensuring the reasonableness of the results. On the other hand, the results of using AHP to calculate the weights of risk factors and applying the GRA method to calculate the risk prioritization ranking may have potential threats. To minimize this threat, we have analyzed the results obtained from the improved FMEA method proposed in this paper in comparison with those obtained from the conventional FMEA method, which illustrates the rationality and superiority of the improved method. We believe that the applicability of the proposed methodology is not a problem.
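The two evaluation steps described above, weighted aggregation of expert scores and AHP weighting of the risk factors, can be sketched as follows. All numbers are hypothetical: the expert scores, the expert weights, and the pairwise comparison matrix are illustrative and are not taken from this study. The sketch assumes a weighted arithmetic mean for aggregation and the geometric-mean approximation for the AHP priority vector, which may differ from the exact procedure used in this paper.

```python
import math

# Hypothetical Likert scores (1-5) from five experts for the S, O, D
# of a single failure mode, and assumed expert weights that sum to 1.
expert_scores = [(4, 3, 2), (5, 3, 3), (4, 2, 2), (3, 3, 2), (4, 4, 3)]
expert_weights = [0.30, 0.25, 0.20, 0.15, 0.10]

# Hypothetical AHP pairwise comparison matrix for S, O, D
# (entry [i][j] states how much more important factor i is than factor j).
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]

# Saaty's random consistency index for matrices of order 3 to 5.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}


def aggregate(scores, weights):
    """Weighted arithmetic mean of the expert scores for each risk factor."""
    n_factors = len(scores[0])
    return [sum(w * s[k] for w, s in zip(weights, scores)) for k in range(n_factors)]


def ahp_weights(matrix):
    """Priority vector (geometric-mean method) and consistency ratio."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    w = [g / total for g in geo_means]
    # Estimate the principal eigenvalue from A.w to check consistency.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return w, ci / RANDOM_INDEX[n]


factor_scores = aggregate(expert_scores, expert_weights)
factor_weights, consistency_ratio = ahp_weights(pairwise)
print("aggregated S, O, D:", [round(x, 2) for x in factor_scores])
print("AHP weights:", [round(x, 3) for x in factor_weights])
print("consistency ratio:", round(consistency_ratio, 3))
```

A consistency ratio below 0.1 is the usual acceptance threshold for an AHP comparison matrix. The example matrix is perfectly consistent by construction, so its ratio is essentially zero.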
The third threat to internal validity is related to failure mode analysis. In this paper, k-means clustering is programmed through MATLAB software to categorize failure modes into three groups: low risk, medium risk, and high risk, which has been proven feasible in reliability analysis literature [36]. As a consequence, the findings of this paper are trustworthy.
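For readers who want to reproduce the clustering step outside MATLAB, the following is a minimal sketch of k-means (Lloyd's algorithm) with three clusters, written in plain Python. It is not the implementation used in this study. The feature choice (a normalized RPN and a grey relational grade per failure mode), the data points, and the deterministic initialization are all assumptions made for illustration.

```python
# Hypothetical (normalized RPN, grey relational grade) pairs, one per failure mode.
points = [
    (0.05, 0.35), (0.10, 0.40), (0.08, 0.38),
    (0.40, 0.55), (0.45, 0.60), (0.35, 0.52),
    (0.80, 0.85), (0.90, 0.90), (0.85, 0.80),
]


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k=3, max_iter=100):
    """Lloyd's algorithm with a deterministic initialization (requires k >= 2)."""
    ordered = sorted(points, key=sum)
    # Spread the initial centroids evenly over the points sorted by feature sum.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


def risk_classes(points):
    """Name the three clusters low, medium, and high by centroid magnitude."""
    labels, centroids = kmeans(points, k=3)
    order = sorted(range(3), key=lambda c: sum(centroids[c]))
    names = dict(zip(order, ("low", "medium", "high")))
    return [names[lab] for lab in labels]


print(risk_classes(points))
```

Because k-means is sensitive to initialization, a random start can yield different partitions on less clearly separated data. The fixed initialization here only serves to make the sketch reproducible.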
External validity
Threats to external validity are related to the generalizability of the results. In this study, we examine intelligent manufacturing systems, and the findings in this study are somewhat typical. However, when the research methodology in this paper is extended to other large and complex systems, the difference in system architecture may pose threats to external validity. To minimize this threat, reliability analysis of other large and complex systems needs to be based on specific system architectures for failure mode identification and evaluation.
Conclusion
Taking the transformative development of the manufacturing field as an entry point, this paper proposed an improved FMEA method integrating the knowledge graph, GRA, and machine learning for the reliability management of intelligent manufacturing systems. Firstly, the structured network of failure modes is constructed based on the knowledge graph for intelligent manufacturing systems, which has merits in evaluating and preventing failure modes. Then, the GRA is applied to determine the risk prioritization of failure modes to reflect the risk prioritization of failure modes more reasonably. Additionally, k-means clustering analysis is employed to extract the features of failure modes, and different control measures can be applied to failure modes of intelligent manufacturing systems with different risk levels, providing an important reference for the prevention and monitoring of failure modes of complex systems such as intelligent manufacturing systems. Finally, a case study was conducted to validate the effectiveness and practicability of this method. The results indicate that “poor interface compatibility between systems”, “illegally accessed and tampered with data”, “logical confusion of the internal functions of the system”, “unreasonable production planning”, “inconsistent production line standards”, “products fail to satisfy market expectations”, “unreasonable production scheduling”, “incomplete functional testing”, and “inaccurate data collection” are high-risk failure modes faced by intelligent manufacturing systems, which mainly involve system safety, production quality, and information integration, requiring more resources to prevent and manage.
The improved FMEA can overcome the limitations of the conventional FMEA approach, such as ignoring risk factor weights and unclear risk ranking of failure modes, and reflect the characteristic that the impact degree of failure modes changes dynamically with the development of intelligent manufacturing systems. Moreover, the improved FMEA assigns a higher risk priority to the failure modes that are able to radiate to other functional layers, which is more in line with reality. In addition, the most significant advantage of machine learning lies in its ability to process huge and complex databases and to predict and classify a large number of failure modes. Although the knowledge base of failure modes of intelligent manufacturing established in this paper is small, a complete structured network will be formed as the knowledge base becomes richer, which can reflect the relationships between a large number of failure modes and provide decision-making references for their management. The improved FMEA method can then bring its superiority in data processing into full play.
In summary, the proposed method is effective and robust for the reliability management of complex systems, including intelligent manufacturing systems. However, this study has some limitations that should be addressed in future research. First, the failure mode knowledge graph of intelligent manufacturing systems reflects simple failure mode relationships, which are limited to the influence between functional layers. Second, FMEA is a collective decision-making activity, and there may be conflicts in the evaluation opinions between experts. Finally, as the content of the knowledge graph of failure modes is gradually enriched and its structure is gradually improved, k-means clustering may struggle to meet the needs of accurate classification and management. In future research, first, we can further consider how to build a knowledge base that can describe more complex relationships between failure modes and establish a more scientific classification logic of failure modes in reliability analysis. Second, we can consider a combination of probabilistic interval-valued term sets, spherical fuzzy sets, and hesitant intuitionistic fuzzy sets to deal with uncertainty and ambiguity in expert evaluation, and a combination of multi-criteria decision-making methods, such as PROMETHEE and MULTIMOORA, to improve ranking. Third, we can use other clustering algorithms for risk management of failure modes, such as the global intuitionistic fuzzy weighted c-ordered means clustering algorithm and the hybrid integrated decision-making algorithm for clustering analysis based on bipolar complex fuzzy and soft sets. Finally, the model proposed in this paper can be applied to more empirical cases to verify its robustness and effectiveness.
Data availability
The datasets supporting the conclusions of this article are included within the article.
Conflicts of interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the National Natural Science Foundation of China [Grant number 72171170], Fundamental Research Funds for the Central Universities [Grant numbers 22120210535, 2023-6-YB-02], and the Shanghai Pujiang Program [Grant number 20PJ1413700]. A preprint has previously been published.
