Double-weighted neighborhood standardization method with applications to multimode-process fault detection

Abstract

As modern industrial processes often have multiple production modes, multimode-process monitoring has become an important issue. In multimode processes, the operating condition may often switch among different modes. As a result, popular process monitoring methods such as principal component analysis (PCA) and partial least squares (PLS) method should not be directly applied because they are based on a fundamental assumption that the process only has one stable operating condition. In this paper, a novel multimode-process data-standardization approach called double-weighted neighborhood standardization (DWNS) is proposed to solve the problem of multimode characteristics. This approach can transform multimode data into approximately single-mode data, which follow a Gaussian distribution. By analyzing a concrete example, this study indicates that the DWNS strategy is effective for multimode data preprocessing. Moreover, a novel fault detection method called DWNS-PCA is proposed for multimode processes. Finally, a numerical example and the penicillin fermentation process are used to test the validity and effectiveness of the DWNS-PCA. The results demonstrate that the proposed data-standardization method is suitable for multimode data, and the DWNS-PCA process monitoring method is effective for detecting faults in multimode processes.

Keywords

Multimode process; double-weighted neighborhood standardization; principal component analysis; fault detection

1 Introduction

Process monitoring is one of the most important activities in many industrial processes [1–3]. To ensure process safety and consistently high product quality, many process monitoring methods have been widely researched over the past several decades [4–9].

Among the various types of process monitoring approaches, the data-driven approach is the most popular and has recently attracted considerable attention. In particular, multivariate statistical process monitoring (MSPM) approaches are a class of data-driven methods that offer unique advantages in process monitoring. MSPM approaches analyze the statistical relationships using the features extracted from the process data, identify the characteristics of different states, and predict the future states without using any physical models [10–16]. PCA and PLS, as representative MSPM approaches, have been widely studied and applied to various industrial processes [17–21].

Nevertheless, traditional MSPM approaches assume that the process variables follow a Gaussian distribution. When the variables follow a non-Gaussian distribution, these methods show poor monitoring performance. In modern production, many complex manufacturing processes include multiple operating modes because of various factors, including different raw-material characteristics, diverse product specifications, changing market demands, different external environments, and various production-process requirements [22–24]. For example, in electro-fused magnesia furnace manufacturing processes, the materials differ across operating modes, such as powdery and massive magnesium [25]. Under multiple production modes, the global process data generally do not follow a Gaussian distribution, and the assumption that the process includes only one nominal operating mode becomes invalid. Hence, most MSPM approaches may suffer from inaccurate monitoring results. In recent years, to guarantee higher monitoring performance in multimode manufacturing processes, many multimode monitoring methods have been intensively researched.

The main existing monitoring approaches for multimode processes can be summarized as follows. One approach type is based on mixed models. Because a multimode-process dataset includes data derived from different production modes, an effective method is to construct multiple sub-models, one suited to each mode, and then synthesize the results of the local models. However, in the modeling and monitoring phases of multimode processes, classifying the historical process data and the new data into subsets corresponding to different operating modes is challenging. To address this problem, many techniques have been recently researched. For example, Zhao et al. proposed a multiple PLS model process monitoring scheme based on metrics in the form of principal angles to measure the similarities between any two models [26]. Ge et al. converted the traditional monitoring statistics into a fault probability in each operating mode and obtained the monitoring results under different operating modes using Bayesian inference [27]. More literature on mixed-model approaches is available in [28–31].

Another approach type is based on a global model. The critical issue in the global-model approach is how to fit the global model to each production mode. A fault detection approach using the k-nearest-neighbor rule (FD-kNN) was proposed by He, in which a global detection model was established [32]. Some dimension-reduction methods combined with kNN have been researched to address the large computational burden of FD-kNN. Another global model is based on data standardization. To address the problem of multimode characteristics, a data preprocessing method called local neighborhood standardization (LNS) was proposed by Ma, in which multimode data can be normalized into single-mode data. This method was applied to the monitoring of the Tennessee Eastman process [33].

Given these circumstances, the current study aims to develop a set of data-standardization and multimode-process monitoring methods based on the following motivations. First, although the most common data-standardization method, namely, the z-score method, can rescale the original data to zero mean and unit variance [34], it does not change the distribution characteristics of the data; if the original data include multiple modes, the data after standardization will still have multimode states. The main idea is shown in Fig. 1. Hence, most traditional MSPM methods cannot be directly employed because they are based on the assumption that the process has one nominal operating mode. Second, when the neighbors of the fault data belong to multiple normal modes, the neighborhood standard deviation of the fault data might be inflated. Hence, the fault data may be misclassified as normal data after standardization using LNS. A novel data-standardization strategy is therefore needed for monitoring multimode manufacturing processes.

Fig. 1

Frameworks of the z-score and DWNS methods for multimode data.

In this paper, a novel multimode data-standardization method named DWNS is proposed to fit the multimode characteristics. Moreover, a novel fault detection approach called DWNS-PCA is proposed to address the problems that the traditional PCA suffers because of its Gaussian-distribution assumption and to improve the fault detection performance. In terms of the data-standardization effect and fault detection performance, the main advantages of this method are the following. (i) In the data-standardization stage, the multimode-process data can be converted into single-mode data, which approximately obey a Gaussian distribution, using the DWNS method; thus, the assumption underlying most MSPM methods is satisfied (Fig. 1). (ii) When faulty data are located between multiple normal-mode datasets, the proposed method can overcome the problem of misclassification of the faulty data as normal data. (iii) Because of the weights between the test and training data, the fault detection performance can be improved without relying on the number of nearest neighbors. (iv) For multimode-process monitoring, this method only requires a single PCA monitoring model, instead of many sub-models, and does not depend on prior knowledge of the process.

The main contributions of this paper are summarized as follows. (1) A novel data-standardization approach, DWNS, is proposed. (2) A novel multimode-process monitoring method based on DWNS-PCA is developed. (3) The monitoring procedures of the DWNS-PCA method are summarized. (4) Comparative studies of the proposed methods are presented based on a numerical example and the penicillin fermentation process (PFP).

The remainder of this paper is organized as follows. Section 2 provides a detailed discussion and analysis of the multimode data characteristics. Section 3 introduces the proposed DWNS data-standardization method, and Section 4 describes the multimode-process monitoring approach based on DWNS-PCA. Section 5 discusses the numerical example and PFP simulation results. The final section concludes this paper.

2 Multimode features of data

Many complex manufacturing processes include multiple stable working points. When the production process operates under different production modes, its statistical features, such as the mean, the variance, and the relationships among the variables, differ from mode to mode; that is, the data have multimode features. Many factors can cause these multiple modes, such as changes in the raw-material characteristics, external environment, and process load conditions as well as equipment wear [35].

For ease in understanding, a numerical example is introduced to provide an intuitive representation and a detailed analysis. In the training data, 200 normal samples formulated according to Eq. (1) are available, including 100 samples from Mode 1 and 100 samples from Mode 2. Furthermore, 100 fault data, which are located between the two normal-mode data, are generated according to Eq. (2). $\begin{matrix} \text{Mode 1}: \left\{\begin{matrix} x_{1} \sim N (-11, 0.5^{2}) \\ x_{2} \sim N (55, 0.4^{2}) \\ x_{3} \sim N (80, 0.3^{2}) \end{matrix}\right. \\ \text{Mode 2}: \left\{\begin{matrix} x_{1} \sim N (5, 0.5^{2}) \\ x_{2} \sim N (12, 1^{2}) \\ x_{3} \sim N (20, 1^{2}) \end{matrix}\right. \end{matrix}$ (1) $\text{Fault data}: \left\{\begin{matrix} x_{1} \sim N (-0.5, 0.1^{2}) \\ x_{2} \sim N (20, 0.1^{2}) \\ x_{3} \sim N (35, 0.1^{2}) \end{matrix}\right.$ (2)
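
The data of Eqs. (1) and (2) can be generated with a few lines of code. The following is a minimal sketch using only the Python standard library; the random seed, the helper names `draw` and `column_mean`, and the treatment of the three variables as independent Gaussians are assumptions made for illustration.

```python
import random

random.seed(0)  # fixed seed for reproducibility (an assumption, not from the paper)

# (mean, standard deviation) of x1, x2, x3, taken from Eqs. (1) and (2)
MODE_1 = [(-11.0, 0.5), (55.0, 0.4), (80.0, 0.3)]
MODE_2 = [(5.0, 0.5), (12.0, 1.0), (20.0, 1.0)]
FAULT = [(-0.5, 0.1), (20.0, 0.1), (35.0, 0.1)]

def draw(params, n):
    """Draw n samples; each variable is treated as an independent Gaussian."""
    return [[random.gauss(mu, sigma) for mu, sigma in params] for _ in range(n)]

def column_mean(rows, v):
    """Mean of variable v over a list of samples."""
    return sum(r[v] for r in rows) / len(rows)

train = draw(MODE_1, 100) + draw(MODE_2, 100)  # 200 normal training samples
fault = draw(FAULT, 100)                       # 100 fault samples

# The fault mean of every variable lies between the two mode means.
for v in range(3):
    print(v + 1, round(column_mean(train[:100], v), 2),
          round(column_mean(fault, v), 2),
          round(column_mean(train[100:], v), 2))
```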

The distribution of the primary training data is shown in Fig. 2. The values of $x_{1}$, $x_{2}$, and $x_{3}$ have two stable operating states, which represent two modes. Figure 3 shows the probability densities of $x_{1}$, $x_{2}$, and $x_{3}$; none of them follows a Gaussian distribution. The three-dimensional scatter diagram of the primary training data and the 100 fault data is shown in Fig. 4. The 100 fault data are located between the two normal modes. Figures 2–4 show that the primary data are distributed in two different modes.

Fig. 2

Primary data.

Fig. 3

Probability density of the primary data.

Fig. 4

Three-dimensional scatter diagram of the primary data.

Data preprocessing is necessary in the MSPM methods. The primary data should be standardized to eliminate the influence of the different units and scales of the variables. The z-score method is a widely used data-standardization method. It standardizes each variable to zero mean and unit variance, where the mean and variance are derived from the global process data. However, in multimode-process data, the mean and covariance of the normal process data change whenever the production condition changes. Because the global mean and variance are used, the standardized dataset may still appear to follow different distributions.
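
A short sketch can make this limitation concrete. It applies the z-score to a single two-mode variable (the $x_{1}$ distributions of Eq. (1)); the seed and variable names are illustrative assumptions. After standardization the two modes should remain separated around roughly −1 and +1, with essentially no samples near zero, where a single Gaussian would peak.

```python
import random
import statistics

random.seed(1)  # illustrative seed

# One variable with two operating modes (x1 of the numerical example)
x = ([random.gauss(-11.0, 0.5) for _ in range(100)]
     + [random.gauss(5.0, 0.5) for _ in range(100)])

mu = statistics.mean(x)      # global mean, lies between the two modes
sigma = statistics.stdev(x)  # global standard deviation, dominated by the mode gap
z = [(v - mu) / sigma for v in x]

z_mode1, z_mode2 = z[:100], z[100:]
near_zero = sum(1 for v in z if abs(v) < 0.5)

print(round(statistics.mean(z_mode1), 2), round(statistics.mean(z_mode2), 2))
print(near_zero)  # number of standardized samples close to zero
```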

The data-standardization result of the z-score method is shown in Fig. 5. Clearly, the data after the z-score standardization still appear to have two different distributions. The probability density and three-dimensional scatter plots of the z-score standardization are shown in Figs. 6 and 7, respectively. We can observe that the data after the z-score standardization still belong to two different modes, and the fault data cannot be distinguished from the normal data. In this case, using the PCA monitoring method will yield serious mistakes because the data distributions are not Gaussian distributions. Therefore, in the multimode process, the z-score method cannot provide an effective data-standardization result before the PCA monitoring method.

Fig. 5

Data using z-score standardization.

Fig. 6

Probability density of the data using z-score standardization.

Fig. 7

Three-dimensional scatter diagram of the data using z-score standardization.

3 Double-weighted neighborhood standardization approach

Many complex manufacturing processes often include multiple operating modes because of the different production-process requirements. To address the problem of the z-score standardization in a multimode production process, the LNS method was proposed by Ma, which solved the problem of the multimode characteristics in the process data. The specific LNS details can be found in Reference [33]. However, when the neighbors of the fault data belong to multiple modes, the neighborhood standard deviation of the fault data might be inflated. In this case, the fault samples after LNS will be close to zero, which means that they will be classified as normal data. Figures 8–10 show the LNS results when the number of nearest neighbors is 120. The results demonstrate that when the number of nearest neighbors exceeds the number of samples in any sub-mode training dataset, the LNS method yields unsatisfactory results. In the extreme case where the number of nearest neighbors approaches the size of the whole training dataset, LNS becomes similar to the z-score standardization.
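
The inflation effect can be reproduced in one dimension. The sketch below assumes that LNS standardizes a sample with the mean and standard deviation of its own k nearest training samples, which is how the method is described here; the exact definition is given in [33]. The function name `lns`, the seed, and the chosen values of k are illustrative.

```python
import random
import statistics

random.seed(2)  # illustrative seed

# x1 of the numerical example: 100 training samples per mode
train = ([random.gauss(-11.0, 0.5) for _ in range(100)]
         + [random.gauss(5.0, 0.5) for _ in range(100)])
fault = -0.5  # mean of x1 for the fault data in Eq. (2)

def lns(x, train, k):
    """Standardize x with the mean/std of its k nearest training samples (assumed LNS form)."""
    hood = sorted(train, key=lambda t: abs(t - x))[:k]
    return (x - statistics.mean(hood)) / statistics.stdev(hood)

for k in (20, 120, 200):
    print(k, round(lns(fault, train, k), 2))
```

With k = 20 the neighborhood stays inside one mode and the fault should receive a large standardized value. With k = 120 the neighborhood spans both modes, the standard deviation is inflated, and the fault is pulled toward zero. With k = 200 the result coincides with the global z-score.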

Fig. 8

Data using LNS.

Fig. 9

Probability density of the data using LNS.

Fig. 10

Three-dimensional scatter diagram of the data using LNS.

Under this condition, a novel multimode data preprocessing approach called DWNS is proposed in this paper. This strategy standardizes each sample based on the weights between the test and training data. The training data with larger weights play a more significant role in the data standardization. DWNS is also a univariate analysis method; therefore, it does not change the structures and relationships between the different variables. The main DWNS steps are described in the following. First, the distances between sample $x_{i}$ and all samples in training dataset $X \in R^{n \times m}$ are calculated, where n is the number of samples and m is the number of variables. The training samples are then sorted in ascending order of their distance to $x_{i}$. $N (x_{i}) = \{x_{i}^{1}, \dots, x_{i}^{j}, \dots, x_{i}^{n}\}$ (3) where $N (x_{i})$ represents the nearest neighborhood dataset of $x_{i}$ and $x_{i}^{j}$ is the jth nearest neighbor of sample $x_{i}$. Then, the k nearest neighbors of each sample in Equation (3) are searched. $N (x_{i}^{j})$ represents a dataset containing $x_{i}^{j}$ and its k nearest neighbor samples. $N (x_{i}^{j}) = \{x_{i}^{j}, x_{i}^{j,1}, \dots, x_{i}^{j,k}\}$ (4)

Parameter k is the number of nearest neighbor samples. Finally, the mean value and standard deviation of dataset $N (x_{i}^{j})$ are calculated. In conclusion, for sample $x_{i}$, the data-standardization procedure of the DWNS approach is expressed as: $x_{i}^{*} = \sum_{j = 1}^{n} w_{j} \frac{x_{i} - mean (N (x_{i}^{j}))}{std (N (x_{i}^{j}))}$ (5) where $x_{i}^{*}$ is the standardized sample, $mean (N (x_{i}^{j}))$ and $std (N (x_{i}^{j}))$ are the mean value and standard deviation of dataset $N (x_{i}^{j})$, respectively, and $w_{j}$ is the weight parameter based on the distance between sample $x_{i}$ and its jth nearest neighbor sample $x_{i}^{j}$. The weight function monotonically decreases with the distance and can be expressed as $w_{j} = \frac{\exp (- d (x_{i}^{j}, x_{i})^{2} / \theta^{2})}{\sum_{l = 1}^{n} \exp (- d (x_{i}^{l}, x_{i})^{2} / \theta^{2})}, \quad j = 1, 2, \dots, n$ (6) where constant $\theta$ is the tuning parameter and the normalization ensures that $\sum_{j = 1}^{n} w_{j} = 1$. $d (x_{i}^{j}, x_{i})$ is the Euclidean distance between $x_{i}^{j}$ and $x_{i}$ [36]. $d (x_{i}^{j}, x_{i}) = \sqrt{(x_{i}^{j} - x_{i})^{T} (x_{i}^{j} - x_{i})}, \quad j = 1, 2, \dots, n$ (7)
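
Equations (3)–(7) can be sketched in a few functions. This is a minimal illustration and not the authors' implementation: the function names, the use of the sample standard deviation, the choices k = 10 and θ = 1, the seed, and the single test point placed at the fault means of Eq. (2) are all assumptions. Because Eq. (5) sums over all n training samples, the explicit sorting of Eq. (3) is not needed to evaluate it.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance, i.e. d(.,.)^2 with d from Eq. (7)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def neighborhood_stats(train, k):
    """Per-variable mean and std of N(x^j): each training sample plus its k nearest neighbors."""
    stats = []
    for xj in train:
        hood = sorted(train, key=lambda t: sq_dist(t, xj))[:k + 1]  # xj itself has distance 0
        cols = list(zip(*hood))
        means = [sum(c) / len(c) for c in cols]
        stds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
                for c, m in zip(cols, means)]
        stats.append((means, stds))
    return stats

def dwns(x, train, stats, theta):
    """Standardize one sample with Eqs. (5) and (6)."""
    d2 = [sq_dist(x, xj) for xj in train]
    d2_min = min(d2)
    # Subtracting d2_min leaves the normalized weights of Eq. (6) unchanged
    # but avoids underflow when every distance is large relative to theta.
    raw = [math.exp(-(d - d2_min) / theta ** 2) for d in d2]
    total = sum(raw)
    out = [0.0] * len(x)
    for r, (means, stds) in zip(raw, stats):
        w = r / total
        for v in range(len(x)):
            out[v] += w * (x[v] - means[v]) / stds[v]
    return out

def draw(params, n):
    """Draw n samples with independent Gaussian variables (mean, std)."""
    return [[random.gauss(m, s) for m, s in params] for _ in range(n)]

random.seed(3)  # illustrative seed
train = (draw([(-11.0, 0.5), (55.0, 0.4), (80.0, 0.3)], 100)
         + draw([(5.0, 0.5), (12.0, 1.0), (20.0, 1.0)], 100))
stats = neighborhood_stats(train, k=10)           # k is an illustrative choice
z_train = [dwns(x, train, stats, theta=1.0) for x in train]
z_fault = dwns([-0.5, 20.0, 35.0], train, stats, theta=1.0)
print([round(v, 1) for v in z_fault])
```

Under these assumptions the fault sample should receive a much larger standardized magnitude than a typical training sample, because every neighborhood used for its standardization lies inside a single mode.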

According to the earlier discussion, the neighborhood set is used twice in this method. On the one hand, in the data-standardization phase, the different training data play different roles according to their weights, which are based on the distances between the training and test data. The training data in a nearer neighborhood, i.e., with larger weights, play more significant roles, and those with smaller weights might not have any effect. In this manner, the test data are standardized based on the weight of each training sample rather than depending on the number of nearest neighbors, whose selection is a complicated problem in LNS. On the other hand, when the fault data lie between multiple modes, the jth nearest neighbor of a fault sample is a normal sample from one mode, and the samples in the neighborhood of that neighbor belong to the same mode. Thus, the proposed method avoids the limitation of LNS in which the neighborhood set spans multiple modes. With the proposed method, the data after standardization follow an approximately Gaussian distribution. The results using DWNS are shown in Figs. 11–13.
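
The role of the tuning parameter in the weight function of Eq. (6) can be illustrated with a toy calculation. The distances below are hypothetical (three training samples in the same mode as the test sample and two in a distant mode), as are the θ values; the function name `weights` is also an assumption.

```python
import math

def weights(distances, theta):
    """Normalized weights of Eq. (6) for a list of distances."""
    raw = [math.exp(-(d ** 2) / theta ** 2) for d in distances]
    total = sum(raw)
    return [r / total for r in raw]

# Hypothetical distances: three close samples (same mode), two far samples (other mode)
distances = [0.2, 0.5, 0.9, 15.0, 16.0]

for theta in (0.5, 1.0, 5.0, 50.0):
    print(theta, [round(w, 3) for w in weights(distances, theta)])
```

For small θ the far samples receive practically zero weight, so they have no effect on the standardization. For very large θ the weights become nearly uniform, and all training samples contribute almost equally.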

Fig. 11

Data using DWNS.

Fig. 12

Probability density of the data using DWNS.

Fig. 13

Three-dimensional scatter diagram of the data using DWNS.

4 Multimode-process fault detection based on DWNS-PCA

In this section, a process monitoring method based on DWNS is introduced to demonstrate the validity of the DWNS method. Because the multimode-process data after DWNS follow an approximately Gaussian distribution, PCA, as the most common monitoring method, is adopted in this study. Note that DWNS is suitable not only for PCA but also for other monitoring methods such as PLS, which are not presented in this paper. The monitoring procedures of the DWNS-PCA are shown in Fig. 14. The confidence limits of the T² and squared prediction error (SPE) statistics are calculated using kernel density estimation [37]. The procedures for off-line modeling and on-line monitoring are summarized as follows.

Fig. 14

Monitoring procedure of the DWNS-PCA.

The off-line modeling can be performed in the following steps:

1. The normal process data from different modes are used as the training dataset.

2. For each training sample $x_{i}$, the training samples are sorted in ascending order of their distance to $x_{i}$ to form its nearest neighborhood set $N (x_{i})$.

3. The k nearest neighborhood dataset of each sample in Equation (3) is searched, and the mean value and standard deviation of $N (x_{i}^{j})$ are calculated.

4. Weight parameter $w_{j}$ is calculated using Eq. (6), and $x_{i}$ is standardized using Eq. (5).

5. PCA is applied to the standardized training data to extract the principal components and reduce the dimensions.

6. The control limits of T² and SPE are calculated.
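
The last off-line step relies on kernel density estimation [37]. One possible way to obtain such a limit is sketched below: the limit is taken as the value at which the cumulative distribution of a Gaussian kernel density estimate reaches the confidence level. The Gaussian kernel, the normal-reference bandwidth rule, the 99% level, the function names, and the synthetic stand-in for the training SPE values are all assumptions, not details given in this paper.

```python
import math
import random
import statistics

def kde_cdf(x, samples, h):
    """CDF of a Gaussian kernel density estimate evaluated at x."""
    return sum(0.5 * (1.0 + math.erf((x - s) / (h * math.sqrt(2.0))))
               for s in samples) / len(samples)

def kde_control_limit(samples, alpha=0.99, h=None):
    """Value at which the KDE-based CDF reaches the confidence level alpha."""
    if h is None:
        # Normal-reference rule of thumb; the bandwidth choice is an assumption
        h = 1.06 * statistics.stdev(samples) * len(samples) ** (-0.2)
    lo, hi = min(samples) - 10.0 * h, max(samples) + 10.0 * h
    for _ in range(60):  # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, samples, h) < alpha:
            lo = mid
        else:
            hi = mid
    return hi

random.seed(4)  # illustrative seed
# Synthetic stand-in for the SPE values of 200 standardized training samples
spe_train = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(3)) for _ in range(200)]
spe_limit = kde_control_limit(spe_train, alpha=0.99)
print(round(spe_limit, 2))
```

The same function can be applied to the training T² values to obtain the second control limit.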

For a new sample, the monitoring phase mainly includes the following steps.

1. Given new sample $x_{new}$, the training samples are sorted in ascending order of their distance to $x_{new}$ to form its nearest neighborhood set.

2. The new sample is standardized using Steps 3 and 4 in the modeling phase.

3. The new standardized sample is projected into the principal and residual spaces.

4. Statistics T² and SPE of $x_{new}$ are calculated.

5. Statistics T² and SPE of $x_{new}$ are compared with their respective control limits.

6. If either the T² or the SPE statistic exceeds its control limit, a fault is declared in the multimode process.
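
The projection, statistics, and comparison in the on-line steps follow the standard PCA monitoring formulas, which can be sketched as follows. The sketch assumes that a PCA model is already available as a list of orthonormal loading vectors and the corresponding score variances; the function names, the toy two-variable model, and the control limits are hypothetical.

```python
def t2_and_spe(x, loadings, eigenvalues):
    """Hotelling's T^2 and SPE of one standardized sample.

    loadings: retained principal directions (orthonormal vectors),
    eigenvalues: variances of the corresponding scores.
    """
    scores = [sum(p_v * x_v for p_v, x_v in zip(p, x)) for p in loadings]
    t2 = sum(t ** 2 / lam for t, lam in zip(scores, eigenvalues))
    # Reconstruct x from the principal subspace, then take the squared residual norm
    x_hat = [sum(t * p[v] for t, p in zip(scores, loadings)) for v in range(len(x))]
    spe = sum((x_v - xh_v) ** 2 for x_v, xh_v in zip(x, x_hat))
    return t2, spe

def is_faulty(x, loadings, eigenvalues, t2_limit, spe_limit):
    """A fault is declared when either statistic exceeds its control limit."""
    t2, spe = t2_and_spe(x, loadings, eigenvalues)
    return t2 > t2_limit or spe > spe_limit

# Toy model: two variables, one retained component along the diagonal
loadings = [[2 ** -0.5, 2 ** -0.5]]
eigenvalues = [2.0]

print(t2_and_spe([1.0, 1.0], loadings, eigenvalues))   # lies in the principal subspace
print(t2_and_spe([1.0, -1.0], loadings, eigenvalues))  # lies in the residual subspace
print(is_faulty([1.0, -1.0], loadings, eigenvalues, t2_limit=9.0, spe_limit=1.5))
```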

The previous numerical example, which includes normal (Modes 1 and 2) and faulty data, is tested. The detection results of the z-score-PCA, LNS-PCA, and DWNS-PCA are shown in Figs. 15–17, respectively.

Fig. 15

Monitoring results using z-score-PCA.

Fig. 16

Monitoring results using LNS-PCA.

Fig. 17

Monitoring results using DWNS-PCA.

5 Simulations

In this section, two cases are introduced to demonstrate the monitoring performance of the DWNS-PCA method. First, a numerical example is presented to prove the effectiveness of the proposed method for monitoring a multimode process. Second, the performance of the proposed method is verified using PFP.

5.1 Numerical example

In this part, a modified numerical example is introduced, which was originally suggested by Ge and Song [27]. In this numerical example, five variables are generated by two sources, namely, $s_{1}$ and $s_{2}$, according to Equation (8) as follows: $\left\{\begin{matrix} x_{1} = 0.5768 s_{1} + 0.3766 s_{2} + e_{1} \\ x_{2} = 0.7382 s_{1}^{2} + 0.0566 s_{2} + e_{2} \\ x_{3} = 0.8291 s_{1} + 0.4009 s_{2}^{2} + e_{3} \\ x_{4} = 0.6519 s_{1} s_{2} + 0.2070 s_{2} + e_{4} \\ x_{5} = 0.3972 s_{1} + 0.8045 s_{2} + e_{5} \end{matrix}\right.$ (8) where $e_{1}$, $e_{2}$, $e_{3}$, $e_{4}$, and $e_{5}$ are noisy data whose mean values and standard deviations are zero and 0.01, respectively. The changes in two sources $s_{1}$ and $s_{2}$ are used to reflect the shifts in the different operating modes. In this part, two different operating modes are introduced according to Equation (9). $\begin{matrix} \text{Mode 1}: \left\{\begin{matrix} s_{1} \sim Uniform (-10, -7) \\ s_{2} \sim N (-12, 1) \end{matrix}\right. \\ \text{Mode 2}: \left\{\begin{matrix} s_{1} \sim Uniform (2, 5) \\ s_{2} \sim N (7, 1) \end{matrix}\right. \end{matrix}$ (9)

The training dataset is constructed using 200 normal samples, including 100 samples from Mode 1 and 100 samples from Mode 2. Furthermore, to verify the effectiveness of the DWNS data-preprocessing and DWNS-PCA-based process monitoring methods, two test datasets, each containing 200 samples, are generated. Test Dataset 1 includes 100 normal and 100 fault samples from Mode 1. A step fault of $x_{5}$ (Fault 1) is introduced from the 101st to the 200th samples. Test Dataset 2 contains 100 normal and 100 fault samples from Mode 2. A slope fault of $x_{1}$ (Fault 2) is introduced from the 101st to the 200th samples.
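
The training and test datasets described above can be generated as in the following sketch. The fault magnitudes are not stated in the text, so `STEP` and `SLOPE` are hypothetical values chosen only to make the example run; the seed, the function names, and the use of Gaussian noise for $e_{1}$ to $e_{5}$ are likewise assumptions.

```python
import random

random.seed(5)  # illustrative seed

def sample_mode(mode):
    """Draw the two sources (s1, s2) for the given operating mode, Eq. (9)."""
    if mode == 1:
        return random.uniform(-10.0, -7.0), random.gauss(-12.0, 1.0)
    return random.uniform(2.0, 5.0), random.gauss(7.0, 1.0)

def observe(s1, s2):
    """Map the sources to the five measured variables, Eq. (8)."""
    e = [random.gauss(0.0, 0.01) for _ in range(5)]  # noise assumed Gaussian
    return [
        0.5768 * s1 + 0.3766 * s2 + e[0],
        0.7382 * s1 ** 2 + 0.0566 * s2 + e[1],
        0.8291 * s1 + 0.4009 * s2 ** 2 + e[2],
        0.6519 * s1 * s2 + 0.2070 * s2 + e[3],
        0.3972 * s1 + 0.8045 * s2 + e[4],
    ]

def make_dataset(mode, n):
    return [observe(*sample_mode(mode)) for _ in range(n)]

train = make_dataset(1, 100) + make_dataset(2, 100)

STEP = 2.0    # hypothetical step size; the magnitude is not given in the text
SLOPE = 0.02  # hypothetical ramp slope per sample; also not given in the text

test1 = make_dataset(1, 200)
for i in range(100, 200):            # samples 101 to 200
    test1[i][4] += STEP              # Fault 1: step fault on x5

test2 = make_dataset(2, 200)
for i in range(100, 200):            # samples 101 to 200
    test2[i][0] += SLOPE * (i - 99)  # Fault 2: slope fault on x1

print(len(train), len(test1), len(test2))
```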

In the process monitoring phase, the z-score-PCA and LNS-PCA monitoring methods are introduced for comparison. The monitoring results of T² and SPE are shown in Figs. 18–23. For Fault 1, Figs. 18 and 19 show that the z-score-PCA and LNS-PCA methods have difficulty detecting the step fault. In particular, the T² monitoring results show that these methods are not suitable for detecting faults in multimode processes. In contrast, the DWNS-PCA method clearly detects the fault with higher detection and accuracy rates, as shown in Fig. 20. For Fault 2, which is shown in Figs. 21 and 22, the z-score-PCA and LNS-PCA methods cannot accurately detect the fault using either T² or SPE. However, Fig. 23 shows that the fault detection results using the DWNS-PCA are more effective. The fault detection performance indexes of the three methods are listed in Table 1. In conclusion, the DWNS-PCA demonstrates better monitoring results than the z-score-PCA and LNS-PCA in multimode-process monitoring.

Fig. 18

Monitoring results of Fault 1 using z-score-PCA.

Fig. 19

Monitoring results of Fault 1 using LNS-PCA.

Fig. 20

Monitoring results of Fault 1 using DWNS-PCA.

Fig. 21

Monitoring results of Fault 2 using z-score-PCA.

Fig. 22

Monitoring results of Fault 2 using LNS-PCA.

Fig. 23

Monitoring results of Fault 2 using DWNS-PCA.

Table 1

Fault detection performance of the three methods in the numerical example

No.	Performance index	z-score-PCA T²	z-score-PCA SPE	LNS-PCA T²	LNS-PCA SPE	DWNS-PCA T²	DWNS-PCA SPE
Fault 1	Detection rate (DR) [%]	4.0	63.0	8.0	53.0	100.0	100.0
Fault 1	Accuracy rate (AR) [%]	42.5	75.0	44.5	71.0	95.0	100.0
Fault 2	Detection rate (DR) [%]	27.0	10.0	3.0	29.0	83.0	69.0
Fault 2	Accuracy rate (AR) [%]	63.0	51.5	50.5	60.0	87.0	74.0

5.2 Case study of PFP

Penicillin is produced by microorganism fermentation. PFP continues to attract the interest of researchers because penicillin offers significant commercial and therapeutic benefits and has influenced the engineering field. As a very complicated biochemical process, PFP has been widely researched. A typical PFP includes two major operational phases: the bacteria growth phase and the penicillin secretory phase. The PFP flow is shown in Fig. 24. In penicillin cultivation, most of the cell mass is generated and penicillin begins to be produced in the bacteria growth phase; penicillin production then continues in the secretory phase. Because the data generated under different initial and operating conditions have different mode characteristics, the performance of the DWNS-PCA method is evaluated using PFP in this study [38–40].

Fig. 24

PFP diagram.

In this work, the data used in the simulation are generated using Pensim V2.0. The training and testing data from different modes are obtained by setting different initial conditions, set points, temperature controller settings, and pH controller types. PFP is run under two different modes, Modes 1 and 2, which are described in Table 2. In this simulation, aeration rate, agitator power, substrate feed rate, penicillin concentration, culture volume, and acid flow rate are selected as the modeling variables. The training dataset contains 200 normal samples (100 from Mode 1 and 100 from Mode 2). In addition, 200 testing samples from Mode 1 and 200 testing samples from Mode 2 are generated. In PFP, a fault can be introduced in the aeration rate, agitator power, and substrate feed rate. In this simulation, to verify the performance of the proposed method, Fault 1 is introduced as a step increase in the agitator power in Mode 1 from the 61st hour to the end, with the fault magnitude set to 2%. Fault 2 is implemented as a ramp increase in the agitator power in Mode 2 from the 61st hour to the end, with the fault magnitude set to 0.9. The faults are described in Table 3.

Table 2

Description of the two modes in the PFP

No.	Initial conditions	Set points	Temperature controller settings	Controller type for pH
Mode 1	Default	Default	Default	PID
Mode 2	Default	Default	Default	On–off

Table 3

Description of the two faults in the PFP

No.	Fault variable	Fault type	Magnitude	Occurrence moment
Fault 1	Agitator power	Step	2%	61 h
Fault 2	Agitator power	Ramp	0.9	61 h

In the modeling phase, the normal data from Modes 1 and 2 are used for modeling. The models are constructed using the z-score-PCA, LNS-PCA, and DWNS-PCA methods after the modeling data are standardized by the z-score, LNS, and DWNS methods, respectively.

In the monitoring phase, the monitoring results of the three methods for Fault 1 are shown in Figs. 25–27. Figures 25 and 26 show that many faulty samples go undetected in the SPE statistics of the z-score-PCA and LNS-PCA, respectively. For the z-score-PCA method, the detection rates are 95.7% and 16.4% for T² and SPE, respectively. For LNS-PCA, the detection rates are 88.6% and 38.6%, respectively. For the DWNS-PCA method, as shown in Fig. 27, the two-mode process data are converted into single-mode data by DWNS, which makes the PCA method effective. The fault is clearly detected, and the detection rates are both 100%. The DWNS-PCA demonstrates better monitoring performance than the z-score-PCA and LNS-PCA in terms of both the T² and SPE statistics.

Fig. 25

Monitoring results of Fault 1 using z-score-PCA in PFP.

Fig. 26

Monitoring results of Fault 1 using LNS-PCA in PFP.

Fig. 27

Monitoring results of Fault 1 using DWNS-PCA in PFP.

For Fault 2, Figs. 28–30 show the monitoring results of the three methods. For the z-score-PCA method, as shown in Fig. 28, the T² and SPE statistics miss many faulty samples. The detection rates are 72.1% and 48.6% for the T² and SPE statistics, respectively. This result is due to the z-score standardization: the data remain in multimode states, which makes the PCA method ineffective. For the LNS-PCA, as shown in Fig. 29, the detection rates are 60.7% and 55% for the T² and SPE statistics, respectively, because in LNS, the faulty data located between two normal-mode datasets tend to be misclassified as normal data. Thus, the detection result is degraded. The DWNS-PCA strategy provides better monitoring results, as shown in Fig. 30. The detection rates are 77.9% and 80% for the T² and SPE statistics, respectively, because in the DWNS method, the two-mode process data are converted into single-mode data, which obey an approximately Gaussian distribution and satisfy the assumption of the PCA method. The fault detection performance indexes of the three methods for these two faults are listed in Table 4. In conclusion, for the two faults, DWNS-PCA demonstrates better monitoring performance in PFP. Thus, DWNS-PCA is more suitable for multimode processes than the z-score-PCA and LNS-PCA methods.

Fig. 28

Monitoring results of Fault 2 using z-score-PCA in PFP.

Fig. 29

Monitoring results of Fault 2 using LNS-PCA in PFP.

Fig. 30

Monitoring results of Fault 2 using DWNS-PCA in PFP.

Table 4

Fault detection performance of the three methods in PFP

No.	Performance index	z-score-PCA T²	z-score-PCA SPE	LNS-PCA T²	LNS-PCA SPE	DWNS-PCA T²	DWNS-PCA SPE
Fault 1	Detection rate (DR) [%]	95.7	16.4	88.6	38.6	100.0	100.0
Fault 1	Accuracy rate (AR) [%]	97.0	41.5	92.0	57.0	100.0	100.0
Fault 2	Detection rate (DR) [%]	72.1	48.6	60.7	55.0	77.9	80.0
Fault 2	Accuracy rate (AR) [%]	80.5	64.0	72.5	68.5	84.5	86.0
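
The two performance indexes are not defined explicitly in the text. The sketch below assumes the usual definitions: the detection rate is the share of faulty samples that raise an alarm, and the accuracy rate is the share of all test samples that are classified correctly. It also assumes one sample per hour, so that a fault starting at the 61st hour affects the last 140 of the 200 test samples. The alarm sequence is hypothetical; it was chosen so that, under these assumptions, the result matches the T² entries of z-score-PCA for Fault 1 in Table 4.

```python
def detection_and_accuracy(alarms, is_fault):
    """alarms, is_fault: equal-length lists of booleans, one entry per test sample."""
    detected = sum(1 for a, f in zip(alarms, is_fault) if a and f)
    correct = sum(1 for a, f in zip(alarms, is_fault) if a == f)
    dr = 100.0 * detected / sum(is_fault)  # share of faulty samples that are flagged
    ar = 100.0 * correct / len(is_fault)   # share of all samples classified correctly
    return dr, ar

# 200 test samples, fault present from the 61st sample onward (140 faulty samples)
is_fault = [i >= 60 for i in range(200)]
# Hypothetical alarm sequence: 134 of the 140 faulty samples flagged, no false alarms
alarms = [i >= 66 for i in range(200)]

dr, ar = detection_and_accuracy(alarms, is_fault)
print(round(dr, 1), round(ar, 1))  # 95.7 97.0
```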

6 Conclusions

In this paper, a novel data-standardization method called DWNS is proposed. In this method, the statistical characteristics of the mean and variance of local data structures can be used to convert multimode data into single-mode data, which follow an approximately Gaussian distribution. Thus, it can address the problem of multimode-process data that most MSPM methods fail to overcome. Then, a novel DWNS-PCA fault detection approach is proposed for multimode processes. A numerical example and the PFP are introduced to test the effectiveness of DWNS-PCA. The monitoring results indicate that the DWNS-PCA method can significantly improve the performance in monitoring the multimode processes, especially when faulty data are located between multiple modes. Owing to the weights between the test and training data, the detection performance can be improved without relying on the number of nearest neighbors. Finally, the DWNS-PCA approach yields better fault detection performance than the z-score-PCA and LNS-PCA methods.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 61733003.

References

1. Jiang and Yin, Recent Advances in Key-Performance-Indicator Oriented Prognosis and Diagnosis With a MATLAB Toolbox: DB-KIT, IEEE Transactions on Industrial Informatics 5(5) (2019), 2849–2858.

2. Qin S.J., Statistical process monitoring: basics and beyond, Journal of Chemometrics: A Journal of the Chemometrics Society 17(8-9) (2003), 480–502.

3. […] and Qin S.J., Multimode process monitoring with Bayesian inference-based finite Gaussian mixture models, AIChE Journal 54(7) (2008), 1811–1829.

4. Yin, Rodriguez-Andina J.J. and Jiang, Real-time monitoring and control of industrial cyberphysical systems: with integrated plant-wide monitoring and control framework, IEEE Industrial Electronics Magazine 13(4) (2019), 38–47.

5. Jiang, Yin and Okyay, Data-Driven Monitoring and Safety Control of Industrial Cyber-Physical Systems: Basics and Beyond, IEEE Access (2018), 1–1.

6. Yoo C.K., Villez, Lee I.B. and Vanrolleghem P.A., Multivariate nonlinear statistical process control of a sequencing batch reactor, Journal of Chemical Engineering of Japan 39(1) (2006), 43–51.

7. […] and Yang, Statistical process monitoring based on modified nonnegative matrix factorization, Journal of Intelligent & Fuzzy Systems 28(3) (2015), 1359–1370.

8. Yin, Li, Gao and Kaynak, Data-based techniques focused on modern industry: An overview, IEEE Transactions on Industrial Electronics 62(1) (2014), 657–667.

9. Chen and Zhang, On-line multivariate statistical monitoring of batch processes using Gaussian mixture model, Computers & Chemical Engineering 34(4) (2010), 500–507.

10. Zhao, Zheng, Xu and Deng, Fault diagnosis method based on principal component analysis and broad learning system, IEEE Access 7 (2019), 99263–99272.

11. Paudyal, Atique M.S.A. and Yang C.X., Local maximum acceleration based rotating machinery fault classification using KNN, presented at 2019 IEEE Int. Conf. on Electro Information Technology (EIT), 2019, pp. 219–224.

12. Capolino G.A., Antonino-Daviu J.A. and Riera-Guasp, Modern diagnostics techniques for electrical machines, power electronics, and drives, IEEE Transactions on Industrial Electronics 62(3) (2015), 1738–1745.

13. Liao, Discovering prognostic features using genetic programming in remaining useful life prediction, IEEE Transactions on Industrial Electronics 61(5) (2013), 2464–2472.

14. […], Wang and Luo, Model-based prognosis for hybrid systems with mode-dependent degradation behaviors, IEEE Transactions on Industrial Electronics 61(1) (2013), 546–554.

15. Yin, Ding S.X., Xie and Luo, A review on basic data-driven approaches for industrial process monitoring, IEEE Transactions on Industrial Electronics 61(11) (2014), 6418–6428.

16. Boonkhao, Li R.F., Wang X.Z., Tweedie R.J. and Primrose, Making use of process tomography data for multivariate statistical process control, AIChE Journal 57(9) (2011), 2360–2368.

17. Jiang and Yin, Recursive Total Principle Component Regression Based Fault Detection and Its Application to Vehicular Cyber-Physical Systems, IEEE Transactions on Industrial Informatics (2017), 1–1.

18.

Wang

and Yin

, Quality-related fault detection approach based on orthogonal signal correction and modified PLS, IEEE Transactions on Industrial Informatics 11(2) (2017), 398–405.

19.

Song

, Zhou

, Tan

, Shi

, Zhao

and Wang

, Process Monitoring via Key Principal Components and Local Information Based Weights, IEEE Access 7 (2019), 15357–15366.

20.

Mehmood

, Liland

K.H.

, Snipen

and Sæbø

, A review of variable selection methods in partial least squares regression, Chemometrics and Intelligent Laboratory Systems 118 (2012), 62–69.

21.

Sun

and Hou

, An improved principal component regression for quality-related process monitoring of industrial control systems, IEEE Access 5 (2017), 21723–21730.

22.

Zhao

S. J.

, Zhang

and Xu

Y. M.

, Monitoring of processes with multiple operating modes through multiple principle component analysis models, Industrial & engineering chemistry research 43(22) (2004), 7025–7035.

23.

Z. Q.

, Yang

C. J.

, Song

Z. H.

and Wang

H. Q.

, Robust online monitoring for multimode processes based on nonlinear external analysis, Industrial & Engineering Chemistry Research 47(14) (2008), 4775–4783.

24.

Zhao

C.H.

, Yao

, Gao

F.R.

and Wang

F.L.

, Statistical analysis and online monitoring for multimode processes with between-mode transitions, Chemical Engineering Science 65(22) (2010), 5961–5975.

25.

Zhang

, Wang

and Lu

, Modeling and monitoring of multimode process based on subspace separation, Chemical Engineering Research and Design 91(5) (2013), 831–842.

26.

Zhao

S.J.

, Zhang

and Xu

Y.M.

, Performance monitoring of processes with multiple operating modes through multiple PLS models, Journal of process Control 16(7) (2006), 763–772.

27.

Z.Q.

and Song

Z.H.

, Multimode process monitoring based on Bayesian method, Journal of Chemometrics: A Journal of the Chemometrics Society 23(12) (2009), 636–650.

28.

Natarajan

and Srinivasan

, Multi-model based process condition monitoring of offshore oil and gas production process, Chemical Engineering Research and Design 88(5-6) (2010), 572–591.

29.

Jiang

and Yan

, Multimode process monitoring using variational bayesian inference and canonical correlation analysis, IEEE Transactions onAutomation Science & Engineering (2019), 1–11.

30.

J.B.

, Hidden Markov models combining local and global information for nonlinear and multimodal process monitoring, Journal of process control 20(3) (2010), 344–359.

31.

Z.Q.

, Gao

F.R.

and Song

Z.H.

, Two-dimensional Bayesian monitoring method for nonlinear multimode processes, Chemical Engineering Science 66(21) (2011), 5173–5183.

32.

Q.P.

and Wang

, Fault detection using the k-nearest neighbor rule for semiconductor manufacturing processes, IEEE transactions on semiconductor manufacturing 20(4) (2007), 345–354.

33.

, Hu

and Shi

, A novel local neighborhood standardization strategy and its application in fault detection of multimode processes, Chemometrics and Intelligent Laboratory Systems 118 (2012), 287–300.

34.

Wold

, Esbensen

and Geladi

, Principal component analysis, Chemometrics and intelligent laboratory systems 2(1-3) (1987), 37–52.

35.

Z. Q.

, Song

and Gao

, Review of recent research on data-based process monitoring, Industrial & Engineering Chemistry Research 52(10) (2013), 3543–3562.

36.

Wang

, Zhang

and Feng

, On the Euclidean distance of images, IEEE transactions on pattern analysis and machine intelligence 27(8) (2005), 1334–1339.

37.

Chen

, Wynne

R.J.

, Goulding

and Sandoz

, The application of principal component analysis and kernel density estimation to enhance process monitoring, Control Engineering Practice 8(5) (2000), 531–543.

38.

Feng

and Li

, MRS-kNN fault detection method for multirate sampling process based variable grouping threshold, Journal of process Control 85 (2020), 149–158.

39.

Zhang

, Li

, Hu

and Song

, Dynamical process monitoring using dynamical hierarchical kernel partial least squares, Chemometrics & Intelligent Laboratory Systems 118 (2012), 150–158.

40.

Sun

and Zhang

, Fault diagnosis with between mode similarity analysis reconstruction for multimode processes, Chemometrics and Intelligent Laboratory Systems 164 (2017), 43–51.