Abstract
As modern industrial processes often have multiple production modes, multimode-process monitoring has become an important issue. In multimode processes, the operating condition may often switch among different modes. As a result, popular process monitoring methods such as principal component analysis (PCA) and partial least squares (PLS) method should not be directly applied because they are based on a fundamental assumption that the process only has one stable operating condition. In this paper, a novel multimode-process data-standardization approach called double-weighted neighborhood standardization (DWNS) is proposed to solve the problem of multimode characteristics. This approach can transform multimode data into approximately single-mode data, which follow a Gaussian distribution. By analyzing a concrete example, this study indicates that the DWNS strategy is effective for multimode data preprocessing. Moreover, a novel fault detection method called DWNS-PCA is proposed for multimode processes. Finally, a numerical example and the penicillin fermentation process are used to test the validity and effectiveness of the DWNS-PCA. The results demonstrate that the proposed data-standardization method is suitable for multimode data, and the DWNS-PCA process monitoring method is effective for detecting faults in multimode processes.
Introduction
Process monitoring is one of the most important activities in many industrial processes [1–3]. To ensure production safety and consistently high product quality, many process monitoring methods have been widely researched over the past several decades [4–9].
Among the various types of process monitoring approaches, the data-driven approach is the most popular and has recently attracted considerable attention. In particular, multivariate statistical process monitoring (MSPM) approaches are a class of data-driven methods that offer unique advantages in process monitoring. The MSPM approaches analyze the statistical relationships using the features extracted from the process data, identify the characteristics of different states, and predict the future states without using any physical models [10–16]. PCA and PLS, as representative MSPM approaches, have been widely studied and applied to various industrial processes [17–21].
Nevertheless, the traditional MSPM approaches assume that the process variables follow a Gaussian distribution. When the variables follow a non-Gaussian distribution, these methods show poor monitoring performance. In modern production, many complex manufacturing processes include multiple operating modes because of various factors, including different raw-material characteristics, diverse product specifications, changing market demands, different external environments, and various production-process requirements [22–24]. For example, in the electro-fused magnesia furnace manufacturing processes, the materials are diverse under different operating modes, such as powdery and massive magnesium [25]. Under multiple production modes, the global process data generally do not follow a Gaussian distribution, and the assumption that the process includes only one nominal operating mode becomes invalid. Hence, most MSPM approaches may suffer from inaccurate monitoring results. In recent years, to guarantee higher monitoring performance in multimode manufacturing processes, many multimode monitoring methods have been intensively researched.
The existing main monitoring approaches for multimode processes can be summarized as follows. One approach type is based on mixed models. Because a multimode-process dataset includes data derived from different production processes, an effective method is to construct multiple sub-models that are suitable for each mode. Then, the results of the local models are synthesized. However, in the modeling and monitoring phases of the multimode processes, classifying the historical-process and new data into different subsets corresponding to different operating modes is challenging. To address this problem, many techniques have been recently researched. For example, Zhao et al. proposed a multiple PLS model process monitoring scheme based on metrics in the form of principal angles to measure the similarities between any two models [26]. Ge et al. transferred the traditional monitoring statistics to fault probability in each operating mode and obtained the monitoring results under different operating modes using Bayesian inference [27]. More literature on mixed model approaches is available in [28–31].
Another approach type is based on a global model. The critical issue in the global-model approach is how to fit the global model to each production mode. He proposed a fault detection approach using the k-nearest-neighbor rule (FD-kNN), in which a global detection model is established [32]. Some dimension-reduction methods, combined with kNN, have been researched to address the large computational burden of FD-kNN. Another global model is based on data standardization. To address the problem of multimode characteristics, Ma proposed a data preprocessing method called local neighborhood standardization (LNS), in which multimode data can be normalized into single-mode data. This method was applied to the monitoring of the Tennessee Eastman process [33].
According to these circumstances, the current study aims to develop a set of data-standardization and multimode-process monitoring methods based on the following motivations. First, although the most common data-standardization method, namely, the z-score method, can transform the original data into a normalized distribution [34], it does not change the distribution characteristics of the data, which means that if the original data include multiple modes, the data after standardization will still have multimode states. The main idea is shown in Fig. 1. Hence, most traditional MSPM methods cannot be directly employed because they are based on the assumption that the process has one nominal operating mode. Second, when the neighbors of the fault data belong to multiple normal modes, the neighborhood standard deviation of the fault data might be inflated. Hence, the fault data may be misclassified as belonging to normal data after standardization using LNS. Under this circumstance, a need arises to develop a novel data-standardization strategy that is aimed at completing the process monitoring of multimode manufacturing processes.

Fig. 1. Frameworks of the z-score and DWNS methods for multimode data.
In this paper, a novel multimode data-standardization method named DWNS is proposed to handle multimode characteristics. Moreover, a novel fault detection approach called DWNS-PCA is proposed to address the problems suffered by the traditional PCA because of the assumption of Gaussian distribution and to improve the fault detection performance. In terms of the data-standardization effect and fault detection performance, the main advantages of this method are the following. (i) In the data-standardization stage, the multimode-process data can be converted into single-mode data, which approximately obey a Gaussian distribution, using the DWNS method; thus, the Gaussian assumption underlying most MSPM methods is satisfied. The main idea is shown in Fig. 1. (ii) When faulty data are located between multiple normal-mode datasets, the proposed method can overcome the problem of misclassification of the faulty data as normal data. (iii) Because of the weight between the test and training data, the fault detection performance can be improved without relying on the number of nearest neighbors. (iv) For multimode-process monitoring, this method only requires a single-mode PCA monitoring model, instead of many sub-models, and does not depend on prior knowledge of the process.
The main contributions of this paper are summarized as follows. (1) A novel data-standardization approach based on DWNS is proposed. (2) A novel multimode-process monitoring method based on DWNS-PCA is developed. (3) The monitoring procedures of the DWNS-PCA method are summarized. (4) Comparative studies of the proposed methods are presented based on the numerical example and penicillin fermentation process (PFP).
The remainder of this paper is organized as follows. Section 2 provides a detailed discussion and analysis of the multimode data characteristics. Section 3 introduces the proposed DWNS data-standardization method, and Section 4 describes the multimode-process monitoring approach based on DWNS-PCA. Section 5 discusses the numerical example and PFP simulation results. The final section concludes this paper.
Many complex manufacturing processes often include multiple stable working points. When the production process operates under different production modes, the process and statistical features are different, such as the mean, variance, and relationship of the variables, i.e., data with multimode features. Many factors can cause these multiple modes, such as the change in the raw-material characteristics, external environment, and process load conditions as well as equipment wear [35].
For ease of understanding, a numerical example is introduced to provide an intuitive representation and a detailed analysis. The training data consist of 200 normal samples generated according to Eq. (1), including 100 samples from Mode 1 and 100 samples from Mode 2. Furthermore, 100 fault samples, which are located between the two normal modes, are generated according to Eq. (2).
The distribution of the primary training data is shown in Fig. 2. Obviously, the values of

Fig. 2. Primary data.

Fig. 3. Probability density of the primary data.

Fig. 4. Three-dimensional scatter diagram of the primary data.
Data processing is necessary in the MSPM methods. The primary data should be standardized to eliminate the influence of the variable dimensions. The z-score method is widely used as a type of data-standardization method. It standardizes each variable into a zero mean and unit variance. In the z-score method, the mean and variance are derived from the global process data. However, in the multimode-process data, when the production condition changes, the mean and covariance of the normal process data accordingly change. Because of the influence of the global mean and variance, the standardized dataset may still appear to follow different distributions.
The data-standardization result of the z-score method is shown in Fig. 5. Clearly, the data after the z-score standardization still appear to have two different distributions. The probability density and three-dimensional scatter plots of the z-score standardization are shown in Figs. 6 and 7, respectively. We can observe that the data after the z-score standardization still belong to two different modes, and the fault data cannot be distinguished from the normal data. In this case, using the PCA monitoring method will yield serious mistakes because the data distributions are not Gaussian distributions. Therefore, in the multimode process, the z-score method cannot provide an effective data-standardization result before the PCA monitoring method.

Fig. 5. Data using z-score standardization.

Fig. 6. Probability density of the data using z-score standardization.

Fig. 7. Three-dimensional scatter diagram of the data using z-score standardization.
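The limitation of the z-score method on multimode data can be reproduced with a minimal sketch. The two-mode data below are synthetic and hypothetical (mode centers at 0 and 10 with unit variance are assumptions, not the values of Eq. (1)); the function name `zscore` is also illustrative.

```python
import numpy as np

def zscore(train, test):
    """Standardize test rows with the GLOBAL mean and std of the training data."""
    mean = train.mean(axis=0)
    std = train.std(axis=0, ddof=1)
    return (test - mean) / std

rng = np.random.default_rng(0)
# Hypothetical two-mode training set: 100 samples per mode, 2 variables.
mode1 = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
mode2 = rng.normal(loc=10.0, scale=1.0, size=(100, 2))
train = np.vstack([mode1, mode2])

z = zscore(train, train)
# The global mean becomes zero, but the two clusters stay clearly separated.
gap = z[100:].mean(axis=0) - z[:100].mean(axis=0)
print("global mean after z-score:", z.mean(axis=0).round(3))
print("gap between mode centers after z-score:", gap.round(2))
```

Because a single global mean and standard deviation are applied to every sample, the operation is only a shift and rescaling of the axes, so the bimodal shape of the data is preserved.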
Many complex manufacturing processes often include multiple operating modes because of the different production-process requirements. To address the problem of the z-score standardization in a multimode production process, a local neighborhood standardization method was proposed by Ma, which solved the problem of the multimode characteristic in the process data. The specific LNS details can be found in Reference [33]. However, when the neighbors of the fault data belong to multiple modes, the neighborhood standard deviation of the fault data might be inflated. In this case, the fault samples after the LNS will be close to zero, which means that they will be classified as normal data. Figures 8–10 show the results by the LNS when the number of nearest neighbors is 120. The results demonstrate that when the number of nearest neighbors is more than that in any sub-mode training dataset, the LNS method yields unsatisfactory results. Under an extreme condition, the number of nearest neighbors is close to that of the whole training data, and the LNS becomes similar to the z-score standardization.

Fig. 8. Data using LNS.

Fig. 9. Probability density of the data using LNS.

Fig. 10. Three-dimensional scatter diagram of the data using LNS.
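The inflation of the neighborhood standard deviation can be illustrated with a short sketch. It assumes a simplified reading of LNS in which a sample is standardized by the mean and standard deviation of its k nearest training samples; the exact formulation is given in Reference [33]. The data, the fault location, and the function name `lns` are hypothetical.

```python
import numpy as np

def lns(train, x, k):
    """Standardize x with the mean/std of its k nearest training samples.

    ASSUMPTION: simplified reading of LNS; see Reference [33] for the
    exact formulation.
    """
    dist = np.linalg.norm(train - x, axis=1)
    neigh = train[np.argsort(dist)[:k]]
    return (x - neigh.mean(axis=0)) / neigh.std(axis=0, ddof=1)

rng = np.random.default_rng(1)
mode1 = rng.normal(0.0, 1.0, size=(100, 2))   # hypothetical Mode 1
mode2 = rng.normal(10.0, 1.0, size=(100, 2))  # hypothetical Mode 2
train = np.vstack([mode1, mode2])
fault = np.array([5.0, 5.0])                  # hypothetical fault between the modes

# k = 120 exceeds the 100 samples of either mode, so the neighborhood
# necessarily spans both modes and its standard deviation is inflated.
z_lns = lns(train, fault, k=120)
# For comparison: the same fault measured against Mode 1 alone.
z_mode1 = (fault - mode1.mean(axis=0)) / mode1.std(axis=0, ddof=1)
print("fault after LNS (k=120):", z_lns.round(2))
print("fault relative to Mode 1 only:", z_mode1.round(2))
```

In this setting the fault lies several standard deviations away from either mode, yet its LNS value is close to zero, which matches the misclassification described above.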
Under this condition, a novel multimode data preprocessing approach called DWNS is proposed in this paper. This strategy standardizes each sample based on the weights between the test and training data. The training data with larger weights play a more significant role in the data standardization. DWNS is also a univariate analysis method; therefore, it does not change the structures and relationships between the different variables. The main DWNS steps are described in the following. First, the distance between sample
Parameter k is the number of nearest neighbor samples. Finally, the mean value and standard deviation of dataset
According to the earlier discussions, the neighborhood set is used twice in this study. On the one hand, in the data-standardization phase, the different training data play different roles according to their weights, which are based on the distances between the training and test data. The training data in a nearer neighborhood, i.e., with larger weights, play more significant roles, and those with smaller weights might not have any effect. In this manner, the test data are standardized based on the weight of each training sample rather than depending on the number of nearest neighbors, which is difficult to choose in the LNS. On the other hand, when the fault data are in the middle of multiple modes, the jth nearest neighbor of a fault sample is a normal sample from one mode, and the samples in the neighborhood of that neighbor belong to the same mode. Thus, the proposed method avoids the limitation of the LNS, where the neighborhood set spans multiple modes. With the proposed method, the data after standardization follow an approximately Gaussian distribution. The results using the DWNS are shown in Figs. 11–13.

Fig. 11. Data using DWNS.

Fig. 12. Probability density of the data using DWNS.

Fig. 13. Three-dimensional scatter diagram of the data using DWNS.
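The two uses of the neighborhood set can be sketched as follows. This is not the DWNS formula of this paper, whose equations are not reproduced in the text above; it is a hypothetical, DWNS-like illustration. The exponential distance weights, the weighted combination of per-neighbor standardized values, and the names `knn` and `dwns_like` are all assumptions made for the example.

```python
import numpy as np

def knn(train, x, k):
    """Indices and distances of the k training samples nearest to x."""
    dist = np.linalg.norm(train - x, axis=1)
    idx = np.argsort(dist)[:k]
    return idx, dist[idx]

def dwns_like(train, x, k):
    """Hypothetical DWNS-like standardization (NOT the paper's exact formula).

    First use of the neighborhood: the k nearest training samples of x.
    Second use: the k-neighborhood of each of those neighbors, which
    supplies a local mean and standard deviation.
    """
    idx, dist = knn(train, x, k)
    # ASSUMPTION: exponential distance weights, normalized to sum to one.
    w = np.exp(-dist / (dist.mean() + 1e-12))
    w /= w.sum()
    z = np.zeros_like(x, dtype=float)
    for j, wj in zip(idx, w):
        nj, _ = knn(train, train[j], k)
        local = train[nj]
        z += wj * (x - local.mean(axis=0)) / local.std(axis=0, ddof=1)
    return z

rng = np.random.default_rng(2)
# Hypothetical two-mode training data and normal test data from each mode.
train = np.vstack([rng.normal(0.0, 1.0, (100, 2)),
                   rng.normal(10.0, 1.0, (100, 2))])
test1 = rng.normal(0.0, 1.0, (200, 2))
test2 = rng.normal(10.0, 1.0, (200, 2))

z1 = np.array([dwns_like(train, x, k=10) for x in test1])
z2 = np.array([dwns_like(train, x, k=10) for x in test2])
gap = np.abs(z2.mean(axis=0) - z1.mean(axis=0))
print("gap between mode centers after standardization:", gap.round(2))
```

Under these assumptions, normal samples from both modes are mapped to a single cluster around zero, because each sample is compared only with local statistics of its own mode. The sketch does not reproduce the fault-detection behavior reported for DWNS, which depends on the actual weight definition.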
In this section, a process monitoring method based on DWNS is introduced to demonstrate the validity of the DWNS method. Because the multimode-process data using DWNS follow an approximately Gaussian distribution, PCA, as the most common monitoring method, is introduced in this study. We must note that DWNS is not only suitable for PCA but also for other monitoring methods such as PLS, which is not presented in this paper. The monitoring procedures of the DWNS-PCA are shown in Fig. 14. The confidence limits of the T2 and squared prediction error (SPE) statistics are calculated using the kernel density estimation [37]. The procedures for off-line modeling and on-line monitoring are summarized as follows.

Fig. 14. Monitoring procedure of the DWNS-PCA.
The off-line modeling can be performed in the following steps:
Step 1: The normal process data from different modes are used as the training dataset.
Step 2: For each training sample, its k nearest neighbors in the training dataset are found.
Step 3: The k nearest neighborhood dataset of each of these neighbors is searched, and the mean value and standard deviation of each neighborhood dataset are calculated.
Step 4: Weight parameter w_j is calculated, and the sample is standardized.
Step 5: For the training samples after standardization, PCA is used to extract the principal components and reduce the dimensions.
Step 6: The control limits of T2 and SPE are calculated.
For a new sample, the monitoring phase mainly includes the following steps:
Step 1: Given a new sample, the sample is standardized using Steps 3 and 4 in the modeling phase.
Step 2: The new standardized sample is projected into the principal and residual spaces.
Step 3: Statistics T2 and SPE of the new standardized sample are calculated.
Step 4: If the T2 or SPE statistic exceeds its control limit, a fault is detected in the multimode process.
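The PCA part of the procedure above can be sketched as follows, assuming the data have already been standardized into an approximately single-mode form. The sketch uses NumPy only; the Gaussian-kernel density estimate with a Silverman-type bandwidth, the 99% confidence level, the mixing matrix `A`, and all function names are assumptions for illustration, not the settings of this paper.

```python
import numpy as np

def fit_pca(Z, n_comp):
    """PCA of standardized data Z (rows are samples) via eigendecomposition."""
    eigval, eigvec = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigval)[::-1]          # sort eigenvalues descending
    return eigvec[:, order[:n_comp]], eigval[order[:n_comp]]

def t2_spe(Z, P, lam):
    """Hotelling T2 in the principal space and SPE in the residual space."""
    scores = Z @ P
    t2 = np.sum(scores ** 2 / lam, axis=1)
    spe = np.sum((Z - scores @ P.T) ** 2, axis=1)
    return t2, spe

def kde_limit(values, alpha=0.99, grid_size=2048):
    """Control limit as the alpha-quantile of a Gaussian kernel density estimate."""
    values = np.asarray(values, dtype=float)
    h = 1.06 * values.std(ddof=1) * values.size ** (-0.2)  # Silverman-type bandwidth
    grid = np.linspace(values.min() - 3 * h, values.max() + 3 * h, grid_size)
    dens = np.exp(-0.5 * ((grid[:, None] - values[None, :]) / h) ** 2).sum(axis=1)
    cdf = np.cumsum(dens)
    cdf /= cdf[-1]
    return grid[np.searchsorted(cdf, alpha)]

rng = np.random.default_rng(3)
# Hypothetical single-mode data: five variables driven by two sources plus noise.
S = rng.normal(size=(500, 2))
A = np.array([[0.5, 0.8], [0.7, 0.3], [0.2, 0.9], [0.9, 0.1], [0.4, 0.6]])
X = S @ A.T + 0.1 * rng.normal(size=(500, 5))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Off-line modeling: PCA model and control limits from normal data.
P, lam = fit_pca(Z, n_comp=2)
t2_train, spe_train = t2_spe(Z, P, lam)
t2_lim, spe_lim = kde_limit(t2_train), kde_limit(spe_train)

# On-line monitoring: a sample with a hypothetical step bias on variable 1.
z_fault = Z[0].copy()
z_fault[0] += 4.0
t2_f, spe_f = t2_spe(z_fault[None, :], P, lam)
flagged = bool(t2_f[0] > t2_lim or spe_f[0] > spe_lim)
print("fault flagged:", flagged)
```

A bias on one variable breaks the correlation structure captured by the two principal components, so it shows up mainly in the SPE statistic in this example.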
The previous numerical example, which includes normal (Modes 1 and 2) and faulty data, is tested. The detection results of the z-score-PCA, LNS-PCA, and DWNS-PCA are shown in Figs. 15–17, respectively.

Fig. 15. Monitoring results using z-score-PCA.

Fig. 16. Monitoring results using LNS-PCA.

Fig. 17. Monitoring results using DWNS-PCA.
In this section, two cases are introduced to demonstrate the monitoring performance of the DWNS-PCA method. First, a numerical example is presented to prove the effectiveness of the proposed method for monitoring a multimode process. Second, the performance of the proposed method is verified using PFP.
Numerical example
In this part, a modified numerical example is introduced, which was originally suggested by Ge and Song [27]. In this numerical example, five variables are generated by two sources, namely,
The training dataset is constructed using 200 normal samples, including 100 samples from Mode 1 and 100 samples from Mode 2. Furthermore, to verify the effectiveness of the DWNS data-preprocessing and DWNS-PCA-based process monitoring methods, two test datasets that contain 200 samples are generated. Test Dataset 1 includes 100 normal and 100 fault samples from Mode 1. A step fault of
In the process monitoring phase, the z-score-PCA and LNS-PCA monitoring methods are introduced for comparison. The monitoring results of T2 and SPE are shown in Figs. 18–23. For Fault 1, Figs. 18 and 19 show that the z-score-PCA and LNS-PCA methods have difficulty detecting the step fault. In particular, the T2 monitoring results show that these methods are not suitable for detecting faults in multimode processes. In contrast, the DWNS-PCA method can clearly detect faults with higher detection and accuracy rates, as shown in Fig. 20. For Fault 2, which is shown in Figs. 21 and 22, the z-score-PCA and LNS-PCA methods cannot accurately detect the faults using either T2 or SPE. However, Fig. 23 shows that the fault detection results using the DWNS-PCA are more effective. The fault detection performance indexes of the three methods are listed in Table 1. In conclusion, the DWNS-PCA demonstrates better monitoring results than the z-score-PCA and LNS-PCA in multimode-process monitoring.

Fig. 18. Monitoring results of Fault 1 using z-score-PCA.

Fig. 19. Monitoring results of Fault 1 using LNS-PCA.

Fig. 20. Monitoring results of Fault 1 using DWNS-PCA.

Fig. 21. Monitoring results of Fault 2 using z-score-PCA.

Fig. 22. Monitoring results of Fault 2 using LNS-PCA.

Fig. 23. Monitoring results of Fault 2 using DWNS-PCA.
Table 1. Fault detection performance of the three methods in the numerical example
Penicillin is produced by microorganism fermentation. PFP continues to attract the interest of researchers because penicillin offers significant commercial and therapeutic benefits and is also of interest to the engineering field. As a very complicated biochemical process, PFP has been widely researched. A typical PFP includes two major operational phases: the bacteria growth phase and the penicillin secretory phase. The PFP flow is shown in Fig. 24. In penicillin cultivation, most of the cell mass is usually generated in the bacteria growth phase, in which penicillin production also begins. Production then continues in the secretory phase. Because the data generated under different initial and operating conditions have different mode characteristics, the performance of the DWNS-PCA method is evaluated using PFP in this study [38–40].

Fig. 24. PFP diagram.
In this work, the data used in the simulation are generated using Pensim V2.0. The training and testing data from different modes are obtained by setting different initial conditions, set points, temperature controllers, and controller types to monitor the pH. PFP is run under two different modes, Modes 1 and 2, to generate the multimode data (Table 2). In this simulation, aeration rate, agitator power, substrate feed rate, penicillin concentration, culture volume, and acid flow rate are selected as the modeling variables. The training dataset contains 200 normal samples (100 from Mode 1 and 100 from Mode 2). In addition, 200 testing samples from Mode 1 and 200 testing samples from Mode 2 are generated. In PFP, a fault can be introduced in the aeration rate, agitator power, and substrate feed rate. In this simulation, to verify the performance of the proposed method, Fault 1 is introduced as a step fault that increases the agitator power in Mode 1 from the 61st hour to the end. The fault magnitude is set to 2%. Fault 2 is introduced as a ramp fault that increases the agitator power in Mode 2 from the 61st hour to the end. The fault magnitude is set to 0.9. The faults are described in Table 3.
Table 2. Description of the two modes in the PFP
Table 3. Description of the two faults in the PFP
In the modeling phase, the normal data from Modes 1 and 2 are used for modeling. The modeling data are first standardized by the z-score, LNS, and DWNS methods, and the z-score-PCA, LNS-PCA, and DWNS-PCA models are then constructed, respectively.
In the monitoring phase, the monitoring results of the three methods for Fault 1 are shown in Figs. 25–27. Figs. 25 and 26 show that many false diagnoses appear in the SPE statistics using the z-score-PCA and LNS-PCA, respectively. For the z-score-PCA method, the detection rates are 95.7% and 16.4% for T2 and SPE, respectively. For LNS-PCA, the detection rates are 88.6% and 38.6%, respectively. For the DWNS-PCA method, as shown in Fig. 27, because the DWNS method is used, the two-mode process data can be converted into single-mode data, which makes the PCA method effective. The alarms are clearly detected. The detection rates are both 100%. The DWNS-PCA demonstrates better monitoring performance than the z-score-PCA and LNS-PCA in terms of both the T2 and SPE statistics.
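The detection rates quoted above can be computed as the fraction of fault samples whose statistic exceeds the control limit. The sketch below uses a synthetic statistic, and the assumption of one sample per hour (so that a fault from the 61st hour leaves 140 fault samples out of 200) is an inference from the reported figures rather than a stated setting; for instance, 95.7% is consistent with 134 of 140 samples. The function names are illustrative.

```python
import numpy as np

def detection_rate(stat, limit, fault_start):
    """Fraction of samples from fault_start onward that exceed the limit."""
    stat = np.asarray(stat)
    return float(np.mean(stat[fault_start:] > limit))

def false_alarm_rate(stat, limit, fault_start):
    """Fraction of samples before fault_start that exceed the limit."""
    stat = np.asarray(stat)
    return float(np.mean(stat[:fault_start] > limit))

# Synthetic monitoring statistic for 200 samples (hypothetical values).
stat = np.zeros(200)
stat[60:] = 2.0     # fault present from the 61st sample onward
stat[60:66] = 0.5   # six fault samples stay below the limit (missed)
limit = 1.0

dr = detection_rate(stat, limit, fault_start=60)
far = false_alarm_rate(stat, limit, fault_start=60)
print(f"detection rate: {dr:.1%}, false alarm rate: {far:.1%}")
# prints "detection rate: 95.7%, false alarm rate: 0.0%"
```

Reporting the false alarm rate alongside the detection rate is useful because a low control limit can raise both at once.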

Fig. 25. Monitoring results of Fault 1 using z-score-PCA in PFP.

Fig. 26. Monitoring results of Fault 1 using LNS-PCA in PFP.

Fig. 27. Monitoring results of Fault 1 using DWNS-PCA in PFP.
For Fault 2, Figs. 28–30 show the monitoring results of the three methods. For the z-score-PCA method, as shown in Fig. 28, the T2 and SPE statistics show many false diagnoses. The detection rates are 72.1% and 48.6% for the T2 and SPE statistics, respectively. This result is due to the use of the z-score standardization method: the data are still in multimode states, which makes the PCA method ineffective. For the LNS-PCA, as shown in Fig. 29, the detection rates are 60.7% and 55% for the T2 and SPE statistics, respectively, because in LNS, the faulty data located between two normal-mode datasets suffer from the problem of being misclassified as normal data. Thus, the detection result is degraded. The DWNS-PCA strategy provides better monitoring results, as shown in Fig. 30. The detection rates are 77.9% and 80% for the T2 and SPE statistics, respectively, because in the DWNS method, the two-mode process data can be converted into single-mode data, which obey an approximately Gaussian distribution and satisfy the assumption of the PCA method. The fault detection performance indexes of the three methods for these two faults are listed in Table 4. In conclusion, for the two faults, DWNS-PCA demonstrates better monitoring performance in PFP. Thus, DWNS-PCA is more suitable for multimode processes than the z-score-PCA and LNS-PCA methods.

Fig. 28. Monitoring results of Fault 2 using z-score-PCA in PFP.

Fig. 29. Monitoring results of Fault 2 using LNS-PCA in PFP.

Fig. 30. Monitoring results of Fault 2 using DWNS-PCA in PFP.
Table 4. Fault detection performance of the three methods in PFP
In this paper, a novel data-standardization method called DWNS is proposed. In this method, the statistical characteristics of the mean and variance of local data structures can be used to convert multimode data into single-mode data, which follow an approximately Gaussian distribution. Thus, it can address the problem of multimode-process data that most MSPM methods fail to overcome. Then, a novel DWNS-PCA fault detection approach is proposed for multimode processes. A numerical example and the PFP are introduced to test the effectiveness of DWNS-PCA. The monitoring results indicate that the DWNS-PCA method can significantly improve the performance in monitoring the multimode processes, especially when faulty data are located between multiple modes. Owing to the weights between the test and training data, the detection performance can be improved without relying on the number of nearest neighbors. Finally, the DWNS-PCA approach yields better fault detection performance than the z-score-PCA and LNS-PCA methods.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grant 61733003.
