Fault Diagnosis of Subway Indoor Air Quality Based on Local Fisher Discriminant Analysis

Abstract

This article proposes a combined principal component analysis (PCA) and local Fisher discriminant analysis (LFDA) scheme to improve the fault diagnosis performance of indoor air quality (IAQ) measuring devices in subway stations. The combined scheme employs PCA for the fault detection step and subsequently uses LFDA to diagnose faulty IAQ sensors. A fault discriminant index based on LFDA discriminant components is proposed for fault diagnosis. The effectiveness of the proposed approach is demonstrated on an IAQ measuring system with three types of IAQ sensor faults: bias, drifting, and complete failure. The results show that LFDA diagnoses faults better than conventional Fisher discriminant analysis and that the combined method can detect and discriminate sensor faults in the subway system.

Introduction

Because about eight million people use the Seoul metro system each day, indoor air quality (IAQ) in subway stations has attracted wide public concern in recent years (Liu et al., 2013b). Owing to inefficient ventilation and heavy use of metro systems, IAQ in subway systems may deteriorate, which significantly affects the health and comfort of passengers. To effectively manage and control major pollutants in indoor spaces, including underground subway stations, the Korea Ministry of Environment (MOE) has established environmental laws and regulations such as the IAQ Act (Park and Ha, 2008). Sensors installed at subway platforms, waiting rooms, and ticketing areas measure the concentrations of air pollutants and meteorological values. These IAQ sensors may become abnormal and unreliable because of long-term use and harsh working environments. Several types of sensor faults, such as bias, drifting, complete failure, and precision degradation, can be encountered in subway stations. If these sensor failures are not detected and diagnosed in time, they feed faulty measurements to the IAQ monitoring and control systems, which may result in unnecessary energy consumption or wrong decisions about subway IAQ maintenance. Therefore, early detection and accurate isolation of faulty sensors are of key importance for the quality and reliability of sensors in the underground space.

Fault detection and diagnosis (FDD) has received increasing attention in both the academic and industrial communities over the past two decades (Qin, 2012). FDD methods can be classified into two categories: data-driven (Qin, 2012) and model-based (Gao et al., 2015). Data-driven FDD is used more frequently than model-based FDD because process data can be obtained easily given the rapid growth of computer usage in industry. In addition, the main disadvantage of model-based FDD is that its detection and diagnosis results are sensitive to the developed models, which makes it unsuitable for more complex chemical and environmental processes. To detect abnormal conditions, data-driven FDD typically employs multivariate statistical process monitoring methods such as principal component analysis (PCA) and partial least squares (PLS). PCA is one of the most widely used fault detection methods and has been successfully applied in numerous applications (Kwon et al., 2015). PLS is also widely used for FDD. PLS is similar in concept to PCA except that PLS is used in processes that have response variables (Qin and Zheng, 2013).

Process monitoring approaches are mainly used for fault detection, fault identification, and fault reconstruction, but their isolation and separation capability is unsatisfactory for diagnosing the detected faults. Fault detection determines whether the process is in the normal situation, fault identification finds the root cause of the detected faults, and fault reconstruction estimates the fault-free values of the faulty sensors or process variables. Fault diagnosis, also known as the classification of multiple fault classes, is the procedure of finding the root causes of the observed abnormal situations (Jiang et al., 2015). For details about the differences among fault detection, fault identification, fault reconstruction, and fault diagnosis, and about the methods used in each, refer to the review by Qin (2012).

Fisher discriminant analysis (FDA) originated in pattern classification and has been introduced to the fault diagnosis area (Zhao and Gao, 2015). FDA is a linear dimensionality reduction technique that separates different fault classes by simultaneously maximizing the scatter between classes and minimizing the scatter within classes. He et al. (2005) proposed an FDA-based procedure that employed FDA fault directions to generate contribution plots for fault diagnosis. To improve the isolation performance within the framework of FDA, a local FDA (LFDA)-based fault diagnosis method was developed and applied to the Tennessee Eastman process (Yu, 2011). Because it preserves the multimodality within multiple faulty clusters, LFDA usually shows better diagnosing performance than the traditional FDA method.

In particular, extensive research has been carried out on the fault detection and identification of IAQ sensors in subway systems (Kim et al., 2010a, 2010b, 2013, 2014; Liu et al., 2012, 2013a; Lee et al., 2014). Multivariate statistical techniques such as PCA have been applied successfully in the chemometrics research field during the past two decades. A multivariate monitoring method based on PCA was presented not only to monitor real-time indoor air pollutant data but also to diagnose the status of the IAQ in a subway station (Kim et al., 2010a). Furthermore, multiway principal component analysis (MPCA) (Kim et al., 2010b) and parallel factor analysis (Lee et al., 2014) were developed to improve the monitoring performance for air pollutants with periodic patterns in subway systems. Some work on IAQ sensor fault monitoring has also been reported. A sensor validation scheme based on PCA was developed to improve the IAQ in a subway station; it consists of three parts: sensor fault detection using PCA, sensor fault identification using a sensor validity index, and sensor fault reconstruction using an iterative reconstruction algorithm (Liu et al., 2012). To achieve sustainable monitoring of indoor air pollutants in an underground subway environment, a self-validating soft sensor was developed on the basis of recursive PLS (Liu et al., 2013a). In addition to process monitoring methods such as PCA and PLS, independent component analysis (ICA) has also been actively pursued in recent years (Kim et al., 2013). A dynamic ICA, which extracts essential information from dynamic non-Gaussian distributed data, was used to detect, identify, and reconstruct IAQ sensor faults (Kim et al., 2014). However, most research on the process monitoring of IAQ in subway systems has focused on fault detection rather than fault diagnosis.

The purpose of this work is to propose a combined PCA and LFDA scheme for monitoring and diagnosing faulty IAQ sensors in a subway station in Seoul. PCA is used for the detection step and LFDA for diagnosing the fault source after a sensor fault is discovered. The aim is to develop an effective fault diagnosis method for the subway IAQ monitoring research field.

Theory

Motivation of fault diagnosis using LFDA

Fault isolation is an important task for systems containing multiple faulty sensors. Unfortunately, the traditional fault detection statistics such as the squared prediction error (SPE) and T² cannot determine which sensor is problematic. In this case, fault diagnosing methods such as FDA and its enhanced version LFDA can be used. FDA is a supervised dimensionality reduction method that searches for a few characteristic directions that simultaneously bring samples in the same class close together and push samples in different classes apart (Zhong et al., 2014). It is widely used in the pattern classification and recognition field. However, its main disadvantage is that it does not take within-class multimodality into account. Therefore, it cannot separate samples in a class that is multimodal (a class consisting of several separate clusters). In addition, there are only a few studies on this topic in the domain of process monitoring. For example, FDA has been applied to the industrial Tennessee Eastman chemical process (Ge et al., 2016).

LFDA, a linear supervised dimensionality reduction method developed by Sugiyama (2007), is an improved version of conventional FDA. Its most significant feature is that it preserves the within-class local structure and multimodality. Multimodality is common in data measured in many applications. Taking the diagnosis of subway IAQ sensors as an example, the problematic samples resulting from faulty IAQ sensors can be multimodal because several fault types usually exist, such as bias, drifting, and precision degradation of sensor signals. In contrast to FDA, LFDA takes into account the local structure of multimodal data, which is embedded in a local manner by reformulating the constraint of the traditional FDA optimization problem (Sugiyama, 2007).

Principal component analysis

PCA is one of the most commonly used dimension reduction techniques and is widely used in the multivariate statistical process monitoring field. Qin (2012) provided a comprehensive overview and analysis of PCA for process monitoring. Here we briefly describe the key equations for fault detection based on PCA.

Let $\mathbf{X}_0 \in \mathbb{R}^{n \times m}$ be the raw collected data matrix with n measurements and m process variables. Usually X₀ is normalized to a matrix X with zero mean and unit variance. The scaled matrix X can then be decomposed into a score matrix $\mathbf{T} \in \mathbb{R}^{n \times d}$ and a loading matrix $\mathbf{P} \in \mathbb{R}^{m \times d}$ by the singular value decomposition algorithm (Qin, 2012):

$$\mathbf{X} = \mathbf{T}\mathbf{P}^{\mathrm{T}} + \hat{\mathbf{T}}\hat{\mathbf{P}}^{\mathrm{T}}. \tag{1}$$

The number of principal components d can be determined by the variance of reconstruction error method (Qin and Dunia, 2000). To determine whether a fault has occurred when a new sample x is available, two indices, SPE and Hotelling's T², are frequently used. SPE is calculated using the following equation:

$$\mathrm{SPE}(\mathbf{x}) = \mathbf{x}^{\mathrm{T}}(\mathbf{I} - \mathbf{P}\mathbf{P}^{\mathrm{T}})\mathbf{x}. \tag{2}$$

The other fault detection index is Hotelling's T² statistic, which can be calculated using the following equation:

$$T^2 = \mathbf{x}^{\mathrm{T}}\mathbf{P}\Lambda^{-1}\mathbf{P}^{\mathrm{T}}\mathbf{x}, \tag{3}$$

where $\Lambda$ is the diagonal matrix of the d largest eigenvalues of the covariance matrix of X, that is, the variances of the retained principal component scores.
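As an illustration of Eqs. (2) and (3), the following minimal sketch builds a PCA model from normal data and evaluates SPE and T² for a new sample. It is not the authors' implementation: the function names `fit_pca` and `spe_and_t2`, the synthetic five-variable data, and the injected bias size are all assumptions made for the example, and control limits are not computed.

```python
import numpy as np

def fit_pca(X_normal, d):
    """Fit a PCA monitoring model on normal-operation data (rows are samples)."""
    mean = X_normal.mean(axis=0)
    std = X_normal.std(axis=0, ddof=1)
    X = (X_normal - mean) / std                 # zero mean, unit variance
    cov = X.T @ X / (X.shape[0] - 1)            # covariance of the scaled data
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:d]       # indices of the d largest
    return {"mean": mean, "std": std, "P": eigvecs[:, order], "lam": eigvals[order]}

def spe_and_t2(model, x_raw):
    """Return (SPE, T^2) for one raw sample, following Eqs. (2) and (3)."""
    x = (x_raw - model["mean"]) / model["std"]
    P, lam = model["P"], model["lam"]
    residual = x - P @ (P.T @ x)                # (I - P P^T) x
    spe = float(residual @ residual)            # equals x^T (I - P P^T) x
    t = P.T @ x                                 # scores of the new sample
    t2 = float(t @ (t / lam))                   # x^T P Lambda^{-1} P^T x
    return spe, t2

# Synthetic stand-in for normal data: 5 correlated variables, 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X_normal = latent @ mixing + 0.1 * rng.normal(size=(200, 5))

model = fit_pca(X_normal, d=2)
x_ok = X_normal[0]
x_bad = x_ok.copy()
x_bad[3] += 10.0 * model["std"][3]              # hypothetical bias on one sensor

spe_ok, t2_ok = spe_and_t2(model, x_ok)
spe_bad, t2_bad = spe_and_t2(model, x_bad)
print(f"normal: SPE={spe_ok:.3f}, T2={t2_ok:.3f}")
print(f"biased: SPE={spe_bad:.3f}, T2={t2_bad:.3f}")
```

A bias on a single sensor breaks the correlation structure captured by the loadings, so it mainly inflates SPE, whereas T² reacts to unusually large movements inside the principal component subspace.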

Local Fisher discriminant analysis

Let the normalized training data $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_n]^{\mathrm{T}} \in \mathbb{R}^{n \times m}$ have c classes, and let the kth class C_k have n_k samples. Then we have

$$n = \sum_{k=1}^{c} n_k. \tag{4}$$

The local between-class scatter matrix S_b and the local within-class scatter matrix S_w are defined as

$$\mathbf{S}_b = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \mathbf{W}_{i,j}^{(b)} (\mathbf{x}_i - \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^{\mathrm{T}} \tag{5}$$

and

$$\mathbf{S}_w = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \mathbf{W}_{i,j}^{(w)} (\mathbf{x}_i - \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^{\mathrm{T}}, \tag{6}$$

where the weighting matrices $\mathbf{W}_{i,j}^{(b)}$ and $\mathbf{W}_{i,j}^{(w)}$ are defined as

$$\mathbf{W}_{i,j}^{(b)} = \begin{cases} A_{i,j}\left(\dfrac{1}{n} - \dfrac{1}{n_k}\right) & \text{if } \mathbf{x}_i \in C_k,\ \mathbf{x}_j \in C_k \\ \dfrac{1}{n} & \text{otherwise} \end{cases} \tag{7}$$

and

$$\mathbf{W}_{i,j}^{(w)} = \begin{cases} \dfrac{A_{i,j}}{n_k} & \text{if } \mathbf{x}_i \in C_k,\ \mathbf{x}_j \in C_k \\ 0 & \text{otherwise} \end{cases}, \tag{8}$$

where $A_{i,j} \in [0, 1]$ is the affinity between samples $\mathbf{x}_i$ and $\mathbf{x}_j$, which is large when the two samples are close to each other (Sugiyama, 2007).

The objective function for finding the local Fisher discriminant directions is defined as follows (Sugiyama, 2007):

$$\mathbf{J}_{\mathrm{LFDA}} = \mathop{\arg\max}_{\mathbf{J} \in \mathbb{R}^{m \times r}} \left[ \mathrm{tr}\left( \mathbf{J}^{\mathrm{T}}\mathbf{S}_b\mathbf{J}\,(\mathbf{J}^{\mathrm{T}}\mathbf{S}_w\mathbf{J})^{-1} \right) \right], \tag{9}$$

where r is the dimensionality reduction order in LFDA. This objective function is equivalent to the following generalized eigenvalue decomposition problem:

$$\mathbf{S}_b\tilde{\mathbf{p}} = \tilde{\lambda}\,\mathbf{S}_w\tilde{\mathbf{p}}, \tag{10}$$

where $\tilde{\lambda}$ and $\tilde{\mathbf{p}}$ are a generalized eigenvalue and its eigenvector, respectively. The generalized eigenvalues $\tilde{\lambda}$ are arranged in descending order as

$$\tilde{\lambda}_1 \ge \tilde{\lambda}_2 \ge \cdots \ge \tilde{\lambda}_m. \tag{11}$$

Then the LFDA transformation matrix $\mathbf{J}_{\mathrm{LFDA}}$ is given as (Sugiyama, 2007)

$$\mathbf{J}_{\mathrm{LFDA}} = \left[ \sqrt{\tilde{\lambda}_1}\,\tilde{\mathbf{p}}_1, \sqrt{\tilde{\lambda}_2}\,\tilde{\mathbf{p}}_2, \cdots, \sqrt{\tilde{\lambda}_r}\,\tilde{\mathbf{p}}_r \right], \tag{12}$$

where $\tilde{\mathbf{p}}_1, \cdots, \tilde{\mathbf{p}}_r$ are the generalized eigenvectors associated with the r largest generalized eigenvalues.

Finally, a mapped representation $\mathbf{Z} \in \mathbb{R}^{n \times r}$ in a low-dimensional space is calculated as follows:

$$\mathbf{Z} = \mathbf{X}\mathbf{J}_{\mathrm{LFDA}}. \tag{13}$$

For further details of the LFDA algorithm, see the article by Sugiyama (2007).
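The LFDA steps of Eqs. (5) to (11) and (13) can be sketched as follows. This is a hedged illustration rather than the implementation used in this study: the function name `lfda_fit` is hypothetical, the affinity uses the local-scaling heuristic with the seventh nearest neighbor (an assumption, since the affinity is not specified here), a small ridge term is added to S_w for numerical stability, and the directions are scaled by the square roots of the eigenvalues as in Sugiyama (2007). The toy data contain a bimodal fault class whose mean coincides with the normal class.

```python
import numpy as np

def lfda_fit(X, y, r, knn=7, reg=1e-6):
    """Sketch of LFDA; returns an (m, r) transformation matrix."""
    n, m = X.shape
    sq = np.sum(X**2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    Wb = np.full((n, n), 1.0 / n)       # "otherwise" branch of Eq. (7)
    Ww = np.zeros((n, n))               # "otherwise" branch of Eq. (8)
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        nk = idx.size
        d2 = dist2[np.ix_(idx, idx)]
        k = min(knn, nk - 1)
        # Local scale: distance to the k-th neighbor (position 0 is the sample itself).
        sigma = np.sqrt(np.sort(d2, axis=1)[:, k])
        A = np.exp(-d2 / (np.outer(sigma, sigma) + 1e-12))   # assumed affinity
        Wb[np.ix_(idx, idx)] = A * (1.0 / n - 1.0 / nk)      # Eq. (7), same class
        Ww[np.ix_(idx, idx)] = A / nk                        # Eq. (8), same class

    def scatter(W):
        # (1/2) sum_ij W_ij (x_i - x_j)(x_i - x_j)^T = X^T (D - W) X for symmetric W
        return X.T @ (np.diag(W.sum(axis=1)) - W) @ X

    Sb, Sw = scatter(Wb), scatter(Ww)
    Sw = Sw + reg * np.trace(Sw) / m * np.eye(m)             # ridge for stability

    # Solve Sb p = lambda Sw p (Eq. 10) by Cholesky whitening of Sw.
    L = np.linalg.cholesky(Sw)
    Linv = np.linalg.inv(L)
    M = Linv @ Sb @ Linv.T
    lam, V = np.linalg.eigh((M + M.T) / 2.0)
    order = np.argsort(lam)[::-1][:r]                        # descending, Eq. (11)
    P = Linv.T @ V[:, order]
    return P * np.sqrt(np.maximum(lam[order], 0.0))

# Toy data: the fault class has two clusters on either side of the normal class.
rng = np.random.default_rng(1)
normal = rng.normal([0.0, 0.0], 0.5, size=(60, 2))
fault_left = rng.normal([-4.0, 0.0], 0.5, size=(30, 2))
fault_right = rng.normal([4.0, 0.0], 0.5, size=(30, 2))
X = np.vstack([normal, fault_left, fault_right])
y = np.array([0] * 60 + [1] * 60)

J = lfda_fit(X, y, r=1)
Z = X @ J                                                    # Eq. (13)
print("first LFDA direction:", J[:, 0])
```

In this toy case the two class means nearly coincide, so a criterion based only on class means has little to work with, while the local weighting keeps the two fault clusters apart along the first variable.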

Sensor fault diagnosis index

To diagnose the fault type that a faulty measurement pertains to, several methods have been proposed in the literature. Pattern matching, such as the similarity factor between test discriminant vector and optimal discriminant vector calculated from historical data, has been proposed for fault diagnosis as well (Singhal and Seborg, 2006). A fault diagnosis method based on the Mahalanobis distance has also been applied to some industrial applications such as air handling units (Du and Jin, 2008). In fact, the Mahalanobis distance can be treated as a kind of T² statistic that can be used to evaluate the similarity between each test sample and fault types.

In this work, we propose to use a new sensor fault discriminant index $T_{\mathrm{LFDA},i}^2$, which is calculated in a way similar to the T² statistic in PCA as follows:

$$T_{\mathrm{LFDA},i}^2 = (\mathbf{z}_i - \boldsymbol{\mu}_i)^{\mathrm{T}}\boldsymbol{\Sigma}_i^{-1}(\mathbf{z}_i - \boldsymbol{\mu}_i), \tag{14}$$

where, by analogy with the Mahalanobis distance, $\mathbf{z}_i$ is the LFDA discriminant score vector of the sample evaluated against fault class i, and $\boldsymbol{\mu}_i$ and $\boldsymbol{\Sigma}_i$ are the mean vector and covariance matrix of the discriminant scores of class i.

The intuitive meaning of $T_{\mathrm{LFDA},i}^2$ is as follows. The lowest value of $T_{\mathrm{LFDA},i}^2$ indicates the occurrence of the fault type associated with subscript i. Therefore, we can calculate the values of $T_{\mathrm{LFDA},i}^2$ for the test data and then diagnose the faulty sensor as the one that corresponds to the lowest $T_{\mathrm{LFDA},i}^2$.
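The decision rule can be sketched in a few lines. The sketch below assumes that the class mean and covariance are estimated from training scores in the discriminant space and that a test sample is assigned to the class with the lowest index value; the helper names `fit_class_stats` and `diagnose`, the class labels, and the synthetic two-dimensional scores are illustrative only.

```python
import numpy as np

def fit_class_stats(Z_train, y_train):
    """Estimate the mean and inverse covariance of the scores of each class."""
    stats = {}
    for label in np.unique(y_train):
        Zk = Z_train[y_train == label]
        mu = Zk.mean(axis=0)
        cov = np.atleast_2d(np.cov(Zk, rowvar=False))
        stats[str(label)] = (mu, np.linalg.pinv(cov))
    return stats

def diagnose(stats, z):
    """Evaluate the index of Eq. (14) for every class and pick the lowest."""
    scores = {}
    for label, (mu, cov_inv) in stats.items():
        diff = z - mu
        scores[label] = float(diff @ cov_inv @ diff)
    return min(scores, key=scores.get), scores

# Synthetic discriminant scores for three hypothetical sensor conditions.
rng = np.random.default_rng(2)
Z_train = np.vstack([
    rng.normal([0.0, 0.0], 1.0, size=(50, 2)),   # normal
    rng.normal([6.0, 0.0], 1.0, size=(50, 2)),   # bias
    rng.normal([0.0, 6.0], 1.0, size=(50, 2)),   # drift
])
y_train = np.array(["normal"] * 50 + ["bias"] * 50 + ["drift"] * 50)

stats = fit_class_stats(Z_train, y_train)
label, scores = diagnose(stats, np.array([5.8, 0.3]))
print(label, {k: round(v, 2) for k, v in scores.items()})
```

Because every class uses its own covariance matrix, the index accounts for clusters that are elongated in different directions, which a plain Euclidean distance to the class mean would ignore.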

Procedure of the proposed method

The three main steps of the proposed procedure are shown in Fig. 1. First, the historical data are pretreated to handle missing data and outliers, and the treated data are then normalized for PCA modeling. Second, discriminant models based on conventional FDA and LFDA are generated with the aid of process knowledge, which is used to predefine the fault patterns needed in the final sensor fault diagnosis step. In this study, IAQ data collected from subway stations may be contaminated by sensor faults such as bias, drifting, and complete failure. At the same stage, a PCA model is built from the normal data for sensor fault detection, and the SPE and T² statistics are used to detect sensor faults. Finally, the two-dimensional Fisher direction and three-dimensional Fisher feature space graphs are used for fault visualization. Furthermore, the discriminant index $T_{\mathrm{LFDA}}^2$ for each sensor fault is calculated and used for the fault diagnosis analysis.

FIG. 1.

Overall flowchart of fault diagnosis of IAQ using LFDA. IAQ, indoor air quality; LFDA, local Fisher discriminant analysis.

Fault Diagnosis of Subway IAQ

IAQ data description

In this study, seven air pollutants (nitrogen monoxide [NO], nitrogen dioxide [NO₂], nitrogen oxides [NO_x], particulate matter with diameters <10 μm [PM₁₀] and <2.5 μm [PM_2.5], carbon monoxide [CO], and carbon dioxide [CO₂]) and two meteorological variables (temperature and humidity) were measured by minivolume air samplers and a telemonitoring system in a subway station in Seoul. The detection limits and the measurement accuracy of the air analyzers are given in Table 1 (Liu and Yoo, 2016). The IAQ data were collected in January 2010 with a sampling interval of 1 h (Fig. 2). The peak values shown in Fig. 2 are due to the rush hour effect in the subway station. All of the variables were scaled to zero mean and unit variance for PCA modeling. The normalized data were then divided into two data sets: a training data set used to build the off-line models and a test data set used to validate the accuracy of the trained models. The total number of IAQ measurements is 600, of which the first 400 were used as training data and the rest as test data.

FIG. 2.

Variations of the IAQ data obtained from a subway station in Seoul.

Table 1.

Detection Limits and Measurement Accuracy of Air Sampling Instruments

Device	Detection limit	Measurement accuracy
NO₂ analyzer (NA-623)	0.5 ppm (0–1 ppm)	Within ±1% of full span
PM₁₀ analyzer (SPM-613D)	less than ±1 μg/m³	Within ±0.5% of full span
PM_2.5 analyzer (SPM-613)	less than ±1 μg/m³	Within ±2% of full span
CO₂ analyzer (NDIR gas analyzer)	0.1 ppm (0–5,000 ppm)	Within ±1% of full span

Discriminant model development using LFDA

To evaluate the fault diagnosis performance of the FDA and LFDA methods, three types of subway sensor faults were introduced into the normally collected IAQ data, as shown in Fig. 3a–c. Each sensor fault spans 100 samples and influences only one IAQ sensor. For the training data listed in Table 2, the first sensor fault is a PM₁₀ bias with a bias term of 150 μg/m³; the second is a drift in the temperature sensor from samples 201 to 300 with a drifting rate of 0.1°C/h; the third is a complete failure of the PM₁₀ sensor with a constant fault size of 200 μg/m³.
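The three training faults can be simulated along the following lines. This is a sketch under stated assumptions: the helper names are hypothetical, the constant baseline signals of 60 and 20 are placeholders rather than measured values, the drift is taken to grow linearly by the drifting rate at each 1 h sample, and the complete failure is interpreted as the reading being stuck at a constant value.

```python
import numpy as np

def inject_bias(signal, start, stop, bias):
    """Add a constant offset to samples start..stop-1 (0-based)."""
    out = signal.astype(float).copy()
    out[start:stop] += bias
    return out

def inject_drift(signal, start, stop, rate):
    """Add a linearly growing offset of `rate` per sample."""
    out = signal.astype(float).copy()
    out[start:stop] += rate * np.arange(1, stop - start + 1)
    return out

def inject_complete_failure(signal, start, stop, value):
    """Replace the reading by a constant value (assumed stuck-at behavior)."""
    out = signal.astype(float).copy()
    out[start:stop] = value
    return out

# Placeholder baselines for the 400 training samples (not measured data).
pm10 = np.full(400, 60.0)
temperature = np.full(400, 20.0)

# Fault periods from Table 2, converted from 1-based samples to 0-based slices.
pm10_bias = inject_bias(pm10, 100, 200, 150.0)                       # samples 101-200
temperature_drift = inject_drift(temperature, 200, 300, 0.1)         # samples 201-300
pm10_failure = inject_complete_failure(pm10, 300, 400, 200.0)        # samples 301-400

print(pm10_bias[99], pm10_bias[100])
print(temperature_drift[200], temperature_drift[299])
print(pm10_failure[299], pm10_failure[300])
```

With a 1 h sampling interval, a drift of 0.1°C/h accumulates to 10°C over the 100 faulty samples, which explains why a drifting fault is hard to detect at its onset and obvious near its end.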

FIG. 3.

IAQ sensor faults: (a) PM₁₀ bias fault in the training data set, (b) temperature drifting fault in the training data set, (c) PM₁₀ complete failure fault in the training data set, (d) PM₁₀ bias fault in the test data set, and (e) temperature drifting fault in the test data set.

Table 2.

Indoor Air Quality Sensor Faults Considered

	Normal	Bias	Drifting	Complete failure
Training data
Faulty sensor		PM₁₀	Temperature	PM₁₀
Fault size		150 μg/m³	0.1°C/h	200 μg/m³
Fault occurrence period	0–100	101–200	201–300	301–400
Test data
Faulty sensor		PM₁₀	Temperature
Fault size		100 μg/m³	0.3°C/h
Fault occurrence period	401–450	451–500	501–600

The isolation results obtained using the FDA and LFDA methods are compared in terms of three-dimensional and two-dimensional graphs in Figs. 4 and 5, respectively. FDA shows poor isolation capability for the four classes representing IAQ sensor conditions, which include one normal condition and three faulty conditions. In Fig. 4, most of the data points in the three-dimensional and two-dimensional graphs are mixed together, which makes FDA unsuitable for isolating IAQ sensor faults. In contrast, LFDA separates the faulty data correctly both in the three-dimensional local Fisher feature space and in the projected two-dimensional directions shown in Fig. 5.

FIG. 4.

FDA discriminant model illustrated (a) in three-dimensional Fisher feature space and (b) on the first and second directions.

FIG. 5.

LFDA discriminant model illustrated (a) in three-dimensional local Fisher feature space and (b) on the first and second directions.

The 95% elliptical confidence regions for the four IAQ classes obtained using the LFDA method are also shown in Fig. 5b. The elliptical confidence regions were used here to analyze the fault diagnosis, or classification, performance. Only the PM₁₀ sensor bias and PM₁₀ sensor complete failure faults overlap partially, which decreases the isolation performance between these two IAQ sensor faults. For the PM₁₀ sensor bias fault, 7 of the 100 samples fall in the intersection of the two elliptical confidence regions and are misclassified as the PM₁₀ sensor complete failure fault, giving a misclassification rate of 7%. For the PM₁₀ sensor complete failure fault, 12 of the 100 samples are misclassified as the PM₁₀ sensor bias fault, giving a misclassification rate of 12%. In summary, LFDA separates the normal and faulty conditions of IAQ sensors more effectively than FDA.
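The misclassification rates quoted above are per-class error fractions. A minimal sketch of the calculation is given below; the function name `misclassification_rates` and the label strings are hypothetical, and the label vectors are constructed only to reproduce the counts stated in the text (7 of 100 and 12 of 100).

```python
import numpy as np

def misclassification_rates(y_true, y_pred):
    """Fraction of samples of each true class that received another label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return {
        str(label): float(np.mean(y_pred[y_true == label] != label))
        for label in np.unique(y_true)
    }

# 100 bias samples and 100 complete-failure samples, as in the training data.
y_true = np.array(["bias"] * 100 + ["complete"] * 100)
y_pred = y_true.copy()
y_pred[:7] = "complete"      # 7 bias samples assigned to complete failure
y_pred[100:112] = "bias"     # 12 complete-failure samples assigned to bias

rates = misclassification_rates(y_true, y_pred)
print(rates)
```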

IAQ sensor faults detection and diagnosis for test data set

A statistical PCA model was developed using the first 100 samples of the measured IAQ data (Table 2). The method of calculating unreconstructed variances for best reconstruction (Qin and Dunia, 2000) was used to determine the optimal number of principal components, and three principal components were chosen because they gave the lowest unreconstructed variance. The first three principal components, corresponding to the three largest eigenvalues, explain 85.40% of the total variance of the system and were therefore retained in the PCA model.
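As a rough illustration of how the explained-variance figure is obtained, the sketch below computes the cumulative explained variance from the eigenvalues of the correlation matrix. It does not implement the unreconstructed-variance criterion of Qin and Dunia (2000); it only shows the simpler cumulative-variance check. The synthetic data (100 samples, 8 variables, 3 latent factors) and the function name are assumptions made for the example.

```python
import numpy as np

def cumulative_explained_variance(X):
    """Eigenvalues of the correlation matrix (descending) and their cumulative share."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # autoscale each variable
    eigvals = np.linalg.eigvalsh(np.cov(Xs, rowvar=False))[::-1]
    return eigvals, np.cumsum(eigvals) / eigvals.sum()

rng = np.random.default_rng(0)
# Hypothetical training data: 3 latent factors observed through 8 noisy variables.
latent = rng.normal(size=(100, 3))
X = latent @ rng.normal(size=(3, 8)) + 0.5 * rng.normal(size=(100, 8))

eigvals, cumulative = cumulative_explained_variance(X)
for k, frac in enumerate(cumulative, start=1):
    print(f"{k} components: {100 * frac:.2f}% of total variance")
```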

To test the diagnosing performance of the LFDA method, a test data set containing one normal data set and two IAQ sensor fault data sets was used (Fig. 3d, e and Table 2). The developed PCA model was first applied for IAQ fault detection. The PCA detection results in Fig. 6 show mainly three distinct regions related to the sensor faults mentioned above. The SPE statistic achieves a higher fault detection rate than the T² statistic. The PM₁₀ bias sensor fault is detected by the SPE statistic without any time delay, whereas the T² statistic fails to detect it. For the temperature drifting sensor fault, the SPE statistic also performs better than the T² statistic. A detection delay of 15 samples is observed in the monitoring chart, mainly because the temperature drifting sensor fault develops slowly.

FIG. 6.

Process monitoring results of T² statistic (top figure) and SPE statistic (bottom figure) for the PM₁₀ bias and the temperature drifting sensor faults. SPE, squared prediction error.
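The two monitoring statistics can be sketched as follows. This is a minimal example under stated assumptions, not the configuration used in the study: the data are synthetic, the bias magnitude is arbitrary, and the control limits are taken as empirical 99th percentiles of the training statistics rather than the theoretical limits. All function and variable names are hypothetical.

```python
import numpy as np

def fit_pca(X_train, n_components):
    """Fit an autoscaled PCA model and keep the leading components."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Xs = (X_train - mean) / std
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xs, rowvar=False))
    keep = np.argsort(eigvals)[::-1][:n_components]
    return mean, std, eigvecs[:, keep], eigvals[keep]

def t2_and_spe(X, mean, std, loadings, eigvals):
    """Hotelling T2 (variation inside the model) and SPE (squared residual)."""
    Xs = (X - mean) / std
    scores = Xs @ loadings
    t2 = np.sum(scores**2 / eigvals, axis=1)
    residual = Xs - scores @ loadings.T
    spe = np.sum(residual**2, axis=1)
    return t2, spe

rng = np.random.default_rng(1)
mixing = rng.normal(size=(3, 8))  # hypothetical: 3 latent factors, 8 sensors

def simulate(n):
    return rng.normal(size=(n, 3)) @ mixing + 0.1 * rng.normal(size=(n, 8))

X_train, X_normal, X_fault = simulate(100), simulate(50), simulate(50)
X_fault[:, 0] += 2.0  # hypothetical bias fault on the first sensor

model = fit_pca(X_train, n_components=3)
t2_train, spe_train = t2_and_spe(X_train, *model)
# Empirical 99th-percentile limits from training data (an assumption of this sketch).
t2_limit, spe_limit = np.percentile(t2_train, 99), np.percentile(spe_train, 99)

for name, X in [("normal", X_normal), ("bias", X_fault)]:
    t2, spe = t2_and_spe(X, *model)
    print(name, "T2 alarm rate:", np.mean(t2 > t2_limit), "SPE alarm rate:", np.mean(spe > spe_limit))
```

A bias on a single sensor breaks the correlation structure among the variables and therefore appears mainly in the residual, which is one reason SPE can respond to a fault that T² misses.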

The dimensionality reduction order r of LFDA has a significant effect on the results of IAQ sensor fault diagnosis. Table 3 lists the misclassification rates for different LFDA orders on the test data containing the PM₁₀ bias fault and the temperature drifting fault. For orders from 2 to 7, the misclassification rates of the temperature drifting fault are much lower than those of the PM₁₀ bias fault. The misclassification rate of the PM₁₀ bias fault drops sharply from 94% to 20% when r increases from 4 to 5. The minimum overall misclassification rate is achieved at r = 8, which is used in the following analysis. After the sensor fault detection step by PCA, the IAQ sensor fault discriminant index $T_{\mathrm{LFDA}}^2$ shown in Fig. 7 can be used to identify the type of sensor fault associated with each faulty test sample. At each test sample point there are four $T_{\mathrm{LFDA}}^2$ values, one for each of the four IAQ sensor conditions, and the lowest value indicates the condition diagnosed by the LFDA fault discriminant model. For samples 451 to 500, the PM₁₀ bias sensor fault is correctly classified, as shown by the bottom curve in Fig. 7. For samples 501 to 600, the temperature drifting sensor fault is accurately diagnosed except for two sample points.

FIG. 7.

IAQ sensor fault discriminant index $T_{\mathrm{LFDA}}^2$ for the three predefined IAQ sensor faults.
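The decision rule described above, in which the class with the lowest index value is selected, can be sketched as below. The exact definition of $T_{\mathrm{LFDA}}^2$ is given in the methods part of the article; this sketch assumes a Mahalanobis-type distance between a projected sample and each class in the discriminant space. The class centres, the three-dimensional space, and all names are hypothetical.

```python
import numpy as np

def fit_class_models(Z, labels):
    """Per-class mean and inverse covariance of the projected training scores."""
    models = {}
    for k in np.unique(labels):
        Zk = Z[labels == k]
        models[str(k)] = (Zk.mean(axis=0), np.linalg.inv(np.cov(Zk, rowvar=False)))
    return models

def discriminant_index(z, models):
    """One Mahalanobis-type index value per sensor condition (assumed form)."""
    return {k: float((z - mu) @ s_inv @ (z - mu)) for k, (mu, s_inv) in models.items()}

def diagnose(z, models):
    """Select the condition with the lowest index value."""
    index = discriminant_index(z, models)
    return min(index, key=index.get)

rng = np.random.default_rng(2)
# Hypothetical class centres in a 3-D discriminant space.
centres = {
    "normal": [0.0, 0.0, 0.0],
    "pm10_bias": [5.0, 0.0, 0.0],
    "pm10_complete": [0.0, 5.0, 0.0],
    "temp_drift": [0.0, 0.0, 5.0],
}
Z = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 3)) for c in centres.values()])
labels = np.repeat(list(centres.keys()), 100)

models = fit_class_models(Z, labels)
sample = np.array([4.8, 0.2, -0.1])  # hypothetical projected test sample
print(discriminant_index(sample, models))
print("diagnosed condition:", diagnose(sample, models))
```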

Table 3.

The Misclassification Rates with Respect to Dimensionality Reduction Order of Local Fisher Discriminant Analysis

LFDA order (r)	2	3	4	5	6	7	8
PM₁₀ bias fault (%)	92	92	94	20	22	16	2
Temperature drifting fault (%)	0	0	4	4	4	1	2
Overall (%)	30.7	30.7	34	9.3	10	6	2

LFDA, local Fisher discriminant analysis.
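The overall rates in Table 3 are consistent with a sample-weighted average of the two per-fault rates, taking 50 PM₁₀ bias samples and 100 temperature drifting samples as in the test data. The short check below reproduces the last row of the table under that assumption; the variable names are illustrative.

```python
# Per-fault misclassification rates (%) copied from Table 3.
orders = [2, 3, 4, 5, 6, 7, 8]
pm10_bias_rate = [92, 92, 94, 20, 22, 16, 2]
temp_drift_rate = [0, 0, 4, 4, 4, 1, 2]

# Number of faulty test samples per fault type (50 bias, 100 drifting).
n_pm10, n_temp = 50, 100

# Sample-weighted overall rate for each LFDA order.
overall = [
    round((p * n_pm10 + t * n_temp) / (n_pm10 + n_temp), 1)
    for p, t in zip(pm10_bias_rate, temp_drift_rate)
]
best_order = orders[overall.index(min(overall))]

for r, rate in zip(orders, overall):
    print(f"r = {r}: overall misclassification {rate}%")
print("order with the lowest overall rate:", best_order)
```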

Alternatively, Fig. 8 provides a more intuitive way to diagnose the IAQ sensor faults. The three considered patterns of IAQ sensor faults are shown on the y-axis, and the x-axis represents the faulty samples detected by the PCA monitoring statistics. The fault diagnosis results using LFDA are satisfactory. Only 1 of 50 samples is misclassified as the normal condition when the PM₁₀ bias sensor fault occurs. For the temperature drifting sensor fault, 2 of 100 samples are not diagnosed correctly; both are misclassified as the normal condition, which corresponds to a 2% Type-II error. No samples are diagnosed as PM₁₀ complete sensor failure, which indicates that the LFDA approach performs accurately with respect to this kind of IAQ sensor fault.

FIG. 8.

Fault diagnosis results using LFDA.

Conclusions

To maintain a stable and reliable monitoring system for indoor air pollutants, a data-driven FDD method was developed in this article to monitor and diagnose sensor faults in the subway IAQ management system. The method combines PCA with LFDA. The PCA monitoring statistics, SPE and T², are used to detect three types of IAQ sensor faults. The detected faulty data are then diagnosed using two- and three-dimensional visualization graphs of FDA and LFDA. Comparison of the visualization results shows that LFDA separates multiple IAQ sensor faults more effectively than FDA. To determine quantitatively which IAQ sensor fault has occurred, the discriminant index $T_{\mathrm{LFDA}}^2$ is proposed and applied to test data in which normal, bias, and drifting signals exist. The diagnosis results are satisfactory, and the minimum overall misclassification rate for the test data is only 2%.

Acknowledgments

This study was supported by the Foundation of Nanjing Forestry University (No. 163105996), Open Fund of State Key Laboratory of Pulp and Paper Engineering (Nos. 201813 and 201610), Open Fund of Jiangsu Provincial Key Lab of Pulp and Paper Science and Technology (No. 201530), and the National Research Foundation of Korea (NRF) grant funded by the Ministry of Science and ICT (No. 2017R1E1A1A03070713).

Author Disclosure Statement

No competing financial interests exist.

References

1. [...], and Jin (2008). Multiple faults diagnosis for sensors in air handling unit using Fisher discriminant analysis. Energy Convers. Manage., 49, 3654.

2. Gao, Cecati, and Ding S.X. (2015). A survey of fault diagnosis and fault-tolerant techniques—Part I: Fault diagnosis with model-based and signal-based approaches. IEEE Trans. Ind. Electron., 62, 3757.

3. [...], Zhong, and Zhang (2016). Semisupervised kernel learning for FDA model and its application for fault classification in industrial processes. IEEE Trans. Ind. Inform., 12, 1403.

4. [...] Q.P., Qin S.J., and Wang (2005). A new fault diagnosis method using fault directions in Fisher discriminant analysis. AIChE J., 51, 555.

5. Jiang, Zhu, Huang, Paulson J.A., and Braatz R.D. (2015). A combined canonical variate analysis and Fisher discriminant analysis (CVA-FDA) approach for fault diagnosis. Comput. Chem. Eng., 77, 1.

6. Kim, Liu, Kim J.T., and Yoo (2013). Sensor fault identification and reconstruction of indoor air quality (IAQ) data using a multivariate non-Gaussian model in underground building space. Energy Build., 66, 384.

7. Kim, Liu, Kim J.T., and Yoo (2014). Evaluation of passenger health risk assessment of sustainable indoor air quality monitoring in metro systems based on a non-Gaussian dynamic sensor validation method. J. Hazard. Mater., 278, 124.

8. Kim, Kim, Lim, Kim J.T., and Yoo (2010b). Predictive monitoring and diagnosis of periodic air pollution in a subway station. J. Hazard. Mater., 183, 448.

9. Kim Y.-S., Kim J.T., Kim I.-W., Kim J.-C., and Yoo (2010a). Multivariate monitoring and local interpretation of indoor air quality in Seoul's metro system. Environ. Eng. Sci., 27, 721.

10. Kwon S.-B., Jeong, Park, Kim K.-T., and Cho K.H. (2015). A multivariate study for characterizing particulate matter (PM10, PM2.5, and PM1) in Seoul metropolitan subway stations, Korea. J. Hazard. Mater., 297, 295.

11. Lee, Liu, Kim J.T., and Yoo (2014). Online monitoring and interpretation of periodic diurnal and seasonal variations of indoor air pollutants in a subway station using parallel factor analysis (PARAFAC). Energy Build., 68, 87.

12. Liu, Kang, Kim, Oh, Lee, Kim J.T., and Yoo (2013a). Sustainable monitoring of indoor air pollutants in an underground subway environment using self-validating soft sensors. Indoor Built Environ., 22, 94.

13. Liu, Kim, Kang, Sankararao, Kim J.-C., and Yoo C.K. (2012). Sensor validation for monitoring indoor air quality in a subway station. Indoor Built Environ., 21, 205.

14. Liu, Lee, Kim, Shi, Kim J.T., Wasewar K.L., and Yoo (2013b). Multi-objective optimization of indoor air quality control and energy consumption minimization in a subway ventilation system. Energy Build., 66, 553.

15. Liu, and Yoo (2016). A robust localized soft sensor for particulate matter modeling in Seoul metro systems. J. Hazard. Mater., 305, 209.

16. Park D.-U., and Ha K.-C. (2008). Characteristics of PM10, PM2.5, CO2 and CO monitored in interiors and platforms of subway train in Seoul, Korea. Environ. Int., 34, 629.

17. Qin S.J. (2012). Survey on data-driven industrial process monitoring and diagnosis. Annu. Rev. Control, 36, 220.

18. Qin S.J., and Dunia (2000). Determining the number of principal components for best reconstruction. J. Process Control, 10, 245.

19. Qin S.J., and Zheng (2013). Quality-relevant and process-relevant fault monitoring with concurrent projection to latent structures. AIChE J., 59, 496.

20. Singhal, and Seborg D.E. (2006). Evaluation of a pattern matching method for the Tennessee Eastman challenge process. J. Process Control, 16, 601.

21. Sugiyama (2007). Dimensionality reduction of multimodal labeled data by local fisher discriminant analysis. J. Mach. Learn. Res., 8, 1027.

22. [...] (2011). Localized Fisher discriminant analysis based complex chemical process monitoring. AIChE J., 57, 1817.

23. Zhao, and Gao (2015). A nested-loop fisher discriminant analysis algorithm. Chemom. Intell. Lab. Syst., 146, 396.

24. Zhong, Wen, and Ge (2014). Semi-supervised Fisher discriminant analysis model for fault classification in industrial processes. Chemom. Intell. Lab. Syst., 138, 203.