Atmospheric SO₂ pollutant prediction using mutual information based TCNN-GRU model for flue gas desulfurization process

Abstract

Over the past several years, sulfur dioxide (SO₂) has raised growing concern in China owing to its adverse impact on the atmosphere and the human respiratory system. The major contributor to SO₂ emissions is flue gas generated by fossil-fired electricity-generating plants, and as a consequence diverse flue gas desulphurization (FGD) techniques are installed to abate SO₂ emissions. However, FGD is a dynamic process with serious nonlinearity and large time delay, which makes FGD process modeling a formidable problem. In this study, a novel hybrid deep learning model that combines a temporal convolution neural network (TCNN), a gated recurrent unit (GRU) and the mutual information (MI) technique is proposed to predict SO₂ emissions in an FGD process. Among these techniques, MI is applied to select the variables that are best suited for SO₂ emission prediction, while TCNN and GRU are integrated to capture the dynamics of SO₂ emission in the FGD process. A real FGD system in a power plant with a 1000 MW coal-fired unit is used as a case study for SO₂ emission prediction. Experimental results show that the proposed approach offers satisfactory performance in predicting SO₂ emissions for the FGD process, and outperforms the other predictive methods used for comparison in terms of different performance indicators.

Keywords

SO₂ emission prediction; flue gas desulphurization; neural network; deep learning; mutual information

1. Introduction

Emitted from coal-fired thermal power stations, SO₂ has detrimental effects on human health and the atmosphere [1, 2]. To abate SO₂ emissions, governments of different countries have implemented stringent regulations on the SO₂ concentration of emitted flue gas. In China, the ultralow SO₂ emission concentration limit of 35 mg·m⁻³ must be met [3], which has brought various post-combustion desulphurization technologies into use in recent years. Among them, wet FGD is the most widely used technology due to its outstanding SO₂ absorption performance and low operating cost. The wet FGD system is a complex industrial system incorporating numerous subsystems (e.g., slurry preparation system, wastewater treatment system), and the desulphurization process is thus characterized by strongly nonlinear dynamics and high time delay, which makes accurate description of an FGD process an especially challenging task.

As accurate modeling of the FGD process provides the basis for prediction and control of SO₂ emissions, there has been considerable interest in developing different modeling approaches to characterize the FGD process. These approaches are basically categorized into two main types: mechanism-based models (MBMs) and black-box type models (BBMs). MBMs are derived from a thorough understanding of the underlying mechanisms that govern the dynamic characteristics of the FGD process. The double-film theory and the penetration theory are the two main methods for modeling a flue gas desulfurization process, and both appear frequently in the literature. Ref. [4] suggested an FGD model on the basis of double-film theory and then explored the relationship between influencing factors (e.g., fly ash content, velocity of inlet flue gas) and SO₂ removal efficiency. The analysis provides guidance for finding the optimal operating parameters, and experimental results indicated that the SO₂ removal efficiency can reach up to 99.69% with the derived FGD process parameters. As in [4], Ref. [5] utilized the double-film theory to model the heat transfer and mass transfer processes of desulfurization slurry droplets. In addition, a comprehensive investigation of the factors influencing the FGD process was performed based upon the developed model. Ref. [6] proposed a dynamic FGD process model using penetration theory, in which influencing parameters including droplet velocity, nozzle number and SO₂ mass transfer coefficient are considered, and historical data from a real FGD process are used to verify the effectiveness of the developed model.

As compared to MBMs, BBMs eliminate the need to thoroughly understand the complicated chemical mechanism of an FGD process, and can obtain the underlying relationship among different process variables directly from available measurements. Since flue gas desulfurization is an inherently nonlinear process, artificial neural networks, as nonlinearly parameterized function approximators, have been extensively used for modeling it (see [7, 8, 9, 10, 11, 12, 13, 14]). Ref. [7] developed a hybrid modeling method to describe the FGD process. The method consists of two parts, an artificial neural network and a mechanism-based model, where the network provides an initial estimate of SO₂ emissions while the mechanism-based model compensates for the resulting deviation. As compared with [7], more predictor variables are involved in constructing the neural FGD model in [8]: a total of nine influencing variables are taken as model inputs to provide an accurate SO₂ emission prediction, and the developed model is useful for determining optimal operating ranges for process parameters. Unlike [7] and [8], Ref. [9] focused on the training algorithm of the network used for FGD process modeling. It proposed to use the Levenberg-Marquardt algorithm with Bayesian regularization to carry out supervised learning of a neural network, and the resulting neural model is employed for FGD process modeling. Experimental results reveal that outstanding prediction performance is obtained with the trained model. Similar to [9], Ref. [10] trained a multi-layered neural network with five selected FGD process variables (i.e., nozzle diameter, nozzle number, gas velocity, SO₂ pollutant content and liquid flow rate) using various training algorithms, and the performance of the different training algorithms is compared and discussed. Ref. [11] designed a dynamic neural network (DNN) model to predict the outlet SO₂ concentration of an FGD process, and the DNN achieves the best performance among the experimental models (e.g., ordinary least squares, random forest). Apart from neural networks, other machine learning models also find use in FGD process modeling tasks. A number of linear regression models are developed in [12] to perform hourly prediction of SO₂ emission concentration, and the models show a good fit to the observed data. Ref. [13] differs from [12] in that heterogeneous rather than homogeneous techniques are utilized to model the FGD process. More specifically, the least squares regression method is combined with a support vector machine to predict the SO₂ removal efficiency of a desulfurization process; experimental results suggest that the proposed model shows outstanding predictive performance, with a determination coefficient as high as 0.986. An SO₂ emission predictive model that hybridizes support vector regression and a random forest model is created in [14], and the predicted results indicate that the proposed model achieves better performance by combining the advantages of multiple individual models.

As compared to conventional machine learning models, deep learning (DL) techniques are more advantageous for capturing complex dynamics in temporal sequences, and thus have been extensively applied in many fields. Ref. [15] proposed a deep neural model based upon a recurrent neural network and a variational autoencoder for early detection of faults in a motor. Ref. [16] integrated a deep long short-term memory (LSTM) network with the uniform manifold approximation and projection technique for fire status prediction, which is of considerable practical importance since miners can take precautionary measures according to the prediction results. A novel deep adversarial transfer learning network is developed in [17] for machine fault diagnosis, where newly occurring faults can be recognized with high accuracy. A deep sequence-to-sequence LSTM model is proposed in [18] to predict NOₓ in a selective catalytic reduction system, and data mining methods are incorporated to select more relevant variables so as to further enhance model performance. A deep GRU based neural model is employed in [19] to capture temporal dependencies in sequences of security alerts, and the established model can be further utilized to predict network intrusion alerts from different sources. Ref. [20] developed a combined deep learning model, which hybridizes the convolution neural network with the LSTM model, to predict water quality variables, and satisfactory prediction performance is achieved with the proposed model. DL techniques have also begun to be applied to predict SO₂ emissions in an FGD process. Refs. [21, 22] respectively utilized deep GRU and LSTM to develop a predictive model for the SO₂ emission concentration, and the desired prediction results were obtained. An integrated SO₂ predictive model that hybridizes LSTM with auto-regression features was designed in [23], and its efficacy was verified by historical measurements of the studied FGD process.
Based on an LSTM network and high volumes of environmental measurements, a comprehensive prediction model that incorporates multiple indices of supervised learning is designed in [24], and experimental results demonstrate its superiority over other baseline models. In [25], a deep BP network model was optimized by a particle swarm algorithm and used for SO₂ emission prediction in flue gas; the developed model shows fast training speed and superior prediction performance. The studied industrial FGD is a typical chemical process with highly nonlinear dynamics and complexity, which places high requirements on the dynamics learning capability of the model. However, most of the existing studies applied either static machine learning models (e.g., feed-forward neural network, linear regression model) or a standalone deep learning model to predict SO₂ emissions for an FGD process, making it difficult to adequately capture process dynamics and achieve outstanding prediction performance. To fill this gap, an advanced deep learning model, TCNN, is integrated with GRU to perform SO₂ emission prediction for an FGD process. The newly proposed TCNN-GRU model is advantageous for characterizing complicated dynamics in an FGD process by combining the advantages of the two modules, and this is the first study to employ a mutual information based TCNN-GRU hybrid technique in predicting SO₂ emissions for an FGD process. In our hybrid model, TCNN is effective in extracting useful dynamical properties existing in a process; in addition, it has the advantages of stable gradients, low memory requirement for training and fewer parameters. GRU, in turn, has a powerful ability to capture long-term temporally dependent features within the sequences extracted by TCNN. Furthermore, the mutual information technique is introduced to select the features most relevant to SO₂ emission prediction, which can further improve the predictive capability of the model.

In our study, a novel modeling approach is proposed to address the modeling problem in a flue gas desulphurization process. This work combines multiple concepts and techniques, including SO₂ emission prediction, flue gas desulphurization, neural networks, deep learning and mutual information. More specifically, the mutual information technique is combined with two deep learning models, the temporal convolution network and the gated recurrent unit network, to develop an integrated model for predicting SO₂ emissions in a flue gas desulphurization process. This study is structured as follows. The “FGD process description” section provides a comprehensive description of the wet FGD process under study. In the “Methodology” section, we give a general description of the related techniques utilized in our method. The design of the proposed modeling framework is detailed in the “Proposed modeling approach” section. The “Experiments and analysis” section tests the effectiveness of our approach through experiments designed based upon measurements of a real FGD process. Conclusions are drawn in the “Conclusion” section.

2. Methodology

This section describes the FGD process and the theoretical fundamentals of the proposed prediction model, namely mutual information feature selection, TCNN and the GRU network.

2.1 FGD process description

SO₂ is the main atmospheric pollutant that negatively affects both human health and the environment; for instance, it can cause respiratory diseases and acidification of crops. Coal-fired power stations are the major source of SO₂ emissions, and as a consequence emission standards for SO₂ have been increasingly tightened all over the world. Currently, various SO₂ removal technologies are available to reduce the sulfur dioxide content of flue gas emitted from coal-fired power stations. Among them, the most popular is wet FGD, whose main advantages are high SO₂ removal efficiency, low operating cost and reliability. In the wet FGD process, the sulfur-bearing gas comes into contact with the limestone (CaCO₃) absorbing slurry, and SO₂ is removed by reaction with the absorbent $\text{CaCO}_{3}$. The overall chemical reaction is written as

$\displaystyle\text{SO}_{2(\text{g})}+\text{CaCO}_{3(\text{s})}+2\text{H}_{2}\text{O}_{(\text{aq})}=\text{CaSO}_{3}\cdot 2\text{H}_{2}\text{O}_{(\text{aq})}+\text{CO}_{2(\text{g})}$ (1)

The reaction product, calcium sulphite dihydrate ($\text{CaSO}_{3}\cdot 2\text{H}_{2}\text{O}$), is then oxidized by the injected oxygen, and calcium sulphate dihydrate ($\text{CaSO}_{4}\cdot 2\text{H}_{2}\text{O}$), known as gypsum, is produced.

$\displaystyle\text{CaSO}_{3}\cdot 2\text{H}_{2}\text{O}_{(\text{aq})}+1/2\,\text{O}_{2(\text{g})}\leftrightarrow\text{CaSO}_{4}\cdot 2\text{H}_{2}\text{O}_{(\text{aq})}$ (2)

Figure 1.

Schematic illustration of the wet FGD process.

The FGD system is schematically presented in Fig. 1. Limestone is first pulverized by a ball mill and mixed with water in a slurry preparation tank to form the desulfurizing agent, and the produced limestone slurry is then pumped to the storage tank. The desulfurization tower plays an essential role in an FGD process: the boiler exit gas is fed from the lower part of the tower while the limestone absorbent is sprayed from the top by sprayers, and the countercurrent contact of desulfurizer and flue gas ensures an adequate chemical reaction. The last stage concerns the disposal of the end chemical product. Specifically, the gypsum slurry filtrate is dewatered through a hydrocyclone and the gypsum ($\text{CaSO}_{4}\cdot 2\text{H}_{2}\text{O}$) is finally made available, which finds extensive use in the cement industry and road construction.

2.2 Mutual information feature selection

FGD is a multivariable process in which multiple variables are measured by different types of sensors, each of which can be regarded as a candidate variable for predicting SO₂ emissions. As a result, the dimensionality of the feature space becomes prohibitively high, which leads to over-fitting and high computational complexity when developing an SO₂ emission predictive model. It is therefore essential to eliminate redundant and irrelevant features so as to enhance the prediction performance. The feature relevance criterion plays a significant part in feature selection (FS), and mutual information, as an information theoretic criterion, has two main advantages. First, it can measure any type of association between two features, both linear and nonlinear. Second, MI is invariant under invertible and differentiable transformations of the feature space. In information theory, entropy is a key concept used to quantify the information content of a feature. Let $M$ be a random variable; its entropy is calculated by

$\displaystyle H(M)=-\sum_{m\in M}p(m)\log p(m)$ (3)

with $p(m)$ being the probability density function (PDF) of $M$. Assuming that a variable $N$ is observed, the conditional entropy can be represented as

$\displaystyle H(M|N)=-\sum_{n\in N}\sum_{m\in M}p(m,n)\log p(m|n)$ (4)

where $p(m,n)$ and $p(m|n)$ respectively denote the joint probability of $M$ and $N$ and the posterior probability of $M$ given $N$. The mutual information $I(M,N)$ is then derived as follows:

$\displaystyle I(M,N)=H(M)-H(M|N)=\sum_{n\in N}\sum_{m\in M}p(m,n)\log\frac{p(m,n)}{p(m)p(n)}$ (5)

From Eq. (5), the $I(M,N)$ score is high if a close relation exists between $M$ and $N$, while it equals zero when the two variables are independent. In our study, the outlet SO₂ concentration of the FGD process is the target variable to be predicted, while 10 other process variables that influence it are candidate predictor variables; all of them are summarized in Table 1. To determine the variables that are best suited for SO₂ emission prediction, a two-step procedure is performed. First, select the variables with high MI scores with respect to the predicted variable, which can be regarded as variables of high relevance for the prediction task. Second, among the selected variables, search for those with similar MI scores, which implies that a close relation exists among them; some of these are then regarded as redundant and are eliminated to reduce the computational complexity [26]. The above process is graphically depicted in Fig. 2.
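As an illustration of Eq. (5), the following sketch estimates the MI score of two sampled variables from a joint histogram. The equal-width binning, the bin count of 16, the function name `mutual_information` and the synthetic test signals are all assumptions made for this example; the estimator actually used in the study is not specified here.

```python
import numpy as np

def mutual_information(m, n, bins=16):
    """Histogram (plug-in) estimate of I(M, N) in nats, following Eq. (5)."""
    joint, _, _ = np.histogram2d(m, n, bins=bins)
    p_mn = joint / joint.sum()              # joint probability p(m, n)
    p_m = p_mn.sum(axis=1, keepdims=True)   # marginal p(m), shape (bins, 1)
    p_n = p_mn.sum(axis=0, keepdims=True)   # marginal p(n), shape (1, bins)
    nonzero = p_mn > 0                      # empty cells contribute nothing
    ratio = p_mn[nonzero] / (p_m @ p_n)[nonzero]
    return float(np.sum(p_mn[nonzero] * np.log(ratio)))

# Synthetic data (not plant measurements): one nonlinear relation, one unrelated pair
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y_dependent = x ** 2 + 0.1 * rng.normal(size=5000)
y_independent = rng.normal(size=5000)

mi_dependent = mutual_information(x, y_dependent)
mi_independent = mutual_information(x, y_independent)
print(mi_dependent, mi_independent)
```

The quadratic relation has almost no linear correlation, yet its estimated MI score is clearly larger than that of the unrelated pair, which reflects the first advantage of MI noted above. The estimate for the unrelated pair is small but not exactly zero because of finite-sample bias.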

Table 1

Preliminary predictor variables used for SO₂ emission prediction

No.	Feature name	Units	Symbol
1	The pressure for flue gas	Pa	PFG
2	Oxygen content in flue gas	mg·m⁻³	OCF
3	Temperature for inlet flue gas	°C	TIF
4	Liquid level for FGD tower	m	LLF
5	Temperature for exit flue gas	°C	TOF
6	Fly ash content in inlet flue gas	mg·m⁻³	FAC
7	SO₂ content in inlet flue gas	mg·m⁻³	SCI
8	Inlet flue gas flux	Nm³·h⁻¹	IFG
9	Generating units power	MW	GUP
10	Feed air flow	m³·h⁻¹	FAF
11	SO₂ concentration in outlet flue gas	mg·m⁻³	SCO

Figure 2.

Graphical representation of MI based feature selection.

2.3 Temporal convolution neural network

Structurally speaking, the convolutional neural network (CNN) is basically a feedforward network with convolutional layers. A CNN generally incorporates a series of convolutional layers whose outputs are connected only to local regions of the inputs through multiple filters. CNNs can effectively extract underlying patterns existing in the inputs and hence find applications in different fields, such as object detection and text classification. Although the CNN was initially proposed to address computer vision problems, it is also effective at handling sequential data. Here the TCNN, first proposed by Bai et al. in [27], is adopted and serves as the feature extractor in the proposed hybrid prediction model. The structures of TCNN and CNN are essentially the same, except that a 1D causal convolutional layer with dilation is used in place of a regular convolutional layer in TCNNs. Causal convolution implies that the value at time instant $k$ is calculated only from elements at time instants less than or equal to $k$, while dilated convolution is introduced to enlarge the receptive field by skipping input values with a specified step. Consider a univariate time series $\{x(k)\}_{k=1}^{T}$; the output $h$ at time instant $k$ of a causal dilated convolution is formulated as

$\displaystyle h(k)=(x*_{d}w)(k)=\sum_{f=0}^{F-1}w(f)x(k-d\cdot f)$ (6)

where $w$ is the filter, and $F$ and $d$ respectively represent the filter size and dilation factor. It is typical to stack multiple dilated convolutions to obtain a sufficiently large receptive field and extract underlying long-range dependencies between time instants. Figure 3 illustrates the case where four dilated convolution layers are stacked, with filter size $F=2$ and a dilation factor $d$ that grows exponentially with the layer number $i$ as $d=2^{i-1}$ $(i=1,2,\cdots)$; the receptive field size reaches 16 under this structure. In most cases, the model performance depends directly on the receptive field size, which can be enlarged by increasing either the number of layers or the filter size.
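The following sketch implements Eq. (6) directly and checks the receptive field of the four-layer stack described above. It is a minimal illustration, assuming zero padding for samples before the start of the series and an all-ones filter; the function names and the 0-based indexing are choices made for this example only.

```python
import numpy as np

def causal_dilated_conv(x, w, d):
    """Eq. (6): h(k) = sum_f w(f) * x(k - d*f); samples before the start count as zero."""
    T, F = len(x), len(w)
    h = np.zeros(T)
    for k in range(T):
        for f in range(F):
            j = k - d * f
            if j >= 0:          # causal: only the current and past samples are used
                h[k] += w[f] * x[j]
    return h

def receptive_field(filter_size, dilations):
    """Number of input samples that can influence one output of the stack."""
    return 1 + (filter_size - 1) * sum(dilations)

dilations = [2 ** (i - 1) for i in range(1, 5)]   # d = 1, 2, 4, 8 for layers i = 1..4
x = np.zeros(32)
x[0] = 1.0                                        # unit impulse at the first sample
h = x
for d in dilations:
    h = causal_dilated_conv(h, np.ones(2), d)     # filter size F = 2

print(receptive_field(2, dilations))              # 16
print(int(np.count_nonzero(h)))                   # 16
```

The impulse spreads over exactly 16 consecutive outputs, which matches the receptive field size of 16 stated for $F=2$ and $d=2^{i-1}$ with four layers.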

Figure 3.

Illustration of TCNN with exponentially increasing dilation factor and filter size of length 2.

2.4 Gated recurrent unit

The GRU was originally introduced by Cho et al. [28], and it shows superior performance in various prediction contexts, e.g., wind speed and mine gas concentration. Structurally speaking, GRU is quite analogous to LSTM in that it also utilizes a gating mechanism to regulate the information flow inside the unit, but the input and forget gates are coupled in GRU and there are only two gating units, a reset gate and an update gate, within it. As compared to LSTM, GRU has fewer parameters and a less complicated structure, which gives it computational advantages. Nevertheless, the performance of GRU is comparable to that of LSTM. The structure of GRU is depicted in Fig. 4, where $\sigma$ and tanh respectively represent the sigmoid and hyperbolic tangent transfer functions. At time instant $k$, the information flow in a GRU unit is formulated as

$\displaystyle d_{k}=\sigma(W_{d}[x_{k},h_{k-1}]+b_{d})$ (7)

$\displaystyle s_{k}=\sigma(W_{s}[x_{k},h_{k-1}]+b_{s})$ (8)

$\displaystyle\tilde{h}_{k}=\tanh\left(W_{\tilde{h}}\left[x_{k},s_{k}\odot h_{k-1}\right]+b_{\tilde{h}}\right)$ (9)

$\displaystyle h_{k}=(1-d_{k})\odot h_{k-1}+d_{k}\odot\tilde{h}_{k}$ (10)

with $x$ being the unit input, and $\tilde{h}$ and $h$ denoting the temporary hidden state and the hidden state. $W_{d}$, $W_{s}$ and $W_{\tilde{h}}$ represent the corresponding trainable weight matrices. The outputs of the reset gate and update gate are denoted by $s$ and $d$, respectively, $b_{\ast}$ (where $\ast$ denotes $d$, $s$ or $\tilde{h}$) is the bias vector, and $\odot$ stands for the point-wise vector product.
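To make Eqs. (7)–(10) concrete, the sketch below performs the GRU update for a short random sequence in NumPy. The layer sizes, the random weights and the helper name `gru_step` are illustrative assumptions; a practical model would rely on a deep learning library and trained weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_k, h_prev, params):
    """One GRU update following Eqs. (7)-(10)."""
    W_d, b_d, W_s, b_s, W_h, b_h = params
    concat = np.concatenate([x_k, h_prev])            # [x_k, h_{k-1}]
    d_k = sigmoid(W_d @ concat + b_d)                 # update gate, Eq. (7)
    s_k = sigmoid(W_s @ concat + b_s)                 # reset gate, Eq. (8)
    gated = np.concatenate([x_k, s_k * h_prev])       # [x_k, s_k (.) h_{k-1}]
    h_tilde = np.tanh(W_h @ gated + b_h)              # temporary hidden state, Eq. (9)
    return (1.0 - d_k) * h_prev + d_k * h_tilde       # hidden state, Eq. (10)

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4                                    # illustrative sizes only
# Three (weight, bias) pairs in the order d, s, h-tilde; random, untrained values
params = tuple(
    p for _ in range(3)
    for p in (rng.normal(scale=0.5, size=(n_hid, n_in + n_hid)), np.zeros(n_hid))
)

h = np.zeros(n_hid)
for x_k in rng.normal(size=(10, n_in)):               # 10 time steps of random input
    h = gru_step(x_k, h, params)
print(h)
```

Because Eq. (10) is a convex combination of the previous state and a tanh output, every component of the hidden state stays strictly inside $(-1,1)$ when the initial state is zero.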

Figure 4.

The structure diagram of GRU network.

3. Proposed hybrid SO₂ emission prediction framework

This section describes the hybrid modelling framework for SO₂ emission prediction, which comprises the above-discussed mutual information technique, TCNN and GRU network. The motivation behind the proposed framework is that each component has its own advantage. Specifically, the TCNN model is effective in extracting useful dynamical properties existing in a process, especially in multivariable cases, while GRU has a powerful ability to capture long-term temporally dependent features within the extracted sequence. In addition, the introduction of mutual information facilitates the selection of the features most relevant to SO₂ emission prediction, which can further improve the predictive capability of the TCNN-GRU model. As presented in Fig. 5, the proposed modelling framework comprises an input layer, a TCNN layer and a GRU layer. The input layer incorporates input variable selection and pre-processing: candidate features are evaluated and selected as input variables based on the MI metric, then pre-processed before being fed into the model. The TCNN layer is built up of multiple units, each of which consists of a dilated convolution layer, a weight normalization layer, a dropout layer and a transfer function (the rectified linear unit is chosen in our model), where weight normalization speeds up the convergence of the network while dropout serves to mitigate overfitting. The TCNN layer serves as a feature extractor for capturing long- and short-range relations in sequential features. The extracted temporal features are then fed to and processed by a series of GRU units in the GRU layer, and the predicted SO₂ emission at each time instant is obtained as the output of the GRU layer.
TCNN has a powerful ability to extract temporal features, while the GRU network is advantageous for memorizing long-term information. Together they give the integrated model the capability to satisfactorily identify a system with severely nonlinear dynamics and large time delay, which makes it particularly suitable for the modelling problem of the FGD process studied in this research. From the above analysis, the proposed MI-TCNN-GRU predictive framework is an ensemble of three types of techniques and is designed according to the direction of data flow. From this perspective, the data processing in MI-TCNN-GRU is performed in six stages.

Stage 1.
Data acquisition: sensors deployed in the FGD process record measurements of various process variables in real time, which are then transmitted to the built-in management information system; the stored historical data of the corresponding variables can be extracted when necessary.
Stage 2.
Model inputs selection: select candidate variables that may be used to characterize the variable to be predicted, mainly on the basis of practical experience and theoretical knowledge.
Stage 3.
Data preprocessing: preprocess the raw measurements of the selected variables, including outlier detection, missing data imputation and data normalization, which benefits the subsequent model development.
Stage 4.
Feature selection and elimination: the MI metric is employed to evaluate the degree of relevance between two candidate variables according to Eqs. (3)–(5). First, select the variables with high MI scores with respect to the predicted variable, which can be regarded as variables of high relevance for the prediction task. Then, among the selected variables, search for those with similar MI scores, which implies that a close relation exists among them; some of these are redundant and are eliminated to reduce the computational complexity.
Stage 5.
Model development: the well-trained TCNN-GRU model is obtained using the preprocessed data obtained in the previous stages.
Stage 6.
Model validation: evaluate the established model using performance metrics such as root mean square error (RMSE), R squared (R²) and mean absolute percent error (MAPE).
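The three metrics named in Stage 6 can be sketched as follows. The formulas are the commonly used textbook definitions, which are assumed here because the section does not state them explicitly, and the numbers are made-up values rather than plant measurements.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percent error in percent; assumes y_true contains no zeros."""
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up outlet SO2 concentrations in mg/m^3, for illustration only
y_true = np.array([20.0, 25.0, 40.0, 50.0])
y_pred = np.array([22.0, 25.0, 36.0, 50.0])

print(rmse(y_true, y_pred))        # sqrt(5), about 2.236
print(mape(y_true, y_pred))        # 5.0
print(r_squared(y_true, y_pred))   # about 0.965
```

Lower RMSE and MAPE and an R² closer to one indicate a better fit.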

Figure 5.
The structure of MI based TCNN-GRU SO₂ emission prediction framework.

The computational complexity of the proposed integrated model is determined by three parts: 1) calculation of mutual information for feature selection, 2) the temporal convolution neural network and 3) the gated recurrent unit. For mutual information, an effective statistic for measuring the degree of relevance between features, the time complexity is found to be $O(n\log n)$ [29], where $n$ denotes the total number of data points. Mutual information is computationally cheap since both computation time and memory usage are proportional to the number of data points [30]. As an extension of the convolution neural network, TCNN has gained popularity owing to its high performance in modelling temporal sequences, and it can process data in parallel with a small amount of memory [31]. The time complexity of TCNN can be expressed as $O(\frac{2}{d}lmhn)$ [32], where $l$ and $m$ respectively represent the length and dimensionality of the input sequence, $n$ is here the number of convolution kernels and $h$ is the kernel size. With a gating mechanism and simple structure, GRU has great advantages in learning long-term temporal dependencies within a sequence. According to [33], the GRU has time complexity $O(Td_{h}^{2}+Td_{h}d_{i})$, where $T$ is the sequence length, and $d_{h}$ and $d_{i}$ are the dimensions of the hidden state and input, respectively. It is demonstrated experimentally in [34] that the GRU network has lower computational requirements than other popular network models (e.g., LSTM and its variants, recurrent neural network and feed-forward neural network). Based on the above analysis, the proposed approach is resource saving with respect to computation and thus has broad application prospects. The specific procedure of the proposed MI based TCNN-GRU modelling approach is shown below.
In the following section, experimental studies are carried out to evaluate the predictive performance of the proposed approach.

Algorithm 1: MI based TCNN-GRU modeling approach

Input: temporal sequence $\{x(t)\}$ , hyper-parameters for two types of neural models: TCNN: kernel size, stacked layer number, dilation factor, filter number, dropout rate; GRU: stacked layer number, hidden state number, dropout rate

Output: predicted sequence $\{\hat{x}(t)\}$

Preprocess the series $\{x(t)\}$ with the Pauta criterion to detect and eliminate outliers, Lagrange interpolation for missing data imputation and min-max normalization for mapping values into the range $[-1,1]$.

Calculate the MI score for each pair of candidate variables to evaluate the relevance degree between them, and perform feature selection according to the obtained MI scores.

Estimate the range of each hyperparameter in the experimental model with several preliminary runs, and assign corresponding values within it to build the hyper-parameter space.

Search the hyper-parameter space using the random search technique.

Train the TCNN-GRU network with searched hyper-parameters in a supervised manner by the backpropagation (BP) algorithm and measurements of selected features, and record corresponding performance metrics on the validation set.

Compare all obtained results and choose the optimal hyper-parameter combination of the model.

Evaluate the established model on the test set using metrics like R squared ( ${R}^{2}$ ), mean absolute percent error (MAPE) and root mean squared error (RMSE).
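The preprocessing step of Algorithm 1 can be sketched as below. This is a simplified illustration under stated assumptions: the Pauta criterion is applied once as a 3-sigma rule, and gaps are filled by linear interpolation between the nearest valid neighbours (the two-point case of Lagrange interpolation) rather than by a higher-order Lagrange polynomial. The function name `preprocess` and the synthetic series are hypothetical.

```python
import numpy as np

def preprocess(series):
    """3-sigma outlier removal, gap filling and min-max scaling to [-1, 1]."""
    x = np.asarray(series, dtype=float).copy()
    valid = ~np.isnan(x)
    mu, sigma = x[valid].mean(), x[valid].std()
    # Pauta (3-sigma) criterion: points farther than 3 sigma from the mean are outliers
    x[valid & (np.abs(x - mu) > 3.0 * sigma)] = np.nan
    # Fill removed outliers and missing values by linear interpolation (assumption)
    idx = np.arange(len(x))
    ok = ~np.isnan(x)
    x = np.interp(idx, idx[ok], x[ok])
    # Min-max normalization onto [-1, 1]
    lo, hi = x.min(), x.max()
    return 2.0 * (x - lo) / (hi - lo) - 1.0

# Synthetic signal, not plant data
rng = np.random.default_rng(1)
raw = rng.normal(loc=30.0, scale=1.0, size=200)
raw[50] = 500.0        # injected outlier
raw[120] = np.nan      # injected missing value
clean = preprocess(raw)
print(clean.min(), clean.max())
```

In practice the scaling bounds would be computed on the training set only and reused for the validation and test sets, so that no information leaks from the held-out data.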

4. Experiment and analysis

This part is devoted to verifying the efficacy of the suggested SO₂ emission prediction model, and a real FGD process is selected as the case to be investigated. Based upon historical FGD operation measurements, experiments are designed to compare our method with other popular methods that appeared in previous studies. All experiments run on a personal computer with a Microsoft Windows 10.0 environment, 32 GB RAM, and an Intel Core i7-11700H processor with a 2.50 GHz base frequency.

4.1 Data description

Our study uses operating data from an actual FGD system of a 1000 MW coal-fired power plant in Hebei Province, China. A total of 12962 samples used for model training and evaluation were obtained through the FGD information management system. The photograph of the FGD tower under study is presented in Fig. 6. The measurements span one week (03/12/2022 to 03/19/2022) and are divided into three portions: 70% are used as the training set, and the remaining 30% is split equally between the validation set and the test set. Each subset has its own role: the training set is employed for parameter estimation, the validation set is used for early stopping and for searching optimal hyper-parameters, and the test set is used to evaluate the generalization performance of each experimental model.
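The 70/15/15 partition described above can be sketched as follows. This is only an illustration under an assumption: the text does not say whether the samples were shuffled, so the sketch keeps them in time order, which is the usual choice for process time series. The function name `chronological_split` is hypothetical, and integer indices stand in for the real measurements.

```python
def chronological_split(samples, train_frac=0.70, val_frac=0.15):
    """Split a time-ordered sequence into train/validation/test parts.

    Assumption: no shuffling, so the test set is the most recent data.
    """
    n = len(samples)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]  # whatever remains after train and validation
    return train, val, test


# 12962 is the sample count reported above; indices replace the measurements.
train, val, test = chronological_split(list(range(12962)))
print(len(train), len(val), len(test))  # prints: 9073 1944 1945
```

Because 12962 is not divisible into exact 70/15/15 shares, the validation and test sets differ by one sample in this sketch.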

4.2 Feature selection

As mentioned earlier, a total of 10 candidate variables other than the predicted SO₂ emissions are taken into consideration. To select variables that are best suited to perform SO₂ emissions prediction, the MI score between each variable pair is calculated and results are presented in Fig. 7.

It is observed in Fig. 7 that the MI scores between the SO₂ concentration in outlet flue gas (SCO) and three predictor variables, i.e., SO₂ concentration in inlet flue gas (SCI), generating units power (GUP) and temperature of inlet flue gas (TIF), are greater than one. The score of SCI is the largest, as high as 1.67, implying that it is the most relevant variable for SO₂ emission prediction. On the other hand, the scores of GUP and TIF with respect to SCO are close to each other, and the MI score between GUP and TIF reaches 1.93, which suggests that the two variables are closely related and largely redundant, so only one of them needs to participate in the SO₂ emission prediction task. Given that the MI score between SCO and GUP is slightly higher than that between SCO and TIF, SCI and GUP are chosen as the final predictor variables. In addition, owing to the process dynamics, the current output of the process also correlates with its past observations. In view of the above, the model inputs are determined as past SCO observations, GUP and SCI. Table 2 presents the statistical information of the variables selected for SO₂ emission prediction, where S.D, Min and Max are respectively the standard deviation, minimum and maximum of the variable.

Table 2
Main statistical characteristics of variables used for SO₂ emission prediction

Variable name	Mean	Median	S.D	Min	Max
SO₂ concentration in inlet flue gas (mg $\cdot$ m^{- 3})	1.05E+03	1.01E+03	134.78	753.02	1.65E+03
Generating units power (MW)	624.65	704.12	167.17	365.78	1.01E+03
SO₂ concentration in outlet flue gas (mg $\cdot$ m^{- 3})	5.27	5.11	1.41	2.01	9.96

Figure 6.

Photograph of the studied FGD tower.

Figure 7.

MI scores between candidate predictor variables.
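The selection logic of this subsection (rank candidates by MI with SCO, then drop a candidate that shares high MI with one already chosen) can be illustrated with a small sketch. Several things here are assumptions and not the authors' procedure: a simple equal-width histogram estimator is swapped in for whatever MI estimator was actually used, so its scores are not comparable with those in Fig. 7; the greedy rule and the names `histogram_mi`, `select_features` and `redundancy_threshold` are hypothetical; and the data are synthetic, with TIF deliberately generated as a noisy copy of GUP.

```python
import math
import random
from collections import Counter


def histogram_mi(x, y, bins=16):
    """Plug-in mutual information estimate (in nats) from equal-width bins."""
    def to_bins(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # guard against a constant series
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = to_bins(x), to_bins(y)
    n = len(x)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # sum over occupied cells of p(i,j) * log(p(i,j) / (p(i) * p(j)))
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def select_features(candidates, target, redundancy_threshold):
    """Greedy pick by relevance, skipping variables redundant with a kept one."""
    relevance = {name: histogram_mi(col, target) for name, col in candidates.items()}
    selected = []
    for name in sorted(relevance, key=relevance.get, reverse=True):
        redundant = any(
            histogram_mi(candidates[name], candidates[kept]) >= redundancy_threshold
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected, relevance


# Synthetic stand-in data (NOT plant measurements).
rng = random.Random(0)
n = 2000
sci = [rng.random() for _ in range(n)]
gup = [rng.random() for _ in range(n)]
tif = [g + rng.gauss(0, 0.01) for g in gup]  # nearly a copy of GUP
sco = [2 * s + g + rng.gauss(0, 0.05) for s, g in zip(sci, gup)]

selected, relevance = select_features(
    {"SCI": sci, "GUP": gup, "TIF": tif}, sco, redundancy_threshold=1.0
)
print(selected)
```

On this toy data the sketch keeps SCI plus exactly one of GUP and TIF, mirroring the reasoning in the text that two strongly dependent predictors need not both enter the model.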

4.3 Preprocessing of data

Prior to the modeling stage, some preprocessing steps are required. First, the simple but effective Pauta criterion is employed for outlier elimination. Specifically, if the absolute deviation of a measurement from the average exceeds three times the standard deviation, the sample is identified as an outlier and should be removed, which is mathematically formulated as

$\displaystyle\sigma=\sqrt{\frac{\sum^{n}_{i=1}(x_{i}-\bar{x})^{2}}{n-1}}$ (11)

$\displaystyle|x_{i}-\bar{x}|>3\sigma$ (12)

in which $x_{i}$ represents the $i$th measured value, and $\sigma$ and $\bar{x}$ are respectively the standard deviation and average value of the dataset. As for missing values in the dataset, Lagrange interpolation is adopted to obtain a complete time series. Finally, all data are normalized to the range [$-$1, 1] using Eq. (13).

$\displaystyle n_{i}=\frac{2(x_{i}-x_{\min})}{x_{\max}-x_{\min}}-1$ (13)

where $x_{\min}$ and $x_{\max}$ respectively denote the minimum and maximum values in the sequence, and $x_{i}$ and $n_{i}$ are the $i$th sample before and after normalization, respectively.
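The three preprocessing steps of this subsection can be sketched in a few lines. The sketch rests on assumptions that the text does not fix: a flagged outlier is treated as a missing value and refilled, the Lagrange polynomial is built from up to two valid neighbours on each side, and the function names and the short series are made up for illustration.

```python
import statistics


def pauta_mask(values):
    """True where a sample is kept, i.e. |x_i - mean| <= 3 * sigma (Eqs. 11-12)."""
    mean = statistics.fmean(values)
    sigma = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [abs(v - mean) <= 3 * sigma for v in values]


def lagrange_interpolate(ts, xs, t):
    """Evaluate the Lagrange polynomial through the points (ts[k], xs[k]) at t."""
    total = 0.0
    for k, (tk, xk) in enumerate(zip(ts, xs)):
        weight = 1.0
        for m, tm in enumerate(ts):
            if m != k:
                weight *= (t - tm) / (tk - tm)
        total += xk * weight
    return total


def min_max_scale(values):
    """Map values to [-1, 1] following Eq. (13)."""
    lo, hi = min(values), max(values)
    return [2 * (v - lo) / (hi - lo) - 1 for v in values]


# Made-up series with one gross outlier at index 10.
series = [5.0, 5.2, 4.9, 5.1, 5.3, 5.0, 4.8, 5.1, 5.2, 5.0, 50.0,
          5.1, 4.9, 5.0, 5.2, 5.1, 4.9, 5.0, 5.1, 5.0]
keep = pauta_mask(series)

cleaned = list(series)
for i, ok in enumerate(keep):
    if not ok:
        # Assumption: refill from up to two valid neighbours on each side.
        idx = [j for j in (i - 2, i - 1, i + 1, i + 2)
               if 0 <= j < len(series) and keep[j]]
        cleaned[i] = lagrange_interpolate(idx, [series[j] for j in idx], i)

scaled = min_max_scale(cleaned)
print(keep.index(False), cleaned[10], min(scaled), max(scaled))
```

Note that the 3σ rule can only flag a single extreme point when the series is long enough (more than about ten samples), because the outlier itself inflates σ.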

4.4 Experimental and parameter settings

This section gives the details of our experiments. In our approach, the learnable parameters of the network (i.e., weights and biases) are identified with the adaptive moment estimation (Adam) optimizer, whose major strengths are low computational cost and low memory requirements, which make it particularly suited for deep learning. The optimization objective is to minimize the mean squared error. Mini-batch training is adopted with a batch size of 64, and the learning rate and the number of epochs are set to 0.001 and 50, respectively. To prevent over-fitting, early stopping is employed: network training is terminated when performance on the validation set has not improved for 6 successive epochs. Further, we compare the proposed TCNN-GRU model with popular methods in the field of SO₂ emission prediction for FGD, namely the recurrent neural network (RNN), multi-layer perceptron (MLP), GRU neural network and TCNN. To quantify the prediction performance for SO₂ emission in the FGD process, R squared (${R}^{2}$), root mean square error (RMSE) and mean absolute percentage error (MAPE) are introduced, which are mathematically represented as

$\displaystyle\textit{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}(\hat{x}_{i}-x_{i})^{2}}$ (14)

$\displaystyle\textit{MAPE}=\frac{1}{N}\sum_{i=1}^{N}\left|\frac{\hat{x}_{i}-x_{i}}{x_{i}}\right|$ (15)

$\displaystyle R^{2}=1-\frac{\sum_{i=1}^{N}(x_{i}-\hat{x}_{i})^{2}}{\sum_{i=1}^{N}(x_{i}-\bar{x})^{2}}$ (16)

in which $N$ represents the total number of samples, and $x_{i}$, $\hat{x}_{i}$ and $\bar{x}$ are respectively the measured SO₂ concentration, the predicted SO₂ concentration and the average of the measured SO₂ concentration.
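For concreteness, Eqs. (14)–(16) translate directly into code. The following is a minimal sketch; the function names and the four-sample example are illustrative and not taken from the study. MAPE is returned as a fraction, matching Eq. (15), which has no factor of 100.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, Eq. (14)."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction, Eq. (15)."""
    return sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination, Eq. (16)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Illustrative values only (mg/m^3), not measurements from the plant.
measured = [4.0, 5.0, 6.0, 5.0]
predicted = [4.2, 4.9, 5.8, 5.1]
print(rmse(measured, predicted), mape(measured, predicted), r_squared(measured, predicted))
```

RMSE carries the unit of the predicted variable, whereas MAPE and R² are dimensionless; MAPE is also undefined whenever a measured value equals zero.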

Hyper-parameters have a considerable effect on model performance, and the best-performing model is obtained only with a well-chosen combination of them. Since a prediction model often involves many hyper-parameters, they should be tuned with a systematic method rather than by manual search. Grid search and random search are the two most widely applied techniques for hyper-parameter optimization [35]. The former makes an exhaustive search over the target hyper-parameter space, whereas the latter samples candidate configurations at random. As grid search is a brute-force algorithm, it is better suited to low-dimensional hyper-parameter spaces, while random search is more effective for spaces of higher dimension. Among the experimental models under study, RNN and MLP are relatively simple and have fewer hyper-parameters, while GRU, TCNN and the proposed TCNN-GRU model possess more. Accordingly, grid search is used to choose the best hyper-parameters for the RNN and MLP models, and random search with a preset budget of 50 iterations is performed for the rest. Rough boundaries for the key hyper-parameters of each experimental model are first obtained from some preliminary runs, and the searching ranges are then determined by setting proper intervals within them. Table 3 summarizes the searching ranges and optimal values of the key hyper-parameters for each experimental model.

Table 3
Searching ranges and optimized values of key hyper-parameters for each experimental model

Model type	Hyper-parameter	Searching range	Optimized value
MLP	Hidden layer number	{1, 2, 3}	3
	Hidden units number per layer	{8, 16, 32, 64}	32, 64, 16
TCNN	Kernel size	{2, 4, 6, 8, 16}	8
	Stacked layer number	{1, 2, 3, 4}	3
	Dilation factor	{2, 4, 8, 16, 32}	4
	Dropout rate	{0.05, 0.1, 0.15, 0.2, 0.25}	0.2
GRU	Stacked layer number	{1, 2, 3}	2
	Hidden state number	{16, 32, 64, 128}	64, 128
	Dropout rate	{0.1, 0.15, 0.2, 0.25, 0.3}	0.15
RNN	Hidden layer number	{1, 2, 3}	3
	Hidden units number per layer	{8, 16, 32, 64}	16, 64, 32
TCNN-GRU	Stacked GRU layer number	{1, 2, 3}	2
	Stacked TCNN layer number	{1, 2, 3}	2
	Hidden state number in GRU layer	{8, 16, 32, 64}	32, 32
	Dilation factor in TCNN layer	{2, 4, 8, 16, 32}	16
	Kernel size in TCNN layer	{2, 4, 6, 8, 16}	8
	Dropout rate	{0.1, 0.15, 0.2, 0.25, 0.3}	0.25
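The random search over the TCNN-GRU ranges in Table 3 can be sketched as below. The search ranges and the budget of 50 iterations come from the text; everything else is an assumption made for illustration. In particular, `toy_validation_error` is a hypothetical stand-in for "train the network with early stopping and return the validation error", the dictionary keys are invented names, and a single GRU hidden size is sampled for all layers although Table 3 reports one value per layer.

```python
import random

# Searching ranges for the TCNN-GRU model, copied from Table 3.
SEARCH_SPACE = {
    "gru_layers": [1, 2, 3],
    "tcnn_layers": [1, 2, 3],
    "gru_hidden_size": [8, 16, 32, 64],
    "dilation": [2, 4, 8, 16, 32],
    "kernel_size": [2, 4, 6, 8, 16],
    "dropout": [0.1, 0.15, 0.2, 0.25, 0.3],
}


def random_search(space, objective, n_iter=50, seed=0):
    """Sample n_iter random configurations and keep the one with the lowest score."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(n_iter):
        config = {name: rng.choice(values) for name, values in space.items()}
        score = objective(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_validation_error(config):
    """Hypothetical objective; a real run would train and validate the model."""
    return abs(config["dropout"] - 0.25) + abs(config["kernel_size"] - 8) / 16


best_config, best_score = random_search(SEARCH_SPACE, toy_validation_error)
print(best_config, best_score)
```

Under the simplification of one shared hidden size, this space holds 3 × 3 × 4 × 5 × 5 × 5 = 4500 configurations, so 50 random draws evaluate roughly 1% of what an exhaustive grid search would have to train.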

4.5 Experimental result and analysis

Table 4
Performance index comparison of models on test set

Model type	RMSE	MAPE	R²
MLP	0.5711	0.0793	0.6540
RNN	0.4492	0.0543	0.8054
TCNN	0.3316	0.0439	0.9196
GRU	0.3118	0.0410	0.9223
PSO-BP	0.3026	0.0391	0.9307
DNN	0.2657	0.0336	0.9422
AM-LSTM	0.1893	0.0258	0.9614
TCNN-GRU	0.1490	0.0201	0.9706

Figure 8.

Prediction results for outlet SO₂ concentration with different models. Red line: predicted values. Blue line: real values. Ordinate: outlet SO₂ concentration (mg/m³).

In this part, each prediction model with its optimal parameter configuration is first used to predict the SO₂ emissions of the studied FGD process, and the prediction results are then compared and analyzed to demonstrate the effectiveness of our newly developed modeling approach. Apart from the popular SO₂ emission prediction models, we evaluate the proposed method against current state-of-the-art SO₂ emission prediction models, namely the LSTM network with attention mechanism (AM) in [22] (written as AM-LSTM in what follows), the dynamic deep neural network model in [11] (denoted DNN) and the particle swarm optimization (PSO) algorithm based BP neural network in [25] (PSO-BP for short). The hyper-parameters of the comparison models are set as recommended in the original studies, and some key parameters are presented below. AM-LSTM: the Adam optimizer is adopted, four LSTM layers are used with 256, 256, 128 and 64 units, respectively, and the dropout rate and learning rate are equal to 0.4 and 0.001, respectively. DNN: the time delay is set equal to 5, three hidden layers are employed with 128, 64 and 32 neurons, respectively, and the Adam optimizer is applied during model training. PSO-BP: the swarm size is 80, the learning factors are specified as 1.2 and 1.5, respectively, and the iteration number is 100; two hidden layers with 11 and 23 neurons, respectively, are used, and the Adam optimizer is applied during model training. The prediction results of all experimental models are given in Table 4 and Fig. 8. As seen from Table 4, MLP, as a classical nonlinear model, yields the highest MAPE and RMSE and the lowest R² (R² = 0.6540, RMSE $=$ 0.5711 mg/m³, MAPE $=$ 0.0793). By comparison, RNN achieves a certain degree of improvement: its RMSE and MAPE values are respectively 21.34% and 31.53% lower than those of the MLP, while R² is enhanced by 23.15%.

The distinctive structures of the GRU network and the TCNN enable them to learn the large time delay that exists in the FGD process, so the two models yield further performance enhancement over the RNN model. Specifically, for GRU the RMSE and MAPE values are respectively reduced by 30.59% and 24.49% and the R² value is increased by 14.51%; for TCNN the reductions in RMSE and MAPE reach 26.18% and 19.15%, respectively, while R² grows by 14.18%. As regards the three state-of-the-art SO₂ emission prediction methods, PSO-BP delivers predictive performance comparable to that of the TCNN and GRU models, with only marginal improvements in SO₂ emission prediction. DNN and AM-LSTM, in comparison, yield further performance improvement over the PSO-BP model. Both provide satisfactory fits to the measured data, and AM-LSTM shows the better performance of the two (R² = 0.9614, RMSE $=$ 0.1893 mg/m³, MAPE $=$ 0.0258). The proposed TCNN-GRU model performs the best among all experimental models: it achieves the lowest RMSE value of 0.1490, the lowest MAPE value of 0.0201 and the highest R² value of 0.9706, implying that it can accurately capture the complicated dynamic characteristics in the SO₂ emission series by combining the advantages of the TCNN and GRU modules. The above experimental results indicate that the proposed TCNN-GRU is best suited to perform SO₂ emission prediction for a flue gas desulphurization process.
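The percentage changes quoted in this discussion follow from Table 4 by simple relative differences, which the short sketch below reproduces. The helper name `relative_change` is an illustrative choice; the numbers are copied from Table 4.

```python
# (RMSE, MAPE, R^2) on the test set, copied from Table 4.
TABLE_4 = {
    "MLP": (0.5711, 0.0793, 0.6540),
    "RNN": (0.4492, 0.0543, 0.8054),
    "TCNN": (0.3316, 0.0439, 0.9196),
    "GRU": (0.3118, 0.0410, 0.9223),
}


def relative_change(baseline, model):
    """Percent drop in RMSE and MAPE and percent gain in R^2 over the baseline."""
    b_rmse, b_mape, b_r2 = TABLE_4[baseline]
    m_rmse, m_mape, m_r2 = TABLE_4[model]
    return (
        round(100 * (b_rmse - m_rmse) / b_rmse, 2),
        round(100 * (b_mape - m_mape) / b_mape, 2),
        round(100 * (m_r2 - b_r2) / b_r2, 2),
    )


print(relative_change("MLP", "RNN"))   # (21.34, 31.53, 23.15)
print(relative_change("RNN", "GRU"))   # (30.59, 24.49, 14.51)
print(relative_change("RNN", "TCNN"))  # (26.18, 19.15, 14.18)
```

All three comparisons agree with the percentages stated in the text, with RNN measured against MLP and both GRU and TCNN measured against RNN.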

5. Conclusion and future work

This study deals with the prediction of SO₂ emissions for a flue gas desulphurization process. An MI based TCNN-GRU model is developed to predict SO₂ emissions in an industrial FGD process, which is characterized by highly nonlinear dynamics and large time delay. Benefiting from the superior feature extraction ability of the TCNN and the remarkable long-range dependency learning ability of the GRU, the proposed hybrid model achieves the best prediction performance among the popular and state-of-the-art models in the SO₂ emission prediction field, with RMSE, MAPE and R² values of 0.1490, 0.0201 and 0.9706, respectively. The proposed integrated model combines structural flexibility with outstanding performance, which suggests broad application prospects in similar scenarios. Although the proposed approach achieved excellent performance in predicting SO₂ emissions for a flue gas desulfurization process, it also has some potential limitations, such as the difficulty of determining appropriate hyper-parameter values rapidly and effectively and the heavy computational burden of handling multivariate sequences. In the future, we will extend our idea to other AI tasks, such as visible-infrared person re-identification [36], 3D mesh analysis [37] and pose transfer [38].

Acknowledgments

This study is supported by the National Natural Science Foundation of China, Grant numbers 62373012 and 62303025.

References

1. Oliveira L.B., Marvila M.T., Fediuk, Vieira C.M.F. and Azevedo A.R., Development of a complementary precursor based on flue gas desulfurization (FGD) for geopolymeric pastes produced with metakaolin, Journal of Materials Research and Technology 22 (2023), 3489–3501.

2. Hanif M.A., Ibrahim and Abdul Jalil, Sulfur dioxide removal: An overview of regenerative flue gas desulfurization and factors affecting desulfurization capacity and sorbent regeneration, Environmental Science and Pollution Research 27 (2020), 27515–27540.

3. Wang, Liang, Jing and Feng, Application of flue gas desulfurization gypsum improves multiple functions of saline-sodic soils across China, Chemosphere 277 (2021), 130345.

4. Cai, Wang, Bai, Han and Zhou, Numerical simulation and optimization of semi-dry flue gas desulfurization in a CFB based on the two-film theory using response surface methodology, Powder Technology 401 (2022), 117268–117280.

5. Yue, Gao, Gong and Zhou, Numerical simulation of semi-dry flue gas desulfurization process in the powder-particle spouted bed, Advanced Powder Technology 31 (2020), 323–331.

6. Zou and Yuan, Online application oriented dynamic modeling for the flue gas desulfurization tower in coal-fired power plants, Process Safety and Environmental Protection 159 (2022), 698–707.

7. Guo, Zheng, Shu, Dong, Zhang, Weng and Gao, Modeling and optimization of wet flue gas desulfurization system based on a hybrid modeling method, Journal of the Air & Waste Management Association 69 (2019), 565–575.

8. Uddin G.M., Arafat S.M., Ashraf W.M., Asim, Bhutta M.M.A., Jatoi H.U.K., Niazi S.G., Jamil, Farooq and Ghufran, Artificial intelligence-based emission reduction strategy for limestone forced oxidation flue gas desulfurization system, Journal of Energy Resources Technology 142 (2020), 092103–092116.

9. Makomere, Rutto, Koech and Banza, The use of artificial neural network (ANN) in dry flue gas desulphurization modelling: Levenberg-Marquardt (LM) and Bayesian regularization (BR) algorithm comparison, The Canadian Journal of Chemical Engineering 101 (2023), 3273–3286.

10. Valera V.Y., Codolo M.C. and Martins T.D., Artificial neural network for prediction of SO2 removal and volumetric mass transfer coefficient in spray tower, Chemical Engineering Research and Design 170 (2021), 1–12.

11. Fan, Wang, Chang and Zhao, Soft sensing of SO2 emission for ultra-low emission coal-fired power plant with dynamic model and segmentation model, Fuel 332 (2023), 125921–125935.

12. Trošić Lesar and Filipčić, Prediction of the SO2 Hourly Concentration for Sea Breeze and Land Breeze in an Urban Area of Split Using Multiple Linear Regression, Atmosphere 14 (2023), 420–432.

13. Wang, Zhang, Hou and Jia, A novel prediction model of desulfurization efficiency based on improved FCM-PLS-LSSVM, Multimedia Tools and Applications 82 (2023), 5685–5708.

14. Lei, Guo, Zhang and Lou, A new prediction method of industrial atmospheric pollutant emission intensity based on pollutant emission standard quantification, Frontiers of Environmental Science & Engineering 17 (2023), 8–20.

15. Huang, Chen C.-H. and Huang C.-J., Motor fault detection and feature extraction using RNN-based variational autoencoder, IEEE Access 7 (2019), 139086–139096.

16. Kumari, Dey, Kumar, Pandit, Mishra, Kisku, Chaulya, Ray and Prasad, UMAP and LSTM based fire status and explosibility prediction for sealed-off area in underground coal mine, Process Safety and Environmental Protection 146 (2021), 837–852.

17. Huang, Wang and Li, A deep adversarial transfer learning network for machinery emerging fault detection, IEEE Sensors Journal 20 (2020), 8413–8422.

18. Xie, Gao, Zhang, Niu and Wang, Dynamic modeling for NOx emission sequence prediction of SCR system outlet based on sequence to sequence long short-term memory network, Energy 190 (2020), 116482–116495.

19. Ansari M.S., Bartoš and Lee, GRU-based deep learning approach for network intrusion alert prediction, Future Generation Computer Systems 128 (2022), 235–247.

20. Barzegar, Aalami M.T. and Adamowski, Short-term water quality variable prediction using a hybrid CNN–LSTM deep learning model, Stochastic Environmental Research and Risk Assessment 34 (2020), 415–433.

21. Chen, Gao, Zhang and Yue, Dynamic prediction of SO2 emission based on hybrid modeling method for coal-fired circulating fluidized bed, Fuel 346 (2023), 128284–128296.

22. Pang, Duan, Zhou, Han, Yao, Zheng, Yang and Gao, An integrated LSTM-AM and SPRT method for fault early detection of forced-oxidation system in wet flue gas desulfurization, Process Safety and Environmental Protection 160 (2022), 242–254.

23. Zhao, Shao, Tan, Zhou, Fan, Zheng and Gao, Prediction of inlet SO2 concentration of wet flue gas desulfurization (WFGD) by operation parameters of coal-fired boiler, Environmental Science and Pollution Research 30 (2023), 53089–53102.

24. Seng, Zhang, Chen and Chen, Spatiotemporal prediction of air quality based on LSTM neural network, Alexandria Engineering Journal 60 (2021), 2021–2032.

25. Liu, Zhang, Lyu and Chen, Prediction of SO2 and NOx in sintering flue gas based on PSO-BP neural network model, Ironmaking & Steelmaking 50 (2023), 1–8.

26. Vajargah K.F., Golshan H.M. and Farahabadi F.B., Improving the LDA Linear Discriminant Analysis Method By Eliminating Redundant Variables for the Diagnosis Of COVID-19 Patients, Applications & Applied Mathematics 18 (2023), 1–11.

27. Bai, Kolter J.Z. and Koltun, An empirical evaluation of generic convolutional and recurrent networks for sequence modeling, arXiv preprint arXiv:1803.01271 (2018).

28. Cho, Van Merriënboer, Gulcehre, Bahdanau, Bougares, Schwenk and Bengio, Learning phrase representations using RNN encoder-decoder for statistical machine translation, arXiv preprint arXiv:1406.1078 (2014).

29. Evans, A computationally efficient estimator for mutual information, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 464 (2008), 1203–1215.

30. Ross B.C., Mutual information between discrete and continuous data sets, PloS One 9 (2014), e87357–e87366.

31. Lara-Benítez, Carranza-García, Luna-Romera J.M. and Riquelme J.C., Temporal convolutional networks applied to energy-related time series forecasting, Applied Sciences 10 (2020), 2322–2335.

32. Sun, Luo, Gao, Wang, Gao and Yang, Categorizing malware via A Word2Vec-based temporal convolutional network scheme, Journal of Cloud Computing 9 (2020), 1–14.

33. Rotman and Wolf, Shuffling recurrent neural networks, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2021, pp. 9428–9435.

34. Mzelikahle, Trimble and Hlatywayo D.J., A Hybrid Technique Between BOSOM and LSTM for Data Analysis, International Journal of Mathematics and Computational Science 4 (2018), 128–138.

35. Bhat P.C., Prosper H.B., Sekmen and Stewart, Optimizing event selection with the random grid search, Computer Physics Communications 228 (2018), 245–257.

36. and Gao, Tri-modality consistency optimization with heterogeneous augmented images for visible-infrared person re-identification, Neurocomputing 523 (2023), 170–181.

37. Fan and Song, TPNet: A Novel Mesh Analysis Method via Topology Preservation and Perception Enhancement, Computer Aided Geometric Design 104 (2023), 102219–102233.

38. Duan and Yan, Perceptual metric-guided human image generation, Integrated Computer-Aided Engineering 29 (2022), 141–151.
