Abstract
Over the past several years, sulfur dioxide (SO2) has raised growing concern in China owing to its adverse impact on the atmosphere and the human respiratory system. The major contributor to SO2 emissions is flue gas generated by fossil-fired power plants, and as a consequence diverse flue gas desulphurization (FGD) techniques have been installed to abate SO2 emissions. However, FGD is a dynamic process with serious nonlinearity and large time delay, which makes modeling the FGD process a formidable problem. In this study, a novel hybrid deep learning model that combines a temporal convolution neural network (TCNN), a gated recurrent unit (GRU) and the mutual information (MI) technique is proposed to predict SO2 emissions in an FGD process. Among these techniques, MI is applied to select the variables that are best suited for SO2 emission prediction, while TCNN and GRU are integrated to capture the dynamics of SO2 emission in the FGD process. A real FGD system in a power plant with a 1000 MW coal-fired unit is used as a case study for SO2 emission prediction. Experimental results show that the proposed approach offers satisfactory performance in predicting SO2 emissions for the FGD process and outperforms the other compared predictive methods on several performance indicators.
Introduction
SO2 emitted from coal-fired thermal power stations has detrimental effects on human health and the atmosphere [1, 2]. To abate SO2 emissions, governments of many countries have implemented stringent regulations on the SO2 concentration of emitted flue gas. In China, the ultralow SO2 emission concentration limit of 35 mg
As accurate modeling of the FGD process provides the basis for prediction and control of SO2 emissions, there has been considerable interest in developing modeling approaches to characterize the FGD process. These approaches fall into two main types: mechanism-based models (MBMs) and black-box models (BBMs). MBMs are derived from a thorough understanding of the underlying mechanisms that govern the dynamic characteristics of the FGD process. The double-film theory and the penetration theory are the two main methods for modeling a flue gas desulfurization process, and both arise frequently in the literature. Ref. [4] suggested an FGD model on the basis of double-film theory and then explored the relationship between influencing factors (e.g., fly ash content and velocity of inlet flue gas) and SO2 removal efficiency. The analysis provides guidance for finding the optimal operating parameters, and experimental results indicated that the SO2 removal efficiency can be improved up to 99.69% with the derived FGD process parameters. As with [4], Ref. [5] utilized the double-film theory to model the heat transfer and mass transfer processes of desulfurization slurry droplets, and a comprehensive investigation of the factors influencing the FGD process was also performed with the developed model. Ref. [6] proposed a dynamic FGD process model using penetration theory, in which influencing parameters including droplet velocity, nozzle number and SO2 mass transfer coefficient are considered, and historical data from a real FGD process were utilized to verify the effectiveness of the developed model.
As compared to MBMs, BBMs eliminate the need to thoroughly understand the complicated chemical mechanism of an FGD process and can obtain the underlying relationship among different process variables directly from available measurements. Since flue gas desulfurization is an inherently nonlinear process, artificial neural networks, as nonlinearly parameterized function approximators, have been extensively used for modeling an FGD process (see [7, 8, 9, 10, 11, 12, 13, 14]). Ref. [7] developed a hybrid modeling method to describe the FGD process. The method consists of two parts, an artificial neural network and a mechanism-based model, where the network provides an initial estimate of SO2 emissions while the mechanism-based model compensates for the resulting deviation. As compared with [7], more predictor variables are involved in constructing the neural FGD model in [8]: a total of nine influencing variables are taken as model inputs to provide an accurate SO2 emission prediction, and the developed model is useful for determining optimal operating ranges for process parameters. Unlike [7] and [8], Ref. [9] focused on the training algorithm of the network used for FGD process modeling. It proposed to use the Levenberg-Marquardt algorithm with Bayesian regularization to carry out supervised learning of a neural network, which is then employed for FGD process modeling. Experimental results reveal that outstanding prediction performance is obtained with the trained model. Similar to [9], Ref. [10] trained a multi-layered neural network on five selected FGD process variables (i.e., nozzle diameter, nozzle number, gas velocity, SO2 pollutant content and liquid flow rate) with various training algorithms, and the performance of the different training algorithms was compared and discussed. Ref. [11] designed a dynamic neural network (DNN) model to predict the outlet SO2 concentration for an FGD process, and the DNN achieved the best performance among the experimental models (e.g., ordinary least squares and random forest). Apart from neural networks, other machine learning models also find use in FGD process modeling tasks. A number of linear regression models are developed in [12] to perform hourly prediction of SO2 emission concentration, and the models show a good fit to the observed data. Ref. [13] differs from [12] in that heterogeneous techniques rather than homogeneous ones are utilized to model the FGD process. More specifically, the least squares regression method is combined with a support vector machine to predict the SO2 removal efficiency of a desulfurization process. Experimental results suggest that the proposed model shows outstanding predictive performance, with a coefficient of determination as high as 0.986. An SO2 emission predictive model that hybridizes support vector regression and random forest is created in [14], and the predicted results indicated that the proposed model achieves better performance by combining the advantages of multiple individual models.
As compared to conventional machine learning models, deep learning (DL) techniques are more advantageous for capturing complex dynamics in temporal sequences, and thus have been extensively applied in many fields. Ref. [15] proposed a deep neural model based upon a recurrent neural network and a variational autoencoder for early detection of faults in a motor. Ref. [16] integrated a deep long short-term memory (LSTM) network with the uniform manifold approximation and projection technique for fire status prediction, which is of considerable practical importance since miners can take precautionary measures according to the prediction results. A novel deep adversarial transfer learning network is developed in [17] for machine fault diagnosis, where newly occurring faults can be recognized with high accuracy. A deep sequence-to-sequence LSTM model is proposed in [18] to predict NOX in a selective catalytic reduction system, and data mining methods are incorporated to select more relevant variables so as to further enhance model performance. A deep GRU based neural model is employed in [19] to capture temporal dependencies in sequences of security alerts, and the established model can be further utilized to predict network intrusion alerts from different sources. Ref. [20] developed a combined deep learning model, which hybridizes a convolution neural network with an LSTM model, to predict variables concerning water quality, and satisfactory prediction performance is achieved with the proposed model. DL techniques have also begun to be applied to predict SO2 emissions in an FGD process. Refs. [21] and [22] utilized deep GRU and LSTM, respectively, to develop a predictive model for the SO2 emission concentration, and the desired prediction results were obtained. An integrated SO2 predictive model that hybridizes LSTM with auto-regression features was designed in [23], and its efficacy was verified by historical measurements of the studied FGD process.
Based on the LSTM network and high volumes of environmental measurements, a comprehensive prediction model that incorporates multiple indices of supervised learning is designed in [24], and experimental results demonstrate its superiority over other baseline models. In [25], a deep BP network model was optimized by the particle swarm algorithm and used for SO2 emission prediction in flue gas; the developed model shows fast training speed and superior prediction performance. The studied industrial FGD is a typical chemical process with highly nonlinear dynamics and complexity, which places high demands on the dynamics learning capability of the model. However, most existing studies applied either static machine learning models (e.g., feed-forward neural networks and linear regression models) or a standalone deep learning model to predict SO2 emissions for an FGD process, making it difficult to adequately capture the process dynamics and achieve outstanding prediction performance. To fill this gap, an advanced deep learning model, TCNN, is integrated with GRU to perform SO2 emission prediction for an FGD process. The proposed TCNN-GRU model is advantageous for characterizing the complicated dynamics of an FGD process because it combines the advantages of the two modules, and this is the first study to employ a mutual information based TCNN-GRU hybrid technique in predicting SO2 emissions for an FGD process. In our hybrid model, TCNN is effective in extracting useful dynamical properties of a process, and it also has the advantages of stable gradients, low memory requirement for training and fewer parameters. GRU, in turn, has a powerful ability to capture long-term temporal dependencies within the sequences extracted by TCNN. Furthermore, the mutual information technique is introduced to select the features most relevant to SO2 emission prediction, which can further improve the predictive capability of the model.
In our study, a novel modeling approach is proposed to address the modeling problem in a flue gas desulphurization process. This work combines multiple concepts and techniques, including SO2 emission prediction, flue gas desulphurization, neural networks, deep learning and mutual information. More specifically, the mutual information technique is combined with two deep learning models, the temporal convolution network and the gated recurrent unit network, to develop an integrated model for predicting SO2 emissions in a flue gas desulphurization process. This study is structured as follows. The “FGD process description” section provides a comprehensive description of the wet FGD process under study. The “Methodology” section gives a general description of the related techniques utilized in our method. The design of the proposed modeling framework is detailed in the “Proposed modeling approach” section. The “Experiments and analysis” section tests the effectiveness of our approach through experiments designed on measurements of a real FGD process. Conclusions are drawn in the “Conclusion” section.
Methodology
This section describes the FGD process and the theoretical fundamentals of the proposed prediction model, which include mutual information feature selection, the TCNN and the GRU network.
FGD process description
SO2 is a major atmospheric pollutant that negatively affects both human health and the environment; for instance, it can cause respiratory diseases and acidification of crops. The coal-fired power station is the major source of SO2 emissions, and as a consequence emission standards for SO2 have been increasingly tightened all over the world. Currently, various SO2 removal technologies are available to reduce the sulfur dioxide content of flue gas emitted from coal-fired power stations. Among them, the most popular is wet FGD, whose main advantages are high SO2 removal efficiency, low operating cost and reliability. In the wet FGD process, the sulfur-bearing gas contacts the limestone (CaCO3) absorbing slurry, and the removal of SO2 is realized by contacting with the absorbent
The reaction product calcium sulphite dihydrate (
Schematic illustration of the wet FGD process.
The FGD system is schematically presented in Fig. 1. Limestone is first pulverized by a ball mill and mixed with water in a slurry preparation tank to form the desulfurizing agent, and the produced limestone slurry is then pumped to the storage tank. The desulfurization tower plays an essential role in an FGD process: the boiler exit gas is fed from the lower part of the tower while the limestone absorbent is sprayed from the top by sprayers, and the countercurrent contact of desulfurizer and flue gas ensures adequacy of the chemical reaction. The last stage concerns the disposal of the end chemical product; specifically, the gypsum slurry filtrate is dewatered through a hydrocyclone and the gypsum (
FGD is a multivariable process in which multiple variables are measured by different types of sensors, and each of them can be regarded as a candidate variable for predicting SO2 emissions. In this sense, the dimensionality of the feature space becomes prohibitively high, which poses the problems of over-fitting and high computational complexity when developing an SO2 emission predictive model. As a consequence, it is essential to eliminate redundant and irrelevant features so as to enhance the prediction performance. The feature relevance criterion plays a significant part in feature selection (FS), and mutual information, as an information theoretic criterion, has two main advantages. First, it can measure any type of association between two features, including linear and nonlinear association. Second, MI is invariant under invertible and differentiable transformations of the feature space. In information theory, entropy is a key concept that is used to quantify the information content of a feature, let
with
where
From Eq. (5), the
Preliminary predictor variables used for SO2 emission prediction
Graphical representation of MI based feature selection.
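As a rough illustration of how an MI score between two measured variables can be estimated from samples, the sketch below uses a simple equal-width histogram estimator. The estimator, the number of bins, the function names `discretize` and `mutual_information`, and the synthetic data are all assumptions made for illustration; the estimator actually used in this study is not specified here.

```python
import math
import random
from collections import Counter


def discretize(values, bins=10):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=10):
    """Histogram (plug-in) estimate of I(X; Y) in nats."""
    xb, yb = discretize(x, bins), discretize(y, bins)
    n = len(xb)
    px, py, pxy = Counter(xb), Counter(yb), Counter(zip(xb, yb))
    mi = 0.0
    for (i, j), c in pxy.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[i] * py[j]))
    return mi


# Synthetic data (not FGD measurements): one nonlinearly related pair
# and one independent pair.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
related = [v * v + 0.1 * rng.gauss(0.0, 1.0) for v in x]
unrelated = [rng.gauss(0.0, 1.0) for _ in range(5000)]

mi_related = mutual_information(x, related)
mi_unrelated = mutual_information(x, unrelated)
```

The quadratic pair has almost no linear correlation, yet its estimated MI is clearly larger than that of the independent pair, which is the property that motivates MI as a relevance criterion for a nonlinear process.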
Structurally speaking, the convolutional neural network (CNN) is basically a feedforward network with convolutional layers. A CNN generally incorporates a series of convolutional layers, whose outputs are connected only to local regions of the inputs through multiple filters. CNNs can effectively extract the underlying patterns that exist in the inputs and hence find applications in different fields, such as object detection and text classification. Although the CNN was initially proposed to address computer vision problems, it is also effective at handling sequential data. Here the TCNN, first proposed by Bai et al. in [27], is adopted as the feature extractor in the proposed hybrid prediction model. The structures of TCNN and CNN are essentially the same, except that a 1D causal convolutional layer with dilation is used in place of a regular convolutional layer in TCNNs. Causal convolution implies that the value at time instant
where
Illustration of TCNN with exponentially increasing dilation factor and filter size of length 2.
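To make the idea of a causal convolution with dilation concrete, the following sketch stacks three layers with a filter of length 2 and dilation factors 1, 2 and 4, matching the configuration named in the figure caption. The function names, the all-ones filter weights and the zero left-padding are illustrative assumptions, not the trained network of this study.

```python
def causal_dilated_conv1d(x, weights, dilation=1):
    """y[t] = sum_i weights[i] * x[t - i * dilation], with zeros before t = 0."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - i * dilation
            if idx >= 0:  # implicit left zero-padding keeps the layer causal
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Number of time steps (present and past) seen by the top layer."""
    return 1 + (kernel_size - 1) * sum(dilations)


dilations = [1, 2, 4]   # exponentially increasing dilation factor
kernel = [1.0, 1.0]     # filter of length 2; weights are placeholders

# Push a unit impulse at t = 0 through the stack and see how far it spreads.
signal = [1.0] + [0.0] * 9
for d in dilations:
    signal = causal_dilated_conv1d(signal, kernel, d)
```

With these placeholder weights the impulse spreads over exactly eight time steps, which equals `receptive_field(2, dilations)`; each output depends only on current and past inputs, never on future ones.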
The GRU was originally introduced by Cho et al. [28], and it shows superior performance in various prediction contexts, e.g., wind speed and mine gas concentration. Structurally speaking, GRU is quite analogous to LSTM in that it also utilizes a gating mechanism to regulate the information flow inside the unit, but the input and forget gates are coupled in GRU, so that it contains merely two gating units, a reset gate and an update gate. As compared to LSTM, GRU has fewer parameters and a less complicated structure, which gives it computational advantages. Notwithstanding this fact, the performance of GRU is comparable to that of LSTM. The structure of GRU is depicted in Fig. 4, where
with
The structure diagram of GRU network.
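The reset gate and update gate described above can be sketched as a single GRU update step. This is a minimal plain-Python illustration under stated assumptions: the gate equations follow the commonly used GRU formulation, the convention that an update gate close to 1 favours the candidate state is assumed (some references use the opposite convention), and the dimensions and random weights are placeholders.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, x, U, h, b):
    """Return W x + U h + b for small dense matrices stored as lists of rows."""
    return [
        sum(wi * xi for wi, xi in zip(W_row, x))
        + sum(ui * hi for ui, hi in zip(U_row, h))
        + bi
        for W_row, U_row, bi in zip(W, U, b)
    ]


def gru_step(x, h, p):
    """One GRU update; p holds the update (z), reset (r) and candidate (h) weights."""
    z = [sigmoid(v) for v in affine(p["Wz"], x, p["Uz"], h, p["bz"])]  # update gate
    r = [sigmoid(v) for v in affine(p["Wr"], x, p["Ur"], h, p["br"])]  # reset gate
    gated_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(v) for v in affine(p["Wh"], x, p["Uh"], gated_h, p["bh"])]
    # Assumed convention: z close to 1 favours the candidate state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def init_params(n_in, n_hidden, seed=0):
    """Random placeholder weights; biases start at zero."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    p = {}
    for g in "zrh":
        p["W" + g] = mat(n_hidden, n_in)
        p["U" + g] = mat(n_hidden, n_hidden)
        p["b" + g] = [0.0] * n_hidden
    return p


params = init_params(n_in=3, n_hidden=4)
h = [0.0] * 4
for x_t in [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, 0.5]]:
    h = gru_step(x_t, h, params)
```

The cell carries three weight blocks (update, reset, candidate), whereas an LSTM cell carries four, which is the source of the smaller parameter count mentioned in the text.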
This section describes the hybrid modelling framework for SO2 emission prediction, which comprises the above-discussed mutual information, TCNN and GRU network. The motivation behind the proposed framework is that each component has its own advantage. Specifically, the TCNN model is effective in extracting useful dynamical properties of a process, especially in multivariable cases, while GRU has a powerful ability to capture long-term temporal dependencies within the extracted sequence. In addition, the introduction of mutual information facilitates the selection of the features most relevant to SO2 emission prediction, which can further improve the predictive capability of the TCNN-GRU model. As presented in Fig. 5, the proposed modelling framework comprises an input layer, a TCNN layer and a GRU layer. The input layer incorporates input variable selection and pre-processing; specifically, candidate features are evaluated and selected as input variables based on the MI metric and then pre-processed before being fed into the model. The TCNN layer is built up of multiple units, each of which consists of a dilated convolution layer, a weight normalization layer, a dropout layer and a transfer function (the rectified linear unit is chosen in our model), where weight normalization speeds up the convergence of the network while dropout serves to overcome the overfitting problem. The TCNN layer serves as a feature extractor for capturing long-range and short-range relations in sequential features. The extracted temporal features are then fed to and processed by a series of GRU units in the GRU layer, and the predicted SO2 emission at each time instant is obtained as the output of the GRU layer.
TCNN has a powerful ability to extract temporal features, while the GRU network is advantageous for memorizing long-term information. Together they give the integrated model the capability to satisfactorily identify a system with severely nonlinear dynamics and large time delay, which makes it particularly suitable for the modelling problem of the FGD process studied in this research. From the above analysis, it is seen that the proposed MI-TCNN-GRU predictive framework is an ensemble of three types of techniques, and is designed based on the flow direction of data. From the data flow perspective, the data processing in MI-TCNN-GRU is performed in the following stages.
Data acquisition: sensors deployed in the FGD process record measurements of the various process variables in real time. The measurements are transmitted to the built-in management information system, from which the stored historical data for the corresponding variables can be extracted when necessary.
Model inputs selection: select candidate variables that may be employed to characterize the variable to be predicted. This step relies mainly on practical experience and theoretical knowledge.
Data preprocessing: preprocess the raw measurements of the selected variables, which incorporates outlier detection, missing data imputation and data normalization, and which benefits the subsequent model development.
Feature selection and elimination: the MI metric is employed to evaluate the degree of relevance between two candidate variables according to Eqs. (3)–(5). First, select the variables with high MI scores with respect to the predicted variable, which can be regarded as variables of high relevance for the prediction task. Then, among the selected variables, search for those with similar MI scores, which implies that a close relation exists among them; in this case some of them are redundant and should be eliminated to reduce the computational complexity.
Model development: the well-trained TCNN-GRU model is obtained with the preprocessed data from the previous stage.
Model validation: evaluate the established model using performance metrics such as root mean square error (RMSE), R squared (R2) and mean absolute percent error (MAPE).
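The three validation metrics named in the last stage can be sketched as follows. The textbook definitions are used, and MAPE is returned as a fraction rather than a percentage, which is an assumption about the convention; the sample values are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percent error as a fraction; targets must be non-zero."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up outlet concentrations, not measurements from the studied plant.
y_true = [10.0, 12.0, 14.0, 16.0]
y_pred = [11.0, 12.0, 13.0, 16.0]

scores = {
    "RMSE": rmse(y_true, y_pred),
    "MAPE": mape(y_true, y_pred),
    "R2": r_squared(y_true, y_pred),
}
```

Lower RMSE and MAPE and a higher R2 indicate a better fit, so the three metrics are read in opposite directions when models are ranked.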
The structure of MI based TCNN-GRU SO2 emission prediction framework.
The computational complexity of the integrated model proposed in our manuscript is determined by three parts: 1) the calculation of mutual information for feature selection, 2) the temporal convolution neural network and 3) the gated recurrent unit. For mutual information, an effective statistic for measuring the degree of relevance between features, the time complexity is found to be
This part is devoted to verifying the efficacy of the suggested SO2 emission prediction model, and a real FGD process is selected as the case to be investigated. Based upon historical FGD operation measurements, experiments are designed to compare our method with other popular methods that appeared in previous studies. All experiments were run on a personal computer with Microsoft Windows 10, 32 GB RAM and an Intel Core i7-11700H processor with a 2.50 GHz base frequency.
Data description
Our study uses operating data from an actual FGD system of a 1000 MW coal-fired power plant in Hebei Province, China, and a total of 12962 samples used for model training and evaluation were obtained through the FGD information management system. A photograph of the FGD tower under study is presented in Fig. 6. The measurements span a week (from 03/12/2022 to 03/19/2022) and are divided into three portions: 70% are used as the training set, while the rest are equally separated into the validation set and the test set. Each subset has its own role: the training set is employed for parameter estimation, the validation set is used for early stopping and for searching for optimal hyper-parameters, and the test set is used to evaluate the generalization performance of each experimental model.
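The 70/15/15 partition described above can be sketched as below. The split is assumed here to be chronological (no shuffling), which is a common choice for time series but is not stated explicitly in the text; the function name and the rounding rule are likewise illustrative.

```python
def chronological_split(samples, train_frac=0.70):
    """Split time-ordered samples into train / validation / test without shuffling."""
    n = len(samples)
    n_train = int(n * train_frac)
    n_val = (n - n_train) // 2  # the remainder is shared equally
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]
    return train, val, test


samples = list(range(12962))  # stand-in for the 12962 time-ordered records
train, val, test = chronological_split(samples)
subset_sizes = (len(train), len(val), len(test))
```

Under this rounding rule the subsets contain 9073, 1944 and 1945 samples; a different rounding rule would shift the boundaries by at most one sample.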
Feature selection
As mentioned earlier, a total of 10 candidate variables other than the predicted SO2 emissions are taken into consideration. To select variables that are best suited to perform SO2 emissions prediction, the MI score between each variable pair is calculated and results are presented in Fig. 7.
It is observed in Fig. 7 that the MI scores between SCO and three predictor variables, i.e., SO2 concentration in inlet flue gas (SCI), generating units power (GUP) and temperature of inlet flue gas (TIF), are greater than one. The score of SCI is the largest, reaching 1.67, which implies that it is the most relevant variable for SO2 emission prediction. On the other hand, the scores between SCO and GUP and between SCO and TIF are close to each other, and the MI score between GUP and TIF reaches 1.93, which suggests that the two variables are closely related and that only one of them is needed in the SO2 emission prediction task. Given that the MI score of GUP with respect to SCO is slightly higher than that of TIF, SO2 concentration in inlet flue gas and generating units power are finally chosen. In addition, owing to the process dynamics, the current output of the process also correlates with its past observations. In view of the above, the model inputs are determined as SCO, GUP and SCI. Table 2 presents the statistical information of the variables selected for SO2 emission prediction, where S.D, Min and Max are respectively the standard deviation, minimum and maximum of the variable.
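The relevance-then-redundancy reasoning above can be written as a small selection rule. In this sketch only two numbers come from the text (1.67 for SCI and 1.93 for the GUP-TIF pair); the remaining scores, the two thresholds, the extra candidate named `other` and the function name are hypothetical placeholders chosen only to respect the qualitative statements in the text.

```python
def select_features(relevance, pairwise, min_relevance=1.0, redundancy=1.5):
    """Keep relevant features, dropping the weaker member of any redundant pair."""
    # Rank candidates that pass the relevance threshold, most relevant first.
    ranked = sorted(
        (f for f in relevance if relevance[f] > min_relevance),
        key=relevance.get,
        reverse=True,
    )
    kept = []
    for f in ranked:
        # pairwise is keyed by frozenset so the lookup is order-independent
        if all(pairwise.get(frozenset((f, k)), 0.0) < redundancy for k in kept):
            kept.append(f)
    return kept


# MI with the target SCO. 1.67 is the reported SCI score; the GUP and TIF values
# are placeholders that only respect "greater than one, GUP slightly higher".
relevance = {"SCI": 1.67, "GUP": 1.20, "TIF": 1.15, "other": 0.30}
# MI between predictors. 1.93 is the reported GUP-TIF score; the rest are placeholders.
pairwise = {
    frozenset(("GUP", "TIF")): 1.93,
    frozenset(("SCI", "GUP")): 0.40,
    frozenset(("SCI", "TIF")): 0.40,
}

selected = select_features(relevance, pairwise)
```

With these inputs the rule keeps SCI and GUP and discards TIF as redundant with GUP, which mirrors the choice made in the text; past values of SCO are added as a model input separately.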
Main statistical characteristics of variables used for SO2 emission prediction
The photograph of studied FGD tower.
MI scores between candidate predictor variables.
Prior to the modeling stage, additional preprocessing is required. First, the simple but effective Pauta criterion is employed for outlier elimination. Specifically, if the absolute deviation of a measurement from the mean exceeds three times the standard deviation, the sample is identified as an outlier and removed, which is mathematically formulated as
in which
where
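The Pauta (three-sigma) criterion described in this part can be sketched as a simple filter. The use of the population standard deviation, a single filtering pass, the function name and the synthetic readings are assumptions made for illustration.

```python
import statistics


def pauta_filter(values):
    """Drop samples whose deviation from the mean exceeds three standard deviations."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumed)
    return [v for v in values if abs(v - mean) <= 3.0 * std]


# Thirty plausible readings around 20 plus one gross outlier (made-up data).
readings = [20.0 + 0.1 * (i % 5) for i in range(30)] + [100.0]
cleaned = pauta_filter(readings)
```

Because a gross outlier inflates both the mean and the standard deviation, the criterion is sometimes applied repeatedly until no further sample is removed; a single pass is shown here for brevity.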
This section describes the details of our experiment. In our approach, the learnable parameters of the network (i.e., weights and biases) are identified through the adaptive moment estimation (Adam) optimizer, whose major strengths are low computational cost and low memory requirements, which make it particularly suited for deep learning. The optimization objective is to minimize the mean square error cost criterion. The mini-batch technique is adopted with a batch size of 64, and the learning rate and the number of epochs are set to 0.001 and 50, respectively. To prevent over-fitting, the early stopping technique is employed, and network training is terminated when the performance on the validation set does not improve for 6 successive epochs. Further, we compare the proposed TCNN-GRU model with methods that are popular in the field of SO2 emission prediction for FGD, namely the recurrent neural network (RNN), multi-layer perceptron (MLP), GRU neural network and TCNN. To quantify the prediction performance for SO2 emission in the FGD process R squared (
in which
Hyperparameters have a considerable effect on model performance, and the optimal combination of hyperparameters tends to yield the best-performing model. Considering that many hyperparameters are often involved in a prediction model, optimal hyperparameters should be obtained by suitable methods instead of manual search. Grid search and random search are the two most widely applied techniques for hyperparameter optimization [35]. They differ in that the former makes an exhaustive search over the target hyperparameter space, while the latter samples the space at random. As grid search is a brute force algorithm, it is better suited for low-dimensional hyperparameter spaces, whereas random search is more effective for hyperparameter spaces of higher dimensions. Among the experimental models under study, RNN and MLP are relatively simple and have fewer hyperparameters, while GRU, TCNN and the proposed TCNN-GRU model possess more. Accordingly, grid search is used to choose the best hyperparameters for the RNN and MLP models, and random search with a preset budget of 50 iterations is performed for the rest. With some preliminary runs, rough boundaries for the key hyperparameters of each experimental model are obtained, and the search values are then determined by setting proper intervals. Table 3 summarizes the search values and the optimal values of the key hyperparameters for each experimental model.
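A random search with a fixed budget of 50 iterations can be sketched as follows. The search space, the hyperparameter names and the toy objective are hypothetical placeholders; in practice the objective would train a model and return its validation error, and the actual search values are those summarized in Table 3.

```python
import random


def random_search(space, evaluate, n_iter=50, seed=0):
    """Sample n_iter configurations from space and return the best (lowest score)."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_iter):
        cfg = {name: rng.choice(options) for name, options in space.items()}
        score = evaluate(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


# Hypothetical search space, for illustration only.
space = {
    "filters": [16, 32, 64, 128],
    "kernel_size": [2, 3, 5],
    "gru_units": [32, 64, 128],
    "dropout": [0.1, 0.2, 0.3],
}


def toy_validation_error(cfg):
    """Stand-in for training a model and returning its validation RMSE."""
    return (abs(cfg["filters"] - 64) / 64
            + abs(cfg["gru_units"] - 64) / 64
            + cfg["dropout"])


best_cfg, best_score = random_search(space, toy_validation_error, n_iter=50)
grid_size = 4 * 3 * 3 * 3  # an exhaustive grid would need 108 evaluations
```

Even for this small space, the random search evaluates fewer than half of the configurations an exhaustive grid would require, and the gap widens quickly as more hyperparameters are added.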
Performance index comparison of models on test set
Prediction results for outlet SO2 concentration with different models. Red line: predicted values. Blue line: real values. Ordinate: outlet SO2 concentration (mg/m3).
In this part, each prediction model with its optimal parameter configuration is first used to predict the SO2 emissions of the studied FGD process, and the prediction results are then compared and analyzed to demonstrate the efficiency of our newly developed modeling approach. Apart from the popular SO2 emission prediction models, we evaluate the performance of the proposed method against current state-of-the-art SO2 emission prediction models, namely the LSTM network with attention mechanism (AM) in [22] (written as AM-LSTM in what follows), the dynamic deep neural network (DDNN) model in [11] and the particle swarm optimization (PSO) based BP neural network in [25] (PSO-BPNN for short). The hyperparameters of the comparison models are set as recommended in the original studies, and some of the key parameters are presented below. AM-LSTM: the Adam optimizer is adopted, four LSTM layers are used with 256, 256, 128 and 64 units respectively, and the dropout rate and learning rate are 0.4 and 0.001, respectively. DDNN: the time delay is set to 5, three hidden layers are employed with 128, 64 and 32 neurons respectively, and the Adam optimizer is applied during model training. PSO-BPNN: the number of particles is 80, the learning factors are 1.2 and 1.5, the iteration number is 100, two hidden layers are used with 11 and 23 neurons respectively, and the Adam optimizer is applied during model training. The prediction results of all experimental models are given in Table 4 and Fig. 8 below. As seen from Table 4, MLP, as a classical nonlinear model, achieves the highest scores on MAPE and RMSE and the lowest score on R2 (R2 = 0.6540, RMSE
This study deals with the problem of predicting SO2 emissions for a flue gas desulphurization process. An MI based TCNN-GRU model is developed to predict SO2 emissions in an industrial FGD process, which is characterized by highly nonlinear dynamics and large time delay. Benefiting from the superior feature extraction ability of TCNN and the remarkable long-range dependency learning ability of GRU, the proposed hybrid model achieves the best prediction performance among the popular and state-of-the-art models in the SO2 emission prediction field, with RMSE, MAPE and R2 values of 0.1490, 0.0201 and 0.9706, respectively. The proposed integrated model has structural flexibility and outstanding performance, which indicates extensive application prospects in similar scenarios. Although the proposed approach achieved excellent performance in predicting SO2 emissions for a flue gas desulfurization process, it also has some potential limitations, such as the difficulty of determining appropriate hyperparameter values rapidly and effectively and the heavy computational burden of handling multivariate sequences. In the future, we will extend our idea to other AI tasks, such as visible-infrared person re-identification [36], 3D mesh analysis [37] and pose transfer [38].
Acknowledgments
This study is supported by the National Natural Science Foundation of China, Grant numbers 62373012 and 62303025.
