Abstract
Advances in domains such as health care, sensor measurement, intrusion detection, motion capture, and environmental monitoring have led to the emergence of large-scale time-stamped data. Such data are complicated by missing values, multivariate attributes, and time-stamped features. The objective of this paper is to construct a temporal classification framework using a stacked Gated Recurrent Unit (S-GRU) for predicting ozone level. Ozone level prediction plays a vital role in maintaining a healthy living environment. The proposed system performs two functions: temporal missing value imputation and temporal classification. For temporal missing value imputation, the temporal correlated k-nearest neighbors (TCO-KNN) approach is presented. Using attribute dependency based KNN, the nearest significant set is identified for each missing value, and the missing value is imputed with the mean of the values in that set. For temporal classification, the classification model is built using the S-GRU. Experiments on ozone multivariate temporal data sets show improved classification accuracy compared to other state-of-the-art methods.
Introduction
A time series is a sequential set of data points, typically measured over successive time intervals [1]. Such data are captured at regular periods, such as a day, a month, or a year. Time series data are characterized by large size, high dimensionality, and continuous updating of values. In addition, time series data, being numerical and continuous in nature, are considered as a whole rather than as individual numerical fields.
Many real-world data sets that are multivariate and time stamped challenge the data mining process and need special preprocessing before mining. In such raw temporal data, a particular record may be unavailable owing to a variety of factors, including flawed, incomplete, noisy, or inconsistent data extraction, or a failure to load the data. Left unprocessed, such data in the training set can reduce the accuracy of a model or bias it, because behaviour and relationships with other variables cannot be analysed fully. This can lead to wrong predictions or classifications.
Data cleaning consists of routines that handle missing values, smooth out noise, identify outliers, and correct inconsistencies in the data [2]. This paper focuses on the processing of missing values in such raw data. For example, in a social survey, residents of a given area may refuse to answer certain questions, such as their phone number; in an industrial experiment, some results may be missing due to mechanical failures during data collection; and in medical databases, a patient record may be missing some values because the patient did not undergo certain laboratory tests. Left unprocessed, these missing values affect data analysis. As mentioned in [3], the pattern of missingness determines its type, and missingness is categorized as follows: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). In MCAR, the probability that a value is missing is unrelated to any other information in the data set, observed or unobserved. In MAR, the missingness depends only on the other, observed data. In MNAR, the missingness depends on the unobserved values themselves, so it cannot be explained from the observed data alone.
This paper focuses on two functionalities: Missing value imputation for multivariate time series data sets and temporal classification model construction for ozone level detection using stacked Gated Recurrent unit (S-GRU). The proposed temporal correlated K-nearest neighbors (TCO-KNN) approach utilizes a correlation matrix and attribute dependencies to impute missing data. The framework performs the classification procedure after the missing value imputation step. To efficiently classify multivariate time series data for ozone level prediction, the classification technique is based on deep learning network model, namely stacked gated recurrent unit (S-GRU).
The significance of handling missing values has been addressed in several studies: Littel et al [4], Enders et al [5], Kulanuwat et al [6]. The presence of missing values can reduce the effectiveness of knowledge discovery and decision making. This paper focuses on handling missing values in multivariate temporal data. There are a number of strategies for dealing with missing values in univariate time series, such as rough set tolerance [19], generative adversarial networks [24], k-nearest neighbors [23], and deep neural networks [20]. However, treating missing values in multivariate temporal data remains difficult, because each series is a collection of continuous values observed at fixed time intervals and temporal patterns span multiple variables. This paper proposes temporal correlated k-nearest neighbors (TCO-KNN) to handle missing values by developing a correlation matrix and an attribute dependency set.
Xu et al [7] presented a deep neural network model for intrusion detection. The model integrates a Gated Recurrent Unit (GRU) with a Multi-Layer Perceptron (MLP) to identify network intrusions. The GRU, proposed in [8], is a simplification of the LSTM with two gates, namely the reset gate and the update gate. GRUs store and filter information by means of this gating mechanism. The gates mitigate the vanishing gradient problem: at every time step they decide which information from the new input and the previous state to retain and pass on to the subsequent time step. Compared with the LSTM, the GRU uses fewer training parameters and less memory, so it trains on long temporal sequences faster and at lower computational cost. In this work, a stacked Gated Recurrent Unit (S-GRU) is used to detect ozone level from multivariate temporal data. The S-GRU is composed of several GRU units to improve the accuracy of ozone level prediction.
In this paper, the proposed framework is evaluated on a ground-level ozone data set, which contains multivariate temporal data with missing values [9]. Ground-level ozone results from chemical reactions between nitrogen oxides (NOX) and volatile organic compounds (VOCs) just above the earth’s surface. NOX and VOCs are emitted from natural and man-made sources such as cars, power plants, refineries, paints, and pesticides. Exposure to ground-level ozone may increase the risk of heart disease, lung disease, and other conditions. In addition to its effect on human beings, ozone at ground level harms animals and agricultural crops. As a result, detecting the ozone level in the atmosphere is essential for safeguarding the environment from ozone-related damage and sustaining high air quality [10–12].
The existing approaches in the literature for determining the ozone level are as follows. Regression models have been explored for forecasting ozone concentration at different locations [13]. Later, non-linear methods, namely fuzzy systems, neural networks, and Bayesian networks, were developed to predict ozone levels [14–16]. Deep learning based techniques for detecting anomalies in ozone measurements have emerged recently [17]. In this paper, the ozone level is predicted using a stacked gated recurrent unit (S-GRU). The data sets used to train the S-GRU network were collected from three regions, namely Houston, Galveston, and Brazoria. The data are multivariate, time dependent, and liable to missing values. To improve the accuracy of ozone level detection, the missing values are imputed using the proposed temporal correlated k-nearest neighbors (TCO-KNN) method. TCO-KNN performs two functions: it analyses attribute dependencies, and it identifies the temporally closest nearest neighbors used to impute missing values.
The contribution of this paper is as follows.
This paper presents two modules, namely temporal missing value imputation and temporal classification, for ozone level detection from multivariate time-stamped data sets. In the temporal missing value imputation module, the proposed temporal correlated k-nearest neighbors (TCO-KNN) method handles missing value imputation for multivariate time series data. To impute missing values, TCO-KNN constructs a temporal nearest neighbors set using attribute dependency based KNN. The mean value of the identified temporal nearest points is used to impute each missing value. In temporal classification, the stacked gated recurrent unit (S-GRU) is used to build a temporal classification framework for detecting ozone levels from multivariate time-stamped data sets.
The rest of the paper is arranged as follows. Section 2 discusses the works related to missing value imputation and classification. The proposed work is described in section 3. Section 4 provides experimental results and discussions. Conclusion and future work are mentioned in section 5.
Related works
This section describes the work related to proposed methods which include missing data imputation and classification of multivariate time series data.
Shi et al. [18] employed a temporal dynamic matrix factorization technique for missing data prediction in co-evolving time series. According to the authors, the initial models are built by fusing the smoothness characteristics of each time series with correlation information across multiple sources. Batch processing and fine-tuning strategies are then incorporated to ensure that the missing values are predicted effectively and efficiently. The authors evaluated the method on real-world and synthetic data sets.
Jane et al. [19] constructed a tolerance rough set induced bio-statistical (TRiBS) framework to impute missing values in unevenly spaced clinical time series data. According to the authors, the framework handles missing values using the concept of tolerance rough sets (TR) and the particle swarm optimization (PSO) technique. The framework was evaluated on clinical time series data of hepatitis and thrombosis patients. The experimental results show that it achieves a lower error rate than other imputation techniques.
Yuan et al. [20] imputed the missing values in time series data with LSTM networks. For experimentation, data sets from public air quality monitoring stations and a personal air quality monitoring system were used. In that work, an LSTM neural network was used to improve the prediction accuracy of the concentration of particulate matter with diameters less than 2.5 microns (PM2.5). Root Mean Square Error (RMSE) and prediction accuracy were the evaluation criteria. The authors report that missing value imputation with the LSTM network improves accuracy, as measured by the residuals between predictions and observations.
Lobato et al. [21] used Genetic Algorithm Imputation (GAI) to impute missing values. Benchmark multivariate time series data sets with missing values from the UCI repository were used for the analysis. The framework handles mixed-attribute data sets and was evaluated with three kinds of classification methods: rule induction learning, approximate models, and lazy learning. Classification accuracy improved with all three. Imputation is done separately by attribute type: for numerical attributes, mutation follows a Gaussian distribution whose mean is taken from the current genotype, and for categorical attributes, mutation is a random change drawn from the solution pool. The reported results are significantly superior to those of other missing data treatment methods and indicate that GAI is suitable for filling missing values in mixed-attribute data sets.
Tran et al. [22] proposed the Genetic Programming Multiple Imputation (GPMI) method to impute missing values. For experimentation, eight benchmark data sets obtained from a machine learning data set repository were used. GPMI, a regression-based imputation method, imputes missing values in the independent variables as well as in the dependent variables. The framework was evaluated using prediction accuracy and classification accuracy. The paper also examines GPMI under different kinds of missingness, such as missing at random.
Xu et al. [23] described a missing data imputation method for spatial-temporal sensor data. The proposed modules are a nearest neighbors based heuristic search and a tensor singular value decomposition (t-SVD) method. The methodology focuses on recovering missing sensor data efficiently and accurately. The framework was evaluated using the standard Mean Absolute Error (MAE) and Mean Relative Error (MRE). The model is robust and can also be used to tackle noisy observations.
Ying Zing et al. [24] used generative adversarial networks (GANs) to impute missing values in multivariate time series data. The GANs are made up of three networks: a generator network, an encoder network, and a discriminator network. The authors tested the presented GANs on the digitized medical records of 4000 ICU patients. According to the authors, the suggested method outperforms the baseline methods by imputing missing values through an efficient and stable generation process.
Ju Ma et al. [25] imputed missing values in time series data using transferred long short term memory based iterative estimation (TLSTM-IE) for air pollution detection. The authors tested the method using data from 11 monitoring sites in New York City. They claim that, compared to alternative imputation strategies, the TLSTM-IE model generalizes well to long segments of consecutive missing values.
Lu et al. [26] used an extreme learning machine auto-encoder (ELM-AE) method to impute missing data in time series data sets like seeds data set, IRIS data set, satimage data set, fertility data set, bupa data set, cmc data set and pima data set. To fill in the missing data, the suggested framework selects random plausible values and then takes the average of those reasonable values. According to the results of the experiment, imputation of missing data using the ELM-AE methodology produces more accurate predictions than other methods. The generalized mean absolute deviation (GMAD) and clustering purity, which is used to infer the nearest value in lacking data, were the main evaluation criteria. The intention of the paper was to work without complete data.
The following studies address DNN, GRU, LSTM, PSO, Recurrent Neural Network (RNN), Optimal Bayesian Classification, and other classification approaches.
Lu et al. [27] used cross-correlation analysis to detect outliers in real-world time series data sets such as a population data set, a receiver data set, a climate data set, and a house condition data set. The proposed framework involves three steps: it first preprocesses the data, then converts the high-dimensional data into a cross-correlation function to reduce the dimension, and finally detects outliers with Otsu’s method. The results show that the Outlier Detection method based on Cross-correlation Analysis (ODCA) works well for small as well as large data sets.
Harrou et al. [28] presented deep learning based strategy for detecting anomalies in unlabeled ozone measurements. The author integrates the deep belief networks (DBN) with one class support vector machine (OCSVM) to improve the ozone monitoring. The developed unsupervised anomaly detection strategy is experimented with real data from Isere in France and the efficiency of the developed algorithm is evaluated for monitoring ozone measurements.
Boro and Bhattacharyya [29] used KNN with PSO for ensemble classification process. The proposed PSOKNN method finds the K training samples from the training set for a given test sample. The author demonstrated the proposed PSOKNN method over benchmark and real life data sets and the performance was compared with other ensemble methods in terms of classification accuracy and computational time.
Karthikram et al [30] presented a Bidirectional GRU (Bi-GRU) to identify neurodegenerative illness using patient gait data. The suggested Bi-GRU was tested on three patients with Parkinson’s disease, Huntington’s disease, and Alzheimer’s disease. Compared to existing techniques, the suggested Bi-GRU predicts disease with higher accuracy.
Min Xia et al. [31] used stacked GRU-RNN to estimate renewable energy for smart grid efficiency. The author tested the GRU-RNN model with two separate experiments: wind energy prediction using weather parameters and electricity load prediction using historical load data. According to the author, the suggested model surpasses state-of-the-art deep learning algorithms in wind energy prediction and electricity prediction by obtaining an effective accuracy rate.
Pen sun et al. [32] built a selected stacked gated recurrent unit (SSGRU) for multi-road traffic prediction. For experimentation, the authors used England traffic flow statistics, which included traffic flow data at 15-minute intervals for all Wednesdays in 2016. The experimental findings show that the SSGRU achieves a higher accuracy rate in traffic flow prediction than traditional GRU models.
Jane et al [33] presented time delay neural network to diagnose gait disruption in Parkinson disease. To train the time delay neural network, the author used a Q back propagated training technique. The given Q-BTDNN constructs a temporal classification model for clinical decision-making prediction. For the purposes of this investigation, three data sets from Parkinson’s disease studies were used. True positive rate, true negative rate, recognition rate, misclassification rate, and precision were used to evaluate the experiment findings.
The proposed work differs from the works discussed above in the following ways. Most existing missing value imputation techniques work effectively with univariate time series, and only very few focus on multivariate temporal data sets. Imputing missing values in multivariate temporal data remains a challenging area of research, because of the temporal patterns in multivariate time series and the continuous observation over time intervals, and because these data are high dimensional and liable to outliers and missing values. In this paper, the temporal correlated k-nearest neighbors (TCO-KNN) framework is proposed to impute the missing values in multivariate time series data. The imputed data sets are then used to identify ozone levels with a stacked gated recurrent unit (S-GRU).
Proposed work
The data sets, methodologies, and architecture of the proposed framework, as well as the key functionalities of its components, are outlined in this section.
Architecture diagram
Figure 1 depicts the proposed framework’s overall architecture. Temporal missing value imputation and temporal classification are the two components that make up the framework. In temporal missing value imputation, the TCO-KNN produces nearest neighbors sets utilising attribute dependency based KNN. The missing value is imputed by taking the mean of the values of closest significant set. In temporal classification, the stacked GRU (S-GRU) classifier is used to build a temporal classification model to detect ozone levels from multivariate temporal data. The proposed framework’s performance is measured in terms of precision, recall and f1-score.
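As a reminder of how the three evaluation measures named above are defined, the following minimal sketch computes precision, recall, and F1-score for a binary labelling. The function name, the toy label vectors, and the choice of 1 as the positive (ozone day) class are assumptions made for illustration; they are not taken from the paper's experiments.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical labels: 1 = ozone day, 0 = normal day.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Because ozone days are typically much rarer than normal days, precision and recall on the positive class are usually more informative than raw accuracy.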

Workflow of the Proposed Framework.
The data sets used in this paper for experimentation were obtained from the UCI repository. Two ground-level ozone data sets are available in the UCI repository [34]: the eight hour peak set (eighthr.data) and the one hour peak set (onehr.data). These data were collected from 1998 to 2004 in the Houston, Galveston and Brazoria area. For experimentation, the proposed work uses the eight hour peak data set, which contains 73 attributes. As reported in Table 1, all attributes whose names begin with T represent the temperature recorded at various times throughout the day (T0...T23), and those beginning with WSR represent the wind speed at various times (WSR0...WSR23).
Attributes Descriptions of the Datasets
The methodology and approaches employed in the proposed work are described in this section. They cover data preprocessing and classification.
Data preprocessing: temporal missing value imputation
This module performs two functions, namely dependency analysis and temporal missing value imputation. Time series data are often incomplete, inaccurate, or inconsistent, and are prone to missing values. Such data lead to erroneous conclusions. As a result, data preprocessing is essential for developing a better model and enhancing its accuracy. Data preprocessing can be done at various levels, such as data cleaning, data transformation, data reduction, and data integration. The proposed framework replaces missing values in the data set in order to improve the classification accuracy of ozone level prediction. In a data set, there are three types of missingness: MCAR, MAR, and MNAR.
This work considers data that are missing at random (MAR). As a result, the imputation can be done using the non-missing records that have been recorded. Dependency analysis and temporal missing value imputation are the two preprocessing phases in this module. Of the various strategies that exist for dealing with missing values, this work uses a correlation technique to determine the dependencies between the attributes of the time series data set.
Correlation is a statistical technique that indicates the strength of association between variables. The strength of the relationship is expressed by the correlation coefficient, which ranges from -1 to +1; a magnitude of 1 represents a perfect degree of association between the variables. As the coefficient approaches 0, the dependency between the two variables becomes weak. The direction of the association is given by the sign of the coefficient: a (+) sign represents a positive dependency and a (-) sign a negative dependency. Based on the correlation matrix obtained, this module finds the dependent variables. There are various metrics for correlation coefficients.
Pearson correlation is the most often used correlation metric to measure the degree of the relationship between linearly correlated variables. The below mentioned equation (1) calculates the Pearson correlation value:
Spearman correlation is another metric which is based on the ranking technique and measures the strength of dependency of the two attributes. The Spearman rank correlation value is calculated using the equation (2),
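Equations (1) and (2) are not reproduced in this copy of the text. As a reference sketch, the standard textbook forms of the two coefficients are given below, on the assumption that they match the forms intended here. The symbols n (number of paired observations), \bar{x} and \bar{y} (sample means), and d_i (difference between the ranks of x_i and y_i) are introduced for this sketch; the simplified Spearman form holds when there are no tied ranks.

```latex
% Pearson correlation coefficient, standard form assumed for equation (1)
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
               \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

% Spearman rank correlation coefficient, standard form assumed for equation (2)
\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2} - 1)}
```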
The correlation matrix Q of M variables (x1, x2,...,xM) is an M x M matrix, where each entry of the matrix is defined using equation (3):
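A minimal sketch of how such a correlation matrix could be built is shown below, taking each entry Q[i][j] to be the correlation coefficient between variables x_i and x_j. It uses only the Python standard library; the function names, the column-wise data layout, and the choice to report 0.0 for a constant column are assumptions made for illustration rather than details from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # assumption: a constant column is treated as uncorrelated
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlation_matrix(columns, corr=pearson):
    """M x M matrix whose entry [i][j] is corr(columns[i], columns[j])."""
    m = len(columns)
    return [[corr(columns[i], columns[j]) for j in range(m)] for i in range(m)]


# Hypothetical toy data: three attributes observed at four time stamps.
columns = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
Q = correlation_matrix(columns)
```

Passing `corr=spearman` yields the rank-based matrix instead; the diagonal is 1 in both cases for non-constant columns.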
At the end of this module, the framework obtains the highly dependent attributes of each attribute by eliminating its weakly dependent or non-dependent attributes. In this way, the dependent attribute set of every attribute in the data set is discovered. Correlation is a natural method for detecting dependencies between the features, because a time series is typically a collection of continuous values evaluated at fixed time intervals. The dependent attribute set analysis is presented in Algorithm 1.
The next step is to find the nearest significant set for imputing the missing value using k-nearest neighbors (KNN). KNN can predict missing values in both discrete and continuous attributes from the nearest records under a distance metric, using the mean of those records. The distance metric depends on the type of data. Continuous data: distance metrics such as Euclidean, Manhattan, and Cosine are often used. Categorical data: Hamming distance is used. It is calculated by comparing two records attribute by attribute and adding one for every categorical attribute whose value differs, so the metric equals the number of attributes for which the values are not the same.
The distance d between two data points X (a1, b1) and Y (a2, b2) can be found using the equation (4),
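For two-dimensional points, the Euclidean distance takes the familiar form d = sqrt((a2 - a1)^2 + (b2 - b1)^2), which is assumed here to be the form intended by equation (4). The short sketch below generalises it to any number of attributes and adds the Manhattan and Hamming distances mentioned earlier; the function names are illustrative.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two numeric points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def hamming(p, q):
    """Number of positions at which two categorical records differ."""
    return sum(1 for a, b in zip(p, q) if a != b)


# X = (a1, b1) and Y = (a2, b2) as in the text, with hypothetical values.
X, Y = (0.0, 0.0), (3.0, 4.0)
d = euclidean(X, Y)  # 5.0
```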
Load the non-missing time series data Y
Calculate the correlation matrix Q using equation (3)
For each attribute i in A
    R = ∅ //empty set, reset for every attribute i
    For every other attribute j in A
        Find the value of Q[i,j] from Q
        If (Q[i,j] ≥ -1 and Q[i,j] < -0.1) or (Q[i,j] > 0.1 and Q[i,j] ≤ 1) then
            R = R ∪ {Aj}
        end if
    End for
    AD(i) = R
End for
return AD
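The dependency analysis above can be sketched in a few lines of Python. The sketch assumes the correlation matrix has already been computed and is passed in as a nested list; the function name, the attribute names, and the toy matrix are hypothetical. The two-sided condition of the algorithm is written as |Q[i][j]| > 0.1, which is equivalent for coefficients in [-1, 1].

```python
def dependent_attribute_sets(Q, names, threshold=0.1):
    """Map each attribute name to the attributes it is correlated with.

    Q is an M x M correlation matrix (nested lists) aligned with `names`.
    An attribute j is kept for attribute i when |Q[i][j]| > threshold.
    """
    AD = {}
    for i, name in enumerate(names):
        dependents = []  # reset for every attribute i
        for j, other in enumerate(names):
            if i == j:
                continue  # only *other* attributes are considered
            if abs(Q[i][j]) > threshold:
                dependents.append(other)
        AD[name] = dependents
    return AD


# Hypothetical 3 x 3 correlation matrix.
names = ["A1", "A2", "A3"]
Q = [
    [1.00, 0.80, 0.05],
    [0.80, 1.00, -0.40],
    [0.05, -0.40, 1.00],
]
AD = dependent_attribute_sets(Q, names)
```

Here A1 and A3 are not linked to each other because their coefficient (0.05) falls inside the excluded band around zero.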
Following dependency analysis, the framework uses KNN to impute missing entries. The framework looks up the dependent attribute set of the attribute that contains the missing value; this set serves as the significant set for imputation. It then imputes the missing value with the mean of the corresponding values in the records of the closest significant set. The missing value imputation module is explained in Algorithm 2.
The working of the missing value imputation module is illustrated in Figure 2 using a sample data set. Consider a sample data set ‘A’ that consists of 7 attributes, namely A1, A2, A3, A4, A5, A6, A7. The first missing value of the data set is in the second row and is indicated by ‘?’. The module fetches AD = {A1, A7} as the significant set and then determines the nearest significant set using the Euclidean measure (assuming k = 3), as seen in Figure 2. The imputed value is the mean value of the obtained nearest significant set.

Illustration of missing value imputation process.
For each missing value NaN in the data set X
    a = A{NaN} //Get attribute of missing value
    Get AD(a) //Dependency attributes of missing value attribute
    Find NS(a) by computing k-nearest neighbors using equation (4)
    m = size(NS(a))
    sum = 0
    For each value v of a in NS(a)
        sum = sum + v
    End for
    mean = sum / m
    Put NaN = mean
End for
return X //processed data set with no missing values
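The imputation loop can be illustrated with the following minimal sketch. It is one possible reading of the procedure under stated assumptions, not the authors' implementation: missing cells are marked with None, AD maps a column index to the indices of its dependent columns, distances are Euclidean over the dependent columns only, and donor rows must have the target attribute and the dependent attributes observed. Any additional temporal weighting used by TCO-KNN is not modelled here.

```python
import math


def tco_knn_impute(rows, AD, k=3):
    """Return a copy of `rows` with None cells replaced by a k-NN mean."""
    filled = [list(row) for row in rows]
    for r, row in enumerate(rows):
        for a, value in enumerate(row):
            if value is not None:
                continue
            # Dependent attributes that are observed in the incomplete row.
            # If none are available, every donor has distance 0 and the
            # first k donors in row order are used.
            deps = [d for d in AD.get(a, []) if row[d] is not None]
            candidates = []
            for s, other in enumerate(rows):
                if s == r or other[a] is None:
                    continue  # donor must have the target attribute
                if any(other[d] is None for d in deps):
                    continue  # donor must be complete on the dependents
                dist = math.sqrt(sum((row[d] - other[d]) ** 2 for d in deps))
                candidates.append((dist, other[a]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]  # nearest significant set
            if nearest:
                filled[r][a] = sum(nearest) / len(nearest)
    return filled


# Hypothetical data: column 1 depends on column 0 and vice versa.
rows = [
    [1.0, 10.0],
    [2.0, None],
    [2.1, 21.0],
    [1.9, 19.0],
    [9.0, 90.0],
]
AD = {0: [1], 1: [0]}
filled = tco_knn_impute(rows, AD, k=2)
```

In this toy example the two donors closest to the incomplete row on column 0 are the rows with 2.1 and 1.9, so the missing value is filled with the mean of 21.0 and 19.0.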
As a result, at the end of this module, all missing values in the data set are imputed with estimated values, and the data set is ready for classification. Table 2 presents the imputed missing values of the sample data set. The proposed temporal correlated k-nearest neighbors (TCO-KNN) method imputes missing values in multivariate temporal data for ozone level prediction by determining the dependent attribute set and the temporally closest nearest set. These temporally closest points are used to impute the missing values.
Filled Values of the Sample Data Sets
Classification is the task of building a model by training it on observed data so that it can assign classes in future predictions. In this paper, a classification model is constructed for the pre-processed multivariate temporal data sets. A variety of strategies are available for univariate time series classification, including regression models and dynamic Bayesian networks. Multivariate time series classification, however, is more complex, because it involves temporal patterns in the data and continuous observation over time intervals. Building temporal classification models with deep learning techniques is effective, because deep neural networks are capable of processing and modelling non-linearly separable data. Using the stacked gated recurrent unit (S-GRU), this work creates a classification model for the processed multivariate time series data.
The Recurrent Neural Network (RNN) is a special type of neural network that is well suited to modelling time series data, as it adds a memory cell to the network. An RNN is effective in time series prediction because it carries information from earlier time steps forward. However, RNNs are prone to the vanishing gradient problem. This issue is addressed using Long Short Term Memory (LSTM) or the Gated Recurrent Unit (GRU). To avoid the vanishing gradient problem, the LSTM comprises a memory cell and gating functions. There are three gates in an LSTM network, namely the forget gate, the input gate, and the output gate. The forget gate is responsible for removing information from the cell state. The input to the LSTM is fed through the input gate. The output gate controls the output at the current time stamp [26].
Cho et al. (2014) [35] proposed the GRU, a modification of the LSTM that addresses the vanishing gradient problem using an update gate and a reset gate. GRUs are capable of capturing dependencies between time stamps that are far apart. Figure 3 depicts the structure of the GRU. Compared to the LSTM, the GRU has fewer gates, because it has no separate cell state and combines the input and forget gates into a single gate, called the update gate Z_t. Hence, the GRU is structurally simpler than the LSTM and has fewer parameters, which is an advantage in terms of training cost and convergence.

Structure of Gated Recurrent Unit (GRU).
The stacked GRU is also a kind of RNN with several GRU units, and its principle is similar to that of a simple GRU. In this work, the stacked GRU constructs the temporal model for ozone level detection. The stacked GRU comprises a GRU chain with two hidden layers of 73 neurons, as shown in Figure 4. The input to the stacked GRU is multivariate time-stamped data taken from three regions: Houston, Galveston, and Brazoria. Each GRU unit is followed by a fully connected layer with a linear activation function for ozone level detection. The gates of each GRU unit are computed as follows.

Temporal Classification using Stacked Gated Recurrent Unit (S-GRU) for ozone Level Detection.
The update gate Z_t and the reset gate r_t for time stamp t are computed using equations (5) and (6):
Here x_t is the input at the current time stamp t, h_(t-1) is the hidden state output by the GRU at time stamp (t-1), W_z and W_r are the weights of the feed-forward connections, and σ is the sigmoid function, which maps its input to a value in the range 0 to 1.
The update and reset gates of the stacked GRU (S-GRU) help the model decide which past information should be discarded and which new information should be passed on to the future in order to forecast the ozone level. The current memory state is computed using equation (7):
where tanh is the hyperbolic tangent function and * denotes element-wise multiplication. The tanh function scales the data to the range of -1 to 1. Finally, the output gate of the stacked GRU is computed using equation (8):
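Equations (5) to (8) are not reproduced in this copy of the text. For reference, a commonly used formulation of the GRU that is consistent with the symbols defined above is sketched below. It is an assumption that the paper follows this form: bias terms are omitted, W denotes the weights of the candidate state, and some references (including the original GRU formulation) swap the roles of Z_t and (1 - Z_t) in the last line.

```latex
% Update gate, assumed form of equation (5)
Z_t = \sigma\left(W_z \cdot [h_{t-1},\, x_t]\right)

% Reset gate, assumed form of equation (6)
r_t = \sigma\left(W_r \cdot [h_{t-1},\, x_t]\right)

% Current memory (candidate) state, assumed form of equation (7)
\tilde{h}_t = \tanh\left(W \cdot [r_t * h_{t-1},\, x_t]\right)

% Output (new hidden state), assumed form of equation (8)
h_t = (1 - Z_t) * h_{t-1} + Z_t * \tilde{h}_t
```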
GRU network receives pre-processed multivariate temporal data sets as input.
Compute the state of the update gate vector Z_t at time stamp (t) using equation (5).
Compute the state of the reset gate vector r_t at time stamp (t) using equation (6).
Compute the state of the current memory using equation (7).
Compute the state of the output gate using equation (8).
Evaluate and validate the performance of the classification model.
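The per-time-stamp computation in the steps above can be sketched as a forward pass through two stacked GRU layers. This is an illustrative sketch in plain Python under several assumptions: it uses a common bias-free GRU formulation, random untrained weights, and small layer sizes instead of the 73-neuron layers of the paper, and it omits the fully connected output layer and all training. The names `gru_step`, `stacked_gru`, and `make_layer` are hypothetical.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gru_step(x, h_prev, Wz, Wr, Wh):
    """One GRU time step; each weight matrix acts on [h_prev, x]."""
    concat = h_prev + x                                   # list concatenation
    z = [sigmoid(a) for a in matvec(Wz, concat)]          # update gate
    r = [sigmoid(a) for a in matvec(Wr, concat)]          # reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)] + x
    h_cand = [math.tanh(a) for a in matvec(Wh, gated)]    # current memory
    # New hidden state: blend of previous state and candidate state.
    return [(1 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


def stacked_gru(sequence, layers, hidden_size):
    """Feed a sequence through stacked GRU layers; return the top final state."""
    states = [[0.0] * hidden_size for _ in layers]
    for x in sequence:
        layer_input = x
        for i, (Wz, Wr, Wh) in enumerate(layers):
            states[i] = gru_step(layer_input, states[i], Wz, Wr, Wh)
            layer_input = states[i]  # output of layer i feeds layer i + 1
    return states[-1]


def make_layer(input_size, hidden_size, rng):
    """Random (untrained) weights for one GRU layer."""
    def matrix():
        return [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size + input_size)]
                for _ in range(hidden_size)]
    return matrix(), matrix(), matrix()


# Hypothetical input: 2 time stamps, 3 attributes each; 2 layers of 4 units.
rng = random.Random(0)
layers = [make_layer(3, 4, rng), make_layer(4, 4, rng)]
sequence = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]]
out = stacked_gru(sequence, layers, hidden_size=4)
```

In a full classifier, the final state `out` would be passed to an output layer that produces the ozone-level class, and the weights would be learned by backpropagation through time.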
In this proposed framework, the stacked GRU classifier is used for constructing temporal classification model as described in Algorithm 3. Stacking the several GRU units has strong ability to extract features from multivariate time stamped data for ozone level detection. The performance of the classification model is evaluated in terms of classification accuracy and validation loss.
This section describes the experimental setup and the results obtained by evaluating the proposed framework.
The proposed framework is tested on a data collection of ground ozone levels, which contains approximately 2535 records with 73 attributes. The data set has a missingness rate of roughly ten percent. The proposed system performs two functions: missing value imputation and classification. To carry out missing value imputation, a dependency attribute set is derived for each attribute and stored in a 2D attribute dependency set. Figure 5 depicts the reduced dependent attribute set, which includes all Wind Speed (WSR) attributes of the data set. Figure 6 shows the reduced dependent attribute set, which includes all Temperature (T) attributes of the data set. Figure 7 shows the reduced dependent attribute set, which includes all other attributes of the data set.

Reduced Dependency Attribute Set Size of Wind Speed (WSR) Attributes.

Reduced Dependency Attribute Set Size of Temperature (T) Attributes.

Reduced Dependency Attribute Set Size of All Other Attributes
The correlation matrix is a two-dimensional grid in which each cell holds the correlation coefficient between a pair of attributes. The heat map function of the seaborn module is used to visualise the correlation matrix. Figure 8 shows the ozone data set’s correlation matrix, with deep red hues indicating a strong relationship and pale, whitish-blue hues indicating a weak relationship.

Correlation Matrix Plot.
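One way to derive a dependency attribute set from a correlation matrix is sketched below. It is an illustration under assumptions, not the exact procedure of this work: attributes are treated as dependent when their absolute Pearson correlation meets a hypothetical threshold of 0.6, and the correlation is computed only on records with no missing values. The function name `dependency_sets` and the toy data are illustrative.

```python
import numpy as np

def dependency_sets(data, threshold=0.6):
    """Map each attribute index to the attributes correlated with it.

    data: 2D array (records x attributes), NaN marks a missing value.
    threshold: hypothetical cut-off on the absolute Pearson correlation.
    """
    # Assumption: use only complete records to estimate the correlations.
    complete = data[~np.isnan(data).any(axis=1)]
    corr = np.corrcoef(complete, rowvar=False)
    n_attrs = data.shape[1]
    return {
        j: [m for m in range(n_attrs)
            if m != j and abs(corr[j, m]) >= threshold]
        for j in range(n_attrs)
    }

# Toy data: attribute 1 is a linear function of attribute 0,
# attribute 2 alternates in sign and is only weakly related to them.
a = np.arange(10.0)
b = 2.0 * a + 1.0
c = np.where(a % 2 == 0, 1.0, -1.0)
data = np.column_stack([a, b, c])
data[3, 2] = np.nan                      # one missing value

deps = dependency_sets(data)
print(deps)   # {0: [1], 1: [0], 2: []}
```

Storing the result as one list per attribute mirrors the 2D attribute dependency set described earlier. A lower threshold produces larger dependency sets.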
For each missing value in the data set, the presented TCO-KNN approach uses the derived dependency attribute set in the missing value imputation process. There are 14937 missing values in total in the data set. For each missing value, the nearest significant set is computed based on its attribute dependency set. The missing value is then replaced with the mean of the corresponding attribute values in the nearest significant set. Table 3 shows the imputed results for the days (05/05/1998) to (06/01/1998).
Records of the Data Sets with Missing Value Imputed for the Days (05-05-1998, 05-08-1998, 25-05-1998, 27-05-1998, 06-01-1998)
*indicates the values that are imputed by the missing value imputation.
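The imputation step can be illustrated with a simplified sketch of dependency-based KNN imputation. It rests on several assumptions: Euclidean distance over the dependent attributes only, a hypothetical k, and no temporal weighting, so it is not a full implementation of TCO-KNN. The name `impute_with_dependency_knn` and the `deps` mapping are illustrative.

```python
import numpy as np

def impute_with_dependency_knn(data, deps, k=3):
    """Fill each NaN with the mean of that attribute over the k nearest records.

    Nearness is measured only on the attribute's dependent columns (deps[j]).
    If those columns are unavailable for a record, the column mean is used.
    """
    filled = data.copy()
    for j in range(data.shape[1]):
        cols = list(deps.get(j, []))
        # Candidate neighbours: target observed, dependent columns observed.
        ok = ~np.isnan(data[:, j])
        if cols:
            ok &= ~np.isnan(data[:, cols]).any(axis=1)
        cand = np.where(ok)[0]
        if cand.size == 0:
            continue
        for i in np.where(np.isnan(data[:, j]))[0]:
            usable = bool(cols) and not np.isnan(data[i, cols]).any()
            if usable:
                dist = np.linalg.norm(data[cand][:, cols] - data[i, cols], axis=1)
                nearest = cand[np.argsort(dist, kind="stable")[:k]]
            else:
                nearest = cand          # fallback: plain column mean
            filled[i, j] = data[nearest, j].mean()
    return filled

data = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, np.nan],    # value to impute
    [4.0, 40.0],
    [9.0, 90.0],
])
deps = {0: [1], 1: [0]}   # hypothetical dependency sets
filled = impute_with_dependency_knn(data, deps, k=2)
print(filled[2, 1])   # 30.0
```

In this toy example the two nearest records on attribute 0 are those with values 2.0 and 4.0, so the missing entry is filled with the mean of 20.0 and 40.0.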
After the missing values have been imputed, the temporal classification model for the processed multivariate time series data is constructed using the stacked GRU (S-GRU). The S-GRU comprises a GRU chain with two hidden layers of 73 cells, where the number of cells (neurons) corresponds to the number of multivariate time-stamped inputs. Each stacked GRU unit is followed by a fully connected layer of 73 neurons with a linear activation function. The classification model is constructed by training on 2193 samples of the multivariate temporal data set for 200 epochs, and is evaluated using 709 test samples. The layers and parameters of the S-GRU are listed in Table 4.
The root mean squared error (RMSE) obtained during validation is given by equation (9). The results show that the accuracy continues to rise with the number of training epochs, which indicates that the constructed model is stabilizing. The benefit of the proposed framework can be seen by comparing these results with a model built without missing value imputation, as shown in Figure 9(a). Simply dropping the rows with missing data would leave too few samples for validation. The baseline model is therefore constructed by substituting 0 for each missing value. Figure 9(a) shows the validation loss of the GRU model trained on this raw, non-imputed data.
Figure 9(b) depicts the root mean square error (RMSE) obtained when validating the S-GRU model. RMSE measures the difference between the values predicted by the built classification model and the values actually present in the data set. The graph in Figure 9(b) shows that the validation loss decreases as the number of epochs increases.

Loss Graphs Obtained by Evaluating S-GRU Models Built: (a) Without Imputation (Raw Data). (b) With TCO-KNN Missing Value Imputation Data.
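The RMSE used as the validation loss can be computed as in the short sketch below. It follows the standard definition, the square root of the mean squared difference between predicted and actual values. The function name and the example values are illustrative and are not taken from the experiments.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical labels (1 = ozone day, 0 = normal day) and model outputs.
actual = [0, 1, 1, 0]
predicted = [0.0, 1.0, 0.0, 0.0]
print(rmse(actual, predicted))   # 0.5
```

Here one of the four predictions is off by 1, giving a mean squared error of 0.25 and an RMSE of 0.5.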
The proposed TCO-KNN missing value imputation strategy is compared with other missing value imputation techniques, namely Extreme Learning Machine Auto-Encoder (ELM-AE), normal KNN, and mean imputation. The presented stacked GRU classifier is likewise compared with the LSTM, RNN, ANN, and TDNN network learning techniques. Table 5 reports the results of the various classifiers for identifying ozone levels under the different learning algorithms and imputation procedures. Precision, recall, and F1-score are used to assess the performance of the temporal classification models under each imputation strategy, and are calculated using equations (10), (11), and (12).
Stacked Gated Recurrent Unit (S-GRU) Model HyperParameters
Comparison of the Proposed Framework’s Performance with Different Learning Techniques and Missing Value Imputation Strategies
Precision reflects how well a classifier avoids false positives: it is the proportion of positive predictions that are actually correct. Recall reflects how well a classifier avoids false negatives: it is the proportion of actual positive cases that the classifier correctly identifies. The F1-score combines precision and recall into a joint metric. It is the harmonic mean of the two, and represents the overall quality of the classifier’s performance. Compared with alternative learning techniques such as LSTM, RNN, ANN, and TDNN, the experimental results show that the proposed framework performs well in terms of precision, recall, and F1-score for predicting ozone levels.
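The three metrics can be computed from the counts of true positives (TP), false positives (FP), and false negatives (FN), as in the sketch below. It uses the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall). The function name and the example labels are illustrative and are not results from this study.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1-score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical labels: 1 = ozone day, 0 = normal day.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(precision_recall_f1(y_true, y_pred))   # (0.75, 0.75, 0.75)
```

In this example there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and their harmonic mean is also 0.75.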
The proposed method and its achievements are discussed in this section.
Missing values in multivariate temporal data reduce classification accuracy, which in turn affects the effectiveness of the knowledge discovery process. Imputing missing values in multivariate time-stamped data is a difficult task because such data involves continuously observed values, temporal patterns across multiple attributes, and high dimensionality. The TCO-KNN approach presented in this paper finds the temporal nearest neighbors of each missing value using attribute dependency based KNN. The missing value is then imputed using these temporal nearest points.
After the missing values have been imputed, the temporal classification model for the processed multivariate time series data is developed using the stacked GRU (S-GRU). The proposed S-GRU is compared against LSTM, simple RNN, TDNN, and ANN, and the comparison also covers several missing value imputation methods: attribute dependency based KNN, Extreme Learning Machine Auto-Encoder (ELM-AE), normal KNN, and mean imputation. The S-GRU performs well in terms of accuracy, precision, recall, and F1-score when compared with the other network learning techniques for predicting ozone levels.
Treating the missing values in continuously observed time intervals helps to increase the accuracy of the classification model in forecasting the ozone level. For ozone level monitoring, stacking several GRU units gives the model a significant ability to extract features from multivariate time-stamped data.
Conclusion
Multivariate temporal data are subject to complexities such as missing values, temporal patterns across multivariate attributes, continuous values measured over a certain time interval, high dimensionality, outliers, and so on. The existence of missing values in data sets has an impact on the process of knowledge discovery and decision making. This work introduces two modules: missing value imputation in multivariate time-stamped data, and the construction of a temporal classification model for predicting ozone levels. The proposed framework is assessed using multivariate time series data obtained from three separate locations, Houston, Galveston, and Brazoria, all of which are subject to missing values.
In the proposed temporal correlated k-nearest neighbors (TCO-KNN) approach, a dependency attribute set is constructed for each missing value using the correlation matrix technique. K-nearest neighbors (KNN) is then applied to the records of this dependency attribute set to find the most significant records, and the missing value is imputed with the mean of the values from its nearest significant set. Next, a stacked gated recurrent unit (S-GRU) based temporal classification model is created to predict the ozone level using the imputed data sets. Stacking several GRU units for ozone level monitoring provides a substantial ability to extract features from multivariate time-stamped data. The established temporal framework for predicting ozone levels aids in the protection of the environment from ozone-related hazards. The experimental results show that the proposed TCO-KNN and S-GRU based temporal classification model enhances the classification accuracy and reduces the error rate.
The goal of future research is to develop a temporal classification model using optimization approaches for tuning the hyper parameters of deep neural networks.
