Hybrid classifier model with tuned weights for human activity recognition

Abstract

Human activity recognition has received considerable attention in recent decades because of its wide range of uses, such as video interpretation and surveillance, human-robot interaction, healthcare, and sport analysis. Recognizing human activity from video frames or still images is challenging because of factors including viewpoint, partial occlusion, lighting, background clutter, scale differences, and appearance. Numerous applications, including human-computer interfaces, robotics for the analysis of human behavior, and video surveillance systems, require an activity recognition system. This work introduces a human activity recognition system with three stages: preprocessing, feature extraction, and classification. The input video (image frames) is first preprocessed with median filtering and background subtraction. Several features, including the Improved Bag of Visual Words, the local texton XOR pattern, and Spider Local Image Feature (SLIF) based features, are then extracted from the preprocessed image. Classification is performed by a hybrid classifier that combines a Bidirectional Gated Recurrent Unit (Bi-GRU) and Long Short Term Memory (LSTM). To improve the effectiveness of the proposed system, the weights of the LSTM and the Bi-GRU are optimally determined using the Improved Aquila Optimization with City Block Distance Evaluation (IACBD) method. Finally, the effectiveness of the proposed approach is evaluated against other traditional models using various performance metrics.

Keywords

Human activity recognition, feature extraction, quantum neural network, long short term memory, optimization

1. Introduction

Human activity recognition (HAR) [1, 2] is an area of research that focuses on the automatic recognition of people's normal daily activities from sensor and time-series recordings. Sensors, edge computing, IoT, and cloud computing have seen significant developments in the previous decade. Because sensors are low-cost components that can be easily incorporated or embedded in both portable and non-portable devices, the majority of human activity recognition [3, 4] research has shifted to sensor technology. Wearable sensors are a common IoT application used to quickly capture diverse physical movements or actions. In recent decades, the rapid proliferation of reasonably priced smartphones and smartwatches with wearable Inertial Measurement Unit (IMU) sensors (gyroscopes and accelerometers) has made it possible to accurately detect and locate human motions in fields such as personal fitness tracking, healthcare, biometrics, elderly care, sports analytics, surveillance, and security. The significance of human activity recognition [5, 6] based on wearable sensors is evident in the fact that it can monitor a wide range of everyday activities beyond exercise-related actions, including drinking, eating, brushing teeth, and detecting sleep irregularities.

Human activity recognition [7, 8] distinguishes between simple and complex actions. There is limited research on recognizing complex human activities such as brushing teeth or dribbling a ball. A complex human action consists of a simple human action combined with a specific transition activity. Video action recognition [9, 10] has several applications, including surveillance, privacy and security systems, content-based video retrieval, human–computer interaction, and activity identification. Moreover, the goal of activity recognition [11] is to identify and verify individuals, their behavior, or unusual behaviors in videos, and to offer relevant information to assist interactive programs and IoT-based systems. Occlusions, camera movements, variations in how actions are performed, complex backgrounds, and changes in illumination and posture pose numerous challenges to guaranteeing a safe and secure environment for residents in applications such as virtual reality, industrial monitoring, person identification, violence detection, and cloud environments. In videos, temporal and spatial information is critical for distinguishing various human behaviours [12].

Human activity recognition with labeled data is a multivariate time-series classification and supervised learning problem in machine learning [13]. Numerous studies have addressed activity recognition using both classic techniques such as XGBoost, SVM, and Random Forest (RF), and deep learning approaches such as Convolutional Neural Network (CNN), Artificial Neural Network (ANN), Long Short Term Memory (LSTM), and Recurrent Neural Network (RNN). Conventional methods have the disadvantage of requiring extensive feature engineering and manual feature extraction, which is time-consuming. Deep learning approaches [14] can learn features directly from the data, making them better suited to identifying complex human actions. In video-based action and behavior identification, deep learning is presently the most popular approach for learning discriminative features and building end-to-end systems. Conventional deep learning techniques for human activity recognition [15] use pre-trained models to learn features from video frames with simple CNNs. To train a classification algorithm [16], the convolutional layers extract and integrate spatial information. Conventional CNN models, however, perform worse on sequential data than handcrafted features. For this reason, LSTM [17, 18] is used to recognize actions from spatiotemporal features learnt by a CNN. Direct image classification is done using CNN [56]. Deep learning algorithms are divided into four categories based on how they are used: mapping-based, instance-based, network-based, and adversarial-based deep learning [57, 58, 59].
Furthermore, RNNs have been used to address spatiotemporal difficulties in surveillance technology, with the LSTM developed particularly for learning and interpreting temporal characteristics of long video sequences in human activity recognition [19, 20]. An ensemble classifier can be used in place of a single learner [51], improving the accuracy and robustness of the system [52]. How well a machine learning model generalizes is largely determined by the quantity of training data, and ensemble methods are a machine learning technique developed to deal with this problem. An ensemble of classifiers combines the decisions of individual classifiers in some way in order to classify new samples [53, 54, 55].

The major contributions of this research work are as follows:

• Improved Bag of Visual Words, local texton XOR pattern, and Spider Local Image Feature (SLIF) based features are extracted.
• A hybrid classifier is introduced whose weights are fine-tuned by the proposed Improved Aquila Optimization with City Block Distance Evaluation (IACBD) model.
• An upgraded version of the Aquila Optimizer is offered in the form of the new IACBD framework.
• The proposed IACBD overcomes drawbacks of the traditional Aquila Optimizer (AO), such as its limited exploration ability, and enhances its performance.

The paper is structured as follows. Section 2 reviews existing human activity recognition models. Section 3 presents the overall framework of the human activity recognition model. Section 4 describes the pre-processing and feature extraction stages. Section 5 explains classification using a hybrid classifier that combines an optimised Bi-directional Gated Recurrent Unit (Bi-GRU) and LSTM. Section 6 describes the weight optimization of the Bi-GRU and LSTM using the improved Aquila optimization with city block distance evaluation. Section 7 presents the results and discussion, and Section 8 concludes the paper. Table 1 lists the nomenclature.

Table 1
Nomenclature

Abbreviation	Description
ANN	Artificial Neural Network
AO	Aquila Optimizer
BES	Bald Eagle Search
Bi-LSTM	Bi-directional Long Short-Term Memory
CHIO	Coronavirus Herd Immunity Optimizer
CNN	Convolutional Neural Network
DCNN	Dilated Convolutional Neural Network
FNR	False Negative Rate
FPR	False Positive Rate
HAD	Hierarchical Architecture Design
HAR	Human Activity Recognition
IACBD	Improved Aquila Optimization with City Block Distance Evaluation
IMU	Inertial Measurement Unit
IoT	Internet of Things
LSTM	Long Short Term Memory
NPV	Negative Predictive Value
PRO	Poor and Rich Optimization
QNN	Quantum Neural Network
RNN	Recurrent Neural Network
SDO	Sparse Dictionary Optimization
SLIF	Spider Local Image Feature
SOA	Seagull Optimization Algorithm
VMM	Variable-length Markov Modeling

2. Literature review


In 2021, Saurabh et al. [21] suggested CNN-GRU, a hybrid deep learning approach for identifying complex human behaviors. The study used raw sensor data from the WISDM dataset, from which separate datasets for smartphones and smartwatches were derived. A sliding window method was used to segment the data during preprocessing, and no manual feature engineering was performed. Overall, the analysis indicates that hybrid deep learning models can effectively and automatically extract spatial-temporal features from raw sensor data to categorize complex human actions, and that they can offer higher accuracy than the other deep learning techniques considered in that study, which had more complicated architectures.

In 2020, Jie et al. [22] suggested a high-speed network for human activity recognition. The goal was to increase the performance of optical flow feature extraction and to investigate a spatio-temporal feature fusion approach that fuses temporal and spatial data into fusion features. Rather than the VGG16 network, which is employed to analyze optical flow characteristics to acquire rich features, they suggested a CNN with OFF. They also employed a CNN to compute optical flow, considerably increasing the model's speed, so that the algorithm could operate at around 140 frames/second. The suggested approach was reported to increase the accuracy of human activity recognition compared with existing video action recognition techniques.

In 2021, Kumie et al. [23] focused on a two-branch novel-view action generation technique that creates new action samples for arbitrary-view human activity recognition using an auxiliary conditional Generative Adversarial Network (GAN). The generated samples increase the number of action examples available for training. Additionally, they offered a view-domain generalization approach that enhances arbitrary-view recognition capability by narrowing the differences between the descriptions of actions in various views. The technique was validated using two forms of view-invariant assessment on three datasets, and it performed well in recognizing human actions.

In 2021, Jianjing et al. [24] proposed a hybrid method for context-aware human activity recognition and prediction based on combining Variable-length Markov Modeling (VMM) and CNN. The goal was to use the temporal and spatial context encoded in visual information to identify and anticipate human actions. For joint context recognition and action detection, a two-stream CNN framework parses person and object information as the spatial context contained in video frames. The effectiveness of the approach was assessed experimentally on a test bed, and both action recognition and prediction were shown to be highly accurate.

In 2021, Jonghyun et al. [25] proposed TA3DNet, a weakly-supervised 3D network with temporal attention for human activity recognition that speeds up 3D CNNs by assigning a different priority to each frame. A temporal attention module gives each frame a different weight. The temporal attention module was trained in a weakly-supervised manner, in which its weights were updated using only class labels and no supplementary labels. As a consequence, TA3DNet reduces the number of input frames and yields a lightweight action recognition network. Tests revealed that TA3DNet outperforms traditional action recognition algorithms on two challenging datasets.

In 2021, Khan et al. [26] adopted a BiLSTM-based learning algorithm with a Dilated Convolutional Neural Network (DCNN) that selectively concentrates on effective features in the input image. The DCNN framework is used to identify the prominent features in this network, and residual blocks are used to enhance the features and retain more information. They also combined Softmax with the centre loss to improve the loss function for video-based action classification, resulting in better outcomes. The suggested approach was tested on three datasets, and it obtained recognition rates of 98.3%, 99.1%, and 80.2%, respectively, indicating a 1%–3% increase over existing systems.

In 2021, Jansi et al. [27] suggested a hierarchical evolutionary scheme for human activity recognition based on sparse representations. The aim was to build a human activity recognition methodology that yields better outcomes. They described the Hierarchical Aggregation/disaggregation and Decomposition/composition (HAD) algorithm, a new approach for determining hierarchical structure from input data. They also proposed a new Sparse Dictionary Optimization (SDO) approach for building dictionaries, which also helps with sparse representation-based classification. For the USC–HAD and HAPT datasets, the classification system achieved F-scores of 98.01% and 93.51%, respectively.

Table 2
Review of existing human activity recognition approaches: features and limitations

Developed model	Features	Challenges	Author [citation]
CNN-GRU method	Higher accuracy; maximum F1-score; improved precision; better recall	More complex DNN models were not considered.	Saurabh et al. [21]
CNN scheme	Better efficiency; high speed; improved accuracy	More efficient ways to generate optical flow were not explored.	Jie et al. [22]
2-branch novel-view sample generation scheme	Higher recognition accuracy; high-quality samples	The NTU RGB+D 120 dataset remained challenging for human activity recognition.	Kumie et al. [23]
Hybrid CNN and VMM model	High accuracy; larger recognition rate	The robustness of action recognition and prediction was not improved.	Jianjing et al. [24]
TA3DNet scheme	Increased accuracy; high attention scores	Generating motion information at minimum computational cost was not investigated.	Jonghyun et al. [25]
BiLSTM-DCNN model	High accuracy; maximum prediction performance; larger recognition rate	A two-stream learning scheme was not used in this study.	Khan et al. [26]
SDO algorithm	Maximum F-score; high specificity; larger accuracy	A single-layered scheme for human activity recognition based on sparse representation theory was not developed.	Jansi et al. [27]
DNN-based human activity recognition classifiers	Less processing time; average precision; high recall; reduced power consumption	The experimental settings need to be extended to higher numbers of epochs, more loss functions, etc.	Suwannarat et al. [28]

In 2021, Suwannarat et al. [28] worked to optimize DNN-based human activity recognition by lowering the dimensionality of the acceleration data, selecting the most appropriate sample size for DNN processing, and minimizing the number of parameters of the suggested architecture. To develop their designs, they used two reported DNN-based human activity recognition frameworks as starting points and baselines. In terms of CPU processing time and the cost of acquiring acceleration data from the sensor, the suggested classifiers with optimized parameters were beneficial since they demand less processing time and power. They also lower the amount of memory required for parameter storage, making them well suited to wearable devices.

Table 2 summarizes the review of existing human activity recognition models. First, the CNN-GRU method used in [21] achieves higher accuracy, a maximum F1-score, improved precision, and better recall; nevertheless, more complex DNN models were not considered in that work. The CNN model used in [22] offers high speed, better efficiency, and improved accuracy, but the work did not explore more efficient ways to generate optical flow. The two-branch novel-view sample generation scheme in [23] provides higher recognition accuracy and high-quality samples; nevertheless, the NTU RGB+D 120 dataset remained challenging for human activity recognition. Likewise, the hybrid CNN and VMM scheme used in [24] has high accuracy and a larger recognition rate; still, the robustness of action recognition and prediction was not improved. The TA3DNet scheme in [25] has increased accuracy and high attention scores; however, generating motion information at minimum computational cost was not investigated. In addition, the BiLSTM-DCNN scheme implemented in [26] has high accuracy, maximum prediction performance, and a better recognition rate, yet a two-stream learning scheme was not used in that work. The SDO algorithm designed in [27] has a maximum F-score, improved specificity, and larger accuracy; however, a single-layered scheme for human activity recognition based on sparse representation theory was not developed. Finally, the DNN-based human activity recognition classifiers in [28] have lower processing time, average precision, high recall, and reduced power consumption, yet the experimental settings need to be extended to cover more loss functions, higher numbers of epochs, etc. These existing frameworks therefore need to be examined so that the current study can build on them effectively.

3. Overall framework of the human activity recognition model

The human activity recognition system presented in this research involves three stages: preprocessing, feature extraction, and classification. The input image frames (video) are first passed to the preprocessing step, where median filtering and background subtraction are performed. The preprocessed image is next passed to feature extraction, which extracts the improved Bag of Visual Words, the local texton XOR pattern, and SLIF. The extracted features are then passed to classification, where a hybrid classifier combining LSTM and Bi-GRU is used to classify the actions. The hybrid model works as follows: the features are passed separately to both classifiers, each of which produces a classification outcome, and the outcomes of LSTM and Bi-GRU are averaged to provide the final classified result. To improve the performance of the system, the weights of both the LSTM and Bi-GRU are optimally tuned by the proposed IACBD, which supports precise recognition. The adopted scheme's layout is shown in Fig. 1.

Figure 1.

Adopted scheme’s layout.

4. Preprocessing and feature extraction stage

4.1 Preprocessing

Preprocessing is the first stage of the model. In this work, two processes are performed to enhance the input: median filtering and background subtraction.

4.1.1 Median filtering

During the pre-processing stage, the input image is smoothed and denoised using the median filtering [29] approach. The pixels inside a neighborhood mask are sorted by grey level, and the median of the sorted values replaces the noisy value at the mask centre. The median filtering outcome is

$\displaystyle f\left({u,v}\right)=\textit{med}\left\{{g\left({u-c,v-d}\right)\mid\left({c,d}\right)\in J}\right\}$

where $f\left({u,v}\right)$ and $g\left({u,v}\right)$ are the output image and the original image, respectively, $J$ is the 2D mask, and $H\times H$ is the mask size, with $H$ odd (e.g., $3\times 3$, $5\times 5$). The mask can also be cross-shaped, circular, square, or linear, among other shapes.
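As an illustration of the median filtering step, the following is a minimal NumPy sketch for a square $H\times H$ mask. The function name `median_filter`, the edge-replication border handling, and the toy image are assumptions made for this example and are not taken from the original implementation.

```python
import numpy as np

def median_filter(image: np.ndarray, size: int = 3) -> np.ndarray:
    """Apply an H x H square-mask median filter (H odd) to a 2D grayscale image."""
    if size % 2 == 0:
        raise ValueError("mask size H must be odd")
    pad = size // 2
    # Assumption: borders are handled by replicating edge pixels.
    padded = np.pad(image, pad, mode="edge")
    # All H x H neighbourhoods, shape (rows, cols, H, H).
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    # The median of each neighbourhood replaces the centre pixel.
    return np.median(windows, axis=(-2, -1)).astype(image.dtype)

# Toy input: a flat image corrupted by a single impulse ("salt") pixel.
noisy = np.full((5, 5), 10, dtype=np.uint8)
noisy[2, 2] = 255
clean = median_filter(noisy, size=3)
```

In this toy case the isolated impulse is narrower than half the mask, so the median of every neighbourhood is the background value.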

Noise reduction performance: Because the median filter is nonlinear, analysing its effect on an image with random noise is challenging. For zero-mean noise with a normal distribution, the noise variance of the median-filtered image is approximately given by Eq. (1).

$\displaystyle\sigma_{\textit{med}}^{2}=\frac{1}{4HF^{2}\left(H\right)}\approx\frac{\sigma_{c}^{2}}{H+\frac{\pi}{2}-1}\cdot\frac{\pi}{2}$ (1)

where $\sigma_{c}^{2}$ is the input noise power (variance), $H$ is the mask size of the median filter, and $F\left(H\right)$ is the noise density function.

The noise variance is represented in Eq. (2) when using the average filter.

$\displaystyle\sigma_{\textit{var}}^{2}=\frac{1}{H}\sigma_{c}^{2}$ (2)

Comparing Eqs (1) and (2) shows that the effect of median filtering depends on the mask size and the noise distribution. For zero-mean normally distributed noise, Eq. (1) gives a larger output variance than Eq. (2) for any $H>1$, so the median filter reduces random noise less than the average filter. The median filter is, however, very effective against impulse noise, particularly narrow pulses whose width is smaller than $H/2$. The performance of the median filter can be improved by combining it with the average filter and by adapting the mask size to the noise density.
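To make the comparison between Eqs (1) and (2) concrete, the short sketch below evaluates the right-hand side of both expressions for a few mask sizes. The function names are hypothetical, and $H$ is treated here as the number of samples covered by the mask, which is an assumption about how the formulas are applied.

```python
import math

def median_noise_variance(sigma2: float, h: int) -> float:
    """Approximate output variance of a median filter (right-hand side of Eq. (1))."""
    return sigma2 / (h + math.pi / 2 - 1) * (math.pi / 2)

def average_noise_variance(sigma2: float, h: int) -> float:
    """Output variance of an average filter (Eq. (2))."""
    return sigma2 / h

# Print both variances for unit input noise power and several mask sizes.
for h in (3, 9, 25):
    print(h, median_noise_variance(1.0, h), average_noise_variance(1.0, h))
```

For a given input noise power, the printed pairs allow the two filters to be compared directly at each mask size.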

4.1.2 Background subtraction

Background subtraction [30, 31] is a method that creates a foreground mask to separate foreground objects from the background. The foreground mask is computed by subtracting the current frame from a background model that contains the static part of the scene. Background modelling has two basic phases: background initialization and background update.

The initial background model is computed in the first phase, and the model is updated to follow changes in the scene in the second phase. This method is used to detect moving objects against a static background and is crucial for object tracking.

The pre-processed image is denoted as $F^{\textit{pre}}$.
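The two phases of background modelling can be sketched as follows. The paper does not state which background model is used, so the per-pixel median initialization, the running-average update with rate `alpha`, and the fixed `threshold` are all assumptions chosen for illustration, as are the function names.

```python
import numpy as np

def initialize_background(frames: np.ndarray) -> np.ndarray:
    """Phase 1 (assumed): per-pixel median over an initial stack of frames."""
    return np.median(frames, axis=0).astype(np.float64)

def update_background(background: np.ndarray, frame: np.ndarray,
                      alpha: float = 0.05) -> np.ndarray:
    """Phase 2 (assumed): exponential running average to follow scene changes."""
    return (1.0 - alpha) * background + alpha * frame

def foreground_mask(background: np.ndarray, frame: np.ndarray,
                    threshold: float = 25.0) -> np.ndarray:
    """Mark pixels whose absolute difference from the background exceeds a threshold."""
    return np.abs(frame.astype(np.float64) - background) > threshold

# Toy sequence: an empty scene, then a bright 2 x 2 object appears.
initial_frames = np.zeros((5, 4, 4), dtype=np.uint8)
background = initialize_background(initial_frames)
frame = np.zeros((4, 4), dtype=np.uint8)
frame[1:3, 1:3] = 200
mask = foreground_mask(background, frame)
background = update_background(background, frame)
```

The mask isolates the pixels belonging to the moving object, and the update step slowly absorbs persistent changes into the background model.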

4.2 Feature extraction

After preprocessing, features that aid recognition are extracted from the preprocessed image. Three types of features are extracted: (i) Improved Bag of Visual Words, (ii) Local Texton XOR pattern, and (iii) Spider Local Image Feature (SLIF).

(i) Improved bag of visual words: The bag of visual words (BoW) is probably the most widely used feature representation for videos and still images in human activity recognition. It is adapted from the bag-of-words approach used for document representation in information retrieval and has been applied to image and video retrieval. The histogram of visual words is computed as in Eq. (3).

$\displaystyle h\left(i\right)=\sum\limits_{p=1}^{C}{\sum\limits_{q=1}^{D}{k\left({U_{i},p,q}\right)}}\quad\forall i\in\left\{{1,\ldots,\left|U\right|}\right\}$ (3)

In the improved bag of visual words, this histogram is instead evaluated with the new formula in Eq. (4), in which each term is normalized by the Shannon entropy $E\left(I\right)$ of the image, computed from the probabilities of its pixel values as in Eq. (5).

$\displaystyle h\left(i\right)=\sum\limits_{p=1}^{C}{\sum\limits_{q=1}^{D}{\frac{k\left({U_{i},p,q}\right)}{E\left(I\right)}}}\quad\forall i\in\left\{{1,\ldots,\left|U\right|}\right\}$ (4)

where

$\displaystyle E\left(I\right)=-\sum\limits_{n=0}^{m-1}{K\left(n\right)\log_{2}\left({K\left(n\right)}\right)}$ (5)

Here, $E\left(I\right)$ denotes the Shannon entropy of the original image $I$, $K\left(n\right)$ is the probability of occurrence of the value $n$ in the image $I$, $\left|U\right|$ denotes the number of visual words, $D$ and $C$ denote the image height and width, respectively, and $h\left(i\right)$ specifies the frequency of the visual word $U_{i}$. The extracted improved bag of visual words is denoted as $F^{\textit{IBoVW}}$.
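The entropy-normalized histogram of Eqs (4) and (5) can be sketched as below. The sketch assumes that $k\left({U_{i},p,q}\right)$ equals 1 when pixel $(p,q)$ is assigned to visual word $U_{i}$ and 0 otherwise, and that the entropy is computed from an 8-bit grey-level histogram; neither detail is stated in the text. The names `shannon_entropy`, `improved_bovw`, and `word_map` are hypothetical.

```python
import numpy as np

def shannon_entropy(image: np.ndarray, levels: int = 256) -> float:
    """Shannon entropy E(I) of an integer-valued image, as in Eq. (5)."""
    counts = np.bincount(image.ravel(), minlength=levels)
    probs = counts[counts > 0] / image.size  # K(n) for values that occur
    return float(-(probs * np.log2(probs)).sum())

def improved_bovw(word_map: np.ndarray, image: np.ndarray, n_words: int) -> np.ndarray:
    """Visual-word histogram divided by the image entropy, as in Eq. (4).

    word_map[p, q] holds the index of the visual word assigned to pixel (p, q)
    (an assumed encoding of the indicator k).
    """
    counts = np.bincount(word_map.ravel(), minlength=n_words).astype(np.float64)
    entropy = shannon_entropy(image)
    if entropy == 0.0:
        raise ValueError("constant image: entropy is zero, Eq. (4) is undefined")
    return counts / entropy

# Toy example: two equally likely grey levels give an entropy of 1 bit.
image = np.array([[0, 0], [255, 255]], dtype=np.uint8)
word_map = np.array([[0, 1], [1, 2]])
features = improved_bovw(word_map, image, n_words=3)
```

Dividing by the entropy rescales the whole histogram by a single per-image factor, so images with richer intensity content contribute smaller word frequencies.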

Figure 2.

Texton shapes.

(ii) Local Texton XOR pattern [32, 33]: In the LTxXOR pattern, seven different texton shapes are used to generate the texton image. Figure 2 illustrates the texton shapes. The image is split into overlapping $2\times 2$ subblocks $B_{1}$, whose grey-value positions are denoted $P, Q, R, S$. Based on the texton shape, each subblock is coded as in Eq. (6).

$\displaystyle Tx\left({Y,Z}\right)=\left\{{\begin{array}{ll}1,&B_{1}\left(P\right)=B_{1}\left(Q\right)\ \&\ B_{1}\left(R\right)\neq B_{1}\left(S\right)\\ 2,&B_{1}\left(Q\right)=B_{1}\left(S\right)\ \&\ B_{1}\left(P\right)\neq B_{1}\left(R\right)\\ 3,&B_{1}\left(R\right)=B_{1}\left(S\right)\ \&\ B_{1}\left(P\right)\neq B_{1}\left(Q\right)\\ 4,&B_{1}\left(P\right)=B_{1}\left(R\right)\ \&\ B_{1}\left(Q\right)\neq B_{1}\left(S\right)\\ 5,&B_{1}\left(P\right)=B_{1}\left(S\right)\ \&\ B_{1}\left(Q\right)\neq B_{1}\left(R\right)\\ 6,&B_{1}\left(Q\right)=B_{1}\left(R\right)\ \&\ B_{1}\left(P\right)\neq B_{1}\left(S\right)\\ 7,&B_{1}\left(P\right)=B_{1}\left(Q\right)\ \&\ B_{1}\left(R\right)=B_{1}\left(S\right)\\ 0,&B_{1}\left(P\right)\neq B_{1}\left(Q\right)\ \&\ B_{1}\left(R\right)\neq B_{1}\left(S\right)\end{array}}\right.$ (6)

After the texton image is computed, each centre pixel and its surrounding neighbours are collected on the texton image, and the XOR operation is performed between the centre texton and each neighbour. The local texton XOR pattern is then given by Eq. (7).

$\displaystyle\textit{LTxXOR}_{G,L}=\sum\limits_{l=1}^{G}{2^{\left({l-1}\right)}\times\tilde{f}_{3}\left({\textit{Tx}\left({b_{l}}\right)\otimes\textit{Tx}\left({b_{a}}\right)}\right)}$ (7)

where

$\displaystyle\tilde{f}_{3}\left({y\otimes z}\right)=\left\{{\begin{array}{ll}1,&y\neq z\\ 0,&\text{else}\end{array}}\right.$ (8)

Here, $\otimes$ denotes the XOR operation between the variables, $Tx\left({b_{a}}\right)$ refers to the texton shape for the centre pixel $b_{a}$, and $Tx\left({b_{l}}\right)$ denotes the texton shape for the neighbour pixel $b_{l}$.

Further, the texton image is transformed into LTxXORP maps with values from 0 to $2^{\tilde{p}}-1$. The complete map is then described by constructing the histogram in Eq. (9).

$\displaystyle\textit{His}_{\textit{LTxXORP}}\left(\tilde{m}\right)=\sum\limits_{\tilde{j}=1}^{T_{1}}{\sum\limits_{\tilde{k}=1}^{T_{2}}{\tilde{f}_{2}\left({\textit{LTxXORP}\left({\tilde{j},\tilde{k}}\right),\tilde{m}}\right)}};\,\,\tilde{m}\in\left[{0,\left({2^{\tilde{p}}-1}\right)}\right]$ (9)

Figure 3 represents the LTxXORP for an image. The extracted LTxXORP features are specified as $F^{\textit{LTxXORP}}$ .

Figure 3.

Examination of LTxXORP image.
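The texton coding of Eq. (6) and the XOR pattern of Eqs (7)–(9) can be sketched as follows. Several details are assumptions made for this illustration only: $P, Q, R, S$ are taken as the top-left, top-right, bottom-left, and bottom-right pixels of each block; the rules of Eq. (6) are tested in the listed order with the first match winning; blocks overlap with a stride of one pixel; and eight neighbours are visited in the order given by `OFFSETS`. All function names are hypothetical.

```python
import numpy as np

def texton_code(p, q, r, s) -> int:
    """Texton code of one 2 x 2 block, testing the rules of Eq. (6) in order."""
    rules = [
        (1, p == q and r != s),
        (2, q == s and p != r),
        (3, r == s and p != q),
        (4, p == r and q != s),
        (5, p == s and q != r),
        (6, q == r and p != s),
        (7, p == q and r == s),
    ]
    for code, matched in rules:
        if matched:
            return code
    return 0  # no texton shape matched

def texton_image(gray: np.ndarray) -> np.ndarray:
    """Code every overlapping 2 x 2 block (stride 1, an assumption)."""
    rows, cols = gray.shape
    tx = np.zeros((rows - 1, cols - 1), dtype=np.uint8)
    for y in range(rows - 1):
        for x in range(cols - 1):
            tx[y, x] = texton_code(gray[y, x], gray[y, x + 1],
                                   gray[y + 1, x], gray[y + 1, x + 1])
    return tx

# Assumed ordering of the G = 8 neighbours around the centre pixel.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

def ltxxor_map(tx: np.ndarray) -> np.ndarray:
    """Eq. (7): weight 2^(l-1) for each neighbour whose texton differs from the centre."""
    rows, cols = tx.shape
    out = np.zeros((rows - 2, cols - 2), dtype=np.int64)
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            value = 0
            for l, (dy, dx) in enumerate(OFFSETS):
                if tx[y + dy, x + dx] != tx[y, x]:  # Eq. (8)
                    value += 1 << l
            out[y - 1, x - 1] = value
    return out

def ltxxor_histogram(pattern_map: np.ndarray, n_neighbours: int = 8) -> np.ndarray:
    """Eq. (9): histogram of pattern values over [0, 2^p - 1]."""
    return np.bincount(pattern_map.ravel(), minlength=2 ** n_neighbours)

gray = np.zeros((5, 5), dtype=np.uint8)  # toy constant image
hist = ltxxor_histogram(ltxxor_map(texton_image(gray)))
```

For the constant toy image every block has the same texton code, so all pattern values fall into the first histogram bin.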

(iii) Spider Local Image Feature (SLIF): In the SLIF [34] description model, the feature vectors are computed using a unique sampling pattern inspired by a simplified orb web. The positions of the web nodes are defined in Eq. (10).

$\displaystyle Y_{r,s}^{t}=\left({\tilde{x}_{t}+\frac{s\cos\left({\frac{2\pi r}{X}}\right)}{Z},\ \tilde{y}_{t}+\frac{s\sin\left({\frac{2\pi r}{X}}\right)}{Z}}\right)$ (10)

where $\tilde{x}_{t}$ and $\tilde{y}_{t}$ denote the horizontal and vertical image coordinates of the $t^{\text{th}}$ interest point $\left({PO_{t}}\right)$, and $Y_{r,s}^{t}$ is the position of the web node $\left({r,s}\right)^{t}$. The extracted SLIF feature is denoted by $F^{\textit{SLIF}}$.
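The node positions of Eq. (10) can be generated with a few lines of Python. The text does not define $X$ and $Z$, so this sketch assumes that $X$ is the number of radial threads (`n_radials`), that $r$ indexes the radial threads, that $s$ indexes the rings, and that $Z$ is a scaling factor (`z`). These interpretations and the function name `web_nodes` are illustrative only.

```python
import math

def web_nodes(x_t: float, y_t: float, n_radials: int, n_rings: int, z: float) -> dict:
    """Positions Y^t_{r,s} of the orb-web nodes around interest point (x_t, y_t), Eq. (10)."""
    nodes = {}
    for r in range(n_radials):
        angle = 2.0 * math.pi * r / n_radials  # 2*pi*r / X
        for s in range(1, n_rings + 1):
            nodes[(r, s)] = (
                x_t + s * math.cos(angle) / z,
                y_t + s * math.sin(angle) / z,
            )
    return nodes

# Toy interest point at (10, 20) with 4 radial threads and 2 rings.
nodes = web_nodes(10.0, 20.0, n_radials=4, n_rings=2, z=1.0)
```

Each node lies on a ring of radius $s/Z$ around the interest point, at the angle of its radial thread.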

Additionally, the total set of retrieved features is described as

$\displaystyle F=F^{\textit{IBoVW}}+F^{\textit{LTxXORP}}+F^{\textit{SLIF}}.$

5. Classification through hybrid classifier: Hybridizing optimized LSTM and Bi-GRU

Once the overall feature set $F$ has been extracted, classification is carried out using an optimized hybrid classifier (HC) that combines LSTM and Bi-GRU. The feature set $F$ is supplied directly to both classifiers as input, and the outputs of the two classifiers are then averaged to obtain the overall result.
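The averaging step of the hybrid classifier can be sketched as below. The sketch assumes that each classifier outputs a vector of class scores per sample and that the predicted activity is the class with the highest averaged score; the function name `hybrid_predict` and the toy scores are hypothetical.

```python
import numpy as np

def hybrid_predict(lstm_scores, bigru_scores):
    """Average the two classifier outputs and pick the top class per sample."""
    averaged = (np.asarray(lstm_scores, dtype=np.float64)
                + np.asarray(bigru_scores, dtype=np.float64)) / 2.0
    return averaged, averaged.argmax(axis=-1)

# Toy scores for two samples and two activity classes.
lstm_scores = [[0.7, 0.3], [0.4, 0.6]]
bigru_scores = [[0.5, 0.5], [0.1, 0.9]]
averaged, labels = hybrid_predict(lstm_scores, bigru_scores)
```

Averaging gives both models equal influence on the final decision; any unequal weighting would be a different design from the one described in the text.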

5.1 Optimized Long Short Term Memory (LSTM)

The LSTM network uses gate control units and a linear connection to effectively address the vanishing gradient problem, which allows the LSTM model to capture long-range dependencies in time-series data.

An LSTM [35] consists of a sequence of recurrently connected LSTM cells. Each LSTM cell contains three units: the forget gate, the input gate, and the output gate. This structure allows the memory cells of the LSTM to store and access data over an extended time. Let $\tilde{L}$ and $\tilde{G}$ denote the hidden and cell states, respectively. $({\tilde{L}_{\hat{t}},\,\,\tilde{G}_{\hat{t}}})$ specifies the output and $({\tilde{F}_{\hat{t}},\,\tilde{G}_{\hat{t}-1},\tilde{L}_{\hat{t}-1}})$ specifies the input of the cell. The forget, output, and input gates at time $\hat{t}$ are denoted $\tilde{J}_{\hat{t}},\tilde{O}_{\hat{t}},\,\tilde{I}_{\hat{t}}$, respectively. Figure 4 depicts the LSTM model.

Figure 4.

Long short term memory model (LSTM).

The LSTM cell first uses the forget gate $\tilde{J}_{\hat{t}}$ to filter the data carried over from the previous cell state; it is determined as per Eq. (11).

$\displaystyle\tilde{J}_{\hat{t}}=\kappa\left({W_{\tilde{C}}\tilde{F}_{\hat{t}}+x_{\tilde{C}}+W_{\tilde{D}}\tilde{L}_{\hat{t}-1}+x_{\tilde{D}}}\right)$ (11)

where $\left({W_{\tilde{C}},x_{\tilde{C}}}\right)$ and $\left({W_{\tilde{D}},x_{\tilde{D}}}\right)$ are the weight matrices and bias parameters that map the input and hidden layers, respectively, to the forget gate. Conventionally, the weights and biases are chosen randomly; here, the weights are optimally tuned by the proposed IACBD model. The gate activation function $\kappa$ is chosen to be the sigmoid function. Next, the LSTM cell uses the input gate to add the relevant data to the cell state, as specified by Eqs (12)–(14). The weight matrices and bias parameters that map the input and hidden layers to the cell gate are denoted $\left({W_{\tilde{V}},x_{\tilde{V}}}\right)$ and $\left({W_{\tilde{U}},x_{\tilde{U}}}\right)$, and $\left({W_{\tilde{c}},x_{\tilde{c}}}\right)$ and $\left({W_{\tilde{b}},x_{\tilde{b}}}\right)$ are the weight and bias parameters that map the input and hidden layers to $\textit{IL}_{\hat{t}}$.

$\displaystyle\tilde{Z}_{\hat{t}}=\tanh\left({W_{\tilde{V}}\tilde{F}_{\hat{t}}+x_{\tilde{V}}+W_{\tilde{U}}\tilde{L}_{\hat{t}-1}+x_{\tilde{U}}}\right)$ (12)

$\displaystyle\textit{IL}_{\hat{t}}=\kappa\left({W_{\tilde{c}}\tilde{F}_{\hat{t}}+x_{\tilde{c}}+W_{\tilde{b}}\tilde{L}_{\hat{t}-1}+x_{\tilde{b}}}\right)$ (13)

$\displaystyle\tilde{G}_{\hat{t}}=\tilde{J}_{\hat{t}}\tilde{G}_{\hat{t}-1}+\textit{IL}_{\hat{t}}\tilde{Z}_{\hat{t}}$ (14)

Finally, the LSTM obtains the hidden layer (output) from the output gate, as per Eqs (15) and (16).

$\displaystyle o_{\hat{t}}=\kappa\left({W_{e}\tilde{F}_{\hat{t}}+x_{e}+W_{j}\tilde{L}_{\hat{t}-1}+x_{j}}\right)$ (15)

$\displaystyle\tilde{L}_{\hat{t}}=o_{\hat{t}}\tanh\left({\tilde{G}_{\hat{t}}}\right)$ (16)

where the weight and bias parameters of $o_{\hat{t}}$ that map the input and hidden layers are indicated as $\left({W_{e},x_{e}}\right)$ and $\left({W_{j},x_{j}}\right)$, respectively. The LSTM output is indicated as $C_{\textit{LSTM}}$.
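The cell update of Eqs (11)–(16) can be sketched as a single time step in plain Python. This is a minimal illustration rather than the authors' implementation: the dictionary keys, the `affine` helper, and the list-based vectors are assumptions made here, and the term multiplied by the input gate in Eq. (14) is taken to be the candidate of Eq. (12).

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(x, h, W_in, b_in, W_hid, b_hid):
    """Return W_in @ x + b_in + W_hid @ h + b_hid as a list."""
    return [
        sum(w * v for w, v in zip(W_in[r], x)) + b_in[r]
        + sum(w * v for w, v in zip(W_hid[r], h)) + b_hid[r]
        for r in range(len(W_in))
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step; params[name] = (W_in, b_in, W_hid, b_hid) (illustrative layout)."""
    f = [sigmoid(v) for v in affine(x, h_prev, *params["forget"])]       # Eq. (11)
    z = [math.tanh(v) for v in affine(x, h_prev, *params["candidate"])]  # Eq. (12)
    i = [sigmoid(v) for v in affine(x, h_prev, *params["input"])]        # Eq. (13)
    c = [fj * cj + ij * zj for fj, cj, ij, zj in zip(f, c_prev, i, z)]   # Eq. (14)
    o = [sigmoid(v) for v in affine(x, h_prev, *params["output"])]       # Eq. (15)
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]                     # Eq. (16)
    return h, c
```

In the proposed system, the weight matrices passed in `params` would be the values tuned by IACBD instead of randomly initialised ones.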

During training on lengthy sequences, the LSTM mitigates gradient vanishing and gradient explosion. As a result, it performs better when trained on longer sequences. The Bi-GRU, which is built on the GRU (a condensed form of the LSTM), preserves the long-term memory capability of the LSTM model. In the GRU, an update gate and a reset gate take the place of the LSTM cell’s input, forget, and output gates. The update gate performs operations similar to the input and forget gates of the LSTM: it decides what data should be removed and what new data should be added. The reset gate decides how much knowledge from the past is forgotten. Bi-GRU performs better than LSTM when dealing with lengthy sequences since it uses fewer tensor operations.

5.2 Bi-directional Gated Recurrent Unit (Bi-GRU)

A model that organizes sequential information is advantageous for this task, and RNNs specialize in encoding sequence information. In [36], a Bi-GRU was utilized to extract DDI, and its results were afterwards attached to a GCN. The Bi-GRU [37] splits the computation into 2 directions of information transfer: a forward sequence and a reverse sequence. The forward GRU for the given sentence produces $\tilde{H}=\left({\tilde{h}_{1},\tilde{h}_{2},\ldots,\tilde{h}_{\tilde{n}}}\right)$, $\tilde{h}\in\Im^{\tilde{q}}$, where $z$ refers to the concatenated vector of the present word. Figure 5 shows the Bi-GRU model.

Figure 5.

Bi-directional Gated Recurrent Unit (Bi-GRU) model.

The forward GRU is determined:

$\displaystyle\tilde{B}=\vartheta\left({w_{\tilde{s}\tilde{B}}\tilde{s}_{\tilde{g}}+w_{\tilde{d}\tilde{B}}\tilde{d}_{\tilde{g}-1}+\tilde{a}_{\tilde{B}}}\right)$ (17)

$\displaystyle\tilde{l}=\vartheta\left({w_{\tilde{s}\tilde{l}}\tilde{s}_{\tilde{g}}+w_{\tilde{d}\tilde{l}}\tilde{d}_{\tilde{g}-1}+\tilde{a}_{\tilde{l}}}\right)$ (18)

$\displaystyle\tilde{u}=\tanh\left({w_{\tilde{s}\tilde{u}}\tilde{s}_{\tilde{g}}+w_{\tilde{d}\tilde{u}}\left({\tilde{B}\Theta\tilde{d}_{\tilde{g}-1}}\right)+\tilde{a}_{\tilde{u}}}\right)$ (19)

$\displaystyle\hat{h}=\left({1-\tilde{l}}\right)\Theta\hat{h}_{\tilde{g}-1}+\tilde{l}\Theta\tilde{s}$ (20)

The weight matrices and bias vectors are denoted as $w_{*}$ and $\tilde{a}_{*}$. The proposed IACBD model optimises the weights in this case. $\Theta$ denotes element-wise multiplication, $\tilde{d}_{\tilde{g}}$ refers to the hidden state at the present time step $\tilde{g}$, $\vartheta$ is the sigmoid function, and $\tilde{s}_{\tilde{g}}$ is the input word vector at time step $\tilde{g}$. $\overleftarrow{d}_{\tilde{B}}$ and $\overrightarrow{d}_{\tilde{B}}$ denote the outputs of the backward and forward GRUs, respectively. Furthermore, the Bi-GRU outcome is specified as $\tilde{d}_{\tilde{B}}^{\textit{Bi}-\textit{GRU}}=\left[{\overrightarrow{d}_{\tilde{B}};\overleftarrow{d}_{\tilde{B}}}\right]$.
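A forward and a backward pass of Eqs (17)–(20), followed by the concatenation of the two final states, can be sketched as follows. This is an illustrative sketch only: it uses the standard GRU state update, in which the new hidden state mixes the previous state with the candidate of Eq. (19), and the parameter layout, function names, and zero initial state are assumptions made here.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gate(act, s, d, Ws, Wd, a):
    """Apply act(Ws @ s + Wd @ d + a) element-wise."""
    return [act(x + y + b) for x, y, b in zip(matvec(Ws, s), matvec(Wd, d), a)]


def gru_step(s, d_prev, p):
    """One GRU step; p[name] = (Ws, Wd, a) is an illustrative parameter layout."""
    reset = gate(sigmoid, s, d_prev, *p["reset"])     # Eq. (17)
    update = gate(sigmoid, s, d_prev, *p["update"])   # Eq. (18)
    gated = [r * d for r, d in zip(reset, d_prev)]    # reset gate applied to the past state
    cand = gate(math.tanh, s, gated, *p["cand"])      # Eq. (19)
    # Standard GRU mixing of previous state and candidate (assumed form of Eq. (20)).
    return [(1.0 - l) * d + l * u for l, d, u in zip(update, d_prev, cand)]


def bi_gru(sequence, p_fwd, p_bwd, hidden_size):
    """Run forward and backward GRUs and concatenate their final hidden states."""
    d_fwd = [0.0] * hidden_size
    for s in sequence:
        d_fwd = gru_step(s, d_fwd, p_fwd)
    d_bwd = [0.0] * hidden_size
    for s in reversed(sequence):
        d_bwd = gru_step(s, d_bwd, p_bwd)
    return d_fwd + d_bwd
```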

The overall classification output, OT, is obtained by averaging the LSTM and Bi-GRU outputs as per Eq. (21).

$\displaystyle\textit{OT}=\frac{C_{\textit{LSTM}}+\tilde{d}_{\tilde{B}}^{\textit{Bi}-\textit{GRU}}}{2}$ (21)
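As a small worked example of Eq. (21), the sketch below averages two per-class score vectors and picks the class with the largest fused score. It assumes that both classifiers emit scores of the same length (one per activity class); the function names are illustrative.

```python
def fuse_outputs(c_lstm, d_bigru):
    """Element-wise mean of the LSTM and Bi-GRU outputs, as in Eq. (21)."""
    if len(c_lstm) != len(d_bigru):
        raise ValueError("both classifiers must emit the same number of scores")
    return [(a + b) / 2.0 for a, b in zip(c_lstm, d_bigru)]


def predict_class(scores):
    """Index of the largest fused score."""
    return max(range(len(scores)), key=scores.__getitem__)


fused = fuse_outputs([0.2, 0.8], [0.6, 0.4])  # roughly [0.4, 0.6]
label = predict_class(fused)                  # class index 1
```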

Despite their effectiveness and resilience, CNN-based techniques are only suitable for fixed and short sequence classification problems; they are not advised for long-term problems involving complex time-series data. Long short-term memory (LSTM) is designed to retain knowledge for a long time. In addition to learning how to categorize sequences, an LSTM network also learns what information in a sequence would most effectively promote classification. We hypothesize that this is crucial for robust processing. Among alternative sequence models, some (such as conditional random fields) can suffer from excessive computational cost, while others struggle to capture relationships between inputs that are not local in time. The ability of LSTM to tractably extract and correlate temporally scattered information makes it an increasingly potent alternative to such approaches. The Bi-GRU consists of both forward and backward components. The Bi-GRU model outperformed some older deep learning models utilized for the human activity recognition task because it is both temporally and spatially deep. The capacity of the multi-input architecture to collect both deep and shallow characteristics aids in more accurate activity prediction.

6. Weight optimization of hybrid model via improved aquila optimization with city block distance evaluation (IACBD)

6.1 Objective function and solution encodings

As previously mentioned, the IACBD technique is utilised to tune the weights of the LSTM and the Bi-GRU. Figure 6 shows the solution encoding given as input to the IACBD model. In this case, $N$ is the number of LSTM weights and $M$ is the number of Bi-GRU weights. Additionally, the final results of the LSTM and Bi-GRU are averaged, and the mean error of the LSTM and Bi-GRU is calculated as $\left(\textit{err}\right)$. The objective function Obj of the implemented system is determined in Eq. (22).

$\displaystyle\textit{Obj}=\min\,\left(\textit{err}\right)$ (22)

Figure 6.

Solution encoding.
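One possible reading of the solution encoding and of Eq. (22) is sketched below: a candidate solution is a flat vector holding the $N$ LSTM weights followed by the $M$ Bi-GRU weights, and its fitness is the mean error on labelled samples. The `predict` callable is a hypothetical stand-in for the hybrid classifier, and treating err as a misclassification rate is an assumption made here.

```python
def decode_solution(solution, n_lstm):
    """Split a flat solution into (LSTM weights, Bi-GRU weights)."""
    return solution[:n_lstm], solution[n_lstm:]


def objective(solution, n_lstm, predict, samples):
    """Mean error to be minimised, in the spirit of Eq. (22).

    predict(lstm_w, bigru_w, features) -> label is a hypothetical callable
    wrapping the hybrid classifier; samples is a list of (features, label).
    """
    lstm_w, bigru_w = decode_solution(solution, n_lstm)
    wrong = sum(
        1 for features, label in samples
        if predict(lstm_w, bigru_w, features) != label
    )
    return wrong / len(samples)
```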

6.2 Proposed improved aquila optimization with city block distance evaluation (IACBD) algorithm

The Aquila Optimizer (AO) [38] is a nature-inspired algorithm that models the behaviour of Aquilas while capturing prey. It has strong global exploration capabilities, but its local exploitation phase lacks sufficient stability. Many existing algorithms also suffer from slow convergence, a tendency to fall into local optima, and premature convergence. The IACBD scheme is suggested for addressing these limitations: the quality and accuracy of the discovered optimal solution are improved during the exploitation phase by doing additional local searches. The IACBD model keeps exploration and exploitation in the right proportions and provides a higher-quality solution for the proposed system. Previous optimization approaches have shown that such self-improvement is feasible [39, 40, 41, 32, 42].

6.2.1 Solutions initialization

Aquila optimization is a population-based scheme. It starts from a population $\left(A\right)$ of candidate solutions, given in Eq. (23), that is generated stochastically between the upper $\left({UB}\right)$ and lower $\left({LB}\right)$ bounds of the problem. The best solution attained in each iteration is taken as the approximate optimal solution.

$\displaystyle A=\left[\begin{array}{ccccc}\tilde{z}_{1,1}&\ldots&\tilde{z}_{1,\tilde{v}}&\tilde{z}_{1,\textit{Dim}-1}&\tilde{z}_{1,\textit{Dim}}\\ \tilde{z}_{2,1}&\ldots&\tilde{z}_{2,\tilde{v}}&\ldots&\tilde{z}_{2,\textit{Dim}}\\ \vdots&\vdots&\vdots&\vdots&\vdots\\ \tilde{z}_{\tilde{K}-1,1}&\ldots&\tilde{z}_{\tilde{K}-1,\tilde{v}}&\ldots&\tilde{z}_{\tilde{K}-1,\textit{Dim}}\\ \tilde{z}_{\tilde{K},1}&\ldots&\tilde{z}_{\tilde{K},\tilde{v}}&\tilde{z}_{\tilde{K},\textit{Dim}-1}&\tilde{z}_{\tilde{K},\textit{Dim}}\end{array}\right]$ (23)

where $A$ represents the set of current candidate solutions, generated randomly using Eq. (24), $\tilde{K}$ refers to the total number of candidate solutions (population size), $A_{\tilde{e}}$ indicates the position of the $\tilde{e}^{th}$ solution, and Dim denotes the dimension size of the problem.

$\displaystyle A_{\tilde{e}\tilde{v}}=\textit{rand}\times\left({\textit{UB}_{\tilde{v}}-\textit{LB}_{\tilde{v}}}\right)+\textit{LB}_{\tilde{v}},\quad\tilde{e}=1,2,\ldots,\tilde{K};\,\tilde{v}=1,2,\ldots,\textit{Dim}$ (24)

where $\textit{LB}_{\tilde{v}}$ refers to the $\tilde{v}^{th}$ lower bound of the problem, rand indicates a random number, and $\textit{UB}_{\tilde{v}}$ denotes the $\tilde{v}^{th}$ upper bound of the problem.
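The initialization of Eq. (24) can be sketched in a few lines. The function name and the optional `rng` argument (used only to make runs reproducible) are illustrative choices, and rand is assumed to be drawn uniformly from [0, 1).

```python
import random


def init_population(pop_size, lower, upper, rng=random):
    """Draw pop_size candidates uniformly between per-dimension bounds (Eq. (24))."""
    return [
        [rng.random() * (ub - lb) + lb for lb, ub in zip(lower, upper)]
        for _ in range(pop_size)
    ]


# Example: 5 candidates in a 2-dimensional search space.
population = init_population(5, [-1.0, 0.0], [1.0, 10.0], random.Random(0))
```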

6.2.2 Mathematical model of Aquila Optimizer

The Aquila Optimizer simulates the behaviour of Aquilas during a hunt, with each optimization step modelling one of the hunting steps. If $\tilde{P}\leqslant\left({\frac{2}{3}}\right)*\tilde{M}$, the exploration stage is executed; otherwise, the exploitation stage is carried out. In this way, the search employs different behaviours and moves from exploration to exploitation.

The goal of modelling the behaviour of Aquilas as a numerical optimization approach is to find the optimal solution given a variety of constraints. The mathematical model of the Aquila Optimizer is given below.

Phase 1: Expanded exploration $\left({A_{1}}\right)$

During the 1 ${}^{st}$ stage $\left({A_{1}}\right)$ , the Aquila Optimizer performs a high soar with a vertical stoop to locate the prey area and choose the ideal hunting area. Here, the Aquila explores widely from a high soar to identify the region of the search space in which the prey is situated. As per the proposed logic of the IACBD algorithm, a new distance measure is introduced in the update evaluation, and the behaviour of $\left({A_{1}}\right)$ is thereby given mathematically as per Eq. (25).

$\displaystyle A_{1}\left({\tilde{P}+1}\right)=A_{\textit{best}}\left(\tilde{P}\right)\times\left({1-\frac{\tilde{P}}{\tilde{M}}}\right)+\left(\frac{A_{\tilde{R}}\left(\tilde{P}\right)-A_{\textit{best}}\left(\tilde{P}\right)}{\hat{d}_{\hat{i}\hat{j}}}\times\tilde{w}_{\hat{i}}*\textit{rand}\right)$ (25)

where

$\displaystyle\hat{d}_{\hat{i}\hat{j}}=\sum\limits_{\hat{i}=1}^{\hat{k}}\left|{\overset{\frown}{a}_{\hat{i}}-\overset{\frown}{b}_{\hat{i}}}\right|$ (26)

where $\hat{d}_{\hat{i}\hat{j}}$ indicates the city block distance, which is used for identifying the distance among pixels, $\hat{i}$ denotes the rows, $\hat{j}$ indicates the columns, and $\tilde{w}_{\hat{i}}$ is a randomly selected weight in the range [1, 2]. The solution of the next iteration created via the 1 ${}^{st}$ search strategy $\left({A_{1}}\right)$ is denoted as $A_{1}({\tilde{P}+1})$ . $A_{\textit{best}}(\tilde{P})$ refers to the best solution obtained until the $\tilde{P}^{th}$ iteration and reflects the approximate location of the prey. The term $({1-\frac{\tilde{P}}{\tilde{M}}})$ is employed to control the expanded search through the number of iterations. Equation (27) calculates $A_{\tilde{R}}(\tilde{P})$ , the mean of the current solutions at the $\tilde{P}^{th}$ iteration. rand denotes a number that is picked at random from 0 to 1. $\tilde{M}$ indicates the maximum number of iterations and $\tilde{P}$ denotes the current iteration.

$\displaystyle A_{\tilde{R}}\left(\tilde{P}\right)=\frac{1}{\tilde{K}}\sum\limits_{\tilde{e}=1}^{\tilde{K}}{A_{\tilde{e}}\left(\tilde{P}\right)};\quad\forall\tilde{v}=1,2,\ldots,\textit{Dim}$ (27)

where Dim represents the issue’s dimension size and $\tilde{K}$ specifies the number of possible solutions (population size).
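The expanded exploration update of Eqs (25)–(27) might be coded as below. The text does not state which two vectors enter the city block distance of Eq. (26), so this sketch assumes it is measured between the population mean and the best solution; the small `eps` that guards against division by zero is also an addition made here.

```python
import random


def city_block(a, b):
    """City block (L1) distance between two vectors, Eq. (26)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def mean_position(population):
    """Per-dimension mean of the current solutions, Eq. (27)."""
    k = len(population)
    return [sum(column) / k for column in zip(*population)]


def expanded_exploration(best, population, p, max_iter, rng=random, eps=1e-12):
    """Phase 1 update in the spirit of Eq. (25)."""
    mean = mean_position(population)
    # Assumption: the distance is taken between the mean and the best solution.
    d = city_block(mean, best) + eps
    w = rng.uniform(1.0, 2.0)   # random weight in [1, 2]
    r = rng.random()            # rand in [0, 1)
    scale = 1.0 - p / max_iter
    return [b * scale + (m - b) / d * w * r for b, m in zip(best, mean)]
```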

Phase 2: Narrowed exploration $\left({A_{2}}\right)$

In the 2 ${}^{\text{nd}}$ phase $\left({A_{2}}\right)$ , once the prey area is found from a high soar, the Aquila circles above the targeted prey, prepares the land, and then strikes. Here, the Aquila Optimizer narrowly explores the selected area of the intended prey. This behaviour is defined mathematically in Eq. (28).

$\displaystyle A_{2}\left({\tilde{P}+1}\right)=A_{\textit{best}}\left(\tilde{P}\right)\times\textit{Levy}\left(\beta\right)+A_{\tilde{S}}\left(\tilde{P}\right)+\left({\bar{u}-\bar{v}}\right)*\textit{rand}$ (28)

The solution of the next iteration generated by the second search technique $\left({A_{2}}\right)$ is denoted by $A_{2}({\tilde{P}+1})$ , as per Eq. (28). $\textit{Levy}\left(\beta\right)$ specifies the Levy flight distribution function, given in Eq. (29), and $\beta$ indicates the dimension space. $A_{\tilde{S}}(\tilde{P})$ represents a solution chosen at random in the range $\left[{1\,\,\,\tilde{K}}\right]$ at the $\tilde{P}^{th}$ iteration.

$\displaystyle\textit{Levy}\left(\beta\right)=\bar{s}\times\frac{\bar{h}\times\gamma}{\left|\bar{g}\right|^{\frac{1}{\rho}}}$ (29)

where $\bar{h}$ and $\bar{g}$ are random numbers between 0 and 1, $\bar{s}=$ 0.01 is a constant, and $\gamma$ is given in Eq. (30).

$\displaystyle\gamma=\left({\frac{\Gamma\left({1+\rho}\right)\times\sin\left({\frac{\pi\rho}{2}}\right)}{\Gamma\left({\frac{1+\rho}{2}}\right)\times\rho\times 2^{\left({\frac{\rho-1}{2}}\right)}}}\right)$ (30)

where $\rho=$ 1.5 is a constant. In Eq. (28), $\bar{u}$ and $\bar{v}$ describe the spiral shape of the search, as represented in Eqs (31) and (32).

$\displaystyle\bar{u}=\tilde{Z}\times\cos\left(\theta\right)$ (31)

$\displaystyle\bar{v}=\tilde{Z}\times\sin\left(\theta\right)$ (32)

where

$\displaystyle\tilde{Z}=\tilde{Z}_{1}+\bar{J}\times\beta_{1}$ (33)

$\displaystyle\theta=-\psi\times\beta_{1}+\theta_{1}$ (34)

$\displaystyle\theta_{1}=\frac{3\times\pi}{2}$ (35)

where $\psi$ indicates a small constant (0.005), $\bar{J}$ denotes a small constant fixed at 0.00565, and $\beta_{1}$ indicates an integer value from 1 to $\left(\textit{Dim}\right)$ .
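The narrowed exploration update of Eqs (28)–(35) can be sketched as follows. Several choices here are assumptions rather than details given in the text: $\beta_{1}$ is read as the integer dimension index (1 to Dim) and $\psi$ as the constant 0.005, the base radius `z1 = 10.0` is a hypothetical value for $\tilde{Z}_{1}$, and a fresh Levy step is drawn per dimension.

```python
import math
import random


def levy(rho=1.5, s=0.01, rng=random):
    """Levy flight step following Eqs (29)-(30)."""
    gamma = (math.gamma(1 + rho) * math.sin(math.pi * rho / 2)) / (
        math.gamma((1 + rho) / 2) * rho * 2 ** ((rho - 1) / 2)
    )
    h = rng.random()
    g = rng.random() or 1e-12  # avoid a zero denominator
    return s * h * gamma / abs(g) ** (1 / rho)


def spiral(beta1, z1=10.0, j=0.00565, psi=0.005):
    """Spiral coordinates (u, v) of Eqs (31)-(35); z1 is a hypothetical base radius."""
    z = z1 + j * beta1
    theta = -psi * beta1 + 3 * math.pi / 2
    return z * math.cos(theta), z * math.sin(theta)


def narrowed_exploration(best, population, rng=random):
    """Phase 2 update in the spirit of Eq. (28)."""
    pick = rng.choice(population)  # randomly chosen solution A_S
    new = []
    for v in range(len(best)):
        u, w = spiral(v + 1)       # beta1 taken as the 1-based dimension index
        new.append(best[v] * levy(rng=rng) + pick[v] + (u - w) * rng.random())
    return new
```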

Phase 3: Extended exploitation $\left({A_{3}}\right)$

The 3 ${}^{\text{rd}}$ phase $\left({A_{3}}\right)$ is employed when the prey area has been located and the Aquila is prepared to land and attack. Equation (36) expresses this behaviour mathematically.

$\displaystyle A_{3}\left({\tilde{P}+1}\right)=\left({A_{\textit{best}}\left(\tilde{P}\right)-A_{\tilde{R}}(\tilde{P})}\right)\times\alpha-\textit{rand}+\left({\left({\textit{UB}-\textit{LB}}\right)\times\textit{rand}+\textit{LB}}\right)\times\mu$ (36)

where $A_{3}({\tilde{P}+1})$ represents the solution of the next iteration generated by the third search method $({A_{3}})$ , $A_{\textit{best}}(\tilde{P})$ denotes the approximate position of the prey until the $\tilde{P}^{th}$ iteration (the best solution obtained), and $A_{\tilde{R}}(\tilde{P})$ refers to the mean value of the current solutions at the $\tilde{P}^{th}$ iteration. Further, rand indicates a number picked at random from the range of 0 to 1. $\alpha$ and $\mu$ are the exploitation adjustment parameters, fixed to a small value (0.1). LB is the lower bound of the problem and UB indicates the upper bound of the problem.
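Equation (36) translates almost directly into code. The sketch below draws the two rand values independently for every dimension, which is an assumption, and the function name is illustrative.

```python
import random


def expanded_exploitation(best, mean, lower, upper, alpha=0.1, mu=0.1, rng=random):
    """Phase 3 update following Eq. (36), applied per dimension."""
    new = []
    for b, m, lb, ub in zip(best, mean, lower, upper):
        r1, r2 = rng.random(), rng.random()  # independent draws (assumed)
        new.append((b - m) * alpha - r1 + ((ub - lb) * r2 + lb) * mu)
    return new
```

For instance, with best = 1, mean = 0, bounds [0, 2], and both random draws equal to 0.5, the update evaluates to 0.1 − 0.5 + 0.1 = −0.3.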

Phase 4: Narrowed exploitation $\left({A_{4}}\right)$

In the 4 ${}^{\text{th}}$ technique $\left({A_{4}}\right)$ , when the Aquila is close to the prey, it attacks the prey on land following the prey’s stochastic movements. This is referred to as “walk and grab prey”. Finally, the Aquila attacks the prey at its last position. Eq. (37) represents this behaviour mathematically.

$\displaystyle A_{4}\left({\tilde{P}+1}\right)=\textit{QF}\times A_{\textit{best}}\left(\tilde{P}\right)-\left({\tilde{E}_{1}\times A\left(\tilde{P}\right)\times\textit{rand}}\right)-\tilde{E}_{2}\times\textit{Levy}\left(\beta\right)+\textit{rand}\times\tilde{E}_{1}$ (37)

where QF represents a quality function, given by Eq. (38), that is used to balance the search strategies, and $A_{4}({\tilde{P}+1})$ denotes the solution generated by the 4 ${}^{\text{th}}$ phase $\left({A_{4}}\right)$ for the next iteration. $\tilde{E}_{1}$ reflects the various motions of the Aquila that are used to track the prey during flight, as calculated in Eq. (39). $\tilde{E}_{2}$ decreases from 2 to 0 over the iterations, as given in Eq. (40). $A(\tilde{P})$ indicates the current solution at the $\tilde{P}^{\text{th}}$ iteration.

$\displaystyle\textit{QF}\left(\tilde{P}\right)=\tilde{P}^{\frac{2\times\textit{rand}-1}{\left({1-\tilde{M}}\right)^{2}}}$ (38)

$\displaystyle\tilde{E}_{1}=2\times\textit{rand}-1$ (39)

$\displaystyle\tilde{E}_{2}=2\times\left({1-\frac{\tilde{P}}{\tilde{M}}}\right)$ (40)

where $\textit{QF}(\tilde{P})$ is the quality function value at the $\tilde{P}^{th}$ iteration, rand is a random number between 0 and 1, and $\tilde{P}$ and $\tilde{M}$ are the current iteration and the maximum number of iterations, respectively.
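Equations (37)–(40) can be sketched as below. The Levy step is passed in as a precomputed number (`levy_step`) to keep the example short; that argument, the function names, and the assumption that iterations are counted from 1 with more than one iteration in total (so that Eq. (38) is well defined) are choices made here.

```python
import random


def quality_function(p, max_iter, rng=random):
    """QF of Eq. (38); assumes p >= 1 and max_iter > 1."""
    return p ** ((2.0 * rng.random() - 1.0) / (1.0 - max_iter) ** 2)


def narrowed_exploitation(best, current, p, max_iter, levy_step, rng=random):
    """Phase 4 update following Eq. (37)."""
    qf = quality_function(p, max_iter, rng)
    e1 = 2.0 * rng.random() - 1.0       # Eq. (39)
    e2 = 2.0 * (1.0 - p / max_iter)     # Eq. (40), decreases from 2 to 0
    return [
        qf * b - e1 * x * rng.random() - e2 * levy_step + rng.random() * e1
        for b, x in zip(best, current)
    ]
```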

The suggested IACBD model pseudo-code is illustrated in Algorithm 1.

Algorithm 1: Pseudo code of IACBD approach
Initialization phase
Initialize the population $A$ of the Aquila Optimizer (AO)
Initialize the Aquila Optimizer parameters (i.e., $\alpha$ , $\mu$ , and so on)
while (the end condition is not met) do
    Examine the fitness values
    Determine the best solution $A_{\textit{best}}\left(\tilde{P}\right)$
    for $\left({\tilde{e}=1,2,\ldots,\tilde{K}}\right)$ do
        Update the mean value $A_{\tilde{R}}\left(\tilde{P}\right)$
        Update $\bar{v}$ , $\bar{u}$ , $\tilde{E}_{1}$ , $\tilde{E}_{2}$ , $\textit{Levy}\left(\beta\right)$ , etc.
        if $\tilde{P}\leqslant\left({\frac{2}{3}}\right)*\tilde{M}$ then
            if $\textit{rand}\leqslant 0.5$ then
                Phase 1: Expanded exploration $\left({A_{1}}\right)$
                Update the solution using the proposed Eq. (25)
                if $\textit{Fitness}\left({A_{1}\left({\tilde{P}+1}\right)}\right)<\textit{Fitness}\left({A\left(\tilde{P}\right)}\right)$ then
                    $A\left(\tilde{P}\right)=A_{1}\left({\tilde{P}+1}\right)$
                    if $\textit{Fitness}\left({A_{1}\left({\tilde{P}+1}\right)}\right)<\textit{Fitness}\left({A_{\textit{best}}\left(\tilde{P}\right)}\right)$ then
                        $A_{\textit{best}}\left(\tilde{P}\right)=A_{1}\left({\tilde{P}+1}\right)$
                    end if
                end if
            else
                Phase 2: Narrowed exploration $\left({A_{2}}\right)$
                Update the solution using Eq. (28)
                if $\textit{Fitness}\left({A_{2}\left({\tilde{P}+1}\right)}\right)<\textit{Fitness}\left({A\left(\tilde{P}\right)}\right)$ then
                    $A\left(\tilde{P}\right)=A_{2}\left({\tilde{P}+1}\right)$
                    if $\textit{Fitness}\left({A_{2}\left({\tilde{P}+1}\right)}\right)<\textit{Fitness}\left({A_{\textit{best}}\left(\tilde{P}\right)}\right)$ then
                        $A_{\textit{best}}\left(\tilde{P}\right)=A_{2}\left({\tilde{P}+1}\right)$
                    end if
                end if
            end if
        else
            if $\textit{rand}\leqslant 0.5$ then
                Phase 3: Extended exploitation $\left({A_{3}}\right)$
                Update the solution using Eq. (36)
                if $\textit{Fitness}\left({A_{3}\left({\tilde{P}+1}\right)}\right)<\textit{Fitness}\left({A\left(\tilde{P}\right)}\right)$ then
                    $A\left(\tilde{P}\right)=A_{3}\left({\tilde{P}+1}\right)$
                    if $\textit{Fitness}\left({A_{3}\left({\tilde{P}+1}\right)}\right)<\textit{Fitness}\left({A_{\textit{best}}\left(\tilde{P}\right)}\right)$ then
                        $A_{\textit{best}}\left(\tilde{P}\right)=A_{3}\left({\tilde{P}+1}\right)$
                    end if
                end if
            else
                Phase 4: Narrowed exploitation $\left({A_{4}}\right)$
                Update the solution using Eq. (37)
                if $\textit{Fitness}\left({A_{4}\left({\tilde{P}+1}\right)}\right)<\textit{Fitness}\left({A\left(\tilde{P}\right)}\right)$ then
                    $A\left(\tilde{P}\right)=A_{4}\left({\tilde{P}+1}\right)$
                    if $\textit{Fitness}\left({A_{4}\left({\tilde{P}+1}\right)}\right)<\textit{Fitness}\left({A_{\textit{best}}\left(\tilde{P}\right)}\right)$ then
                        $A_{\textit{best}}\left(\tilde{P}\right)=A_{4}\left({\tilde{P}+1}\right)$
                    end if
                end if
            end if
        end if
    end for
end while
Return $\left({A_{\textit{best}}}\right)$
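The control flow of Algorithm 1 reduces to two small pieces: choosing one of the four phases from the iteration counter and a random draw, and greedily accepting a candidate only when it lowers the fitness. The sketch below isolates those two pieces; the function names are illustrative, and `fitness` stands for any callable to be minimised, such as the error of Eq. (22).

```python
def select_phase(p, max_iter, r):
    """Return the phase number (1-4) chosen by Algorithm 1 for iteration p and draw r."""
    if p <= (2.0 / 3.0) * max_iter:
        return 1 if r <= 0.5 else 2   # exploration phases
    return 3 if r <= 0.5 else 4       # exploitation phases


def greedy_update(candidate, current, best, fitness):
    """Keep the candidate only if it improves on the current (and best) solution."""
    if fitness(candidate) < fitness(current):
        current = candidate
        if fitness(candidate) < fitness(best):
            best = candidate
    return current, best
```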

7. Results and discussion

7.1 Simulation procedure

The proposed human activity recognition approach with HC+IACBD was implemented in Python, and its outcomes were verified. The evaluation was performed using 2 datasets that were downloaded from UCF-ARG [43].

Dataset Description: “UCF-ARG (University of Central Florida-Aerial camera, Rooftop camera and Ground camera) Data set is a Multi view Human Action dataset. UCF-ARG includes 10 actions performed through 12 actors recorded from a ground camera, a rooftop camera at a height of 100 feet, and an aerial camera mounted onto the payload platform of a 13’ Kingfisher Aerostat helium balloon. Except for Open-Close Trunk, all the other actions are performed 4 times by each actor in different directions. Open-Close Trunk is performed only 3 times, i.e. on 3 cars parked in different directions. The actions are captured using a high-definition camcorder 1920 X 1080 at 60fps (frames per second)”.

The sample images are illustrated in Fig. 7. The developed HC+IACBD model is compared with existing approaches, including HC+SOA [44], HC+AO [38], HC+CHIO [29], HC+PRO [45], and HC+BES [46], based on several measures for learning percentages of 50, 60, 70, and 80.

Figure 7.

Sample images.

Figure 8.

Analysis of the developed method’s performance in comparison to extant schemes.

7.2 Performance analysis

The performance of the proposed HC+IACBD scheme is compared with the conventional methods on several metrics, as seen in Figs 8–10. At learning percentage 50, the HC+IACBD approach achieves an accuracy that is 5.20%, 8.33%, 3.12%, 1.04%, and 7.29% higher than the existing schemes HC+SOA, HC+AO, HC+CHIO, HC+PRO, and HC+BES, respectively (Fig. 8(b)). This demonstrates that the proposed HC+IACBD work is more accurate than the existing techniques. Further, the HC+IACBD scheme attains a higher sensitivity ($\sim$0.84) than the other existing schemes at learning percentage 70 (Fig. 9(a)). Likewise, the specificity and precision of the HC+IACBD scheme are higher than those of HC+SOA, HC+AO, HC+CHIO, HC+PRO, and HC+BES for every learning percentage. These findings demonstrate the significance of the proposed model, which is trained with the appropriate features. Furthermore, tuning the LSTM and Bi-GRU weights optimally paves the way for improved recognition outcomes with maximum accuracy.

Figure 9 compares the HC+IACBD model with the standard methods on negative metrics, namely FPR and FNR. The minimum FNR (0.15) of the suggested HC+IACBD work shows that it is less susceptible to errors, leading to accurate prediction results at learning percentage 80 (Fig. 9(a)). Compared with standard methods like HC+SOA, HC+AO, HC+CHIO, HC+PRO, and HC+BES, the FPR of the HC+IACBD system is 0.018 at learning percentage 50, which is the lowest value. The variation in learning percentage shows the difference in performance. This demonstrates that the implemented optimization method, by providing optimal weights, ensures the least amount of error.

Figure 9.

Analysis of the developed method’s performance in comparison to extant schemes.

Figure 10.

Analysis of the developed method’s performance in comparison to extant schemes.

Figure 10 compares the HC+IACBD approach with the other traditional models in terms of additional measures, namely NPV, Matthews correlation coefficient (MCC), and F-measure. The F-measure of the HC+IACBD scheme in Fig. 10(b) is higher than that of the conventional systems HC+SOA, HC+AO, HC+CHIO, HC+PRO, and HC+BES at learning percentage 80. The HC+IACBD system attains a better NPV (0.98) at learning percentage 70, whereas the compared systems have lower values, as shown in Fig. 10(a). The HC+IACBD method obtains its maximum MCC at learning percentage 80 rather than at learning percentage 60 (Fig. 10(c)). Therefore, the HC+IACBD model outperforms the other existing schemes.

7.3 Overall performance evaluation

Table 3 presents the overall performance assessment of the recommended HC+IACBD system in terms of the different measures. The HC+IACBD method is compared with extant models including LSTM, RNN, DBN, SVM, Linear SVM, Classic CNNs, and Ensemble classifiers (paper 1), as seen in the table. Compared to these systems, the suggested HC+IACBD system achieves the highest accuracy (0.933). Similarly, the HC+IACBD system outperforms LSTM, RNN, DBN, SVM, Linear SVM, Classic CNNs, and Ensemble classifiers (paper 1) in terms of MCC. Table 3 also shows that the HC+IACBD system has a lower FPR than these conventional approaches. The outcomes show that the HC+IACBD scheme performs better than the traditional schemes for human activity recognition and is less prone to misclassification of actions.

Table 3
Overall performance analysis of recommended and previous techniques

Measures	LSTM [37]	RNN [47]	DBN [48]	SVM [49]	Linear SVM [30]	Classic CNNs [50]	Ensemble classifiers (paper 1) [51]	Adopted HC+IACBD model
Accuracy	0.8375	0.837963	0.818981	0.818981	0.829167	0.831019	0.874740	0.933827
Sensitivity	0.1875	0.189815	0.094907	0.094907	0.145833	0.155093	0.946443	0.869136
Specificity	0.909722	0.909979	0.899434	0.899434	0.905093	0.906121	0.928264	0.985460
Precision	0.1875	0.189815	0.094907	0.094907	0.145833	0.155093	0.847261	0.869136
F-Measure	0.1875	0.189815	0.094907	0.094907	0.145833	0.155093	0.860782	0.869136
MCC	0.097222	0.099794	$-$ 0.005660	$-$ 0.005660	0.050926	0.061214	0.812670	0.854595
NPV	0.909722	0.909979	0.899434	0.899434	0.905093	0.906121	0.956984	0.985460
FPR	0.090278	0.090021	0.100566	0.100566	0.094907	0.093879	0.055357	0.014541
FNR	0.8125	0.810185	0.905093	0.905093	0.854167	0.844907	0.125260	0.130864
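For reference, the measures reported in Table 3 can all be derived from the four confusion-matrix counts. The sketch below uses the standard definitions for a single class treated one-vs-rest; how the per-class values are aggregated over the actions is not stated in the text, so that part is left out, and the function name is illustrative.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix measures for one class (one-vs-rest)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "npv": tn / (tn + fn),
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
    }
```

Note that FPR = 1 − specificity and FNR = 1 − sensitivity, which matches the corresponding rows of Table 3 up to rounding.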

7.4 Statistical analysis

Table 4 shows the statistical analysis of the HC+IACBD system in comparison to the previous systems on the basis of the error measure. Meta-heuristic methods are stochastic by nature; therefore, to obtain reliable results, each algorithm is run numerous times and the achievement of the defined objective is examined over these runs. The HC+IACBD strategy attains a lower mean error than the other models, namely HC+SOA, HC+AO, HC+CHIO, HC+PRO, and HC+BES. In the best case, the suggested approach attains a lower error ($\sim$0.027) than the other existing models. Consequently, the suggested HC+IACBD approach has demonstrated its value by recognizing human activity with the least error, which verifies its effectiveness.

Table 4
Statistical analysis based on error measures: developed vs previous models

Measures	Best	Worst	Mean	Median	Standard Deviation (SD)
HC+SOA [44]	0.099753	0.109630	0.103056	0.101421	0.003886
HC+AO [38]	0.059259	0.091852	0.078113	0.080671	0.012692
HC+CHIO [29]	0.043016	0.090370	0.070661	0.074630	0.017211
HC+PRO [45]	0.054815	0.085926	0.064030	0.057690	0.012703
HC+BES [46]	0.041852	0.072435	0.056967	0.056790	0.012687
Adopted HC+IACBD model	0.027936	0.042469	0.032879	0.030556	0.005653
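The five statistics in Table 4 can be computed from the error values of repeated runs as sketched below. Whether the reported standard deviation is the population or the sample form is not stated in the text; the population form (`statistics.pstdev`) is assumed here, and the error values in the example are made up for illustration.

```python
import statistics


def summarize_runs(errors):
    """Best, worst, mean, median and standard deviation of per-run errors."""
    return {
        "best": min(errors),    # lowest error is the best case
        "worst": max(errors),
        "mean": statistics.mean(errors),
        "median": statistics.median(errors),
        "sd": statistics.pstdev(errors),  # population SD (assumption)
    }


# Illustrative values only, not taken from the experiments.
summary = summarize_runs([0.03, 0.04, 0.05])
```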

7.5 Analysis based on features and optimization

The analysis of the developed approach based on features and optimization is given in Table 5. The adopted HC+IACBD model holds a better MCC than the adopted model without optimization and the adopted technique with the extant BoW. Further, the accuracy of the adopted HC+IACBD model ($\sim$0.985) is higher than that of the adopted model without optimization and the adopted scheme with the traditional BoW. Comparable behaviour is seen for each of the other metrics. This demonstrates the influence of the suggested features $+$ LSTM $+$ Bi-GRU $+$ IACBD combination in this study. The suggested HC+IACBD model recognises human activity more precisely, whereas the adopted model without optimization and the adopted model with the extant BoW perform poorly in comparison. This indicates that the adopted combination is appropriate for the human activity recognition scheme. The reason is that the inclusion of the city block distance in the update evaluation improves convergence when tuning the weights of the classifiers, which helps in accurate recognition of actions.

Table 5
Analysis of proposed work based on features and optimization

Metrics	Developed approach without Optimization	Developed approach with traditional BoW	Developed HC+ IACBD approach
Sensitivity	0.818427	0.820674	0.933827
Specificity	0.092135	0.103371	0.869136
Accuracy	0.899126	0.900375	0.985460
Precision	0.092135	0.103371	0.869136
F-measure	0.092135	0.103371	0.869136
MCC	$-$ 0.008740	0.003745	0.854595
NPV	0.899126	0.900375	0.985460
FPR	0.100874	0.099626	0.014541
FNR	0.907865	0.896629	0.130864

7.6 Convergence analysis

The convergence of the chosen IACBD framework and the conventional approaches is assessed by varying the iteration count over 0, 5, 10, 15, 20, and 25. The convergence analysis of the provided technique against the standard approaches is shown in Fig. 11. The recommended IACBD approach holds the least cost function, and its cost decreases as the iteration count increases. The cost function of the developed IACBD system fell between the 16 ${}^{\text{th}}$ and 18 ${}^{\text{th}}$ iterations. At the 25 ${}^{\text{th}}$ iteration, the IACBD method yields a lower cost function value (1.145) than extant approaches such as SOA, AO, CHIO, PRO, and BES. Thus, the adopted IACBD scheme yielded the lowest cost function with better convergence.

Figure 11.

Convergence analysis of provided approach to previous methods.

7.7 Receiver operating characteristic (ROC) curve analysis

The ROC curve is shown in Fig. 12. The diagnostic capability of a binary classifier system is represented graphically by a receiver operating characteristic curve, or ROC curve, which changes as the discrimination threshold is altered. The proposed solution has a significant advantage over other traditional methods.

Figure 12.

Analysis in terms of ROC curve.

7.8 Analysis on area under curve

The term “Area under the ROC Curve” (AUC) refers to a measure of separability. It demonstrates how well the model can distinguish between classes. The greater the AUC, the better the approach does in distinguishing between the positive and negative classes. The AUC analysis is listed in Table 6.

Table 6
Area under curve analysis

Methods	Values
SOA	2.60419
CHIO	2.62943
BES	2.64843
AO	2.66986
PRO	2.69116
IACBD	2.71114
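As a generic illustration of how an area under a ROC curve is obtained, the sketch below integrates a list of (FPR, TPR) points with the trapezoidal rule. It is not the procedure used to produce Table 6, which the text does not describe; for a single binary ROC curve this computation yields a value between 0 and 1.

```python
def auc_trapezoid(fpr, tpr):
    """Area under a ROC curve given matching FPR and TPR points."""
    points = sorted(zip(fpr, tpr))
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


chance = auc_trapezoid([0.0, 1.0], [0.0, 1.0])             # 0.5, the diagonal
perfect = auc_trapezoid([0.0, 0.0, 1.0], [0.0, 1.0, 1.0])  # 1.0
```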

8. Conclusion

This article has described a human activity recognition system that includes 3 stages: preprocessing, feature extraction, and classification. The input human action images were provided to the preprocessing stage, where median filtering and background subtraction were applied. From the preprocessed image, Improved Bag of Visual Words, local texton XOR pattern, and SLIF features were extracted during the feature extraction step. The extracted features were then provided to the classification stage, where classification was done via a hybrid classifier (HC) including Bi-GRU and LSTM. The LSTM and Bi-GRU outputs were averaged to provide the final output. The weights of both the LSTM and Bi-GRU were optimally tuned by the IACBD approach to make the recognition more accurate and precise. The results of the developed method were compared to existing approaches based on several metrics. The presented HC+IACBD scheme at learning percentage 50 achieved an accuracy that was 5.20%, 8.33%, 3.12%, 1.04%, and 7.29% higher than the existing schemes HC+SOA, HC+AO, HC+CHIO, HC+PRO, and HC+BES, respectively. The HC+IACBD system attained a better NPV (0.98) at learning percentage 70, whereas the evaluated previous systems had lower values. In the best case, the suggested approach attained a lower error ($\sim$0.027) than the other existing models.

Footnotes

Declaration of statement

To the best of the authors’ knowledge, the paper entitled “Hybrid Classifier Model with Tuned Weights for Human Activity Recognition” is not under consideration for publication elsewhere and has not been published anywhere.

Authors’ Bios

Mr. Anshuman Tyagi received the B.E. degree in electrical engineering from M.M.M.E.C, D.D.U University, Gorakhpur, India, and the M.Tech. (Computer Application) from the Indian School of Mines, Dhanbad, India. He is currently a research scholar in the Department of Computer Science and Engineering at Amity University, Lucknow, India, and also works as an assistant professor at Pranveer Singh Institute of Technology, Kanpur, India. His research areas are machine learning, deep learning, and computer vision. He is a member of the IETE (Institution of Electronics and Telecommunication Engineers).

Dr. Pawan Singh received the B.E. (Computer Science and Engineering) from CCS University, Meerut, India, M.Tech. (Information Technology) from GGSIPU, New Delhi, India, and Ph.D. (Computer Science) from Magadh University, Bodh Gaya, India in 2013. Currently, he is serving in the Department of Computer Science and Engineering, Amity School of Engineering and Technology, Amity University, Lucknow Campus, India. His research interests include software metrics, software cost estimation, web structure mining, energy-aware scheduling, cloud computing, medical imaging, nature-inspired meta-heuristic optimization techniques, and their applications. He has authored and co-authored several research papers in journals of international repute. Dr. Pawan has served as a reviewer for various SCI and SCIE indexed journals and as a technical committee member for many international conferences. He has been the Special Sessions Co-chair of “Emerging Trends towards Communication, Computing, and Internet of Things”, 2nd International Conference on Communication and Computing Systems (ICCCS-18), and of the National Seminar cum Workshop on Data Science and Information Security, 28th Feb–2nd March 2019. He has been a Guest Editor of the Special Issue on Advanced Optimization Techniques for Operation and Control of Intelligent Power Systems, Journal of Control Science and Engineering, Hindawi Publications, London, United Kingdom. He is a member of the IEEE and the IEEE Computational Intelligence Society (CIS).

Dr. Harsh Dev is a Professor in the Department of Computer Science and Engineering and also holds the position of Dean Research at Pranveer Singh Institute of Technology, Kanpur, India. He received his M.Sc. degree in 1995 from Lucknow University, India, and his Ph.D. degree in Computer Science in 2005 from Babasaheb Bhimrao Ambedkar University, Lucknow, India. He has 25 years of teaching experience and 18 years of research experience in the fields of Computer Graphics, Cryptography, Software Engineering, and Data Mining. He has published more than 48 international and national publications. Seven students have been awarded a Ph.D. degree in Computer Science under his guidance. He is a member of the editorial board of a reputed journal. He is a member of the Computer Society of India, the Indian Science Congress, and the International Association of Engineers (IAENG).

