Abstract
An emotion is a conscious mental response that varies across the different situations in a woman’s life. These responses are caused by physiological, cognitive, and behavioral changes. Gender-based violence undermines the participation of women in decision-making, resulting in a decline in their quality of life. More accurate, automatic classification of women’s emotions can enhance human-computer interfaces and security in real time. Some wearable technologies and mobile applications claim to ensure the safety of women. However, they rely on limited social action and are ineffective at ensuring women’s safety when and where it is needed. In this work, a novel CDB-LSTM network is proposed to accurately classify the emotions of women into seven different classes. The electroencephalogram (EEG) offers a non-radioactive method of identifying emotions. Initially, the EEG signals are preprocessed and converted into images via a time-frequency representation (TFR). A smoothed pseudo-Wigner-Ville distribution (SPWVD) is employed to convert the EEG time-domain signals into input images. These converted images are then given as input to the Convolutional Deep Belief Network (CDBN) to extract the most relevant features. Finally, a Bi-directional LSTM is used to classify the emotions of women into seven classes, namely happy, relax, sad, fear, anxiety, anger, and stress. The proposed CDB-LSTM network achieves a high accuracy of 97.27% in the validation phase. It improves the overall accuracy by 6.20%, 32.98%, 6.85%, and 3.30% over CNN-LSTM, the Multi-domain feature fusion model, GCNN-LSTM, and CNN with SVM and DT, respectively.
Introduction
An emotion is a physiological state made up of a personal experience together with expressive and physiological responses. Emotions are essential for interaction, interpretation, and decision-making [1], and they have a substantial effect on human behavior, perception, and interactions. A high level of mental workload can negatively impact performance and health and can lead to mental exhaustion [2]. Recently, several deep learning networks have been trained on electroencephalogram (EEG) and electrocardiogram (ECG) signals [3], and many researchers have introduced techniques for classifying emotions from EEG signals for various purposes. Nandi et al. [4] devised a Real-time Emotion Classification System (RECS) in 2021, based on logistic regression trained online with the stochastic gradient descent (SGD) algorithm. Islam et al. [5] devised a modified CNN-based model for emotion identification that uses a single feature, the PCC of EEG sub-bands. The EEG data were transformed into PCC-based images of the EEG sub-band channel correlations. However, such approaches either ignore the noise that occurs in EEG signals or use an arbitrary set of pre-processing techniques to reduce noise and identify women experiencing various emotions [6].
In the twenty-first century, women have made significant contributions to society and have joined men in a variety of industries. Criminal offenses against women, such as eve-teasing, harassment, molestation, rape, kidnapping, and domestic abuse, are not decreasing but rather increasing [7]. Khateeb et al. [8] proposed a Multi-domain feature fusion model in 2021 for emotion categorization with the DEAP dataset. The government has taken many pre-emptive actions to prevent these offenses, but the number of crimes has not changed [9, 10]. Currently, more crimes are committed against women than ever before. Harassment of women occurs not only in the evening or at night, but also during the day, at home, at work, and even while shopping. Chen et al. [11] presented an approach for detecting emotions from EEG signals with the AdaBoost method. Emotional components in the time domain, time-frequency domain, and non-linear domain are extracted from preprocessed EEG signals and fused into an eigenvector matrix. Women are often scared of strangers and worried about their safety [12]. About 80% of our country’s women are concerned about their safety. Women can use security apps on their smartphones to send emergency alerts to selected persons and to let others know where they are in case of an emergency [13]. However, when a woman meets with an accident late at night and no one is there to rescue her, she might not be able to communicate her problems.
Emotions in women can be identified from their behavior, vocal tones, facial expressions, and physiological signals. Yin et al. [14] introduced a DL-based emotion classification model in which the segmented EEG signals are fed into a deep network that fuses a graph CNN and an LSTM. Maheshwari et al. [15] designed a rhythm-specific multi-channel CNN for automatic emotion detection from multi-channel EEG signals. EEG rhythmic patterns from randomly chosen channels, combined with a deep CNN, are used to categorize emotions along the valence, arousal, and dominance dimensions. Liao et al. [16] introduced a system for emotion classification from multi-modal physiological signals using a convolutional RNN. A person may intentionally conceal their true emotions, so outward expressions can contradict what is actually felt [17]. Physiological responses, on the other hand, are more unbiased and reliable for detecting emotions. EEG signals are widely used for recognizing human emotions [18].
EEG signals are generated by the Central Nervous System (CNS), which responds to emotional changes more quickly. Qing et al. [19] introduced an emotional activation curve in 2019 to describe the activation process of various emotions. This activation curve represents the emotional stimulation process. Chen et al. [20] devised an emotion recognition system with a deep CNN and multiple machine learning algorithms. Alazrai et al. [21] presented an EEG-based emotion detection system with a time-frequency feature extraction model in 2018. They used a quadratic time-frequency distribution (TFD) to create a high-resolution TFR of the EEG signals and capture their spectral variations. Furthermore, EEG signals have been used because of their usefulness in detecting emotions [22]. EEG-based Neuro-Control Interfaces (NCI) for translational and healthcare applications have gained renewed interest because of wearable sensors [23, 24], real-time streaming data, machine learning classifiers, and deep learning models such as DNN, CNN, and LSTM networks [25–27].
Many of the existing approaches described above rely on manual feature extraction and classification, which is a time-consuming process that impedes performance. Most of the aforementioned works take the EEG signals directly as input, which leads to low performance. These difficulties call for automatic signal decomposition and classification. This work focuses on improving performance by converting the EEG signals into images with the SPWVD, which improves the classification accuracy of the proposed CDB-LSTM network. The key contributions of the proposed classification method are as follows:
• The primary purpose of this study is to classify the pre-processed EEG signals into seven emotions, viz. happy, relax, sad, fear, anxiety, anger, and stress.
• Initially, the pre-processed EEG signals from module 1 are converted into images using the Smoothed Pseudo-Wigner-Ville distribution (SPWVD).
• The converted images are fed into the Convolutional Deep Belief Network (CDBN) for extracting the relevant features.
• Finally, Bi-LSTM is used to categorize the seven emotions of women based on the extracted features.
The remainder of this study is organized as follows. Section 2 contains a detailed description of the publicly available DEAP dataset, Section 3 presents the proposed CDB-LSTM network for classifying the EEG signals into seven classes, Section 4 covers the results and discussion, and Section 5 closes with the conclusion and future work.
Dataset description
The DEAP dataset is a well-known dataset in the field of EEG-based emotion identification. The mental states of 32 people were monitored while they viewed music videos [24]. Every subject was recorded over 40 trials using 40 channels at a sampling rate of 512 Hz, which was then downsampled to 128 Hz. Each trial used a different video and carries four labels: arousal, valence, dominance, and like/dislike. A self-assessment questionnaire [24] was used to measure valence, dominance, arousal, and like/dislike. These dimensions were mapped to seven different emotions associated with the DEAP recordings: happy, sad, angry, stress, fear, anxious, and relax.
Proposed methodology
In this section, we propose a Convolutional deep belief network with Bi-directional LSTM (CDB-LSTM) for classifying the emotions of women. The schematic representation of the proposed method is displayed in Fig. 1. The Smoothed Pseudo-Wigner-Ville Distribution (SPWVD) is used to convert the pre-processed EEG signals into images. These TFR images are fed as input to the Convolutional Deep Belief Network (CDBN) for feature extraction. The pretrained Bi-directional LSTM then categorizes the images into the various emotions based on the extracted relevant features.

The overall workflow of the proposed model.
The EEG signals recorded from women can be contaminated by unwanted noise due to eye movement, electrode interference, power-line interference, cortical potentials, and the electrocardiogram. This noise degrades the signal-to-noise ratio (SNR) and reduces the efficiency of a classification model. The raw EEG signals for the different emotions, labeled happy (E1), sad (E2), angry (E3), stress (E4), fear (E5), anxiety (E6), and relax (E7), are shown in Fig. 2(a). They are pre-processed using various techniques to reduce the irrelevant distortions, as shown in Fig. 2(b). In the first module, the EEG signals are divided into frames via ZTFW to better analyze their non-stationary characteristics. Each frame is then passed to the discrete wavelet transform (DWT) to remove the extrinsic signal artifacts.

(a). Original EEG signals.

(b). Pre-processed EEG signals.
These processed, non-stationary signals are then decomposed into multiple components using empirical mode decomposition (EMD). Finally, the FEF is used to remove the intrinsic noise artifacts of brain waves from the EEG signals.
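The framing and wavelet-denoising steps can be illustrated with a minimal sketch. It assumes fixed-length, non-overlapping frames, a one-level Haar wavelet, and soft thresholding of the detail coefficients; the wavelet, threshold, and frame length used in this work are not stated, so `split_frames`, `denoise_frame`, and their parameters are hypothetical names chosen for illustration only.

```python
import math

def split_frames(signal, frame_len):
    """Cut a signal into non-overlapping frames (a trailing remainder is dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def haar_dwt(frame):
    """One-level Haar DWT of an even-length frame: (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(frame) - 1, 2)
    approx = [(frame[i] + frame[i + 1]) * s for i in pairs]
    detail = [(frame[i] - frame[i + 1]) * s for i in pairs]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out

def soft_threshold(coeffs, thr):
    """Shrink each coefficient toward zero by thr (assumed denoising rule)."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]

def denoise_frame(frame, thr):
    """Threshold the detail band only, then reconstruct the frame."""
    approx, detail = haar_dwt(frame)
    return haar_idwt(approx, soft_threshold(detail, thr))
```

With a threshold of zero the frame is reconstructed exactly; a large threshold removes the detail band and leaves pairwise averages, which is the smoothing effect the denoising step relies on.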
Wigner-Ville distributions produce cross-terms and attenuation at low frequencies. To overcome these challenges, the SPWVD is used to convert the filtered time-domain EEG signals into a TFR. To achieve maximum resolution, independent time and frequency smoothing is applied in the SPWVD. STFT and CWT have issues with their time-frequency resolution, whereas the SPWVD has good resolution. The time-domain signals are converted into time-frequency representations in order to track their behavior in the spectral domain. A TFR is the simultaneous representation of amplitude, time, and frequency. A sliding window in both the frequency and time domains is applied in the SPWVD to improve frequency resolution while performing the quadratic time-frequency transform. The SPWVD-based time-frequency representation of the fear EEG signal (E5) is shown in Fig. 3. It gives a discriminative representation of the fear state based on time-frequency-energy analysis, and approximately 10000 amplitudes are present in the fear state.

SPWVD-based TFR of fear EEG signal corresponding to the filtered fear signal.
The length and type of the cross-term reduction window can be selected independently in the frequency and time domains. As a result, the SPWVD has better time-frequency clustering properties. The SPWVD is mathematically formulated as follows:
where x(u) and y(u) are the cross-term-reducing windows in the time and frequency domains. The smoothing scales in the frequency and time domains can be adjusted easily, and the lengths of the windows x(u) and y(u) can be selected independently. A visual examination of Figs. 2 and 3 suggests that the converted EEG signals offer more valuable information than the filtered time-domain EEG signals.
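For reference, a commonly used textbook form of the SPWVD is given below. It is offered as an assumption about the intended formula, not as a reproduction of it: s denotes the analyzed EEG signal (a symbol introduced here), x is taken to be the time-smoothing window, and y the frequency-smoothing window.

```latex
\mathrm{SPWVD}_s(t, f) =
  \int_{-\infty}^{\infty} y(\tau)
  \left[ \int_{-\infty}^{\infty} x(u - t)\,
         s\!\left(u + \tfrac{\tau}{2}\right)
         s^{*}\!\left(u - \tfrac{\tau}{2}\right) du \right]
  e^{-j 2 \pi f \tau}\, d\tau
```

In this form, lengthening x smooths more strongly along time and shortening y smooths more strongly along frequency, which is why the two windows can be tuned independently.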
The proposed CDB-LSTM network is described in this sub-section. The Convolutional Deep Belief Network is used to extract the relevant features. Based on these features, a Bi-directional LSTM is employed to classify the emotions of women into seven different classes. The architecture of the proposed CDB-LSTM network is depicted in Fig. 4.

Architecture of proposed CDB-LSTM Network.
The Convolutional Deep Belief Network (CDBN) is an improved version of the DBN, and the two share many features, such as the technique of stacking the building blocks and the approach to training the network. The primary difference between them lies in the building blocks themselves: CDBNs are constructed from convolutional restricted Boltzmann machines (CRBMs). The CRBM was devised to address the issue of scaling such methods to realistic-sized images.
The main concept is weight sharing, in which the weights between the hidden and visible layers are shared across all locations in the image, greatly reducing the number of parameters and making the representations invariant to minor translations. The CRBM is composed of two layers: a visible layer L_v and a hidden layer L_h. Each channel of the visible layer is composed of N_Lv × N_Lv real-valued units. The hidden layer contains G groups, each of which comprises N_Lh × N_Lh hidden units. Furthermore, the probabilistic max-pooling process lessens the computational load while still enabling full probabilistic inference. The CRBM thus combines max-pooling characteristics with probability-based inference. The probabilistic max-pooling energy function is defined as follows,
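Probabilistic max-pooling can be illustrated for a single pooling block. The sketch below follows the softmax-style rule commonly given for CDBNs, in which at most one hidden unit per block may be active; this is an assumption about the formulation used here, and `prob_max_pool` and its argument are hypothetical names.

```python
import math

def prob_max_pool(block_inputs):
    """Probabilistic max-pooling over one block of hidden units.

    block_inputs holds the bottom-up input to each hidden unit in the block.
    Returns (p_on, p_off): p_on[i] is the probability that unit i is the single
    active unit, and p_off is the probability that the whole block is inactive.
    """
    shift = max(max(block_inputs), 0.0)       # subtract the maximum for stability
    exps = [math.exp(v - shift) for v in block_inputs]
    off = math.exp(-shift)                    # weight of the all-off state
    z = off + sum(exps)
    return [e / z for e in exps], off / z

p_on, p_off = prob_max_pool([0.0, 0.0, 0.0, 0.0])
```

The unit probabilities and the "off" probability always sum to one, so the pooling unit can be sampled as a single multinomial draw per block.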
In this phase, the Bi-LSTM model is trained to classify the emotions based on the best features extracted by the CDBN. LSTM is an RNN model capable of building large-scale neural network structures. Unlike a plain RNN, the LSTM avoids gradient problems by using its memory efficiently. A Bi-directional LSTM differs from a regular LSTM in that its input flows in both the forward and backward directions. In a regular LSTM, information propagates forward only, so the state can be determined only from previously processed input. Bi-LSTM, on the other hand, takes both past and future data into account, allowing it to handle contextual data efficiently.
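The bidirectional idea can be sketched independently of the cell type. In the example below, `toy_step` is a hypothetical stand-in for an LSTM cell (a running sum) used only to make the data flow visible; a real Bi-LSTM would use separate learned cells for the two directions.

```python
def run_recurrent(step, sequence, state=0.0):
    """Apply a recurrent step function along a sequence and collect the states."""
    outputs = []
    for x in sequence:
        state = step(state, x)
        outputs.append(state)
    return outputs

def bidirectional(step_fwd, step_bwd, sequence):
    """Pair each time step's forward state with its backward state."""
    fwd = run_recurrent(step_fwd, sequence)
    # Process the reversed sequence, then flip back to align time steps.
    bwd = run_recurrent(step_bwd, sequence[::-1])[::-1]
    return list(zip(fwd, bwd))

def toy_step(state, x):
    """Hypothetical stand-in for an LSTM cell: accumulates the input."""
    return state + x

states = bidirectional(toy_step, toy_step, [1.0, 2.0, 3.0])
```

At each time step the forward entry summarizes the past and the backward entry summarizes the future, which is the context a unidirectional LSTM lacks.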
There are four main parts to the LSTM classifier: the input gate, output gate, memory cell, and forget gate, as shown in Fig. 5. The input data is kept in the memory cell for a short or a long period. The input gate manages the volume of data entering the cell, whereas the forget gate controls data retention in the LSTM cell. The output gate controls the information in the cell from which the output activation is computed. The relationship between the input, hidden states, and the different gates can be derived from Equations (4–8).

Basic structure of LSTM.
where ig_t, og_t, fg_t, and c_t represent the input gate, output gate, forget gate, and the cell at time t.
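A single LSTM time step can be sketched with the gate names used above. The sketch uses the standard LSTM update in scalar form, which is assumed to correspond to the gate equations referenced in the text; the parameter dictionary `p`, the candidate name `cand`, and all weight values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One scalar LSTM step; p maps a gate name to (w_x, w_h, b)."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x_t + w_h * h_prev + b

    ig_t = sigmoid(pre("ig"))            # input gate: how much new content enters
    fg_t = sigmoid(pre("fg"))            # forget gate: how much old cell state is kept
    og_t = sigmoid(pre("og"))            # output gate: how much of the cell is exposed
    cand = math.tanh(pre("cand"))        # candidate cell content
    c_t = fg_t * c_prev + ig_t * cand    # updated memory cell
    h_t = og_t * math.tanh(c_t)          # hidden state passed to the next step
    return h_t, c_t

zero = {name: (0.0, 0.0, 0.0) for name in ("ig", "fg", "og", "cand")}
h_t, c_t = lstm_step(x_t=1.0, h_prev=0.0, c_prev=2.0, p=zero)
```

With all parameters at zero every gate evaluates to 0.5 and the candidate to 0, so the cell state is simply halved, which makes the role of the forget gate easy to check by hand.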
The primary function of the global max pooling layer is to extract the most salient features from the given temporal data. A Flatten layer is typically used to transform the final feature maps into a one-dimensional array, which is then used as the input of the fully connected (FC) layer. The FC layer is a feed-forward neural network in which all neurons of adjacent layers are interconnected. By multiplying the inputs by the weight matrices and adding the bias vectors, the output of the FC layer is determined as,
where x denotes the input of the fully connected layer and O(x) denotes the output of the network. The softmax layer translates the values into probabilities, and the classification layer classifies the women’s emotions into seven different classes.
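The final stage can be sketched as an affine map followed by a softmax over the seven classes. This assumes the usual form O(x) = Wx + b for the FC layer; the weights, bias, and the helper names below are illustrative and not taken from the trained network.

```python
import math

EMOTIONS = ["happy", "relax", "sad", "fear", "anxiety", "anger", "stress"]

def fully_connected(x, weights, bias):
    """O(x) = W x + b, with one weight row and one bias per output class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    """Convert raw scores into probabilities that sum to one."""
    m = max(logits)                      # subtract the maximum for stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(x, weights, bias):
    """Return the most probable emotion and the full probability vector."""
    probs = softmax(fully_connected(x, weights, bias))
    return EMOTIONS[probs.index(max(probs))], probs

# Hypothetical 2-feature input and weights, for illustration only.
weights = [[float(i), 0.0] for i in range(7)]
label, probs = classify([1.0, 0.0], weights, [0.0] * 7)
```

The classification layer then reduces to picking the class with the largest softmax probability.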
The learning rate and batch size are fixed at 0.001 and 50, respectively. The number of epochs is fixed at 30, and a total of 3000 iterations are performed with 100 iterations per epoch. The hyperparameter settings of the proposed model are listed in Table 1. With EEG signals as the input data, these parameters are chosen to improve the efficiency of the proposed network.
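The training schedule can be checked with a few lines of arithmetic. The configuration keys below are hypothetical names; the implied number of samples per epoch additionally assumes that every iteration consumes one full batch and that each sample is seen once per epoch.

```python
# Values quoted in the text; the key names are illustrative.
CONFIG = {
    "learning_rate": 0.001,
    "batch_size": 50,
    "epochs": 30,
    "iterations_per_epoch": 100,
}

def total_iterations(cfg):
    """Total optimizer steps over the whole run."""
    return cfg["epochs"] * cfg["iterations_per_epoch"]

def samples_per_epoch(cfg):
    """Samples seen per epoch, assuming full batches (an assumption)."""
    return cfg["batch_size"] * cfg["iterations_per_epoch"]

print(total_iterations(CONFIG), samples_per_epoch(CONFIG))
```

The product of epochs and iterations per epoch reproduces the stated 3000 iterations.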
Hyperparameter settings of the proposed network
In this section, the proposed CDB-LSTM approach is assessed on the DEAP dataset using accuracy, specificity, and sensitivity. The experiments were implemented in Python. The primary benchmark metric is the overall accuracy, which is used to assess the performance of the proposed CDB-LSTM network. Furthermore, a comparison of the proposed CDB-LSTM network with traditional machine learning and deep learning models is also presented in this section.
Performance analysis
The efficacy of the proposed CDB-LSTM network was evaluated using three metrics: specificity, sensitivity, and accuracy.
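The three metrics have standard definitions in terms of true/false positives and negatives. The sketch below assumes a one-vs-rest reduction of the seven-class confusion matrix, which is a common convention rather than something stated here; the function names and the small example matrix are hypothetical.

```python
def one_vs_rest_counts(confusion, k):
    """Counts for class k, where confusion[i][j] = true class i predicted as j."""
    n = len(confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp
    fp = sum(confusion[i][k] for i in range(n)) - tp
    tn = sum(map(sum, confusion)) - tp - fn - fp
    return tp, tn, fp, fn

def binary_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity (recall) and specificity from the four counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Illustrative 2-class confusion matrix, not data from this study.
metrics = binary_metrics(*one_vs_rest_counts([[8, 2], [1, 9]], k=0))
```

Applying `one_vs_rest_counts` to each of the seven classes in turn gives per-class sensitivity and specificity.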
Performance metrics of the proposed methodology

Performance analysis for 7-class emotion classification.
The performance of the proposed CDB-LSTM network on the DEAP dataset is shown in Table 1. In this analysis, the DEAP dataset was used for classifying the seven different emotions from the EEG signals, with 70% of the data used for training and 30% for testing. The validation data was taken from the training data to tune the parameters of the network. The proposed CDB-LSTM network achieved high accuracy in both the validation and training phases, as shown in Fig. 7 for epochs 0 to 30. Similarly, the validation and training loss curves are depicted in Fig. 8 for the same number of epochs. The accuracy attained by the proposed model is 97.27%, with a loss value of 2.73%, after 30 epochs.
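The data split can be sketched as follows. The 70/30 ratio comes from the text; the size of the validation subset, the shuffling seed, and the use of 1280 samples (32 subjects × 40 trials) as the unit of splitting are assumptions made for illustration.

```python
import random

def split_indices(n_samples, train_pct=70, val_pct_of_train=10, seed=0):
    """Shuffle sample indices and return (train, validation, test) index lists.

    The validation set is carved out of the training portion; its 10% share
    is a hypothetical choice, since the fraction is not stated.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = n_samples * train_pct // 100
    train_all, test = idx[:n_train], idx[n_train:]
    n_val = len(train_all) * val_pct_of_train // 100
    return train_all[n_val:], train_all[:n_val], test

train, val, test = split_indices(1280)
```

Keeping the test indices untouched while tuning on the validation subset is what makes the reported test accuracy an honest estimate.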

Validation and Training accuracy of the proposed CDB-LSTM network.

Validation and Training loss of the proposed CDB-LSTM network.
The ROC curves for the seven classes (happy, sad, relax, fear, anxiety, angry, and stress) are obtained from the false positive rate (FPR) and true positive rate (TPR) and are depicted in Fig. 9. The proposed method attained high AUC values of 0.996, 0.977, 0.976, 0.984, 0.958, 0.954, and 0.964 for these seven classes, respectively.
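The per-class AUC values can be summarized with a macro average, and an AUC can be computed from ROC points with the trapezoidal rule. The sketch below uses the seven AUC values quoted above; the averaging choice and the function names are assumptions, since no aggregate AUC is reported.

```python
# Per-class AUC values as quoted in the text.
AUC = {
    "happy": 0.996, "sad": 0.977, "relax": 0.976, "fear": 0.984,
    "anxiety": 0.958, "angry": 0.954, "stress": 0.964,
}

def macro_average(values):
    """Unweighted mean over classes."""
    values = list(values)
    return sum(values) / len(values)

def trapezoid_auc(fpr, tpr):
    """Area under ROC points that are sorted by increasing FPR."""
    return sum((fpr[i + 1] - fpr[i]) * (tpr[i + 1] + tpr[i]) / 2.0
               for i in range(len(fpr) - 1))

macro_auc = macro_average(AUC.values())
```

A chance-level classifier traces the diagonal (area 0.5), while a perfect one reaches TPR = 1 at FPR = 0 (area 1.0), which brackets the values reported for the seven classes.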

ROC of the proposed CDB-LSTM network.
In this section, a comparative evaluation is made between the proposed network and classic deep learning and machine learning models. Table 2 illustrates the comparison of the overall performance of machine learning models with the proposed technique. In Table 3, traditional ML classifiers such as the support vector machine (SVM), random forest (RF), decision tree (DT), Naive Bayes (NB), and k-nearest neighbor (kNN) are compared with LSTM. The hyperparameters of each ML classifier differ for emotion classification. For the SVM algorithm, the two most important hyperparameters are C and gamma, which are set to 0.031 and 3.051, respectively. Polynomial and RBF kernels are used for the SVM, which attains an accuracy of 88.2%. RF and DT use a max_depth of 14 and 25 and obtain accuracies of 87.6% and 85.9%, respectively. The kNN yields an accuracy of 93.2% with n_neighbors set to 5 and leaf_size set to 30. The Naive Bayes classifier, which works on the prior probability of the classes and the per-class probability of the attributes, reaches an accuracy of 85%. The LSTM model, tuned with a batch size of 128, a dropout of 0.2, and a softmax activation function, achieves a high accuracy of 95.4%. From the above comparison, the LSTM attains better performance than the traditional ML classifiers in terms of accuracy, sensitivity, and specificity. Figure 10 shows that the accuracies obtained by SVM, RF, DT, and NB are 88.21%, 87.65%, 85.98%, and 85.02%, respectively.
Comparison of traditional machine learning models for classification
Comparison of traditional machine learning models for classification

Graphical comparison of ML classifiers with the proposed model.
Table 4 illustrates the comparison of the performance of DL networks with the proposed approach based on specificity, accuracy, and sensitivity. As Table 4 shows, the traditional deep learning networks, viz. CNN and LSTM, obtain lower accuracy than the proposed approach.
Comparison of traditional deep learning models for classification
The proposed Bi-LSTM achieves a high accuracy of 97.27%. Figure 11 shows that the accuracies obtained by CNN and LSTM are 93.15% and 95.44%, which are lower than that of the Bi-LSTM. The accuracy obtained by the proposed Bi-LSTM model is therefore higher than that of the existing models.

Graphical comparison of DL networks with the proposed model.
Similarly, the efficiency of the proposed approach was evaluated using quantitative measures such as computational complexity and time. Figure 12 shows that the Bi-LSTM is more suitable for emotion classification because it has low computational complexity and minimal classification time. In contrast, traditional networks such as CNN and LSTM are more time-consuming and have higher computational complexity, with LSTM performing better than CNN. CNN extracts high-level features with convolutional and max-pooling layers, which requires more time, whereas LSTM can capture long-term dependencies in EEG signals and is therefore more suitable for emotion recognition. From this evaluation, Bi-LSTM attains a high level of accuracy with low computational complexity.

Comparison of DL networks with the proposed model based on a quantitative measure.
Performance comparison of CDB-LSTM network with the latest emotion recognition models
Table 5 presents the accuracy of the proposed approach on various datasets, namely DEAP, GAMEEMO, and DREAMER. Because the DREAMER dataset contains both EEG and ECG signals, the proposed CDB-LSTM network attains low accuracy on it, as it is not trained on ECG signals. On the GAMEEMO dataset, the proposed model yields better accuracy than on the DREAMER dataset. From this comparison, the proposed CDB-LSTM network achieves a high level of accuracy on the DEAP dataset, but it does not achieve good results on the DREAMER and GAMEEMO datasets.
Accuracy comparison of CDB-LSTM network with different datasets
The comparison of the CDB-LSTM network with the latest emotion recognition models is given in Table 6. The accuracy of the proposed model was evaluated on the DEAP dataset for emotion recognition and reached a high value. As listed in Table 6, the CNN-LSTM model [2] classifies emotions into five categories (happy, sad, fear, anger, and disgust) with an accuracy of 91.24%. The Multi-domain feature fusion model [10] classifies emotions into nine categories (happy, relaxed, pleased, neutral, excited, calm, depressed, miserable, and distressed) with an accuracy of 65.19%, which is low compared to the other models. The GCNN-LSTM model [14] classifies emotions into four categories (anger, sadness, fear, and disgust) with an accuracy of 90.60%. The deep CNN with SVM and DT [23] classifies emotions into six categories (joy, sadness, surprise, anger, disgust, and fear) with an accuracy of 94.06%.
Compared with the proposed CDB-LSTM network, the existing emotion recognition systems perform less efficiently. The proposed CDB-LSTM network outperforms them, attaining a high classification accuracy of 97.27% on this dataset. It improves the overall accuracy by 6.20%, 32.98%, 6.85%, and 3.30% over CNN-LSTM, Multi-domain feature fusion, GCNN-LSTM, and CNN with SVM and DT, respectively. Thus, the CDB-LSTM network performs better than the other techniques. In the future, a ten-class classification framework will be trained with different datasets to accurately classify emotions.
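The quoted improvements can be related to the accuracies listed above with a short calculation. The sketch assumes the gains are accuracy differences expressed as a percentage of the proposed model's accuracy; this normalization is inferred from the numbers and is not stated explicitly.

```python
PROPOSED = 97.27  # accuracy of the CDB-LSTM network (%)
BASELINES = {     # accuracies of the compared models (%), as quoted in the text
    "CNN-LSTM": 91.24,
    "Multi-domain feature fusion": 65.19,
    "GCNN-LSTM": 90.60,
    "CNN with SVM and DT": 94.06,
}

def relative_gain(proposed, baseline):
    """Accuracy difference as a percentage of the proposed accuracy (assumed)."""
    return 100.0 * (proposed - baseline) / proposed

gains = {name: round(relative_gain(PROPOSED, acc), 2)
         for name, acc in BASELINES.items()}
```

Under this assumption the calculation gives values within 0.01 of the four quoted figures; the GCNN-LSTM gain rounds to 6.86 rather than 6.85. The plain percentage-point differences are smaller (for example, 97.27 − 91.24 = 6.03).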
The proposed CDB-LSTM network focuses on classifying EEG signals into seven classes more accurately. Initially, the preprocessed EEG signals described in the first module are converted into images via the time-frequency representation (TFR); the time-domain EEG signals are converted into input images by the SPWVD. These converted images are then given to the Convolutional Deep Belief Network (CDBN) as input for extracting the most relevant features. Finally, a Bi-directional LSTM is used for classifying women’s emotions into seven classes, namely happy, relax, sad, fear, anxiety, anger, and stress. In terms of precision, the proposed model outperforms pretrained networks while requiring far fewer learnable parameters. A comparison of the results reveals that the proposed CDB-LSTM network outperforms the state-of-the-art frameworks. This method can be applied to the creation of a human-computer interface based on EEG data. The proposed CDB-LSTM network achieves a high accuracy of 97.27%. It improves the overall accuracy by 6.20%, 32.98%, 6.85%, and 3.30% over the CNN-LSTM, Multi-domain feature fusion, GCNN-LSTM, and CNN with SVM and DT models, respectively. In the future, optimal window sizes and selections can be used to transform EEG signals into images. Hyperparameter optimization can be used to improve the system efficiency, and further classifiers will be trained to classify emotions accurately and improve the security of women.
