Abstract
This paper proposes a deep learning model for multichannel electroencephalogram (EEG) emotion recognition, the Spectral Graph Convolution Recurrent Neural Network, which is built on Chebyshev Network Gated Recurrent Units. First, an adjacency matrix capturing the local relationships among EEG channels is constructed from the cosine similarity of the spatial locations of the EEG electrodes. Because cosine distance is fast to compute, training efficiency improves, which gives the method potential for fast and accurate emotion classification in real-time application scenarios. Second, the model combines the Chebyshev network and Gated Recurrent Units to capture the spatial and temporal dependence of EEG sequences and to extract their spatial and temporal features. The proposed model was tested on the publicly accessible DEAP dataset. Its average recognition accuracy is 88%, 89.5%, and 89.7% for valence, arousal, and dominance, respectively. The experimental results show that the Spectral Graph Convolution Recurrent Neural Network performed better than the compared models for EEG emotion recognition. The model is broadly applicable and holds potential for use in real-time emotion recognition scenarios.
Keywords
Introduction
Emotion recognition has become a research focus in the fields of psychology, neuroscience, and medicine [1]. In order to accurately capture and interpret human emotions, researchers have adopted various measurement methods, primarily including audiovisual techniques and physiological techniques [2]. Audiovisual techniques rely on external expressions such as facial expressions, language, and gestures, which are prone to overlooking subtle emotions and are influenced by human control and deception. In contrast, physiological techniques based on electroencephalography (EEG) provide a more reliable and objective approach to emotion recognition [3]. As a result, there has been increasing attention to emotion recognition based on EEG signals.
Mehrabian expanded the emotion model from two dimensions to three [4]. The three-dimensional model adds dominance to the valence-arousal (V-A) model initially proposed by Russell [5]. It describes an individual's emotional state along three dimensions: valence (i.e., unpleasant/pleasant), arousal (i.e., calm/excited), and dominance (i.e., uncontrollable/controllable). This study employs the three-dimensional emotion model to evaluate the classification performance of the system because it provides a more comprehensive representation of emotions and supports richer emotion analysis.
Deep learning methods have been widely applied in EEG-based emotion recognition research. Because EEG signals are temporal, some researchers use recurrent neural networks to capture their temporal dependencies and better explore the temporal correlations within the signals. Chowdary, MK et al. used three architectures, Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU), to identify emotions from EEG signals. They concluded that RNN improved the recognition results compared with traditional classification methods [6].
However, these methods focus solely on temporal features while neglecting the spatial dimension. Xu, GX et al. introduced a hybrid GRU and CNN deep learning framework called GRU-Conv to extract critical spatial and temporal features from EEG data, with an average accuracy of 70.07% for valence on the DEAP dataset [7]. In reality, EEG channels are not distributed on a regular grid but are irregularly connected, which limits the ability of CNN to capture structural information from the electrodes.
To overcome this issue, researchers have proposed methods to construct complex brain networks, where electrodes are abstracted as nodes and their connections are abstracted as edges. Graph Neural Network (GNN) can be utilized to learn from this type of graph-structured data. Zhu et al. [8] and Yin et al. [9] employed distance-based approaches using GCN to explore the relationships among EEG channels. However, compared to computationally intensive Euclidean distance, cosine similarity is more suitable for describing the directionality and correlation between channels. Demir et al. [10] proposed the EEG-GNN algorithm, which utilizes one-dimensional convolution in the temporal dimension to calculate Pearson correlation coefficients and employs them as functional connection weights, achieving superior classification performance compared to CNN architectures. However, the Pearson correlation coefficient is primarily suitable for measuring linear correlations and may not accurately capture the similarity of signals with non-linear relationships. To overcome this issue, this study employs cosine similarity to construct the adjacency matrix of EEG signals. Cosine similarity does not rely on the assumption of linear correlation in signals, resulting in higher computational efficiency as it only involves inner product operations between vectors. This approach is applicable to real-time systems and enables better capture of the similarity between signals.
To overcome the limitations of existing techniques, this paper proposes a deep learning model called Spectral Graph Convolutional Recurrent Neural Network (SGCRNN) based on ChebNet and GRU, specifically designed for emotion recognition from multi-channel EEG data. The SGCRNN model effectively extracts the spatiotemporal features of EEG signals and captures the local relationships between channels. In this study, ChebNet approximates graph convolutional operations using Chebyshev polynomials, enabling the effective capture of local relationships among nodes in the graph data. Modeling these local relationships is crucial for capturing channel-to-channel correlations in EEG signals and facilitates the extraction of more accurate feature representations. The main contributions of this paper are as follows:
(1) By utilizing the cosine similarity of the spatial positions of the EEG electrodes, it is possible to capture the local relationships between EEG channels more accurately. This method takes advantage of the computational efficiency of cosine distance, effectively reducing training time and enhancing its practicality and value in real-time monitoring applications.
(2) The use of ChebNet as a replacement for matrix multiplication in the GRU network, along with the utilization of Chebyshev polynomials for local approximation, avoids explicit matrix multiplication operations. This approach benefits the model's complexity and computational efficiency.
(3) This method combines the advantages of the Chebyshev network for extracting spatial features of EEG sequences with those of GRU for extracting temporal features of EEG sequences. SGCRNN solves the problem of the weak spatial feature extraction ability of RNN, achieving full capture of the spatial and temporal dependence of EEG sequences.
(4) This study proposes preprocessing methods such as time slicing and data augmentation and demonstrates their effectiveness through ablation experiments.
By comparing with other models, the results show that the SGCRNN method achieves superior emotion recognition performance in the three-dimensional emotion model. These experiments validate the effectiveness and superiority of the novel method proposed in this paper.
Related work
Feature extraction
In emotion recognition research, commonly used representative EEG features are shown in Table 1. Despite the existence of various manually extractable EEG features, these traditional handcrafted features are based on a significant accumulation of domain knowledge, thus increasing the learning cost for researchers. Furthermore, most of the current neural signal features are still based on traditional time-series signal analysis theories and methods. However, the correlation between these signal features and emotional states remains unclear and requires further exploration, with certain limitations in their effectiveness.
Common methods for EEG feature extraction
High-level cognitive functions rely on subtle coordination between local and global brain activities, which are closely related to the network of neurons and brain regions [11]. There is inherent correlation in the brain electrical signals originating from different brain regions, making the study of brain networks a topic of extensive interest [12]. J. Jia et al. proposed a method that combines the distance and functional connectivity between EEG channels to construct a graph network for emotion classification in a two-dimensional emotion model [13]. However, calculating Pearson correlation coefficients and Euclidean distances is relatively complex, requiring consideration of multiple factors such as means and standard deviations. In contrast, cosine similarity calculations are relatively simple and efficient, involving only inner products between vectors. This makes cosine similarity advantageous for processing large-scale data and real-time systems.
The brain network constructed using cosine similarity reflects the coupling correlation between two EEG channels, making it insensitive to amplitude changes. This characteristic reduces the impact of inter-individual differences and helps establish robust and accurate EEG-based recognition models. Considering these factors, this study chooses the method of constructing brain networks using cosine similarity for extracting emotional features from EEG signals.
Hand-engineered approaches have certain limitations in the analysis of EEG signals for emotion recognition. Firstly, they rely on domain knowledge, which restricts their generalizability and applicability. Secondly, these methods often focus only on local feature extraction and fail to capture the global dynamics and spatiotemporal relationships of EEG signals comprehensively. Moreover, handcrafted methods have limited expressive power, which may result in the loss of important information and affect the accuracy of emotion classification. They are also highly dependent on specific tasks and datasets, making them less applicable to new tasks and datasets. Lastly, manual operations and subjectivity can lead to uncertainties and irreproducibility. To overcome these limitations, exploring methods based on machine learning and deep learning can automatically learn relevant features and patterns in EEG signals, thereby improving the accuracy and generalization capability of emotion analysis.
EEG signals are a type of sequential data, and the memory units and temporal feedback connections in RNN enable it to effectively handle the temporal characteristics of the signals. Moreover, the emotional information in EEG signals may be influenced by long-term dependencies, and RNN can capture such dependencies and model the emotional features more effectively. Therefore, many researchers choose to apply RNN in the study of EEG-based emotion recognition. J. X. CHEN et al. proposed a hierarchical bidirectional recurrent unit with an attention GRU network for human emotion classification from continuous EEG signals. The model showed a more robust classification performance than the baseline LSTM model [14]. However, in EEG signals, the arrangement of electrodes forms spatial relationships, which would be overlooked if only RNN is used for analysis. To fully leverage the spatial information in EEG signals, we can introduce CNN, which can effectively capture the local spatial features in EEG signals.
Through CNN, we can extract local spatial features from EEG signals, such as the correlation between electrodes and the topological structure. This helps to analyze the emotional content of EEG signals more accurately. S Tripathi et al. investigated two neural network models, a simple Deep Neural Network (DNN) and a CNN, to categorize user emotions from EEG signals. They showed that neural networks can classify brain signals effectively and outperform traditional methods [15]. Yang et al. proposed a multicolumn CNN algorithm that classifies emotions based on EEG signals from the DEAP database [16]. Liao et al. extracted statistical features of EEG and fed them to a CNN, reaching an accuracy of 81.4% for binary classification of valence [17]. Salama et al. used a 3D-CNN deep learning architecture to extract spatiotemporal features from EEG signals and proposed a combination of data augmentation and ensemble learning techniques to obtain the final fusion prediction [18]. Cui et al. proposed an emotion recognition method based on two-dimensional and three-dimensional convolutional neural networks, called ResNeXt Attention 2D-3D Convolutional Neural Networks (RA2-3DCNN). The results demonstrated the spatio-temporal effectiveness of the method for emotion classification [19]. Iyer et al. developed a hybrid model based on a combination of CNN and LSTM for precise emotion detection. The results indicate that the integration of CNN and LSTM outperforms the use of a single CNN in feature extraction [20]. Kim et al. proposed integrating a CNN with an RNN with skip connections, creating a superior predictive model based on time-series data. The results indicate the remarkable efficiency of GRU compared to LSTM [21]. However, CNN is primarily designed to handle regular grid-structured data and faces challenges in directly processing the connectivity relationships within EEG signals.
EEG signals exhibit complex connections between electrodes, forming a brain network. In contrast, GCN can effectively capture the inter-electrode connectivity relationships in EEG signals and utilize graph structures for information propagation.
An increasing number of researchers are utilizing GCN [22] for EEG-based emotion recognition tasks. P Zhong et al. proposed a regularized graph neural network (RGNN) for EEG-based emotion recognition, which considered the biotopology among different brain regions and modeled the inter-channel relationships in EEG signals by the adjacency matrix in the graph neural network [23]. T Song et al. proposed a novel dynamic graph convolutional neural network (DGCNN)-based method for multi-channel EEG emotion recognition, which can be trained to dynamically learn the intrinsic relationships among different EEG channels, thus facilitating EEG feature extraction [24].
In summary, this paper proposes an approach that combines ChebNet with GRU. The method uses ChebNet to replace the matrix multiplication in GRU, which reduces computational complexity and speeds up training and inference. The combined approach improves computational efficiency while retaining the advantages of GRU in sequence modeling, enabling the model to better handle the temporal relationships in EEG signals. The method therefore has potential advantages in tasks such as EEG signal processing and emotion recognition.
Graph-based EEG modeling
This section presents a method of feature extraction using brain networks, specifically by constructing brain networks based on the cosine similarity of electrode spatial positions in EEG data. Additionally, we provide a detailed explanation of the principles behind spectral graph convolution and GRU, which form the foundational components of the SGCRNN method.
Brain network construction
In mathematical terms, a graph can be written as $G = (V, E, W)$, where $V$ is the set of nodes, $E$ is the set of edges, and $W$ is the adjacency matrix.
Functional connectivity, distance-based methods, and neural networks can all be employed to determine the value of $W_{ij}$. This paper leverages the fast computation of cosine distance and constructs the adjacency matrix from the cosine distance between EEG electrodes, which captures the local relationships between EEG signals.
Due to the presence of noise in brain electrical signals, using traditional Euclidean distance to construct the adjacency matrix may result in a matrix that is too sparse, leading to poor feature extraction effectiveness. However, using cosine distance to construct the edge matrix can better capture the local relationships in the brain electrical signals and effectively reduce the impact of noise interference. Additionally, due to the fast calculation of cosine distance, using cosine distance to construct the adjacency matrix can effectively reduce the training time of the model and be suitable for real-time monitoring. To capture the local relationship among EEG electrodes, the adjacency matrix W is constructed using the cosine distance between the EEG electrode position vectors. The cosine distance between EEG electrodes is calculated as follows.
The equation for constructing the adjacency matrix W is as follows.
Based on preliminary experiments, κ = 1.5 is chosen to construct the adjacency matrix for all EEG fragments used in the experiment. The resulting universal undirected weighted graph is shown in Fig. 1.

The undirected weighted graph generated at κ = 1.5.
The brain network we have constructed reflects the coupling correlation between two EEG channels. As a result, the network is not highly sensitive to changes in amplitude. This characteristic helps reduce the impact of inter-individual differences on the results, thus facilitating the establishment of a robust and accurate EEG-based recognition model.
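The adjacency construction described above can be sketched in a few lines of NumPy. The exact weighting formula is not recoverable from the text, so this sketch assumes a thresholded Gaussian kernel over cosine distance. The kernel form, the width `sigma`, the function names, and the four electrode coordinates are illustrative assumptions; only the use of cosine distance between electrode position vectors and the threshold κ = 1.5 come from the paper.

```python
import numpy as np

def cosine_distance_matrix(positions: np.ndarray) -> np.ndarray:
    """Pairwise cosine distance (1 - cosine similarity) between row vectors."""
    norms = np.linalg.norm(positions, axis=1, keepdims=True)
    unit = positions / norms                         # unit-length position vectors
    similarity = np.clip(unit @ unit.T, -1.0, 1.0)   # inner products only
    return 1.0 - similarity

def build_adjacency(positions: np.ndarray, kappa: float = 1.5) -> np.ndarray:
    """Thresholded Gaussian-kernel adjacency (ASSUMED form, not the paper's formula)."""
    dist = cosine_distance_matrix(positions)
    sigma = dist.std()                               # assumed kernel width
    weights = np.exp(-(dist ** 2) / (sigma ** 2))
    weights[dist > kappa] = 0.0                      # drop edges beyond the threshold
    return weights

# Hypothetical 3-D coordinates for four electrodes (not DEAP's real montage).
positions = np.array([
    [0.3, 0.9, 0.3],
    [-0.3, 0.9, 0.3],
    [0.7, 0.0, 0.7],
    [-0.7, 0.0, 0.7],
])
W = build_adjacency(positions)   # symmetric (4, 4) matrix with ones on the diagonal
```

Because the electrode positions are fixed, a matrix built this way would be computed once and reused for every EEG fragment, which is consistent with the "universal" graph the text describes.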
Spectral graph convolution is a method for processing graph data with neural networks. It combines graph theory and neural networks by defining convolution on graphs through the graph Fourier transform, which is derived from the graph Laplacian matrix. Graph data are represented in the spectral domain, where the frequency content of the node signals is combined with the graph structure; convolution is then carried out on this spectral representation to extract graph features, and the graph Laplacian matrix is used to study the properties of the graph. The symmetric normalized Laplacian matrix of graph G is defined as follows.
For a given spatial signal x, the graph Fourier transform is defined through the eigendecomposition of the Laplacian matrix L.
The convolution operation for two signals x and y on the graph, denoted *G, is defined as
g (·) denotes a filter function, and the signal x filtered by g (L) can be expressed as
Since the eigendecomposition of L is time-consuming, a K-order Chebyshev polynomial is used to approximate the spectral-domain convolution kernel g (Λ), which reduces the parameter complexity. The derivation is as follows
Combined with equation (8), it can be converted as follows
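The derivation that this passage walks through follows the standard ChebNet formulation, which can be written compactly as below. This is a sketch under assumptions rather than a transcription of the paper's numbered equations: the symbols $D$ (degree matrix), $U$ and $\Lambda$ (eigenvectors and eigenvalues of $L$), $\theta_k$ (filter coefficients), and $\lambda_{\max}$ (largest eigenvalue of $L$), as well as the choice to run the sum from $k = 0$ to $K$, are assumed here.

```latex
\begin{align*}
L &= I_N - D^{-1/2} W D^{-1/2}, \qquad D_{ii} = \sum_{j} W_{ij} \\
L &= U \Lambda U^{\top}, \qquad \hat{x} = U^{\top} x \\
x *_{G} y &= U \left( (U^{\top} x) \odot (U^{\top} y) \right) \\
g(L)\, x &= U\, g(\Lambda)\, U^{\top} x \\
g(\Lambda) &\approx \sum_{k=0}^{K} \theta_k\, T_k(\tilde{\Lambda}), \qquad \tilde{\Lambda} = \frac{2}{\lambda_{\max}} \Lambda - I_N \\
g(L)\, x &\approx \sum_{k=0}^{K} \theta_k\, T_k(\tilde{L})\, x, \qquad \tilde{L} = \frac{2}{\lambda_{\max}} L - I_N
\end{align*}
```

Here $T_0(z) = 1$, $T_1(z) = z$, and $T_k(z) = 2 z\, T_{k-1}(z) - T_{k-2}(z)$. The last line needs only repeated products with $\tilde{L}$, so the eigendecomposition of $L$ is never formed explicitly.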
GRU is a variant of RNN whose gating mechanism helps capture dependencies across longer intervals in temporal data. Its inputs are the input $x_t$ at time $t$ and the hidden state $h_{t-1}$ from time $t-1$; its outputs are the hidden node output $y_t$ at time $t$ and the hidden state $h_t$ passed to the next node. The reset gate $r_t$ and the update gate $z_t$ are obtained from the previous hidden state $h_{t-1}$ and the current input $x_t$ as follows.
The candidate hidden layer states are
The final hidden state is
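For reference, the textbook GRU update that these three steps describe can be written as below. The weight and bias symbols ($W_r$, $W_z$, $W_h$, $b_r$, $b_z$, $b_h$) and the gate names $r_t$ and $z_t$ are assumptions. Some texts swap the roles of $z_t$ and $1 - z_t$ in the last line, so the direction of the interpolation is a convention rather than something the paper fixes.

```latex
\begin{align*}
r_t &= \sigma\left( W_r\, [x_t,\; h_{t-1}] + b_r \right) \\
z_t &= \sigma\left( W_z\, [x_t,\; h_{t-1}] + b_z \right) \\
\tilde{h}_t &= \tanh\left( W_h\, [x_t,\; r_t \odot h_{t-1}] + b_h \right) \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{align*}
```

Here $\sigma$ is the logistic sigmoid, $[\cdot,\cdot]$ denotes concatenation, and $\odot$ is the element-wise product.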
This section provides a detailed overview of the SGCRNN model for addressing EEG emotion recognition.
SGCRNN model
Inspired by DCRNN [27], this paper uses a recurrent neural network with spectral graph convolution as an EEG emotion feature extractor to model the spatial and temporal dependence of EEG signals. ChebNet [28] is employed in place of the matrix multiplication in GRU for spatial and temporal modeling of EEG signals; the resulting unit is referred to as CNGRU. CNGRU combines the strength of GRU in extracting temporal correlation with that of spectral graph convolution in extracting frequency-domain and spatial-domain features.
The internal computations of *G are shown in Fig. 2. The input $x_{concat}$ is the concatenation of the input $x_t$ at time $t$ and the hidden state $h_{t-1}$ at time $t-1$. The output is the result of the ChebNet operation.

Internal computation representation of “*G”.
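The "*G" operation can be illustrated with a small NumPy sketch of a Chebyshev graph convolution applied to the concatenated input. It is a minimal sketch under assumptions, not the authors' implementation: the function names, the ring-shaped toy graph, the feature sizes, and the random weights `theta` are hypothetical, and the graph is assumed to contain at least one edge so that the largest Laplacian eigenvalue is nonzero.

```python
import numpy as np

def scaled_laplacian(W):
    """Return 2 L / lambda_max - I, with L the symmetric normalized Laplacian."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    deg = W.sum(axis=1)
    d_inv_sqrt = np.zeros(n)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    lap = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(lap).max()   # assumes at least one edge
    return 2.0 * lap / lam_max - np.eye(n)

def cheb_graph_conv(x, W, theta):
    """Chebyshev graph convolution.

    x:     (num_nodes, in_dim) node features
    W:     (num_nodes, num_nodes) adjacency matrix
    theta: (K + 1, in_dim, out_dim) one weight matrix per polynomial order
    """
    L_tilde = scaled_laplacian(W)
    t_prev, t_curr = x, L_tilde @ x           # T_0(L~) x and T_1(L~) x
    out = t_prev @ theta[0]
    if theta.shape[0] > 1:
        out = out + t_curr @ theta[1]
    for k in range(2, theta.shape[0]):
        # Chebyshev recurrence: T_k = 2 L~ T_{k-1} - T_{k-2}
        t_prev, t_curr = t_curr, 2.0 * (L_tilde @ t_curr) - t_prev
        out = out + t_curr @ theta[k]
    return out

# Toy usage: 32 nodes on a ring (hypothetical graph), K = 2.
rng = np.random.default_rng(0)
num_nodes, in_feat, hidden = 32, 8, 64
ring = np.roll(np.eye(num_nodes), 1, axis=1)
W_ring = ring + ring.T
x_t = rng.standard_normal((num_nodes, in_feat))
h_prev = np.zeros((num_nodes, hidden))
x_concat = np.concatenate([x_t, h_prev], axis=1)          # (32, 72)
theta = 0.1 * rng.standard_normal((3, in_feat + hidden, hidden))
out = cheb_graph_conv(x_concat, W_ring, theta)            # (32, 64)
```

Each extra polynomial order lets a node aggregate information from neighbors one hop further away, which is how the operation captures local relationships among channels.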
The CNGRU network structure is shown in Fig. 3. According to Equations (15)–(18), they can be expressed as follows.

CNGRU network architecture diagram.
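Replacing each matrix multiplication of the GRU with the graph convolution "*G" gives a cell of the following form, written here in the style of DCRNN. This is a sketch under assumptions: the parameter symbols $\Theta_r$, $\Theta_z$, $\Theta_h$ and the biases are hypothetical names, and the direction of the final interpolation follows one common convention.

```latex
\begin{align*}
r_t &= \sigma\left( \Theta_r *_{G} [x_t,\; h_{t-1}] + b_r \right) \\
z_t &= \sigma\left( \Theta_z *_{G} [x_t,\; h_{t-1}] + b_z \right) \\
\tilde{h}_t &= \tanh\left( \Theta_h *_{G} [x_t,\; r_t \odot h_{t-1}] + b_h \right) \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{align*}
```

The gating structure is unchanged from the GRU; only the linear maps are replaced, so the temporal recurrence and the spatial filtering over the electrode graph are computed in the same step.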
The SGCRNN model for EEG emotion recognition consists of two stacked CNGRU layers, a fully connected layer, and a pooling layer, as shown in Fig. 4. The input to the model is the 32-channel raw EEG signal of a specified duration. Preprocessing converts it into discrete signals. Cosine similarity is then used to construct an adjacency matrix for each second, yielding a brain network that captures spatial features. The attributes of each node are the EEG feature vector values of the corresponding electrode.

SGCRNN model diagram.
The core of the model is a two-layer stacked CNGRU network that serves as the encoder. Like a GRU, it is computed iteratively over time steps; the mechanism and the corresponding formulas are given in Section 4.1. Within the encoder, the “Time2” parameter is set to 12, corresponding to 12 time iterations. The hidden-layer outputs therefore have the shape (seq_len, hidden_units * num_nodes). For clarity, topology maps are used to visualize data of shape (hidden_units * num_nodes,), in which each node holds hidden_units values. The “Last Relevant Output” component extracts the last relevant output of the sequence, which has the shape (num_nodes, hidden_units); the subsequent “Out” components therefore have hidden_units layers.
Next, the FC layer applies a Linear function to convert the data to the shape (num_nodes, num_classes). Finally, a max-pooling layer reduces the data to the shape (num_classes).
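The shape changes from the encoder output to the final prediction can be traced with a short NumPy sketch. The values of `num_classes`, the node-major reshape order, and the random placeholder weights are assumptions made for illustration; only the sequence of shapes comes from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, num_nodes, hidden_units = 12, 32, 64
num_classes = 2   # assumed value, not stated in the text

# Encoder output: one flattened hidden state per time step.
encoder_out = rng.standard_normal((seq_len, hidden_units * num_nodes))

# "Last Relevant Output": keep the final time step and unflatten it per node
# (node-major layout is an assumption).
last = encoder_out[-1].reshape(num_nodes, hidden_units)

# FC layer: the same linear map applied to every node (placeholder weights).
fc_weight = rng.standard_normal((hidden_units, num_classes))
fc_bias = np.zeros(num_classes)
node_logits = last @ fc_weight + fc_bias      # (num_nodes, num_classes)

# Max-pooling over the node axis yields one score per class.
logits = node_logits.max(axis=0)              # (num_classes,)
```

Pooling over nodes means the prediction for each class is driven by the electrode with the strongest response for that class.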
In summary, the model combines the spatial and temporal aspects of EEG signals to analyze the brain network as a whole, so that its predictions draw on the patterns and interconnections within the network.
In this study, the network parameters are iterated to their optimum values using the backpropagation approach. Therefore, a loss function is defined based on the mean square error, and the SGCRNN model’s loss function is defined as follows.
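One form of a mean-square-error loss that is consistent with the regularization coefficient α reported in the experimental settings is sketched below. The squared L2 penalty on the parameters $\Theta$, and the symbols $N$, $y_i$, and $\hat{y}_i$, are assumptions; the text does not state which regularizer is used.

```latex
\mathcal{L}(\Theta) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2 + \alpha \, \lVert \Theta \rVert_2^2
```

Here $N$ is the number of samples in a batch, $y_i$ is the label, and $\hat{y}_i$ is the model output for sample $i$.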
The SGCRNN algorithm is described in Algorithm 1.
Introduction to data sets
More than 85% of physiological signal emotion recognition studies use the DEAP dataset [29]. The DEAP database [30] is a multimodal dataset collected by researchers from Queen Mary University of London in the UK and other institutions to study human emotional states. The researchers recorded EEG and peripheral physiological signals from 32 participants while they watched 40 one-minute music videos. Participants rated each video from 1 to 9 on valence, arousal, liking, and dominance.
Data preprocessing
The 32 channels of labeled EEG signals from this dataset were used for the experiments in this paper, and the data were preprocessed as follows. First, the data were downsampled to 128 Hz, EOG artifacts were removed, a band-pass filter of 4.0-45.0 Hz was applied, and the data were averaged to the common reference. The first three seconds of each recording, which contain the baseline signal, were removed. In general, a human emotional state lasts from 1 to 12 seconds. To increase the amount of training data, each 60-second EEG trial is divided into 12-second time slices. The divided data are $S = \{S_1, S_2, \ldots, S_n\}$, where $S_i \in \mathbb{R}^{M \times T}$, the number of EEG channels is $M = 32$, the number of sampling points is $T = 1536$, and the number of time slices is $n = 5$. The “fft” function from the SciPy Python package is applied to each t-second window, and the logarithmic amplitude of the non-negative frequency components is retained. During training, data augmentation is applied by randomly reflecting the EEG sequence along the scalp midline and by randomly scaling the amplitude of the EEG signal in the range [0.8, 1.2], which increases the diversity and randomness of the data. This improves the reliability and accuracy of data analysis.
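The slicing, spectral, and augmentation steps can be sketched as follows. This is a minimal sketch under stated assumptions: it uses `numpy.fft.rfft` instead of the SciPy function named in the text, takes the FFT window length to be 1 second, uses a synthetic trial in place of DEAP data, and passes a hypothetical list of left/right channel index pairs for the midline reflection.

```python
import numpy as np

FS = 128          # sampling rate after downsampling (Hz)
SLICE_SEC = 12    # slice length in seconds
WIN_SEC = 1       # ASSUMED FFT window length in seconds

def slice_trial(trial):
    """Split a (channels, samples) trial into non-overlapping 12-second slices."""
    step = SLICE_SEC * FS
    n = trial.shape[1] // step
    return [trial[:, i * step:(i + 1) * step] for i in range(n)]

def log_spectrum(segment):
    """Log-amplitude of the non-negative FFT frequencies of each window.

    Returns an array of shape (windows, channels, freq_bins).
    """
    win = WIN_SEC * FS
    n_win = segment.shape[1] // win
    windows = segment[:, :n_win * win].reshape(segment.shape[0], n_win, win)
    amplitude = np.abs(np.fft.rfft(windows, axis=-1))
    return np.log(amplitude + 1e-8).transpose(1, 0, 2)

def augment(segment, swap_pairs, rng):
    """Random left/right reflection plus random amplitude scaling in [0.8, 1.2]."""
    out = segment.copy()
    if rng.random() < 0.5:
        for left, right in swap_pairs:        # hypothetical channel index pairs
            out[[left, right]] = out[[right, left]]
    return out * rng.uniform(0.8, 1.2)

rng = np.random.default_rng(0)
trial = rng.standard_normal((32, 60 * FS))    # synthetic stand-in for one trial
slices = slice_trial(trial)                   # 5 slices of shape (32, 1536)
features = log_spectrum(augment(slices[0], [(0, 16)], rng))   # (12, 32, 65)
```

With a 1-second window, each 12-second slice yields a sequence of 12 spectral frames, which matches the 12 time iterations of the encoder described earlier.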
The shape of the data in the dataset is shown in Table 2.
Dataset format
The prediction accuracy and mean absolute error are used to evaluate the SGCRNN model performance, and they are calculated as follows.
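The two metrics can be computed as in the sketch below, which uses their usual definitions: accuracy as the fraction of correct predictions and MAE as the mean of absolute differences. The function names and the toy inputs are illustrative assumptions, not values from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Mean of |y_true - y_pred| over all samples."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy values for illustration only.
acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])                     # 3 of 4 correct
mae = mean_absolute_error([5.0, 7.0, 3.0], [4.0, 7.5, 5.0])    # (1 + 0.5 + 2) / 3
```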
The SGCRNN model consists of two stacked CNGRU layers with 64 hidden units. The Chebyshev polynomial order is set to K = 2, and the number of graph nodes is 32. The activation function is ReLU. The maximum number of epochs is MAX = 300. The dropout probability is 0 (i.e., no dropout). The learning rate is η = 1e-4. The batch size is 512 for the training set and 128 for the validation and test sets. The regularization coefficient of the loss function is α = 0.001. The optimizer is Adam. Training is terminated early if the loss has not decreased for ɛ = 5 consecutive epochs. The CosineAnnealingLR learning rate scheduler is used; it varies the learning rate as a cosine function of the epoch, so that the learning rate gradually decreases during training. The learning rate curve is shown in Fig. 5. The model was implemented in Python 3.8.10 with PyTorch 1.11.0 and was trained and tested on an RTX 3090. The training, validation, and test sets were split in the ratio 8:1:1.

Learning rate scheduler curve.
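The shape of such a curve can be reproduced with the closed-form cosine annealing rule. The sketch below assumes that the annealing period `t_max` equals the maximum number of epochs (300) and that the minimum learning rate `eta_min` is 0; neither value is stated in the text, and the function name is illustrative.

```python
import math

def cosine_annealing_lr(epoch, base_lr=1e-4, t_max=300, eta_min=0.0):
    """eta_min + (base_lr - eta_min) * (1 + cos(pi * epoch / t_max)) / 2."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2

# Learning rate at every epoch from 0 to 300 (t_max and eta_min are assumed).
schedule = [cosine_annealing_lr(epoch) for epoch in range(301)]
```

Under these assumptions the rate starts at the base value, reaches half of it at the midpoint, and approaches zero at the final epoch.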
Ablation experiments are performed in this section to explore the contribution of several important components of the approach. The first ablation experiment verifies the effect of the fast Fourier transform, time slicing, and data augmentation on prediction ability. The second ablation experiment tests whether the proposed method of constructing the adjacency matrix further improves prediction accuracy.
Different ways to process data
After experimental comparison, this study obtained four sets of data. Experiment 1 directly used time-domain features for training, without FFT; experiment 2 used 60 seconds of data for training, without time slicing; experiment 3 did not perform data augmentation. Experiment 4 used fast Fourier transform and divided the data into 12-second time slices, while also performing data augmentation. We obtained corresponding results through accuracy tests in the three dimensions of valence, arousal, and dominance, as shown in Fig. 6. In addition, the MAE values of various methods have been listed in Fig. 7.

Accuracy of validation set in three dimensions.

MAE of different methods on the test set.
The three line graphs in Fig. 6 show how the model's validation accuracy in the three emotion dimensions changes as the number of training epochs increases, giving a visual picture of the model's learning progress. The horizontal axis is the number of training epochs, and the vertical axis is the accuracy on the validation set; a higher accuracy means a closer match between the model's predictions and the actual emotion labels. The bar chart in Fig. 7 summarizes the model's MAE on the test set for each emotion dimension, allowing a quick comparison of performance across dimensions.
Through the comparison of the four experiments mentioned above, it can be observed that the approach used in Experiment 1 had lower accuracy and relatively larger errors in all emotional dimensions, performing worse compared to Experiments 3 and 4. Similarly, Experiment 2’s approach exhibited lower accuracy and larger errors in all emotional dimensions, indicating poorer performance. This suggests that using longer data segments for training is not conducive to improving the accuracy of emotion prediction. On the other hand, Experiment 3’s approach had relatively smaller errors in Valence and Arousal, but slightly larger errors in Dominance. Experiment 4 achieved the highest accuracy and lowest loss values by utilizing techniques such as FFT, time slicing, and data augmentation.
This result indicates that these techniques effectively improve emotion recognition. Specifically, the fast Fourier transform converts time-domain signals into frequency-domain signals, which better captures signal characteristics in different frequency ranges. Time slicing divides a long time series into multiple short segments, avoiding the complexity and difficulty of long sequences and better capturing instantaneous dynamic changes. In addition, randomly reflecting signals along the midline of the scalp serves as data augmentation, which extends the dataset, increases the diversity of the data, and thus improves the model's generalization ability. The experimental results therefore demonstrate the advantages of the data processing methods used in this paper.
The experiment compared three methods for constructing the adjacency matrix: one based on the Euclidean distance between EEG electrode spatial positions, one based on the cosine similarity of EEG electrode spatial positions, and one based on the correlation of EEG channel features. Fig. 8 shows the resulting accuracy in the three dimensions of valence, arousal, and dominance. The MAE values of the different methods, together with the total training time and the test set evaluation time, are shown in Table 3.

Accuracy of validation set in three dimensions.
Test set MAE of different methods
The three line graphs in Fig. 8 show how validation accuracy in the three emotion dimensions changes over the training epochs for the different methods of constructing the adjacency matrix.
Method 1 had MAE values of 1.792, 1.700, and 1.668 across the three emotion dimensions. Its total training time was 3.1192 hours, and its test set evaluation time was 4 seconds. Method 2 had MAE values of 1.794, 1.700, and 1.652 across the three emotion dimensions. Its total training time was 14.7242 hours, and its test set evaluation time was 20 seconds. Method 3 had MAE values of 1.787, 1.694, and 1.650 across the three emotion dimensions. Its total training time was 2.8892 hours, and its test set evaluation time was 3 seconds.
Compared to the other two methods, the method of constructing graph adjacency matrix based on cosine similarity of EEG electrode spatial positions shows superiority in training time, accuracy, and loss value. Specifically, the proposed method in this paper has a significantly shorter training duration compared to the other two methods. This will significantly improve training efficiency and reduce the time and energy costs for researchers. Furthermore, in terms of accuracy and loss value, our proposed method outperforms the other two methods, significantly improving the predictive performance and generalization ability of the model. Therefore, our proposed method has high practical value in the application of real-time monitoring.
This study evaluated eight EEG emotion recognition models by comparing their prediction accuracy on the validation and test sets. The first model is a Long Short-Term Memory (LSTM) model based on LSTM recurrent neural networks. The second model is a CNN-LSTM model that combines Convolutional Neural Networks with LSTM. The third model is an ACRNN [31] model that combines CNN, LSTM, and attention. The fourth model is a Mean_fusion model that combines the SGCRNN model with the ACRNN model and fuses EEG signals with peripheral physiological signals by averaging. The fifth model is an Attention_fusion model that uses attention for multimodal physiological signal fusion. The sixth model is a DGCNN [24] model based on Dynamic Graph Convolutional Neural Networks. The seventh model is a GRU model without the ChebNet operation. The eighth model is the SGCRNN model proposed in this paper. All models were trained using the data preprocessing methods described earlier. The accuracy of the eight models in predicting Valence, Arousal, and Dominance on the validation set is shown in Fig. 9, and their prediction accuracy on the test set is shown in Fig. 10.

Fig. 9. Accuracy of validation set in three dimensions.

Fig. 10. MAE of different methods on the test set.
Replacing the matrix multiplication in the GRU architecture with Chebyshev polynomials notably improves the parameter efficiency of the SGCRNN model. Specifically, the SGCRNN model has 748,562 trainable parameters, whereas the GRU model has 1,098,817. This difference shows that the proposed approach reduces the parameter count while maintaining model performance. The advantage can be analyzed from two perspectives. Firstly, the reduction in trainable parameters lowers model complexity and thereby mitigates the risk of overfitting to a certain extent. Secondly, the smaller parameter count reduces the computational load and memory requirements, potentially leading to faster inference, which is a key advantage in real-time applications.
Moreover, the decrease in trainable parameters does not substantially compromise the performance of the SGCRNN model. Despite the reduced parameter count, the incorporation of Chebyshev polynomials enables the model to preserve its ability to capture spatiotemporal features and handle sequential data, thus ensuring model accuracy and efficacy.
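The size of the reduction follows directly from the two parameter counts reported above; the short calculation below only restates those figures as an absolute and a relative difference.

```python
# Trainable parameter counts as reported in the text.
gru_params = 1_098_817
sgcrnn_params = 748_562

saved = gru_params - sgcrnn_params
reduction = saved / gru_params
print(f"{saved:,} fewer parameters ({reduction:.1%} reduction)")
# prints: 350,255 fewer parameters (31.9% reduction)
```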
Figure 9 illustrates how the validation set accuracy of each model changes across the different emotion dimensions as the number of training epochs increases. The horizontal axis represents the number of training epochs, while the vertical axis represents the accuracy for the corresponding emotion dimension. As training progresses, the curves exhibit different trends and shapes, reflecting the varying learning capacity and convergence of the compared models. Figure 10 displays the test set accuracy of the different models across the three emotion dimensions, using three sets of bar graphs. The horizontal axis represents the emotion dimensions, while the vertical axis represents the accuracy. Each bar represents the accuracy of the corresponding model on the respective emotion dimension.
The experimental results above show that the SGCRNN model outperforms the other seven methods (LSTM, CNN-LSTM, ACRNN, GRU, Mean_fusion, Attention_fusion, and DGCNN) on all evaluation indicators. For Valence, Arousal, and Dominance, SGCRNN achieved the highest accuracies of 88%, 89.5%, and 89.7%, respectively. This indicates that SGCRNN is the most effective of the compared models for emotion recognition from EEG time series.
In terms of convergence speed, this paper compared seven emotion analysis models. The SGCRNN model, owing to the nonlinear characteristics of Chebyshev networks, captures emotion-related features in EEG signals more rapidly, resulting in relatively fast convergence during the feature learning phase. In contrast, the fusion of convolutional and recurrent operations in the ACRNN model might require more training iterations to reach stability before their roles in feature extraction and temporal modeling are fully exploited. In the CNN-LSTM model, the combination of convolution and LSTM operations might lead to a longer training process, with the aim of better capturing the interaction between temporal and spatial information. The LSTM model, due to its complex recurrent structure, might converge slightly more slowly when processing time-series data. Notably, the GRU model, benefiting from its simplified gating mechanism, converges relatively quickly when learning long sequence data. For the fusion methods, the training speeds of the Mean_fusion and Attention_fusion models are similar, implying that the fusion strategy has only a minor influence on training speed.
Considering both the accuracy results and the convergence speeds of the models, this research demonstrates the superior performance of the SGCRNN model in the task of emotion analysis.
The SGCRNN model uses ChebNet instead of matrix multiplication in the GRU, which has the following advantages. Firstly, the SGCRNN model can better capture the dynamic evolution of the data by combining spatial and temporal dependencies, which greatly improves its ability to extract emotional features from EEG signals. Secondly, the RNN architecture of the SGCRNN model preserves the sequential relationship of emotion information, inherits the strong sequence learning ability of RNNs, and adaptively adjusts its parameters through a feedback mechanism. In conclusion, the SGCRNN model is an efficient and accurate method for EEG-based emotion recognition.
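The idea of replacing the matrix multiplications of a GRU with a Chebyshev graph convolution can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions rather than the implementation used in this paper: the polynomial order `K`, the feature and hidden sizes, the random adjacency matrix, the approximation of the largest Laplacian eigenvalue by 2, and the gate arrangement in `cheb_gru_step` are all hypothetical choices made for the example.

```python
import numpy as np

def scaled_laplacian(adj):
    """Rescaled normalized Laplacian, assuming lambda_max is about 2."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    mask = deg > 0
    d_inv_sqrt[mask] = deg[mask] ** -0.5
    n = adj.shape[0]
    lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return lap - np.eye(n)  # 2 * L / lambda_max - I with lambda_max = 2

def cheb_conv(x, lap, weights):
    """Chebyshev graph convolution: sum_k T_k(lap) @ x @ weights[k].

    x: (n_nodes, in_dim); weights: (K, in_dim, out_dim).
    """
    t_prev, t_curr = x, lap @ x
    out = t_prev @ weights[0]
    if len(weights) > 1:
        out = out + t_curr @ weights[1]
    for k in range(2, len(weights)):
        # Chebyshev recurrence: T_k = 2 * L * T_{k-1} - T_{k-2}.
        t_prev, t_curr = t_curr, 2 * lap @ t_curr - t_prev
        out = out + t_curr @ weights[k]
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cheb_gru_step(x, h, lap, params):
    """One GRU step in which each dense product is a Chebyshev convolution."""
    xh = np.concatenate([x, h], axis=1)
    z = sigmoid(cheb_conv(xh, lap, params["Wz"]) + params["bz"])  # update gate
    r = sigmoid(cheb_conv(xh, lap, params["Wr"]) + params["br"])  # reset gate
    xrh = np.concatenate([x, r * h], axis=1)
    h_tilde = np.tanh(cheb_conv(xrh, lap, params["Wh"]) + params["bh"])
    return z * h + (1.0 - z) * h_tilde

# Hypothetical sizes: 32 channels, 4 input features, 8 hidden units, K = 3.
rng = np.random.default_rng(0)
n_nodes, in_dim, hidden, K, steps = 32, 4, 8, 3, 5
a = rng.random((n_nodes, n_nodes))
adj = (a + a.T) / 2.0            # random symmetric adjacency for the demo
np.fill_diagonal(adj, 0.0)
lap = scaled_laplacian(adj)

params = {}
for gate in ("z", "r", "h"):
    params["W" + gate] = rng.normal(0.0, 0.1, (K, in_dim + hidden, hidden))
    params["b" + gate] = np.zeros(hidden)

h = np.zeros((n_nodes, hidden))
for _ in range(steps):
    x_t = rng.normal(size=(n_nodes, in_dim))  # stand-in for one EEG time step
    h = cheb_gru_step(x_t, h, lap, params)
print(h.shape)  # (32, 8)
```

In this sketch the hidden state keeps one vector per channel, so each gate mixes information across neighbouring channels through the graph Laplacian while the recurrence carries it across time steps.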
In this article, we propose a novel SGCRNN model for EEG emotion recognition. Specifically, we first construct a graph adjacency matrix based on the cosine similarity of EEG electrode spatial locations. Then, ChebNet is used to replace matrix multiplication in the GRU, resulting in the proposed CNGRU. The EEG sequence is fed into the SGCRNN model, which consists of two stacked CNGRU layers, an FC layer, and a max-pooling layer, to obtain the prediction results. Two ablation experiments and a comparative experiment were conducted on the DEAP dataset. The results showed that the data preprocessing methods used in this study, namely using FFT to extract frequency domain features, segmenting the signal into 12-second slices, and using randomly reflected signals along the scalp for data augmentation, all contribute to improving the model’s accuracy. The proposed method for constructing the graph adjacency matrix captures the local relationships between EEG channels and effectively improves training efficiency, outperforming the other methods of constructing adjacency matrices that were compared. Moreover, the proposed SGCRNN model can model the spatiotemporal dependencies of EEG time series and performs better than the other advanced emotion recognition models considered.
In future research, we will consider the application of real-time emotion recognition and explore how to compress the SGCRNN model for real-time emotion recognition scenarios. Additionally, further research in emotion recognition should focus on addressing individual differences and incorporating them into the emotion recognition model to enhance personalized emotion recognition accuracy and effectiveness. Long-term variations in emotions should also be considered, and models should be developed to capture trends and patterns in long-term emotional changes for long-term emotion recognition and analysis. By delving into these issues, we can strengthen the research and application of emotion recognition based on EEG signals, expanding its potential value in fields such as psychology, medicine, and human-computer interaction.
By combining the expertise of manual engineering with the powerful capabilities of deep learning, we can develop more accurate, efficient, and interpretable emotion recognition systems. These systems can help businesses understand customer emotions and needs, providing personalized products and services. Additionally, they can play a crucial role in psychology and medicine, aiding in the diagnosis and treatment of emotional disorders, as well as monitoring and intervening in emotional states. Through further research and application of these methods, we can explore novel domains and contribute to society with more beneficial solutions.
Acknowledgment
This work is supported by the Science and Technology Development Project in Jilin Province of China (Grant number: 20210402078GH).
