Abstract
Emotional state recognition is an important part of emotion research. Compared with non-physiological signals, electroencephalogram (EEG) signals reflect a person’s emotional state more truthfully and objectively. To exploit multi-frequency band emotional information and address the noise problem of EEG signals, this paper proposes a robust multi-frequency band joint dictionary learning with low-rank representation (RMBDLL). Building on dictionary learning, sparse and low-rank representation techniques are jointly integrated to reveal the intrinsic connections and discriminative information among EEG frequency bands. RMBDLL consists of robust dictionary learning and intra-class/inter-class local constraint learning. In the robust dictionary learning part, RMBDLL separates complex noise from EEG signals and establishes a clean sub-dictionary on each frequency band to improve the robustness of the model. Here, data from different frequency bands share the same encoding coefficients, in accordance with the consistency of emotional state recognition. In the intra-class/inter-class local constraint learning part, RMBDLL introduces a regularization term composed of intra-class and inter-class local constraints, which are constructed from the local structural information of dictionary atoms and promote intra-class similarity and inter-class difference across EEG frequency bands. The effectiveness of RMBDLL is verified on the SEED dataset with different types of noise. The experimental results show that RMBDLL maintains the discriminative local structure of the training samples and achieves good recognition performance on noisy EEG emotion datasets.
Introduction
The Internet of Things (IoT) is a network that uses telecommunications networks and the Internet as information carriers to connect everyday things with independent functionality [1]. IoT involves not only connections between “things”, but also, more importantly, interaction between “things” and “persons”. For “things” to better serve “persons”, IoT requires more flexible and efficient human-computer interaction (HCI). Emotions play a significant part in interaction between people, and the same is true in HCI. When “things” can perceive human emotions, they can engage in more personalized and targeted interactions, and emotional state recognition is a key enabling technology. Emotions can be perceived through various modalities, such as facial expressions, voice intonation, body posture, and physiological signals, all of which can be used to recognize the emotional state of an individual in HCI. However, facial expressions, voice intonation, and body posture can be artificially modified through deliberate expression or disguise. Physiological signals are non-invasive, do not require active cooperation from individuals, and can be obtained continuously. Therefore, physiological signals are more objective in emotion perception [2].
A brain-computer interface (BCI) provides an effective channel for information transfer between the brain and the external environment by interpreting the physiological information of the brain during human thinking activities. With the development of portable BCI, emotional state recognition using BCI is also widely used in the field of HCI. Neuropsychologists believe that both the left and right regions of the brain are involved in the generation of emotions, and that different emotions, such as anger, fear, sadness, and disgust, produce distinctly different neural activity in the brain. Research has also shown that the left frontal cortex is more associated with positive emotions and social competence, as well as with the negative emotion of anger, whereas the right frontal cortex is responsible for survival-related emotions; this is known as asymmetric frontal brain activity [3]. Because emotion generation has such a direct response in the brain, electroencephalogram (EEG) emotional state recognition attracts a great deal of research attention. For example, in safe driving, monitoring the driver’s emotional state in real time through EEG and taking relevant measures when negative emotions arise can reduce the occurrence of traffic accidents. In mental health, psychologists can identify the emotional state of visitors through EEG and obtain feedback on psychological counseling, and can thereby provide effective counseling programs based on this emotional feedback. In healthcare, detecting emotional changes in patients through EEG can help caregivers understand their behavior, especially for patients with language barriers. EEG can capture their emotional status in a timely manner, which helps them receive attentive care and communicate with the outside world.
Machine learning has been increasingly applied in EEG emotional state recognition. Some researchers divide EEG signals into several time windows and extract statistical features such as mean, variance, maximum, and median from each window. Then, machine learning models are used to model and analyze the EEG data. Cao et al. [4] analyzed the emotional state of physiological signals generated while watching videos, and further constructed a visual enhanced model to establish the connection between physiological signals and video comments. Yoon et al. [5] designed an EEG feature extraction method using a Bayesian weighted log-posterior function, and then used a Bayesian classifier to identify emotional states with maximum feature values. Yi et al. [6] used the continuous wavelet transform to obtain EEG temporal and frequency information and used penalty multifunctional logistic regression to classify emotions. A large number of deep learning models have also been developed, such as the spatial dependence multi-task transformer network [7] and the region aggregation graph convolutional network [8]. However, because deep learning models require large amounts of training data and place high demands on computer hardware, this study is still based on a machine learning model.
EEG signals are generally divided into five frequency bands, namely δ (1–3 Hz), θ (4–7 Hz), α (8–13 Hz), β (14–30 Hz), and γ (31–50 Hz). Examples of the five frequency bands of EEG signals are shown in Fig. 1. The θ wave activity in the right frontal lobe is enhanced when subjects’ emotions are evoked using pleasant music. Most emotional load may be related to anterior temporal α wave activity. References [9, 10] found that the α wave activity in the frontal region can express emotional potency and emotional intensity. In an experiment in which music was used to induce four emotions in subjects, including happy, sad, and scared, listening to music with positive and negative emotions was found to produce stronger EEG activity in the left and right prefrontal regions. It was also shown that higher frequency signals such as β and γ waves yield higher accuracy in recognizing emotions, while lower frequency band signals are limited in emotional state recognition [11].

Fig. 1. Examples of multi-frequency bands of EEG signals.
Although these EEG emotional state recognition methods have been effectively applied, most of them assume clean EEG data and do not consider the influence of noise. However, since the EEG signal itself is non-linear, non-stationary, and very weak, it is easily contaminated by different types of noise, which makes it difficult to guarantee the reliability of the model when recognizing emotions. In addition, traditional EEG emotional state recognition methods mostly concatenate the EEG signals of multiple frequency bands, which cannot fully explore the connections between different frequency bands and results in poor recognition performance. To address these issues, we propose a robust multi-frequency band joint dictionary learning with low-rank representation (RMBDLL) for EEG emotional state recognition. The overall system block diagram of the RMBDLL algorithm is illustrated in Fig. 2. RMBDLL consists of two parts: robust dictionary learning and intra-class/inter-class local constraint learning. In robust dictionary learning, we utilize the feature consistency and complementarity of EEG to establish multi-frequency band joint dictionary learning. We use two norm constraints to characterize complex noise and apply low-rank representation in dictionary learning to remove the influence of noise, so that a clean dictionary can be obtained. We expect that mining complementary information between EEG signals from different frequency bands will help improve the performance of EEG emotional state recognition. In intra-class/inter-class local constraint learning, based on the local geometric structure of atoms in sub-dictionaries, we construct the intra-class and inter-class local constraint terms and unify them in a regularization term for joint optimization.
The experimental results on the SEED dataset show that, compared with the comparison algorithms, the proposed RMBDLL algorithm is robust to noise and maintains the discriminative local structure of the data, and thus achieves better recognition performance in EEG emotional state recognition.
The contributions of this paper are as follows:
(1) RMBDLL is a noise-robust dictionary learning algorithm with a low-rank constraint, which combines the ℓ1-norm and the ℓ2-norm to characterize Laplacian-distributed and Gaussian-distributed noise, respectively. Unlike traditional learning methods that assume noise follows a single distribution, RMBDLL is more robust to complex noise.
(2) RMBDLL is a multi-frequency band dictionary learning algorithm, which learns a corresponding sub-dictionary for each EEG frequency band. In accordance with the consistency of emotional state recognition, the sparse codes learned over the sub-dictionaries are the same, which allows the complementary information between different frequency bands to be mined.
(3) In addition to considering the reconstruction performance of EEG signals, RMBDLL imposes discriminative local structure constraints: it exploits the intra-class similarity and inter-class difference of EEG multi-frequency bands and incorporates a regularization term composed of intra-class and inter-class local constraints to improve the classification ability of the model.

Fig. 2. Overall system block diagram of the RMBDLL algorithm.
In this section, we briefly review the techniques of sparse representation, dictionary learning, and low-rank representation. For convenience, the main notations used in this study are shown in Table 1.
Table 1. The main notations used in this study
The original signal itself has varying degrees of redundancy, which indicates that the signal can be sparsified and compressed. If the signal can be converted into another form that is easier to represent, its representation becomes more concise. The purpose of sparse representation [2] is to find a concise representation of the input signal, and the basic framework is written as,
where
Solving Equation (1) is an NP-hard problem, and the most typical solution is a greedy algorithm, which takes the currently optimal choice at each step of the calculation. To improve the convergence efficiency of the iterations, relaxation algorithms relax the ℓ0-norm in Equation (1) to the ℓp-norm, as shown in Equation (2) below,
When 0 < p < 1, Equation (2) becomes a nonconvex optimization problem, which can be solved by the iteratively reweighted least squares (IRLS) algorithm. When p = 1, Equation (2) becomes a convex optimization problem, and common solution algorithms include the basis pursuit (BP) algorithm, the LASSO algorithm, and the iterative shrinkage-thresholding algorithm [12].
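As a minimal sketch of the p = 1 case, the following Python code applies the iterative shrinkage-thresholding algorithm to the penalized form 0.5·||x − Da||² + λ||a||₁. The penalized objective, the names `ista` and `objective`, the value of `lam`, and the synthetic data are illustrative assumptions, not the formulation or settings used in this paper.

```python
import numpy as np

def ista(D, x, lam=0.05, n_iter=500):
    """Approximately minimize 0.5*||x - D a||_2^2 + lam*||a||_1 over a."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1 / (largest eigenvalue of D^T D)
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - step * (D.T @ (D @ a - x))                        # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # shrinkage step
    return a

def objective(D, x, a, lam):
    """Value of the penalized l1 objective at a."""
    return 0.5 * np.sum((x - D @ a) ** 2) + lam * np.sum(np.abs(a))

# Synthetic example (assumed data): a 3-sparse code over 60 unit-norm atoms.
rng = np.random.default_rng(0)
D = rng.standard_normal((30, 60))
D /= np.linalg.norm(D, axis=0)
a_true = np.zeros(60)
a_true[[3, 17, 42]] = [1.5, -2.0, 1.0]
x = D @ a_true
a_hat = ista(D, x)
print(objective(D, x, np.zeros(60), 0.05), objective(D, x, a_hat, 0.05))
```

With the step size set to the reciprocal of the largest eigenvalue of DᵀD, each iteration does not increase the objective, which is why the second printed value is smaller than the first.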
When performing the classification task, sparse representation uses the sparse coefficients and the sub-dictionary of each class to reconstruct the sample separately, and classification is performed according to the reconstruction error,
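The reconstruction-error rule can be sketched as follows. This is an illustrative example only: the function name `classify_by_residual`, the class labels, and the random sub-dictionaries are hypothetical, and class-wise least squares stands in for sparse coding to keep the example short.

```python
import numpy as np

def classify_by_residual(x, sub_dicts, codes):
    """Return the class whose sub-dictionary reconstructs x with the smallest error.

    sub_dicts: dict mapping class label -> (d, k_c) sub-dictionary
    codes:     dict mapping class label -> (k_c,) coefficients for that sub-dictionary
    """
    residuals = {c: np.linalg.norm(x - sub_dicts[c] @ codes[c]) for c in sub_dicts}
    return min(residuals, key=residuals.get), residuals

rng = np.random.default_rng(0)
sub_dicts = {
    "positive": rng.standard_normal((20, 4)),
    "negative": rng.standard_normal((20, 4)),
}
# Test sample generated from the "negative" sub-dictionary.
x = sub_dicts["negative"] @ np.array([1.0, -0.5, 0.0, 2.0])
# Assumption: least-squares coefficients replace sparse codes in this sketch.
codes = {c: np.linalg.lstsq(Dc, x, rcond=None)[0] for c, Dc in sub_dicts.items()}
label, residuals = classify_by_residual(x, sub_dicts, codes)
print(label)
```

Because the sample lies in the span of the "negative" atoms, that class has a residual close to zero and is selected.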
Dictionary learning algorithms have shown excellent performance in EEG signal classification. The main purpose of dictionary learning is the sparse reconstruction of EEG signals, which is effective when dealing with low-complexity data. However, when the EEG signal contains noise, the performance of traditional dictionary learning is greatly degraded. In practical applications, EEG signals are often contaminated by various types of noise, and dictionary learning may not be able to accurately separate complex noise from EEG signals.
The rank of a matrix is the maximum number of linearly independent columns (or, equivalently, rows).
Because the rank function is discrete, the rank minimization problem is difficult to solve. Replacing the rank constraint with the nuclear norm turns Equation (4) into a convex optimization problem. Simultaneously, the ℓ1-norm is used to constrain
Robust dictionary learning
In practical applications, EEG signals are often contaminated by different types of noise, such as noise following a Laplace distribution (characterized by the ℓ1-norm) or a Gaussian distribution (characterized by the ℓ2-norm). In order to separate complex noise from EEG signals, the RMBDLL algorithm not only focuses on the reconstruction ability of sub-dictionaries in different frequency bands, but also aims to establish a robust classification model,
The column vector of a dictionary is called a dictionary atom. Let
From Equation (7), we can see that if the similarity between two atoms is low, $w_{ab}$ is large. On the contrary, if the similarity between two atoms is high, $w_{ab}$ is small. Due to the fact that the encoding coefficients of similar atoms need to be as close as possible, one need obtain a small $w_{ab}$ to ensure high similarity between
The inter-class local constraint for the encoding coefficients is constructed according to the local geometric structure of dictionary atoms of different classes,
The graph matrix
Combining Equations (8) and (10), the intra-class and inter-class local constraints are unified in the regularization term as,
where
After simplification, Equation (11) becomes,
Defining Laplacian matrix
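A weighted pairwise penalty on encoding coefficients is commonly rewritten as a trace involving a graph Laplacian. The sketch below checks this identity numerically under stated assumptions: it uses the standard unnormalized Laplacian L = D − W with a symmetric random weight matrix, and the names `graph_laplacian`, `W`, and `A` are illustrative rather than the paper's notation or its specific weight definition.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized Laplacian L = D - W, with D the diagonal matrix of row sums."""
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(0)
k, n = 5, 8                       # k atoms, n samples (illustrative sizes)
W = rng.random((k, k))
W = (W + W.T) / 2                 # symmetric weights between atoms
np.fill_diagonal(W, 0.0)
A = rng.standard_normal((k, n))   # row a holds the coefficients tied to atom a

L = graph_laplacian(W)
# Pairwise form: 0.5 * sum_{a,b} w_ab * ||A[a] - A[b]||^2
pairwise = 0.5 * sum(
    W[a, b] * np.sum((A[a] - A[b]) ** 2) for a in range(k) for b in range(k)
)
# Trace form: tr(A^T L A)
trace_form = np.trace(A.T @ L @ A)
print(pairwise, trace_form)
```

The two printed values agree up to floating-point error, which is what allows a sum over atom pairs to be handled as a single quadratic term during optimization.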
Suppose there is an EEG dataset
EEG signals are rich in emotional information; at the same time, the correlation between frequency bands carries a great deal of redundant information and noise. A low-rank model can decompose the EEG signals into a low-rank representation over the learned dictionary and a sparse noise component. Based on robust dictionary learning and the intra-class/inter-class local constraint in a single frequency band, we extend this idea to multiple EEG frequency bands and develop the RMBDLL algorithm. The objective function of RMBDLL is defined as,
To facilitate the solution of
The augmented Lagrangian function corresponding to Equation (16) is,
Step 1. Updating
The closed-form solution of
Step 2. Updating
The singular value thresholding algorithm [13] is used to update
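As a minimal sketch of the singular value thresholding operator, the code below shrinks the singular values of a matrix by a threshold `tau` and sets those below it to zero. The function name `svt`, the threshold value, and the synthetic matrices are assumptions for illustration; the variable being updated and the threshold used in RMBDLL are not reproduced here.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: shrink the singular values of M by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt

# Synthetic example (assumed data): a rank-3 matrix plus small dense noise.
rng = np.random.default_rng(0)
low_rank = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 15))
noisy = low_rank + 0.01 * rng.standard_normal((20, 15))
Z = svt(noisy, tau=0.5)
print(np.linalg.matrix_rank(noisy), np.linalg.matrix_rank(Z))
```

The operator is the proximal mapping of the nuclear norm, so applying it removes the small singular values contributed by noise while keeping the dominant low-rank structure.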
Step 3. Updating
Using the soft-thresholding operator $S_{\beta}(\cdot)$,
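For reference, the soft-thresholding operator has a simple element-wise form and is the closed-form minimizer of an ℓ1-regularized least-squares problem. The symbols x, E, and M below are placeholders chosen for illustration and are not the paper's notation:

```latex
S_{\beta}(x) = \operatorname{sign}(x)\,\max\bigl(|x| - \beta,\ 0\bigr),
\qquad
S_{\beta}(M) = \arg\min_{E}\ \beta\,\|E\|_{1} + \tfrac{1}{2}\,\|E - M\|_{F}^{2},
```

where the operator is applied to each entry of M independently. Entries with magnitude below β are set to zero, which is what yields a sparse estimate.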
Step 4. Updating
The closed-form solution of
From these four steps above, the parameters
Based on the learned multi-band sub-dictionaries Given a test sample
Calculate the distance $d_s$ between the encoding coefficient of test sample
Classify the test EEG sample
Algorithm 1 gives a more detailed description of the training and testing steps.
Experimental data and settings
This article uses the SJTU Emotion EEG Dataset (SEED) [14] provided by Shanghai Jiao Tong University, which includes 15 participants. Each participant watched 15 movie clips in each experiment to induce different emotions (5 positive, 5 negative, and 5 neutral clips), so each experiment comprised 15 corresponding trials. An example EEG sample from the SEED dataset is shown in Fig. 3. Each trial consisted of a 5-second start prompt, a movie playback time of 4 minutes, a self-evaluation time of 45 seconds, and a rest time of 15 seconds. The selected films had been tested and evaluated, and effectively induced the target emotions. Each participant conducted three rounds of experiments with a time interval of one week between rounds, for a total of 45 trials per participant. In each emotion-inducing experiment, 62 channels of EEG signals were recorded from different positions of the brain according to the international 10–20 standard system.

Fig. 3. The example EEG sample in SEED dataset.
In order to avoid the impact of repeated experiments on the intensity of evoked emotions, the EEG data was collected during three rounds of experiments for all subjects. Feature extraction was performed on each single electrode lead, resulting in a sample set of size 15×15×62, where the first 15 is the number of participants, the second 15 is the number of movie clips watched by each participant in an experiment, and 62 is the number of spatial electrode leads. For each emotional state, the resulting sample size is 15×5×62 (4650). Since the intensity of EEG rhythmic activity reflects different emotional states of the brain, different EEG rhythms are closely related to emotional states. In this experiment, we use a digital band-pass filter to obtain power spectral density (PSD) features in five different frequency bands: δ, θ, α, β, and γ. We add two types of noise to the original EEG signals: random noise and Gaussian noise, each at noise levels of 5%, 10%, and 15%.
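The band-wise feature extraction and noise injection can be illustrated with the following sketch. It makes several assumptions that are not taken from this paper: a plain periodogram replaces the digital band-pass filter, the sampling rate `fs` is hypothetical, and the noise level is interpreted as the ratio of the noise standard deviation to the signal standard deviation. The band edges follow the ranges given in the Introduction.

```python
import numpy as np

BANDS = {"delta": (1, 3), "theta": (4, 7), "alpha": (8, 13),
         "beta": (14, 30), "gamma": (31, 50)}

def band_powers(signal, fs):
    """Average periodogram power of a 1-D signal inside each EEG band."""
    n = len(signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / (fs * n)   # simple periodogram estimate
    return {name: psd[(freqs >= lo) & (freqs <= hi)].mean()
            for name, (lo, hi) in BANDS.items()}

def add_gaussian_noise(signal, level, rng):
    """Add zero-mean Gaussian noise whose std is `level` times the signal std."""
    return signal + level * signal.std() * rng.standard_normal(signal.shape)

fs = 200                                    # hypothetical sampling rate in Hz
t = np.arange(0, 4, 1.0 / fs)
signal = np.sin(2 * np.pi * 10 * t)         # a 10 Hz tone falls in the alpha band
rng = np.random.default_rng(0)
noisy = add_gaussian_noise(signal, 0.10, rng)
powers = band_powers(noisy, fs)
print(max(powers, key=powers.get))
```

Even with 10% noise added, the α band carries the largest average power for this synthetic 10 Hz signal.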
The comparison algorithms include: the classical dictionary learning algorithm LC-SVD [15], the correlation-based label consistent K-SVD algorithm CLC-KSVD [16], the multi-resolution dictionary learning algorithm MRDL [17], the multi-view SVM algorithm MV-SVM [18], the noise-insensitive TSK fuzzy model PCB-ICL-TSK [19], and the multi-layer joint sparse regularized dictionary learning algorithm M-JSDL [20]. The number of atoms in each sub-dictionary is selected from {10, 20, 30, 40, 50}. The Gaussian kernel is used in MV-SVM, and the kernel parameter is selected from {0.01, 0.1, ..., 100}. The regularization parameters in the comparison algorithms are selected from {0.001, 0.01, ..., 10}. The number of fuzzy rules in PCB-ICL-TSK is selected from {20, 30, 40, 50}. All parameters in RMBDLL are selected from {1e-5, 5e-5, ..., 5e-3}. The number of layers in M-JSDL is three. We randomly select 80% of the samples for training and use the remaining 20% for testing. We repeat the experiments 10 times and record the classification accuracy for performance evaluation.
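The evaluation protocol of repeated random 80/20 splits can be sketched as below. The function `repeated_holdout`, the placeholder classifier `nearest_centroid`, and the synthetic three-class data are assumptions for illustration; they do not reproduce RMBDLL or the comparison algorithms.

```python
import numpy as np

def repeated_holdout(X, y, fit_predict, n_repeats=10, train_frac=0.8, seed=0):
    """Mean and standard deviation of accuracy over repeated random splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    accs = []
    for _ in range(n_repeats):
        perm = rng.permutation(n)
        tr, te = perm[:n_train], perm[n_train:]
        y_pred = fit_predict(X[tr], y[tr], X[te])
        accs.append(np.mean(y_pred == y[te]))
    return float(np.mean(accs)), float(np.std(accs))

def nearest_centroid(X_train, y_train, X_test):
    """Placeholder classifier: assign each test sample to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

# Synthetic three-class data (assumed), 50 samples per class.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=m, size=(50, 5)) for m in (-2.0, 0.0, 2.0)])
y = np.repeat([0, 1, 2], 50)
mean_acc, std_acc = repeated_holdout(X, y, nearest_centroid)
print(mean_acc, std_acc)
```

Reporting both the mean and the standard deviation over the 10 repetitions matches the "accuracy (standard deviation)" format of the result tables.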
Tables 2 and 3 present the recognition results on the SEED dataset with random noise and Gaussian noise, respectively. As the noise level increases, the recognition accuracy of all algorithms declines. Nevertheless, even in the presence of noise, RMBDLL obtains the highest recognition accuracy among all the algorithms, which demonstrates its efficacy and stability. When the intensity of random noise increases uniformly from 10% to 20% and 30%, the recognition accuracies of the RMBDLL algorithm are 83.54%, 80.91%, and 78.06%, with an average accuracy of 80.84%. The average recognition accuracies of LC-SVD, CLC-KSVD, and MV-SVM are 71.35%, 72.62%, and 73.96%, respectively. The multi-view algorithm MV-SVM treats each frequency band as a view and explores the internal connections between different frequency bands better than traditional dictionary learning, but it is sensitive to noise. The average recognition accuracy of MRDL is 76.57% with random noise. As a multi-resolution dictionary learning algorithm, MRDL treats each frequency band as a resolution and can effectively mine information hidden between EEG frequency bands. However, it is also sensitive to noise. Both PCB-ICL-TSK and M-JSDL are robust machine learning models, with average recognition accuracies of 77.32% and 78.51% with random noise, respectively. They are still 3.52% and 2.32% lower than the RMBDLL algorithm, respectively. The RMBDLL algorithm considers the impact of different types of noise and can accurately separate complex noise in EEG signals. Therefore, the RMBDLL algorithm is an effective robust dictionary learning model for EEG emotional state recognition in noisy scenes.
Table 2. Recognition accuracy (standard deviation) of all algorithms with random noise
Table 3. Recognition accuracy (standard deviation) of all algorithms with Gaussian noise
Figures 4 and 5 present the confusion matrices of all algorithms with random noise and Gaussian noise, respectively. Across the three types of emotions, positive emotions have the highest recognition rate, followed by neutral emotions, and negative emotions have the lowest recognition accuracy. This result indicates that the EEG patterns of negative and neutral emotions may be similar. In terms of classification accuracy, whether with random noise or with Gaussian noise, the RMBDLL algorithm has the highest classification accuracy for all three types of emotions. The RMBDLL algorithm learns a sub-dictionary on each EEG frequency band and constructs a regularization term that constrains the intra-class similarity and inter-class differences of EEG signals. These factors greatly improve the recognition accuracy of RMBDLL.

Fig. 4. Confusion matrices of all algorithms with random noise: (a) LC-SVD, (b) CLC-KSVD, (c) MV-SVM, (d) MRDL, (e) PCB-ICL-TSK, (f) M-JSDL, (g) RMBDLL.

Fig. 5. Confusion matrices of all algorithms with Gaussian noise: (a) LC-SVD, (b) CLC-KSVD, (c) MV-SVM, (d) MRDL, (e) PCB-ICL-TSK, (f) M-JSDL, (g) RMBDLL.
Tables 4–5 show the recognition accuracies of the RMBDLL algorithm using different frequency band combinations with random noise and Gaussian noise, respectively. It can be found that the RMBDLL algorithm achieves the best performance when all five frequency bands are used. This indicates that multi-frequency bands can complement each other in EEG emotional state recognition tasks. At the same time, it also indicates that the RMBDLL algorithm can retain the main feature information of EEG emotions by using multi-frequency band joint dictionary learning.
Table 4. Recognition accuracy (standard deviation) of RMBDLL using different frequency bands with random noise
Table 5. Recognition accuracy (standard deviation) of RMBDLL using different frequency bands with Gaussian noise
To evaluate the effectiveness of the low-rank term

Fig. 6. Ablation experiment of RMBDLL: (a) with random noise, (b) with Gaussian noise.
α, β, and γ are three parameters that need to be tuned in the RMBDLL algorithm. They balance the low-rank term, the ℓ1-norm constraint term, and the intra-class and inter-class local constraint term, respectively. Figures 7–9 illustrate the recognition accuracy of the RMBDLL algorithm versus the parameters α, β, and γ, respectively. RMBDLL is not sensitive to α and β. When the parameter γ is within the range [1e-5, 1e-4], the recognition accuracy of RMBDLL remains stable. Therefore, RMBDLL has a certain degree of robustness to parameter settings.

Fig. 7. Parameter analysis of α on RMBDLL with (a) random noise, (b) Gaussian noise.

Fig. 8. Parameter analysis of β on RMBDLL with (a) random noise, (b) Gaussian noise.

Fig. 9. Parameter analysis of γ on RMBDLL with (a) random noise, (b) Gaussian noise.
In this study, a robust multi-frequency band joint dictionary learning with low-rank representation algorithm is proposed for EEG emotional state recognition in noisy scenes. The algorithm exploits EEG multi-frequency band information and combines the ℓ1-norm and the ℓ2-norm to characterize complex noise; meanwhile, it combines the supervision information and the local structure information of dictionary atoms to construct intra-class and inter-class local constraint terms, so that samples from the same class have similar coding coefficients, while samples from different classes have clearly different coding coefficients. In future work, we will make improvements in the following aspects. First, deep learning is increasingly used in biological data processing, and combining deep learning methods with dictionary learning for big-data EEG emotional state recognition is a worthwhile research direction. Second, although the experimental results demonstrate the effectiveness of the proposed algorithm, several problems may remain when it is applied to real-time BCI systems: the time complexity of RMBDLL is relatively high, it is an offline algorithm, and real-time online EEG emotional state recognition has not yet been achieved. Third, its applicability and practicality on other EEG datasets need to be further tested.
Data availability statement
The SEED dataset is available at: http://bcmi.sjtu.edu.cn/~seed/seed.html
Conflict of interest
The authors declare that they have no conflict of interest.
Acknowledgments
This work was supported in part by the Technology Project of Changzhou City under Grant CE20215032.
