Robust multi-frequency band joint dictionary learning with low-rank representation

Abstract

Emotional state recognition is an important part of emotional research. Compared to non-physiological signals, the electroencephalogram (EEG) signals can truly and objectively reflect a person’s emotional state. To explore the multi-frequency band emotional information and address the noise problem of EEG signals, this paper proposes a robust multi-frequency band joint dictionary learning with low-rank representation (RMBDLL). Based on the dictionary learning, the technologies of sparse and low-rank representation are jointly integrated to reveal the intrinsic connections and discriminative information of EEG multi-frequency band. RMBDLL consists of robust dictionary learning and intra-class/inter-class local constraint learning. In robust dictionary learning part, RMBDLL separates complex noise in EEG signals and establishes clean sub-dictionaries on each frequency band to improve the robustness of the model. In this case, different frequency data obtains the same encoding coefficients according to the consistency of emotional state recognition. In intra-class/inter-class local constraint learning part, RMBDLL introduces a regularization term composed of intra-class and inter-class local constraints, which are constructed from the local structural information of dictionary atoms, resulting in intra-class similarity and inter-class difference of EEG multi-frequency bands. The effectiveness of RMBDLL is verified on the SEED dataset with different noises. The experimental results show that the RMBDLL algorithm can maintain the discriminative local structure in the training samples and achieve good recognition performance on noisy EEG emotion datasets.

Keywords

Multi-frequency band; dictionary learning; electroencephalogram; noise data; low-rank representation

1 Introduction

The Internet of Things (IoT) is a network that uses telecommunications networks and the Internet as information carriers to connect everyday things with independent functionality [1]. IoT is not only the connection between “things” and “things” as objects but also, more importantly, the interaction between “things” as objects and the “person” as subject. To make “things” better serve the “person”, IoT requires more flexible and efficient human-computer interaction (HCI). As is well known, emotions play a significant part in the interaction between people, and the same is true in HCI. When “things” can perceive human emotions, they can engage in more personalized and targeted interactions, and emotional state recognition is a key enabling technology. There are various modes of emotion perception, such as facial expressions, voice intonation, body posture, and physiological signals, which can be used to recognize the emotional state of an individual in HCI. However, facial expressions, voice intonation, and body posture can be artificially modified through deliberate expression or disguise. Physiological signals are non-invasive, do not require active cooperation from individuals, and can be obtained continuously. Therefore, physiological signals are more objective in emotion perception [2].

A brain-computer interface (BCI) provides an effective channel for information transfer between the brain and the external environment by interpreting the physiological information of the brain during human thinking activities. With the development of portable BCI, emotional state recognition using BCI is also widely used in the field of HCI. Neuropsychologists believe that the left and right regions of the brain are involved in the generation of emotions, and that distinct neural activity can be observed in the brain for different emotions such as anger, fear, sadness, and disgust. Research has also shown that the left frontal cortex is more associated with positive emotions and social competence, as well as negative anger emotions. The right frontal cortex is responsible for survival-related emotions, i.e., asymmetric frontal brain activity [3]. Because emotion generation produces an apparent direct response in the brain, electroencephalogram (EEG) emotional state recognition attracts a great deal of research attention. For example, in safe driving, monitoring the driver’s emotional state in real time through EEG and taking relevant measures when negative emotions arise can reduce the occurrence of traffic accidents. In mental health, psychologists can identify the emotional state of visitors through EEG and obtain feedback on psychological counseling, so that they can provide effective counseling programs based on this emotional feedback. In healthcare, detecting emotional changes in patients through EEG can help understand their behavior, especially for those who have certain language barriers. EEG can capture their emotional status in a timely manner, which helps them receive attentive care and communicate with the outside world.

Machine learning has been increasingly applied in EEG emotional state recognition. Some researchers divide EEG signals into several time windows and extract statistical features such as mean, variance, maximum, and median from each window. Then, machine learning models are used to model and analyze the EEG data. Cao et al. [4] analyzed the emotional state of physiological signals generated while watching videos, and further constructed a visual enhanced model to establish the connection between physiological signals and video comments. Yoon et al. [5] designed an EEG feature extraction method using Bayesian weighted log-posterior function, and then used a Bayesian classifier to identify emotional states with maximum feature values. Yi et al. [6] used continuous wavelet transform to obtain EEG temporal and frequency information. They used penalty multifunctional logistic regression to classify emotions. Although a large number of deep learning models, such as spatial dependence multi-task transformer network [7], region aggregation graph convolutional network [8], have been developed, considering that deep learning model requires a large amount of training data and high requirements for computer hardware, this study still conducts research based on machine learning model.

EEG signals are generally classified into five frequency bands, namely δ (1–3 Hz), θ (4–7 Hz), α (8–13 Hz), β (14–30 Hz), and γ (31–50 Hz). Examples of the five frequency bands of EEG signals are shown in Fig. 1. The θ wave activity in the right frontal lobe is enhanced when subjects’ emotions are evoked using pleasant music. Most emotional load may be related to anterior temporal α wave activity. References [9, 10] found that the α wave activity in the frontal region can express emotional potency and emotional intensity. Experimentally, music was used to induce four emotions in subjects: happy, happy, sad and scared, and it was found that listening to music with positive and negative emotions produced stronger EEG activity in the left and right prefrontal regions. It was also shown that higher frequency signals such as β and γ waves have higher accuracy in recognizing emotions, while lower frequency band signals are limited in emotional state recognition [11].

Fig. 1

Examples of multi-frequency bands of EEG signals.

Although these EEG emotional state recognition methods have been effectively applied, most of them use pure EEG data without considering the influence of noise factors. However, since the EEG signal itself is non-linear, non-smooth and very weak, it is easily contaminated by different types of noise, which makes it difficult to guarantee the reliability of the model when recognizing emotions. On the other hand, traditional EEG emotional state recognition methods mostly use splicing EEG signals of multiple frequency bands together, which cannot fully explore the connections between different frequency bands, resulting in poor recognition performance. To address the above issues, we propose a robust multi-frequency band joint dictionary learning with low-rank representation (RMBDLL) for EEG emotional state recognition. The overall system block diagram of the RMBDLL algorithm is illustrated in Fig. 2. RMBDLL consists of two parts: robust dictionary learning and intra-class/inter-class local constraint learning. In robust dictionary learning, we utilize the feature consistency and complementarity of EEG to establish multi-frequency band joint learning of dictionary learning. We use two norm constraints to characterize complex noise and apply low-rank representation in dictionary learning to remove the influence of noise. A clean dictionary can be obtained by removing the complex noise. We think that mining complementary information between EEG signals from different frequency bands will help improve the performance of EEG emotional state recognition. In intra-class/inter-class local constraint learning, based on the local geometric structure of atoms in sub-dictionaries, we construct the intra-class and inter-class local constraint terms, and unify them in the regularization term for joint optimization. 
The experimental results on the SEED dataset show that, compared with the comparison algorithms, the proposed RMBDLL algorithm is robust to noise and maintains the discriminative local structure of the data, and thus achieves better performance in EEG emotional state recognition.

The contributions of this paper are:

RMBDLL is a noise-robust dictionary learning algorithm with low-rank constraint, which combines ℓ₁-norm and ℓ₂-norm to characterize Laplacian distribution and Gaussian distribution noises, respectively. Unlike traditional learning methods that assume noise follows a single distribution, RMBDLL is more robust to complex noise.

RMBDLL is a multi-frequency band dictionary learning algorithm, which learns a corresponding sub-dictionary for each EEG frequency band. According to the consistency of emotional state recognition, the sparse codes learned over the sub-dictionaries are the same, which makes it possible to mine the complementary information between different frequency bands.

In addition to considering the reconstruction performance of EEG signals, with discriminative local structure constraints, RMBDLL utilizes the intra-class similarity and inter-class difference of EEG multi-frequency bands and incorporates a regularization term composed of intra-class and inter-class local constraints to improve the model classification ability.

Fig. 2

Overall system block diagram of the RMBDLL algorithm.

2 Related work

In this section, we briefly review the techniques of sparse representation, dictionary learning, and low-rank representation. For convenience, the main notations used in this study are shown in Table 1.

Table 1
The main notations used in this study

Notation	Description
Y	The sample matrix
Y_i	The sample matrix in the i-th frequency band
D	The dictionary corresponding to Y
D_i	The i-th sub-dictionary corresponding to Y_i
d⁽ⁱ⁾	The i-th dictionary atom in D
X	The sparse coefficients of Y
B	The noise matrix
l⁽ⁱ⁾	The class identification of d⁽ⁱ⁾
W	The graph matrix on the dictionary atoms of the same class
w_ab	The element in W
V	The graph matrix on the dictionary atoms of different classes
v_ab	The element in V
S	The graph matrix built according to Equation (13)
L	The Laplacian matrix constructed on S

2.1 Sparse representation and dictionary learning

The original signal itself has varying degrees of redundancy, indicating that the signal can be sparsified and compressed. If the signal can be converted into another form that is easier to represent, its representation becomes more concise. The purpose of sparse representation [2] is to find a concise representation of the input signals, and the basic framework is written as,

$\min_{X} \|X\|_{0}, \quad \text{s.t.}\ Y = DX$ (1)

where Y ∈ R^{d×n} represents the input samples, D ∈ R^{d×K} represents the dictionary, and X ∈ R^{K×n} represents the corresponding sparse coefficients. ∥X∥₀ is the ℓ₀-norm, which counts the number of nonzero elements in X.

Solving Equation (1) is an NP-hard problem, and the most typical solvers are greedy algorithms, which take the currently optimal choice at each step of the calculation. To improve the convergence efficiency of the iterations, relaxation algorithms relax the ℓ₀-norm in Equation (1) to the ℓ_p-norm, as shown in Equation (2) below,

$\min_{X} \|X\|_{p}, \quad \text{s.t.}\ Y = DX$ (2)

When 0 < p < 1, Equation (2) is a nonconvex optimization problem, which can be solved by the iteratively reweighted least squares (IRLS) algorithm. When p = 1, Equation (2) is a convex optimization problem, and common solvers include the basis pursuit (BP) algorithm, the LASSO algorithm, and the iterative shrinkage-thresholding algorithm [12].
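To make the p = 1 case concrete, the following is a minimal sketch of the iterative shrinkage-thresholding idea for a single sample. It is an illustration only: it solves the penalized (LASSO-style) form 0.5·∥y − Dx∥₂² + λ∥x∥₁ rather than the equality-constrained form of Equation (2), and the names `ista`, `soft_threshold`, `lam`, and `n_iter`, as well as the synthetic data, are assumptions introduced here.

```python
import numpy as np

def soft_threshold(z, tau):
    """Element-wise shrinkage: sign(z) * max(|z| - tau, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def ista(D, y, lam=0.1, n_iter=200):
    """Approximately minimize 0.5*||y - D x||_2^2 + lam*||x||_1."""
    # Step size 1/L, where L is the largest eigenvalue of D^T D.
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)              # gradient of the quadratic term
        x = soft_threshold(x - step * grad, step * lam)
    return x

# Synthetic example (hypothetical data): a 3-sparse code over 50 unit-norm atoms.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [1.5, -2.0, 1.0]
y = D @ x_true
x_hat = ista(D, y, lam=0.05, n_iter=500)
```

With the step size fixed at 1/L, each iteration does not increase the penalized objective, which is why this simple scheme is a common baseline for ℓ₁-regularized coding.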

When performing a classification task, sparse representation uses the sparse coefficients to reconstruct the test sample on the sub-dictionary of each class separately, and the sample is classified according to the reconstruction error,

$\mathrm{label}(y) = \arg\min_{i} e_{i},$ (3)

where $e_{i} = \|y - D_{i} \hat{x}_{i}\|_{2}$ represents the reconstruction error on the i-th sub-dictionary, $\hat{x} = [\hat{x}_{1}, \hat{x}_{2}, \ldots, \hat{x}_{C}]$, $\hat{x}_{i}$ represents the sparse coefficients of the test sample y on the i-th class sub-dictionary, and C is the number of classes.
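The decision rule of Equation (3) can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function name `classify_by_residual` and the array `atom_labels` are hypothetical, and an ordinary least-squares solve stands in for a sparse solver so that the example stays short and deterministic.

```python
import numpy as np

def classify_by_residual(y, D, atom_labels, x_hat):
    """Return the class whose atoms reconstruct y with the smallest error."""
    classes = np.unique(atom_labels)
    errors = []
    for c in classes:
        mask = atom_labels == c               # atoms of class c
        errors.append(np.linalg.norm(y - D[:, mask] @ x_hat[mask]))
    errors = np.array(errors)
    return classes[int(np.argmin(errors))], errors

# Hypothetical data: three classes with two atoms each.
rng = np.random.default_rng(1)
D = rng.standard_normal((10, 6))
atom_labels = np.array([0, 0, 1, 1, 2, 2])
y = D[:, atom_labels == 1] @ np.array([0.7, -1.2])   # sample built from class 1

# Assumption: least squares replaces the sparse coding step for this toy case.
x_hat, *_ = np.linalg.lstsq(D, y, rcond=None)
label, errors = classify_by_residual(y, D, atom_labels, x_hat)
```

Because the sample is generated from class-1 atoms only, the class-1 residual is numerically zero and the rule selects that class.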

Dictionary learning algorithms have shown excellent performance in EEG signal classification. The main purpose of dictionary learning is the sparse reconstruction of EEG signals, which is effective when dealing with low-complexity data. However, when the EEG signal contains noise, the performance of traditional dictionary learning is greatly degraded. In practical applications, EEG signals are often contaminated by various types of noise, and traditional dictionary learning may not be able to accurately separate such complex noise from the EEG signals.

2.2 Low-rank representation

The rank of a matrix is the maximum number of linearly independent columns (or rows). For an m×n matrix A, the rank of A is r if A has at least one nonzero minor of order r and all minors of order r + 1 are zero. Thus, the rank reflects the richness of the information contained in the samples, and the original samples can in theory be restored by a linear combination of only r linearly independent features. Given a data matrix Y containing noise or corruption, separating the noise matrix B ∈ R^{d×n} and the low-rank matrix X from Y can be transformed into a regularized rank minimization problem [13],

$\min_{X, B} \mathrm{rank}(X) + λ \|B\|_{p},$ (4)

where λ is a balance parameter. Since rank(DX) ≤ rank(X), DX can be considered a low-rank recovery of Y.

Due to the discreteness of the rank function, the rank minimization problem is difficult to solve. Replacing the rank constraint with the nuclear norm turns Equation (4) into a convex optimization problem. With the ℓ₁-norm used to constrain B, Equation (4) is represented as,

$\min_{X, B} \|X\|_{*} + λ \|B\|_{1},$ (5)

where ∥B∥₁ makes the noise as sparse as possible, reducing the interference of noise, and ∥X∥_* is the nuclear norm, i.e., the sum of the singular values of the matrix X.

3 Robust multi-frequency band joint dictionary learning with low-rank representation

3.1 Robust dictionary learning

In practical applications, EEG signals are often contaminated by different types of noise, such as Laplacian noise, which is characterized by the ℓ₁-norm, or Gaussian noise, which is characterized by the ℓ₂-norm. To separate complex noise from EEG signals, the RMBDLL algorithm not only focuses on the reconstruction ability of the sub-dictionaries in different frequency bands, but also aims to establish a robust classification model,

$\underset{D, X, B}{\arg\min} \|Y - DX - B\|_{F}^{2} + α \|X\|_{*} + β \|B\|_{1},$ (6)

where α and β are regularization parameters. E = Y − DX − B ∈ R^{d×n} and B ∈ R^{d×n} represent the noise components following Gaussian and Laplacian distributions, respectively, that are expected to be separated from the EEG feature Y. By modeling a mixed noise distribution, the RMBDLL algorithm is more robust than dictionary learning that assumes a single noise distribution, and it makes the learned dictionary cleaner.

3.2 Intra-class and inter-class local constraint term

A column vector of a dictionary is called a dictionary atom. Let d^(a) and d^(b) represent the a-th and b-th dictionary atoms in dictionary D, respectively, where a, b ∈ {1, 2, . . . , K}. l^(a) and l^(b) are the class labels of d^(a) and d^(b), respectively. If l^(a) = l^(b), d^(a) and d^(b) belong to the same class; if l^(a) ≠ l^(b), d^(a) and d^(b) belong to different classes. We introduce intra-class and inter-class local constraints to obtain discriminative coding coefficients. The intra-class local constraint for the encoding coefficients is constructed according to the local geometric structure of dictionary atoms of the same class,

$\min_{X} \frac{1}{2} \sum_{a = 1}^{m} \sum_{b = 1}^{m} (x^{(a)} - x^{(b)})^{2} w_{ab},$ (7)

where x^(a) and x^(b) are the encoding coefficients (also called contour vectors) corresponding to d^(a) and d^(b). The graph matrix W is built on the dictionary atoms of the same class, and its element w_ab is defined as,

$w_{ab} = \begin{cases} \|d^{(a)} - d^{(b)}\|, & d^{(b)} \notin KNN(d^{(a)}) \text{ or } d^{(a)} \notin KNN(d^{(b)}) \\ \exp\left(-\frac{\|d^{(a)} - d^{(b)}\|}{σ}\right), & d^{(b)} \in KNN(d^{(a)}) \text{ or } d^{(a)} \in KNN(d^{(b)}) \end{cases}$ (8)

where the function KNN(z) returns the k nearest neighbors of z. W is used to control the intra-class similarity of dictionary atoms of the same class.

From Equation (8), we can see that if the similarity between two atoms is low, w_ab is large. Conversely, if the similarity between two atoms is high, w_ab is small. Because the encoding coefficients of similar atoms need to be as close as possible, one needs to obtain a small w_ab to ensure high similarity between x^(a) and x^(b).
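As an illustration of how a graph matrix of the form in Equation (8) might be assembled for the atoms of one class, consider the sketch below. Several points are assumptions made here and not stated in the text: the two overlapping conditions are read as "neighbors if either atom is among the other's k nearest neighbors, otherwise non-neighbors", the diagonal is set to zero, and the names `intra_class_weights`, `k`, and `sigma` are hypothetical.

```python
import numpy as np

def intra_class_weights(atoms, k=2, sigma=1.0):
    """atoms: (d, m) array whose columns are dictionary atoms of one class."""
    m = atoms.shape[1]
    # Pairwise Euclidean distances between atoms, shape (m, m).
    dist = np.linalg.norm(atoms[:, :, None] - atoms[:, None, :], axis=0)
    # Indices of the k nearest neighbours of each atom, excluding itself.
    masked = dist + np.diag(np.full(m, np.inf))
    nn_idx = np.argsort(masked, axis=1)[:, :k]
    is_nn = np.zeros((m, m), dtype=bool)
    is_nn[np.repeat(np.arange(m), k), nn_idx.ravel()] = True
    neighbours = is_nn | is_nn.T              # assumption: either direction counts
    # Neighbours get exp(-dist/sigma); non-neighbours get the raw distance.
    W = np.where(neighbours, np.exp(-dist / sigma), dist)
    np.fill_diagonal(W, 0.0)                  # assumption: no self-loops
    return W

rng = np.random.default_rng(0)
atoms = rng.standard_normal((5, 6))           # hypothetical: 6 atoms in R^5
W = intra_class_weights(atoms, k=2, sigma=1.0)
```

Under this reading W is symmetric, which is convenient later when a Laplacian matrix is built from the graph weights.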

The inter-class local constraint for the encoding coefficients is constructed according to the local geometric structure of dictionary atoms of different classes,

$\min_{X} \frac{1}{2} \sum_{a = 1}^{m} \sum_{b = 1}^{m} (x^{(a)} - x^{(b)})^{2} v_{ab},$ (9)

The graph matrix V is built on the dictionary atoms of different classes, and its element v_ab is defined as,

$v_{ab} = \begin{cases} 0, & \|d^{(a)} - d^{(b)}\| \geq θ \\ \max(\|d^{(a)} - d^{(b)}\| - θ, -\|d^{(a)} - d^{(b)}\|), & \|d^{(a)} - d^{(b)}\| < θ \end{cases}$ (10)

where θ is a weight threshold. If ∥d^(a) − d^(b)∥ ≥ θ, the distance between two heterogeneous dictionary atoms is relatively large, so we do not need to constrain the distance between x^(a) and x^(b) and can set v_ab = 0. If ∥d^(a) − d^(b)∥ < θ, the distance between two heterogeneous dictionary atoms is relatively small; to avoid high similarity between the coding coefficients of different classes, we use the max function to maintain a large difference between x^(a) and x^(b).
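The piecewise rule for v_ab described above can be sketched numerically as follows. The function name `inter_class_weights` and the toy one-dimensional atoms are hypothetical; the rule itself follows the two cases in the text (zero weight when the distance is at least θ, the max expression otherwise).

```python
import numpy as np

def inter_class_weights(atoms_a, atoms_b, theta):
    """Weights between atoms of two different classes.

    atoms_a: (d, m_a) atoms of one class; atoms_b: (d, m_b) atoms of another.
    Returns an (m_a, m_b) array of v_ab values.
    """
    dist = np.linalg.norm(atoms_a[:, :, None] - atoms_b[:, None, :], axis=0)
    close = np.maximum(dist - theta, -dist)   # used only where dist < theta
    return np.where(dist < theta, close, 0.0)

# Toy example: one atom at 0 and three atoms at distances 0.3, 0.8 and 2.0.
V = inter_class_weights(np.array([[0.0]]),
                        np.array([[0.3, 0.8, 2.0]]),
                        theta=1.0)
```

For θ = 1 the three weights are about −0.3, −0.2, and 0: close atoms of different classes receive negative weights, so minimizing the weighted sum pushes their coefficients apart, while distant atoms are left unconstrained.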

Combining Equations (8) and (10), the intra-class and inter-class local constraints are unified in the regularization term as,

$\min_{X} \frac{1}{2} \sum_{a = 1}^{m} \sum_{b = 1}^{m} [η_{a, b} (x^{(a)} - x^{(b)})^{2} w_{ab} + (1 - η_{a, b}) (x^{(a)} - x^{(b)})^{2} v_{ab}],$ (11)

where $η_{a, b} = \begin{cases} 1, & \text{if } l^{(a)} = l^{(b)} \\ 0, & \text{otherwise} \end{cases}$, so that the intra-class weight w_ab applies to atoms of the same class and the inter-class weight v_ab applies to atoms of different classes.

After simplification, Equation (11) becomes,

$\min_{X} \frac{1}{2} \sum_{a = 1}^{m} \sum_{b = 1}^{m} S_{a, b} (x^{(a)} - x^{(b)})^{2},$ (12)

where the element S_{a,b} of S is defined as,

$S_{a, b} = \begin{cases} w_{ab}, & \text{if } l^{(a)} = l^{(b)} \\ v_{ab}, & \text{otherwise} \end{cases}$ (13)

Defining the Laplacian matrix L constructed on the matrix S as L = P − S, where P = diag(p₁, p₂, . . . , p_m) and $p_{a} = \sum_{b = 1}^{m} S_{a, b}$, Equation (12) can be written as,

$\min_{X} tr(X^{T} L X),$ (14)

where diag(·) and tr(·) denote a diagonal matrix and the trace of a matrix, respectively.
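The equivalence between the pairwise form of Equation (12) and the trace form of Equation (14) can be checked numerically. The sketch below assumes that S is symmetric and that x^(a) is the a-th row of X; the random matrices are hypothetical test data.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n = 6, 4

# Hypothetical symmetric graph matrix S with zero diagonal.
S = rng.random((K, K))
S = (S + S.T) / 2
np.fill_diagonal(S, 0.0)

X = rng.standard_normal((K, n))               # row a plays the role of x^(a)

# Laplacian L = P - S with P = diag(row sums of S).
P = np.diag(S.sum(axis=1))
L = P - S

# Pairwise form: 0.5 * sum_{a,b} S_ab * ||x^(a) - x^(b)||^2.
pairwise = 0.5 * sum(S[a, b] * np.sum((X[a] - X[b]) ** 2)
                     for a in range(K) for b in range(K))

# Trace form: tr(X^T L X).
trace_form = np.trace(X.T @ L @ X)
```

The two quantities agree up to floating-point error, which is why the regularizer can be handled as a single quadratic term in X during optimization.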

3.3 The objective function and its optimization

Suppose there is an EEG dataset Y divided into q frequency bands {Y₁, Y₂, . . . , Y_q}, where $Y_{i} = [y_{i}^{(1)}, y_{i}^{(2)}, \ldots, y_{i}^{(n)}] \in R^{d \times n}$ is the subset of sample features in the i-th frequency band. {D₁, D₂, . . . , D_q} is the dictionary set corresponding to Y, and $D_{i} = [d_{i}^{(1)}, d_{i}^{(2)}, \ldots, d_{i}^{(K)}] \in R^{d \times K}$ is the i-th sub-dictionary corresponding to Y_i, where $d_{i}^{(j)}$ is the j-th dictionary atom in D_i. For EEG emotion data, the encoding coefficients of EEG signals in the various frequency bands should be kept as consistent as possible. Therefore, we assume that the encoding coefficients corresponding to all frequency bands are the same. The encoding coefficients are denoted as $X = [x_{1}, x_{2}, \ldots, x_{n}] \in R^{K \times n}$, where $x_{s}$ is the encoding coefficient of the s-th sample.

The EEG signals are rich in emotion information, and at the same time, the correlation between frequency bands carries a lot of redundant information and noise. The low-rank model can decompose the EEG signals into a low-rank representation over a learned dictionary plus sparse noise. Based on the robust dictionary learning and the intra-class/inter-class local constraint in a single frequency band, we extend this idea to multiple EEG frequency bands and develop the RMBDLL algorithm. The objective function of RMBDLL is defined as,

$\underset{D, B, X}{\arg\min} \sum_{i = 1}^{q} \left\{ \frac{1}{2} \|Y_{i} - D_{i} X - B_{i}\|_{F}^{2} + α \|X\|_{*} + β \|B_{i}\|_{1} + \frac{γ}{2} tr(X^{T} L_{i} X) \right\}.$ (15)

To facilitate the solution of X, the alternating direction method of multipliers (ADMM) can be used to solve Equation (15). First, an auxiliary variable J is introduced, and Equation (15) is rewritten as follows,

$\underset{D, B, X}{\arg\min} \sum_{i = 1}^{q} \left\{ \frac{1}{2} \|Y_{i} - D_{i} X - B_{i}\|_{F}^{2} + α \|J\|_{*} + β \|B_{i}\|_{1} + \frac{γ}{2} tr(X^{T} L_{i} X) \right\}, \quad \text{s.t.}\ X = J$ (16)

The augmented Lagrangian function corresponding to Equation (16), which absorbs the constraint X = J, is,

$\underset{D, B, X}{\arg\min} \sum_{i = 1}^{q} \left\{ \frac{1}{2} \|Y_{i} - D_{i} X - B_{i}\|_{F}^{2} + α \|J\|_{*} + β \|B_{i}\|_{1} + \frac{γ}{2} tr(X^{T} L_{i} X) + \frac{μ}{2} \left\|X - J + \frac{C}{μ}\right\|_{F}^{2} - \frac{1}{2 μ} \|C\|_{F}^{2} \right\},$ (17)

where C is the Lagrange multiplier and μ is the penalty parameter.

Step 1. Updating X by fixing D, B and J. Equation (17) becomes,

$\underset{X}{\arg\min} \sum_{i = 1}^{q} \left\{ \|Y_{i} - D_{i} X - B_{i}\|_{F}^{2} + γ\, tr(X^{T} L_{i} X) + μ \left\|X - J + \frac{C}{μ}\right\|_{F}^{2} \right\},$ (18)

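Setting the derivative of Equation (18) with respect to X to zero (with each L_i symmetric) gives the closed-form solution of X,

$X = \left( \sum_{i = 1}^{q} (D_{i}^{T} D_{i} + μ I + γ L_{i}) \right)^{-1} \sum_{i = 1}^{q} (D_{i}^{T} (Y_{i} - B_{i}) + μ J - C).$ (19)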
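A minimal sketch of the X-update is given below. It solves the stationarity condition of Equation (18) directly, i.e., the linear system obtained by setting the gradient with respect to X to zero, assuming each L_i is symmetric and the summed system matrix is invertible. The function name `update_X` and all data in the example are hypothetical.

```python
import numpy as np

def update_X(Ys, Ds, Bs, Ls, J, C, mu, gamma):
    """Solve [sum_i (D_i^T D_i + mu I + gamma L_i)] X = sum_i (D_i^T (Y_i - B_i) + mu J - C)."""
    K = J.shape[0]
    lhs = np.zeros((K, K))
    rhs = np.zeros_like(J)
    for Y, D, B, L in zip(Ys, Ds, Bs, Ls):
        lhs += D.T @ D + mu * np.eye(K) + gamma * L
        rhs += D.T @ (Y - B) + mu * J - C
    # A linear solve is preferred over forming the inverse explicitly.
    return np.linalg.solve(lhs, rhs)

# Hypothetical problem sizes: q bands, d features, K atoms, n samples.
rng = np.random.default_rng(0)
q, d, K, n = 3, 8, 5, 12
Ys = [rng.standard_normal((d, n)) for _ in range(q)]
Ds = [rng.standard_normal((d, K)) for _ in range(q)]
Bs = [np.zeros((d, n)) for _ in range(q)]
Ls = []
for _ in range(q):
    S = rng.random((K, K))
    S = (S + S.T) / 2
    np.fill_diagonal(S, 0.0)
    Ls.append(np.diag(S.sum(axis=1)) - S)     # symmetric graph Laplacian
J = rng.standard_normal((K, n))
C = np.zeros((K, n))
mu, gamma = 1.0, 0.1
X = update_X(Ys, Ds, Bs, Ls, J, C, mu, gamma)
```

A quick sanity check is that the gradient of the Step 1 objective vanishes at the returned X.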

Step 2. Updating J by fixing D, B and X. Equation (17) becomes,

$\underset{J}{\arg \min} (α {‖ J ‖}_{*} + \frac{μ}{2} {‖ J - (X + \frac{C}{μ}) ‖}_{F}^{2}) .$ (20)

The singular value thresholding algorithm [13] is used to update J,

$J = I_{\frac{α}{μ}} \left(X + \frac{C}{μ}\right),$ (21)

where $I_{\frac{α}{μ}} (\cdot)$ is the singular value shrinkage operator.

Step 3. Updating B_i by fixing D, J and X. Equation (17) becomes,

$\underset{B_{i}}{\arg \min} \sum_{i = 1}^{q} {\frac{1}{2} {‖ Y_{i} - D_{i} X - B_{i} ‖}_{F}^{2} + β {‖ B_{i} ‖}_{1}} .$ (22)

Using the soft-thresholding operator S_β (·), B_i can be solved by,

$B_{i} = S_{β} (Y_{i} - D_{i} X) .$ (23)
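The two shrinkage operators used in Steps 2 and 3 can be sketched as follows. The function names are hypothetical; in the notation of the text, the J-update would correspond to `singular_value_threshold(X + C / mu, alpha / mu)` and the B_i-update to `soft_threshold(Y_i - D_i @ X, beta)`.

```python
import numpy as np

def singular_value_threshold(M, tau):
    """Shrink the singular values of M by tau (proximal operator of tau*||.||_*)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def soft_threshold(R, tau):
    """Element-wise shrinkage (proximal operator of tau*||.||_1)."""
    return np.sign(R) * np.maximum(np.abs(R) - tau, 0.0)

# Toy inputs chosen so the results can be checked by hand.
M = np.diag([3.0, 1.0, 0.2])
J = singular_value_threshold(M, tau=0.5)      # singular values become 2.5, 0.5, 0

R = np.array([[0.05, -0.4],
              [1.2, -0.08]])
B = soft_threshold(R, tau=0.1)                # small entries are set to zero
```

Singular value shrinkage lowers the rank (here from 3 to 2), while element-wise shrinkage zeroes small residual entries so that only large, sparse deviations are attributed to B_i.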

Step 4. Updating $D_{i}$ by fixing J, B and X. Equation (17) becomes,

$\underset{D_{i}}{arg min} \sum_{i = 1}^{q} {{∥ Y_{i} - D_{i} X - B_{i} ∥}_{F}^{2}} .$ (24)

The closed-form solution of $D_{i}$ is,

$D_{i} = (Y_{i} - B_{i}) X^{T} (X (X)^{T})^{- 1} .$ (25)
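The least-squares dictionary update of Equation (25) can be sketched as below. The optional `ridge` argument is a hypothetical addition, not part of the text: it is meant for cases where X Xᵀ is close to singular, which can happen when X is driven toward low rank; with the default value of zero the function reproduces Equation (25).

```python
import numpy as np

def update_dictionary(Y, B, X, ridge=0.0):
    """Return D = (Y - B) X^T (X X^T + ridge*I)^{-1} via a linear solve."""
    G = X @ X.T + ridge * np.eye(X.shape[0])  # symmetric K x K Gram matrix
    # Solve G D^T = X (Y - B)^T instead of inverting G explicitly.
    return np.linalg.solve(G, X @ (Y - B).T).T

# Hypothetical noiseless check: recover a known dictionary.
rng = np.random.default_rng(0)
D_true = rng.standard_normal((8, 4))
X = rng.standard_normal((4, 30))
B = np.zeros((8, 30))
B[0, 0] = 5.0                                 # one sparse corruption
Y = D_true @ X + B
D_hat = update_dictionary(Y, B, X)
```

When the sparse component B is removed exactly, the update recovers the generating dictionary, which illustrates why separating B before updating D_i yields a cleaner dictionary.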

With the four steps above, the variables X, J, B, and D are updated alternately until convergence.

3.4 Classification

Based on the learned multi-band sub-dictionaries D₁, D₂, . . . , D_q, we encode the tested EEG signal and classify it using the learned encoding coefficients. The classification steps are as follows:

Given a test sample y, calculate its encoding coefficients for each frequency band, where the encoding coefficient learned from the j-th frequency band dictionary is,

${\tilde{x}}_{j} = (D_{j}^{T} D_{j})^{- 1} D_{j}^{T} y, 1 \leq j \leq q .$ (26)

Calculate the distance d_s between the encoding coefficients of the test sample y and the encoding coefficients of the s-th training sample,

$d_{s} = \|\tilde{x}_{1} - x_{s, 1}\| + \|\tilde{x}_{2} - x_{s, 2}\| + \ldots + \|\tilde{x}_{q} - x_{s, q}\|,$ (27)

Classify the test EEG sample y: y is assigned the class of the training sample with the minimum d_s. The classifier f(y) is,

$f (y) = arg min_{s} {d_{s}}, s = 1, 2, . . ., n$ (28)
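The three classification steps can be sketched as follows. Two assumptions are made for illustration: the test sample is supplied as one feature vector per frequency band, and the per-band training coefficients x_{s,j} are passed as a list of matrices (which may all be the same shared X). The names `encode`, `classify`, `y_bands`, and `X_train_bands` are hypothetical.

```python
import numpy as np

def encode(D, y):
    """Least-squares code; equals (D^T D)^{-1} D^T y when D has full column rank."""
    return np.linalg.lstsq(D, y, rcond=None)[0]

def classify(y_bands, Ds, X_train_bands, train_labels):
    """Nearest training sample in coefficient space, summed over bands."""
    dists = np.zeros(len(train_labels))
    for y_j, D_j, X_j in zip(y_bands, Ds, X_train_bands):
        x_tilde = encode(D_j, y_j)                               # per-band code
        dists += np.linalg.norm(X_j - x_tilde[:, None], axis=0)  # distance to each sample
    return train_labels[int(np.argmin(dists))], dists

# Hypothetical example: the test sample is generated from training sample 2.
rng = np.random.default_rng(0)
q, d, K, n = 3, 8, 4, 5
Ds = [rng.standard_normal((d, K)) for _ in range(q)]
X_train = rng.standard_normal((K, n))
train_labels = np.array([0, 1, 2, 0, 1])
y_bands = [D @ X_train[:, 2] for D in Ds]
label, dists = classify(y_bands, Ds, [X_train] * q, train_labels)
```

In this toy case the per-band codes coincide with the coefficients of training sample 2, so its distance is numerically zero and its label is returned.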

Algorithm 1 gives a more detailed description of the training and testing steps.

Algorithm 1. The RMBDLL algorithm
Input: Training data Y;
Output: The classifier f(y);
Initialize $D_{i}$ using LC-SVD algorithm [15];
// Training
while not converged do
  for i = 1 to q do
    Construct the graph matrices W and V via Equations (8) and (10), respectively;
    Update X via Equation (19);
    Update J via Equation (21);
    Update $B_{i}$ via Equation (23);
    Update $D_{i}$ via Equation (25);
    Update C = C + μ (X - J);
    Update μ = min(μ_max, ρμ), where ρ is the learning rate;
  end for
end while
// Testing
Calculate the encoding coefficients of testing sample y for each frequency band via Equation (26);
Calculate the distance d_s between y and each training sample via Equation (27);
Construct the classifier f(y) via Equation (28)

4 Experiment

4.1 Experimental data and settings

This article uses the SJTU Emotion EEG Dataset (SEED) [14] provided by Shanghai Jiao Tong University, which includes 15 participants. Each participant watched 15 movie clips in each experiment to induce different emotions (5 positive, 5 negative, and 5 neutral clips). An example EEG sample from the SEED dataset is shown in Fig. 3. Each experiment therefore comprised 15 corresponding trials. In a trial, there was a 5-second start prompt, followed by a movie playback time of 4 minutes, a self-evaluation time of 45 seconds, and a rest time of 15 seconds. The selected films had been tested and evaluated, and effectively induced the target emotions. Each participant conducted three rounds of experiments with a time interval of one week between rounds, for a total of 45 trials per participant. In each emotion-inducing experiment, 62 channels of EEG signals were recorded from different positions of the brain according to the international 10–20 standard system.

Fig. 3

The example EEG sample in SEED dataset.

In order to avoid the impact of repeated experiments on the intensity of evoked emotions, the EEG data was collected during three rounds of experiments for all subjects. Feature extraction was performed on each single electrode lead, resulting in a sample set of size 15×15×62, where the first 15 is the number of participants, the second 15 is the number of movie clips watched by each participant in an experiment, and 62 is the number of spatial electrode leads. For each emotional state, the sample size obtained is 15×5×62 (4650). Since the intensity of EEG rhythmic activity reflects different emotional states of the brain, different EEG rhythms are closely related to emotional states. In this experiment, we use a digital band-pass filter to obtain power spectral density (PSD) features in five different frequency bands: δ, θ, α, β, and γ. We add two types of noise to the original EEG signals: random noise and Gaussian noise. Consistent with Tables 2 and 3, the noise levels are 10%, 20%, and 30%.

The comparison algorithms include the classical dictionary learning algorithm LC-SVD [15], the correlation-based label consistent K-SVD algorithm CLC-KSVD [16], the multi-resolution dictionary learning algorithm MRDL [17], the multi-view SVM algorithm MV-SVM [18], the noise-insensitive TSK fuzzy model PCB-ICL-TSK [19], and the multi-layer joint sparse regularized dictionary learning algorithm M-JSDL [20]. The number of atoms in each sub-dictionary is selected from {10, 20, 30, 40, 50}. The Gaussian kernel is used in MV-SVM, and the kernel parameter is selected from {0.01, 0.1, ..., 100}. The regularization parameters in the comparison algorithms are selected from {0.001, 0.01, ..., 10}. The number of fuzzy rules in PCB-ICL-TSK is selected from {20, 30, 40, 50}. All parameters in RMBDLL are selected from {1e-5, 5e-5, ..., 5e-3}. The number of layers in M-JSDL is three. We randomly select 80% of the samples for training and use the remaining 20% for testing. We repeat the experiments 10 times and record the classification accuracy for performance evaluation.

4.2 Performance comparison

Tables 2 and 3 present the recognition results on the SEED dataset with random noise and Gaussian noise, respectively. As the noise level increases, the recognition accuracy of all algorithms declines. However, even in the presence of noise, RMBDLL obtains the highest recognition accuracy among all comparison algorithms, showing the efficacy and stability of the RMBDLL algorithm. When the intensity of random noise increases uniformly from 10% to 20% and 30%, the recognition accuracies of the RMBDLL algorithm are 83.54%, 80.91%, and 78.06%, with an average accuracy of 80.84%. The average recognition accuracies of LC-SVD, CLC-KSVD, and MV-SVM are 71.35%, 72.62%, and 73.96%, respectively. The multi-view algorithm MV-SVM treats each frequency band as a view and explores the internal connections between different frequency bands better than traditional dictionary learning, but it is sensitive to noise. The average recognition accuracy of MRDL is 76.57% with random noise. As a multi-resolution dictionary learning algorithm, MRDL treats each frequency band as a resolution and can effectively mine information hidden between EEG frequency bands. However, it is also sensitive to noise. Both PCB-ICL-TSK and M-JSDL are robust machine learning models, with average recognition accuracies of 77.32% and 78.51% with random noise, respectively. They are still 3.52 and 2.32 percentage points lower than the RMBDLL algorithm, respectively. The RMBDLL algorithm considers the impact of different types of noise and can accurately separate complex noise in EEG signals. Therefore, the RMBDLL algorithm is an effective robust dictionary learning model for EEG emotional state recognition in noisy scenes.
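The averages and gaps quoted above follow directly from the per-level accuracies in Table 2. The short check below recomputes them; the dictionary `acc` simply transcribes the random-noise accuracies for the 10%, 20%, and 30% levels, and the variable names are illustrative.

```python
# Random-noise accuracies (10%, 20%, 30%) transcribed from Table 2.
acc = {
    "LC-SVD": [76.87, 71.73, 65.45],
    "CLC-KSVD": [77.20, 72.79, 67.87],
    "MV-SVM": [78.44, 74.96, 68.48],
    "MRDL": [80.77, 76.71, 72.23],
    "PCB-ICL-TSK": [81.09, 77.10, 73.77],
    "M-JSDL": [81.53, 78.79, 75.22],
    "RMBDLL": [83.54, 80.91, 78.06],
}

# Mean accuracy over the three noise levels for each algorithm.
avg = {name: sum(vals) / len(vals) for name, vals in acc.items()}

# Gap (in percentage points) between RMBDLL and every other algorithm.
gap = {name: avg["RMBDLL"] - a for name, a in avg.items() if name != "RMBDLL"}

for name in acc:
    print(f"{name:12s} mean = {avg[name]:.2f}")
```

The gaps are computed from the unrounded means, which is why the difference to M-JSDL rounds to 2.32 rather than 2.33.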

Table 2
Recognition accuracy (standard deviation) of all algorithms with random noise

Algorithm	10%	20%	30%
LC-SVD	76.87 (6.61)	71.73 (7.08)	65.45 (7.06)
CLC-KSVD	77.20 (7.34)	72.79 (7.39)	67.87 (8.85)
MV-SVM	78.44 (7.20)	74.96 (6.68)	68.48 (6.52)
MRDL	80.77 (8.95)	76.71 (6.43)	72.23 (8.11)
PCB-ICL-TSK	81.09 (6.39)	77.10 (8.07)	73.77 (7.10)
M-JSDL	81.53 (6.02)	78.79 (7.18)	75.22 (6.82)
RMBDLL	83.54 (5.60)	80.91 (6.26)	78.06 (5.77)

Table 3

Recognition accuracy (standard deviation) of all algorithms with Gaussian noise

	10%		20%		30%
LC-SVD	76.66	(6.19)	71.36	(7.16)	66.10	(6.57)
CLC-KSVD	77.22	(6.70)	73.28	(7.40)	68.02	(7.21)
MV-SVM	78.39	(7.69)	74.13	(5.97)	69.60	(7.30)
MRDL	81.48	(7.30)	76.18	(6.91)	71.67	(6.39)
PCB-ICL-TSK	80.68	(6.57)	76.65	(8.54)	73.89	(6.33)
M-JSDL	81.72	(6.00)	78.18	(6.28)	75.01	(6.01)
RMBDLL	83.11	(6.25)	80.23	(5.39)	78.57	(5.92)
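
The averages and margins quoted for the random-noise setting can be reproduced directly from Table 2. The following is a minimal sketch; the function names `mean_accuracy` and `margin_over` are illustrative and do not come from the original work. The same functions can be applied to the Gaussian-noise values in Table 3.

```python
# Accuracies (%) copied from Table 2 (random noise at 10%, 20%, 30%).
table2 = {
    "LC-SVD": [76.87, 71.73, 65.45],
    "CLC-KSVD": [77.20, 72.79, 67.87],
    "MV-SVM": [78.44, 74.96, 68.48],
    "MRDL": [80.77, 76.71, 72.23],
    "PCB-ICL-TSK": [81.09, 77.10, 73.77],
    "M-JSDL": [81.53, 78.79, 75.22],
    "RMBDLL": [83.54, 80.91, 78.06],
}


def mean_accuracy(table):
    """Average each algorithm's accuracy over the three noise levels."""
    return {name: sum(vals) / len(vals) for name, vals in table.items()}


def margin_over(table, reference="RMBDLL"):
    """Gap, in percentage points, between the reference and each other method."""
    means = mean_accuracy(table)
    return {
        name: means[reference] - value
        for name, value in means.items()
        if name != reference
    }


means = mean_accuracy(table2)
margins = margin_over(table2)
for name in table2:
    print(f"{name:12s} mean = {means[name]:.2f}")
```

The margins are computed from the unrounded means, which is why the gap to M-JSDL comes out as 2.32 rather than the 2.33 obtained by subtracting the rounded averages.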

Figures 4 and 5 present the confusion matrices of all algorithms with random noise and Gaussian noise, respectively. From the perspective of the three types of emotions, positive emotions have the highest recognition rate, followed by neutral emotions, and negative emotions have the lowest recognition accuracy. This result indicates that the EEG patterns of negative and neutral emotions may be similar. From the perspective of algorithm classification accuracy, whether with random noise or with Gaussian noise, the RMBDLL algorithm has the highest classification accuracy for the three types of emotions. The RMBDLL algorithm learns a sub-dictionary on each EEG frequency band and constructs a regularization term that constrains the intra-class similarity and inter-class differences of EEG signals. These factors greatly improve the recognition accuracy of RMBDLL.

Fig. 4

Confusion matrices of all algorithms with random noise, (a) LC-SVD, (b) CLC-KSVD, (c) MV-SVM, (d) MRDL, (e) PCB-ICL-TSK, (f) M-JSDL, (g) RMBDLL.

Fig. 5

Confusion matrices of all algorithms with Gaussian noise, (a) LC-SVD, (b) CLC-KSVD, (c) MV-SVM, (d) MRDL, (e) PCB-ICL-TSK, (f) M-JSDL, (g) RMBDLL.
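
To make the per-emotion reading of a confusion matrix concrete, the sketch below computes the recognition rate of each class and the most frequent confusion. The counts are hypothetical and chosen only for illustration; they are not the values shown in Figs. 4 and 5. The row and column convention (rows are true classes, columns are predicted classes) is also an assumption.

```python
import numpy as np


def per_class_recall(cm):
    """Fraction of each true class (row) that is predicted correctly."""
    cm = np.asarray(cm, dtype=float)
    return np.diag(cm) / cm.sum(axis=1)


def most_confused_pair(cm):
    """Indices (true, predicted) of the largest off-diagonal entry."""
    off = np.asarray(cm, dtype=float).copy()
    np.fill_diagonal(off, -np.inf)  # ignore correct predictions
    true_idx, pred_idx = np.unravel_index(np.argmax(off), off.shape)
    return int(true_idx), int(pred_idx)


labels = ["negative", "neutral", "positive"]

# Hypothetical counts for illustration only (rows: true, columns: predicted).
cm = np.array([
    [70, 20, 10],
    [15, 75, 10],
    [5, 10, 85],
])

recall = per_class_recall(cm)
overall = np.trace(cm) / cm.sum()
t, p = most_confused_pair(cm)

for name, r in zip(labels, recall):
    print(f"{name:9s} recall = {r:.2f}")
print(f"overall accuracy = {overall:.4f}")
print(f"most frequent confusion: {labels[t]} predicted as {labels[p]}")
```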

4.3 Model analysis

Tables 4–5 show the recognition accuracies of the RMBDLL algorithm using different frequency band combinations with random noise and Gaussian noise, respectively. It can be found that the RMBDLL algorithm achieves the best performance when all five frequency bands are used. This indicates that multi-frequency bands can complement each other in EEG emotional state recognition tasks. At the same time, it also indicates that the RMBDLL algorithm can retain the main feature information of EEG emotions by using multi-frequency band joint dictionary learning.

Table 4
Recognition accuracy (standard deviation) of RMBDLL using different frequency bands with random noise

	10%		20%		30%
β + γ	79.03	(6.19)	75.58	(5.15)	72.17	(6.57)
α + β + γ	80.25	(5.72)	77.32	(6.92)	73.48	(6.00)
θ + α + β + γ	81.47	(5.69)	78.41	(6.21)	75.72	(5.83)
δ + θ + α + β + γ	83.54	(5.60)	80.91	(6.26)	78.06	(5.74)

Table 5

Recognition accuracy (standard deviation) of RMBDLL using different frequency bands with Gaussian noise

	10%		20%		30%
β + γ	79.05	(5.86)	73.29	(5.94)	71.93	(5.85)
α + β + γ	80.51	(6.91)	77.42	(5.70)	73.40	(6.42)
θ + α + β + γ	81.96	(6.53)	79.84	(6.06)	76.82	(6.58)
δ + θ + α + β + γ	83.11	(6.25)	80.23	(5.39)	78.57	(5.92)
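
The claim that the bands complement each other can be checked from Tables 4 and 5 by computing the accuracy gained each time a band is added. This is a small illustrative sketch; the helper names `gains` and `is_monotonic` are not from the original work.

```python
# Accuracies (%) from Tables 4 and 5, ordered from two bands to all five bands.
# Each inner list holds the values at 10%, 20%, and 30% noise.
combos = ["β+γ", "α+β+γ", "θ+α+β+γ", "δ+θ+α+β+γ"]
random_noise = [
    [79.03, 75.58, 72.17],
    [80.25, 77.32, 73.48],
    [81.47, 78.41, 75.72],
    [83.54, 80.91, 78.06],
]
gaussian_noise = [
    [79.05, 73.29, 71.93],
    [80.51, 77.42, 73.40],
    [81.96, 79.84, 76.82],
    [83.11, 80.23, 78.57],
]


def gains(rows):
    """Accuracy gained (percentage points) by each added band, per noise level."""
    return [
        [round(after - before, 2) for before, after in zip(prev, curr)]
        for prev, curr in zip(rows, rows[1:])
    ]


def is_monotonic(rows):
    """True if every added band improves accuracy at every noise level."""
    return all(g > 0 for step in gains(rows) for g in step)


for name, step in zip(combos[1:], gains(random_noise)):
    print(f"random noise, up to {name}: {step}")
print("monotonic (random):", is_monotonic(random_noise))
print("monotonic (Gaussian):", is_monotonic(gaussian_noise))
```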

4.4 Ablation experiment

To evaluate the effectiveness of the low-rank term $\|X\|_{*}$, the ℓ₁-norm constraint term $\|B_{i}\|_{1}$, and the intra-class and inter-class local constraint term $\mathrm{tr}(XL_{i}X)$ in the RMBDLL algorithm, Fig. 6 illustrates the recognition accuracy of RMBDLL in the ablation experiment. For convenience, the RMBDLL variants without the low-rank term and without the ℓ₁-norm constraint term are named RMDLR_low and RMDLR_norm, respectively. The RMBDLL variant without the intra-class and inter-class local constraint term is named RMDLR_int. It can be found that all three terms play important parts in the RMBDLL algorithm. The low-rank and ℓ₁-norm constraint terms effectively improve the robustness of the model, and the intra-class and inter-class local constraint term effectively increases the discriminability of the sample representations across classes.

Fig. 6

Ablation experiment of RMBDLL, (a) with random noise, (b) with Gaussian noise.
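
The three ablated terms can be evaluated numerically as sketched below. This is not the authors' implementation: it only computes the regularization terms on random toy matrices, and an ablation is mimicked by setting the corresponding weight to zero. The matrix shapes, the toy Laplacians, and the reading of the trace term as tr(XᵀL_iX), with L_i defined over dictionary atoms, are assumptions made for illustration.

```python
import numpy as np


def nuclear_norm(X):
    """Low-rank surrogate ||X||_*: the sum of the singular values of X."""
    return np.linalg.svd(X, compute_uv=False).sum()


def l1_norm(B):
    """Sparse-noise penalty ||B||_1: the sum of absolute entries of B."""
    return np.abs(B).sum()


def local_constraint(X, L):
    """Trace term, assumed here to be tr(X^T L X) with L of size K x K."""
    return np.trace(X.T @ L @ X)


def regularizer(X, noise, laplacians, alpha, beta, gamma):
    """Weighted sum of the three terms; a zero weight ablates that term."""
    low_rank = alpha * nuclear_norm(X)
    sparse = beta * sum(l1_norm(B) for B in noise)
    local = gamma * sum(local_constraint(X, L) for L in laplacians)
    return low_rank + sparse + local


def toy_laplacian(K, rng):
    """Graph Laplacian D - W of a random symmetric non-negative affinity W."""
    W = rng.random((K, K))
    W = (W + W.T) / 2
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W


# Hypothetical sizes: K atoms, N samples, d features, five frequency bands.
rng = np.random.default_rng(0)
K, N, d, bands = 6, 10, 4, 5
X = rng.standard_normal((K, N))                    # shared coding coefficients
noise = [rng.standard_normal((d, N)) for _ in range(bands)]
laplacians = [toy_laplacian(K, rng) for _ in range(bands)]

full = regularizer(X, noise, laplacians, alpha=1.0, beta=1.0, gamma=1.0)
variants = {
    "RMDLR_low": regularizer(X, noise, laplacians, 0.0, 1.0, 1.0),
    "RMDLR_norm": regularizer(X, noise, laplacians, 1.0, 0.0, 1.0),
    "RMDLR_int": regularizer(X, noise, laplacians, 1.0, 1.0, 0.0),
}
for name, value in variants.items():
    print(f"{name:10s} regularizer = {value:.3f} (full = {full:.3f})")
```

Here the weights `alpha`, `beta`, and `gamma` follow the assignment described in Section 4.5. The printed values only show how much each term contributes to the penalty on toy data; they say nothing about recognition accuracy, which requires training each variant as in Fig. 6.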

4.5 Parameter analysis

α, β, and γ are three parameters that need to be adjusted in the RMBDLL algorithm. Here they denote regularization parameters, not the EEG frequency bands of the same names. They trade off the low-rank term, the ℓ₁-norm constraint term, and the intra-class and inter-class local constraint term, respectively. Figures 7–9 illustrate the recognition accuracies of the RMBDLL algorithm versus the parameters α, β, and γ, respectively. It can be found that RMBDLL is not sensitive to α and β. When the parameter γ is within the range of [1e-5, 1e-4], the recognition accuracy of RMBDLL is stable and remains essentially unchanged. Therefore, RMBDLL has a certain degree of robustness to parameter settings.

Fig. 7

Parameter analysis of α on RMBDLL with, (a) random noise, (b) Gaussian noise.

Fig. 8

Parameter analysis of β on RMBDLL with, (a) random noise, (b) Gaussian noise.

Fig. 9

Parameter analysis of γ on RMBDLL with, (a) random noise, (b) Gaussian noise.
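
A sensitivity analysis of this kind is commonly run as a sweep over a logarithmic grid. The sketch below shows one way to do so and to report the span of values whose accuracy stays close to the best. The grid, the tolerance, and the `toy_accuracy` function are hypothetical stand-ins; in practice `evaluate` would train and test the model for the given parameter value.

```python
import numpy as np


def sweep(evaluate, grid):
    """Evaluate accuracy at each candidate parameter value."""
    return {float(v): float(evaluate(v)) for v in grid}


def stable_range(results, tolerance=0.5):
    """Smallest and largest values within `tolerance` points of the best accuracy."""
    best = max(results.values())
    keep = [v for v, acc in results.items() if best - acc <= tolerance]
    return min(keep), max(keep)


def toy_accuracy(gamma):
    """Hypothetical accuracy curve that is flat between 1e-5 and 1e-4."""
    return 80.0 - 2.0 * abs(np.log10(gamma) + 4.5)


# Hypothetical log-spaced grid: 1e-7, 1e-6, ..., 1e-2.
grid = np.logspace(-7, -2, 6)
results = sweep(toy_accuracy, grid)
low, high = stable_range(results)

for value, acc in results.items():
    print(f"gamma = {value:.0e}  accuracy = {acc:.2f}")
print(f"stable span: [{low:.0e}, {high:.0e}]")
```

Note that `stable_range` reports only the outermost qualifying values, so it is meaningful when the accuracy curve has a single plateau, as in this toy example.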

5 Conclusion

In this study, a robust multi-frequency band joint dictionary learning algorithm with low-rank representation is proposed for EEG emotional state recognition in noisy scenes. The algorithm exploits EEG multi-frequency band information and combines the ℓ₁-norm and ℓ₂-norm to characterize complex noise. Meanwhile, it combines the supervision information and the local structure information of dictionary atoms to construct an intra-class and inter-class local constraint term, so that samples of the same class have similar coding coefficients, while samples of different classes have highly different coding coefficients. Future research will address the following aspects. First, deep learning is increasingly used in biological data processing, and combining deep learning methods with dictionary learning for big-data EEG emotional state recognition is a worthwhile research direction. Second, although the experimental results demonstrate the effectiveness of the proposed algorithm, several problems may remain when it is applied to real-time BCI systems: the time complexity of RMBDLL is relatively high, it is an offline algorithm, and real-time online EEG emotional state recognition has not yet been achieved. Third, its applicability and practicality on other EEG datasets need to be further tested.

Data availability statement

The SEED dataset is available at: http://bcmi.sjtu.edu.cn/~seed/seed.html

Conflict of interest

The authors declare that they have no conflict of interest.

Acknowledgments

This work was supported in part by the Technology Project of Changzhou City under Grant CE20215032.
