Classification of EEG signals in epilepsy using a novel integrated TSK fuzzy system

Abstract

The use of machine learning to recognize electrical signals of the brain is becoming increasingly popular. Compared with manual judgment by doctors, machine learning methods are faster; however, they can be used in practice only when their recognition accuracy reaches a high level. Because the data distributions of the training and test datasets differ and training samples are scarce, the classification accuracies of general machine learning algorithms are not satisfactory. Moreover, most of the machine learning methods used to process epilepsy electroencephalogram (EEG) signals are black box methods, whereas medicine needs methods with explanatory power. In response to these three challenges, this paper proposes a novel technique based on domain adaptation learning, semi-supervised learning and a fuzzy system. Specifically, we use domain adaptation learning to reduce the discrepancy between data distributions, semi-supervised learning to compensate for the lack of training samples, and the Takagi-Sugeno-Kang (TSK) fuzzy system model to improve interpretability. Our experimental results show that the new method performs better than most state-of-the-art epilepsy classification methods.

Keywords

EEG signal recognition; epilepsy classification; integrated learning mechanism; domain adaptation learning; semi-supervised learning; TSK fuzzy system

1 Introduction

Epilepsy is a chronic disease characterized by sudden abnormal discharges of brain neurons, which lead to transient brain dysfunction. Epilepsy [1, 2] is usually diagnosed from electroencephalogram (EEG) signals. Since diagnosing epilepsy from EEG signals takes doctors considerable time and energy, researchers have adopted machine learning methods that detect epileptic seizures automatically from EEG signals [3–11]. Many classical machine learning models have been used, such as decision trees (DTs) [9], the naive Bayes (NB) method [7, 9], support vector machines (SVMs) [6, 11], K-nearest neighbors (KNN) classification [7] and linear discriminant analysis (LDA) [5, 10]. Machine learning methods undoubtedly offer faster detection and better consistency than manual annotation, but they may suffer from low detection accuracy. Jiang et al. [12] found that most machine learning methods assume that the training and test data follow the same distribution, an assumption that may not hold in real, complex applications. To solve this problem, this paper adopts a domain adaptation learning method based on subdomain adaptation, which aligns each subdomain and retains the precision of the features. The results show that subdomain adaptation learning can solve this problem well.

In addition, two problems are considered: (1) how to improve the interpretability of the model, which would be very helpful for medical research, and (2) whether the unlabeled test data can be fully utilized to obtain characteristic information that further improves the model performance. For the first problem, we use the Takagi-Sugeno-Kang (TSK) fuzzy system as the core classifier. For the second problem, semi-supervised learning (SSL) is used to obtain feature information from the unlabeled test data.

The rest of this paper is organized as follows. The second section introduces the common methods in the field of epileptic EEG signal recognition. The third section introduces the concepts and learning algorithm of the TSK fuzzy system. The fourth section introduces a novel TSK fuzzy system model using subdomain adaptation learning and semi-supervised learning (SDA-SSL-TSK-FS) along with its learning algorithm. The fifth section compares the performance of the proposed method with those of existing methods on six real EEG datasets. The sixth section is the conclusion.

2 Related work

This section mainly introduces the methods of feature extraction and machine learning in epilepsy recognition.

2.1 Method of extracting original signal features

Signal feature extraction methods are used to extract accurate and useful features from an original EEG signal. Using the processed data can not only reduce computational costs [4, 8] but also improve classification performance [12, 23–29]. Three types of features are commonly used: 1) time domain features [23, 25], such as principal component analysis features; 2) frequency domain features [3, 24], such as Fourier transform features; and 3) time-frequency features [3, 25–27], such as wavelet features. This paper considers all three types of features.

2.2 Machine learning methods

Machine learning methods can quickly detect epilepsy from EEG signals and, provided that they are trained correctly, can achieve satisfactory accuracy. Many classic machine learning methods are currently in use, such as the DT, SVM, KNN and LDA methods. These methods usually rest on a premise: the training data and testing data must have the same or similar data distributions. In many practical situations, however, the sample distributions differ, so the actual test results are often unsatisfactory. Transductive transfer learning [13] is a well-known approach to dealing with differences in sample distributions. For instance, Jiang et al. [12] studied a transductive transfer learning (TL) algorithm, i.e., a transfer support vector machine (TSVM) [18], which has been used to identify epilepsy from EEG signals and has achieved good performance.

In addition, most machine learning methods have another problem: the trained model is a black box that is difficult to explain, which is not conducive to medical diagnosis. To improve the interpretability of the trained model and make it easy for people to understand, we use the inherently interpretable TSK fuzzy system as the initial model.

3 TSK fuzzy system model

There are two main fuzzy system models. One is the Mamdani fuzzy system [20], and the other is the Takagi-Sugeno-Kang (TSK) fuzzy system [21, 38]. The latter is adopted because of its convenience and scalability. This section introduces the specific composition of the TSK fuzzy system (TSK-FS). The next section uses the ensemble principle to optimize the model and achieve high classification performance.

3.1 Concepts and principles

The TSK fuzzy system is a rule-based model for classification or regression. The kth fuzzy rule R^k of the system typically takes the following form:

$\begin{matrix} \text{IF } x_{1} \text{ is } A_{1}^{k} \land x_{2} \text{ is } A_{2}^{k} \land \dots \land x_{d} \text{ is } A_{d}^{k} \\ \text{THEN } f^{k}(x) = p_{0}^{k} + p_{1}^{k} x_{1} + \dots + p_{d}^{k} x_{d}, \end{matrix}$ (1)

where $p_{i}^{k}$ are the consequent parameters of the kth rule (k = 1, ⋯ , K), d is the number of features of the input samples, $A_{i}^{k}$ is the fuzzy set of the ith feature in the kth rule, ∧ is the fuzzy conjunction operator, and x = [x₁, x₂, ⋯ , x_d]^T is a real-valued input vector. The real-valued output of the TSK fuzzy system is:

$\hat{y} = \sum_{k = 1}^{K} \frac{μ^{k}(x) f^{k}(x)}{\sum_{k' = 1}^{K} μ^{k'}(x)} = \sum_{k = 1}^{K} {\tilde{μ}}^{k}(x) f^{k}(x),$ (2.a)

where

$μ^{k}(x) = \prod_{i = 1}^{d} μ_{A_{i}^{k}}(x_{i})$ (2.b)

and

${\tilde{μ}}^{k}(x) = μ^{k}(x) / \sum_{k' = 1}^{K} μ^{k'}(x).$ (2.c)

Here $μ_{A_{i}^{k}}(x_{i})$ is the membership grade of x_i in $A_{i}^{k}$ [17, 22]. This model uses the Gaussian membership function:

$μ_{A_{i}^{k}}(x_{i}) = \exp (\frac{-(x_{i} - c_{i}^{k})^{2}}{2 δ_{i}^{k}}),$ (2.d)

where

$c_{i}^{k} = \sum_{j = 1}^{N} u_{jk} x_{ji} / \sum_{j = 1}^{N} u_{jk}$ (2.e)

and

$δ_{i}^{k} = η \cdot \sum_{j = 1}^{N} u_{jk} (x_{ji} - c_{i}^{k})^{2} / \sum_{j = 1}^{N} u_{jk}.$ (2.f)

N is the number of input samples, u_jk is the fuzzy membership degree of the jth sample in the kth cluster obtained by the fuzzy c-means (FCM) [30] algorithm, and η is a range parameter, which is generally set manually or optimized through cross-validation.

With the following five definitions:

$x_{e} = (1, x^{T})^{T},$ (3.a)

${\tilde{x}}^{k} = {\tilde{μ}}^{k}(x) x_{e},$ (3.b)

$x_{g} = (({\tilde{x}}^{1})^{T}, ({\tilde{x}}^{2})^{T}, \dots, ({\tilde{x}}^{K})^{T})^{T},$ (3.c)

$p^{k} = (p_{0}^{k}, p_{1}^{k}, \dots, p_{d}^{k})^{T},$ (3.d)

$p_{g} = ((p^{1})^{T}, (p^{2})^{T}, \dots, (p^{K})^{T})^{T},$ (3.e)

(2.a) can be expressed as [17, 22]:

$\hat{y} = p_{g}^{T} x_{g}.$ (3.f)
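As an illustration of (2.a)-(3.f), the following is a minimal Python sketch of TSK inference for a single input. It assumes the antecedent parameters (centers and widths) and the consequent parameters are already known; the function and argument names are hypothetical and are not taken from the authors' MATLAB implementation.

```python
import math


def gaussian_membership(x_i, center, width):
    """Gaussian membership grade of one feature value, as in (2.d)."""
    return math.exp(-((x_i - center) ** 2) / (2.0 * width))


def tsk_output(x, centers, widths, consequents):
    """Real-valued TSK output (2.a), computed through the linear form (3.f).

    centers[k][i], widths[k][i]: antecedent parameters of rule k, feature i.
    consequents[k]: [p_0, p_1, ..., p_d] for rule k.
    Returns the output and the normalized firing strengths of the K rules.
    """
    # Firing strength of each rule: product of per-feature memberships (2.b).
    firing = [
        math.prod(gaussian_membership(xi, c, w) for xi, c, w in zip(x, cs, ws))
        for cs, ws in zip(centers, widths)
    ]
    total = sum(firing)
    normalized = [f / total for f in firing]  # (2.c)

    x_e = [1.0] + list(x)  # (3.a)
    # x_g stacks the K scaled copies of x_e, see (3.b)-(3.c).
    x_g = [mu * v for mu in normalized for v in x_e]
    # p_g stacks the consequent vectors of all rules, see (3.d)-(3.e).
    p_g = [p for rule in consequents for p in rule]
    return sum(p * v for p, v in zip(p_g, x_g)), normalized


# Toy example with two rules and two features (values are illustrative only).
y_hat, strengths = tsk_output(
    x=[0.0, 1.0],
    centers=[[0.0, 1.0], [1.0, 0.0]],
    widths=[[1.0, 1.0], [1.0, 1.0]],
    consequents=[[1.0, 0.0, 0.0], [3.0, 0.0, 0.0]],
)
```

In this toy setting each rule has a constant consequent, so the output is simply the firing-strength-weighted average of the two constants.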

3.2 Learning algorithm of the TSK-FS model

Given an EEG signal dataset D_S = {x_i, y_i} for training, the consequent parameters p_g are optimized by regularized least squares, with the objective function:

$\begin{matrix} \min_{p_{g}} J_{TSK}(p_{g}) = \frac{1}{2} \sum_{j = 1}^{C} \sum_{i = 1}^{N_{s}} {∥ p_{g, j}^{T} x_{gi} - y_{ij} ∥}^{2} \\ + \frac{λ_{1}}{2} \sum_{j = 1}^{C} p_{g, j}^{T} p_{g, j}, \end{matrix}$ (4)

where C is the number of categories, N_s is the number of training samples, p_g,j is the consequent parameter vector corresponding to the jth category, x_gi is the ith input vector after the mapping in (3.a)-(3.c), which is based on Gaussian membership functions, and y_ij is the jth entry of the label of the ith sample. We use one-hot encoding to express the labels: if the sample belongs to category j, y_ij is 1; otherwise, it is 0. This is done to prepare for the interpretability of the final model and has no effect on the performance in the experiments. λ₁ > 0 is a regularization parameter that determines the degree of tolerance of the model to noisy data.

The minimum of J_TSK(p_g) is obtained when the derivative of J_TSK with respect to each p_g,j is zero. The optimal solutions of the consequent parameters are:

$p_{g, j} = (λ_{1} E + \sum_{i = 1}^{N_{s}} x_{gi} x_{gi}^{T})^{- 1} (\sum_{i = 1}^{N_{s}} x_{gi} y_{ij}),$ (5)

where E is an identity matrix that arises from the regularization term.

The following is a summary of the TSK fuzzy system (TSK-FS) modeling method.

TSK-FS Modeling Method
Input:	Original dataset D = {x_i, y_i}, rule number K, regularization parameter λ₁.
Step 1:	Obtain the antecedent parameters used to partition the training data through fuzzy clustering or other partition methods.
Step 2:	Use (3.a)-(3.c) to construct a new dataset $\tilde{D} = \{x_{gi}, y_{i}\}$ .
Step 3:	Use (5) to find the consequent parameters of the TSK-FS model, and then construct the decision function (3.f).
Output:	Prediction results of unlabeled samples ${\hat{y}}_{i}$ .
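Step 3 of the method above can be sketched in a few lines of Python. This is a minimal illustration of the closed-form solution (5) for a single category j, assuming the mapped vectors x_gi have already been built in Step 2; the helper names are hypothetical, and the small Gauss-Jordan solver merely stands in for whatever linear solver an actual implementation would use.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest pivot candidate into position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Eliminate this column from every other row.
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [aug[i][n] for i in range(n)]


def fit_consequents(x_g_rows, y_column, lam):
    """Closed-form consequent parameters (5) for one category j.

    x_g_rows[i] is the mapped vector x_gi; y_column[i] is y_ij (0 or 1);
    lam is the regularization parameter lambda_1 (> 0).
    """
    dim = len(x_g_rows[0])
    # Start from lam * E and accumulate sum_i x_gi x_gi^T.
    gram = [[lam if r == c else 0.0 for c in range(dim)] for r in range(dim)]
    rhs = [0.0] * dim
    for x_g, y in zip(x_g_rows, y_column):
        for r in range(dim):
            rhs[r] += x_g[r] * y  # sum_i x_gi y_ij
            for c in range(dim):
                gram[r][c] += x_g[r] * x_g[c]
    return solve_linear_system(gram, rhs)


# Toy data (illustrative only): three mapped samples of dimension two.
p_gj = fit_consequents([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 0.0, 1.0], lam=1.0)
```

Because λ₁ > 0 makes the matrix in (5) positive definite, the system always has a unique solution, which is one practical reason for keeping the regularization term.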

4 Subdomain adaptation and semi-supervised learning

In this section, we introduce two algorithms for integration with the basic classifier TSK-FS.

4.1 Subdomain adaptation learning (SDA)

The process of learning a discriminative model when the distribution of the training data differs from the distribution of the testing data is called domain adaptation or transfer learning [44–46].

The previously developed domain adaptation methods [47–49] mainly learn the global domain shift; that is, they align the distributions of the source domain and the target domain as a whole without considering the relationships between the subdomains of the two domains. As a result, data from different categories within a domain are confused, and good feature structures cannot be learned. In other words, the unique feature information of each category may be lost. As shown in Fig. 1 (left), although the data distributions seem to be similar, subdomain data of different types are still mixed together, so it is difficult to perform accurate classification. To solve this problem, we propose a subdomain adaptation learning method that uses the relevant subdomains to accurately align the source domain and target domain. As shown in Fig. 1 (right), when the sample distributions of the subdomains are the same, the global sample distributions also tend to be the same.

Fig. 1

Left: overall domain adaptation may confuse the feature information of samples from the same class. Right: same-category subdomain alignment can use its own characteristics to capture the unique feature information of each category.

To achieve the correct alignment of subdomains, a local maximum mean discrepancy (LMMD) measure is used in this paper. It measures the distances between the sample distributions of related subdomains in the source domain and the target domain.

By minimizing the LMMD, we can not only reduce the differences between the data distributions of the source domain and target domain but also learn highly precise feature representations. Previous work [12] has shown that transfer learning can have a significant effect on the classification of EEG signals. We use the same technique to improve the performance of the TSK-FS model.

The labeled data samples are taken as the source domain D_S = {x_i, y_i}, and the unlabeled data samples are taken as the target domain D_T = {x_i}. We define the distance between the two domains as [12, 18]:

$\begin{matrix} distance (P_{source}, P_{target}) \\ = \sum_{j = 1}^{C} {∥ \sum_{x_{i}^{s} \in D_{S}} p_{g, j}^{T} ω_{i, sc} φ (x_{i}^{s}) - \sum_{x_{l}^{t} \in D_{T}} p_{g, j}^{T} ω_{l, tc} φ (x_{l}^{t}) ∥}_{H}^{2} \\ = \sum_{j = 1}^{C} (\sum_{i = 1}^{N_{S}} \sum_{l = 1}^{N_{S}} p_{g, j}^{T} ω_{i, sc} x_{gi, S} x_{gl, S}^{T} ω_{l, sc}^{T} p_{g, j} \\ + \sum_{i = 1}^{N_{T}} \sum_{l = 1}^{N_{T}} p_{g, j}^{T} ω_{i, tc} x_{gi, T} x_{gl, T}^{T} ω_{l, tc}^{T} p_{g, j} \\ - 2 \sum_{i = 1}^{N_{S}} \sum_{l = 1}^{N_{T}} p_{g, j}^{T} ω_{i, sc} x_{gi, S} x_{gl, T}^{T} ω_{l, tc}^{T} p_{g, j}), \end{matrix}$ (6)

where x_gi,S is the ith sample in the training data, x_gi,T is the ith sample in the testing data, and p_g,j is the consequent parameter vector of the TSK-FS model for the jth category, so that $p_{g, j}^{T} x_{g}$ is the corresponding predicted value. $ω_{i, sc}$ and $ω_{l, tc}$ represent the weights with which sample $x_{i}^{s}$ and sample $x_{l}^{t}$ belong to class c, where $\sum_{i = 1}^{N_{S}} ω_{i, sc}$ and $\sum_{l = 1}^{N_{T}} ω_{l, tc}$ are both equal to 1, and $\sum_{x_{i} \in D} ω_{i, c} φ (x_{i})$ is the weighted sum corresponding to category c. We calculate the weight ω_i,c of sample x_i as:

$ω_{i, c} = \frac{y_{ic}}{\sum_{(x_{j}, y_{j}) \in D} y_{jc}},$ (7)

where D denotes either the source domain D_S or the target domain D_T, and y_ic is the cth entry of the vector y_i. For samples in the source domain, we use the real label $y_{i}^{s}$ to calculate the weight of each sample belonging to a certain class. However, in unsupervised adaptation, where the target domain has no labeled data, $y_{l}^{t}$ is unavailable, so the target weights in (6) cannot be computed directly. The output of the TSK-FS model can be regarded as a probability distribution that characterizes the probability that a sample belongs to each of the C classes. Therefore, for the unlabeled target domain D_T, it is reasonable to use ${\hat{y}}_{l}^{t}$ as the probability of assigning $x_{l}^{t}$ to each class c. The weight ω_l,tc of each target sample can then be calculated, and (6) can be evaluated. Let

$\begin{matrix} Ω = ω_{sc} x_{g, S} [1]^{N_{S} \times N_{S}} x_{g, S}^{T} ω_{sc}^{T} + ω_{tc} x_{g, T} [1]^{N_{T} \times N_{T}} x_{g, T}^{T} ω_{tc}^{T} \\ - ω_{sc} x_{g, S} [1]^{N_{S} \times N_{T}} x_{g, T}^{T} ω_{tc}^{T} - ω_{tc} x_{g, T} [1]^{N_{T} \times N_{S}} x_{g, S}^{T} ω_{sc}^{T} . \end{matrix}$ (8)

Finally, (6) can be rewritten as

$distance (P_{source}, P_{target}) = \sum_{j = 1}^{C} p_{g, j}^{T} Ω p_{g, j} .$ (9)
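The weight computation in (7) is simple enough to sketch directly. The Python snippet below is a minimal illustration, assuming that source labels are given as one-hot rows and that target "labels" are the soft predictions of a previously trained TSK-FS model; the function name and the zero-weight guard for empty classes are assumptions made for this sketch.

```python
def class_weights(label_rows):
    """Per-class sample weights following (7).

    label_rows[i][c] plays the role of y_ic: a one-hot entry for a source
    sample, or a predicted class probability for a target sample.
    Returns weights[i][c]; for every class with nonzero mass, the weights
    of that class sum to 1 over the samples.
    """
    n_classes = len(label_rows[0])
    column_sums = [sum(row[c] for row in label_rows) for c in range(n_classes)]
    return [
        [
            # Assumption of this sketch: a class with no mass gets weight 0.
            row[c] / column_sums[c] if column_sums[c] > 0 else 0.0
            for c in range(n_classes)
        ]
        for row in label_rows
    ]


# Source domain: hard one-hot labels (toy values).
source_weights = class_weights([[1, 0], [1, 0], [0, 1]])
# Target domain: soft predictions used in place of the missing labels.
target_weights = class_weights([[0.9, 0.1], [0.3, 0.7]])
```

With hard labels, each sample of a class simply receives one over the class size; with soft predictions, confident samples contribute more to the subdomain of their predicted class.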

4.2 Semi-supervised learning (SSL)

For offline classification, even if the data have no labels, SSL can be used to obtain hidden feature information from the data and thus further improve model performance. Figure 2 illustrates this idea; it is based on the clustering assumption that samples with similar distributions belong to the same category. Following this heuristic, we design a new SSL method, similar to the fuzzy c-means method [30], that performs label clustering.

Fig. 2

Illustration of SSL with label clustering.

The objective function of the label clustering is:

$\min_{U} J_{SSL} (U) = \sum_{j = 1}^{C} \sum_{i = 1}^{N_{T}} μ_{ij}^{m} {∥ p_{g, S}^{T} x_{gi, T} - θ_{j} ∥}^{2},$ (10)

where C is the number of clusters, N_T is the number of samples in the testing dataset (target domain), and m is the fuzzy index of the fuzzy c-means method, which is set manually. μ_ij represents the membership degree of the ith unlabeled sample in the jth class and is computed as:

$μ_{ij} = {(\frac{1}{{∥ p_{g, S}^{T} x_{gi, T} - θ_{j} ∥}^{2}})}^{\frac{1}{m - 1}} / \sum_{k = 1}^{C} {(\frac{1}{{∥ p_{g, S}^{T} x_{gi, T} - θ_{k} ∥}^{2}})}^{\frac{1}{m - 1}},$ (11)

where x_gi,T is the ith unlabeled sample in the testing data, p_g,S is the consequent parameter of the TSK system trained on the training data, and θ_j = [0, ⋯ , 0, 1, 0, ⋯ , 0]^T is the label vector of the jth class, in the same encoding as the labels of the training data samples. U is a fuzzy partition matrix of size C × N_T composed of the $μ_{ij}$ .
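To make (11) concrete, the following Python sketch computes the memberships of a single unlabeled sample. It assumes the prediction vector $p_{g, S}^{T} x_{gi, T}$ has already been computed and that the prototypes θ_j are one-hot vectors; the function name and the small `eps` guard against division by zero are assumptions of this sketch, not part of the original formulation.

```python
def label_memberships(prediction, prototypes, m=2.0, eps=1e-12):
    """Fuzzy memberships of one prediction vector, following (11).

    prediction: the C-dimensional model output for one unlabeled sample.
    prototypes: the C label vectors theta_j (one-hot in the paper).
    m: fuzzy index, must be greater than 1.
    eps: guard (sketch assumption) for a prediction equal to a prototype.
    """
    inverse_terms = []
    for theta in prototypes:
        dist_sq = sum((p - t) ** 2 for p, t in zip(prediction, theta))
        inverse_terms.append((1.0 / max(dist_sq, eps)) ** (1.0 / (m - 1.0)))
    total = sum(inverse_terms)
    return [term / total for term in inverse_terms]


# Toy example: a two-class prediction that leans toward the first class.
memberships = label_memberships([0.8, 0.2], [[1.0, 0.0], [0.0, 1.0]], m=2.0)
```

The memberships always sum to 1, and a prediction that lies closer to a prototype receives a larger membership for that class; a smaller fuzzy index m makes the assignment sharper.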

4.3 Learning algorithm of the SDA-SSL-TSK-FS model

In this section, we introduce the integrated objective function used to train the SDA-SSL-TSK-FS model:

$\begin{matrix} \min_{p_{g}} J_{SDA - SSL - TSK} = \frac{1}{2} \sum_{j = 1}^{C} \sum_{i = 1}^{N_{s}} {∥ p_{g, j}^{T} x_{gi} - y_{ij} ∥}^{2} \\ + \frac{λ_{1}}{2} \sum_{j = 1}^{C} p_{g, j}^{T} p_{g, j} + λ_{2} \sum_{j = 1}^{C} p_{g, j}^{T} Ω p_{g, j} \\ + λ_{3} \sum_{j = 1}^{C} \sum_{i = 1}^{N_{T}} {\hat{μ}}_{ij}^{m} {∥ p_{g}^{T} x_{gi, T} - θ_{j} ∥}^{2}, \end{matrix}$ (12)

where λ₁ > 0, λ₂ > 0 and λ₃ > 0 are regularization parameters that balance the terms and control the sensitivity of the model to noisy data. In Eq. (12), the first two terms correspond to the regularized TSK-FS model based on the source domain data. The third term is related to SDA, and the fourth term is related to SSL.

Suppose $\hat{U} = [{\hat{U}}_{1}, \dots, {\hat{U}}_{C}] \in R^{1 \times {CN}_{T}}$ collects the membership values of the unlabeled samples, where ${\hat{U}}_{j} = [μ_{1 j}, \dots, μ_{ij}, \dots, μ_{N_{T} j}] \in R^{1 \times N_{T}}$ , and $\hat{U}$ is transformed into the diagonal matrix $\overset{⌢}{U} = diag (\hat{U}) \in R^{{CN}_{T} \times {CN}_{T}}$ . V is a transformation matrix composed of C identity matrices, $V = [E, \dots, E] \in R^{N_{T} \times {CN}_{T}}$ , where the size of each identity matrix is N_T. Q is a conversion matrix, $Q = [q_{1}, \dots, q_{C}] \in R^{C \times {CN}_{T}}$ , where all the values in the jth row of $q_{j} \in R^{C \times N_{T}}$ are 1 and the rest are 0. The final objective function (12) can be expressed as:

$\begin{matrix} \min_{p_{g}} J_{SDA - SSL - TSK} \\ = \frac{1}{2} tr ((p_{g}^{T} x_{g, S} - y_{S}) (p_{g}^{T} x_{g, S} - y_{S})^{T}) + \frac{λ_{1}}{2} tr (p_{g}^{T} p_{g}) \\ + λ_{2} tr (p_{g}^{T} Ω p_{g}) \\ + λ_{3} tr ((p_{g}^{T} x_{g, T} V - Q) \overset{⌢}{U} (p_{g}^{T} x_{g, T} V - Q)^{T}), \end{matrix}$ (13)

where

$\begin{matrix} Ω = ω_{sc} x_{g, S} [1]^{N_{S} \times N_{S}} x_{g, S}^{T} ω_{sc}^{T} + ω_{tc} x_{g, T} [1]^{N_{T} \times N_{T}} x_{g, T}^{T} ω_{tc}^{T} \\ - ω_{sc} x_{g, S} [1]^{N_{S} \times N_{T}} x_{g, T}^{T} ω_{tc}^{T} - ω_{tc} x_{g, T} [1]^{N_{T} \times N_{S}} x_{g, S}^{T} ω_{sc}^{T} . \end{matrix}$ (14)

By setting the derivative of J_SDA-SSL-TSK with respect to p_g to zero, the optimal consequent parameters can be calculated, and the solution is:

$p_{g} = (x_{g, S} x_{g, S}^{T} + λ_{1} E + 2 λ_{2} Ω + 2 λ_{3} x_{g, T} V \overset{⌢}{U} V^{T} x_{g, T}^{T})^{- 1} (x_{g, S} y_{S}^{T} + 2 λ_{3} x_{g, T} V \overset{⌢}{U} Q^{T}) .$ (15)
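For readers who want to check the closed form, the intermediate steps between (13) and (15) can be written out term by term. The derivation below is a sketch that relies on two properties implied by the definitions above: Ω in (14) is symmetric (its last two terms are transposes of each other), and the membership matrix is diagonal and hence symmetric. The shorthand A and the symbol $\breve{U}$ for the diagonal membership matrix are introduced here only for brevity.

```latex
% Shorthand (introduced for this sketch): A = x_{g,T} V, \breve{U} = diag(\hat{U}).
\begin{align*}
\frac{\partial}{\partial p_g}\,\tfrac{1}{2}\,
  \mathrm{tr}\!\left((p_g^{T} x_{g,S} - y_S)(p_g^{T} x_{g,S} - y_S)^{T}\right)
  &= x_{g,S} x_{g,S}^{T}\, p_g - x_{g,S}\, y_S^{T}, \\
\frac{\partial}{\partial p_g}\,\tfrac{\lambda_1}{2}\,\mathrm{tr}\!\left(p_g^{T} p_g\right)
  &= \lambda_1\, p_g, \\
\frac{\partial}{\partial p_g}\,\lambda_2\,\mathrm{tr}\!\left(p_g^{T}\,\Omega\, p_g\right)
  &= \lambda_2\,(\Omega + \Omega^{T})\, p_g = 2\lambda_2\,\Omega\, p_g, \\
\frac{\partial}{\partial p_g}\,\lambda_3\,
  \mathrm{tr}\!\left((p_g^{T} A - Q)\,\breve{U}\,(p_g^{T} A - Q)^{T}\right)
  &= 2\lambda_3\, A\,\breve{U}\,A^{T} p_g - 2\lambda_3\, A\,\breve{U}\, Q^{T}.
\end{align*}
% Summing the four gradients and setting the result to zero:
\begin{equation*}
\left(x_{g,S} x_{g,S}^{T} + \lambda_1 E + 2\lambda_2\,\Omega
  + 2\lambda_3\, x_{g,T} V \breve{U} V^{T} x_{g,T}^{T}\right) p_g
  = x_{g,S}\, y_S^{T} + 2\lambda_3\, x_{g,T} V \breve{U} Q^{T}.
\end{equation*}
```

Multiplying both sides by the inverse of the matrix on the left recovers (15); the factors of 2 on the λ₂ and λ₃ terms appear because, unlike the first two terms, these terms carry no factor of 1/2 in (13).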

The following is a summary of the SDA-SSL-TSK fuzzy system modeling method.

SDA-SSL-TSK-FS Modeling Method
Input:	Original dataset D = {x_i, y_i}, rule number K, regularization parameters λ₁, λ₂, and λ₃, and fuzzy index m.
Step 1:	Obtain the antecedent parameters used to partition the training data through fuzzy clustering or other partition methods.
Step 2:	Use (3.a)-(3.c) to construct a new dataset $\tilde{D} = \{x_{gi}, y_{i}\}$ .
Step 3:	Use (5) to find the consequent parameters p_g,S of the TSK-FS model.
Step 4:	Use (7) and the known p_g,S to obtain the feature transfer parameters ω_i,c, namely, the subdomain category weights.
Step 5:	Use (11) and the known p_g,S to obtain the knowledge transfer parameters, namely, the label memberships $μ_{ij}$ .
Step 6:	Use (15) to update the consequent parameters p_g of the target domain and construct the new SDA-SSL-TSK model.
Output:	Prediction results of unlabeled samples ${\hat{y}}_{i}$ .

5 Experimental study

In this section, the new SDA-SSL-TSK-FS model is evaluated by classifying EEG signals collected from healthy subjects and epileptic patients. We compare it with six classic machine learning methods, two transfer learning methods, and two semi-supervised methods. The EEG data are described in Table 1, and the experimental results are reported in Tables 3–5.

Table 1
Description of the EEG data

Subjects	Groups	Size of groups	Descriptions of datasets
Healthy	A	100	EEG signals measured from healthy people with their eyes open
	B	100	EEG signals measured from healthy people with their eyes closed
Epileptic	C	100	EEG signals obtained in the hippocampal formation of the opposite hemisphere of the brain during seizure-free intervals
	D	100	EEG signals obtained from within the epileptogenic zone during seizure-free intervals
	E	100	EEG signals measured during seizures

To ensure the fairness of the experiment, we conduct comparative experiments with the same feature extraction method and the same dataset. The experimental environment and experimental data are prepared as follows.

5.1 Experimental setup

For this experiment, all the algorithms were implemented in MATLAB on a computer with an Intel Core i5-8500U CPU at 1.70 GHz and 12 GB of RAM. The specific experimental settings are as follows.

1) Methods for comparison

Non-transfer learning-based methods: NB [7, 9], LDA [5 , 10], DT [9], KNN [7], SVM [6], TSK-FS.

Transfer learning-based or semi-supervised learning-based methods: transfer support vector machine (TSVM) [33], semi-supervised support vector machine (S4VM) [34], transfer learning with a graph co-regularization method (GTL2) [35], and large-margin transductive transfer learning method (LMPROJ) [18]. Among them, the TSVM and S4VM methods are semi-supervised learning methods, and LMPROJ and GTL2 are transfer learning methods.

2) Data Sources

The EEG data used in this study are the same as those in [12] and can be downloaded from http://epileptologie-bonn.de/cms/front_content.php?Idcat=193&lang=3&changeling=3 [41]. The dataset contains five groups of data (Groups A to E), and each group contains 100 single-channel EEG segments with 23.6 s durations. The average sampling rate for all the datasets was 173.6 Hz. Table 1 gives a detailed description of the five groups. Figure 3 shows some typical EEG signals in each group.

Fig. 3

Typical EEG signals in Groups A-E.

3) Feature extraction method for EEG signals

We used the following feature extraction methods to process the original EEG data:

Wavelet packet decomposition (WPD)

Short-time Fourier transform (STFT)

Kernel principal component analysis (KPCA)

4) Performance evaluation measures

To evaluate our proposed method, we adopted the following two common performance metrics:

Accuracy: The number of correctly predicted test data divided by the total number of test data.

Statistical significance: Friedman test and Holm’s post-hoc test [36, 37].

5) Specific parameter settings

For the NB, LDA, DT, KNN, SVM, and TSK-FS methods, we set the experimental parameters according to [12].

For the TSVM, S4VM, LMPROJ and GTL2 methods, we adopted the experimental settings in [33], [34], [18] and [35], respectively.

For the TSK-FS and SDA-SSL-TSK-FS methods, the number of fuzzy rules was selected from {5, 10, 15, 20, 25, 30}, the regularization parameters λ₁, λ₂, and λ₃ were selected from {10^-3, 10^-2, ⋯ , 10², 10³}, the transfer balance parameter η was selected from {0, 0.1, 0.2, ⋯ , 0.9, 1}, and the fuzzy index m was selected from {1.1, 1.5, 2, 2.5}. Five-fold cross-validation was applied on the training data for all the methods.
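The size of this search space is easy to underestimate, so the following Python sketch enumerates it. It assumes that all six parameters are searched jointly as a full grid for SDA-SSL-TSK-FS, which the text does not state explicitly; the constant and function names are hypothetical.

```python
from itertools import product

# Candidate values as listed in the parameter settings above.
RULE_COUNTS = [5, 10, 15, 20, 25, 30]
LAMBDAS = [10.0 ** exponent for exponent in range(-3, 4)]  # 1e-3 ... 1e3
ETAS = [round(0.1 * step, 1) for step in range(11)]  # 0, 0.1, ..., 1
FUZZY_INDICES = [1.1, 1.5, 2.0, 2.5]


def parameter_grid():
    """Yield every joint parameter setting (sketch assumption: full grid)."""
    for k, lam1, lam2, lam3, eta, m in product(
        RULE_COUNTS, LAMBDAS, LAMBDAS, LAMBDAS, ETAS, FUZZY_INDICES
    ):
        yield {
            "rules": k,
            "lambda1": lam1,
            "lambda2": lam2,
            "lambda3": lam3,
            "eta": eta,
            "m": m,
        }


grid_size = sum(1 for _ in parameter_grid())
```

Under the full-grid assumption there are 6 × 7³ × 11 × 4 = 90,552 candidate settings, each of which would be evaluated with five-fold cross-validation, so a coarse-to-fine or per-parameter search may be preferable in practice.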

5.2 TL scheme for EEG epilepsy recognition

We followed the two TL scenarios used in [12] to configure the dataset. In Scenario 1, the source domain data and target domain data have the same data distribution; in Scenario 2, the data distributions of the source domain data and target domain data are different, as shown in Table 2.

Table 2
Dataset Configurations for Epileptic EEG Signal Recognition

Scenario	Datasets	Source domain (training dataset)	Target domain (testing dataset)	Number of samples in the source domain	Number of samples in the target domain
Scenario 1: Same distribution	D1	Groups A and E	Groups A and E	150	50
	D2	Groups A, B and E	Groups A, B and E	150	50
Scenario 2: Different distributions	D3	Groups A, E	Groups A, C	50	50
	D4	Groups A, E	Groups A, D	50	50
	D5	Groups A, C and E	Groups B, C and E	75	75
	D6	Groups A, D and E	Groups B, D and E	75	75

5.3 Analysis of experimental results

1) Identification performance analysis

To achieve a comprehensive comparison, we use three feature extraction methods, namely, WPD, STFT and KPCA. Under each extraction method, the six classical methods and the four composite methods are compared with our new method. The accuracy of each classifier is shown in Tables 3–5.

Among the six classic non-transfer methods, the TSK-FS model has the best comprehensive performance. The results show that TSK-FS is suitable for EEG signal processing. In addition, compared with black box methods such as support vector machines, this method also has the advantage of interpretability.

In Scenario 1, the classic (non-transfer) machine learning methods can also obtain good accuracy. However, for the data in Scenario 2, the performances of these methods drop significantly. This is because the data distributions of the source domain data and the target domain data are not the same, so these non-transfer methods have difficulty distinguishing between data samples.

Different from non-transfer learning methods, methods that use transfer learning or semi-supervised learning generally achieve satisfactory classification performance. This shows that transfer learning and semi-supervised learning can indeed have a positive effect on EEG signal recognition.

Compared with the four composite TL methods or SSL methods, the SDA-SSL-TSK-FS model has the highest classification accuracy for datasets with different source and target data distributions. This is because TSVM and S4VM are only based on semi-supervised learning, LMPROJ and GTL2 are only based on transfer learning, and our method uses both. Specifically, the new method uses subdomain adaptation (SDA) to learn feature information common to the same category to reduce the deviation from the data distribution. At the same time, SSL makes full use of unlabeled data to obtain a large amount of potential feature information.

To assess whether the differences between the methods are statistically significant, we use Friedman's test and Holm's test [36, 37]. The former is used to calculate the average ranks of the different methods and to determine whether the differences between the methods are statistically significant. If the p-value is less than 0.05, the null hypothesis (that is, that there is no statistically significant difference) is rejected. We also perform a Holm post-hoc test to verify whether there is a statistically significant difference between the control method (that is, the method that achieves the best Friedman rank) and each of the other methods.
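The ranking step of Friedman's test can be sketched as follows. The Python snippet computes average ranks (rank 1 is best, ties share the mean rank) and the standard Friedman chi-square statistic, using the WPD accuracies of three methods from Table 3 as input. Because only three of the eleven methods are included, the ranks differ from those in Table 6; the function names are hypothetical, and the p-value computation and Holm's procedure are omitted.

```python
def average_ranks(scores):
    """Average Friedman rank of each method (1 = best, ties share ranks).

    scores[method] is a list of accuracies, one per dataset.
    """
    methods = list(scores)
    n_datasets = len(next(iter(scores.values())))
    totals = {method: 0.0 for method in methods}
    for d in range(n_datasets):
        ordered = sorted(methods, key=lambda meth: scores[meth][d], reverse=True)
        pos = 0
        while pos < len(ordered):
            # Extend the block [pos, end] over methods with equal accuracy.
            end = pos
            while (end + 1 < len(ordered)
                   and scores[ordered[end + 1]][d] == scores[ordered[pos]][d]):
                end += 1
            shared_rank = (pos + end) / 2.0 + 1.0  # mean of 1-based positions
            for method in ordered[pos:end + 1]:
                totals[method] += shared_rank
            pos = end + 1
    return {method: totals[method] / n_datasets for method in methods}


def friedman_statistic(ranks, n_datasets):
    """Friedman chi-square statistic computed from the average ranks."""
    k = len(ranks)
    sum_sq = sum(r ** 2 for r in ranks.values())
    return 12.0 * n_datasets / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4.0)


# WPD accuracies on D1-D6, copied from Table 3 (three methods only).
wpd = {
    "S4VM": [0.989, 0.992, 0.826, 0.866, 0.869, 0.826],
    "LMPROJ": [0.937, 0.936, 0.941, 0.945, 0.925, 0.945],
    "SDA-SSL-TSK-FS": [0.951, 0.951, 0.966, 0.970, 0.971, 0.969],
}
ranks = average_ranks(wpd)
chi_square = friedman_statistic(ranks, n_datasets=6)
```

On this three-method subset, SDA-SSL-TSK-FS obtains an average rank of about 1.33, and S4VM and LMPROJ each obtain about 2.33; the statistic would then be compared against a chi-square distribution with k − 1 degrees of freedom to obtain the p-value.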

Table 3
Performance of Each Classifier Based on WPD

Datasets	D1	D2	D3	D4	D5	D6	Avg
Part A: Non-TL methods
TSK-FS	0.916±0.002	0.917±0.008	0.796±0.025	0.812±0.011	0.793±0.004	0.801±0.004	0.839
SVM	0.918±0.004	0.923±0.001	0.511±0.010	0.536±0.005	0.854±0.021	0.806±0.033	0.758
NB	0.886±0.006	0.829±0.010	0.543±0.015	0.603±0.001	0.652±0.002	0.703±0.021	0.702
DT	0.913±0.010	0.934±0.008	0.585±0.014	0.671±0.010	0.855±0.015	0.817±0.010	0.795
KNN	0.871±0.025	0.892±0.020	0.515±0.003	0.554±0.001	0.805±0.024	0.788±0.010	0.738
LDA	0.913±0.012	0.912±0.034	0.616±0.001	0.636±0.006	0.834±0.012	0.802±0.011	0.786
Part B: TL methods
S4VM	0.989 ± 0.003	0.992 ± 0.033	0.826±0.007	0.866±0.020	0.869±0.006	0.826±0.036	0.895
TSVM	0.988±0.002	0.904±0.032	0.857±0.022	0.851±0.029	0.876±0.081	0.827±0.001	0.884
GTL2	0.920±0.020	0.903±0.029	0.912±0.028	0.923±0.030	0.860±0.006	0.844±0.022	0.894
LMPROJ	0.937±0.005	0.936±0.001	0.941±0.020	0.945±0.005	0.925±0.008	0.945±0.005	0.938
SDA-SSL-TSK-FS	0.951±0.004	0.951±0.003	0.966 ± 0.003	0.970 ± 0.010	0.971 ± 0.002	0.969 ± 0.004	0.963

Table 4

Performance of Each Classifier Based on STFT

Datasets	D1	D2	D3	D4	D5	D6	Avg
Part A: Non-TL methods
TSK-FS	0.986±0.010	0.992±0.011	0.798±0.006	0.797±0.012	0.802±0.005	0.816±0.002	0.865
SVM	0.978±0.003	0.982±0.002	0.526±0.002	0.536±0.005	0.854±0.021	0.860±0.011	0.789
NB	0.947±0.021	0.930±0.005	0.526±0.001	0.584±0.020	0.569±0.012	0.559±0.001	0.689
DT	0.989±0.020	0.955±0.003	0.674±0.030	0.682±0.001	0.812±0.012	0.788±0.005	0.816
KNN	0.990±0.036	0.890±0.019	0.705±0.028	0.698±0.015	0.802±0.011	0.815±0.002	0.816
LDA	0.983±0.024	0.918±0.002	0.702±0.019	0.753±0.016	0.697±0.001	0.689±0.001	0.790
Part B: TL methods
S4VM	0.996 ± 0.002	0.994±0.008	0.860±0.003	0.873±0.015	0.884±0.025	0.890±0.030	0.916
TSVM	0.992±0.010	0.998 ± 0.002	0.934±0.002	0.942±0.014	0.868±0.005	0.866±0.009	0.933
GTL2	0.977±0.026	0.993±0.037	0.910±0.027	0.930±0.012	0.931±0.025	0.924±0.026	0.944
LMPROJ	0.972±0.001	0.994±0.001	0.942±0.008	0.936±0.001	0.955±0.010	0.949±0.01	0.958
SDA-SSL-TSK-FS	0.994±0.009	0.996±0.005	0.965 ± 0.006	0.974 ± 0.005	0.971 ± 0.006	0.969 ± 0.001	0.978

Table 5

Performance of Each Classifier Based on KPCA

Datasets	D1	D2	D3	D4	D5	D6	Avg
Part A: Non-TL methods
TSK-FS	0.926±0.028	0.924±0.016	0.753±0.004	0.792±0.006	0.796±0.010	0.847±0.005	0.840
SVM	0.925±0.012	0.892±0.020	0.838±0.016	0.877±0.018	0.688±0.011	0.730±0.012	0.825
NB	0.824±0.052	0.710±0.030	0.625±0.015	0.570±0.010	0.652±0.002	0.679±0.022	0.677
DT	0.936±0.023	0.843±0.014	0.799±0.012	0.963±0.009	0.715±0.002	0.778±0.001	0.839
KNN	0.739±0.035	0.648±0.027	0.826±0.006	0.822±0.010	0.759±0.010	0.779±0.013	0.761
LDA	0.899±0.005	0.837±0.012	0.771±0.017	0.757±0.020	0.675±0.014	0.684±0.009	0.771
Part B: TL methods
S4VM	0.985 ± 0.004	0.974 ± 0.022	0.919±0.011	0.938±0.020	0.842±0.012	0.895±0.021	0.926
TSVM	0.943±0.023	0.923±0.022	0.912±0.008	0.944±0.006	0.794±0.036	0.788±0.006	0.884
GTL2	0.962±0.016	0.948±0.020	0.930±0.030	0.940±0.012	0.952±0.030	0.946±0.021	0.946
LMPROJ	0.943±0.023	0.906±0.025	0.930±0.004	0.940±0.032	0.956±0.023	0.946±0.008	0.939
SDA-SSL-TSK-FS	0.974±0.008	0.960±0.016	0.936 ± 0.003	0.957 ± 0.005	0.962 ± 0.010	0.968 ± 0.002	0.960

The results of the Friedman test and of Holm’s post-hoc test are shown in Tables 6 and 7, respectively. The Friedman test rejects the null hypothesis that all methods perform equally, i.e., the accuracies of the compared methods differ significantly. According to Table 7, Holm’s test shows that the SDA-SSL-TSK-FS model is significantly better than all six Part A (non-TL) methods, whereas its differences from the four Part B methods (TSVM, S4VM, GTL2 and LMPROJ) are not statistically significant. Although the proposed method does not show a statistically significant improvement over these four methods, it achieves the best average rank and, unlike them, yields an interpretable diagnostic model.

Table 6

Friedman Test Results for the Average Performance of Each Method (α = 0.05)

WPD Feature				STFT Feature
Algorithms	Friedman Rank	p-value	Hypothesis	Algorithms	Friedman Rank	p-value	Hypothesis
NB	10			NB	10.4167
DT	6.3333			DT	7.8333
KNN	10			KNN	7.75
LDA	6.8333			LDA	8.6667
SVM	7.6667			SVM	7.75
TSK-FS	6.5	0	Rejected	TSK-FS	6.5	0	Rejected
TSVM	5			TSVM	3.5833
S4VM	4			S4VM	3.75
GTL2	4.5			GTL2	4.6667
LMPROJ	2.5			LMPROJ	3.75
SDA-SSL-TSK-FS	1.5			SDA-SSL-TSK-FS	1.3333
KPCA Feature				AVERAGE
Algorithms	Friedman Rank	p-value	Hypothesis	Algorithms	Friedman Rank	p-value	Hypothesis
NB	10.6667			NB	10.667
DT	6.5			DT	7.5833
KNN	8.5			KNN	9.5833
LDA	9.5			LDA	8.8333
SVM	7.6667			SVM	8
TSK-FS	6.6667	0	Rejected	TSK-FS	6	0	Rejected
TSVM	4.9167			TSVM	4.5
S4VM	3.3333			S4VM	3.3333
GTL2	2.9167			GTL2	3.1667
LMPROJ	3.5			LMPROJ	2.8333
SDA-SSL-TSK-FS	1.8333			SDA-SSL-TSK-FS	1.5
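
The average ranks used by the Friedman test can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes that, on each dataset, the method with the highest accuracy receives rank 1 and that tied methods share the mean of the ranks they occupy. The accuracies are the mean values copied from Table 4 (STFT features); the names `ACC`, `mean_ranks` and `RANKS` are hypothetical.

```python
# Mean accuracies copied from Table 4 (STFT features); columns are D1..D6.
ACC = {
    "TSK-FS":         [0.986, 0.992, 0.798, 0.797, 0.802, 0.816],
    "SVM":            [0.978, 0.982, 0.526, 0.536, 0.854, 0.860],
    "NB":             [0.947, 0.930, 0.526, 0.584, 0.569, 0.559],
    "DT":             [0.989, 0.955, 0.674, 0.682, 0.812, 0.788],
    "KNN":            [0.990, 0.890, 0.705, 0.698, 0.802, 0.815],
    "LDA":            [0.983, 0.918, 0.702, 0.753, 0.697, 0.689],
    "S4VM":           [0.996, 0.994, 0.860, 0.873, 0.884, 0.890],
    "TSVM":           [0.992, 0.998, 0.934, 0.942, 0.868, 0.866],
    "GTL2":           [0.977, 0.993, 0.910, 0.930, 0.931, 0.924],
    "LMPROJ":         [0.972, 0.994, 0.942, 0.936, 0.955, 0.949],
    "SDA-SSL-TSK-FS": [0.994, 0.996, 0.965, 0.974, 0.971, 0.969],
}


def mean_ranks(acc):
    """Average rank of each method over all datasets (rank 1 = most accurate)."""
    names = list(acc)
    n_datasets = len(next(iter(acc.values())))
    totals = {name: 0.0 for name in names}
    for j in range(n_datasets):
        column = {name: acc[name][j] for name in names}
        for name in names:
            better = sum(1 for v in column.values() if v > column[name])
            tied = sum(1 for v in column.values() if v == column[name])  # counts itself
            # Tied methods occupy ranks better+1 .. better+tied; each gets the mean.
            totals[name] += better + (tied + 1) / 2.0
    return {name: totals[name] / n_datasets for name in names}


RANKS = mean_ranks(ACC)
for name, rank in sorted(RANKS.items(), key=lambda item: item[1]):
    print(f"{name:15s} {rank:.4f}")
```

Under these assumptions the sketch gives 1.3333 for SDA-SSL-TSK-FS and 3.75 for both LMPROJ and S4VM, as in the STFT column of Table 6. Not every entry is reproduced exactly, possibly because the published ranks were computed from unrounded accuracies.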

Table 7

Holm Test Results for the Average Performance Results of the Friedman Test (α= 0.05)

Holm’s Post-hoc Comparison of the SDA-SSL-TSK-FS Model and the Compared Models
i	Algorithms	z = (R_i - R₀)/SE	p	Holm = α/i	Hypothesis
10	NB	4.787136	0.000002	0.000909	Rejected
9	KNN	4.221383	0.000024	0.000926	Rejected
8	LDA	3.829708	0.000128	0.001	Rejected
7	SVM	3.394514	0.000688	0.001042	Rejected
6	DT	3.176917	0.001488	0.001136	Rejected
5	TSK-FS	2.350048	0.018771	0.001471	Rejected
4	TSVM	1.566699	0.117185	0.002	Not Rejected
3	S4VM	0.957427	0.338352	0.002941	Not Rejected
2	GTL2	0.870388	0.384088	0.003333	Not Rejected
1	LMPROJ	0.696311	0.486234	0.005	Not Rejected
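
As a rough check, the z and p columns of Table 7 can be recomputed from the AVERAGE ranks in Table 6. The sketch below assumes the usual standard error for comparing average ranks, SE = sqrt(k(k+1)/(6N)), with k = 11 classifiers and N = 6 datasets (D1 to D6), and a two-sided normal p-value; these choices are assumptions that happen to match the tabulated values, and the names `AVG_RANK`, `z_and_p` and `RESULTS` are hypothetical.

```python
import math

K = 11  # number of compared classifiers (rows of Table 6)
N = 6   # number of datasets D1..D6 (assumption)

# AVERAGE Friedman ranks copied from Table 6.
AVG_RANK = {
    "NB": 10.667, "KNN": 9.5833, "LDA": 8.8333, "SVM": 8.0, "DT": 7.5833,
    "TSK-FS": 6.0, "TSVM": 4.5, "S4VM": 3.3333, "GTL2": 3.1667, "LMPROJ": 2.8333,
}
CONTROL_RANK = 1.5  # average rank of SDA-SSL-TSK-FS

# Standard error of the difference between two average ranks.
SE = math.sqrt(K * (K + 1) / (6.0 * N))


def z_and_p(rank):
    """z statistic against the control method and its two-sided normal p-value."""
    z = (rank - CONTROL_RANK) / SE
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


RESULTS = {name: z_and_p(rank) for name, rank in AVG_RANK.items()}
for name, (z, p) in RESULTS.items():
    print(f"{name:8s} z = {z:.6f}  p = {p:.6f}")
```

For LMPROJ this gives z ≈ 0.696 and p ≈ 0.486, and for TSVM z ≈ 1.567 and p ≈ 0.117, in line with Table 7. The Holm thresholds are not recomputed here, since the step-down scheme behind that column is not specified in this section.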

In summary, the experimental results show that the new composite method can be applied to the recognition of EEG signals from epileptic patients: the classification performance of the model is excellent and superior to those of most existing methods, and our method has high interpretability, which is highly conducive to medical diagnosis.

2) Analysis of the interpretability of the models

This subsection analyzes the interpretability of the SDA-SSL-TSK-FS model to demonstrate this advantage of the proposed method. The parameters of the model trained on dataset D4 are shown in Table 8.

Table 8

The SDA-SSL-TSK Model Trained on Dataset D4 with WPD Features

Base fuzzy rules
TSK fuzzy rule $R^{k}$: IF $x_{1}$ is $A_{1}^{k} (c_{1}^{k}, \sigma_{1}^{k}) \land x_{2}$ is $A_{2}^{k} (c_{2}^{k}, \sigma_{2}^{k}) \land \dots \land x_{d}$ is $A_{d}^{k} (c_{d}^{k}, \sigma_{d}^{k})$,
THEN $f^{k} (x) = p_{0}^{k} + p_{1}^{k} x_{1} + \dots + p_{d}^{k} x_{d}$.
No. of rules	Antecedent parameters (Gaussian membership function parameters)	Consequent parameters (linear function parameters)
k	$c^{k} = (c_{1}^{k}, \dots, c_{d}^{k})^{T}, \sigma^{k} = (\sigma_{1}^{k}, \dots, \sigma_{d}^{k})^{T}$	$p^{k} = (p_{0}^{k}, p_{1}^{k}, \dots, p_{d}^{k})^{T}$
1	c¹= [32.2122, 32.3627, 32.6364, 32.4924, 28.7415, 14.5652]	p₁= [0.0076, 0.0441, 0.0382, 0.0058, -0.0764, -0.0433, -0.0012;
	σ¹= [0.8548, 1.2597, 1.3404, 1.7737, 1.1440, 12.6465]	0.0071, -0.0094, -0.0099, 0.0011, 0.0518, -0.0255, 0.0013]
2	c²= [30.9945, 29.5745, 29.6139, 28.4427, 25.6835, 0.6634]	p₂= [0.0738, -0.0063, -0.0524, 0.2361, -0.1841, 0.0501, -2.5192;
	σ²= [0.9639, 1.3651, 0.9443, 0.6899, 0.5633, 1.4953]	-0.1302, -0.1970, 0.4392, -0.4387, 0.0878, -0.0654, 7.7283]
3	c³= [31.5338, 31.4164, 30.9535, 30.0501, 26.1195, 3.6272]	p₃= [0.0431, 0.0301, -0.310, 0.222, 0.0175, -0.0217, 0.2435;
	σ³= [0.8778, 1.5236, 0.9096, 0.7799, 0.4778, 2.1699]	-0.5989, 0.8546, 0.0517, -0.472, 0.3290, -0.5528, -1.1422]
4	c⁴= [29.9033, 32.0668, 30.9709, 29.5672, 26.5526, 6.5274]	p₄= [-0.2228, 0.0201, 0.1778, 0.0580, 0.0496, -0.1306, -1.6012;
	σ⁴= [0.4247, 1.0648, 0.5692, 0.3950, 0.4898, 1.7447]	0.2046, -0.2245, 0.2687, -0.3525, -0.0377, -0.0021, 4.8562]
5	c⁵= [28.3938, 26.8380, 27.3497, 26.7357, 24.3204, 0.3860]	p₅= [0.0347, -0.3575, 0.1489, 0.1612, 0.0886, 0.0554, -1.0392;
	σ⁵= [0.6078, 0.3236, 0.2981, 0.2550, 0.3302, 0.3360]	0.0321, 0.2020, -0.0473, -0.2564, 0.0793, 0.0482, 0.0941]

Fig. 4 shows the membership function (MF) of the fuzzy set associated with each feature under each fuzzy rule. We use the level of signal frequency (Low, Slightly low, Medium, Slightly high, High) as the natural-language description of each MF. Because different doctors may interpret the same linguistic term differently, the descriptions of the rules may vary from one expert to another.

In the first row of Fig. 4, the antecedent parameters of band 1 (the center c and variance σ), i.e., (32.21, 0.85) for the 1st fuzzy rule, (30.99, 0.96) for the 2nd fuzzy rule, (31.53, 0.88) for the 3rd fuzzy rule, (29.90, 0.42) for the 4th fuzzy rule, and (28.39, 0.61) for the 5th fuzzy rule, define the five MFs of band 1. These five MFs can be expressed in natural language as “Low”, “Slightly low”, “Medium”, “Slightly high” and “High”, arranged in ascending order of their center values. By analogy, each of the other features (bands) has its own five MFs and descriptions. In this way, the linguistic expression of the “IF” part of the fuzzy system is obtained. The five fuzzy rules generated from the WPD features, together with their linear consequent functions, can then be described in natural language as follows:

Fig. 4

The membership functions of the fuzzy subsets in the antecedents of the fuzzy rules and their possible natural-language interpretations. The fuzzy system uses WPD features. * The antecedent parameters (c¹, σ¹) of the first feature (dimension) of the data. ** A possible interpretation of the obtained fuzzy set.

The 1st Fuzzy Rule:

IF the signal frequency of band 1 is “High”, the signal frequency of band 2 is “High”, the signal frequency of band 3 is “High”, the signal frequency of band 4 is “High”, the signal frequency of band 5 is “High”, and the signal frequency of band 6 is “High”,

THEN, the decision function of this rule is given by: $f^{1} (x) = [\begin{matrix} 0.0076 + 0.0441 x_{1} + 0.0382 x_{2} + 0.0058 x_{3} - 0.0764 x_{4} - 0.0433 x_{5} - 0.0012 x_{6}, \\ 0.0071 - 0.0094 x_{1} - 0.0099 x_{2} + 0.0011 x_{3} + 0.0518 x_{4} - 0.0255 x_{5} + 0.0013 x_{6} \end{matrix}] .$

The 2nd Fuzzy Rule:

IF the signal frequency of band 1 is “Slightly high”, the signal frequency of band 2 is “Slightly low”, the signal frequency of band 3 is “Slightly low”, the signal frequency of band 4 is “Slightly low”, the signal frequency of band 5 is “Slightly low”, and the signal frequency of band 6 is “Slightly low”,

THEN, the decision function of this rule is given by: $f^{2} (x) = [\begin{matrix} 0.0738 - 0.0063 x_{1} - 0.0524 x_{2} + 0.2361 x_{3} - 0.1841 x_{4} + 0.0501 x_{5} - 2.5192 x_{6}, \\ - 0.1302 - 0.1970 x_{1} + 0.4392 x_{2} - 0.4387 x_{3} + 0.0878 x_{4} - 0.0654 x_{5} + 7.7283 x_{6} \end{matrix}] .$

The 3rd Fuzzy Rule:

IF the signal frequency of band 1 is “Medium”, the signal frequency of band 2 is “Medium”, the signal frequency of band 3 is “Medium”, the signal frequency of band 4 is “Medium”, the signal frequency of band 5 is “Medium”, and the signal frequency of band 6 is “Medium”,

THEN, the decision function of this rule is given by: $f^{3} (x) = [\begin{matrix} 0.0431 + 0.0301 x_{1} - 0.310 x_{2} + 0.2222 x_{3} + 0.0175 x_{4} - 0.0217 x_{5} + 0.2435 x_{6}, \\ - 0.5989 + 0.8546 x_{1} + 0.0517 x_{2} - 0.472 x_{3} + 0.3290 x_{4} - 0.5528 x_{5} - 1.1422 x_{6} \end{matrix}] .$

The 4th Fuzzy Rule:

IF the signal frequency of band 1 is “Slightly low”, the signal frequency of band 2 is “Slightly high”, the signal frequency of band 3 is “Slightly high”, the signal frequency of band 4 is “Slightly high”, the signal frequency of band 5 is “Slightly high”, and the signal frequency of band 6 is “Slightly high”,

THEN, the decision function of this rule is given by: $f^{4} (x) = [\begin{matrix} - 0.2228 + 0.0201 x_{1} + 0.1778 x_{2} + 0.0580 x_{3} + 0.0496 x_{4} - 0.1306 x_{5} - 1.6012 x_{6}, \\ 0.2046 - 0.2245 x_{1} + 0.2687 x_{2} - 0.3525 x_{3} - 0.0377 x_{4} - 0.0021 x_{5} + 4.8562 x_{6} \end{matrix}] .$

The 5th Fuzzy Rule:

IF the signal frequency of band 1 is “Low”, the signal frequency of band 2 is “Low”, the signal frequency of band 3 is “Low”, the signal frequency of band 4 is “Low”, the signal frequency of band 5 is “Low”, and the signal frequency of band 6 is “Low”,

THEN, the decision function of this rule is given by: $f^{5} (x) = [\begin{matrix} 0.0347 - 0.3575 x_{1} + 0.1489 x_{2} + 0.1612 x_{3} + 0.0886 x_{4} + 0.0554 x_{5} - 1.0392 x_{6}, \\ 0.0321 + 0.2020 x_{1} - 0.0473 x_{2} - 0.2564 x_{3} + 0.0793 x_{4} + 0.0482 x_{5} + 0.0941 x_{6} \end{matrix}] .$

Fuzzy systems constructed with STFT or KPCA features can be explained in the same way. Figure 5 gives an example that details how the SDA-SSL-TSK-FS model uses its rules to recognize epileptic signals. In Fig. 5, the SDA-SSL-TSK fuzzy system diagnoses EEG signals processed with the WPD method. The output of the system is one-hot encoded: [1, 0] denotes a healthy person, and [0, 1] denotes a patient with epilepsy. When a patient’s sample is obtained, such as Data = [29.8734, 28.7518, 28.376, 27.5648, 24.659, 0.0423] in Fig. 5, it is fed into the trained SDA-SSL-TSK-FS model, and the decision function value of each rule is computed, e.g., $f^{2}$ = [-0.0154, 0.7065] in Fig. 5. Finally, using Equation (3.f), the SDA-SSL-TSK-FS model obtains the output value Y = [-0.0154, 0.7065]. According to the “winner takes all” principle, we end up with Y = [0, 1], which means that this person is a patient with epilepsy.

This example illustrates that the proposed fuzzy system is an interpretable model that identifies patients with epilepsy through the generated fuzzy rules.
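
The inference walk-through of Fig. 5 can be sketched with the parameters of Table 8. This is an illustration under stated assumptions, not the authors' implementation: the membership function is taken as the Gaussian $\exp(-(x - c)^{2} / (2 \sigma^{2}))$, the antecedents are combined by product, and the output is the firing-strength-weighted average of the rule outputs (the usual TSK form; Equation (3.f) is not reproduced here). In addition, each seven-value consequent row is read as six slopes followed by the constant term. This ordering is an assumption, chosen because it gives roughly [-0.012, 0.707] for rule 2, close to the $f^{2}$ quoted for Fig. 5, with the small gap consistent with the four-decimal rounding of the tabulated parameters. All identifiers are hypothetical.

```python
import math

# Antecedent parameters from Table 8: one row per rule, one column per band.
C = [
    [32.2122, 32.3627, 32.6364, 32.4924, 28.7415, 14.5652],
    [30.9945, 29.5745, 29.6139, 28.4427, 25.6835, 0.6634],
    [31.5338, 31.4164, 30.9535, 30.0501, 26.1195, 3.6272],
    [29.9033, 32.0668, 30.9709, 29.5672, 26.5526, 6.5274],
    [28.3938, 26.8380, 27.3497, 26.7357, 24.3204, 0.3860],
]
S = [
    [0.8548, 1.2597, 1.3404, 1.7737, 1.1440, 12.6465],
    [0.9639, 1.3651, 0.9443, 0.6899, 0.5633, 1.4953],
    [0.8778, 1.5236, 0.9096, 0.7799, 0.4778, 2.1699],
    [0.4247, 1.0648, 0.5692, 0.3950, 0.4898, 1.7447],
    [0.6078, 0.3236, 0.2981, 0.2550, 0.3302, 0.3360],
]
# Consequent parameters from Table 8: two output rows per rule, seven values each.
P = [
    [[0.0076, 0.0441, 0.0382, 0.0058, -0.0764, -0.0433, -0.0012],
     [0.0071, -0.0094, -0.0099, 0.0011, 0.0518, -0.0255, 0.0013]],
    [[0.0738, -0.0063, -0.0524, 0.2361, -0.1841, 0.0501, -2.5192],
     [-0.1302, -0.1970, 0.4392, -0.4387, 0.0878, -0.0654, 7.7283]],
    [[0.0431, 0.0301, -0.310, 0.222, 0.0175, -0.0217, 0.2435],
     [-0.5989, 0.8546, 0.0517, -0.472, 0.3290, -0.5528, -1.1422]],
    [[-0.2228, 0.0201, 0.1778, 0.0580, 0.0496, -0.1306, -1.6012],
     [0.2046, -0.2245, 0.2687, -0.3525, -0.0377, -0.0021, 4.8562]],
    [[0.0347, -0.3575, 0.1489, 0.1612, 0.0886, 0.0554, -1.0392],
     [0.0321, 0.2020, -0.0473, -0.2564, 0.0793, 0.0482, 0.0941]],
]
X = [29.8734, 28.7518, 28.376, 27.5648, 24.659, 0.0423]  # sample from Fig. 5


def firing_strength(x, c, s):
    """Product of Gaussian memberships over all bands (assumed MF form)."""
    return math.exp(-sum((xi - ci) ** 2 / (2.0 * si ** 2)
                         for xi, ci, si in zip(x, c, s)))


def rule_output(x, p):
    """Linear consequent; assumed ordering: six slopes, then the constant."""
    return [sum(w * xi for w, xi in zip(row[:-1], x)) + row[-1] for row in p]


def infer(x):
    mu = [firing_strength(x, c, s) for c, s in zip(C, S)]
    total = sum(mu)
    outs = [rule_output(x, p) for p in P]
    # Weighted average of the rule outputs with normalized firing strengths.
    y = [sum(m / total * o[j] for m, o in zip(mu, outs)) for j in range(2)]
    # "Winner takes all": the largest component becomes 1, the other 0.
    label = [1 if v == max(y) else 0 for v in y]
    return mu, outs, y, label


MU, OUTS, Y, LABEL = infer(X)
print("f2 =", [round(v, 4) for v in OUTS[1]])
print("Y  =", [round(v, 4) for v in Y], "->", LABEL)
```

With these assumptions the second rule has by far the largest firing strength for this sample, so Y is essentially the output of rule 2 and the one-hot decision is [0, 1], i.e., a patient with epilepsy, as in Fig. 5.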

3) Computational complexity

In this section, the computational complexity of the new method is compared with those of two other classes of methods, namely TL-based and SSL-based methods. Table 9 reports the running times of the three methods on dataset D4.

Using the TSK fuzzy system as the base model alleviates the long running times of the TL-based and SSL-based methods.

The results show that the new method is better than the existing methods for epilepsy classification, not only in terms of recognition accuracy but also in terms of real-time performance.

Fig. 5

An example showing how to use the generated fuzzy rules and fuzzy system to identify epileptic patients, where “+” represents the merging operation, and “Max” means setting the maximum element in Y to 1 and the other elements to 0.

Table 9

Comparison of computational complexity for the SDA-SSL-TSK method and the SSL-based and TL-based methods on dataset D4

Feature Extraction Method\Algorithms	TL-based	SSL-based	SDA-SSL-TSK
WPD	0.163	1.613	0.009
STFT	0.140	1.595	0.009
KPCA	0.157	1.555	0.010

5 Conclusions

This paper proposes a new model for epilepsy classification that combines subdomain adaptation learning, semi-supervised learning and the TSK fuzzy system model to improve the accuracy and interpretability of the classifier. Comparative experiments show that the results obtained by the new algorithm are better than those of most existing advanced algorithms. Moreover, in the experiments, only 50 labeled samples and 50 unlabeled samples are needed as training samples to achieve satisfactory accuracy. This shows that the method proposed in this article is effective for the classification of epileptic EEG signals. In the experiments, the average accuracy of the proposed method is over 96%. Future research will focus on reducing the computational cost of the algorithm and extending it to other related applications, including brain-computer interfaces.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 61702225, 61772241, and 61873321, in part by the 2018 Six Talent Peaks Project of Jiangsu Province under Grant XYDXX-127, in part by the Science and Technology Demonstration Project of Social Development of Wuxi under Grant WX18IVJN002, and in part by the Youth Foundation of the Commission of Health and Family Planning of Wuxi under Grant Q201654.
