Research on arrhythmia recognition by using convolutional neural network in ECG images

Abstract

Background

Determining the type of arrhythmia is crucial for prevention and early diagnosis of cardiovascular diseases.

Objective

This study aims to address potential information loss caused by preprocessing, improve model performance, and accurately identify multiple types of arrhythmias.

Methods

This study proposes the use of wavelet transform denoising and convolutional neural network (CNN) model to classify and identify six types of arrhythmias. The original electrocardiosignal was transformed into a two-dimensional gray image by construction, and the data were amplified by fixed template clipping. Then, six arrhythmias were identified using an improved two-dimensional CNN model.

Results

The classification accuracy, sensitivity, and specificity of the proposed method reached 90.50%, 81.70%, and 97.16%, respectively, and six types of arrhythmias were accurately identified.

Conclusions

The results showed that the wavelet transform as a preprocessing method can effectively improve the classification accuracy of the multiple types of arrhythmias. The method proposed in this study can provide a new reference for clinicians in diagnosing arrhythmia.

Keywords

arrhythmia classification convolutional neural network deep learning electrocardiogram

1 Introduction

The global incidence and mortality of cardiovascular diseases have risen year on year, and cardiovascular disease has overtaken cancer as the leading threat to human health. The situation regarding cardiovascular disease prevention and control in China remains grim.¹ Most cardiovascular diseases are accompanied by arrhythmia, and different types of cardiovascular diseases have different arrhythmia characteristics; therefore, accurately determining the type of arrhythmia is crucial in the diagnosis of cardiovascular disease.²

An electrocardiogram (ECG) records the changes in electromotive force, caused by the beating of the heart, on the surface of the human skin and visually reflects the working condition of the patient's heart in image form. Therefore, the electrocardiogram is the most intuitive and convenient tool for diagnosing arrhythmia.³ The recognition and diagnosis using traditional ECGs mainly rely on the personal experience of doctors, and manual diagnosis is time-consuming and laborious.

With time, using machine learning technology to assist doctors in electrocardiogram diagnosis can quickly process large amounts of ECG data, reducing the burden on doctors’ diagnoses.⁴ Common machine learning methods include support vector machines (SVM),⁵ random forest,⁶ and neural networks.⁷ Acharya et al.⁸ preprocessed the ECG signals of seven patients with coronary heart disease, 148 patients with myocardial infarction, and 52 controls from the PTB Diagnostic ECG database and the St Petersburg INCART 12-lead ECG database and extracted features using discrete wavelet transforms, empirical mode decomposition, and discrete cosine transform. Combined with the nearest neighbour algorithm for classification, they obtained classification accuracies, sensitivities, and specificities of 98.50%, 99.70%, and 98.50%, respectively. Mohebbanaaz et al.⁹ utilised ECG signals to extract seven temporal features and seventeen morphological features, and combined decision trees, optimised decision trees, and adaptive enhanced optimised decision trees to distinguish six types of heartbeats. The experimental results verified that the adaptively enhanced optimised decision tree performed better than the other two methods, achieving a classification accuracy of 98.77%. These studies conducted tedious calculations on the fixed waveform features of the ECG signals, resulting in good classification effects. In traditional machine-learning methods, the impact of early signal analysis and complicated feature extraction on the results is crucial. However, these methods rely on the professional knowledge of special groups and have certain limitations.¹⁰

With the rapid development of artificial intelligence, deep learning has gradually become a popular research topic in cardiovascular disease diagnosis both at home and abroad. Common algorithms include convolutional neural networks (CNN),¹¹ long short-term memory network (LSTM),¹² recurrent neural networks,¹³ and regularized autoencoders.¹⁴ Ribeiro et al.¹⁵ constructed a deep neural network model trained on a large number of labelled signals and used it to classify arrhythmia in 12 lead electrocardiograms. The results showed that it was useful to cardiologists in identifying six types of abnormal heart rhythms, with the F1 score exceeding 80% and specificity exceeding 99%. Zhang et al.¹⁶ proposed a novel wavelet multiresolution convolutional neural network, which avoids the complex process of extracting features of the target signal; an average recognition rate of 93.5% was achieved for classifying the presence or absence of severe heart disease on eight ECG datasets. Kiranyaz et al.¹⁷ proposed an adaptive one-dimensional CNN that integrates the feature extraction and classification of ECGs into a single learning body. A targeted CNN model is trained using individual patient data and is suitable for most ECGs owing to its simplicity and parameter invariance. Yildirim et al.¹⁸ proposed a novel deep learning method for detecting cardiac arrhythmias based on long-term ECG signal analysis, achieving an overall recognition rate of 91.33% for classifying 17 types of cardiac arrhythmias using 10 s ECG signals. Warrick et al.¹⁹ combined CNN and LSTM models and adopted aggregation, elimination, and normalisation techniques to improve the accuracy, achieving an accuracy rate of 82%. Yildirim et al.²⁰ proposed a nonlinear compression structure based on a convolutional autoencoder that utilises electrocardiogram features to automatically identify arrhythmias and deeply encodes these features with a convolutional autoencoder network. 
It was verified that using an LSTM network for data analysis could significantly reduce the calculation time of the model and achieve an accuracy of 99.0%. In addition, some studies have specifically designed symbolic representations of ECG signals to handle beat classification across different patients. By jointly representing the shape and rhythm of the beats and combining this with baseline correction, the differences in beats between patients were alleviated. A multiview convolutional neural network was used to classify five arrhythmias in the Massachusetts Institute of Technology Beth Israel Hospital (MIT-BIH) database, with an overall accuracy of 96.40%.²¹ These studies indicate that deep-learning technology has significant social value in electrocardiogram diagnosis. However, existing research offers few enhancement methods for imbalanced public data, and it commonly suffers from poor waveform restoration and loss of information during denoising, which results in poor generalisation ability of the model. Additionally, inappropriate segmentation can easily change the original meaning of the signal.

In this study, one-dimensional ECG signals were converted into two-dimensional images and combined with an improved deep learning model to accurately identify six arrhythmias and assist clinicians in efficiently diagnosing patients with arrhythmias. The main content and research framework of this study are shown in Figure 1.

Figure 1.

Research framework.

The main work of this study is summarized as follows:

The validity of classification of five arrhythmias using the wavelet transform combined with a one-dimensional CNN model was verified. The effect of wavelet transform pre-treatment on the five arrhythmia classifications was investigated, and the classification accuracies of the model before and after signal pre-treatment were 81.08% and 99.43%, respectively.

The effectiveness of converting one-dimensional ECG signals into two-dimensional grayscale images was confirmed. A data enhancement method was proposed to overcome the problem of loss of useful information in the original denoised signal and better reflect the potential feature pattern in the original one-dimensional time series to mine the characteristics of cardiac abnormalities in patients with cardiovascular disease from multiple angles.

An improved two-dimensional CNN model combined with the ECG grayscale image method was proposed to classify six types of cardiovascular diseases, effectively solving the problem of heartbeat segmentation. A classification accuracy of 90.49% was achieved using the MIT-BIH database.

The remainder of this paper is organised as follows: Section 2 introduces the data and methods used in this study. Section 3 reports the experimental results. Section 4 discusses the results, and the findings of the study are summarised in Section 5.

2 Methods

2.1 Data

Data were obtained from the Massachusetts Institute of Technology Beth Israel Hospital (MIT-BIH) Arrhythmia database.²² This database contains 48 half-hour excerpts of two-channel ambulatory ECG recordings with a sampling rate of 360 Hz. Twenty-three recordings were chosen randomly from a set of 4000 24-h ambulatory ECG recordings. Each record is stored in three files: a header file (.HEA), an ECG data file (.DAT), and an annotation file (.ATR). The ECG signal was segmented using the manual annotations in the .ATR file. Using the R-wave point as the reference point, 99 and 200 signal points were taken before and after the R point, respectively, to form a complete heartbeat. Other laboratory personnel with experience in data segmentation also checked the data to ensure their validity. The average length of the heartbeats after sampling was 300 points, with a duration of approximately 0.8 s. In addition, a backpropagation (BP) neural network was used to locate the ECG waveform and a db1 wavelet was used to decompose the signal. The network was trained for 1000 iterations, the target minimum training error was set to 0.0002, and the learning rate was set to 0.0003.
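The R-peak-centred segmentation described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name `segment_beats`, the synthetic record, and the R-peak indices are hypothetical, and skipping peaks that lie too close to the record boundaries is an assumption.

```python
def segment_beats(signal, r_peaks, before=99, after=200):
    """Cut one fixed-length window around each annotated R peak.

    Each window holds `before` samples, the R-peak sample itself, and
    `after` samples, i.e. 99 + 1 + 200 = 300 samples by default.
    Peaks too close to either end of the record are skipped (assumption).
    """
    beats = []
    for r in r_peaks:
        start, stop = r - before, r + after + 1
        if start < 0 or stop > len(signal):
            continue  # incomplete window, drop it
        beats.append(signal[start:stop])
    return beats


# Synthetic record standing in for one ECG channel sampled at 360 Hz.
record = [0.0] * 2000
r_peaks = [50, 500, 1200, 1950]  # hypothetical R-peak sample indices
beats = segment_beats(record, r_peaks)

# 300 samples at 360 Hz last about 0.83 s, in line with the stated 0.8 s.
duration_s = len(beats[0]) / 360
print(len(beats), len(beats[0]), round(duration_s, 2))
```

With these hypothetical indices the first and last peaks are dropped because their windows would extend beyond the record.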

Table 1 lists the details of the six beat types according to the AAMI EC57 criteria. ‘N’ is Normal Sinus Rhythm (NOR), ‘V’ is Premature Ventricular Contraction (PVC), ‘A’ is Atrial Premature Contraction (APC), ‘R’ is Right Bundle Branch Block (RBBB), ‘L’ is Left Bundle Branch Block (LBBB), and ‘P’ is Paced Beat (PAB).²³ The numbers of samples of the ‘N’, ‘V’, ‘A’, ‘R’, ‘L’, and ‘P’ types are 75051, 7130, 2545, 7258, 8074, and 7028, respectively.

Table 1.
MIT-BIH arrhythmia beat information.

Beat	Record	Sample number
N	100,101,103,105,108,112,113,114,115,117,121,122,123,202,203,205,219,230,234	75051
V	106,116,119,200,201,203,208,210,213,215,221, 228,233	7130
A	209,220,222,223,232	2545
R	118,124,212,231	7258
L	109,111,207,213	8074
P	102,104,107,217	7028
Total		107086
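The degree of class imbalance implied by Table 1 can be checked with a few lines of arithmetic. The sketch below only restates the sample counts from the table; the variable names are illustrative.

```python
# Sample counts per beat type, copied from Table 1.
beat_counts = {"N": 75051, "V": 7130, "A": 2545, "R": 7258, "L": 8074, "P": 7028}

total = sum(beat_counts.values())
# Share of each beat type in percent of all annotated beats.
shares = {beat: 100 * n / total for beat, n in beat_counts.items()}
# Ratio between the largest and the smallest class.
imbalance_ratio = max(beat_counts.values()) / min(beat_counts.values())

for beat, share in shares.items():
    print(f"{beat}: {share:.2f}%")
print(f"total = {total}, largest/smallest = {imbalance_ratio:.1f}")
```

Normal beats account for roughly 70% of all samples, whereas APC beats account for less than 3%.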

2.2 Pretreatment

Wavelet transform is a time–frequency domain multi-resolution analysis method that can be applied to different fields. It is suitable for nonstationary signal denoising and parameter feature extraction and is an ideal method for ECG signal denoising.²⁴ The mathematical expression is given in Equation (1).

\begin{aligned} X_{w} (a, b) = \frac{1}{{| a |}^{1 / 2}} \int_{- \infty}^{\infty} x (t) \bar{ϕ} \left( \frac{t - b}{a} \right) d t \end{aligned}

(1)

where $x (t)$ is the signal, $ϕ (t)$ is the continuous mother wavelet, a is the scale factor, and b is the translation factor. Depending on whether the scale and translation factors vary continuously or take discrete values, the transform can be used for continuous or discrete wavelet analysis. Wavelet threshold denoising can use either soft or hard thresholds; a soft-threshold filtering method is adopted in the denoising process.²⁵ The ECG signal is a weak, low-amplitude signal whose frequency range is usually 0.05 Hz–100 Hz. In this study, the db5 wavelet basis function was used to denoise the ECG signals. Detailed information on the wavelet transform is shown in Figure 2. The PyWavelets package (pywt) is used for the time–frequency analysis of the ECG signal: the signal is decomposed over nine scales with the db5 wavelet basis function. The frequency ranges of the resulting wavelet coefficients are listed in Table 2.
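The difference between soft and hard thresholding of wavelet coefficients can be illustrated with a short sketch. It covers only the thresholding step; the db5 decomposition and reconstruction are left out, and the threshold value used here is hypothetical because the text does not state how the threshold was chosen.

```python
def soft_threshold(coeffs, threshold):
    """Shrink every coefficient towards zero by `threshold`.

    Coefficients with magnitude at or below the threshold become zero.
    """
    out = []
    for c in coeffs:
        magnitude = abs(c) - threshold
        if magnitude <= 0:
            out.append(0.0)
        else:
            out.append(magnitude if c > 0 else -magnitude)
    return out


def hard_threshold(coeffs, threshold):
    """Zero small coefficients and keep the others unchanged."""
    return [c if abs(c) > threshold else 0.0 for c in coeffs]


detail = [-3.0, -0.5, 0.2, 2.0]  # hypothetical detail coefficients
soft = soft_threshold(detail, 1.0)
hard = hard_threshold(detail, 1.0)
print(soft)
print(hard)
```

Soft thresholding avoids the discontinuity that hard thresholding introduces at the threshold, which is one common reason for preferring it when a smooth denoised waveform is desired.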

Figure 2.

Wavelet transform diagram.

Table 2.

Scale coefficients of the wavelet transform.

Signal decomposition	Frequency range (Hz)
D1	90–180
D2	45–90
D3	22.5–45
D4	11.2–22.5
D5	5.6–11.2
D6	2.8–5.6
D7	1.4–2.8
D8	0.7–1.4
D9	0.3–0.7
A9	0–0.3
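The frequency ranges in Table 2 follow from the dyadic structure of the discrete wavelet transform at a sampling rate of 360 Hz. The sketch below computes idealised band edges; real wavelet filters overlap at these edges, and the function name `dwt_bands` is illustrative.

```python
def dwt_bands(fs, levels):
    """Return idealised (name, low_hz, high_hz) bands of a dyadic DWT.

    Detail level j covers fs / 2**(j + 1) to fs / 2**j; the final
    approximation covers everything below the last detail band.
    """
    bands = []
    for j in range(1, levels + 1):
        bands.append((f"D{j}", fs / 2 ** (j + 1), fs / 2 ** j))
    bands.append((f"A{levels}", 0.0, fs / 2 ** (levels + 1)))
    return bands


bands = dwt_bands(fs=360, levels=9)
for name, low, high in bands:
    print(f"{name}: {low:.2f}-{high:.2f} Hz")
```

The computed edges agree with Table 2 up to rounding; for example, D4 spans 11.25–22.5 Hz.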

2.3 CNN model

A CNN is a classic deep learning framework mainly used in the fields of computer vision and natural language processing.²⁶,²⁷ During operation, the CNN learns features through the convolutional and pooling layers and then inputs the features into the fully connected layer. After the loss function is calculated, the model parameters are adjusted through backpropagation of the error so that the model converges towards its most accurate state. A typical CNN framework, which includes a convolutional layer, a subsampling (pooling) layer, and a fully connected layer, is shown in Figure 3. The convolution layer extracts higher-level features from the image, and the convolution operation enhances the original signal and reduces noise. In addition, the weights of each convolution kernel are shared across positions, which significantly reduces the number of free parameters of the neural network. The subsampling layer extracts features from the feature map output of the convolution layer, and the extracted features are then converted into a one-dimensional vector as the input to the classifier. The gradient of the loss function with respect to each weight is calculated using the chain rule and the backpropagation algorithm, and the weight is then updated according to the gradient descent formula until the optimal weights and biases are obtained.

Figure 3.

Typical CNN structure.

2.3.1 Construction of one-dimensional CNN model

In this study, a one-dimensional CNN model was used to learn and process the features of the one-dimensional ECG signals. Four convolution blocks built from one-dimensional convolution and pooling layers are connected sequentially and followed by a fully connected layer. A structural diagram of the model is shown in Figure 4.

Figure 4.

Structure of one-dimensional CNN model.

The ECG data are fed into the construction model, the convolution operation with a step size of 1 is carried out through a 21 × 1 convolution kernel, and a 4 × 300 convolution feature map is obtained. After the convolution operation, the generated feature map is fed into a pooling layer with a kernel size of 3 × 1, and a feature map of 4 × 150 is obtained. Subsequently, the feature map, obtained after the four-layer convolutional block operation, is input into the fully connected layer for expansion, and the model is continuously optimised and adjusted by backpropagation. For the selection of the CNN model parameters, the batch size is set to 128, and the learning rate is set to 0.001. The model underwent 30 iterations, and the loss value was calculated using the loss function at each iteration. The detailed parameters of the CNN model are listed in Table 3. The formula for convolution is given by Equation (2).

\begin{aligned} χ_{n} = \sum_{k = 0}^{N - 1} y_{k} f_{n - k} \end{aligned}

(2)

where f represents the filter, y represents the signal, and N represents the number of signal elements.
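Equation (2) can be evaluated directly, as in the following sketch. It implements the textbook discrete convolution with the filter treated as zero outside its support; the function name and the example values are illustrative. Deep learning frameworks typically compute the unflipped variant (cross-correlation), which differs only in the orientation of the learned kernel.

```python
def convolve(y, f):
    """Full discrete convolution: x[n] = sum_k y[k] * f[n - k]."""
    n_out = len(y) + len(f) - 1
    x = [0.0] * n_out
    for n in range(n_out):
        for k in range(len(y)):
            if 0 <= n - k < len(f):  # f is zero outside its support
                x[n] += y[k] * f[n - k]
    return x


signal = [1, 2, 3]  # y, hypothetical signal samples
kernel = [1, 1]     # f, hypothetical two-tap filter
result = convolve(signal, kernel)
print(result)
```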

Table 3.

Detailed parameters of one-dimensional CNN model.

Layer	Type	Kernel size	Stride	Output size
1	Conv1D	21 × 1	1	4 × 300
2	Pool	3 × 1	2	4 × 150
3	Conv1D	23 × 1	1	16 × 150
4	Pool	3 × 1	2	16 × 75
5	Conv1D	25 × 1	1	32 × 75
6	Pool	3 × 1	2	32 × 38
7	Conv1D	27 × 1	1	64 × 38
8	Fully-connected	–	–	38 × 64
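The output lengths in Table 3 can be reproduced under the assumption that every layer uses "same" padding, so that the output length depends only on the stride. The padding mode is not stated in the text; it is inferred here from the fact that the 21 × 1 convolution leaves the length at 300.

```python
import math


def same_padding_length(n, stride):
    """Output length of a layer with 'same' padding (assumed)."""
    return math.ceil(n / stride)


# (layer type, stride) for layers 1-7 of Table 3.
layers = [("Conv1D", 1), ("Pool", 2), ("Conv1D", 1), ("Pool", 2),
          ("Conv1D", 1), ("Pool", 2), ("Conv1D", 1)]

length = 300  # samples per heartbeat
lengths = []
for layer_type, stride in layers:
    length = same_padding_length(length, stride)
    lengths.append(length)
    print(layer_type, length)
```

The third pooling layer halves a length of 75, which rounds up to 38 and matches the table.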

For one-dimensional CNN training, all heartbeat signals and their labels were read into two one-dimensional lists. Each label corresponded to a 200-point ECG heartbeat. Abnormal data were excluded. The two lists were then combined and shuffled while maintaining the original correspondence between signals and labels, generating an initial dataset of 92193 heartbeats for training.

Among the evaluation indicators, the accuracy (Acc), sensitivity (Sen), specificity (Spe), precision (Pre), and F1 were used to evaluate the effectiveness of the model. The formulas for calculating each indicator are given in Equations 3–7.

\begin{aligned} Acc & = \frac{T P + T N}{T P + T N + F P + F N} \times 100 \% \end{aligned}

(3)

\begin{aligned} Spe & = \frac{T N}{T N + F P} \times 100 \% \end{aligned}

(4)

\begin{aligned} Sen & = \frac{T P}{T P + F N} \times 100 \% \end{aligned}

(5)

\begin{aligned} Pre & = \frac{T P}{T P + F P} \times 100 \% \end{aligned}

(6)

\begin{aligned} F 1 & = \frac{2 * Pre * Sen}{Pre + Sen} \end{aligned}

(7)

where TP represents the number of ECG samples in the dataset whose true class is positive and are correctly classified by the model. FN represents the number of ECG samples whose true class is positive but are misclassified by the model. TN represents the number of ECG samples whose true class is negative and are correctly classified by the model. FP represents the number of ECG samples whose true class is negative but are incorrectly classified as positive by the model.
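For a multiclass problem, these indicators are usually computed per class in a one-vs-rest fashion from the confusion matrix. The sketch below does this for the matrix reported in Table 5; the function names are illustrative, and taking the overall accuracy as the share of correctly classified beats is an assumption about how the single reported accuracy value was obtained.

```python
def per_class_metrics(cm, idx):
    """One-vs-rest metrics (in %) for class `idx`; rows of `cm` are true classes."""
    total = sum(sum(row) for row in cm)
    tp = cm[idx][idx]                         # class idx predicted as idx
    fn = sum(cm[idx]) - tp                    # class idx predicted as another class
    fp = sum(row[idx] for row in cm) - tp     # other classes predicted as idx
    tn = total - tp - fn - fp                 # everything else
    sen = 100 * tp / (tp + fn)
    spe = 100 * tn / (tn + fp)
    pre = 100 * tp / (tp + fp)
    f1 = 2 * pre * sen / (pre + sen)
    return {"sen": sen, "spe": spe, "pre": pre, "f1": f1}


def overall_accuracy(cm):
    """Share of correctly classified samples in % (assumed definition)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return 100 * correct / sum(sum(row) for row in cm)


# Confusion matrix from Table 5; class order NOR, APC, PVC, LBBB, RBBB.
cm = [
    [21375, 28, 40, 4, 1],
    [49, 556, 6, 0, 0],
    [14, 1, 2093, 3, 1],
    [3, 0, 2, 1950, 0],
    [4, 1, 1, 0, 1526],
]
apc = per_class_metrics(cm, 1)
print({name: round(value, 2) for name, value in apc.items()})
print(round(overall_accuracy(cm), 2))
```

Under these definitions the APC row of Table 6 and the overall accuracy of 99.43% are reproduced from the confusion matrix.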

2.3.2 Two-dimensional CNN model

To further improve the performance of the model, this study proposes a method for converting a one-dimensional raw signal into a two-dimensional image, which is combined with a two-dimensional CNN model to classify six types of arrhythmias. The research framework is illustrated in Figure 5.

Figure 5.
Block diagram of two-dimensional CNN model.

(1) One-dimensional signal to two-dimensional image method

When converting one-dimensional ECG signals into two-dimensional images, MATLAB and Python 3.8 are used for the visual conversion. The image is constructed with time as the X-axis variable and the amplitude at a given time as the Y-axis variable. When the one-dimensional ECG signal data are converted into a two-dimensional matrix, the columns in the x-direction follow the order of the original one-dimensional sampling points. For each sampling point, the amplitude is obtained, the matrix element at the corresponding position in the y-direction is set to 1, and all remaining elements are set to 0. This completes the conversion of the one-dimensional signal into a two-dimensional image.
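The conversion described above can be sketched as follows. The min-max scaling of the amplitude onto the image rows is an assumption, since the text does not state how amplitudes are quantised, and the resampling of each heartbeat to a fixed image width is not shown. The function name `signal_to_image` is illustrative.

```python
def signal_to_image(signal, height=128):
    """Return a height x len(signal) binary matrix with one 1 per column."""
    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0  # avoid division by zero for flat signals
    image = [[0] * len(signal) for _ in range(height)]
    for x, amplitude in enumerate(signal):
        # Row 0 is the top of the image, so larger amplitudes map to
        # smaller row indices (assumed orientation).
        row = round((hi - amplitude) / span * (height - 1))
        image[row][x] = 1
    return image


toy_signal = [0.0, 0.5, 1.0, 0.5, 0.0]  # hypothetical samples
image = signal_to_image(toy_signal, height=5)
for row in image:
    print(row)
```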

To ensure the integrity of each heartbeat image, the manually labelled R peaks in the MIT-BIH database are used as reference points for cropping, and 20 data points before and after the midpoint between two adjacent R peaks are discarded. The remaining data points form the ECG heartbeat images. To expand the dataset, the paced beat (PAB) type is added, so that six types of heartbeats, ‘N’, ‘V’, ‘A’, ‘R’, ‘L’, and ‘P’, are selected. Using the above cropping method, 105,923 single-heartbeat images of 128 × 128 pixels were obtained.

(2) Data enhancement

In the MIT-BIH database, the number of heartbeats for different disease types is unbalanced. To avoid the influence of this imbalance on classification, the beat images of all types except ‘N’ were cropped with a fixed template to amplify the data. Each original 128 × 128 image was cropped nine times with a 96 × 96 window, yielding nine partial heartbeat images: top left, top middle, top right, middle left, centre, middle right, bottom left, bottom middle, and bottom right. Each crop was then rescaled to 128 × 128 pixels to achieve data amplification with maximum feature retention.
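The crop positions of this fixed-template scheme can be sketched as follows. Placing the nine windows on a 3 × 3 grid of offsets (corners, edge midpoints, and centre) is an assumption; the rescaling of each crop back to 128 × 128 pixels is left to an image library and is not shown.

```python
def nine_crop_boxes(size=128, crop=96):
    """Return nine (left, top, right, bottom) boxes on a 3 x 3 offset grid."""
    offsets = [0, (size - crop) // 2, size - crop]
    return [(left, top, left + crop, top + crop)
            for top in offsets
            for left in offsets]


boxes = nine_crop_boxes()
for box in boxes:
    print(box)
```

With a 128 × 128 image and a 96 × 96 window, the offsets along each axis are 0, 16, and 32 pixels, so every crop retains three quarters of the image width and height.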

After converting the signal into a two-dimensional image and amplifying the heartbeats, the heartbeats in each category were divided into training and testing sets in a 7:3 ratio by stratified sampling, resulting in 108332 image data points for the deep learning network.

(3) Construction of two-dimensional CNN model

The improved two-dimensional CNN model used in this study is shown in Figure 6. The original single convolution block is expanded into three convolution blocks. Each block is extended from one convolution layer and one pooling layer to two two-dimensional convolution layers followed by a pooling layer. The fully connected layer structure at the end of the model is unchanged: the features extracted by the three convolution blocks are fully connected at the end, and the model is adjusted by backpropagation.

Figure 6.
Improved CNN model.

In the convolution operation, the two-dimensional image is regarded as a numerical matrix, and the two-dimensional convolution kernel scans the whole image to extract features. When the input image is square with size n × n, the size of the output matrix A is given by Equation (8).
$\begin{aligned} A = \left\lfloor \frac{n + 2 p - f}{s} + 1 \right\rfloor \times \left\lfloor \frac{n + 2 p - f}{s} + 1 \right\rfloor \end{aligned}$
(8)

where p is the padding, f is the kernel size, s is the stride, and $\lfloor \cdot \rfloor$ denotes rounding down. The pooling layers reduce the size of the feature maps while retaining important features. The internal feature maps of the pooling and convolutional layers are connected, and the number of feature maps for both layers is the same. When the input matrix size of the pooling layer is $n_{H} \times n_{W} \times n_{C}$, the filter is $f \times f$, and the stride is s, the size of the output matrix B is given by Equation (9).
$\begin{aligned} B = \left\lfloor \frac{n_{H} - f}{s} + 1 \right\rfloor \times \left\lfloor \frac{n_{W} - f}{s} + 1 \right\rfloor \times n_{C} \end{aligned}$
(9)
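Equations (8) and (9) can be used to trace the feature-map sizes through the three convolution blocks listed in Table 4. The sketch below assumes a padding of p = 1 for the 3 × 3 convolutions, which is not stated in the text but is consistent with the convolution layers in Table 4 preserving the spatial size.

```python
def conv_output(n, f, p, s):
    """Spatial output size of a convolution, following Equation (8)."""
    return (n + 2 * p - f) // s + 1


def pool_output(n, f, s):
    """Spatial output size of a pooling layer, following Equation (9)."""
    return (n - f) // s + 1


size = 128
block_sizes = []
for _ in range(3):  # three blocks: Conv2D, Conv2D, Pool
    size = conv_output(size, f=3, p=1, s=1)
    size = conv_output(size, f=3, p=1, s=1)
    size = pool_output(size, f=2, s=2)
    block_sizes.append(size)

# The last block has 256 feature maps (Table 4).
flattened = size * size * 256
print(block_sizes, flattened)
```

The result, 16 × 16 × 256, agrees with the input size of the fully connected layer given in Table 4.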

In this study, ECG images with dimensions of 128 × 128 pixels are fed into the 2D CNN model. In the first convolution block, 3 × 3 convolution kernels are applied to the 2D ECG images, and the pooling window size is set to 2 × 2. The strides of the 2D convolution and pooling layers are set to 1 and 2, respectively. The network was trained for 10 iterations, training was accelerated on the GPU, and the learning rate was set to 0.001. The convolution and pooling layers together constitute the feature extractor of the 2D CNN. The detailed parameters of the 2D CNN used are listed in Table 4.

Table 4.
Detailed parameters of 2D CNN model.

Layer	Type	Kernel size	Stride	Kernel	Input size
1	Conv2D	3 × 3	1	64	128 × 128 × 1
2	Conv2D	3 × 3	1	64	128 × 128 × 64
3	Pool	2 × 2	2	–	128 × 128 × 64
4	Conv2D	3 × 3	1	128	64 × 64 × 64
5	Conv2D	3 × 3	1	128	64 × 64 × 128
6	Pool	2 × 2	2	–	64 × 64 × 128
7	Conv2D	3 × 3	1	256	32 × 32 × 128
8	Conv2D	3 × 3	1	256	32 × 32 × 256
9	Pool	2 × 2	2	–	32 × 32 × 256
10	Fully connected	–	–	2048	16 × 16 × 256
11	Out	–	–	8	2048

3 Results

This study was implemented with the TensorFlow deep learning framework. The neural networks were built with the high-level Keras interface in Python 3.8 with TensorFlow 2 on Windows 10, and development was carried out in PyCharm. The computer used in the experiments had an Intel Core i9 CPU with a clock frequency of 3.6 GHz, 64 GB of memory, and two NVIDIA GTX 1080Ti graphics cards.

3.1 Pretreatment results

The first 1500 signal points before and after pretreatment were compared, and the results are shown in Figure 7. Compared with the original ECG signal, the preprocessed signal was smooth and free of spikes. This shows that the wavelet threshold method can effectively remove noise from the ECG signal and yield a cleaner physiological signal.

Figure 7.

Signal comparison before and after pretreatment. (a) The original ECG signal; (b) Pre-processed ECG signals.

3.2 Classification results

3.2.1 Classification results based on one-dimensional CNN model

In the experiment, 92193 pieces of data were divided into training and test sets in the ratio of 7:3. The training accuracy and loss rate of the model are shown in Figure 8. The classification confusion matrix of the proposed method for the five types of arrhythmias is presented in Table 5, and the classification results are listed in Table 6.

Figure 8.

Accuracy and loss rate curves of the model. (a) Accuracy curve of the model; (b) Loss rate curve of the model.

Table 5.

Confusion matrix of one-dimensional CNN model.

	Prediction
Real	NOR	APC	PVC	LBBB	RBBB
NOR	21375	28	40	4	1
APC	49	556	6	0	0
PVC	14	1	2093	3	1
LBBB	3	0	2	1950	0
RBBB	4	1	1	0	1526

Table 6.

Classification results of one-dimensional CNN model.

	Sensitivity(%)	Specificity(%)	Precision(%)	F1(%)	Accuracy(%)
NOR	99.66	98.87	99.67	99.67	99.43
APC	91.00	99.89	94.88	92.90	–
PVC	99.10	99.81	97.71	98.40	–
LBBB	99.74	99.97	99.64	99.69	–
RBBB	99.61	99.99	99.87	99.74	–
Mean	97.82	99.71	98.35	98.08

In the experiment, wavelet transform denoising combined with QRS complex recognition was used to classify the heartbeats. The classification accuracy, sensitivity, and specificity of the one-dimensional CNN model were 99.43%, 97.82%, and 99.71%, respectively.

To verify the effect of wavelet transform denoising on the accuracy of the CNN model in recognising multiclass arrhythmias, the original 92193 heartbeat signals were input into the CNN model without denoising; the classification accuracy was 81.08%, and the model loss was approximately 0.7505. The experimental results showed that the classification accuracy for the five types of arrhythmias improved by 18.35 percentage points when the wavelet transform was used in combination with the CNN model. The results indicate that the proposed model can significantly improve the classification of five types of arrhythmia.

3.2.2 Classification results based on 2D CNN network model

Because of the large sample size of the dataset, the model was trained for 10 iterations, and the training time was 6 h. The classification accuracy, specificity, and sensitivity of the 2D model were 90.50%, 91.01%, and 89.96%, respectively. The results showed that the six types of arrhythmias could be effectively classified using the constructed two-dimensional ECG images combined with the improved 2D CNN model. Tables 7 and 8 list the confusion matrix and classification results of the 2D CNN model, respectively.

Table 7.
Confusion matrix of 2D CNN model.

	Prediction
Real	NOR	APC	PVC	LBBB	RBBB	PAB
NOR	69954	1189	1329	741	553	257
APC	1160	2036	51	9	635	22
PVC	930	37	6153	18	36	44
LBBB	362	18	894	7161	22	232
RBBB	607	37	5	138	6443	244
PAB	516	18	71	17	95	6298

Table 8.

Classification results of 2D CNN model.

	Sensitivity(%)	Specificity(%)	Precision(%)	F1(%)	Accuracy(%)
NOR	94.50	89.58	95.14	94.82	90.50
APC	52.03	98.76	61.05	56.18	–
PVC	85.25	97.68	72.36	78.28	–
LBBB	82.41	99.07	88.58	85.38	–
RBBB	86.21	98.67	82.77	84.45	–
PAB	89.78	99.21	88.74	89.26	–
Mean	81.70	97.16	81.44	81.40
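The Mean row of Table 8 is the unweighted (macro) average of the per-class values, which gives every arrhythmia type the same weight regardless of its sample count. The following sketch recomputes it from the per-class rows; the variable names are illustrative.

```python
# Per-class (sensitivity, specificity, precision) in %, copied from Table 8.
table8 = {
    "NOR": (94.50, 89.58, 95.14),
    "APC": (52.03, 98.76, 61.05),
    "PVC": (85.25, 97.68, 72.36),
    "LBBB": (82.41, 99.07, 88.58),
    "RBBB": (86.21, 98.67, 82.77),
    "PAB": (89.78, 99.21, 88.74),
}


def macro_average(rows, column):
    """Unweighted mean of one metric column over all classes."""
    values = [row[column] for row in rows.values()]
    return sum(values) / len(values)


mean_sen = macro_average(table8, 0)
mean_spe = macro_average(table8, 1)
mean_pre = macro_average(table8, 2)
print(round(mean_sen, 2), round(mean_spe, 2), round(mean_pre, 2))
```

Because the average is unweighted, the low APC sensitivity lowers the mean sensitivity noticeably even though APC beats form only a small share of the data.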

4 Discussion

In this study, one-dimensional ECG signals were converted into two-dimensional images and the data were amplified. After the dataset was input into the constructed 2D CNN model, classification accuracy, specificity, and sensitivity of 90.50%, 97.16% and 81.70%, respectively, were obtained. It was proven that the proposed ECG signal transformation method combined with the 2D CNN model has a good effect on the recognition of the six types of arrhythmias. The data are not subjected to noise removal or manual marking operations before conversion, which simplifies the calculation process of ECG signal analysis and processing, reduces the possibility of the one-dimensional ECG data losing some information owing to filtering operations, and ensures the integrity of the ECG data when entering deep learning networks to the greatest extent. The training time of the constructed two-dimensional CNN model was reduced to 6 h, which improved the efficiency of the system. The results show that the proposed method can improve the recognition of the six types of arrhythmias.

Because the input image of the 2D CNN model is converted from the original ECG signal, it is necessary to determine, with other variables held fixed, whether the direct conversion of the ECG signal into a two-dimensional ECG image helps to improve the classification. Therefore, the results of the two-dimensional ECG image combined with the 2D CNN model (six-class accuracy of 90.50%) were compared with those of the original one-dimensional ECG signal combined with the CNN model (five-class accuracy of 81.08%). The classification accuracy of the improved 2D CNN model was 9.42 percentage points higher, even though the number of classes increased after the ECG signals were converted into images. This indicates that converting one-dimensional ECG signals into two-dimensional images, combined with the improved CNN model, helps to improve the recognition of the six types of arrhythmias.

It can be seen from Table 8 that the 2D CNN model used in this study achieved good classification results for most heartbeat types. However, APC has a high confusion rate and yields poor classification results. This may be due to the small number of APC samples in the original dataset; the more unbalanced the dataset, the smaller the probability that the network model will correctly classify beats from a class with few samples.²⁸ It is also possible that the APC waveforms are highly similar to those of NOR, which leads to classification errors in the model. Table 5 likewise shows that about 8% (49 of 611) of the APC heartbeats were mistaken for the NOR type by the one-dimensional model. The objective of this study was to explore the classification results of the original heartbeat types under near-actual medical conditions. To improve the classification performance of the model, it is necessary to further enhance the data and add different data samples.

The results of this study were compared with those of other relevant studies based on the MIT-BIH database, and the comparison is presented in Table 9. All experimental data in the table were obtained from the MIT-BIH arrhythmia database. Murat et al.²⁹ identified five types of arrhythmias by constructing a CNN and LSTM using 10,022 cardiac beat signals in the arrhythmia database, with an accuracy rate of 99.26%. That study used normalisation for preprocessing, and the ability of its model to identify the original data was still weak. Hui et al.³⁰ proposed a 7-layer mixed-kernel CNN model to extract local features of the beat and combined it with an extreme learning machine to classify four types of arrhythmias under 10-fold cross-validation, achieving a classification accuracy of 99.16%. That study covered relatively few arrhythmia types, and the model may have seen data in advance during the ten-fold cross-validation. Liang et al.³ used a wavelet transform to decompose ECG signals and then combined them with recursive maps to convert them into two-dimensional texture images. They adjusted the weight coefficients of the visual converter to identify five types of arrhythmias with an accuracy rate of 97.38%. Zhou et al.³¹ proposed a dual-threshold design for the detection of the QRS complex, combined with the CNN method to extract features, and used an extreme learning machine to classify ECG signals, which not only achieved high accuracy but also had good computational efficiency. However, the feature extraction ability of that network is limited; thus, the generalisation ability of the model is not strong. In comparison, this study uses the CNN model combined with the wavelet transform to classify and identify five types of arrhythmias, and the classification accuracy reaches 99.43%, which is higher than the accuracies reported for the compared methods. In addition, the one-dimensional signal was transformed into a two-dimensional image to improve the model performance and enhance the model recognition ability, and six types of arrhythmias were identified by the improved CNN model, with a classification accuracy of 90.50%. The experimental comparison with several mainstream algorithms demonstrates the effectiveness of the proposed methods, which can provide a practical reference for clinicians and researchers.

Table 9.
Comparison with existing models.

Author, Year	Classification type	Pretreatment	Classification model	Result
Murat²⁹, 2020	Five class (NOR:75020, LBBB:8072, RBBB:7255, APB:2546, PVC:7129)	Normalization	CNN + LSTM	Acc: 99.26%
Zhou³¹, 2020	Four class (N: 90053, S: 2781, V: 7023,F:802)	CNN	Extreme learning machine	Acc: 98.77% Sen: 98.87% Spe: 96.35%
Xiong Hui³⁰, 2021	Four class (N: 90126, S: 2779, V: 7043, F:814)	Hybrid nuclear CNN	Extreme learning machine	Acc: 99.16% Sen: 99.85% Spe: 98.89%
Han Liang³, 2022	Five class (N: 40000, S: 2777, V: 7226, F: 802, Q: 8031)	Wavelet transform	Improved vision converter	Acc: 97.38%
Lv Hang², 2023	Four class (N: 90042, S: 7007, V: 2779, F: 802)	Fusion feature	CNN model	Acc: 94.67%
This research	Five class (NOR: 75051, LBBB: 8074, RBBB: 7258, APC: 2545, PVC: 7130)	Wavelet transform	CNN model	Acc: 99.43% Sen: 97.82% Spe: 99.71%
	Six class (NOR: 75051, LBBB: 8074, RBBB: 7258, APC: 2545, PVC: 7130, PAB: 7028)	Two-dimensional image	Improved CNN model	Acc: 90.50% Sen: 89.96% Spe: 91.01%

This study has several limitations. First, the classification accuracy of the proposed method can be further improved, and the factors affecting the accuracy of the model should be further studied and analysed to reduce the confusion between different types of heartbeats. In this study, the classification of heartbeat types was based only on the manual annotations in the database, and unsupervised learning techniques can be applied to the model in subsequent research. Second, the MIT-BIH database should be used to further explore performance within and between patient groups. In addition, the database is old and the information it contains is limited; therefore, more data from subjects at different stages, from health to cardiovascular disease, are needed to verify the effectiveness of this method and to analyse the influence of patients' diseases and medication histories on the study results.
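The distinction between evaluation within and between patient groups can be sketched as follows. This is a minimal illustration under stated assumptions, not the protocol used in this study: beats are assumed to be stored as `(record_id, label)` pairs, the record identifiers and the function name `split_by_record` are hypothetical, and the split simply keeps all beats of one record on the same side so that no record contributes to both training and test data.

```python
import random


def split_by_record(beats, test_fraction=0.2, seed=0):
    """Split beats so that each record appears in train or test, never both.

    beats: list of (record_id, label) tuples (hypothetical layout).
    Returns a (train, test) pair of lists.
    """
    records = sorted({record_id for record_id, _ in beats})
    rng = random.Random(seed)
    rng.shuffle(records)
    n_test = max(1, round(len(records) * test_fraction))
    test_records = set(records[:n_test])
    train = [b for b in beats if b[0] not in test_records]
    test = [b for b in beats if b[0] in test_records]
    return train, test


# Toy data: five hypothetical records with two beats each.
beats = [(rec, label)
         for rec in ("r1", "r2", "r3", "r4", "r5")
         for label in ("NOR", "PVC")]
train, test = split_by_record(beats)
print(len(train), "training beats;", len(test), "test beats")
```

A beat-level random split, by contrast, can place beats from the same record on both sides, which tends to make results look better than they would on unseen patients.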

5 Conclusions

Under the premise of imbalanced data, this study designed a combination of wavelet transform and a CNN model to achieve efficient classification of five types of arrhythmias, with classification accuracy, sensitivity, and specificity of 99.43%, 97.82%, and 99.71%, respectively. To preserve as much of the effective information in the original data as possible and improve the classification performance of the model, a method for converting a one-dimensional raw ECG signal into a two-dimensional grayscale image was proposed. Six arrhythmias were then classified using the improved 2D CNN model, with classification accuracy, sensitivity, and specificity of 90.50%, 89.96%, and 91.01%, respectively. The problems of a single recognition type, inconsistent beat scale, and low classification accuracy were effectively addressed. The comparison with algorithms from the recent mainstream literature supports the effectiveness of the proposed method, indicating that it has potential application value in assisting doctors in the diagnosis of cardiovascular diseases.

Footnotes

Acknowledgment

We would like to thank all volunteers and researchers who participated in this study.

Authorship contribution statement

Huan Zhang: Writing – review & editing, Writing – original draft, Validation, Software, Methodology. Yu Zang: Writing – review & editing, Methodology, Formal analysis, Data curation. Liping Li: Writing – review & editing, Methodology. Chunhui Wang: Validation, Supervision, Conceptualization. Yanjun Li: Formal analysis. Liang Jiang: Software, Validation.

Ethical statement

The data used in this study were obtained from a public database, and the study was conducted in accordance with the Declaration of Helsinki.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Natural Science Foundation of Shandong Province (No. ZR2023QF146) and the Traditional Chinese Medicine Science and Technology Foundation of Shandong Province (Q2022052).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability

All data generated or analysed during this study are included in this article. Further enquiries can be directed to the corresponding author.

References

1. Leasure, Jain, Butchy, et al. Deep learning algorithm predicts angiographic coronary artery disease in stable patients using only a standard 12-lead electrocardiogram. Can J Cardiol. 2021; 37: 1715–1724.

2. Wang, Liu, et al. Variability of cardiac electromechanical delay with application to the noninvasive detection of coronary artery disease. IEEE Access 2019; 7: 53115–53124.

3. Liu, et al. Evaluation of left ventricular systolic function using synchronized analysis of heart sounds and the electrocardiogram. Heart Rhythm 2020; 17: 876–880.

4. Wang, Qiao, Liu, et al. Automated ECG classification using a non-local convolutional block attention module. Comput Methods Programs Biomed. 2021; 203: 106006.

5. Dong, Wang, et al. Non-destructive detection of CAD stenosis severity using ECG-PCG coupling analysis. Biomed Signal Process Control. 2023; 86: 105328.

6. Jiang, Wang, et al. Random forest clustering for discrete sequences. Pattern Recognit Lett. 2023; 174: 145–151.

7. Chen, Bai, et al. Review of image classification algorithms based on convolutional neural networks. Remote Sens 2021; 13: 4712.

8. Acharya, Fujita, Adam, et al. Automated characterization and classification of coronary artery disease and myocardial infarction by decomposition of ECG signals: a comparative study. Inf Sci (Ny). 2017; 377: 17–29.

9. Mohebbanaaz, LVR, Kumari Sai. Classification of ECG beats using optimized decision tree and adaptive boosted optimized decision tree. Signal Image Video Process. 2022; 16: 695–703.

10. Mak, Cheung. Towards end-to-end ECG classification with raw signal extraction and deep neural networks. IEEE J Biomed Health Inform. 2019; 23: 1574–1584.

11. Chang, Cadaret, Liu. Machine learning in electrocardiography and echocardiography: technological advances in clinical cardiology. Curr Cardiol Rep. 2020; 22: 61.

12. Yildirim, Baloglu, Tan, et al. A new approach for arrhythmia classification using deep coded features and LSTM networks. Comput Methods Programs Biomed. 2019; 176: 121–133.

13. Cossu, Carta, Lomonaco, et al. Continual learning for recurrent neural networks: an empirical evaluation. Neural Netw. 2021; 143: 607–627.

14. Xiong, et al. Stacked convolutional denoising auto-encoders for feature representation. IEEE Transactions on Cybernetics 2017; 47: 1017–1027.

15. Ribeiro, Paixão GMM, et al. Automatic diagnosis of the 12-lead ECG using a deep neural network. Nat Commun. 2020; 11: 1760.

16. Zhang, Zhou, Zeng. HeartID: a multiresolution convolutional neural network for ECG-based biometric human identification in smart health applications. IEEE Access 2017; 5: 11805–11816.

17. Kiranyaz, Ince, Gabbouj. Real-Time patient-specific ECG classification by 1-D convolutional neural networks. IEEE Trans Biomed Eng. 2016; 63: 664–675.

18. Yildirim, Plawiak, Tan, et al. Arrhythmia detection using deep convolutional neural network with long duration ECG signals. Comput Biol Med. 2018; 102: 411–420.

19. Warrick, Homsi. Ensembling convolutional and long short-term memory networks for electrocardiogram arrhythmia detection. Physiol Meas. 2018; 39: 114002.

20. Yildirim, Baloglu, Tan, et al. A new approach for arrhythmia classification using deep coded features and LSTM networks. Comput Methods Programs Biomed. 2019; 176: 121–133.

21. Niu, Tang, Sun, et al. Inter-Patient ECG classification with symbolic representations and multi-perspective convolutional neural networks. IEEE J Biomed Health Inform. 2020; 24: 1321–1332.

22. Ramkumar, Lakshmi, Rajasekaran, et al. Multiscale laplacian graph kernel features combined with tree deep convolutional neural network for the detection of ECG arrhythmia. Biomed Signal Process Control. 2022; 76: 103639.

23. Ecar. Recommended practice for testing and reporting performance results of ventricular arrhythmia detection algorithms. Assoc Adv Med Instr (AAMI) 1987; 6: 69–81.

24. Yang. Research on Extraction of ECG signal characteristic parameters based on Wavelet Transform. Chengdu: University of Electronic Science and Technology of China, 2020.

25. Zhang. Study on the Denoising Methods of Surface Electromyography Signal. Baotou: Inner Mongolia University of Science & Technology, 2023.

26. İnik, Altıok, Ülker, et al. MODE-CNN: a fast converging multi-objective optimization algorithm for CNN-based models. Appl Soft Comput. 2021; 109: 107582.

27. Chen, Xia, Chen. Lightweight image super resolution reconstruction network based on transformer-CNN. J Comput Applic. 2024; 44: 292–299.

28. Research on Arrhythmia Classification Algorithm Based on Multi-branch Convolutional Neural Network. Changchun: Jilin University, 2023.

29. Murat, Yildirim, Talo, et al. Application of deep learning techniques for heartbeats detection using ECG signals-analysis and review. Comput Biol Med. 2020; 120: 103726.

30. Xiong, Liang, Liu. Arrhythmia classification algorithm based on convolutional neural network hybrid model. J Harbin Inst Technol 2021; 53: 33–39.

31. Zhou, Tan. Electrocardiogram soft computing using hybrid deep learning CNN-ELM. Appl Soft Comput. 2020; 86: 105778.

Table 7.
Confusion matrix of 2D CNN model.

Real \ Prediction	NOR	APC	PVC	LBBB	RBBB	PAB
NOR	69954	1189	1329	741	553	257
APC	1160	2036	51	9	635	22
PVC	930	37	6153	18	36	44
LBBB	362	18	894	7161	22	232
RBBB	607	37	5	138	6443	244
PAB	516	18	71	17	95	6298
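The way accuracy, sensitivity, and specificity follow from a confusion matrix such as Table 7 can be sketched in a few lines. The sketch assumes that rows are the real labels and columns are the predicted labels, and it uses one-vs-rest counts per class with an unweighted (macro) average; the function names are illustrative, and the averaging scheme is an assumption rather than a statement of how the reported figures were obtained.

```python
# Confusion matrix from Table 7 (assumed: rows = real, columns = predicted).
LABELS = ["NOR", "APC", "PVC", "LBBB", "RBBB", "PAB"]
CM = [
    [69954, 1189, 1329, 741, 553, 257],
    [1160, 2036, 51, 9, 635, 22],
    [930, 37, 6153, 18, 36, 44],
    [362, 18, 894, 7161, 22, 232],
    [607, 37, 5, 138, 6443, 244],
    [516, 18, 71, 17, 95, 6298],
]


def overall_accuracy(cm):
    """Fraction of beats on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def per_class_metrics(cm, labels):
    """One-vs-rest sensitivity and specificity for every class."""
    total = sum(sum(row) for row in cm)
    metrics = {}
    for i, name in enumerate(labels):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                  # real class i, predicted other
        fp = sum(row[i] for row in cm) - tp   # predicted class i, real other
        tn = total - tp - fn - fp
        metrics[name] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        }
    return metrics


def macro_average(metrics, key):
    """Unweighted mean of one metric over all classes."""
    return sum(m[key] for m in metrics.values()) / len(metrics)


metrics = per_class_metrics(CM, LABELS)
print(f"accuracy = {overall_accuracy(CM):.4f}")
print(f"macro sensitivity = {macro_average(metrics, 'sensitivity'):.4f}")
for name in LABELS:
    print(name, {k: round(v, 4) for k, v in metrics[name].items()})
```

Under these assumptions the overall accuracy comes out at about 0.905, in line with the reported 90.50%, and the macro-averaged sensitivity at about 0.817. The APC sensitivity is only about 0.52, which reflects the high APC confusion rate noted in the discussion.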