Abstract
The Autonomic Nervous System (ANS) regulates cardiac activity, and Heart Rate Variability (HRV) can be used as a diagnostic tool for detecting cardiac disorders. HRV indices are classified as linear or nonlinear and are mostly used to measure the efficiency of a predictive model. For the prediction of cardiac diseases, feature extraction and feature selection in the machine learning model are decisive. The models available to date rely on HRV indices to predict cardiac diseases accurately, but they shed little light on the specifics of the indices, the selection process or the stability of the model. The proposed model is developed considering all facets of the signal: electrocardiogram (ECG) amplitude, frequency components, sampling frequency, extraction methods and acquisition techniques. The machine learning based model and its performance are tested using standard biosignal methods, both on publicly available data and on data acquired by the authors. The model is distinctive in that it considers a very large number of mixture sets and more than four complex cardiac classes. The statistical analysis is performed on a variety of databases, namely MIT/BIH Normal Sinus Rhythm (NSR), MIT/BIH Arrhythmia (AR), MIT/BIH Atrial Fibrillation (AF) and Peripheral Pulse Analyser recordings, using feature compatibility techniques. The classifiers are trained for prediction with approximately 40000 sets of parameters. The proposed model reaches an average accuracy of 97.87 percent with high sensitivity and precision. The best features for classification are chosen from among the different HRV features. The model was checked under all available subject scenarios, including the raw database and the non-ECG signal. In this sense, robustness is defined not only by the specificity parameter but also by the other performance measures.
Support Vector Machine (SVM), K-Nearest Neighbour (KNN), Ensemble AdaBoost (EAB) and Random Forest (RF) classifiers are tested in upper and lower 5% accuracy band configurations. Random Forest produced the best results, and its robustness has been established.
Keywords
Introduction
Cardiovascular diseases (CVDs) are the leading cause of death worldwide, and Coronary Artery Disease (CAD) is one of the leading causes of CVD death [1]. The sinoatrial (SA) node generates the electrical impulse that is recorded as the electrocardiogram (ECG), an important tool for studying and analysing the state of the heart. Cardiac abnormalities should be identified in a timely manner so that the subject can receive adequate care. At present, identification is performed manually by physicians. Symptoms, however, are often subtle and hard to identify; congestive heart failure (CHF), a condition in which the heart is unable to pump enough blood to the body, may be asymptomatic in its early stages [2, 3]. The main purpose of monitoring a patient’s ECG is to prepare the doctor to plan and take appropriate clinical action. The ECG must be checked regularly to substantially reduce the mortality rate due to heart attack. The ECG signal consists of several characteristic waves: P, QRS and T. Heart rate (HR) is a significant factor in the assessment of cardiac health. HR is derived from the RR interval (the time difference between two consecutive R peaks). The RR interval is the most prominent feature of Heart Rate Variability (HRV), but HRV also depends, directly or indirectly, on other features. The components of HRV can be determined in the time, geometric and frequency domains. HRV analysis with a non-ECG signal is also possible and accepted for diagnostic purposes [4, 5, 6]. Biological organisms regulate their response to internal and external stimuli through an underlying control mechanism. The human heart rate is not constant; it varies from beat to beat, and this variation is known as heart rate variability (HRV). HRV is derived from the time span between consecutive heart beats (or QRS complexes), also referred to as the Inter Beat Interval (IBI).
The ability to discern pathological and physiological patterns of Beat to Beat Interval (BBI) is essential for the development of new diagnostic methods [6, 7, 8].
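The RR interval and heart rate described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and SDNN, RMSSD and pNN50 are standard time-domain HRV indices chosen here as assumptions, since the text does not list the exact indices used.

```python
from math import sqrt
from statistics import mean, stdev


def rr_intervals(r_peak_times_s):
    """Return RR intervals in milliseconds from R-peak times given in seconds."""
    return [(b - a) * 1000.0 for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]


def time_domain_hrv(rr_ms):
    """Compute a few standard time-domain HRV indices (assumed, not from the paper)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return {
        "mean_hr_bpm": 60000.0 / mean(rr_ms),            # 60000 ms per minute
        "sdnn_ms": stdev(rr_ms),                          # spread of RR intervals
        "rmssd_ms": sqrt(mean([d * d for d in diffs])),   # beat-to-beat variation
        "pnn50_pct": 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs),
    }
```

For example, `time_domain_hrv(rr_intervals(peaks))` turns a list of detected R-peak times into a small feature dictionary that could be passed on to a classifier.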
Several approaches are effective for the detection and analysis of cardiac disorders by means of extraction techniques, e.g. wavelets, neural networks, genetic algorithms [9], convolutional neural networks (CNNs) and iterative algorithms. CNNs have been widely used for the diagnosis of heart disease in recent years [8, 9]. They have also been used for the detection of arrhythmia [10, 11], myocardial infarction [12, 13] and atrial fibrillation. They have further been studied for distinguishing ECG signals observed in coronary artery disease from normal signals [14, 15] and for detecting shockable versus non-shockable ventricular arrhythmias [16]. In addition, such tools can serve as an external resource that helps clinicians obtain better and more precise diagnostic information about patients [17, 18]. Diseases such as Congestive Heart Failure, Sudden Cardiac Death, Bradycardia, Tachycardia and Ventricular arrhythmia are currently identified with a fixed set of pre-processing, feature extraction and feature selection steps [6, 19]. These techniques have so far delivered good results when each condition is viewed in isolation. The reported classifier accuracy is about 97% when the input data is mono-characteristic or balanced. Performance on mixed input data, however, is not guaranteed. The HRV indices fed to a machine learning model are grouped into time, frequency, nonlinear and geometric domains [12]. Such models generate appropriate results for input data of a fixed or well-predicted nature. The classification results of K-Nearest Neighbour (KNN), Ensemble AdaBoost, Support Vector Machine (SVM) and Random Forest (RF) are adequate if the input data is one-dimensional or belongs to a single disease group [20, 21]. Various databases such as MIT-BIH, PTDB, Arrhythmia and Sudden Cardiac Death are categorized using one or more classifiers. Nevertheless, the results are not consistent for mixed input combined with a fixed set of HRV indices.
A fixed spectrum of HRV indices typically has its own limitations, which can be addressed with a deep structured network [22, 23, 24].
Intelligent decision support for the treatment of heart disease is part of the proposed work. Previous research has proven useful for the identification of cardiac disorders. Different machine learning techniques have been used to build predictive models, such as logistic regression, k-nearest neighbour, ANN, SVM, decision trees [25, 26], Naive Bayes, random forest, deep learning and auxiliary types [27, 28, 29]. However, little effort has been made to address and monitor the behaviour of a model with inputs of varying dimensions and features. The machine learning based model detects cardiac diseases with good accuracy, using distinctive feature extraction and index selection [30, 31]. The results of the proposed model are comparable to those of current binary models. The proposed model should not only be stable but also flexible in terms of training and testing parameters. Methods and models developed so far have been tested for accuracy, sensitivity, specificity, etc. For a better analysis of the model, a total of 24 HRV indices (extracted parameters) covering time domain, frequency domain, nonlinear and geometric features are used, as mentioned earlier.
The main contributions of the proposed research work are as follows:
The efficiency of all classifiers has been tested for the maximum number of features in terms of classification accuracy and execution time. The system is intelligent in the sense that it reports the transition step at which the highest or lowest accuracy occurs. Compared with the three other classifiers and with previous studies, the proposed diagnostic system achieved excellent classification results. This study identifies the classifier best suited to feature selection and robustness.
Subjects database information
Cardiac disease identification model.
The databases used in this work were obtained from PhysioBank: the MIT-BIH Normal Sinus Rhythm (NSR) database (nsrdb), the BIDMC CHF database (chfdb) and the CHF RR interval database (chf2db). Detailed information is given in Table 1. In total, 417 ECG records of benchmark database subjects and 460 patient signals are included in this analysis.
The paper explains the theory and mathematical context of machine learning, feature selection and algorithms as background for cardiac disease detection. It also addresses methods of cross-validation and evaluation. The key steps of this process are shown in the figure. The data used included ECG recordings from the MIT-BIH Arrhythmia database [32, 33, 34]. Each record was extracted as a one-minute segment of the ECG recording. The in-house samples (database 2 and database 3) were collected while the subjects were mostly seated, using purpose-built in-house hardware. Database 1 comes from a non-ECG instrument, the Peripheral Pulse Analyzer (PPA), which has been used for HRV analysis in the same way as ECG-based approaches. The model was tested equally on ECG and non-ECG signals for HRV-based cardiac analysis [35, 36, 37].
Methodology and implementation
Figure 1 shows the steps of the proposed process. The proposed method decomposes the HRV signals, extracts a collection of features and checks their consistency. Features are derived from the components of the HRV signals. Finally, classification is performed by the classification unit. The classifiers are used here for inspection and verification. The best classifier, here Random Forest, is selected based on the classification parameters.
The proposed approach comprises two phases: Enrolment Database Processing (PEP) and Prediction and Identification (PI). All available data samples and the ECG signals acquired by the units are fed in for analysis, as shown in Fig. 1.
Dataset used
The ECG signals (cardiac and normal) are collected from the open-access PhysioBank web site, including its normal sinus rhythm database. Data are also obtained from the purpose-built instrument. The resulting collection contains ECG signals from patients with heart disease and from healthy people. Cardiac patients are aged 18 to 89 and control subjects 20 to 50. The sampling rates are 256 Hz and 128 Hz for the cardiac and normal ECG signals, respectively. To ensure uniformity between the cardiac and normal classes [36, 38, 39], ECG signals were resampled at 128 Hz for this analysis.
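Bringing a 256 Hz recording to the common 128 Hz rate amounts to halving the sampling rate. The text does not say how the resampling was performed, so the following is only a minimal sketch under that assumption: the hypothetical helper averages adjacent sample pairs, which acts as a crude low-pass filter before decimation. A production pipeline would more likely use a proper polyphase resampler.

```python
def downsample_by_two(samples):
    """Halve the sampling rate (e.g. 256 Hz to 128 Hz) by averaging adjacent pairs.

    Averaging each pair is a very simple anti-aliasing step; a trailing
    unpaired sample is dropped.
    """
    return [
        (samples[i] + samples[i + 1]) / 2.0
        for i in range(0, len(samples) - 1, 2)
    ]
```

One second of a 256 Hz signal (256 samples) therefore becomes 128 samples, matching the rate of the normal-class recordings.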
Database of congestive heart failure (CHF)
The collection consists of 15 ECG records (11 men between the ages of 22 and 71 and 4 women, between the ages of 54 and 63). A 12-bit resolution of
Database of arrhythmia (MITDB)
The database contains ECG records for 47 subjects and 29 subjects with regular sinus rhythms. The recordings were digitized at 360 samples per second per channel with 11-bit resolution over a 10 mV range.
Sudden cardiac death Holter database (SDDB)
This database includes 23 complete Holter records and 18 patients with underlying sinus rhythm. All patients had ventricular tachyarrhythmias, and most of them had an actual cardiac arrest. The database includes both men and women over the age of 25. The sampling rate is 250 Hz.
Ventricular arrhythmia database (VENT-ARRTHY)
Two databases are used: the CU Ventricular Tachyarrhythmia Database (cudb) and the MIT-BIH Supraventricular Arrhythmia Database (sup). They contain 35 and 78 subjects, respectively. Both are sampled at 250 Hz.
Pre-processing
ECG filtering plays a key role in pre-processing: it isolates the selected frequencies from the data and enables feature extraction. The ECG signal frequency range is between 0.05 and 100 Hz. ECG noise sources include muscle noise, artefacts due to electrode motion, power line interference (60 Hz), baseline wander and T waves with high-frequency characteristics similar to the QRS complex. The Arrhythmia database has power line noise that can affect the P and Q waves of the ECG signal and cause errors. When the QRS or QT intervals, which are critical diagnostic parameters, are measured, 60 Hz noise distorts the ECG and introduces errors.
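One common way to suppress the 60 Hz power line interference mentioned above is a narrow notch filter. The text does not specify which filter was used, so this is a sketch under assumptions: a second-order IIR notch with the widely used "audio EQ cookbook" biquad coefficients, a quality factor of 30 picked for illustration, and hypothetical function names.

```python
import math


def notch_coefficients(f0_hz, fs_hz, q=30.0):
    """Return normalised (b, a) coefficients of a biquad notch centred on f0_hz."""
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def apply_biquad(b, a, x):
    """Filter the sequence x with a direct-form I biquad."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y
```

With `fs_hz=360` (the MITDB sampling rate quoted earlier), a pure 60 Hz tone is attenuated to near zero once the start-up transient has decayed, while low-frequency ECG content passes almost unchanged.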
Feature extraction
All HRV features are grouped into three domains: time, space and frequency. The correlation of the various HRV features is assessed at two significance levels (95% and 99%) across the domains. HRV parameters with a high correlation factor are selected, while the other parameters are discarded. The ANOVA method is used to compare three types of ECG databases [40]. The ANOVA test is used to examine variations in the average HRV parameters across the different datasets. Bio-signal analyses are often used to detect abnormalities.
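The ANOVA comparison mentioned above can be illustrated with the one-way F statistic, computed for a single HRV parameter across several databases. This is a minimal sketch of the standard textbook formula; the function name and the idea of passing one list of values per database are assumptions, not the authors' code.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the values around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F value indicates that the parameter's mean differs between databases by more than its within-database scatter, which is the kind of evidence used to keep or discard a feature.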
Classification
K-Nearest Neighbour (KNN), Support Vector Machine (SVM), Ensemble AdaBoost (EAB) and Random Forest (RF) are used to distinguish between normal and cardiac subjects. In this analysis, normal and abnormal HRV signals were separated automatically by a classifier. The user picks the feature set with the best output value as input to the classification [41, 42].
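Of the four classifiers named above, K-Nearest Neighbour is the simplest to show in full. The sketch below is illustrative only: the function name, the Euclidean distance, the value k=3 and the two-feature toy data in the usage note are all assumptions, not details from the paper.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    nearest = sorted(
        zip(train_x, train_y), key=lambda pair: dist(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With hypothetical two-feature vectors such as `[(60.0, 20.0), (20.0, 5.0), ...]` labelled "normal" and "cardiac", `knn_predict(train_x, train_y, (57.0, 21.0))` returns the label held by most of the three closest training points.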
Experimental findings
This paper offers a new approach to classifying cardiac patients from collected ECG data. The most distinctive features for the proposed model are selected using a feature selection methodology built around a random forest classifier. For research purposes, the Congestive Heart Failure (CHF), Arrhythmia, Sudden Cardiac Death and Ventricular arrhythmia databases are used.
Machine learning based model
To analyse and understand the performance of machine learning, the following methods are considered: Support Vector Machine (SVM), Random Forest (RF), KNN and EAB. The performance of the classifiers determines the efficiency of the model or algorithm. The choice of classifier depends on the data, the classes and the complexity [28, 32]. Table 2 compares the performance with current research. Random Forest turns out to be the best-suited classifier, so the in-depth analysis of robustness is based on Random Forest.
Performance of model vs features
To measure the efficiency of our intra-group selection model, the results are described below in terms of precision, recall, F-score, ROC value and overall accuracy:

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (1)$$

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (2)$$

$$\text{F-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (3)$$

$$\text{Overall accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (4)$$

where TP denotes true positives, FP false positives, TN true negatives and FN false negatives. The ROC curve is drawn by plotting the true positive rate (TPR) against the false positive rate (FPR) at the corresponding model thresholds. The additional features (indices) included along with the traditional features are shown in Table 3.
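The precision, recall, F-score and overall accuracy described above follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name and the counts in the usage note are hypothetical and are not results from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity or TPR
    specificity = tn / (tn + fp)       # equals 1 - FPR
    f_score = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_score": f_score,
        "accuracy": accuracy,
    }
```

For instance, `classification_metrics(tp=90, fp=10, tn=85, fn=15)` gives a precision of 0.9 and an overall accuracy of 0.875. The sketch does not guard against empty classes, where a denominator would be zero.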
Model testing on HRV feature
The confusion matrix shows both the correctly predicted and the incorrectly predicted classification values. In the confusion matrix, the sum of TP and TN is the number of entries correctly identified by the classifier, as shown in Fig. 2.
Performance confirmation of a proposed model.
It is difficult to compare two models when one has low precision and high recall and the other the reverse. The F-score is used to make them comparable, since it assesses recall and precision simultaneously. Table 4 gives the criteria for evaluating and comparing the system. Calculating a confusion matrix gives a better understanding of how the classification model works and what kinds of errors it makes.
Performance parameters for confusion matrix
As an experimental result, Table 5 shows the best RF(20) results for the tree counts {35, 45, 55}. All the important evaluation parameters, such as accuracy, sensitivity, specificity and F-score, are shown in Table 5.
Comparison of RF(35), RF(45), RF(55) performance evaluation
Here, the robustness of the model for making predictions on real-time non-stationary data is evaluated using cross-validation and repeated cross-validation, with classification accuracy as the benchmark parameter. Classification accuracy alone is typically not enough information to make this decision. The accuracy of RF(20)_45 is the highest, i.e. 96.7949%, as shown in Table 1, with sensitivity and specificity at their best. This is shown graphically in Fig. 4. Because accuracy alone is not a sufficient measure of model performance, other evaluation parameters are also considered, such as the F1 score, Cohen's kappa and the Matthews Correlation Coefficient (MCC), as shown in Fig. 3.
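The cross-validation referred to above can be sketched as a generator of train/test index splits. The number of folds, the shuffling seed and the function name are assumptions for illustration; the text does not state the configuration that was used.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Each sample appears in exactly one test fold, so averaging the accuracy over the k splits uses every record once for testing. Repeating the procedure with different seeds gives a repeated cross-validation.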
Analysis of model via performance evaluation parameters (1).
Analysis of model via performance evaluation parameters (2).
Analysis of model via performance evaluation parameters (3).
Combination generator model for extracted parameter.
Fig. 3 shows that for RF(20)_45 the MCC and F-score values are also in a good range. Precision and recall are shown in Table 3. The precision is in the range of 95% for RF(20)_45, which again helps in predicting the correct class. After the significant features are selected, error correction is applied to the selected feature set to categorize the patients into four subclasses of cardiac disease, as shown in Fig. 5. The machine learning is carried out in two phases.
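The MCC mentioned above, together with Cohen's kappa, can be computed from the same four confusion-matrix counts used for accuracy. This is a sketch of the standard binary definitions with hypothetical function names; it does not reproduce the authors' multi-class evaluation.

```python
from math import sqrt


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews Correlation Coefficient for a binary confusion matrix."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def cohen_kappa(tp, fp, tn, fn):
    """Cohen's kappa: agreement between prediction and truth beyond chance."""
    n = tp + fp + tn + fn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if expected == 1.0:  # degenerate case: only one class present
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

Both measures equal 1 for a perfect classifier and 0 for chance-level predictions, which makes them more informative than raw accuracy when the classes are unbalanced.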
The block diagram of the proposed method is shown in Fig. 6. The parameters are fed to the model, and histograms are plotted for the combinations in the top 10% accuracy band and for those in the lower 10% band. This enabled the authors to predict the maximum accuracy and to observe the variations before and after the maximum accuracy. It also yields the mean, maximum and standard deviation for further analysis of the set of combinations. The same procedure can be carried out for the 5% band, and the results are shown in the figures.
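Selecting the upper or lower accuracy band from a sweep over feature combinations, and summarising it, can be sketched as follows. The data layout (a list of `(combination, accuracy)` pairs), the function names and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def accuracy_band(results, fraction=0.05, top=True):
    """Return the top (or bottom) fraction of (combination, accuracy) pairs."""
    ranked = sorted(results, key=lambda item: item[1], reverse=top)
    size = max(1, round(len(ranked) * fraction))
    return ranked[:size]


def band_summary(band):
    """Summarise the accuracies inside a band."""
    accuracies = [accuracy for _, accuracy in band]
    return {
        "mean": mean(accuracies),
        "std": pstdev(accuracies),
        "max": max(accuracies),
        "min": min(accuracies),
    }
```

Calling `accuracy_band(results, 0.10)` and `accuracy_band(results, 0.10, top=False)` gives the two 10% bands; passing `0.05` gives the 5% bands discussed in the text.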
Fig. 6 also shows that the data from the parameter sweep is combined so that model performance can be predicted. The combination generator produces the sets of combinations, which is a very difficult task in real-time biosignal processing. All 24 parameters are fed to the generator box, which uses Eq. (5):

$${}^{n}C_{r} = \frac{n!}{r!\,(n-r)!} \qquad (5)$$

where n is the total number of parameters and r is the number of parameters selected. The above combination equation gives the number of combinations in each set. For example, if any 5 of the 24 parameters are selected or extracted, the combination equation becomes

$${}^{24}C_{5} = \frac{24!}{5!\,19!} = 42504$$
RF(45) Accuracy sweep for upper 5% band for (24C20) combination.
Similarly, for any number of selected parameters, the model establishes the relation between the input data, the extracted parameters and the combination sets provided to the classifier. The performance of the above-mentioned classifiers is analysed on the same data by applying Heart Rate Variability (HRV) techniques with grouping and careful selection of features. Information on various features extracted from the ECG signal was collected, and 6 features from various sources were included, which further improves automated cardiac analysis using the HRV methodology. In addition to the 12 standard time and frequency features, nonlinear features are considered for more robust analysis and testing of the model. All 24 features/parameters commonly used in HRV analysis are shown in Fig. 6 in the block of extracted parameters. The model is tested with Support Vector Machine (SVM), K Nearest Neighbour (KNN), Ensemble AdaBoost (EAB) and Random Forest (RF). The RF classifier is taken as the reference because its performance is better, as shown in Table 2 and Fig. 3. The RF model is tested for 20 parameters and 10626 combinations. The purpose of the experiment is to obtain the classifier accuracy for each set of combinations; the accuracy graph can be plotted for other values as well, as in Fig. 7. Figure 8 shows, on a single scale and in one frame, the performance of Random Forest at the selected numbers of trees. This helped the authors to characterise the randomness associated with the model, its behaviour pattern and its accuracy (mean, max and min).
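The figure of 10626 combinations quoted above is the number of ways to choose 20 parameters out of 24. A combination generator of this kind can be sketched with the standard library; the function name and the placeholder feature names in the usage note are hypothetical.

```python
from itertools import combinations
from math import comb


def feature_subsets(feature_names, subset_size):
    """Lazily generate every subset of the given size from the feature list."""
    return combinations(feature_names, subset_size)


def count_subsets(n_features, subset_size):
    """Number of subsets, n! / (r! (n - r)!), without enumerating them."""
    return comb(n_features, subset_size)
```

With 24 placeholder names such as `[f"f{i}" for i in range(24)]`, iterating over `feature_subsets(names, 20)` visits each of the `count_subsets(24, 20)` subsets once, so every subset can be passed to the classifier in turn.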
Histogram of minimum accuracy band (5%).
Histogram of features in maximum accuracy band (5%).
Figs 8 and 9 indicate the impact of each individual feature on the performance of the classifier, as well as how many times it appears for the given input set and classifier.
Training the system with a good dataset is the key to achieving high classification efficiency. Therefore, in this analysis, after training, the classifier was fed with all combinations of features in the final test process. ANOVA was used only once, to test the input and correlation factors for suitability. With a fixed feature set from the time, frequency and nonlinear domains, the proposed classification approach was found to be much less accurate than with the intra-group selection method.
The best features, the selection process and the extraction were evaluated beforehand and run multiple times to test robustness, based on computational time and the highest precision metrics. The model’s versatility, intra-group selection and adaptability for such varied and complex data is one of its distinctive features, shown here through the evaluation parameters. These results show that a classifier system trained on a good dataset gives higher performance. One of the major advantages of the proposed approach is that the use of combination sets with standard machine learning achieves high classification efficiency. The limitation of the proposed method is that, owing to the number of complex data inputs, training takes time. While ML technology has many advantages, it is not flawless. The following factors limit its capacity.
ML algorithms are not proven to be effective in predicting all scenarios. Another problem is the correct interpretation of the results of ML algorithms. High sensitivity to errors is a further drawback. Good results often depend on the particular dataset, which also makes it difficult to choose a successful algorithm. ML algorithms typically need large datasets for training, and they require many services and systems to run.
Despite the numerous advances in signal processing methods and artificial intelligence in recent years, the identification of cardiac diseases remains a challenge because of the complexity of the data structure. The study shows that a combination of linear, nonlinear, time domain and frequency domain features, along with the added features as per Table 2, leads to the best solution found for the prediction of cardiac diseases. Machine learning was applied, and random forest proved more successful than the other classifiers studied. In further tests, the random forest classifier was checked on mixed ECG and non-ECG signals and reached about 97 per cent precision, which compares favourably with established tests. Machine learning has increased the efficiency of learning and thus the accuracy of classification. The classification accuracy was over 95 per cent for almost all combinations tried on both signal types. This is significant because it shows that the model with the random forest classifier is robust. The increase in classification accuracy does not appear large when SVM, KNN or EAB is used instead of RF, owing to the input data, the intra-group selection method and the combination sets. Based on the results obtained here, the proposed method is efficient and robust and can be used to classify ECG signals. The effectiveness of the proposed approach is demonstrated in the experimental findings of the paper. The algorithm could be generalised to other specific signal classification problems. The study has also pointed out scope for model development in non-conventional machine learning directions.
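Counting how often each feature appears among the combinations of an accuracy band is what a feature histogram of this kind summarises. The sketch below assumes the band is a list of `(combination, accuracy)` pairs; the function name and the feature labels in the usage note are hypothetical.

```python
from collections import Counter


def feature_frequency(band):
    """Count how often each feature occurs across the combinations in a band."""
    return Counter(
        feature for combination, _ in band for feature in combination
    )
```

For a band such as `[(("SDNN", "RMSSD"), 97.1), (("SDNN", "LF"), 96.9)]`, the counter reports that "SDNN" occurs twice, and `most_common()` lists the features in order of how often they appear in that band.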
Acknowledgments
The authors received great help from Phoenix Hospital, Mumbai (Maharashtra), India, and gratefully acknowledge the hospital for encouraging them to carry out meaningful research.
