Developing an Integrated Model for Heart Disease Diagnosis (IM-HDD) using ensemble classification methods

Abstract

In present scenario, Heart Disease has become the vital cause of mortality and diagnosis of heart diseases is a great confrontation in the field of medical data analysis. Data Mining is an efficient technique for processing and analyzing larger databases for deriving hidden knowledge appropriately. Hence, it is incorporated in medical data analysis for assisting in effective decision making and disease predictions. With that concern, this paper concentrates on framing an Integrated Model for Heart Disease Diagnosis (IM-HDD) using the advanced data mining conceits. The model considers the significant features of patient data that are available in benchmark datasets. Here, the main objective of the proposed model is to enhance the classification accuracy of patient data on classes under NORMAL and ABNORMAL. For enhancing the classification accuracy, the proposed integrated model utilizes the algorithms such as Decision Tree Algorithm, Naive Baye’s Classification and Ensemble Classifiers called Random Forest and Bagging. Further, performance evaluation is performed for analyzing the proposed work. For that, images from UCI repository are utilized and the comparative analysis shows that the proposed work produces better results than the existing models compared.

Keywords

Heart disease diagnosis data mining classification accuracy decision tree algorithm Naive Baye’s classification random forest and bagging

1 Introduction

Medical Data Mining in Health Care Management and Disease Diagnosis is considered as more significant that has to be processed in efficient and accurate manner [1]. According to the report of World Health Organization (WHO), heart related diseases that may affect both men and women equally in all ages and also be the first leading death causing disease [2]. Based on the survey, it is determined that 76%of world deaths will occur by the 2030 due to the non-communicable diseases [3]. It is always complicated to detect the heart related diseases due to the causative risk aspects such as high Blood Pressure (BP), diabetes and so on. There are several methodologies on data mining and machine learning has been developed for determining the brutality of heart diseases. Classification Algorithms are also used for disease diagnosis in accurate manner [4, 5]. The combined methods of medical science and data mining are used for diagnosing many formats of cardiovascular diseases. As in common data mining operations, the medical data mining process includes clustering, rule mining, classification and so on.

There are many techniques used for knowledge representation using the methodologies of data mining for heart disease diagnosis. In this process, several experimentations are carried out for constructing an accurate diagnosis model by integrating efficient models. Moreover, Neural Network based detection techniques are used for predicting medical records such as Left or Right Bundle Branch Block, Atrial Fibrillation, Second Degree Block and so on for detecting the appropriate patient status in accordance with heart functions [6, 7]. It is stated that the Artificial Neural Networks are considered as the model that produces better results in predicting heart related diseases and brain abnormalities [8, 9].

There are great developments in Machine Learning methods incorporated with Internet of Things (IoT) for medical data mining [10]. Specifically, in medical data mining, appropriate decision making on disease diagnosis can be attained by combining the classification models and give automated training on the dataset through which the hidden patterns of medical data are evaluated and utilized for future disease predictions on patients. Hence, through medical data mining, the model is effective to provide the patient’s medical history and can also provide medical support in the complete treatment and analysis process. For further disease and clinical analysis, the knowledge patters are very much important. Here, the medical data mining uses the classification techniques that are very much significant for determining the heart diseases before the emergency occurs. Moreover, the used classification model can be trained and tested in such a manner to detect the patient’s health state and nature of being suffered by cardio vascular or heart related diseases.

In this proposed work, a new Integrated Model for Heart Disease Diagnosis is introduced. The major motive of this model development is to find the heart diseases of test samples with higher accuracy rate and precision. In many available models, there are several limitations on feature selection for execution conveniences. In a different way, the proposed model considers all the available features and Integrated Model is developed with base classification algorithms such as Decision Tree Algorithm and Naive Baye’s Classification, and Random Forest and Bagging classification techniques, for enhancing the result accuracy in predicting heart diseases when compared with available medical data mining models. The contributions of the proposed IM-HDD are listed below.

The medical data of patients that are acquired from the benchmark dataset are pre-processed.

Significant feature selection and feature reduction is performed for avoiding missing attributes and achieving better accuracy with minimal time.

Integrated Model is framed with the traditional and ensemble based classification models such as Decision Tree Algorithm and Naive Baye’s Classification, and Random Forest and Bagging.

Performance Evaluation is carried out for evaluating the proposed model efficiency in determining heart diseases.

The remainder of this paper is framed as follows: Section 2 deliberates the related models and survey works that are derived for disease diagnosis in Medical Data Mining. Section 3 discusses about the complete work process and operations involved in the proposed model with data flow. Section 4 contains the model results in data classification and also the comparative analysis with existing works. Lastly, the conclusion and future enhancement ideas are presented in the Section 5.

2 State of art

A comparative study [11] analyzed about the works processed with bagging with C4.5 and Naive Baye’s Algorithm in heart disease detection. In this, the only ensemble technique used was bagging, by using additional classifiers might increase the result accuracy. Further, the results obtained for Naive Bayes, K-Nearest Neighbour and Decision Tree algorithm has been done [12] and [13] used Naive Baye’s Classification for heart disease detection. The works have not considered about the computational speed on disease diagnosis, which was the major factor in disease prediction. The authors [14] have developed an Intelligent Heart Disease Prediction System (IHDPS) using Artificial Neural Networks, Naive Bayes model and decision trees. The model utilized only the traditional classification techniques. In the work [15], the authors stated that the Random Tree Algorithm model produced higher rate of accuracy in heart disease prediction and classification. Moreover, the model utilized multilayer filtering process that may increase the processing time.

Random Forest Classifier [16] has been used for classifying Coronary Heart Disease through the events related to Coronary Heart Diseases. K-Nearest Neighbour (KNN) [17] based classification has been used for training, in which the model has not considered any assumption based data distributions. Moreover, using KNN with voting based technique has been used. The mostly determined common class among the KNN has been assigned for new data class. Further, Sequential Minimal Optimization (SMO) [18] based classification model has been used. In the work [19], for detecting the heart diseases, 493 patient samples were acquired from cerebrovascular disease prevention program. The model used Bayesian classification, ANN and DT for disease detection. Coronary Artery Disease based evaluations are carried out with 303 data samples [20]. Further, feature enhancement has been employed and Information gain has been determined for the effective feature selection from the input images. Achieving cost effectiveness with increasing accurate prediction rate has been the future work of this model.

Based on heart rate signals, automatic disease detection method has been proposed and narrated [21]. For dimension reduction process, component analysis technique was used and the selected features were given into four kinds of classifiers such as ANN, KNN and SVM. In a different way, a genetic algorithm based classification model [22] for classifying the data with disease factors. The model has no declaration about the cost effectiveness and processing speed in diagnosis. A Weighted Fuzzy rule based heart disease diagnosis [23] model has been developed based on the two following phases:

Automated Weighted fuzzy rule generation process

Proposing Fuzzy rule based decision support model

Further, a comparative analysis experiment has been performed with various mining methods for predicting heart diseases using 15 attributes with neural networks [24]. The authors derived K-Means and Apriori algorithm based detection model in [25]. A real-time based prototype has been developed with modified Apriori algorithm [26] for defining the new method for disease diagnosis. Pruning Classification Association Rule (PCAR) [27] was derived from Apriori algorithm that involved in deleting the minimal occurred data and duplicated content. But, the model considered basic features for classification, and that may decrease the accuracy rate of disease prediction.

Least Square- Support Vector Machine (LS-SVM) based model that used binary decision tree for data classification in detecting diseases on cardiotocogram inputs [28]. A comparative evaluation has been carried out between the statistical and data mining analysis. Multilayer Perceptron (MLP) [29] based model has been utilized for classifying the input data under five major cardiac diseases. Moreover, the model used back propagation model for sample training. And, the model is not cost effective.

The work in [31] proposes a suitable Machine Learning (ML) technique that is commonly adopted and optimized depending on the requirements to predict the cardiac diagnosis based on imaging factors. Non-imaging predictors are frequently incorporated into the ML model and generally increase model performance. The study in [32] uses Weighted Associative Rule Mining to predict cardiac disease based on the scores of key aspects. A set of important part scores and criteria for identifying heart illness were discovered, and cardiologists were engaged to ensure the validity of these guidelines. Experiments on the UCI open dataset, which is widely utilized in heart disease research, provided the highest probability value of 98 percent in heart disease prediction.

3 Work flow of proposed model in accurate heart disease diagnosis

In the proposed Integrated Model-Heart Disease Diagnosis (IM-HDD), the amalgamation of four significant classification techniques are utilized that belongs to Apriori approach and predictive model that evaluates the attributes of heart related disease from the UCI repository dataset. Moreover, in diagnosing heart related diseases, accurate and timely diagnosis is very much important. But, when the traditional techniques are used in classification, it is found that results are insufficient in adequate disease diagnosis and prediction. Therefore, the proposed model utilizes the back propagation neural networks for processing with the 13 basic coronary disease based attributes as input. The obtained classification results are further compared with existing traditional methods for disease prediction and classification. The functions and possibilities of medical data mining are presented in Fig. 1.

Fig. 1

General flow for medical data mining.

The classification process becomes complex, when there is an increase in the number of factors that are considered for disease diagnosis. After diagnosing disease and decision making, proficient and effective treatment process is required. Hence, for providing higher accuracy in disease diagnosis, data mining methods are combined with medical data processing. Specifically, the proposed model is effectively used for predicting abnormalities of patient data that are obtained from heart function monitoring modules, ECG data and so on. The complete work process of the proposed model is diagrammatically represented in Fig. 2. Furthermore, the phases involved in the proposed work are given as follows,

Data Pre-Processing

Feature Extraction

Implementation of Integrated Model

Result Classification

Performance Evaluation

Fig. 2

Work flow of proposed integrated model in heart disease diagnosis.

In this paper, the data from Cleveland UCI repository is utilized for training, since that provides simple visual representation of samples. And, the machine learning process initiated at pre-processing phase and further, entropy based feature selection is processed, performance analysis on classified results with enhanced precision. The process of feature extraction includes feature selection and reduction that has been performed with several iterations for several combinations of data features.

Here, the clear and defined information about the heart related attributes that are considered in the work in Table 1. Further, the performance of classification model is framed based on the significant features and the machine learning techniques that are utilized for each process.

Table 1

UCI repository dataset attributes for IM-HDD

Attributes	Type	Description	Range
Age	Numeric	Patient’s Age in Completed Years	25–69
Gender	Nominal	Gender in Number	Male=0, Female=1
CP	Nominal	Four Types of Chest Pain	typical angina=1, atypical angina=2, non-anginal pain=3, asymptomatic=4
Trestbps	Numeric	Resting Blood Pressure (in mm/Hg)	92–200
SerumCho	Numeric	Serum Cholestrol in mg/dl	126–564
FBS	Nominal	Blood Sugar Level on Fasting	Yes=1, No=0
RestECG	Nominal	Resting Electrocardiographic Results	Normal=0, ST-T wave abnormality Presented=1, showing probable or definite left ventricular hypertrophy=2
Thalach	Numeric	Maximal Rate of Heart	82–185
Exang	Nominal	Angina induced by exercise	Yes=1, No=0
Oldpeak	Numeric	Exercise-Induced ST Depression	71–202
PeakSlope	Nominal	slope of the peak exercise ST segment	Unsloping=1, Flat=2 Downsloping=3
NumVessels	Numeric	Fluoroscopy coloured major vessels	0–3
Thal	Nominal	Type of Heart Defects	3=normal, 6=fixed defect, 7=reversible defect
Disease	Nominal	Identification of a heart disease	Yes=1, No=0

3.1 Data pre-processing

The data obtained from the dataset repository is pre-processed. The data pre-processing involves in removing the missing values of patient records. In this model, the obtained patient dataset contains 303 patient records, in that, 5 records are having some missing attributes. And, those records are removed from the training dataset, further; the remaining records are used for further processing. The attributes that are listed in Table 1 is processed with binary classification technique for analyzing the occurrence of heart disease. In the obtained data samples, when there is a presence of heart disease, the range is set as 1; otherwise, the range is set as 0, which resembles the absence of heart disease in the dataset. Moreover, in this section, the medical data that are collected with respect to the 3 aforementioned attributes are converted into diagnosis ranges for effective machine learning.

3.2 Feature extraction

This section of feature extraction comprises Feature Selection and Reduction. Here, there are 13 major attributes are considered from the dataset and among that, two feature are about the personal data of students such as age and gender representation. Those attributes are not significant for disease diagnosis; hence, the remaining 11 features are considered as significant and taken for feature extraction. Here, feature extraction is performed with Information Gain evaluation for analyzing attributes. In this, each attribute (A) of defined dataset are taken for gain calculation (IG) that corresponds to patient class (PC). And, the computation is given as follows, $IG (A) = H (PC) - H_{A} (PC)$ (1) Where, ‘H (PC) ’ is the entropy of class ‘PC’, which is computed as per the equation given in (2). $H (PC) = \sum_{n} - P (b_{n}) {log}_{2} P (b_{n})$ (2)

For the patient class ‘n’, P (b_n) denotes the probability value of b_n, which is the overall features. When ‘ $H_{A} (PC) = \sum_{j = 1}^{k} P_{j} H_{A} (P C_{j})$ ’ computes the entropy value of attribute ‘A’. ‘P_j’ is the probability of number of samples and ‘PC_j’ denotes the number of ‘j’th division of class. Hence, the Information Gain Ratio (R) is computed to penalize the propagation of instances. The Information Gain Ratio is computed as follows,

R (A) = \frac{IG (A)}{H (PC)}

(3)

Based on that, the significant features are selected and given for the training process in the Integrated Model that combines efficient classification techniques for Heart Disease Detection.

3.3 Implementation of integrated model

The Integrated Model for Heart Disease Diagnosis (IM-HDD) combines the classification process of Decision Tree, Random Forest, Bagging Mechanism and Bayesian Model. From the previous works and discussions, the aforementioned models provided better results in classification. Here, clustering of dataset features is performed based on the attribute variables and the measures of features attained with decision tree. Further, the classification techniques are employed in each clustered results for performance determination. The process is explained below.

3.3.1 Decision Tree (DT) algorithm

For training the input samples ‘S’, the trees are generated according to the high entropy inputs. Here, the trees are generated based on the top down process. And, tree pruning is determined for eliminating the irrelevant data samples on ‘S’. The pseudo code is given in Table 2. $Entropy (patient data) = - \sum_{j = 1}^{m} p_{ij} {log}_{2} p_{ij}$ (4)

Table 2

Decision Tree Algorithm in IM-HDD

Input: Dataset S-attributes with Target Class

Output: Dataset Divisions {s₁, s₂, s₃, … s_n}

Begin

For all Attributes do

For each data_sample do

Execute DT process

End for

Detect feature space f₁, f₂, f₃, … f_m

End for

Identify leaf nodes l₁, l₂, l₃, … l_n

Split dataset-S into s₁, s₂, s₃, … s_n

3.3.2 Random forest and bagging (Ensemble Classifier) implementation

Random Forest is the ensemble classifier that develops several decision trees for providing better results. For tree learning, bagging is the technique used here. For the given data sample, A = {a₁, a₂, a₃, … a_n} with responses B = {a₁, a₂, a₃, … a_n} that iterates the process of bagging from x=1 to X. The unknown samples a’ is processed by identifying the mean as, $\sum_{x = 1}^{X} fx (a^{'})$ from every individual DT on a’. The mean rate is given as, $M = \frac{1}{X} \sum_{x = 1}^{X} fx (a^{'})$ (5)

The uncertainty of detection on DT is obtained with the standard deviation, which is provided in (6). $SD = \sqrt{\frac{{(\sum_{x = 1}^{X} fx (a^{'}))}^{2}}{X - 1}}$ (6)

By incorporating bagging with this, the accuracy rate can be effectively improved of the unstable learning process. In bagging, the dataset is divided into several replicates. Each replicate is independently strained from the S with replacement and the process is performed repeatedly. The bagging based classifier trained from weak learner at each step is integrated into better classifier for attaining higher accuracy, which can be noted as, $sign (\sum_{t = 1}^{l} f_{t} (x))$ , where ‘l’ is the number of iterations.

3.3.3 Naive Baye’s classification in IM-BDD

In this section, the training model applies Bayesian classification protocols though the independent attributes. Each data instance of dataset S is assigned to the class that is having highest consequent occurrences. Gaussian function is used for training, which is given as, $P (A f_{1}, A f_{2}, A f_{3}, \dots, A f_{m} | PC) = \prod_{i = 1}^{m} P (A f_{i} | PC)$ (7)

$\begin{matrix} P (Af | P C_{i}) = \frac{P (P C_{i} | Af)}{P (P C_{i})}, \\ PC \in {NORMAL, ABNORMAL} \end{matrix}$ (8)

Finally, the testing data or the new patient samples can be classified using the association probability, which is given as follows, $P C_{nx} = arg max P (P C_{k}) \prod_{i = 1}^{m} P (A f_{i} | PC), for k = 1, 2$ (9)

Moreover, it is to be stated that the training process of data samples using Bayesian classification model is considerably faster than other works.

3.4 Performance evaluation

The proposed model uses UCI dataset is pre-processed and the defined 13 attributes that are related to heart diseases are selected for training. And, classification is carried out with IM that combines DT, RF, Bagging and NB. Moreover, the performance evaluation is carried out with confusion matrix that produces four results namely, True Positive, True Negative, False Positive and False Negative. Based on those values, the following factors are used for deriving the accuracy rate, sensitivity and specificity.

Sensitivity Rate is calculated as the expectation of the derived results to be positive; when accurate classification results are happened. The equation is presented below,

$\begin{matrix} Sensitivity Rate = \\ \frac{True Positive}{True Positive + False Negative} \end{matrix}$ (10)

Specificity can be termed as True Positive Rate (TPR) and the computation is presented below.

$\begin{matrix} Specificity Rate = \\ \frac{True Negative}{True Negative + False Positive} \end{matrix}$ (11)

Precision Rate of the classification result is computed as in (12) and can be defined as ratio of finding positive results from the retrieved instances.

$\begin{matrix} Precision Rate (PR) = \\ \frac{True Positive}{True Positive + False Positive} \end{matrix}$ (12)

Accuracy Rate is computed as the ratio of total number of correctly identified disease samples among total number of data_samples tested and the equation is presented as,

$\begin{matrix} Classification Accuracy = \\ \frac{True Positive + True Negative}{Total Samples} \times 100 \end{matrix}$ (13)

F-Score is computed as,

$F - Score = \frac{2 * PR * TPR}{PR + TPR}$ (14)

4 Results and discussions

For evaluating the efficiency of the proposed model, the implementation and model analysis is performed with WEKA environment. The obtained results are compared with the traditional classification methods such as Support Vector Machine (SVM), Neural Networks (NN) and Naive Baye’s Model (NB) for proving the proposed model efficiency. The dataset description is provided in the following section.

4.1 Dataset description

For evaluation, the dataset for HDD is obtained from UCI Machine Learning Repository [30]. Moreover, there are four databases, among that, Cleveland database is selected for processing that can be effectively utilized for Machine Learning based researches and comprises heart related attributes. The dataset characteristics are mentioned as Multivariate with 75 numbers of attributes for 303 data records. In that, 13 heart disease based attributes are presented and that information is presented in Table 1. The Cleveland UCI dataset comprises an attribute denoted as ‘num’ to depict the HDD in heart disease patients on different patterns as NORMAL and ABNORMAL. The data with absence of heart disease is classified under NORMAL and with some references of diseases are classified under ABNORMAL class.

4.2 Comparative analysis

Initially, the data attributes from UCI repository dataset is loaded for pre-processing. The subset data samples such as Age, gender, CP, Trestbps, SerumCho, FBS, RestECG, Thalach, Exang, Oldpeak, PeakSlope, NumVessels, Disease and Thal are given for pre-processing. The performance of the proposed work is evaluated based on the metrics defined in Section 3.4. The values obtained for True Positive, False Positive, Sensitivity, specificity, F-Measure and ROC is presented in the Table 3.

Table 3
Evaluation results obtained for HDD using various classification techniques

Model Patient sample Evaluation metrics

True positive False positive Sensitivity Specificity F-measure ROC

SVM 10–20 0.56 0.56 0.313 0.56 0.401 0.5

20–30 0.571 0.571 0.327 0.571 0.416 0.5

30–40 0.577 0.577 0.333 0.577 0.422 0.5

40–50 0.595 0.595 0.354 0.595 0.444 0.5

50–60 0.578 0.578 0.367 0.578 0.432 0.5

60–70 0.569 0.569 0.387 0.569 0.443 0.5

NN 10–20 0.714 0.311 0.714 0.714 0.712 0.846

20–30 0.698 0.328 0.696 0.698 0.696 0.832

30–40 0.712 0.321 0.709 0.712 0.708 0.811

40–50 0.667 0.377 0.662 0.667 0.663 0.816

50–60 0.678 0.378 0.667 0.778 0.632 0.851

60–70 0.669 0.369 0.687 0.769 0.643 0.759

NB 10–20 0.614 0.413 0.814 0.744 0.762 0.766

20–30 0.678 0.428 0.797 0.738 0.716 0.732

30–40 0.602 0.422 0.809 0.812 0.758 0.790

40–50 0.687 0.4778 0.772 0.757 0.723 0.716

50–60 0.778 0.479 0.787 0.788 0.762 0.751

60–70 0.769 0.467 0.797 0.769 0.813 0.812

IM–BDD 10–20 0.953 0.183 0.956 0.953 0.292 0.933

20–30 0.946 0.183 0.945 0.946 0.276 0.978

30–40 0.931 0.182 0.931 0.931 0.248 0.968

40–50 0.914 0.188 0.921 0.914 0.22 0.942

50–60 0.978 0.178 0.967 0.978 0.232 0.951

60–70 0.969 0.169 0.987 0.969 0.243 0.935

Model	Patient sample	Evaluation metrics
SVM	10–20	0.56	0.56	0.313	0.56	0.401	0.5
	20–30	0.571	0.571	0.327	0.571	0.416	0.5
	30–40	0.577	0.577	0.333	0.577	0.422	0.5
	40–50	0.595	0.595	0.354	0.595	0.444	0.5
	50–60	0.578	0.578	0.367	0.578	0.432	0.5
	60–70	0.569	0.569	0.387	0.569	0.443	0.5
NN	10–20	0.714	0.311	0.714	0.714	0.712	0.846
	20–30	0.698	0.328	0.696	0.698	0.696	0.832
	30–40	0.712	0.321	0.709	0.712	0.708	0.811
	40–50	0.667	0.377	0.662	0.667	0.663	0.816
	50–60	0.678	0.378	0.667	0.778	0.632	0.851
	60–70	0.669	0.369	0.687	0.769	0.643	0.759
NB	10–20	0.614	0.413	0.814	0.744	0.762	0.766
	20–30	0.678	0.428	0.797	0.738	0.716	0.732
	30–40	0.602	0.422	0.809	0.812	0.758	0.790
	40–50	0.687	0.4778	0.772	0.757	0.723	0.716
	50–60	0.778	0.479	0.787	0.788	0.762	0.751
	60–70	0.769	0.467	0.797	0.769	0.813	0.812
IM–BDD	10–20	0.953	0.183	0.956	0.953	0.292	0.933
	20–30	0.946	0.183	0.945	0.946	0.276	0.978
	30–40	0.931	0.182	0.931	0.931	0.248	0.968
	40–50	0.914	0.188	0.921	0.914	0.22	0.942
	50–60	0.978	0.178	0.967	0.978	0.232	0.951
	60–70	0.969	0.169	0.987	0.969	0.243	0.935

Based on the above results, the performance analysis of the proposed model is carried out and compared with existing techniques. In the graph presented in Fig. 3, the classification accuracy between the models is portrayed. It is explicitly shown in the Figure that the proposed IM-BDD produces higher rate of accuracy, 95.8%in average in processing 70 patient data samples. The effective amalgamation of base and ensemble classifier techniques provides better results than other works. Moreover, feature extraction process supports in a great manner for accurate result classification. Analyzing precision is very much significant in performance evaluation. And, the results are displayed in the Fig. 4. Based on the results, the proposed model enhances the precision rate for about 26%than compared works. The precision rate is reducing when the number of processing patient data samples are increasing.

Fig. 3

Accuracy rate comparisons.

Fig. 4

Evaluation based on precision rate.

The above mentioned Fig. 5 represents the Error Rate Analysis. Since, the proposed model produces higher True Positive Values and accurate results of classifying patient data; the error rate is minimal in the work. In average, the proposed integrated model produces 0.78%error rate, which is the lowest among compared works. Further, the results obtained for processing time for classification is presented in Fig. 6. The processing time for each model is evaluated with respect to the patient data from 10 to 70 samples and average time value is considered for comparison between models. And, it is obvious from the figure that the proposed work acquires lesser time for classification that other existing classification techniques. The computational complexity analysis and results are given in Fig. 7, evidences that the proposed model is effective than other compared works.

Fig. 5

Error rate analysis.

Fig. 6

Processing time for data classification.

Fig. 7

Computational complexity vs samples.

5 Conclusions and future work

Accurately detecting the heart diseases by processing the raw health care information is very much helpful in treating the diseases earlier and in effective manner and saves people life. For that, efficient data mining and machine learning techniques are incorporated and developed a new model called Integrated Model for Heart Disease Diagnosis (IM-HDD). The model integrates the classifiers namely, Random Forest, Decision Tree, Bagging and Bayesian Classification techniques for patient data classification. The model evaluation is carried out by processing with UCI repository dataset and the results show that the proposed model achieves better rate of classification accuracy with reduced error and processing time. Further, novel and effective feature selection techniques can be developed for defining significant data features for enhancing precision. The study can be further extended in desirable manner to point the research with real world data samples, instead performing simulations.

References

Liao

S.-C

and Lee

I.-N.

, Appropriate medical data categorization for data mining classification techniques, Med. Inform 27(1) (2002), 59–67.

World Health Organization. 2007 7-Febuary 2011]; Available from: http://www.who.int/mediacentre/factsheets/fs310.pdf.

Preventing Chronic Disease: A Vital Investment. World Health Organization Global Report (2005).

Durairaj

and Revathi

, Prediction of heart disease using back propagationMLPalgorithm, Int. J. Sci. Technol. Res. 4(8) (2015), 235–239.

Gavhane

, Kokkula

, Pandya

, Devadkar

, Prediction of heart disease using machine learning, in Proc. 2nd Int. Conf. Electron., Commun. Aerosp. Technol. (ICECA) (2018), 1275–1278.

Devi

C.A.

, Rajamhoana

S.P.

, Umamaheswari

, Kiruba

, Karunya

, Deepika

, Analysis of neural networks based heart disease prediction system, in Proc. 11th Int. Conf. Hum. Syst. Interact. (HSI), Gdansk, Poland (2018), 233–239.

Banu

N.K.S.

, Swamy

, Prediction of heart disease at early stage using data mining and big data analytics: A survey, in Proc. Int. Conf. Elect., Electron., Commun., Comput. Optim. Techn. (ICEECCOT) (2016), 256–261.

Baccour

, Amended fused TOPSIS-VIKOR for classification (ATOVIC) applied to some UCI data sets, Expert Syst. Appl. 99 (2018), 115–125. doi: 10.1016/j.eswa.2018.01.025..

Kumar

P.S.

, Anand

, Kumar

V.U.

, Bhattacharyya

and Kim

T.-H.

, A computational intelligence method for effective diagnosis of heart disease using genetic algorithm, Int. J. Bio-Sci. Bio-Technol. 8(2) (2016), 363–372.

10.

Meidan

, Bohadana

, Shabtai

, Guarnizo

J.D.

, Ochoa

, Tippenhauer

N.O.

, Elovici

, Pro lIoT: A machine learning approach for IoT device identification based on network traffic analysis, in Proc. Symp. Appl. Comput. (2017), 506–509.

11.

My Chau Tu , Dongil Shin , Dongkyoo Shin , Effective diagnosis of heart disease through bagging approach, 2nd International Conference on Biomedical Engineering and Informatics (2009).

12.

Rajkumar

and Reena

G.S.

, Diagnosis of heart disease using data mining algorithm, Global Journal of Computer Science and Technology 10 (2010).

13.

Cheung

, Machine learning techniques for medical analysis, School of Information Technology and Electrical Engineering, B.Sc. Thesis, University of Queenland (2001).

14.

Sellappan Palaniappan , Intelligent heart disease prediction system using data mining techniques, IJCSNS International Journal of Computer Science and Network Security 8(8) (2008), 343–350.

15.

Mirpouya Mirmozaffari , Data mining classification algorithms for heart disease prediction, International Journal of Computing, Communications & Instrumentation Engg. (IJCCIE) 4(1) (2017), 11–15.

16.

Sheik Abdullah , A data mining model for predicting the coronary heart disease using random forest classifier, International Conference on Recent Trends in Computational Methods, Communication and Controls 17 (2012), 22–25.

17.

Acharya

U.R.

, Sree

S.V.

, Chattopadhyay

, Yu

, Alvin

A.P.C.

, Application of recurrence quantification analysis for the automatic identification of epileptic EEG signals, International Journal of Neural Systems (2011).

18.

Idicula-Thomas

, Kulkarni

A.J.

, Kulkarni

B.D.

, Jayaraman

V.K.

, Balaji

P.V.

, A support vector machine-based method for predicting the propensity of a protein to be soluble or to form inclusion body on over expression in escherichia coli, Bioinformatics (2006).

19.

Yeh

D.-Y.

, Cheng

C.-H.

, Chen

Y.-W.

, A predictive model for cerebrovascular disease using data mining, Expert Systems with Applications (2011).

20.

Alizadehsani

, Habibi

, Hosseini

M.J.

, Mashayekhi

, Boghrati

, Ghandeharioun, A data mining approach for diagnosis of coronary artery disease, Computer Methods and Programs in Biomedicine (2013).

21.

Giri

, Acharya

U.R.

Automated diagnosis of coronary artery disease affected patients using LDA, PCA, ICA and discrete wavelet transform, Knowledge-Based Systems (2013).

22.

Jabbar

M.A.

, Deekshatulu

LandChandra

Priti , Classification of heart disease using K-nearest neighbor and genetic algorithm, Procedia Technology (2013), 85–94.

23.

Anooj

P.K.

, Clinical decision support system: risk level prediction of heart disease using weighted fuzzy rules, Journal of King Saud University–Computer and Information Sciences (2012), 24–27.

24.

Nidhi Bhatla , Kiran Jyoti , An analysis of heart disease prediction using different data mining techniques, International Journal of Engineering Research & Technology (IJERT) 1(8) (2012), ISSN: 2278-0181.

25.

Shimpy Goyal , Dr. Rajender Singh Chhillar , A literature survey on applications of data mining techniques to predict heart diseases, International Journal of Engineering Sciences Paradigms and Researches (IJESPR) 20(01) (2015).

26.

Mohammed Abdul Khaleel , Sateesh Kumar Pradhan , Finding locally frequent diseases using modified apriori algorithm, International Journal of Advanced Research in Computer and Communication Engineering 2(10) (2013).

27.

Boshra Bahrami , Mirsaeid Hosseini Shirvani , Prediction and diagnosis of heart disease by data mining techniques, Journal of Multidisciplinary Engineering Science and Technology (JMEST) ISSN: 3159-0040 2(2) (2015).

28.

Franck Le Duff , CristianMunteanb , CristianMunteanb , Marc Cuggiaa , Philippe Mabob , Predicting survival causes after out of hospital cardiac arrest using data mining method, Studies in Health Technology and Informatics 107(2) (2004), 1256–1259.

29.

Shantakumar Patil

Kumaraswamy

Y.S.

, Intelligent and effective heart attack prediction system using data mining and artificial neural network, European Journal of Scientific Research 31(4) (2009), 642–656.

30.

Dua

, Graff

UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science (2019).

31.

Martin-Isla CarlosM.

Campello Victor

Izquierdo Cristian

Raisi-Estabragh Zahra

Baeßler Bettina

Petersen Steffen

Lekadir Karim

Image-based cardiac diagnosis with machine learning: a review, Frontiers in Cardiovascular Medicine 7 (2020).

32.

Yazdani

, Varathan

K.D.

, Chiam

Y.K.

, et al. A novel approach for heart disease prediction using strength scores with significant predictors, BMC Med Inform Decis Mak 21 (2021), 194. https://doi.org/10.1186/s12911-021-01527-5.