Abstract
Cardiovascular disease (CVD) is a severe public health concern globally. Early and accurate CVD diagnosis is difficult but necessary to prevent further damage and protect patients’ lives. Machine Learning (ML)-based Clinical Decision Support Systems (CDSS) have the potential to assist healthcare providers in making accurate CVD diagnoses and treatment decisions. Clinical data usually contain missing values (MVs); hence, the choice of imputation technique has become a critical consideration when applying ML to real-world medical datasets. Simply removing instances with MVs loses essential data and can produce incorrect results. To overcome these issues, this paper proposes an efficient and reliable CDSS with an Ensemble Two-Fold Classification (ETC) framework for classifying heart diseases. The effectiveness of the proposed ETC framework is evaluated using different supervised ML algorithms and four distinct imputation methods for handling MVs on a standard benchmark dataset from the University of California, Irvine (UCI) repository. Experimental results show that the proposed ETC framework with the k-Nearest Neighbors (k-NN) imputation method achieves a better classification accuracy of 0.9999 and a lower error rate of 0.0989 than the other imputation methods and classifiers, with similar execution times.
Introduction
One of the primary causes of death worldwide is cardiovascular disease (CVD) [1]. Globally, 17.9 million people died from CVD in 2019, accounting for 32% of all deaths. Food habits [62], lack of physical activity, high blood pressure, tobacco usage, obesity, cholesterol, alcohol abuse, pulse rate, diabetes, and hereditary risk factors are all associated with heart diseases [3–6]. Most CVD mortality occurs in low- and middle-income countries. According to [2], cardiovascular diseases accounted for 40% of the 19.5 million deaths related to non-communicable conditions in 2020. This makes it imperative to enhance cardiovascular disease diagnosis and treatments.
Machine Learning (ML)-based Clinical Decision Support Systems (CDSS) and other disease-specific decision support systems are becoming more popular, allowing healthcare professionals to improve diagnostic accuracy and provide precise treatment with better outcomes. A CDSS is one of the most frequently used applications of automated analytical model building in ML. It is used to find typical patterns in observed clinical data, develop a classification model, and support accurate decisions for several diseases [14, 15]. Several scientific investigations have addressed the issue of early diagnosis of disease. Although more robust and more precise classification models have been developed and proposed, some factors still reduce classification accuracy.
Real-world data are often incomplete, inconsistent, inaccurate, and lacking in precise attribute patterns. In large clinical datasets, missing values (MVs), high complexity, and unbalanced classes are common problems in the design of CDSS [7, 63]. Researchers find it hard to analyse and design CDSS when working with medical data that contain many MVs. Imputation is a typical solution for dealing with MVs [59, 60]. The effectiveness of an imputation method may differ depending on dataset characteristics, including sample size, the percentage of missing data, and the missingness mechanism [9–13]. It is therefore difficult to provide a generic answer for the various scenarios that require data imputation.
A generic framework is required for dealing with MVs in datasets. This research aims to create an efficient and reliable CDSS with Ensemble Two-Fold Classification (ETC) framework for classifying CVD to address numerous problems in CDSS design. Further, this ETC framework aims to compare and contrast the outcomes of different imputation strategies. The proposed methodology is tested over the UCI benchmark heart disease dataset to evaluate the effectiveness of the suggested measures using four alternative imputation methods for handling missing data.
The rest of the paper is organised as follows: A brief overview of related articles and a description of the dataset are included in section 2. The methodology and implementation of the suggested framework using different classifiers with four distinct imputation methods are defined in section 3. Section 4 contains the result findings and a discussion of the experiments carried out. Section 5 concludes with the recommendation of further work directions.
Related works
Over the last few decades, several researchers have presented various CDSS for predicting diseases using different ML algorithms. There are several approaches to dealing with datasets that have missing cases. List-wise deletion is the simplest: it removes from the dataset any instance with even one missing value among its variables. This strategy results in data loss and decreased classification accuracy [17, 18].
Another limitation of missing-data approaches in classification is that many of these techniques only deal with MVs in the training phase and cannot classify new data with MVs unless independent imputations are used [19, 20]. In this approach, the MVs are estimated first, and the model is then fitted in the training step. The resulting predictive model, however, cannot be applied to data that contain MVs. As a result, additional imputation is required to classify these new data; nevertheless, imputation challenges such as selecting the appropriate sample size or the proper imputation method resurface. Table 1 summarizes multiple solutions for managing MVs.
Different strategies used in the literature to manage MVs
Imputation is a method for handling MVs that replaces them with plausible or estimated values. The literature has suggested a number of conventional statistical and machine learning imputation techniques, including mean, regression, k-nearest neighbor, and ensemble-based methods, to handle MVs. This paper introduces an efficient and reliable CDSS with an Ensemble Two-Fold Classification (ETC) framework for identifying heart disease with improved prediction accuracy and a lower error rate. The proposed ETC framework’s performance is evaluated using four distinct imputation approaches for managing MVs on the standard UCI benchmark dataset.
In this research, the proposed ETC framework is evaluated using the Cleveland heart disease dataset, obtained from the UCI online ML repository [35]. The literature indicates that only 14 of the 76 attributes are considered in the design of classification models. Table 2 lists the specifics of the dataset utilised in the experiment.
Attributes (Variables/ Features) of heart disease in the Cleveland dataset; the response variable is the last row (heart disease status)
Inspired by the several CDSS previously proposed, this paper introduces an efficient and reliable CDSS with Ensemble Two-Fold Classification (ETC) framework for identifying heart disease with improved accuracy. The proposed ETC framework is depicted in Fig. 1.

Proposed ensemble two-fold classification (ETC) framework
This framework has three main phases: data gathering, pre-processing, and ETC application. In the pre-processing stage, feature selection and scaling are performed, class balancing is done, and MVs are replaced by imputation methods. Using a standard scaler, all features are brought to the same scale, ensuring each feature has a mean of 0 and a standard deviation of 1. The different imputation methods replace MVs with values approximated from the values present in the dataset. The pre-processed datasets are then converted into binary form based on the attribute levels. The proposed ETC framework algorithm is presented in Algorithm 1. Next, classification is performed using Decision Tree (DT), Logistic Regression (LR), Naive Bayes (NB), Neural Network (NN), Random Forest (RF), and Support Vector Machine (SVM) classifiers. Finally, a hybrid classifier model is constructed using the input of knowledge-based systems and clinical guideline standards. The details of the proposed ETC framework are discussed in the following sections.
Let $Z_i \in Z \subseteq T^{n}$, $i = 1, \dots, n$, be the clinical dataset, where $n$ represents the total number of samples (records/tuples/rows) and $m$ represents the total number of features (attributes/variables). Let $z_{ij} \in T$, $i = 1, \dots, n$ and $j = 1, \dots, m$, be the $(i, j)$-th entry of the dataset under consideration; that is, $z_{ij}$ is the value of the $j$-th attribute for the $i$-th patient.
Select the significant features
A feature selection process reduces the data size by selecting the most significant features. It reduces the time complexity of analysing and designing the classification model without affecting performance [8, 16]. Clinical datasets typically suffer from high dimensionality, partial values or MVs, and clinical characteristics that span a wide range of types and magnitudes. The high-dimensional space must be mapped into a lower-dimensional space; for example:
Standard scaling is a pre-processing technique applied to the training data after the dataset has been transformed into an understandable format. It standardises the features by removing the mean and scaling to unit variance. The following formula is used to compute the standard score of a sample x:
$$ z = \frac{x - \mu}{\sigma} $$
where $\mu$ is the mean of the samples, $\sigma$ is the standard deviation of the samples, and $N$ is the total number of samples in the dataset.
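A minimal sketch of this standardisation in plain Python may help make the step concrete. It assumes the population standard deviation (dividing by N); the function name `standard_scale` and the cholesterol readings are hypothetical and are not taken from the paper.

```python
import math

def standard_scale(column):
    """Return the standard score (x - mean) / std for every value in a column."""
    n = len(column)
    mean = sum(column) / n
    # Population variance (divide by N); this choice is an assumption.
    variance = sum((x - mean) ** 2 for x in column) / n
    std = math.sqrt(variance)
    if std == 0:
        # A constant column has no spread; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]

chol = [200.0, 240.0, 280.0]  # hypothetical cholesterol readings
scaled = standard_scale(chol)
print(scaled)
```

After scaling, the column has mean 0 and standard deviation 1, so attributes measured in different units contribute comparably to distance-based methods.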
Handling the MVs in medical datasets is one of the most challenging tasks that analysts face, because making the correct decision about how to process them is what produces a robust data model. There is no unique rule for handling MVs; the preferred method is the one that yields a strong model with the best performance [36–38]. Domain knowledge about the dataset is important for guiding data pre-processing and the management of MVs. For an attribute, null values are MVs, that is, values that were not recorded or are not present. The entries $z_{ij}$ construct the data matrix $z$, in which an entry $z_{ij}$ may be absent.
Listwise deletion removes from the dataset any case with even one missing variable value. Deletion is the easiest method because it eliminates the need to estimate values. One of its main advantages is that, because all records with MVs are removed before training, the resulting model is fitted only on complete data and can be robust. Its main drawback is the loss of useful information, which reduces classification accuracy [39, 42]; it works poorly when MVs affect a large proportion of the dataset. The results may also become biased when data are removed from an experiment [43].
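Listwise deletion as described above can be sketched in a few lines. The records below are hypothetical, `None` is assumed to mark a missing value, and the helper name `listwise_delete` is illustrative only.

```python
def listwise_delete(rows):
    """Keep only the records in which no attribute is missing (None)."""
    return [row for row in rows if all(v is not None for v in row.values())]

# Hypothetical records; None marks a missing value.
records = [
    {"age": 63, "chol": 233, "ca": 0},
    {"age": 67, "chol": None, "ca": 3},
    {"age": 41, "chol": 204, "ca": None},
    {"age": 56, "chol": 236, "ca": 0},
]
complete = listwise_delete(records)
print(len(records), "->", len(complete))
```

In this toy example half of the records are discarded even though each dropped record is missing only one value, which illustrates the information loss discussed above.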
Imputation using mean values
This method is relatively straightforward and widely utilised. Missing data in an attribute are replaced by the average of all known values of that attribute, and each column is treated independently. It can only be used with numeric data. In addition, mean imputation has a trivial effect on the correlation coefficient and does not affect the regression coefficient [44, 45]. In mean imputation (Little and Rubin, 2002), a single value is substituted for all missing instances of a feature, regardless of the input data distribution. The mean is computed by dividing the sum of the observed values by the number of observed values. It is mathematically represented as follows:
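In symbols, and using notation assumed here rather than taken from the paper ($O_j$ is the set of records in which attribute $j$ is observed), the imputed value is $\bar{z}_j = \frac{1}{|O_j|} \sum_{i \in O_j} z_{ij}$. A minimal sketch for one numeric column follows; the function name and the sample values are hypothetical.

```python
def mean_impute(column):
    """Replace every None in a numeric column with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

chol = [233, None, 204, 236, None]  # hypothetical values; None marks an MV
filled = mean_impute(chol)
print(filled)
```

Every missing entry in the column receives the same value, which is why the method ignores the shape of the input distribution.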
Cover and Hart first proposed the k-NN algorithm in 1967 [46]. k-NN is a widely utilised instance-based, lazy-learning algorithm (Wu et al. 2008). Batista and Monard [47] were the first to use k-NN imputation for dealing with MVs: for each observation with MVs, the k nearest neighbours are found, and the MVs are imputed from the neighbours’ observed values. Zhang [48] presented a grey k-NN imputation approach that iteratively estimates the MVs in heterogeneous data. An imputation method based on k-NN has been applied to several missing-data cases using different mechanisms and missing-data models [53]. In the k-NN algorithm, the nearest neighbours of an instance with MVs are identified using a distance measure and are then used to impute the MVs [49]. Several distance measures can be used for k-NN imputation, including the Minkowski, Cosine, Manhattan, Hamming, Jaccard, and Euclidean distances; the Euclidean distance is the most widely used due to its efficiency and effectiveness [50, 51]. The Euclidean distance determines the similarity between records by measuring the distance between them. The method is adaptable: it can be used with both discrete and continuous data and with instances that have multiple MVs [49, 52].
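The idea of k-NN imputation with the Euclidean distance can be sketched as follows. This is a simplified illustration, not the procedure of any cited work: it assumes numeric attributes, searches for neighbours only among complete rows, measures distance on the attributes the incomplete row actually has, and fills each MV with the neighbours' mean. The function name and the data are hypothetical.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that attribute over the k nearest complete rows."""
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete row")
    filled = []
    for row in rows:
        if None not in row:
            filled.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]

        def dist(other):
            # Euclidean distance over the attributes observed in this row.
            return math.sqrt(sum((row[j] - other[j]) ** 2 for j in observed))

        neighbours = sorted(complete, key=dist)[:k]
        new_row = list(row)
        for j, v in enumerate(row):
            if v is None:
                new_row[j] = sum(nb[j] for nb in neighbours) / len(neighbours)
        filled.append(new_row)
    return filled

data = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, 12.0],
    [5.0, 6.0, 50.0],
    [1.05, 2.05, None],  # third attribute is missing
]
result = knn_impute(data, k=2)
print(result[3])
```

Because the distance mixes all observed attributes, scaling the features beforehand keeps one large-valued attribute from dominating the neighbour search.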
Imputation using Naive Bayes algorithm
Naive Bayes Imputation (NBI) is a technique for filling MVs by substituting a probability-based estimate for the missing attribute value. The NBI method divides all data into two groups: complete data and data with missing values. The complete data are used to impute the missing values, and the technique is repeated for each missing attribute to create complete data for classification. The dataset is expressed in vector form with the sequence of m attributes $z_i = [z_{i1}, z_{i2}, z_{i3}, \dots, z_{im}]$, and the class is denoted $t_j$, where $T = \{t_1, t_2, t_3, \dots, t_j\}$. Data with a missing attribute are described by the probability $P(Z_1 = z_1, Z_2 = z_2, \dots, Z_j = ?, \dots, Z_d = z_d \mid y)$ [54]. NBI is used to predict the value of a variable that is missing in partial data by altering the probability calculation [55]. The following is the probability equation for describing the missing attributes:
In order to fill in MVs, the Naive Bayes imputation method is used. To determine each missing attribute, the following equation is used:
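As an illustration of the general idea rather than the authors' exact equations, a categorical naive Bayes imputer can treat the missing attribute as the quantity to predict and score each candidate value by its prior frequency times the conditional frequencies of the observed attributes. The sketch below makes several assumptions that are not in the text: attributes are categorical, probabilities are estimated from the complete records only, and add-one (Laplace) smoothing is used. The function name and the records are hypothetical.

```python
from collections import Counter, defaultdict

def nb_impute(complete_rows, row, target):
    """Return the most probable value of row[target] under a categorical naive Bayes model."""
    prior = Counter(r[target] for r in complete_rows)
    cond = defaultdict(Counter)   # cond[(attr, target_value)][value] = count
    levels = defaultdict(set)     # distinct values seen for each attribute
    for r in complete_rows:
        for attr, value in r.items():
            if attr != target:
                cond[(attr, r[target])][value] += 1
                levels[attr].add(value)
    best_value, best_score = None, -1.0
    for candidate, count in prior.items():
        score = count / len(complete_rows)
        for attr, value in row.items():
            if attr == target or value is None:
                continue
            # Add-one smoothing avoids zero probabilities (an assumption).
            score *= (cond[(attr, candidate)][value] + 1) / (count + len(levels[attr]))
        if score > best_score:
            best_value, best_score = candidate, score
    return best_value

complete = [
    {"cp": 4, "slope": 2, "restecg": 2},
    {"cp": 4, "slope": 2, "restecg": 2},
    {"cp": 3, "slope": 1, "restecg": 0},
    {"cp": 2, "slope": 1, "restecg": 0},
    {"cp": 4, "slope": 1, "restecg": 2},
]
incomplete = {"cp": None, "slope": 2, "restecg": 2}
imputed_cp = nb_impute(complete, incomplete, "cp")
print(imputed_cp)
```

The same routine would be repeated for each attribute that has MVs, using the complete group as the training data each time.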
Algorithm 1 Proposed Ensemble Two-Fold Classification (ETC) Framework Algorithm
Procedure
number of null values (absent) of each column.
Naive Bayes imputation. Let matrix IV(1×b) have the imputation values.
Procedure ETC Algorithm
categorical_evaluation_attributes = {cp, thal, slope, restecg}
threshold_evaluation_attributes = {age, trestbps, chol, thalach, ca}
maximum_threshold_values = {age: 55, trestbps: 140, chol: 240, thalach: 165, ca: 1 and below}
For i from categorical_evaluation_attributes do
    current_col = categorical_evaluation_attributes[i]
    current_col_levels = D.split(current_col.getLevel())
    For j in current_col do
        If j == current_col_levels then
            put 1 at the equivalent column’s row value.
            put 0 at the remaining columns’ row values.
For i from threshold_evaluation_attributes do
    For each_cell_values from threshold_evaluation_attributes[i] do
        If each_cell_values > maximum_threshold_values[i] then
            each_cell_values = 1
        Else
            each_cell_values = 0
Divide the dataset into two partitions (Training data 70% and Testing data 30%) using sampling techniques
Procedure Ensemble Model
For i = 1 to n
The benchmark dataset for the proposed work has four types of values: integer, binary, categorical (discrete), and continuous. Each data type has different characteristics. A discrete attribute takes a fixed set of levels that depends entirely on the attribute, whereas continuous and integer attributes take values within a specific range. We use two different approaches to deal with these two categories. Rules are simpler to create for binary values, which have only two states (1 or 0), than for discrete or continuous values. One of this framework’s critical phases is therefore converting the dataset into a binary format, which helps the ML algorithms in their rule-generation phases. A significant advantage of this framework is that it suits all types of ML algorithms. The ETC framework is based on categorical binary conversion and threshold weights: the categorical variable concept applies only to discrete attributes, and threshold evaluation is used for continuous and integer attributes.
The categorical variable concept applies to the following attributes: cp, restecg, slope, and thal. A single discrete attribute column is converted into multiple columns, the number of which depends on the attribute’s levels. In each row, the column corresponding to the cell’s value is set to 1, and the remaining columns are set to 0. For example, the chest pain (cp) attribute has four levels. After applying the categorical variable concept, the cp column is split into four columns: cp_1, cp_2, cp_3, and cp_4. If row 1 has a cp value of 1, cp_1 is set to 1 and the remaining columns (cp_2, cp_3, cp_4) are set to 0. The same procedure is applied to all samples. After this step, all discrete-valued attributes have been converted into binary values.
The continuous and integer attributes are handled by threshold evaluation, which requires a maximum boundary value. Once an instance value crosses the maximum boundary, the patient is considered at risk of the disease; otherwise, the patient is considered to be in a normal (healthy) state. For example, normal blood pressure is below 120/80 mm Hg and above 90/60 mm Hg. All the remaining attributes are treated similarly. Accordingly, the following maximum threshold values, based on medical references, were set inside the algorithm to generate a fully binary dataset: age - 55, trestbps - 140, chol - 240, and thalach - 165; for ca, the values 0 and 1 are treated as critical states and 2 and 3 as normal.
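The cp example can be sketched as a small helper that splits one discrete column into one 0/1 column per level. The helper name `to_binary_columns` and the five sample codes are hypothetical; the column naming (cp_1 to cp_4) follows the text.

```python
def to_binary_columns(values, name, levels):
    """Split one discrete column into one 0/1 column per level."""
    return {
        f"{name}_{level}": [1 if v == level else 0 for v in values]
        for level in levels
    }

cp = [1, 4, 3, 4, 2]  # hypothetical chest-pain codes for five patients
encoded = to_binary_columns(cp, "cp", levels=[1, 2, 3, 4])
for column, bits in encoded.items():
    print(column, bits)
```

Each row has exactly one 1 across the four new columns, so no ordering between the chest-pain levels is implied.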
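A minimal sketch of the threshold evaluation follows, using the threshold values quoted in the text. Two details are assumptions: the comparison is strict (a value equal to the threshold maps to 0, as in the algorithm's `>` test), and ca is flagged when it is 0 or 1, following the statement that those values are the critical states. The patient record is hypothetical.

```python
# Thresholds quoted in the text for the range-valued attributes.
MAX_THRESHOLDS = {"age": 55, "trestbps": 140, "chol": 240, "thalach": 165}

def threshold_binarise(record):
    """Map continuous/integer attributes to 1 (above the threshold) or 0."""
    flags = {
        name: 1 if record[name] > limit else 0
        for name, limit in MAX_THRESHOLDS.items()
    }
    # ca is handled separately: 0 and 1 are treated as critical, 2 and 3 as normal.
    flags["ca"] = 1 if record["ca"] <= 1 else 0
    return flags

patient = {"age": 63, "trestbps": 145, "chol": 233, "thalach": 150, "ca": 0}
binary_patient = threshold_binarise(patient)
print(binary_patient)
```

Together with the categorical conversion, this yields a record made up entirely of 0/1 values.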
Validation schemes
One of the most critical processes in machine learning is model validation. The train-test hold-out validation is a data partitioning strategy for evaluating the produced model’s performance. The dataset is split into two sections: training data (70%) and testing data (30%) [56].
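The 70/30 hold-out split can be sketched as below. The text does not state how records are shuffled or how the test size is rounded, so the fixed seed and the use of `round` are assumptions, and the function name is hypothetical.

```python
import random

def holdout_split(rows, test_fraction=0.3, seed=0):
    """Shuffle the records and split them into (train, test) partitions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability (assumption)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# 303 record indices, matching the size of the Cleveland dataset.
train, test = holdout_split(range(303))
print(len(train), len(test))
```

With 303 records this split gives 212 training records and 91 test records.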
Ensemble model
Ensemble modelling is a technique in which several diverse models are used to predict a result [61]. In predictive modelling, ensembles are often more accurate than individual models. The ensemble approach reduces the generalisation error of the prediction when the underlying base models are diverse and independent.
To build a hybrid ensemble of n classifiers from a dataset D and a collection of classification algorithms N, each classifier is formed by applying an algorithm, selected from N in alternation, to a set of data samples drawn from D by bootstrap sampling [57]. Alternatively, the algorithms can be chosen randomly rather than in alternation, so that each classifier in the hybrid ensemble is trained using one of the algorithms in N with equal probability. Moreover, when prior knowledge is available, unequal probabilities can be assigned to the different algorithms. This process is illustrated in step 7 of Algorithm 1.
A bootstrap sample is used to train each of the diverse classifiers when constructing the ensemble. The input of the bootstrap sampling method is a dataset $D$, and the output is a dataset $D_a$ of data samples drawn from $D$ with replacement, with $|D_a| = |D|$. If the classifiers in an ensemble differ only in the datasets used to train them, bootstrap sampling is the only source of diversity. Here, different classification algorithms are also used, which offers a second source of diversity [58].
The trained model can be used to predict heart disease and to help detect it in patients. As a consequence, the number of clinical tests required can be limited, and the condition can be treated at the right time, potentially saving the lives of hundreds of thousands of people.
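The construction of a hybrid ensemble from bootstrap samples can be sketched as follows. Several parts are assumptions made for illustration: the two base learners are deliberately trivial stand-ins (not DT, RF, or any classifier used in the paper), predictions are combined by majority vote (the text does not state the combination rule), and all function names and data are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(dataset, rng):
    """Draw |D| samples from D with replacement."""
    return [rng.choice(dataset) for _ in dataset]

def build_hybrid_ensemble(dataset, algorithms, n_classifiers, rng, weights=None):
    """Train each classifier on its own bootstrap sample.

    weights=None uses the algorithms in alternation; otherwise one algorithm
    is drawn at random per classifier with the given probabilities.
    """
    ensemble = []
    for i in range(n_classifiers):
        if weights is None:
            train = algorithms[i % len(algorithms)]
        else:
            train = rng.choices(algorithms, weights=weights, k=1)[0]
        ensemble.append(train(bootstrap_sample(dataset, rng)))
    return ensemble

def predict_majority(ensemble, x):
    """Combine the members' predictions by majority vote (an assumption)."""
    return Counter(model(x) for model in ensemble).most_common(1)[0][0]

# Two trivial stand-in learners; each returns a prediction function.
def train_majority_class(sample):
    label = Counter(y for _, y in sample).most_common(1)[0][0]
    return lambda x: label

def train_threshold_rule(sample):
    positives = [x for x, y in sample if y == 1]
    cut = min(positives) if positives else float("inf")
    return lambda x: 1 if x >= cut else 0

rng = random.Random(42)
D = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 1), (5, 1)]  # (feature, label) pairs
ensemble = build_hybrid_ensemble(D, [train_majority_class, train_threshold_rule], 5, rng)
prediction = predict_majority(ensemble, 5)
print(prediction)
```

Passing `weights=[0.5, 0.5]` switches from alternation to random selection with equal probability, and unequal weights encode prior knowledge about the algorithms.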
Experimental results and discussion
In this section, the experimental results of the proposed ETC framework are discussed. Four experiments are conducted on the UCI Cleveland dataset to evaluate the efficiency of the proposed ETC framework using six classification algorithms: DT, LR, NB, NN, RF, and SVM. The classifiers’ performance is assessed with several evaluation metrics (accuracy, Mean Squared Error (MSE), precision, recall, f1-score, sensitivity, specificity, and ROC_AUC score). The results of the four experiments (E) before and after applying the ETC framework are discussed in the following section, and the final results are presented in Tables 3–8. The overall comparison of the four imputation techniques is shown in Fig. 6.
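Most of the listed metrics follow directly from the confusion matrix of a binary classifier. The sketch below assumes labels coded as 0 and 1 and takes the error rate as the fraction of misclassified records, which for 0/1 labels equals the mean squared error; the function name and the label vectors are hypothetical, and ROC-AUC is omitted because it needs predicted scores rather than labels.

```python
def classification_metrics(y_true, y_pred):
    """Compute common binary-classification metrics from true and predicted 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(a, b):
        return a / b if b else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # recall is the same quantity as sensitivity
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "error_rate": ratio(fp + fn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here 8 of 10 records are classified correctly, so the accuracy is 0.8 and the error rate is 0.2.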
Different imputation methods for DT classifiers with and without ETC framework
Different imputation methods for LR classifiers with and without ETC framework
Different imputation methods for NB classifiers with and without ETC framework
Different imputation methods for NN classifiers with and without ETC framework
Different imputation methods for RF classifiers with and without ETC framework
Different imputation methods for SVM classifiers with and without ETC framework

Graphical representation of overall comparative analysis of accuracy and error rate of four different imputation methods.
Experiment 1 evaluates the listwise deletion method with and without the ETC framework. The Cleveland dataset has 76 attributes and 303 records. In this experiment, the records containing MVs are removed, and six classification models (DT, LR, NB, NN, RF, and SVM) are constructed with and without the ETC framework. The performance with and without the ETC framework for Experiment 1 is presented in Fig. 2. The experimental results without the ETC framework show that the DT classifier achieves a better classification accuracy of 0.9999 and a lower error rate of 0.2666 than the other classification models, with similar execution times.

Graphical representation of imputation using listwise deletion method with and without ETC framework.
With the ETC framework, the RF algorithm achieved a better accuracy of 0.9999 and an error rate of 0.1, with precision, recall, f1-score, sensitivity, specificity, and ROC-AUC score of 0.89, 0.87, 0.88, 0.8717, 0.9215, and 0.8966, respectively. Overall, the ETC framework improved prediction accuracy and reduced the error rate. It is also inferred that nearly all classification models perform better with the proposed ETC framework than without it.
Experiment 2 evaluates the k-Nearest Neighbors (k-NN) imputation method with and without the ETC framework. The performance with and without the ETC framework for Experiment 2 is presented in Fig. 3. In this experiment, the MVs are replaced using the k-NN imputation method, and six classification models are constructed with and without the ETC framework. The experimental results without the ETC framework demonstrate that the DT and RF classifiers are comparatively robust and obtain a better classification accuracy of 0.9999 and a lower error rate of 0.2417 than the other classification models, with comparable execution times.

Graphical representation of imputation using k-Nearest Neighbors (k-NN) method with and without ETC framework.
With the ETC framework, the DT and RF algorithms are again relatively strong, and RF outperforms DT when both error rate and accuracy are taken into account as assessment measures. The RF algorithm achieved an improved accuracy of 0.9999 and an error rate of 0.0989, with precision, recall, f1-score, sensitivity, specificity, and ROC-AUC score of 0.89, 0.87, 0.88, 0.875, 0.9215, and 0.8975, respectively. Across all classification methods, the models with the suggested ETC framework consistently outperform those without it.
Figure 4 displays the performance of Experiment 3 with and without the ETC framework. In this experiment, the MVs are replaced using the Naive Bayes imputation approach, and six classification models are built with and without the ETC framework. When accuracy is considered as the only evaluation parameter, the experimental results without the ETC framework demonstrate that the DT classifier achieves the highest classification accuracy of 0.9999. The NB classifier is superior to the other algorithms when the error rate (0.1318) is considered as the evaluation parameter. The RF classifier is superior to the other classification algorithms when accuracy and error rate are considered together.

Graphical representation of imputation using naive bayes method with and without ETC framework
With the ETC framework, the DT algorithm achieved a better accuracy of 0.9999 and an error rate of 0.2637, with precision, recall, f1-score, sensitivity, specificity, and ROC-AUC score of 0.65, 0.69, 0.67, 0.6857, 0.7678, and 0.7267, respectively. Overall, the ETC framework improved prediction accuracy and reduced the error rate. Additionally, it can be deduced that the classification models largely perform better with the suggested ETC framework than without it.
The performance with and without the ETC framework for Experiment 4 is presented in Fig. 5. In this experiment, the MVs are replaced using the mean imputation method, and six classification models are constructed with and without the ETC framework. The experimental findings without the ETC framework show that the DT classifier obtains the highest classification accuracy of 0.9999 when accuracy is the single evaluation metric. When the error rate is considered as the evaluation parameter, the LR classifier performs better (0.1648) than the other algorithms.

Graphical representation of imputation using mean values with and without ETC framework
With the ETC framework, the DT and RF algorithms perform relatively well, and RF surpasses DT when both error rate and accuracy are considered as evaluation metrics. The RF algorithm achieved a better accuracy of 0.9999 and an error rate of 0.1648, with precision, recall, f1-score, sensitivity, specificity, and ROC-AUC score of 0.82, 0.8, 0.81, 0.8048, 0.86, and 0.8324, respectively. Additionally, it can be inferred that all classification models perform significantly better with the suggested ETC framework than without it.
From Table 3, the experimental results without the ETC framework using the DT classifier show that the k-NN imputation method performs well in terms of accuracy and error rate compared to the other imputation methods. With the ETC framework, the results of the DT classifier are similar for all imputation methods; still, the k-NN imputation approach is superior when the error rate is considered as an extra parameter. Overall, the evaluation metrics of the DT classifier using the ETC framework are good for all imputation methods.
Table 4’s experimental results utilizing the LR classifier without the ETC framework demonstrate that the NB imputation method outperforms other imputation techniques in terms of accuracy and error rate. All imputation approaches’ experimental outcomes using the ETC framework and LR classifier are comparatively similar. Still, the NB imputation strategy is superior when the error rate is considered as an additional parameter. According to the observation, the performance of the evaluation metrics for the LR classifier utilizing the ETC framework and all imputation methods is good.
The NB imputation method outperforms other imputation strategies in terms of accuracy and error rate, according to the experimental results in Table 5, utilising the NB classifier without the ETC framework. While the experimental results of all imputation strategies using the ETC framework and NB classifier are generally comparable, the k-NN imputation technique is superior when the error rate is considered an extra parameter. The performance of the evaluation metrics for the NB classifier using the ETC framework and all imputation methods, according to the observation, is good.
The experimental results in Table 6 using the NN classifier without the ETC framework show that the k-NN imputation approach beats alternative imputation strategies in terms of accuracy and error rate. The mean imputation strategy is superior when the error rate is considered as an additional parameter, even if the experimental results of all imputation strategies employing the ETC framework and NN classifier are generally equivalent. According to the observation, the performance of the assessment metrics for the NN classifier utilizing the ETC framework and all imputation techniques is good.
According to the experimental results in Table 7 using the RF classifier without the ETC framework, the mean imputation method outperforms other imputation strategies in terms of accuracy and error rate. Even though the experimental results of all imputation strategies using the ETC framework and RF classifier are often equal, the k-NN imputation strategy is preferred when the error rate is considered as an additional parameter. According to the observations, the performance of the assessment metrics for the RF classifier using the ETC framework and all imputation procedures is good.
From Table 8, the experimental results without the ETC framework using the SVM classifier show that the listwise deletion method performs well in terms of accuracy and error rate compared with the other imputation methods. With the ETC framework, the results of the SVM classifier are similar for all imputation methods; still, the k-NN imputation approach is superior when the error rate is considered as an extra parameter. Overall, the evaluation metrics of the SVM classifier using the ETC framework are good for all imputation methods.
From Fig. 6 and Tables 3–8, the experimental results of the four imputation experiments and six classification models with and without the ETC framework show that the k-NN imputation method with RF performs well compared with the other imputation methods and classifiers. Across the four imputation methods, the ETC framework improved prediction accuracy and reduced the error rate. The results also suggest that the ETC framework has considerable potential and could serve as a model for CDSS architecture in the healthcare sector.
A novel CDSS with an Ensemble Two-Fold Classification (ETC) framework for classifying cardiovascular diseases is proposed in this paper. This framework addresses the missing values in the clinical dataset. The effectiveness of the proposed ETC framework is evaluated using six classification algorithms (DT, LR, NB, NN, RF, and SVM) and four distinct imputation methods for handling MVs on the standard UCI benchmark dataset. Compared with the other imputation methods, with similar execution times, the proposed ETC framework with the k-NN imputation technique and an RF classifier achieves a better classification accuracy of 0.9999 and a lower error rate of 0.0989. According to the analysis of the results, the proposed ETC framework outperformed individual classifiers for all imputation approaches in terms of accuracy, F1-score, precision, recall (R), sensitivity, and other assessment metrics. In the future, the performance of the proposed ETC framework in diagnosing chronic diseases such as kidney disease, diabetes, breast cancer, liver disease, hepatitis, and other types of cancer will be evaluated using available datasets. In addition, the proposed framework can be expanded by utilising IoT devices to collect clinical parameters in real time. Moreover, a user-friendly application based on the suggested framework might be developed, allowing users to access it online and execute queries quickly and effectively.
