Abstract
Electronic Medical Records (EMR) carry important information about a patient’s journey. Over the past decade, Natural Language Processing (NLP)-based Information Retrieval (IR) techniques have been used extensively to extract insights such as symptoms, diseases, and tests from these unstructured records. The state of the art shows that convolutional neural networks (CNN) contribute significantly to the disease classification task. Knowledge mining can be made considerably more precise through precise feature extraction, and feature selection addresses undesirable, unneeded, or irrelevant features. This article proposes a Modified Rider Optimization Algorithm (MROA) that chooses important features by selecting optimal weights from a pool of randomly generated weights, favoring high accuracy and low training time in the CNN algorithm. The modified approach is trained on the records of 114 N2C2 patients to extract symptoms, diseases, and tests, which are then used for disease classification. The proposed approach achieves 97.77% accuracy in the disease classification and treatment prediction task from EMR.
Keywords
Introduction
Feature selection techniques overcome challenges related to inappropriate, irrelevant, or unnecessary features. The process extracts the best features from a pool of candidate features. When dealing with unstructured data, selecting optimal features is a vital and challenging task. In the healthcare domain especially, selecting appropriate features and removing redundant ones improves the machine learning model’s performance to some extent [2]. Mathematically, a feature selection problem can be articulated as follows. Assume a dataset D contains d features. The feature selection mechanism selects the relevant features among those available. Given the dataset
The main contribution of this work is given as follows:
This research explains the challenges of feature selection and how to solve them, along with the basics of meta-heuristic algorithms. It lists the problems and challenges that redundant information causes in healthcare, as reported in the literature, which is an important consideration for feature optimization. It preprocesses the Patient Information and Discharge Summary datasets from the N2C2 repository [23]. It proposes a CNN disease classification approach with optimal feature selection using Modified Rider Optimization. Through a critical literature review, it compares traditional supervised algorithms for disease prediction and treatment suggestion against the modified CNN with feature selection. Finally, it presents the research gaps and future work that can extend the research.
This section presents a critical literature review of feature optimization methods. The section deep dives into the description of the feature selection problem and the comparative review of traditional classification algorithms.
In healthcare, applications of early detection and precision in disease prediction from EMR are challenging tasks. EMR carries abundant data, which, when preprocessed with clinical NER, extracts redundant features. This section studies the literature related to feature optimization to remove redundant features from extracted data and classification algorithms for disease prediction.
Feature optimization techniques
In medical data mining, there are many features in EMR with similarities among them, leading to the need for new methods to reduce redundant features. This section discusses various feature optimization techniques to determine important features. Feature selection is essential in classification and prediction techniques. The precision of disease classification can be improved by selecting a subset of unique features from the pool of extracted features. This section aims to provide a review of optimized feature selection algorithms.
Fisher’s discriminant ratio (FDR) for feature selection
The curse of dimensionality affects machine learning models adversely. FDR defines a criterion that minimizes class overlap: it keeps the variance within each class small while maximizing the separation between the class means.
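As an illustration, the two-class form of the criterion for a single feature is commonly written as FDR = (mu_1 - mu_2)^2 / (sigma_1^2 + sigma_2^2). The sketch below ranks features by that score. The function names and the data layout (rows are samples, columns are features) are assumptions made for illustration and are not taken from the reviewed work.

```python
from statistics import mean, pvariance


def fisher_ratio(values_a, values_b):
    """Two-class Fisher discriminant ratio of one feature."""
    mu_a, mu_b = mean(values_a), mean(values_b)
    spread = pvariance(values_a) + pvariance(values_b)
    if spread == 0:
        # Degenerate case: no within-class variance at all.
        return float("inf") if mu_a != mu_b else 0.0
    return (mu_a - mu_b) ** 2 / spread


def rank_features(class_a, class_b):
    """Return feature indices sorted from most to least discriminative."""
    n_features = len(class_a[0])
    scores = [
        fisher_ratio([row[j] for row in class_a], [row[j] for row in class_b])
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Toy data: feature 0 separates the classes, feature 1 does not.
class_a = [[1.0, 5.0], [2.0, 6.0], [3.0, 4.0]]
class_b = [[10.0, 5.0], [11.0, 4.0], [12.0, 6.0]]
ranking = rank_features(class_a, class_b)
```

A feature with a large ratio has well-separated class means relative to its within-class spread, so it would be kept ahead of features with ratios near zero.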
Rider optimization problem
The method performs optimal feature selection by accelerating each rider’s location towards the leading riders at each time step. A small local neighborhood, represented by the attacker, helps prevent entrapment in local minima, while a more global neighborhood, represented by the overtaker, achieves rapid convergence. Despite these benefits, the algorithm often gets stuck when solving discrete optimization problems. To further improve its performance, the MROA replaces the pattern of gears and steering angle with an updated pattern, which is determined from the fitness function of the current solution [7].
Harris Hawks Search Strategy
The Harris Hawks Optimization (HHO) algorithm is a swarm intelligence optimization algorithm that is widely used in solving optimization problems. The main idea of the algorithm is derived from the cooperative behavior and chasing strategy of Harris’s hawks when catching prey in nature. The algorithm selects the best features from the list of features by using a fitness function. The state of the art improves the optimization by using machine learning algorithms to calculate the fitness value [10]. Feature optimization is performed in four phases: initialization, exploration, exploitation, and fitness value calculation. The algorithm works as follows [2, 3].
Initialization
The Sine Chaotic map is used to perform the initialization of the feature vectors. The first feature vector is given a random starting point, and then the Sine Chaotic map is applied to it so that all the subsequent feature vectors can be derived from the first. In addition to that, all the other parameters have some starting values assigned to them (population, number of iterations, and acceleration).
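A minimal sketch of this initialization follows. It assumes the common form of the sine map, x_{k+1} = sin(pi * x_k) on [0, 1], and a 0.5 threshold for turning each value into a keep/discard decision. Both choices, and all names, are assumptions for illustration; the reviewed papers may use different constants.

```python
import math
import random


def sine_map_population(n_vectors, n_features, seed=0):
    """Derive every feature vector from a random first vector via the sine map."""
    rng = random.Random(seed)
    # Random starting point for the first feature vector.
    current = [rng.uniform(0.05, 0.95) for _ in range(n_features)]
    population = [current]
    for _ in range(n_vectors - 1):
        # Assumed map: x_{k+1} = sin(pi * x_k), which stays inside [0, 1].
        current = [math.sin(math.pi * x) for x in current]
        population.append(current)
    return population


def to_mask(vector, threshold=0.5):
    """Assumed binarization: a feature is selected when its value exceeds the threshold."""
    return [1 if x > threshold else 0 for x in vector]


population = sine_map_population(n_vectors=5, n_features=4)
masks = [to_mask(v) for v in population]
```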
Exploration phase
During this part of the process, all the Harris Hawks are evaluated to determine which ones have the potential to solve the feature selection problem. The sharp eyes of Harris Hawks allow them to easily track down their prey, even if the prey is not constantly visible to them. Harris Hawks are among the most intelligent birds. The HHO works with a similar approach; each Harris Hawk is treated as a potential solution, and the Harris Hawk that produces the lowest value when it is fed into the objective function is deemed to be the prey. The weight values that give maximum classification accuracy are selected with the fewest possible features.
Exploitation
At this stage of the HHO, the surrounding areas of the feature vectors are evaluated for their potential use. The Harris Hawks carry out the surprise pounce by advancing in the direction of the target that was discovered during the reconnaissance phase. It employs strategies that are derived from the location of the chase that was defined in the stage that came before this one.
Fitness function
This part is for assessing the quality of a feature vector. Since the feature selection algorithm uses a learning algorithm, most of the literature uses a K-NN classifier to calculate the efficacy of a selected subset of features.
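A wrapper fitness of this kind can be sketched as follows. The sketch assumes the widely used weighted form alpha * error + (1 - alpha) * (selected features / total features), where lower is better, and a small K-NN written directly in Python. The weighting alpha = 0.99, the value of k, and all names are illustrative assumptions rather than settings reported in the reviewed literature.

```python
import math


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    votes = {}
    for i in order[:k]:
        votes[train_y[i]] = votes.get(train_y[i], 0) + 1
    return max(votes, key=votes.get)


def fitness(mask, train_x, train_y, test_x, test_y, alpha=0.99, k=3):
    """Lower is better: weighted sum of K-NN error and fraction of features kept."""
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return 1.0  # An empty subset is treated as the worst case.
    project = lambda row: [row[j] for j in selected]
    train_p = [project(r) for r in train_x]
    wrong = sum(
        knn_predict(train_p, train_y, project(r), k) != y
        for r, y in zip(test_x, test_y)
    )
    error = wrong / len(test_x)
    return alpha * error + (1 - alpha) * len(selected) / len(mask)


# Toy data: feature 0 is informative, feature 1 is noise.
train_x = [[0, 5], [1, 9], [0.5, 1], [10, 5], [11, 1], [10.5, 9]]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [[0.2, 9], [10.2, 1]]
test_y = [0, 1]
good = fitness([1, 0], train_x, train_y, test_x, test_y)
bad = fitness([0, 1], train_x, train_y, test_x, test_y)
```

The second term is what pushes the search toward smaller subsets when two subsets classify equally well.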
Metaheuristic algorithms, designed to tackle a wide range of new and developing problems, have been hugely successful. As a result, many researchers have used them to solve feature selection (FS) problems. However, we contend that the existing body of literature suffers from several shortcomings: the algorithms converge slowly and require long computation times; the dimensionality of big data may influence how well an algorithm works; and time is needed in advance to tune the algorithm’s parameters so that it can choose the optimal configuration.
To achieve Optimal Feature Selection (OFS) and handle many of the restrictions found in earlier investigations, a hybrid approach is used as part of the search. To circumvent local optima (LO), simulated annealing (SA) is combined with the Harris Hawks optimization method; SA may accept, with some probability, a solution that is less optimal than the current one.
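The probabilistic acceptance of a worse solution is usually implemented with the Metropolis rule from simulated annealing. The sketch below assumes that rule and a fitness that is minimized; the function name, the temperature parameter, and the injectable random source are illustrative assumptions.

```python
import math
import random


def accept(candidate_fitness, current_fitness, temperature, rng=random):
    """Metropolis rule: always accept improvements, sometimes accept worse moves."""
    if candidate_fitness <= current_fitness:
        return True
    # A worse candidate is accepted with probability exp(-delta / T).
    delta = candidate_fitness - current_fitness
    return rng.random() < math.exp(-delta / temperature)


class FixedRandom:
    """Deterministic stand-in for the random source, used only for the demo."""

    def random(self):
        return 0.5


hot = accept(0.30, 0.20, temperature=1e9, rng=FixedRandom())
cold = accept(0.30, 0.20, temperature=1e-9, rng=FixedRandom())
```

At a high temperature almost any move is accepted, which lets the search leave a local optimum; as the temperature falls, the rule approaches plain greedy acceptance.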
Comparative analysis of classification algorithms
This section discusses various supervised learning approaches to predict future clinical events [3]. The algorithms under study are Support Vector Machine (SVM), Adaptive Network-based Fuzzy Inference System (ANFIS) [4], Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN). Table 1 shows the comparative review of the algorithms studied to implement the proposed approach.
Empirical review of classification algorithms used
From the literature review, two primary goals are considered: maximizing accuracy and decreasing the number of selected features. It can be observed that several classification algorithms have been proposed using the swarm algorithms [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. The literature clearly demonstrates the efficacy of improving classification performance through feature optimization. This paper mainly focuses on solving the feature selection problem using swarm-based variants of metaheuristic algorithms. The proposed approach selects the best features by assigning optimal weight in training CNN by using the Rider Optimization Algorithm (ROA). In addition to feature optimization, the literature also observed the performance metrics used to evaluate classification results. In the healthcare domain, the implications of balancing and feature selection on the classifier’s performance were investigated, and competitive results were evaluated with accuracy, MCC score, and f1-score. The outcome demonstrates a tangible evaluation with MCC and f1-score by overcoming overestimation of performance measures such as confusion matrix and accuracy [21, 22].
The review identified the following research challenges:
There is a high need for feature optimization in healthcare data analytics. From the pool of metaheuristic algorithms, the state-of-the-art emphasizes swarm-based algorithms. Prediction precision can be improved using a wrapper classifier with feature optimization techniques. The selection of evaluation metrics needs to consider the imbalanced nature of healthcare data.
The research work addresses enlisted challenges by proposing a Disease Prediction Architecture using Proposed Classification through MROA-CNN and evaluating the results with MCC and F1-score.
The organization of the paper is as follows:
Section 2 presents the issues and challenges due to redundant features and the state-of-the-art work in feature optimization using metaheuristic algorithms. Section 3 details the architecture of disease classification using CNN with the proposed Modified Rider Optimization Algorithm (MROA). Section 4 depicts the experiment analysis and the results of the proposed methodology in detail. Section 5 concludes the research work with limitations and suggestions for future work.
The proposed framework is capable of disease classification and treatment recommendation. The process begins with pre-processing datasets and extracting data headers. The Harris Hawk’s method selects optimal headers from the dataset. In the following phase, datasets are merged to preserve medical terms in relation to headers extracted in the previous phase. To convert text data into vectors, the clinical BERT model has been used. The cBERT model vectorizes text data with respect to discharge summary text as context information [23]. To improve the classification results of CNN, the paper proposed a disease prediction model with optimal feature selection using the Rider Optimization Algorithm. The proposed approach optimizes weight value using the Modified Rider Optimization Algorithm from Convolutional Neural Network (MROA-CNN). Figure 1 is the framework of the proposed architecture.
Disease prediction architecture using proposed classification through MROA-CNN.
The proposed system uses a Patient Information dataset and a Discharge Summary (DS) dataset that have been scraped and prepared for analysis [20, 21]. The DS dataset contains the features Symptoms, Disease, Treatment, and Description. The Description column holds the unstructured discharge summaries of patients, with details such as tests, family and social history, and allergies. The Symptom, Disease, and Treatment columns are entities extracted using a modified BERT algorithm [5]. The dataset contains symptoms and treatment information for 50 diseases, which are further clustered into four categories based on semantic similarity. There are 10,000 records in the training set and 5000 records in the test set. The Patient Info dataset holds information about Age, Weight, Time_in_hospital, No_of_lab_procedures, Medical_speciality, and Lab_procedures performed on the patient. The Patient Info dataset is merged with the Discharge Summary dataset after preprocessing and Medical Term Identification, as shown in Fig. 1. The Medical Term Identification dataset is a lookup dictionary containing a curated list of 647 medical terms. After the discharge summaries are pre-processed, the extracted medical features are retained by calculating their cosine similarity with terms from the medical dictionary.
The work starts by extracting the medical features from datasets and performing disease identification from the symptoms. The features are chosen to reduce training time and improve classification accuracy. Feature selection techniques retain only medical terms by capturing the cosine similarity of extracted terms from discharge summaries with a medical term dictionary. This step selects optimized features by removing non-medical terms that are not useful for disease-treatment classification. Figure 2 shows the datasets used to implement the proposed study.
Datasets used for proposed architecture.
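The cosine-similarity filter described above can be sketched as follows. The text does not say how terms are turned into vectors, so this sketch assumes character-trigram count vectors and a retention threshold of 0.8; both are assumptions, as are the function names and the two-entry stand-in for the 647-term dictionary.

```python
import math
from collections import Counter


def trigram_vector(term):
    """Character-trigram counts of a lower-cased, space-padded term (assumed representation)."""
    padded = f"  {term.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def keep_medical_terms(candidates, dictionary, threshold=0.8):
    """Retain candidates whose best match in the dictionary reaches the threshold."""
    dict_vectors = [trigram_vector(term) for term in dictionary]
    kept = []
    for term in candidates:
        vec = trigram_vector(term)
        best = max((cosine(vec, d) for d in dict_vectors), default=0.0)
        if best >= threshold:
            kept.append(term)
    return kept


dictionary = ["omeprazole", "antihistamine"]  # stand-in for the curated list
extracted = ["Omeprazole", "antihistamines", "yesterday"]
retained = keep_medical_terms(extracted, dictionary)
```

Here "yesterday" shares no trigram with either dictionary entry and is dropped, while the inflected form "antihistamines" is still close enough to be kept.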
Discharge Summaries (DS) are essential as a lasting record of a patient’s visit to a hospital. This documentation provides an efficient means of communication between hospital services and primary care providers. Methods are needed to extract the related entities from medical records so that the data locked up in their free text can be used. However, automatically extracting information from unstructured texts as varied as English clinical texts is highly complicated. Data mining plays a significant role in recognizing disease, since it identifies patterns hidden in large amounts of medical data with the aid of disease data. A disease status prediction framework utilizing clinical DSs is therefore proposed here. The proposed model includes the following phases: (a) patient information data pre-processing; (b) header extraction; (c) header selection; (d) DS data pre-processing; (e) medical term identification; (f) dataset merging; (g) word embedding; and (h) classification.
The implementation of the proposed approach is divided into two major phases: Patient Information Phase and Discharge Summary Phase.
Patient Information Phase.
Initially, the patient data is gathered from the openly accessible dataset. The collected information is then pre-processed to improve performance. Pre-processing is a significant stage in data mining. The proposed mechanism uses two pre-processing techniques: dropping missing values and numeralization.
Removing missing values
Here, to reduce processing time, missing values such as blanks, NaNs, or other placeholders like “?” are eliminated from the dataset. A missing value occurs when no value is stored for a specific feature owing to an error in data collection. Missing values can be handled by removing every row that includes a null value, by dropping the column that contains them, or by substituting values. In the proposed methodology, a column containing missing values is dropped from the dataset entirely to enhance the model’s performance.
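The column-dropping strategy can be sketched in plain Python as follows. The record layout (one dictionary per patient) and the set of placeholder strings are assumptions made for illustration; only blanks, NaN, and "?" are named in the text.

```python
import math

# Placeholders treated as missing after lower-casing and stripping (assumed set).
MISSING_MARKERS = {"", "?", "nan", "none"}


def is_missing(value):
    """True for None, float NaN, blanks, and the placeholder strings above."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return str(value).strip().lower() in MISSING_MARKERS


def drop_columns_with_missing(rows):
    """Remove every column that has a missing value in at least one row."""
    bad = {col for row in rows for col, value in row.items() if is_missing(value)}
    return [{c: v for c, v in row.items() if c not in bad} for row in rows]


records = [
    {"Age": 54, "Weight": "?", "Time_in_hospital": 3},
    {"Age": 61, "Weight": 70, "Time_in_hospital": 5},
]
cleaned = drop_columns_with_missing(records)
```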
Numeralization
Numeralization of the input dataset is carried out after the missing values have been removed. The input dataset includes both numerical and character data, and this inconsistency makes data processing and classification more complicated. Numeralization is therefore performed to unify the data. In this process, the symbolic features of the patient data are transformed into integers. The integer values range from 1 to the number of distinct values.
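A minimal sketch of this encoding is shown below. It assigns integers in order of first appearance, which is an assumption; the text only states that codes run from 1 to the number of distinct values. The function name and sample values are illustrative.

```python
def numeralize(values):
    """Map each distinct symbolic value to an integer code starting at 1."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            # The next unused code; the first distinct value receives 1.
            codes[value] = len(codes) + 1
        encoded.append(codes[value])
    return encoded, codes


specialities = ["Cardiology", "Surgery", "Cardiology", "Nephrology"]
encoded, codes = numeralize(specialities)
```

Returning the mapping alongside the encoded column makes it possible to apply the same codes to the test set and to decode predictions later.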
Header extraction
The header extraction phase follows the pre-processing step. Header extraction is the procedure that extracts header information from the dataset. It helps manage the required data when high-dimensionality problems occur. Patient_id(
In Eq. (1), the number of extracted features is specified as
To select optimal headers from the dataset, the Harris Hawks Optimization (HHO) nature-inspired meta-heuristic algorithm is implemented. The method is based on the attacking and food-searching behaviour of Harris hawks. Depending on the prey’s escaping patterns and the nature of the scenario, hawks attack the prey from different directions and use various chasing styles. The algorithm comprises two stages: exploration and exploitation. Initially, the hawks are distributed randomly over a location while waiting for prey. The Harris hawks’ prey-catching techniques depend on the change in their energy. However, owing to energy drop, HHO is prone to low diversity, local optima, and unbalanced exploitation power. Consequently, the Gravitational Search Strategy was hybridized with the earlier HHO algorithm to improve its performance and to increase the energy capacity. This search technique prevents the energy drop in HHO and improves the optimal solution. The detailed procedure is explained below [2, 3].
The algorithm works as follows:
Improved Header Selection using Harris Hawk’s Prey’s Energy Search Strategy
The meta-heuristic feature selection starts from a random token in the search space, identifies all neighbouring tokens, and moves to the token with the best functional (fitness) value.
The fitness value is calculated from the position of each individual token at each iteration, using random hawks selected from the initial population. The process repeats until all neighbouring tokens have lower functional values.
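The start, inspect-neighbours, move, and stop steps above can be sketched as a plain local search over binary header masks. This is a deliberately simplified illustration: it omits the hawk energy and position updates, whose equations are not reproduced in the text, and the neighbourhood (masks that differ in one header), the toy fitness, and all names are assumptions.

```python
import random


def local_header_search(fitness, n_headers, seed=0):
    """Move to the best neighbouring mask until no neighbour scores higher."""
    rng = random.Random(seed)
    # Start from a random point in the search space.
    current = [rng.randint(0, 1) for _ in range(n_headers)]
    current_score = fitness(current)
    while True:
        # Neighbours are masks that differ from the current one in a single header.
        neighbours = []
        for i in range(n_headers):
            candidate = current.copy()
            candidate[i] = 1 - candidate[i]
            neighbours.append(candidate)
        best = max(neighbours, key=fitness)
        best_score = fitness(best)
        if best_score <= current_score:
            return current, current_score
        current, current_score = best, best_score


# Toy fitness: number of positions that agree with a hypothetical ideal mask.
IDEAL = [1, 0, 1, 0]
toy_fitness = lambda mask: sum(a == b for a, b in zip(mask, IDEAL))
selected, score = local_header_search(toy_fitness, n_headers=4)
```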
Discharge Summary Phase:
Here, the patient’s DS is obtained, and it is pre-processed for better evaluation. Tokenization, Stop word removal, and Lemmatization are the steps included in pre-processing.
Tokenization
Here, the text stream is broken into phrases, words, symbols, or other meaningful elements, which are termed tokens. The process of isolating these tokens is called tokenization. Working with tokens narrows the search process and reduces the storage space needed for the data.
Stop word removal
The words used most frequently in English are not helpful for information retrieval and text mining. Such words are called stop words, and they occur in every document. A large number of them makes it harder to learn the document content. Commonly used stop words include and, are, a, this, an, and with. Consequently, stop words are removed to improve the classification process and to decrease the classifier’s processing time.
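Tokenization and stop word removal can be sketched together as follows. The regular expression (lower-cased runs of letters and digits) and the short stop word list are assumptions for illustration; the list extends the examples given in the text with "the" and "was", and a production system would use a fuller list.

```python
import re

# Examples from the text plus two assumed additions ("the", "was").
STOP_WORDS = {"and", "are", "a", "this", "an", "with", "the", "was"}


def tokenize(text):
    """Split text into lower-cased word tokens, discarding punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not stop words."""
    return [token for token in tokens if token not in stop_words]


summary = "The patient was admitted with a cough and fever."
tokens = tokenize(summary)
content_tokens = remove_stop_words(tokens)
```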
Lemmatization
Lemmatization is the process of removing the prefixes and suffixes of a word to obtain its root form, called the lemma, as it appears in the dictionary. In text mining, this simplification is used to recognize the context of the text and to avoid complications in word grouping. For example, the words “studies” and “studying” are forms of the word “study”, so both are turned into “study”. Lemmatization is applied in the same way to the words in the DS dataset.
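A dictionary-lookup version of this step can be sketched as below. The two-entry lookup table holds only the example from the text and is a stand-in; practical lemmatizers rely on a full lexical resource and part-of-speech information, which this sketch does not attempt. The names are illustrative.

```python
# Toy lemma table containing only the example from the text.
LEMMAS = {"studies": "study", "studying": "study"}


def lemmatize(tokens, lemmas=LEMMAS):
    """Replace each token by its dictionary form when one is known."""
    return [lemmas.get(token, token) for token in tokens]


lemmatized = lemmatize(["patient", "studies", "studying", "study"])
```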
Medical term identification
The medical terms relating to diseases and symptoms are recognized here. To recognize the disease status efficiently, the medical terms in the dataset must be extracted and identified, since they may otherwise cause complications. Medical terms related to the patient’s symptoms, such as decongestants, antihistamines, and omeprazole, are therefore extracted and detected from the dataset. After the medical terms are detected, the patient information and DS modules are merged on the patient id.
Bidirectional encoder representations from transformers (BERT)
The paper uses the clinical BERT (cBERT) model to create context vectors of the merged dataset. The cBERT model is trained on the MIMIC discharge summary dataset [26]. Because it learns representations from clinical notes, it can handle clinical abbreviations and jargon. cBERT is a masked language model: during pre-training, 15% of the tokens are masked and the model learns to predict them from the surrounding context, while BERT’s subword tokenization mitigates the out-of-vocabulary problem in language models. When retrained with the discharge summary dataset, the model creates context-specific vector representations of the data [23, 26].
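The 15% masking step can be illustrated with a short sketch. It is a simplification: every selected position is replaced by a mask symbol, whereas BERT pre-training also leaves some selected tokens unchanged or swaps them for random tokens. The function name, the seed, and the sample tokens are assumptions.

```python
import random


def mask_tokens(tokens, rate=0.15, seed=0, mask="[MASK]"):
    """Hide a fixed share of positions and return the prediction targets."""
    rng = random.Random(seed)
    n_mask = max(1, round(rate * len(tokens))) if tokens else 0
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [mask if i in positions else tok for i, tok in enumerate(tokens)]
    # The model is trained to recover these original tokens from context.
    targets = {i: tokens[i] for i in positions}
    return masked, targets


words = [f"w{i}" for i in range(20)]
masked, targets = mask_tokens(words)
```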
Proposed classification through MROA-CNN
Classification is the last step in the proposed disease prediction technique. The Modified Rider Optimization Algorithm with Convolutional Neural Network (MROA-CNN) is used to perform classification. The disease, together with the treatment details needed for that specific disease, is classified according to the symptoms. In a CNN, the output of each layer acts as the input to the subsequent layer. In the multi-convolution operation, a series of fixed-size filters is applied, and the outcome of each layer is transformed by a non-linear function until the output layer is reached. Training continues as usual while the CNN’s loss function is low. If the loss is high, the weights of the CNN are updated using the MROA to obtain a better classification outcome. This modification of the CNN is termed MROA-CNN. The CNN architecture is demonstrated below.
Conventional CNN architecture [16].
Step 1: At first, the input
Where, the
Step 2: The pooling operation is then applied to reduce the feature vector’s dimensionality. Average pooling is used, which takes the average over neighbouring elements of the feature vector. This smooths the features and filters the noise present in the input vector. The result is then transferred to the output layer via the flatten layer. The output of fully connected layer
Where, the number of layers in the network is specified as
Step 3: The output of MROA-CNN utilizing softmax function is,
Consequently, the disease type and its respective details are obtained from the output. The optimization of the weight values with the MROA methodology, applied when the loss function is high, is described below.
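The pooling and output steps can be illustrated in isolation. The sketch below assumes one-dimensional average pooling over non-overlapping windows of size 2 and the standard softmax; the window size, the sample values, and the function names are assumptions, and the convolution and fully connected layers are left out.

```python
import math


def average_pool(vector, window=2):
    """Average over non-overlapping windows; a short final window is averaged as is."""
    pooled = []
    for start in range(0, len(vector), window):
        chunk = vector[start:start + window]
        pooled.append(sum(chunk) / len(chunk))
    return pooled


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # Subtracting the maximum avoids overflow in exp.
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


pooled = average_pool([1.0, 3.0, 2.0, 4.0, 5.0])
probabilities = softmax([2.0, 1.0, 0.1])
predicted_class = probabilities.index(max(probabilities))
```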
The Rider Optimization Algorithm (ROA) is based on the idea of a group of riders racing to reach a goal. It models the riding strategies of bypass riders, followers, over-takers, and attackers. In this methodology, the lead rider’s initial position is updated randomly, and this random updating may lead to a locally optimal solution. To overcome this challenge, a Nudged Elastic (NE) methodology is introduced to update the leader’s position. During position updating, NE computes the force on the riders in all directions linking the initial and final states, which improves the optimal solution.
Step 1: Initially, the group of riders (CNN network’s weight values) is initialized which is expressed in Eq. (6),
Where, the number of riders is illustrated as
Step 2: By determining the success rate, the bypass rider turns into the leader at this moment; the leader’s position can be updated by utilizing NE as,
In this equation, the spring force parallel to the riders is specified as
Step 3: After that, the follower position is identical to the bypass rider since the follower’s walking path relies on the bypass rider. The follower position can be expressed as,
Where, the co-ordinate sector is indicated as
Step 4: Likewise, the over-takers update their position based on their own position as well as the coordinate sector, the direction indicator, and the relative success rate. The updated position is formulated as,
In equation (27), the position of
Step 5: Next, the attacker tries to reach the leader’s position and follows a technique like that of the follower. The attacker’s position update is,
Here, the leader’s position is specified as
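The overall idea of the steps above, a pool of random weight vectors that is repeatedly pulled toward the best one, can be illustrated with a heavily simplified sketch. It is not the MROA update: the bypass, follower, over-taker, and attacker rules and the NE correction are replaced by a single assumed move (a random step toward the current leader plus small Gaussian noise, kept only if the loss decreases). The quadratic toy loss stands in for the CNN loss, and all names and constants are hypothetical.

```python
import random


def rider_search(loss, dim, n_riders=8, iterations=30, seed=0):
    """Greedy leader-following search over a pool of random weight vectors."""
    rng = random.Random(seed)
    # Each rider is one candidate weight vector.
    riders = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_riders)]
    losses = [loss(w) for w in riders]
    history = [min(losses)]
    for _ in range(iterations):
        leader = riders[losses.index(min(losses))]
        for r, weights in enumerate(riders):
            step = rng.random()
            # Assumed move: step toward the leader with a small random perturbation.
            candidate = [
                w + step * (g - w) + rng.gauss(0.0, 0.05)
                for w, g in zip(weights, leader)
            ]
            candidate_loss = loss(candidate)
            if candidate_loss < losses[r]:
                riders[r], losses[r] = candidate, candidate_loss
        history.append(min(losses))
    best = riders[losses.index(min(losses))]
    return best, history


toy_loss = lambda w: sum((x - 0.5) ** 2 for x in w)
best_weights, history = rider_search(toy_loss, dim=3)
```

Because a move is kept only when it lowers the loss, the best loss in the pool never increases from one iteration to the next.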
In this section, the performance of the proposed disease status prediction framework is compared with existing methodologies in terms of the performance metrics. The input patient data and the DS details are gathered from the openly accessible dataset.
Performance measures
This section evaluates the efficacy of the proposed approach [21, 22, 28].
Accuracy
A confusion matrix visualizes the performance of a classification model. It consists of four counts. True Positive (TP): the observation is positive and is predicted to be positive. False Negative (FN): the observation is positive but is predicted to be negative. True Negative (TN): the observation is negative and is predicted to be negative. False Positive (FP): the observation is negative but is predicted to be positive. Accuracy is calculated as:
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
False positive rate (FPR)
The false positive rate (FPR) is the number of people who do not have the disease but are identified as having the disease (all FPs), divided by the total number of people who do not have the disease (includes all FPs and TNs).
False negative rate (FNR)
The false negative rate (FNR) is the number of people who have the disease but are identified as not having the disease, divided by the total number of people who have the disease.
False rejection rate (FRR)
The False Rejection Rate (FRR) is the probability that the model incorrectly rejects a patient who has a particular disease, classifying them as not having it because the input symptoms fail to match. The FRR is calculated as [book]:
Negative Predictive Value (NPV)
Sensitivity and specificity measure how well the model identifies positive and negative cases, respectively. In the disease prediction task, it is also important to measure the confidence with which negative predictions can be trusted. Mistakes in negative prediction may lead to loss of life; hence, NPV is used to evaluate the model. NPV is the proportion of negative predictions that are truly negative, and for a given sensitivity and specificity it is higher when the prevalence of the positive class is low. The NPV is calculated using the formula:
\[ \text{NPV} = \frac{TN}{TN + FN} \]
Matthews correlation coefficient (MCC)
In the state of the art, the most common performance metric is the ratio between the number of correctly classified samples and the overall number of samples. This measure is called accuracy and, by definition, it also works when there are more than two labels (the multiclass case). However, when the dataset is unbalanced (the number of samples in one class is much larger than the number of samples in the other classes), accuracy can no longer be considered a reliable measure, because it provides an over-optimistic estimate of the classifier’s ability on the majority class. An effective solution to the class imbalance issue comes from the MCC, which can be calculated as:
\[ \text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \]
The higher the correlation between true and predicted values, the better the prediction. MCC considers all four values in the confusion matrix, and a high value (close to 1) means that both classes are predicted well, even if one class is disproportionately under- (or over-) represented.
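The confusion-matrix measures in this section can be computed together, as in the sketch below. It uses the standard definitions of accuracy, FPR, FNR, NPV, and MCC; FRR is omitted because its formula is not reproduced in the text. The function name and the example counts are illustrative and are not results from this study.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Standard binary metrics; assumes both classes occur in the data."""
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "fpr": fp / (fp + tn),  # healthy patients flagged as diseased
        "fnr": fn / (fn + tp),  # diseased patients missed
        "npv": tn / (tn + fn),  # share of negative predictions that are correct
        "mcc": (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
    }


# Illustrative counts only.
metrics = confusion_metrics(tp=90, fn=10, tn=80, fp=20)
```

With these counts the accuracy is 0.85 while the MCC is about 0.70, which shows how MCC gives a more conservative picture than accuracy when errors are spread unevenly across the classes.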
The performance appraisal of the proposed approach is explained below.
Performance assessment of proposed MROA-CNN
The performance of the MROA-CNN classifier is compared with existing methodologies: Naive Bayes (NB), Adaptive Neuro-Fuzzy Inference System (ANFIS), K-Nearest Neighbours (KNN), Recurrent Neural Network (RNN), and Support Vector Machine (SVM). The performance is analyzed in terms of precision, accuracy, recall, F-measure, specificity, sensitivity, False Positive Rate (FPR), False Negative Rate (FNR), False Rejection Rate (FRR), Matthews Correlation Coefficient (MCC), and Negative Predictive Value (NPV).
Performance Evaluation by Means of Accuracy, Precision, Recall and f-measure.
The MROA-CNN’s performance is compared with the prevailing methodologies in Fig. 4. The graph shows that the proposed classifier obtained an accuracy of 97.77%, whereas the accuracies attained by the prevailing ANFIS, NB, RNN, KNN, and SVM methodologies are 92.13%, 93.81%, 95.85%, 86.45%, and 91.58%, respectively. Similarly, the precision, recall, and f-measure obtained by the proposed model are 95.77%, 98.77%, and 97.77%, which are higher than those of the prevailing methodologies. For example, the precision, recall, and f-measure attained by the existing ANFIS are 93.30%, 94.71%, and 94.65%, which are lower than those of the MROA-CNN. This confirms that the proposed model predicts the patient’s disease status from the DSs more efficiently than the prevailing methodologies. The MROA-CNN’s performance on certain other metrics is given in Table 2.
Performance evaluation by comparing proposed model to the existing techniques based on Sensitivity, Specificity, FNR and FPR
Table 2 shows that the proposed model is more efficient than the prevailing methodologies. The MROA-CNN classifier’s effectiveness is shown by its higher sensitivity and specificity and its lower FNR (0.01) and FPR (0.02). By comparison, the FNR and FPR values attained by the prevailing RNN, KNN, ANFIS, SVM, and NB are 0.51, 0.12, 0.48, 0.46, 0.54 and 0.24, 0.26, 0.25, 0.23, 0.25, respectively, which are higher than those of the proposed model. Likewise, the proposed methodology obtains a sensitivity of 98.77% and a specificity of 97.77%. The sensitivity and specificity attained by the prevailing RNN and KNN methodologies, on the other hand, are 96.22%, 89.43% and 97.77%, 85.56%, respectively, which are lower than those of the proposed methodology. This assessment establishes that the proposed methodology is highly efficient, with higher sensitivity and specificity and lower FPR and FNR. The performance analysis in terms of FRR, NPV, and MCC is illustrated in Fig. 5.
Performance Analysis of Proposed Classifier using (a) MCC and NPV, (b) FRR.
Figure 5a shows the proposed classifier’s NPV and MCC, which reach 95.55% and 97.77%, respectively. Similarly, the proposed method attains a low FRR of 0.01, which is shown in Fig. 5b.
Performance analysis of proposed classifier with and without weight optimization.
For a final comparative assessment, Fig. 6 shows the effect of weight optimization on the CNN’s accuracy, precision, f-measure, and recall. The stable result across all metrics is due to the same number of false positives and false negatives being predicted. The result shows improved precision with weight optimization. Thus, the proposed classifier classifies the disease status with fewer negative mispredictions and obtains higher performance, with a lower FRR and higher MCC and NPV. Conversely, the existing classifiers attain a higher FRR and lower MCC and NPV. The proposed mechanism therefore classifies better than the existing methodologies.
The proposed work uses MROA-CNN to process EMR, extract meaningful information, and perform disease prediction and treatment suggestion with optimal feature selection. The clinical features are extracted using the clinical BERT method. The model’s efficacy in the disease prediction task is improved by optimal feature selection using MROA-CNN, and the results improve on those of traditional classification algorithms. The obtained MCC score demonstrates the value of the proposed system on imbalanced data, and the NPV score demonstrates the efficacy of the proposed approach in the healthcare domain, where false negative predictions can lead to life-threatening situations. Beyond these results, the proposed work has room for improvement by incorporating relational knowledge. The {disease, treatment} prediction task is highly dependent on the co-occurrence of features such as symptoms. The proposed work can be improved by pre-calculating the co-occurrence matrix of symptoms and introducing it as an important feature through attention mechanisms. In the future, existing approaches will be modified to identify relationships between the extracted clinical entities and to construct a healthcare knowledge graph, which can be used in determining patient similarities and in the early detection of diseases.
Acknowledgments
The proposed work is partly funded by the University of Mumbai, No.APD/ICD/2020-21/80.
