Abstract
Cancers are genetically diverse, so anticancer treatments differ in efficacy from patient to patient. The main objective of this work is to predict anticancer drug efficacy for colorectal cancer patients in order to reduce mortality rates and support the patients' immunity. This paper proposes a novel anti-cancer drug efficacy prediction system for colorectal cancer patients. The input gene data are normalized with the Min–Max normalization technique, which brings data on distinct scales into a common range. Subsequently, an improved entropy-based feature is proposed to evaluate the uncertainty distribution of the data; it introduces a weight to overcome the issue of computational complexity. Along with this feature, a correlation-based feature and statistical features are also extracted. A Recursive Feature Elimination with Hybrid Machine Learning (RFEHML) mechanism is then proposed for selecting the appropriate feature set: features are eliminated recursively with the aid of a hybrid Machine Learning strategy that combines a decision tree and logistic regression. Gini impurity is employed to rank the features, retaining those with the highest importance scores and eliminating those with the lowest. Further, a hybrid model is proposed for predicting drug efficacy from the selected feature set. The hybrid model comprises a Long Short-Term Memory (LSTM) network and an Updated Rectified Linear Unit-Deep Convolutional Neural Network (UReLU-DCNN), in which the DCNN is modified by updating the activation function at the fully connected layer. The learned features are then used to predict anti-cancer drug efficacy in colorectal cancer patients by determining whether the patient is a responder or non-responder to the drug. Finally, the performance of the proposed model with RFEHML is compared with that of other traditional approaches.
The developed method achieves higher accuracy at every learning percentage (LP): 92.48% at 60LP, 94.28% at 70LP, 95.24% at 80LP, and 96.86% at 90LP.
Introduction
In oncologic pathology, colorectal cancer is the third most frequent kind of cancer. It is currently the most frequent malignant cancer of the gastrointestinal system and the second leading cause of cancer death, affecting both men and women globally [17,31]. The clinical presentation of colorectal cancer patients is determined by the location, size, and presence or absence of metastasis. Symptoms include involuntary weight loss, chronic changes in bowel habits, nausea, vomiting, abdominal distension, malaise, anorexia, and stomach discomfort [4,24]. Distal malignancies produce obvious rectal bleeding, whereas in proximal cancers the blood may be mixed with faeces and therefore occult, so that anaemia appears as a secondary symptom. Atypical clinical features include peripheral lymphadenopathy, particularly the Virchow lymph node in the left supraclavicular region, hepatomegaly from hepatic metastases, and muscle mass loss due to cachexia [18,23]. Colorectal carcinogenesis proceeds through a mechanism known as the suppressor or traditional pathway, as well as a mutator pathway [26].
Patients with a comparable diagnosis respond variably to anticancer medications due to the inherent intricacy and variety of malignancies, making cancer therapy challenging and refractory. To customize cancer treatment, it is vital to understand cancer patients’ medication responses in light of their genetic and clinical characteristics [7,21]. To this end, comprehensive patient drug screening is necessary to identify precise medication response patterns. Because it is impossible to screen an enormous number of cancer patients, researchers have instead acquired large-scale drug screening data on cancer cell lines [33]. Methods for predicting drug responses in silico from cancer cell lines fall into two categories: Machine Learning (ML) based techniques and network-based approaches [9]. ML techniques first gather characteristics from numerous molecular metrics before making predictions with classifiers or regressors such as Support Vector Machine (SVM), Elastic-net regression, and Random Forest (RF). Further, Deep Neural Networks (DNN) have recently gained traction in predicting medication responses across cell lines using chemical attributes [19,34]. Network-based strategies construct drug-target interaction networks or similarity networks between cell lines and drug identifiers and then forecast drug responses using various network analysis methodologies. However, network-based techniques are predicated on the notion that similar molecular properties and drug chemical properties result in similar pharmacological responses [11,22]. Owing to the lack of relevant data on drug concentration ratios of drug combinations and the accompanying synergies, the concentration of each drug is taken to be sufficient to act on its respective target and provide efficacy [27].
One of the primary limitations of Deep Learning (DL) models that has restricted their application to biological and health-related problems in general is their lack of interpretability, an issue shared by some of the traditional ML models frequently used in pharmaceutical research. Additionally, the quality of the input data itself can affect the biological interpretability of model predictions [3].
DL approaches, which include Neural Networks (NN), have proven effective in numerous application fields because of their capacity to generate sophisticated and precise predictions through inference from training data. DL applications have recently emerged in pharmaceutical studies as well as in medication discovery [8,14]. For example, investigators merged the outputs of two Convolutional Neural Networks (CNNs), one aimed at analyzing genomic characteristics of cell lines and the other at processing chemical descriptors of medications, to forecast adverse effects of drugs [2]. Furthermore, two Deep Neural Networks (DNNs) were presented, one for analysing gene expression information and the other for handling gene mutation data, and the two networks were subsequently integrated to forecast drug responses [25]. When DNNs are used to predict drug response, they face various challenges, including a diversity of inputs due to data originating from several omics platforms, a small number of samples, high dimensionality of inputs, and an excessive number of network parameters [20]. Combinations of drugs have shown encouraging therapeutic results in the treatment of cancer patients, with lower toxicity and fewer unfavorable side effects. Nevertheless, experimentally screening the vast search space of all conceivable drug combinations is not practicable. Thus, the scientific community has devoted considerable work to creating computational models that effectively and precisely discover possible synergistic anti-cancer medication combinations. To address these challenges, this work proposes an innovative approach to anti-cancer drug efficacy prediction in colorectal cancer patients via the UReLU-DCNN approach.
The main contributions of this work are as follows:
Introduces an anti-cancer drug efficacy prediction model in which Min–Max normalization is used to preprocess the data.
Derives an improved entropy-based feature, along with correlation and statistical features, to reduce the complexity characteristics of the data.
Deploys the RFEHML mechanism for selecting the appropriate feature set, based on a hybrid ML model that uses a decision tree and logistic regression.
Proposes a hybrid model comprising two classifiers, LSTM and UReLU-DCNN, which are trained on the selected features.
The rest of the paper is organized as follows:
Section 2 reviews traditional mechanisms for anti-cancer drug efficacy prediction. Section 3 explains the proposed model of anti-cancer drug efficacy prediction in colorectal cancer patients via the UReLU-DCNN approach. Section 4 evaluates the proposed model by comparing various measures, and Section 5 summarizes the work.
Literature review
In 2020, JungHo Kong, et al. [18] framed a network-based ML to predict anti-drug efficacy in patients. The suggested work accurately recognized the drug responses of many colorectal cancer patients by treating cisplatin. Moreover, the biomarkers were confirmed with the external dataset of transcriptomic for preventing cancer cell lines. Also, this paper investigated biomarkers of somatic mutation and transcriptomic by assessing the concordance between them.
In 2023, Sharma, A., et al. [26] proposed a unique DL-based technique for predicting patient-specific anticancer drug response using three categories of multi-omics data. The suggested DeepInsight-3D technique was based on an organized data-to-image transformation, which enabled the use of CNNs that were especially resilient to the high dimensionality of the inputs and retained the ability to model extremely complicated relationships between variables.
In 2019, Oyaga-Iriarte E, et al. [24] framed ML based approach to investigate drug toxicities. The proposed work investigated greater degrees of distinct toxicities. Moreover, a greater rate of leukopenia is investigated with high precision. Also, the suggested work accomplished greater precision by comparing various statistical measures.
In 2020, Christensen, T.D., et al. [7] framed an investigation of potential fulvestrant treatment of drug response predictor (DRP) and their outcomes were in accordance with mRNA, which was retrieved and assessed by employing an Affymetrix array. Further, the assessment of exploratory revealed that the performance of DRP was good while employing current DRP biopsies computation.
In 2022, Milad Mousavi, et al. [23] framed a three-dimensional estimation technique encompassing angiogenesis, which was utilized to investigate VEGF concentration. The proposed work also modelled endothelial cell angiogenesis and cancer cell proliferation, which yielded precise predictions for drug treatment optimization. Moreover, features including tumour cell number, tumour volume, and new vessel length were extracted with ML approaches.
In 2020, Kim, Y., et al. [17] framed a Patient-derived tumour xenograft-based gene expression model (PDXGEM) for identifying the drug responses in cancer patients in accordance with the functionality of drug data and gene expression against Patient-derived xenografts (PDX) approaches. Also, the integration assessment among post-treatment fluctuations in tumour volume and gene expression was applied by recognizing drug-sensitive biomarkers. Thereby, the findings of the presented work provided significantly better performance in drug response prediction in cancer patients.
In 2023, Jing-Bo Zhou, et al. [33] framed a well-suited ML-based drug efficacy prediction system to accomplish better predictive performance. Moreover, the presented approach deployed 2D cell lines as well as 3D tumour slices for validating numerous other statistical metrics. Also, molecular features were integrated to recognize probable biomarkers and identify the relation between relevant molecular characteristics and drug targets with the utilization of interacted protein-protein system.
In 2021, Gustavo Carreno, et al. [4] developed “on-demand” sustained thermo-responsive hydrogels for anti-cancer drugs. This concept was realized using N-isopropyl acrylamide (NIPAM). Moreover, the research employed in silico and Design of Experiments (DoE) approaches, which predicted the drug efficiency accurately when compared across various statistical measures.
In 2023, Sikander et al. [28] developed deep learning techniques that categorize cancerlectin proteins accurately and quickly. A feature extraction model was employed for physical features, such as the protein structure and activities of cancer lectins and other chemicals. For predicting cancerlectin proteins, the authors suggested a computational, image-based classification technique called cancerlectin two-dimensional convolutional neural networks (Lectin2D-CNN) and carried out cross-validation tests. The findings show that Lectin2D-CNN outperformed the methods it was compared with and obtained excellent accuracy and good specificity on the comparison data sets.
In 2023, Chen et al. [6] proposed DNN-PNN, a new and more efficient method for representing drug data; this method reduces the drawbacks of high-dimensional discrete data in deep learning by using the product to express the correlation between features. In addition, the framework was improved to lower the model’s time complexity. The authors compared DNN-PNN with its variant DNN-FM, which represents the conventional feature correlation model, with the component DNN or PNN alone, and with popular machine learning models through a series of comprehensive tests on the CCLE datasets. DNN-PNN was found to offer notable benefits in stability and convergence speed in addition to its excellent prediction accuracy.
Problem statement
The literature survey reveals the features and challenges of existing approaches and shows that only a limited number of approaches address anti-cancer drug efficacy prediction. We therefore studied relevant existing approaches with various methodologies to guide the proposed work. The paper [17] deployed an ML-based approach and improved the prediction of drug responses in patients. Yet, using network assessment to reduce biological issues in the input data and thereby enhance ML-based prediction remains complicated. In [31], the DeepInsight-3D technique was deployed to resolve the overlapping problem, but analyzing the model with ingenuity pathway assessment is still difficult. Moreover, [24] utilized an ML approach and achieved greater accuracy and optimal pharmacotherapy; however, the population sample size of the model is still limited. In [4], the authors used the DRP approach, which anticipated the outcome of fulvestrant treatment. Even so, more research is required to fully understand how to apply the fulvestrant predictor in a complete assessment. In [18], although the authors attained a lower computational cost with ML, it is difficult to investigate an evolutionary approach for finding the optimal ML configuration. In [23], the PDXGEM mechanism was deployed and handled a huge number of gene biomarkers, but implementing an optimal biomarker approach is still difficult. The authors of [26] achieved a higher response rate, but investigating every human gene for drug combinations remains a complicated problem. The work in [7] employed the NIPAM approach and acquired greater therapeutic efficiency; nonetheless, the model still needs to be analyzed with more gene datasets. Lectin2D-CNN [28] has better efficiency but requires more data analysis. The DNN model [6] performs accurately.
However, the need to look into every human gene for potential drug combinations remains a challenging issue. Many researchers focus on machine learning approaches for analysing and predicting the drug response, but their performance is lower than that of deep learning approaches. This paper therefore introduces the UReLU-DCNN approach for analysing the drug response, which gives better performance than the ML approaches. Table 1 describes the features and limitations of existing schemes.
Reviews on existing schemes
Achieves better accuracy for the prediction of anti-cancer drugs. Obtains superior stability rate for enhancing performance. Ensures the reliability of the efficiency of the proposed model.
Proposed model of anti-cancer drug efficacy prediction in colorectal cancer patients via UReLU-DCNN approach
Proposed drug efficacy prediction architecture
Patients with a comparable diagnosis respond variably to anticancer medications due to the inherent intricacy and variety of malignancies, making cancer therapy challenging and refractory. Therefore, the complete drug efficacy profile of the patient must be considered in order to recognize the pattern for an accurate medication procedure. Figure 1 illustrates the entire framework of the proposed anti-cancer drug efficacy prediction in colorectal cancer patients. The input gene data are pre-processed with the Min–Max normalization technique. Then an improved entropy-based feature is deployed, which adds a weight to the entropy to mitigate poor model performance. Further, an RFEHML mechanism is proposed for selecting the appropriate features; it uses a hybrid ML model and employs a Gini impurity scheme to rank the features, removing those with the lowest importance scores and retaining those with the highest. Moreover, a hybrid model is proposed that comprises two classifiers, LSTM and UReLU-DCNN, which are trained on the selected features. The UReLU-DCNN model is a modified version of DCNN that includes a dropout layer and an updated activation function at the fully connected layer. Based on the trained features, the model predicts anti-cancer drug efficacy in colorectal cancer patients, that is, whether the patient is a responder or non-responder to the drug.

Framework of proposed anti-cancer drug efficacy prediction in colorectal cancer patients via UReLU-DCNN.
Pre-processing is the process of removing (or correcting) and transforming raw data to make it appropriate for feature extraction. Assume that the gene data be
Thereby, this provides a normalized range of data
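The Min–Max step can be illustrated with a minimal sketch. The function name, the target range [0, 1] and the sample values below are assumptions made for illustration; they are not taken from the paper's own notation.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale a list of numbers linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread; map it to the lower bound.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


# Hypothetical expression values for a single gene across four samples.
expression = [2.0, 4.0, 6.0, 10.0]
normalized = min_max_normalize(expression)
```

Applying the function per gene puts features that were measured on distinct scales into the same range before feature extraction.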
Extraction of features: Improved entropy-based feature, correlation-based feature, and statistical features
The transformation of pre-processed data into numerical features without losing information is termed feature extraction. It recognizes the discriminating features in data and minimizes the number of redundant data from the pre-processed data
Improved entropy-based feature
The measure of data randomness or uncertainty distribution of data is termed entropy [29]. Entropy is used to gauge the level of uncertainty in a data set. It can be used to quantify the internal information qualities of the signal and to define the uncertainty distribution and complexity properties of the signal. So it is possible to extract the internal features of the data using the entropy characteristic. Consider that a series of elements in pre-processed data as
Equation (2) is improved to reduce the poor performance of the model by including weight in the traditional entropy value. Then the new formulation of entropy can be expressed as in Eq. (3). Here,
The final entropy weight value
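As an illustration of the idea of weighting the entropy, the sketch below computes the Shannon entropy of a discretized series and a weighted variant. The weighting form used here (one multiplicative weight per symbol) is an assumption for illustration only and is not claimed to match Eq. (3); continuous gene data would also have to be binned first, which is omitted.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy (in bits) of a sequence of discrete symbols."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def weighted_entropy(symbols, weights):
    """Entropy with a per-symbol weight (assumed form, default weight 1)."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum(
        weights.get(s, 1.0) * (c / n) * math.log2(c / n)
        for s, c in counts.items()
    )


# Hypothetical discretized series and weights.
series = ["low", "low", "high", "high"]
plain = shannon_entropy(series)
weighted = weighted_entropy(series, {"low": 2.0, "high": 1.0})
```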
Correlation-based feature
Correlation is a measure of the relationship between two variables. Correlation-based features are employed to retain necessary characteristics while discarding unnecessary ones. They assess the usefulness of attributes by taking into account both the redundancy among features and the relevance between features and class labels, and they analyze subsets of features rather than individual characteristics. This work employs Pearson correlation [13] for extracting the feature from pre-processed data
Where g and h refer to features and targets, and (
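A minimal sketch of the Pearson correlation between a feature vector g and a target vector h is given below. The handling of constant vectors (returning 0) is an assumption, since the coefficient is undefined in that case.

```python
import math


def pearson(g, h):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(g)
    mean_g = sum(g) / n
    mean_h = sum(h) / n
    cov = sum((a - mean_g) * (b - mean_h) for a, b in zip(g, h))
    var_g = sum((a - mean_g) ** 2 for a in g)
    var_h = sum((b - mean_h) ** 2 for b in h)
    if var_g == 0 or var_h == 0:
        # Undefined for a constant vector; 0.0 is an assumed convention.
        return 0.0
    return cov / math.sqrt(var_g * var_h)


# Hypothetical feature and target values.
r_pos = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_neg = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Values close to +1 or -1 indicate a strong linear relationship between the feature and the target; values near 0 indicate little linear relationship.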
Statistical features
In order to reduce the dimensionality of the pre-processed data
Thus, the statistical features acquired from pre-processed data
Feature selection: Proposed Recursive Feature Elimination with Hybrid Machine Learning (RFEHML) mechanism
From the extracted feature set Fext, the appropriate or optimal features are selected so that the deep learning task performs well in terms of drug efficacy prediction. Feature selection is needed because the pre-processed data yield too large a set of features, which may cause overfitting and lead to low performance. Among the many available feature selection techniques, this work deploys the RFEHML mechanism. This mechanism is a modified form of RFE [5], motivated by the difficulty of attaining high prediction accuracy with RFE and by its computational cost. RFE nonetheless remains one of the prominent feature selection techniques because it is simple to use and configure. The traditional RFE procedure for extracting optimal features consists of the following steps:
Step 1: Select an ML model such as a Decision tree.
Step 2: Allocate the number of features.
Step 3: Fit the Decision tree model.
Step 4: Rank the features.
Step 5: Remove the least essential features with the minimal importance score.
Step 6: Follow steps 3–5 until the desired amount of feature set is attained.
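The loop in Steps 1 to 6 can be sketched as follows. This is a generic outline rather than the authors' implementation: `importance_fn` stands in for fitting the chosen model and reading its importance scores, and the feature names and scores are made up.

```python
def recursive_feature_elimination(feature_names, importance_fn, n_keep):
    """Drop the least important feature repeatedly until n_keep remain.

    importance_fn(names) returns {name: score}, as would be obtained by
    refitting the chosen model on the surviving features (Steps 3 and 4).
    """
    remaining = list(feature_names)
    while len(remaining) > n_keep:
        scores = importance_fn(remaining)                  # Steps 3-4: fit, rank
        weakest = min(remaining, key=lambda f: scores[f])  # lowest importance
        remaining.remove(weakest)                          # Step 5: eliminate
    return remaining                                       # Step 6: stop at n_keep


# Toy stand-in for a fitted model: fixed, hypothetical importance scores.
toy_scores = {"gene_a": 0.40, "gene_b": 0.05, "gene_c": 0.30, "gene_d": 0.25}
selected = recursive_feature_elimination(
    list(toy_scores),
    lambda names: {n: toy_scores[n] for n in names},
    n_keep=2,
)
```

In a real run the scores change after every refit, which is why the ranking is recomputed inside the loop instead of once up front.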
This traditional form of optimal feature selection based on the RFE technique is improved by including hybrid ML models as well as ranking the features with the Gini impurity approach. As illustrated in Fig. 2 the innovative improved form of RFE named RFEHML mechanism selects optimal features with the following steps:
Step 1: Select a hybrid ML model called RFEHML, which uses logistic regression and a decision tree classifier. Here, the feature selection is performed with a decision tree using selected features from logistic regression.
Step 2: Allocate the number of features.
Step 3: Fit the hybrid model.
Step 4: Rank the features with the Gini impurity [30] approach, which ranks the features obtained from the RFEHML technique by importance score. The Gini impurity estimates the probability that a randomly chosen sample would be misclassified if it were labelled according to the class distribution of a partition; it is zero for a pure partition. The formulation of Gini impurity (GI) is expressed in Eq. (10) and Eq. (11). Here, P refers to the total number of features,
Step 5: Remove the least essential features with the minimal importance score.
Step 6: Follow steps 3–5 until the desired amount of feature set is attained.
Thereby, the optimal features selected with the RFEHML mechanism can be represented as
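The Gini-based ranking in Step 4 can be sketched with the standard definition of Gini impurity, one minus the sum of squared class proportions. Scoring a feature by the largest impurity decrease over single-threshold splits is an assumed simplification of what a decision tree computes; the logistic regression pre-selection of the hybrid model is not shown, and the data are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """1 - sum of squared class proportions; 0.0 for a pure partition."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_importance(values, labels):
    """Largest impurity decrease over all threshold splits of one feature."""
    n = len(labels)
    parent = gini_impurity(labels)
    best = 0.0
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        child = (len(left) * gini_impurity(left)
                 + len(right) * gini_impurity(right)) / n
        best = max(best, parent - child)
    return best


# Hypothetical labels (1 = responder) and two candidate features.
response = [0, 0, 1, 1]
informative = gini_importance([0.1, 0.2, 0.8, 0.9], response)
noisy = gini_importance([0.1, 0.9, 0.2, 0.8], response)
```

The feature whose best split separates responders from non-responders cleanly receives the higher score and survives elimination longer.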

The flow of the proposed RFEHML mechanism for optimal feature selection.
After selecting the optimal features

The process of selected features trained in the proposed hybrid model.
The selected feature set
Step 1: Initialize the filter size and set the filter coefficients (weights) randomly.
Step 2: The convolution between the filter coefficient and
Step 3: The first convolutional layer extracts low-level features that are stacked in the vertical direction. The dimension of the filter for this layer can be [
Step 4: The second convolutional layer performs with the first convolutional layers’ output in addition to filter coefficients. Also, the filter for this layer can be [
Step 5: Similarly, the third convolutional layer repeats a similar procedure and the output layer size can be the [
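The stacking of three convolutional layers in Steps 1 to 5 can be sketched in one dimension as below. The filter sizes, the fixed filter coefficients (the procedure above initializes them randomly) and the use of ReLU after each layer are assumptions for illustration; the actual layer dimensions are not reproduced here.

```python
def conv1d_valid(signal, kernel):
    """Slide the kernel over the signal without padding (stride 1).

    As in most deep learning libraries, this is a cross-correlation.
    """
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def conv_stack(signal, kernels):
    """Feed the output of each convolutional layer into the next one."""
    out = list(signal)
    for kernel in kernels:
        out = relu(conv1d_valid(out, kernel))
    return out


# Hypothetical selected features and fixed filters for three layers.
features = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
kernels = [[0.5, 0.5], [1.0, 0.0], [0.5, 0.5]]
feature_map = conv_stack(features, kernels)
```

Each layer with a filter of length 2 shortens the sequence by one element, so the six inputs shrink to three values after the third layer.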

The internal flow of the convolution layer in this proposed UReLU-DCNN framework.
Equation (14) is improved by updating the activation function so that it resists exponential growth in the computation; the updated form is expressed in Eq. (15).

The proposed UReLU-DCNN model architecture diagram.
The selected feature set Fsel is passed to the LSTM [15] model for anti-cancer drug efficacy prediction. In the recurrent hidden layer, the LSTM contains special units termed memory blocks. The memory blocks contain self-connected memory cells, which store the temporal state of the network, together with special multiplicative units referred to as gates, which regulate the information flow. Each memory block in the LSTM framework has an input gate and an output gate. The input gate regulates the flow of input activations into the memory cell, whereas the output gate regulates the flow of cell activations to the rest of the network. Further, the memory block includes a forget gate that scales the cell's internal state before it is added as input to the cell. Thus, it adaptively resets or forgets the cell's memory. The input gate
The forget gate
The cell activation
The output gate
The cell output activation vector
The output
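One time step of the gate computations described above can be sketched for a single scalar unit. This follows the common LSTM formulation without peephole connections, which is an assumption; the parameter names and values are hypothetical and do not follow the paper's notation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step for a single scalar unit (no peepholes)."""
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wc"] * x + p["uc"] * h_prev + p["bc"])  # candidate cell input
    c = f * c_prev + i * g     # forget part of the old state, add new input
    h = o * math.tanh(c)       # gated cell output
    return h, c


# Hypothetical parameters: every weight and bias set to zero.
zero_params = {name: 0.0 for name in (
    "wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wc", "uc", "bc")}
h_out, c_out = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, p=zero_params)
```

With all parameters at zero every gate equals 0.5, so the cell state is halved and no new input is written, which makes the roles of the forget and input gates easy to check by hand.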
Thereby, the average of
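The final averaging of the two classifier outputs can be sketched as follows. The decision threshold of 0.5 and the interpretation of each output as a responder probability are assumptions for illustration.

```python
def hybrid_predict(p_lstm, p_dcnn, threshold=0.5):
    """Average two classifier scores and map the result to a class label."""
    score = (p_lstm + p_dcnn) / 2.0
    label = "responder" if score >= threshold else "non-responder"
    return score, label


# Hypothetical classifier outputs for two patients.
patient_a = hybrid_predict(0.75, 0.5)
patient_b = hybrid_predict(0.25, 0.5)
```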
Table 2 describes the symbols and their descriptions.
Notations and descriptions
Simulation procedure
The proposed prediction model of anti-cancer drug efficiency in patients was implemented in Python 3.7.6. The effectiveness of anti-cancer drugs was predicted using the transcriptome dataset [1]. Table 3 describes the system configuration.
System configurations
The samples are collected from “billy-kong/ organoid_biomarker_detection”. The dataset consists of a total of 114 samples, all of which are considered for prediction in this work. The sources considered are ‘organoid’ and ‘TCGA’. The cancer_type is ‘COAD’. The testing_pathway_list for anti-cancer prediction is REACTOME. To comprehend the physiopathology and mechanisms underlying human hepatobiliary illnesses, organoids can be employed as disease models. Organoids are well suited as models for toxicity tests and medication screening. Organoids produced from patients can be used to forecast individual patient reactions to medications and to customize care.
Performance comparison
Here, the evaluation was done with regard to Precision, False Positive Rate (FPR), Matthews Correlation Coefficient (MCC), Accuracy, False Negative Rate (FNR), F-measure, Sensitivity, Negative Predictive Value (NPV) and Specificity. We conducted two kinds of comparison. First, the Hybrid + RFEHML is compared with state-of-the-art methods, namely Convolutional Neural Network (CNN) [28], Deep Neural Network (DNN) [6], Linear Regression (LR) + Ridge Regression (RR) + Support Vector Regression (SVR) [18] and Backward Stepwise Logistic Regression (BSLR) + C4.5 + Decision tree (DT) + Random Forest (RF) + Support Vector Machine (SVM) [24]. Second, the Hybrid + RFEHML is contrasted with traditional classifiers, including Random Forest (RF), Support Vector Machine (SVM), Neural Network (NN), Long Short Term Memory (LSTM) and Gated Recurrent Unit (GRU).
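For reference, all nine measures can be derived from the four entries of a binary confusion matrix, as in the sketch below. The counts are hypothetical and are not results from this study; the function assumes that every denominator is non-zero.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)
    npv = tn / (tn + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "npv": npv,
        "fpr": fp / (fp + tn),            # equals 1 - specificity
        "fnr": fn / (fn + tp),            # equals 1 - sensitivity
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
        "mcc": (tp * tn - fp * fn) / math.sqrt(
            (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# Hypothetical counts, with responders as the positive class.
metrics = classification_metrics(tp=45, fp=5, tn=40, fn=10)
```

Precision, sensitivity, specificity, accuracy, NPV, F-measure and MCC are better when higher, whereas FPR and FNR are error rates and are better when lower.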
Comparative analysis of positive metric
The positive metric analysis of Hybrid + RFEHML against CNN [28], DNN [6], RF, SVM, NN, LSTM, GRU, LR + RR + SVR [18] and BSLR + C4.5 + DT + RF + SVM [24] for the prediction of anti-cancer drug efficiency in patients is displayed in Fig. 6. It is analyzed in terms of sensitivity, accuracy, specificity and precision by varying the learning percentage (60–90). The main objective of the Hybrid + RFEHML is to boost the positive metric ratings while predicting anti-cancer drug efficiency, and it accomplished better values than the traditional strategies. In particular, the detection accuracy of the Hybrid + RFEHML scheme is 96.86% at the training rate of 90%, whereas the prior classifiers hold lower accuracy ratings: CNN = 90.34%, DNN = 92.47%, RF = 93.52%, SVM = 90.29%, NN = 92.64%, LSTM = 94.23%, GRU = 90.94%, LR + RR + SVR [18] = 94.86% and BSLR + C4.5 + DT + RF + SVM [24] = 94.94%. Additionally, all the classifiers attained their greatest precision values at the training rates of 80 and 90. Nonetheless, the proposed Hybrid + RFEHML methodology scored higher precision values of 96.42% and 97.28%, respectively.
The sensitivity and specificity of the Hybrid + RFEHML and the traditional schemes are shown in Fig. 6(c) and Fig. 6(d). Here, the sensitivity increased steadily as the training percentage increased. In particular, at the training rate of 80, the Hybrid + RFEHML offered a sensitivity of 96.81%, which is greater than that of CNN, DNN, RF, SVM, NN, LSTM, GRU, LR + RR + SVR [18] and BSLR + C4.5 + DT + RF + SVM [24]. Likewise, the specificity of the Hybrid + RFEHML scheme is much higher at all the training rates. This indicates that Hybrid + RFEHML is significantly more effective at predicting anti-cancer drug efficiency, because the improved entropy-based feature extraction in the initial stage paves the way for appropriate feature selection. The UReLU-DCNN and LSTM, with proper training on the selected features, make the prediction more accurate and precise.

Validation of Hybrid + RFEHML and conventional schemes with regard to positive metrics.

Figure 7 depicts the analysis of Hybrid + RFEHML contrasted with CNN, DNN, RF, SVM, NN, LSTM, GRU, LR + RR + SVR [18] and BSLR + C4.5 + DT + RF + SVM [24] with regard to negative metrics for the prediction of anti-cancer drug efficiency in patients. The findings affirm that the Hybrid + RFEHML is more efficacious for predicting drug efficiency in patients. The FNR of the Hybrid + RFEHML scheme is 3.19% at the training rate of 80, whereas CNN is 7.98%, DNN is 8.1%, RF is 7.62%, SVM is 11.61%, NN is 9.16%, LSTM is 7.31%, GRU is 7.09%, LR + RR + SVR [18] is 7.47% and BSLR + C4.5 + DT + RF + SVM [24] is 7.12%. In addition, the lowest FPR is obtained with the Hybrid + RFEHML methodology at all the training rates. In particular, at the training rate of 90, the Hybrid + RFEHML offered an FPR of 4.29%, which is lower than that of CNN, DNN, RF, SVM, NN, LSTM, GRU, LR + RR + SVR [18] and BSLR + C4.5 + DT + RF + SVM [24]. Overall, the models trained using standard methods perform poorly, with higher error values, whereas the Hybrid + RFEHML approach, which includes all enhancements and optimal training, significantly decreases the error rate.

Validation of Hybrid + RFEHML and traditional approaches with regard to negative metrics.
Figure 8 illustrates the evaluation of the other metrics for Hybrid + RFEHML compared with the traditional methodologies for the prediction of anti-cancer drug efficiency in patients. The Hybrid + RFEHML offered greater values for these metrics, with accurate prediction of anti-cancer drug efficiency. The F-measure of the Hybrid + RFEHML scheme is 95.56%, whilst the existing approaches hold lower F-measure values: CNN (89.67%), DNN (89.65%), RF (89.35%), SVM (90.26%), NN (89.77%), LSTM (93.31%), GRU (89.21%), LR + RR + SVR [18] (93.06%) and BSLR + C4.5 + DT + RF + SVM [24] (92.23%). At a training percentage of 80, the lowest MCC is attained by the NN classifier (76.13%), followed by RF (79.68%). However, the Hybrid + RFEHML scheme gained the MCC of 74.43%. Additionally, the NPV of the Hybrid + RFEHML methodology is 96.05%. In conclusion, the Hybrid + RFEHML performs better since it can reliably and precisely predict the anti-cancer drug efficiency in patients.

Validation of Hybrid + RFEHML and traditional approaches with regard to other metrics.
The ablation analysis of Hybrid + RFEHML (improved entropy, improved DCNN + LSTM and improved RFE) against the model with conventional entropy, the model with conventional DCNN + LSTM and the model with conventional RFE is described in Table 4. In the ablation study, the Hybrid + RFEHML model is configured in several variants, each replacing one improved component (feature extraction, feature selection, or the improved DCNN with LSTM) with its conventional counterpart, so that the influence of each enhancement can be investigated. The full Hybrid + RFEHML scheme, which retains all the enhancements, generated superior values in all the performance metrics. The precision of the Hybrid + RFEHML is 0.95, whereas the model with conventional entropy reaches 0.91, the model with conventional DCNN + LSTM 0.92 and the model with conventional RFE 0.90. In addition, the Hybrid + RFEHML generated FNR = 0.04, accuracy = 0.94, NPV = 0.92 and FPR = 0.09. The improvements made to the entropy and RFE processes help the Hybrid + RFEHML scheme obtain better prediction outcomes. Also, the improvement made to the DCNN combined with LSTM aided superior detection of anti-cancer drug efficiency in patients.
Impact on Hybrid + RFEHML, model with conventional entropy, model with conventional DCNN + LSTM and model with conventional RFE
The statistical study evaluates the stability of the Hybrid + RFEHML approach in achieving a high accuracy rate. Table 5 presents the statistical analysis of Hybrid + RFEHML and the conventional schemes. Because the outcomes of such techniques vary from run to run, each method is analyzed numerous times to obtain a more reliable estimate. The results are examined under five statistical metrics: Mean, Maximum, Standard Deviation, Median and Minimum. The greatest accuracy acquired using the Hybrid + RFEHML approach is 0.97 for the Maximum metric, whereas CNN, DNN, RF, SVM, NN, LSTM, GRU, LR + RR + SVR [18] and BSLR + C4.5 + DT + RF + SVM [24] offered lower accuracy. Likewise, for most of the statistical metrics, the Hybrid + RFEHML accomplished higher accuracy values than the traditional methodologies. The Hybrid + RFEHML has thus demonstrated its superiority in predicting the anti-cancer drug efficiency, which is attributed to the improved entropy-based feature extraction and the RFEHML mechanism.
Statistical evaluation on Hybrid + RFEHML and traditional methodologies regarding accuracy metric
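The five statistical metrics above can be computed from repeated runs as in the following minimal sketch. The function name `summarize_runs` and the accuracy values are hypothetical and serve only as an illustration; the sketch also assumes the sample standard deviation (`statistics.stdev`), since the text does not state whether the sample or the population form was used.

```python
import statistics

def summarize_runs(accuracies):
    """Summarize the accuracies of repeated runs with five statistics."""
    return {
        "mean": statistics.mean(accuracies),
        "maximum": max(accuracies),
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "std_dev": statistics.stdev(accuracies),
        "median": statistics.median(accuracies),
        "minimum": min(accuracies),
    }

# Hypothetical accuracies from five repeated runs (not results from this study).
runs = [0.90, 0.92, 0.91, 0.93, 0.89]
summary = summarize_runs(runs)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

A small standard deviation together with a narrow gap between the minimum and the maximum indicates that a method is stable across runs.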
In this paper, a novel anti-cancer drug efficacy prediction model is developed for colorectal cancer patients. The Min–Max normalization technique is used as a pre-processing step to normalize the input gene data. An enhanced entropy-based feature, which assigns a weight to the entropy, is then introduced to improve the model's performance. Moreover, the paper proposes the RFEHML method, which employs a hybrid machine learning model and a Gini impurity scheme to rank the features, eliminating those with the lowest importance scores and retaining those with the highest. It also proposes a hybrid model that trains on the chosen features using two classifiers, LSTM and UReLU-DCNN. The UReLU-DCNN model is obtained from the DCNN model through two modifications: a dropout layer and an updated activation function at the fully connected layer. The prediction accuracy of the Hybrid + RFEHML scheme is 96.86% at a training rate of 90%, while existing models such as CNN, DNN, RF, SVM, NN, LSTM, GRU, LR + RR + SVR and BSLR + C4.5 + DT + RF + SVM achieve lower accuracy. Thus, the Hybrid + RFEHML performs better because it can reliably and precisely predict the anti-cancer drug efficiency in patients.
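The pre-processing and feature-ranking steps summarized above can be illustrated with the following minimal sketch. It is not the RFEHML implementation: the hybrid of decision tree and logistic regression is replaced here by a single-split decision stump scored by its Gini impurity decrease, and all function names, gene names and values are hypothetical. Because the stump score is computed per feature, the scores do not change between rounds; in a full recursive feature elimination the model would be refit on the remaining features after every removal.

```python
def min_max_normalize(column):
    """Rescale a list of numbers to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)

def stump_importance(column, labels):
    """Largest Gini impurity decrease achievable by one threshold split."""
    parent = gini(labels)
    best = 0.0
    for threshold in sorted(set(column))[:-1]:
        left = [y for x, y in zip(column, labels) if x <= threshold]
        right = [y for x, y in zip(column, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        best = max(best, parent - weighted)
    return best

def eliminate_features(features, labels, keep):
    """Repeatedly drop the lowest-scoring feature until `keep` remain."""
    remaining = dict(features)
    while len(remaining) > keep:
        scores = {name: stump_importance(col, labels)
                  for name, col in remaining.items()}
        worst = min(scores, key=scores.get)
        del remaining[worst]
    return list(remaining)

# Hypothetical expression values for three genes across four patients.
raw = {
    "gene_a": [2.0, 4.0, 6.0, 8.0],
    "gene_b": [5.0, 1.0, 5.0, 1.0],
    "gene_c": [1.0, 9.0, 2.0, 3.0],
}
labels = [0, 0, 1, 1]  # 1 = responder, 0 = non-responder (hypothetical)

normalized = {name: min_max_normalize(col) for name, col in raw.items()}
selected = eliminate_features(normalized, labels, keep=1)
print(selected)
```

In this toy data, `gene_a` separates responders from non-responders with a single threshold, so it receives the largest impurity decrease and is the feature that survives the elimination.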
Practical implications
The proposed model demonstrates the value of testing time-staggered combination treatments experimentally for their anticancer effects, especially when paired with a mathematical modeling investigation of signaling pathways and responses. Methods of this kind could make it easier to find new therapeutic targets and combinations of drugs that work well together. An anticancer drug, also known as an antineoplastic medication, is any medication that is effective in treating malignant (cancerous) disease. Antimetabolites, hormones, natural products and alkylating agents are some of the main groups of anticancer medications. Such methods could also make it easier to plan various kinds of clinical trials that investigate how drugs can kill oncogene-addicted cancers by dynamically rewiring signaling networks.
Conclusion
This research work proposes an approach to anti-cancer drug efficacy prediction in colorectal cancer patients. The input gene data was pre-processed with the Min–Max normalization technique to bring data on different scales into a common range. An improved entropy-based feature was then proposed, which assigns a weight to the entropy to reduce the complexity of the data. Further, the proposed RFEHML mechanism for selecting the optimal features utilized a hybrid ML model and employed a Gini impurity scheme for ranking the features, removing those with the minimal importance score and selecting those with the maximum importance score. Moreover, a hybrid model comprising two classifiers, LSTM and UReLU-DCNN, was proposed and trained on the selected features. The UReLU-DCNN model was obtained by modifying the DCNN with a dropout layer and an updated activation function at the fully connected layer. The proposed work achieves better prediction of anti-cancer drug efficacy in colorectal cancer patients, determining whether the patient is a responder or non-responder to the drug. However, some predictive indicators have not yet been identified; once identified, they may be adopted and utilized widely to customize the clinical therapy of this frequent, life-threatening cancer. In future work, this approach will be extended to the medication classes that are making their way into anti-cancer treatment, such as insulin growth factor inhibitors, mTOR inhibitors and histone deacetylase inhibitors.
