Abstract
Background
Breast cancer results from uncontrolled growth of breast tissue. Many diagnostic methods now use multi-omics data to better capture the complexity of breast cancer.
Objective
This work presents “Hybrid-OmniSeq,” a deep learning-based multi-omics data analysis approach that uses the molecular subtypes of breast cancer to increase the precision and effectiveness of breast cancer diagnosis.
Method
The BC-VM procedure is used for preprocessing, and the BC-MSA procedure is used for molecular subtype analysis. A Deep Neural Network (DNN), used in conjunction with Sequential Forward Floating Selection (SFFS) and Truncated Singular Value Decomposition (TSVD) entropy, enables adaptive learning from multi-omics gene data. Five machine learning classifiers are then used for classification, and Hybrid-OmniSeq combines them with the DNN in a thorough analytical process to achieve high diagnostic accuracy. The deep learning-based multi-omics sequential approach was evaluated using METABRIC RNA-seq data sets of the intrinsic subtypes of breast cancer.
Results
According to the test results, Logistic Regression (LR) achieved accuracies of 94.51% for ER (Estrogen Receptor) status, 96.33% for ER status measured by IHC, and 92.3% for HER2 (Human Epidermal growth factor Receptor 2) status; Random Forest (RF) achieved 93.77% for ER status, 95.23% for ER status measured by IHC, and 93.4% for HER2 status.
Conclusion
LR and RF increase the cancer detection accuracy for all subtypes when compared to alternative machine learning classifiers or the majority voting method, providing a comprehensive understanding of the underlying causes of breast cancer.
Introduction
Cancer is a group of illnesses, all related to aberrant cell proliferation. Notably, not all tumors are malignant; some remain confined and non-metastasizing, without the invasive features linked to cancer. Among cancers, breast cancer is the most common one affecting women worldwide. It is recognized as a highly heterogeneous disease with various patient subgroups. 1 The molecular or genetic information necessary for the development, upkeep, and operation of all living things is stored in deoxyribonucleic acid (DNA). The primary purpose of DNA is to direct the creation of proteins, which are essential for a wide range of biological functions. 2
Using 39 invasive breast tumors and three normal breast specimens, Perou et al. identified four distinct subtypes of breast cancer, namely Luminal A (LumA), Luminal B (LumB), Basal, and HER2-positive, through gene-expression profiling. 3 Survival analyses performed on a sub-cohort of patients undergoing uniform treatment for locally advanced breast cancer in a prospective study showed statistically significant differences in patient outcomes between groups. The survival analysis was based on this six-subtype classification, which illuminated the various clinical paths linked to each subtype in this patient group. 4 The researchers hypothesized the existence of Luminal C (LumC) cancers, a possible third luminal subgroup. Subsequent examinations of a larger dataset, however, did not corroborate the existence of this extra luminal subgroup. 5 Although the total number of genes in the signature has varied across investigations, five intrinsic categories have been confirmed in numerous other studies. 6
Based on a study including 13 samples, a unique molecular subtype known as claudin-low was identified in 2007. 7 Of the five intrinsic subtypes (Luminal A, Luminal B, ERBB2 (HER2-enriched), Basal, and Normal), all except the normal-like tumors have been successfully linked with an IHC (Immunohistochemistry)-defined subtype. 8 These subtypes each have their own biological features and prognostic attributes. Claudin-low cancers are defined by significant enrichment for immune response genes. The characteristic features of claudin-low tumors were investigated further using genetically modified mouse models and an array of breast cancer cell types. 9 Subsequent intrinsic molecular classifications have defined a wider range of breast cancer subtypes, such as Luminal A (LumA), Luminal B (LumB), Triple-negative, Basal, HER2-positive, and Claudin-Low. 10 Finally, research published in 2015 identified two noteworthy patterns in the subtyping of breast cancer: either shifting toward more complex and refined groupings or converging toward the primary subtypes. 11
Breast cancer has been identified using an immunohistochemical approach based on assessing the expression of hormone receptors, including the Progesterone Receptor (PR), the Estrogen Receptor (ER), and the Human Epidermal growth factor Receptor 2 (HER2). 12 One of the first recognized steps in the progression of breast carcinogenesis is the overexpression of HER2. 13 The pathologic stage of the disease, the number of tumor-containing axillary nodes, the histologic type, and the absence of PR and ER are all strongly associated with HER2 amplification. The available data support the concept that HER2 amplification is an early event in human breast carcinogenesis. Moreover, HER2 status does not change as the disease progresses toward invasive disease, nodal metastases, or distant metastasis. 14
The most important factor in differentiating between breast cancer molecular subtypes is the presence or absence of Estrogen Receptors (ER). 15 ER+ cancers are highly hormone-sensitive, resulting in longer relapse-free survival and overall survival than ER− subtypes, which provides evidence for the effectiveness of treatment. Even so, developing the best possible treatment plans requires further information on molecular signature profiles and molecular networks beyond ER status. 16 Progesterone Receptor (PR) expression is a useful biomarker for identifying patients with an initially good prognosis. With this information, medications that target growth factor receptor pathways, prolonged endocrine therapy, and/or extra adjuvant chemotherapy may be considered, providing a more individualized and possibly more advantageous approach to patient care. 17
Spotting research gap
In our Hybrid-OmniSeq breast cancer detection model, we use a pre-trained Deep Neural Network (DNN) in conjunction with the Truncated Singular Value Decomposition Entropy (TSVD Entropy) method and the Sequential Forward Floating Selection method (SFFS, a wrapper method) to enable efficient feature selection for three parameters: ER status, HER2 status, and ER status measured by IHC. We incorporate Machine Learning (ML) techniques such as RF, LR, NB (Naïve Bayes), and the Gradient Boosting (GB) classifier in order to thoroughly evaluate Hybrid-OmniSeq's accuracy. Our findings show that RF and LR work better with the DNN than NB and GB, supporting the usefulness of combining DL and ML. The research gap addressed and the primary contributions of our paper are summarized as follows:
First, we present a feature selection pipeline that combines filter methods (the variance threshold technique, the mutual information technique, and normalization), a wrapper approach (SFFS), and dimensionality reduction (TSVD Entropy). Second, we build an improved model, “Hybrid-OmniSeq,” which blends Deep Learning (DNN) with Machine Learning methods. Third, the DNN is trained iteratively on the features chosen by the wrapper strategy, with a focus on optimal subtype feature selection and enhanced classification performance. Fourth, the outputs (predictions) of the trained DNN are fed to the ML classifiers as features in order to achieve higher classification accuracy. Finally, the dimensionality of the gene dataset is reduced using TSVD Entropy.
The rest of the paper is structured as follows: A brief summary of relevant research on breast cancer diagnosis is given in Section 2. Our ‘Hybrid-OmniSeq’ model is described in depth in Section 3, and its experiments and comparisons are presented in Section 4. Section 5 concludes with the presentation of conclusions.
Related work
In breast cancer research, it is essential to keep refining and verifying predictors such as the ER, HER2, and PR profiles present in the four intrinsic molecular subtypes in order to create individualized and efficient diagnostic tools. This section summarizes previous studies and publications on molecular breast cancer research.
Raza et al. 18 implemented DeepBreastCancerNet, a model with 24 layers and several activation and normalization methods, which achieves a high classification accuracy of 99.35%. The model's resilience and efficiency were demonstrated when it outperformed other DL models and obtained 99.63% accuracy on a publicly available dataset.
R. K. Mondol et al. 19 proposed AFExNet, a two-stage neural network design that combines unsupervised pre-training and supervised fine-tuning. AFExNet builds on Adversarial Auto-Encoder (AAE) principles and is intended to extract features from high-dimensional genetic data. The model's performance is assessed with twelve independent supervised classifiers on a public RNA sequencing dataset of breast cancer samples. Richard Lupa et al. 20 used Moanna to predict the ER status, HER2 status, and PAM50 subtypes of breast cancer. The subtypes predicted by Moanna show a stronger association with patient survival outcomes. Moanna's neural network architecture is purposefully designed to combine information from various omics data resolutions, producing strong classification accuracy.
Arooj S et al. 21 investigated transfer learning for breast cancer detection using a customized AlexNet model, with good accuracy across four datasets. On Dataset A, 96.7% on Dataset B, 99.1% on Dataset C, and 100% on Dataset A2, the model's maximum accuracy was attained. Additional CNN algorithms and fusion approaches are planned for further optimization. According to Zakareya, S. et al., 22 early detection is crucial for successful treatment of breast cancer, a prevalent disease. Their article presents a deep-learning model that enhances breast cancer diagnosis through granular computing, shortcut connections, and attention mechanisms, with GoogLeNet and residual blocks serving as the inspiration. Compared to other state-of-the-art models, the model performed well, scoring 93% on ultrasound images and 95% on breast histology images.
D. Sun et al. 23 proposed MDNN, a deep learning technique for breast cancer prediction. Compared with more standard methods such as Support Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR), this method achieves higher accuracy. Precision is improved by reducing the number of features, such as genes, using Minimum Redundancy Maximum Relevance (mRMR).
G. Lopez-Garcia et al. 24 proposed a model for predicting the risk of lung cancer using Transfer Learning (TL) and Convolutional Neural Networks (CNNs). The CNN was used to extract features from a high-dimensional dataset. The study used the TCGA dataset, which contains information on around 10,600 samples from distinct cancer types, and focused on the genes with the greatest variation in expression. Although the dataset encompasses a wide variety of malignancies, the study primarily used the lung cancer data to assess the effectiveness of the suggested model.
J. Xu et al. 25 investigated a Deep Flexible Neural Forest (DFNForest) model as an alternative to deep neural networks for classifying subtypes of three different malignancies (lung, breast, and glioblastoma multiforme). The model used TCGA RNA-Seq data for this study. Ahmad J et al. 26 noted that advancements in AI, notably through the use of big data, have led to Computer-Aided Diagnosis (CAD) systems that support the diagnosis, classification, and prognosis of breast cancer. A CAD system that made use of deep learning and computer vision techniques proved highly accurate, with a 99% success rate in identifying and categorizing breast masses and a 95.39% success rate in segmenting lesions.
Arturo Pardo et al. 27 presented a data-driven, nonlinear model that uses optical features obtained from spatial frequency domain imaging data to estimate the margins of resected breast cancer samples in real time. Deep neural network models are employed to produce sets of latent embeddings that connect the underlying tissue pathologies to optical data signatures. With this technique, extensive datasets can be created for several categories, each with millions of samples that closely replicate the spectral and textural characteristics of real patient data. Yang et al. 28 proposed Multi-Omics Self-Paced Learning (MSPL), a multi-omics data integration approach that dynamically selects high-confidence samples for model training.
Umer et al. 29 presented a deep learning-based method for early breast cancer identification from histopathology images, with an accuracy of 92.7%. Rahman et al. 30 proposed a deep convolutional neural network that uses U-Net and YOLO to automatically detect and localize breast lesions in mammography images, achieving 93% accuracy and 98.6% AUC for detection. Wang, X et al. 31 presented a hybrid deep learning model (CNN-GRU) for the automatic identification of breast IDC (+,−) cancer using whole slide images from the PCam Kaggle dataset, achieving 86.21% accuracy and an AUC of 0.89.
Limitations
The literature review above reveals a number of limitations and barriers in breast cancer research, summarized below.
Although molecular subtyping provides useful information about the biology of tumors and their prognosis, it has significant drawbacks in terms of data volume, treatment response prediction, and classification accuracy. Small dataset sizes can impair the accuracy of models built on RNA-seq datasets, which are essential for understanding gene subtypes, and result in decreased performance. At the same time, acquiring high-quality, diverse data and reducing dimensionality become more challenging as dataset sizes rise. There are further issues with computational requirements and with the applicability of models to multi-omics datasets. Moreover, machine learning models are prone to overfitting and require lengthy training periods, which makes it more difficult to improve classification accuracy and predictive reliability.
Proposed model
Hybrid-OmniSeq model
We introduce Hybrid-OmniSeq, a new deep learning-based model intended for multi-omics sequential breast cancer diagnosis. The model combines numerous omics data formats into a single, cohesive framework and follows a multi-stage sequential diagnosis strategy. Its essential components are the input (the METABRIC gene dataset), data pre-processing, molecular subtype analysis, the Hybrid-OmniSeq model itself, and result analysis. Figure 1 shows a schematic representation of the formulated molecular breast cancer diagnosis using the Hybrid-OmniSeq model.

Figure 1. A schematic representation of the formulated molecular breast cancer diagnosis using the Hybrid-OmniSeq model.
A multi-feature fusion-based breast cancer subtype classification framework was trained and tested using the METABRIC datasets. Two technologies were used in the analysis: RNA sequencing on the Illumina HiSeq platform for the TCGA transcriptome profile, and Illumina HT-12 v3 microarrays for gene expression data. Based on gene expression profiles, these datasets aid in the classification of breast cancer subtypes. Pre-processing is an essential step in preparing data for the Hybrid-OmniSeq model. Duplicate entries and samples are removed to preserve data integrity. Pre-processing uses the Breast Cancer-Variance Measure (BC-VM) method, which applies filter techniques at every stage. First, omics data are subjected to quality control checks using the variance threshold approach to exclude noisy or low-quality measurements. Second, categorical variables are label encoded so that they are represented quantitatively, and mutual information is computed on the encoded features. Third, the omics data are normalized to account for differences in feature distributions and scales. With these pre-processing steps, the Hybrid-OmniSeq model prepares multi-omics data for in-depth analysis and precise diagnosis of breast cancer.
Filter method-variance threshold
The target variable is obtained from the METABRIC dataset along with categorical features, and label encoding is used to transform categorical variables into numerical form. Mutual Information scores are then computed between every feature and the target variable. These scores indicate each feature's relative importance to the target variable and are used to rank the features.
BC-VM algorithm description
The BC-VM (Breast Cancer-Variance Measure) technique performs preprocessing by combining a variance threshold with label encoding and Mutual Information. In Algorithm 1, Variance_threshold (Y, T, i) computes the variance V of each feature and returns Y_Selected, a feature matrix containing only the features whose variance meets the threshold (V >= T). Y_Selected is then given as input to Label_Encoding (Y_Selected, x, i). Because Y_Selected contains categorical variables, they must be label encoded. The Label_Encoding (Y_Selected, x, i) procedure encodes the categorical variables in the feature matrix and returns the encoded features L (L1… Ln), which are used to compute the Mutual Information scores. L is the feature list ranked according to these scores.
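The steps of BC-VM can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' Algorithm 1: the function names, the toy gene columns, the threshold value, and the use of already-binned integer expression levels are all hypothetical.

```python
import math
from collections import Counter


def variance_threshold(features, threshold):
    """Keep features whose population variance is at least `threshold`."""
    kept = {}
    for name, values in features.items():
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        if variance >= threshold:
            kept[name] = values
    return kept


def label_encode(values):
    """Map each distinct category to an integer code, in sorted order."""
    codes = {category: i for i, category in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def bc_vm(features, target, threshold):
    """Variance filter, then label encoding, then ranking by mutual information."""
    selected = variance_threshold(features, threshold)
    encoded_target = label_encode(target)
    scores = {
        name: mutual_information(label_encode(values), encoded_target)
        for name, values in selected.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: binned expression levels for three genes.
features = {
    "gene_a": [0, 0, 1, 1, 0, 1, 0, 1],
    "gene_b": [1, 1, 1, 1, 1, 1, 1, 1],  # constant, so removed by the filter
    "gene_c": [0, 1, 0, 1, 0, 1, 0, 1],
}
er_status = ["Negative", "Negative", "Positive", "Positive",
             "Negative", "Positive", "Negative", "Positive"]

ranking = bc_vm(features, er_status, threshold=0.01)
print(ranking)
```

In this toy run the constant column is dropped by the variance filter, and the gene whose pattern matches the target exactly is ranked first.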
Molecular subtype analysis
Breast Cancer Molecular Subtype Analysis (the BC-MSA algorithm) classifies gene subtypes by performing feature selection, standardization of the selected features, and dimensionality reduction of the standardized features. The BC-MSA algorithm performs these steps to obtain the best features as the output of molecular subtype analysis. In our model, SFFS starts from the empty set. After each forward step, SFFS executes backward steps as long as the objective function increases.
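The forward and floating backward steps of SFFS can be sketched as follows. This is a generic illustration rather than the paper's SFFS_Procedure. The `toy_score` objective and the gene names are hypothetical stand-ins; in practice the objective would typically be a classifier's validation performance on the candidate subset.

```python
def sffs(features, score, k):
    """Sequential Forward Floating Selection of k features.

    `score` maps a list of feature names to a number (higher is better).
    """
    if not 0 < k <= len(features):
        raise ValueError("k must be between 1 and the number of features")
    selected = []
    best = {}  # subset size -> (score, subset) of the best subset seen so far
    while len(selected) < k:
        # Forward step: add the single feature that raises the score most.
        candidates = [f for f in features if f not in selected]
        selected = selected + [max(candidates, key=lambda f: score(selected + [f]))]
        current = score(selected)
        if len(selected) not in best or current > best[len(selected)][0]:
            best[len(selected)] = (current, list(selected))
        # Floating backward steps: drop the least useful feature while doing
        # so beats the best subset previously found at the smaller size.
        while len(selected) > 2:
            reduced = max(
                ([g for g in selected if g != f] for f in selected),
                key=score,
            )
            reduced_score = score(reduced)
            if reduced_score > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (reduced_score, list(reduced))
            else:
                break
    return best[k][1]


# Hypothetical objective: per-feature relevance with a redundancy penalty.
relevance = {"g1": 0.9, "g2": 0.85, "g3": 0.5, "g4": 0.1}


def toy_score(subset):
    total = sum(relevance[f] for f in subset)
    if "g1" in subset and "g2" in subset:
        total -= 0.8  # g1 and g2 carry largely the same information
    return total


chosen = sffs(list(relevance), toy_score, k=2)
print(sorted(chosen))
```

Because the two most relevant genes are redundant in this toy objective, the search pairs the first one with a less relevant but complementary gene.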
Feature scaling
When working with features that have different magnitudes, units and ranges, feature scaling is an essential normalization step in machine learning. The Euclidean distance between data points is a key component of many machine learning methods, and features with differing sizes can cause biased or ineffective model training. By bringing all features to the same magnitude, scaling makes them comparable and enhances the functionality of machine learning models.
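As a small illustration of feature scaling, the sketch below applies z-score standardization, which rescales each feature to zero mean and unit standard deviation. That the paper's standardization step takes exactly this form is an assumption, and the two example columns are hypothetical.

```python
from statistics import fmean, pstdev


def standardize(column):
    """Rescale a feature to zero mean and unit (population) standard deviation."""
    mu = fmean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # a constant feature carries no spread
    return [(x - mu) / sigma for x in column]


# Hypothetical features on very different scales.
expression = [2.0, 4.0, 6.0]
tumor_size_mm = [10.0, 25.0, 40.0]

z_expression = standardize(expression)
z_size = standardize(tumor_size_mm)
print(z_expression, z_size)
```

After scaling, both columns span the same range, so neither dominates a Euclidean distance computation.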
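A matrix $A_g \in \mathbb{R}^{m \times n}$ can be factorized into three matrices to obtain its SVD (Singular Value Decomposition). It can be expressed as $A_g = U \Sigma V^{T}$, where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal matrices and $\Sigma \in \mathbb{R}^{m \times n}$ is a diagonal matrix holding the singular values.
Truncated SVD (TSVD) differs from the standard SVD in that it keeps only the $k$ largest singular values and their singular vectors, which reduces the dimensionality of the matrix (equation 4): $A_g \approx U_k \Sigma_k V_k^{T}$. TSVD thus produces a factorization whose number of columns equals the desired truncation $k$.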
The BC-MSA algorithm combines SFFS, feature standardization, and truncated SVD. In Algorithm 2, SFFS_Procedure (A) repeatedly chooses the best feature to add to the set Z and then checks whether eliminating the poorest feature raises the score K. Z is the resulting set of best features for the molecular subtype. Z, together with J (the features from Z used for training) and the iteration index i, is passed to Standardize_Procedure (J, Z, i) in Algorithm 2 to standardize the best features of the molecular subtype dataset. Equation 2 is computed to obtain Zg as the output. To assess model performance, the dataset is split into training and testing sets. The split data X1 and X2 and the target (the minimum cumulative explained variance required, e.g., 0.75 for 75%) are given as input to TSVD_Procedure. TSVD_Procedure (X1, X2, Target) then carries out the TSVD used in molecular subtype feature extraction.
TSVD_Procedure has six phases. First, TruncatedSVD is initialized. Second, it is fitted to the training data, which are then transformed. Third, the Explained Variance Ratio (EVR) is determined. Fourth, the Cumulative Explained Variance (CEV) is computed. Fifth, the number of components needed to explain at least 75% of the variance is determined. Finally, the training and test data are converted into the selected components, and the output is printed if necessary.
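The phases above can be sketched with NumPy alone. This is an illustrative approximation of TSVD_Procedure, not the authors' code. The function name, the synthetic data, and the definition of the explained variance ratio (variance of each projected component divided by the total feature variance, with no centering) are assumptions.

```python
import numpy as np


def truncated_svd(X_train, X_test, target=0.75):
    """Project data onto the fewest SVD components whose cumulative
    explained variance on the training set reaches `target`."""
    # Phases 1-2: factorize the training matrix and project it.
    U, s, Vt = np.linalg.svd(X_train, full_matrices=False)
    train_scores = U * s  # equals X_train @ Vt.T
    # Phase 3: explained variance ratio of each component.
    evr = train_scores.var(axis=0) / X_train.var(axis=0).sum()
    # Phase 4: cumulative explained variance.
    cev = np.cumsum(evr)
    # Phase 5: smallest number of components reaching the target.
    k = min(int(np.searchsorted(cev, target)) + 1, len(s))
    # Phase 6: transform training and test data with the kept components.
    components = Vt[:k]
    return X_train @ components.T, X_test @ components.T, k, cev


# Synthetic stand-in for a standardized gene matrix: 2 latent factors, 6 genes.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.1 * rng.normal(size=(50, 6))
X1, X2 = X[:40], X[40:]

Z1, Z2, k, cev = truncated_svd(X1, X2, target=0.75)
print(k, Z1.shape, Z2.shape)
```

The components are fitted on the training split only and then reused for the test split, so no information from the test data leaks into the projection.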
Hybrid-OmniSeq model
Our model uses the pre-processed input to derive a representation with a Deep Neural Network (DNN) consisting of six layers of interconnected neurons. By integrating the DNN with standard Machine Learning (ML) models, we obtain a hybrid architecture that improves the precision with which molecular subtypes are determined for breast cancer prediction. The DNN's weights are optimized on the training data using the Adam optimizer. The ML classifiers use the predictions made by the DNN as feature inputs. To find the most effective combination for distinguishing breast cancer subtypes, we compare the accuracy metrics of five machine learning classifiers.
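The data flow described above, in which DNN predictions become the input features of a classical classifier, can be sketched with NumPy. This is a deliberately reduced stand-in and not the Hybrid-OmniSeq implementation. It uses one hidden layer instead of six, plain gradient descent instead of Adam, a hand-written logistic regression as the downstream classifier, and synthetic two-class data. All names and hyperparameters are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_dnn(X, y, hidden=8, epochs=1000, lr=0.1, seed=0):
    """Tiny one-hidden-layer network trained with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # hidden activations
        p = sigmoid(H @ W2 + b2)          # predicted probability of class 1
        g_out = (p - y) / len(y)          # cross-entropy gradient at the logit
        g_hid = np.outer(g_out, W2) * (1.0 - H ** 2)
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum()
        W1 -= lr * (X.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)
    return lambda A: sigmoid(np.tanh(A @ W1 + b1) @ W2 + b2)


def train_logreg(F, y, epochs=1000, lr=0.5):
    """Logistic regression fitted by gradient descent; returns a 0/1 predictor."""
    w = np.zeros(F.shape[1])
    b = 0.0
    for _ in range(epochs):
        g = (sigmoid(F @ w + b) - y) / len(y)
        w -= lr * (F.T @ g)
        b -= lr * g.sum()
    return lambda A: (sigmoid(A @ w + b) >= 0.5).astype(int)


# Synthetic stand-in for receptor-negative (0) and receptor-positive (1) samples.
rng = np.random.default_rng(42)
n = 100
X = np.vstack([rng.normal(-1.5, 1.0, (n, 5)), rng.normal(1.5, 1.0, (n, 5))])
y = np.concatenate([np.zeros(n), np.ones(n)])
order = rng.permutation(2 * n)
X, y = X[order], y[order]
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]

# Stage 1: the network learns from the selected features.
dnn = train_dnn(X_train, y_train)
# Stage 2: its predictions become the input features of the ML classifier.
F_train = dnn(X_train).reshape(-1, 1)
F_test = dnn(X_test).reshape(-1, 1)
classifier = train_logreg(F_train, y_train)
accuracy = float((classifier(F_test) == y_test).mean())
print(accuracy)
```

Swapping `train_logreg` for another classifier reproduces the kind of comparison the paragraph describes, with each classifier receiving the same DNN-derived features.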
Deep Neural Networks (DNNs) are integrated with conventional machine learning models, including Random Forest (RF), Support Vector Machine (SVM), Gradient Boosting (GB), Naive Bayes (NB), and Logistic Regression (LR), by means of the Hybrid-OmniSeq Network. In order to produce a more adaptable and efficient model for the categorization of breast cancer based on molecular subtype biomarkers, this hybrid technique combines the capabilities of deep learning and classical machine learning. For structured or lower-dimensional data, classic machine learning models have been widely applied; however, Hybrid-OmniSeq refers to the combination of these models with DNNs to improve prediction accuracy by using the advantages of both approaches.
In the context of genomic sequencing data (mRNA-seq), the hybrid technique enables DNNs to extract hierarchical relationships and complicated patterns from raw sequencing data. Techniques such as Random Forest and Logistic Regression are then applied on top of these deep-learned features, operating on a more comprehensible collection of variables to improve classification accuracy. This hybrid technique aims to handle the various parts of the data efficiently. Its important components are: ensemble learning, which combines various models to increase overall accuracy and generalization; OmniSeq hybridization, which oversees intricate feature extraction and categorization; and sequential modeling, which captures intricate linkages and optimizes feature selection.
This approach is especially well-suited for applications in molecular subtype classification and breast cancer diagnosis because of its advantages, which include enhanced generalization, adaptability, and better feature extraction utilizing the BC-VM and BC-MSA algorithms.
Our hybrid model integrates classic ML methods with DNNs to improve diagnostic resilience and accuracy across different applications and domains. The Hybrid-OmniSeq model's organizational layout is described in Table 1, and its graphic illustration is shown in Figure 2(i). Figure 2(ii) displays the spread of breast cancer subtypes in the METABRIC database.

Figure 2. (i) Graphical illustration of the Hybrid-OmniSeq model. (ii) The spread of breast cancer subtypes in the METABRIC database.
Table 1. Hybrid-OmniSeq model's organizational layout.
By integrating DNNs with conventional machine learning models, our model utilizes the best features of each method to successfully tackle a variety of challenging machine learning problems related to the molecular subtype of breast cancer.
Through the integration of the Hybrid-OmniSeq model into Comprehensive Genomic Profiling (CGP) platforms, gene expression variations can be more precisely detected through RNA sequencing, leading to improved diagnostic accuracy and more accurate treatment decisions, such as immunotherapies or targeted therapies. By merging transcriptomic and genomic data, it can be easily integrated into current workflows and produces complete, standardized outputs. The approach better finds actionable mutations, which helps with therapy selection and side effect reduction in terms of patient outcomes. It enhances the choice of immunotherapy, enables prompt changes, and finds treatment resistance early. Liquid biopsies allow for real-time monitoring of cancer progression and therapy response, allowing for more dynamic treatment plans. It also improves prognostic tools for better risk classification.
Result analysis
In order to evaluate the complex nature of breast cancer through deep learning-driven multi-omics sequential diagnosis, a Hybrid-OmniSeq model's outcome analysis typically requires an exhaustive assessment utilizing many performance measures. Criteria for evaluation including accuracy, recall, precision, ROC values, and F1-score are used to gauge how well the model performs in identifying breast cancer subtypes and enhancing overall diagnostic performance.
We compare the performance of the traditional machine learning methods within our hybrid model, where they operate on the outputs of the deep neural network (DNN). This comparison demonstrates how the hybrid approach raises diagnostic accuracy and improves molecular subtype categorization. The evaluation parameters are described mathematically below.
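For reference, the standard definitions of accuracy, precision, recall, and F1-score in terms of confusion-matrix counts can be written as a short function. The counts in the example are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive and negative counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for a receptor-positive vs receptor-negative classifier.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(metrics)
```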
Measures of confusion for BC categorization using the Hybrid-OmniSeq model.
Adapting the Hybrid-OmniSeq model to the dataset through statistical analysis
Statistical analysis indicates that the Hybrid-OmniSeq model yields trustworthy results. Figures 3 and 4 show the frequency and value distributions. The variables studied include ER status, ER status as measured by IHC, HER2 status based on disease state, and overall survival status.

Figure 3. Frequencies for every row in the distribution analysis.

Figure 4. Equal and unequal variance statistics.
Figure 5 shows a bar graph that illustrates the frequency distribution of the following: HER2 status, overall survival status, ER status, and ER as determined by IHC. For receptor status, each bar indicates a range of positive and negative; for overall survival status, it reflects a range of living and deceased. The number of gene data points inside the specified range is indicated by its height.

Figure 5. Distribution analysis.
Figure 3 shows the frequencies for every row of the distribution analysis. In our dataset, the overall survival status distribution shows that 42.251% of the individuals are living and 57.749% are deceased. For ER status, 22.203% of the individuals are negative and 77.797% are positive. For ER status measured by IHC, 22.552% are negative and 77.448% are positive. For HER2 status, 87.227% are negative and 12.773% are positive.
Through analysis of these outputs, we determine the central tendency and variability of the gene data in the METABRIC dataset, whether receptor status and overall survival status are normally distributed, and whether any outliers exist. This analysis clarifies the distributions and frequencies of survival status and receptor status in our Hybrid-OmniSeq cohort.
In statistics, a “confidence interval” is the range of plausible values for a true parameter at a particular level of confidence. At the chosen level (95% in this case), it indicates the accuracy and consistency of the calculated proportion. Table 3 shows the confidence intervals for HER2 status depending on disease state, ER status, ER status by IHC, and overall survival status. “Level” stands for the categories of ER status, ER status according to IHC, HER2 status, and overall survival status. “Count” shows how many observations there are in each level or category. “Lower CI” is the lower bound of the confidence interval for the given proportion, and “Upper CI” is its upper bound. “1-Alpha” represents the interval's confidence level (95%). Our model evaluates the accuracy and consistency of the estimated proportions for ER status, ER status by IHC, HER2 status, and overall survival status in the dataset by examining these intervals.
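A confidence interval for a proportion such as those in Table 3 can be computed as sketched below. The source does not state which interval formula was used; the Wilson score interval shown here is one common choice and is an assumption. The count and total in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(count, total, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = count / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


# Hypothetical example: 778 receptor-positive samples out of 1000.
lower, upper = wilson_interval(778, 1000)
print(round(lower, 4), round(upper, 4))
```

A larger sample narrows the interval, which is why the “Count” column matters when reading the lower and upper bounds.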
Table 3. Confidence interval computation (95%).
Table 4 shows the receptor statistics in relation to overall survival status. The left-tailed values for ER status (0.04109), ER measured by IHC (0.0453), and HER2 status (0.0462) are just below the standard significance level of 0.05, so these findings can be considered statistically significant, implying a potential link. The same holds for the right-tailed values for ER status (0.0115), ER measured by IHC (0.01773), and HER2 status (0.0399), and for the two-tailed values for ER status (0.022), ER measured by IHC (0.0322), and HER2 status (0.0484), all of which are below 0.05.
Table 4. Receptor statistics in relation to overall survival status.
R-square (the Coefficient of Determination) measures the percentage of the dependent variable's variation that can be predicted from the independent variable(s). The model for ER status accounts for 0.7890 (78.90%) of the response variability, the model for ER status by IHC for 0.6134 (61.34%), and the model for HER2 status for 0.4975 (49.75%). Adjusted R-square (Adj R-square) adjusts the R-square value for the number of predictors in the model and offers a more precise metric when models with varying numbers of predictors are compared. The adjusted R-square values for ER status (0.8284), ER status by IHC (0.6540), and HER2 status (0.5097), though higher than the corresponding R-square values, all show good fits.
The F-ratio is the variance between group means divided by the variance within the groups. It is used to evaluate the model's overall significance. The ER status F-ratio of 1.5631 is greater than the critical value derived from the F distribution, which suggests a significant variation in group averages. The F-ratio of 1.9934 for ER status by IHC indicates a greater disparity between the means. The HER2 status F-ratio is 2.9223. A substantial difference between the receptor averages is indicated when this value is greater than the critical value derived from the F distribution. One-way Analysis of Variance (ANOVA) compares the means across groups to ascertain whether there are statistically significant differences. The means for the one-way ANOVA are calculated for both the positive and the negative cases.
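The F-ratio defined above can be computed directly from grouped observations, as in this short sketch. The two groups are hypothetical values chosen for easy arithmetic. Comparing the result with a critical value additionally requires the F distribution with (k - 1, n - k) degrees of freedom, which is not computed here.

```python
from statistics import fmean


def f_ratio(groups):
    """One-way ANOVA F-ratio: between-group over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical survival values for receptor-negative and receptor-positive groups.
negative = [1.0, 2.0, 3.0]
positive = [2.0, 3.0, 4.0]
f_value = f_ratio([negative, positive])
print(f_value)
```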
Consequently, in Fisher's Exact Test, receptor status p-values below the standard significance level (0.05) indicate statistically significant associations under the Hybrid-OmniSeq model, meaning that the observed distributions are unlikely to have occurred by chance. The comparatively high Rsquare and Adjusted Rsquare values indicate that the Hybrid-OmniSeq model fits the receptors well. According to the F-ratios (ANOVA) in the Hybrid-OmniSeq model, the group means of ER status, ER status by IHC, and HER2 status based on overall survival status may differ significantly. Pooled t-test and t-test statistics are displayed in Figure 4. All statistical analysis was performed using JMP software, version 17.2. A one-way analysis of overall survival status by receptor type is presented in Figure 6.
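For readers unfamiliar with the left-tailed, right-tailed, and 2-tailed values of Fisher's Exact Test, the following Python sketch computes all three for a 2x2 contingency table from the hypergeometric distribution. It assumes a table of receptor status against survival status; the function name and the example counts are hypothetical and are not the study's data.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Return (left, right, two_tailed) p-values for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    probs = {x: prob(x) for x in range(lowest, highest + 1)}
    p_observed = probs[a]

    left = sum(p for x, p in probs.items() if x <= a)
    right = sum(p for x, p in probs.items() if x >= a)
    # Two-tailed: all tables at least as unlikely as the observed one
    # (small relative tolerance guards against floating-point ties).
    two_tailed = sum(p for p in probs.values() if p <= p_observed * (1 + 1e-9))
    return min(left, 1.0), min(right, 1.0), min(two_tailed, 1.0)


# Hypothetical counts: rows = receptor positive/negative, columns = survival status.
left_p, right_p, two_p = fisher_exact_2x2(3, 1, 1, 3)
print(f"left = {left_p:.4f}, right = {right_p:.4f}, 2-tail = {two_p:.4f}")
```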

One-way analysis of overall survival status values by receptors (i) ER status (ii) ER status measured by IHC (iii) HER2 status.
By offering a more thorough molecular knowledge of diseases like cancer, the Hybrid-OmniSeq model has the potential to greatly enhance patient outcomes and diagnostic precision. Its incorporation into clinical procedures may result in more prompt, efficient, and individualized therapies, thereby improving patients’ quality of life and chances of survival. Sara Reis 32 (2017) proposed a method that achieved 84% classification accuracy in distinguishing between mature and immature stroma from histological images of breast cancer. Xin Feng 33 (2019) used the XGBoost (XGB) algorithm and demonstrated the best overall prediction performance for certain subtypes like HER2 and LBP, with accuracies reaching up to 87% for triple-negative (TN) subtype patients. Nan Wu 34 (2020) developed a deep neural network (DNN) for breast cancer screening, achieving an AUC of 89.5% for detecting the presence of cancer in mammograms. Our proposed approach, which attained the best accuracy through the hybrid combination of DNN with LR and RF, further illustrates the importance of selecting high-quality features through the BC-VM and BC-MSA algorithms.
Tables 5, 6, and 7 provide the Recall, F1 Score, and Precision values for both positive and negative cases. Table 8 displays the AUC values for LR and RF in the Hybrid-OmniSeq model. For RF, the AUC is 84.19% for HER2 status, 93.24% for ER status by IHC, and 92.21% for ER status; for LR, the AUC is 81.81% for HER2 status, 94.48% for ER status by IHC, and 92.69% for ER status. Figure 7 shows the accuracy of the Hybrid-OmniSeq model. The accuracy results in Table 9 demonstrate how well the Hybrid-OmniSeq model performs when DNN and ML approaches are combined for the diagnosis of breast cancer. Figures 8 and 9 visualize the Recall and Precision values.
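As an illustration of how Precision, Recall, and F1 Score are obtained separately for the positive and negative cases, the following Python sketch computes them from true and predicted labels using their standard definitions. The function name and the label vectors are hypothetical and serve only to show the calculation.

```python
def class_metrics(y_true, y_pred, label):
    """Precision, recall, and F1 score, treating `label` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical receptor status labels: 1 = positive, 0 = negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

positive = class_metrics(y_true, y_pred, label=1)
negative = class_metrics(y_true, y_pred, label=0)
print("positive case:", positive)
print("negative case:", negative)
```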
Recall values for Hybrid-OmniSeq model.
F1-Score for Hybrid-OmniSeq model.
Precision values for Hybrid-OmniSeq model.
AUC scores for Hybrid-OmniSeq model.

Accuracy visualization for the Hybrid-OmniSeq model.
Accuracy for Hybrid-OmniSeq model.

Recall values for ER status, ER by IHC status, and HER2 status visualized. The following cases are represented: (a) is the negative ER status case; (b) is the negative ER by IHC status case; (c) is the negative HER2 status case; (d) is the positive ER status case; (e) is the positive ER by IHC status case; and (f) is the positive HER2 status case.

Precision values for ER status, ER by IHC status, and HER2 status visualized. The following cases are represented: (a) is the negative ER status case; (b) is the negative ER by IHC status case; (c) is the negative HER2 status case; (d) is the positive ER status case; (e) is the positive ER by IHC status case; and (f) is the positive HER2 status case.
The computational time analysis of the Hybrid-OmniSeq model assesses the time needed for the different phases of the model development process: data preprocessing using the BC-VM algorithm, Molecular Subtype Analysis using the BC-MSA algorithm, Hybrid-OmniSeq model evaluation, and result analysis.
Figure 10 shows a graphic representation of the computational time analysis of the Hybrid-OmniSeq model using the RF, LR, GB, SVM, and NB classifiers in terms of performance time. The LR model had the fastest processing time, finishing in 117 milliseconds. The remaining classifiers, in increasing order of processing time, are RF (119 milliseconds), NB (125 milliseconds), SVM (128 milliseconds), and GB (129 milliseconds). This analysis makes clear the computational efficiency of the various classifiers within the Hybrid-OmniSeq model framework.

Computational time analysis of Hybrid-OmniSeq model.
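A processing time in milliseconds of the kind reported above can be measured with a simple wall-clock timer. The sketch below is one possible way to do this in Python; the helper name and the placeholder workload are hypothetical, and the text does not state how the reported timings were actually collected.

```python
import time


def time_ms(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def placeholder_workload(n):
    """Stand-in for a classifier's training and evaluation step."""
    return sum(i * i for i in range(n))


result, elapsed = time_ms(placeholder_workload, 100_000)
print(f"elapsed: {elapsed:.1f} ms")
```

Timings of this kind vary between runs and machines, so repeating the measurement and reporting an average gives a fairer comparison between classifiers.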
Influence of BC-VM and BC-MSA algorithms
In order to determine the source of the Hybrid-OmniSeq network's efficacy, an ablation study was conducted in which different phases of the model were systematically removed or modified. Multiple experiments were carried out to evaluate the impact of the different algorithms on the effectiveness of the model. Figure 7 compares the accuracy of the Hybrid-OmniSeq Model. By removing unnecessary or noisy features, DNN forecasts significantly improve the ability of ML classifiers to withstand data noise, producing predictions that are more accurate and dependable. When DNN predictions (forecasts) are used in conjunction with ML classifiers, accuracy is frequently higher than when ML classifiers are used alone. The gains come from three sources: improved feature representation through the BC-VM algorithm; better accuracy and generalization through ensemble learning techniques in molecular subtype analysis; and efficient handling of non-linearity through the BC-MSA algorithm, which facilitates the creation of DNN predictions and their integration with ML classifiers.
The complete Hybrid-OmniSeq network is trained and evaluated using the DNN as a feature extractor and traditional ML models (GB, RF, LR, NB, and SVM) for final classification. Baseline performance metrics are recorded in all cases. By combining DNN and LR, the proposed method achieved accuracies of 92.03%, 94.51%, and 96.33% for HER2 status, ER status, and ER status by IHC, respectively. Likewise, DNN with RF attained accuracies of 93.04%, 93.77%, and 95.23% for HER2 status, ER status, and ER status by IHC, respectively. The decision to use the BC-VM and BC-MSA algorithms to combine DNN forecasts with ML classifiers should be based on a thorough analysis of the specific molecular biomarker that raises the risk of cancer.
A detailed comparison of classifier performance with and without DNN forecasts is provided in Table 10. The accuracy of the Hybrid-OmniSeq Model with and without DNN forecasts is contrasted in Figure 11. The ablation study of the Hybrid-OmniSeq model thus provides a systematic means of understanding the contribution of each model component, and it shows that the network performs better at predicting and classifying the different subtypes of breast cancer when DNN forecasts are included.
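The idea behind the comparison with and without DNN forecasts can be illustrated with a toy Python sketch. It is not the authors' implementation: a fixed, hand-picked ReLU layer stands in for a trained DNN feature extractor, the data are four XOR-style points rather than METABRIC samples, and all names are hypothetical. Logistic regression is trained once on the raw inputs and once on the extracted features.

```python
import math


def hidden_features(x1, x2):
    """Fixed ReLU layer standing in for a trained DNN feature extractor (assumption)."""
    return [max(0.0, x1 + x2), max(0.0, x1 + x2 - 1.0)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Logistic regression trained by batch gradient descent on the mean log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y):
    """Fraction of samples whose predicted label (score > 0) matches the true label."""
    preds = [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0 for xi in X]
    return sum(int(p == t) for p, t in zip(preds, y)) / len(y)


# Toy, non-linearly separable data (XOR pattern).
X_raw = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 1, 1, 0]

# Without the feature extractor: logistic regression on the raw inputs.
w_raw, b_raw = fit_logistic(X_raw, y)
acc_without = accuracy(w_raw, b_raw, X_raw, y)

# With the feature extractor: logistic regression on the extracted features.
X_feat = [hidden_features(x1, x2) for x1, x2 in X_raw]
w_feat, b_feat = fit_logistic(X_feat, y)
acc_with = accuracy(w_feat, b_feat, X_feat, y)

print(f"accuracy without extracted features: {acc_without:.2f}")
print(f"accuracy with extracted features:    {acc_with:.2f}")
```

In this toy setting the linear classifier cannot separate the raw inputs, while the same classifier separates the transformed features. This is the kind of effect an ablation with and without the DNN stage is designed to expose.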
Accuracy comparison between presence (with) and absence (without) of DNN forecast in the Hybrid-OmniSeq model.
We conducted a comparative study of state-of-the-art methods and our proposed Hybrid-OmniSeq model. Table 11 summarizes the datasets, feature extraction techniques, classifiers, and accuracies for molecular subtypes of breast cancer.
Classical method comparison
Table 12 presents a systematic comparison of the Hybrid-OmniSeq model with some of the most popular open-source state-of-the-art (SOTA) techniques for molecular subtype biomarker-based breast cancer classification.

Graphical illustration comparing the Hybrid-OmniSeq Model’s performance with and without DNN forecasting.
Comparison of the proposed model with the state-of-the-art methods for accuracy.
The Hybrid-OmniSeq network is unique in that it combines classic machine learning models with deep feature extraction through DNNs, providing reliable and flexible performance on intricate genomic and clinical datasets. Even though open-source SOTA approaches like CNNs perform well in some cases, the Hybrid-OmniSeq methodology offers a more complete solution for molecular subtype biomarker classification tasks such as breast cancer molecular subtype classification, especially when working with heterogeneous data sources. The enhanced generalization, flexibility, and feature extraction capabilities of the hybrid technique justify the trade-off in computational efficiency.
The model's capacity to generalize to new data is restricted by overfitting, which may result in subpar performance in practical situations. The scalability of the suggested method is additionally hampered by issues with data quality, high computing requirements, and limited generalizability to other malignancies or multi-omics datasets. It is essential but challenging to adjust deep learning models to dynamic changes in gene expression. The method of combining DNN forecasts with ML classifiers necessitates substantial preprocessing and adds complexity. Furthermore, it can be expensive to develop models for big, real-time datasets, and improper regularization or validation can cause overfitting, which lowers prediction accuracy.
Classical method comparison.
This work presents a hybrid deep learning model that aims to improve diagnostic accuracy on the METABRIC dataset, in order to support better healthcare outcomes for patients and advance research into gene-driven breast cancer. The main goals are to simplify the model's complexity, enhance accuracy, and investigate its diagnostic potential through a comparison of results. Through the use of neural networks and machine learning techniques in conjunction with Sequential Forward Floating Selection (SFFS) and Truncated Singular Value Decomposition (TSVD), our new framework, named “Hybrid-OmniSeq,” progressively integrates molecular subtypes of gene data while revealing innate temporal correlations within the multi-omics domain. In the age of focused breast cancer research, the novel Hybrid-OmniSeq technique advances our understanding of molecular subtypes and establishes the groundwork for customized treatment strategies that will ultimately improve patient outcomes. Furthermore, our method breaks through conventional constraints by simultaneously predicting cancer subtypes and finding critical multi-omics signals during integration. Future research into the diagnosis or prognosis of tuberculous pleural effusion, the differentiation of benign from malignant thyroid nodules, early Parkinson's disease detection, and the prediction of RNA secondary structure could all benefit from the implementation of the proposed method. These directions highlight the Hybrid-OmniSeq framework's adaptability and potential influence in a variety of medical fields.
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Conflict of interest
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
