Abstract
Life-inspired algorithms offer robust tools for tackling intricate classification challenges by applying principles derived from natural systems. This study presents the Camel-Support Vector Machine (SVM) predictive model, aimed at categorizing breast cancer patients into distinct subgroups based on clinical and pathological characteristics. The proposed model can also assess the likelihood of recurrence or non-recurrence events through an analysis of clinicopathological parameters, and it can differentiate breast cancer patients by evaluating digitized images of tumor masses. The Camel-SVM predictive model was developed through a three-step process. In the first step, the Camel algorithm was adopted to optimize the SVM hyperparameters, aiming to achieve the most effective configuration. In the second step, the Camel algorithm was applied again to identify a subset of pertinent features. In the third step, the selected features were used by the hyperparameter-tuned SVM classifier to identify breast cancer subgroups. The Camel-SVM hybrid model exhibited better classification performance than four other established hybrid life-inspired models when evaluated across five different datasets of hospital patients undergoing treatment for breast cancer. Thus, the newly created life-inspired hybrid model can be used as an alternative diagnostic tool that captures the growing complexity of breast cancer across various patient cohorts.
Introduction
Computational techniques such as biologically inspired algorithms, also known as “life-inspired algorithms”, leverage concepts found in biological systems to address challenging optimization problems. Natural processes such as evolution, cooperative strategies, and adaptive behaviours serve as inspiration for these algorithms. Life-inspired algorithms (LIA) provide robust and efficient solutions to a broad spectrum of optimization problems across different fields by leveraging the principles and strategies observed in natural phenomena. Life-inspired algorithms differ from nature-inspired algorithms (NIA). 1 NIA draw inspiration from occurrences and processes in the wider natural world, which can include physical, ecological, and biological systems. Swarm intelligence, 2 evolutionary algorithms, 3 and optimization strategies based on physical principles such as thermodynamics or gravitational forces are a few examples. In contrast, LIA focus specifically on mimicking or recreating biological processes. These algorithms frequently take inspiration from biological systems, for example neural networks (which mimic brain function), 4 ant colony optimization (which mimics the behaviour of ant colonies), 5 and genetic algorithms (which mimic evolution). 6 Over time, these algorithms have developed into diverse variants and hybrid methodologies that integrate various biological concepts with computational techniques, and their continual evolution broadens their usefulness and efficacy in tackling diverse optimization problems. Furthermore, LIA play a significant role in solving classification problems, where ideas from biological systems are applied to boost the accuracy and efficiency of classification models. These observations motivated this investigation, which sought to separate breast cancer patients into discrete subgroups based on their pathological and clinical traits.
LIA provide a robust framework for feature selection by utilizing principles derived from biological systems. These algorithms explore the search space, assess feature relevance, and determine optimal subsets of features. By mimicking natural processes, they enhance the selection process and support both comprehensive exploration and precise evaluation. Recently, they have attracted considerable interest in the feature selection field because of their global search capability. To enhance the quality of feature sets, feature selection 7 has been widely applied across numerous ML applications, especially classification. De la Iglesia et al. 8 provided a summary of research employing evolutionary computation for feature selection in classification. Al-Thanoon et al. 9 explored various metaheuristic methods, such as the binary bat algorithm, grey wolf algorithm, and whale algorithm, for tackling the challenges of high-dimensional feature selection. Al-Tashi 10 presented an innovative feature selection strategy employing Grey Wolf Optimization to enhance the relevance of selected features. Kishore et al. 11 extracted three distinct feature subsets from pre-processed fundus images using life-inspired algorithms combined with correlation-based feature selection (CFS). Thahe et al. 12 presented a robust feature selection method built on a Boolean version of Particle Swarm Optimization (BPSO), further enhanced by incorporating Evolutionary Population Dynamics (EPD); its outcomes outperformed those of other life-inspired algorithms. Hashim et al. 13 presented a highly effective adaptive-mutated Coati optimization technique designed for feature selection and global optimization. Malik et al. 14 developed a binary version of the Capuchin Search Algorithm (CSA), known as BCSA, to determine the most effective combination of features. Yadav et al. 15 presented a hybrid feature selection framework that combines statistical techniques with soft-computing intelligence to enhance classification accuracy and minimize computational overhead. Bansal et al. 16 introduced a feature selection–based classification framework that integrates the Water Wave optimization algorithm with machine learning classifiers to improve accuracy and reduce computational time. The use of multi-objective particle swarm optimization (MOPSO) and NSGA-III for feature selection in digital mammography, generating Pareto-optimal solutions for conflicting objectives, was explored in ref. 17. A survey of recent literature reveals that researchers have been actively developing and adapting new algorithms, hybridizing multiple algorithms, and refining existing algorithms with additional operators to address the feature selection problem. This inspired us to use the hybridization of multiple algorithms, such as the hybrid Camel algorithm, for selecting features in the breast cancer classification task.
LIA are pivotal for the hyperparameter tuning of ML models and can significantly enhance model performance and efficiency. These algorithms are versatile and can be applied across various ML models. Whether optimizing SVMs, neural networks (NN), or decision trees, life-inspired algorithms can adjust their search strategies to suit different model architectures and requirements. Life-inspired algorithms are effective at exploring the hyperparameter space, which helps identify promising regions where optimal hyperparameters may reside. Algorithms such as particle swarm optimization (PSO) and ant colony optimization (ACO) excel in global optimization tasks. They can effectively search through a large and complex hyperparameter space to find configurations that improve model performance metrics, such as accuracy or loss. LIA can automate hyperparameter tuning and reduce the need for manual intervention. Hyperparameter tuning typically demands substantial computational resources and time, and LIA improve resource utilization by concentrating computational effort on the most promising hyperparameter configurations, which speeds up the tuning process. LIA can complement traditional optimization methods, such as grid search or random search, by providing a more guided and efficient exploration of the hyperparameter space. By tuning hyperparameters effectively, LIA help reduce overfitting and boost the generalization capacity of machine learning models, favouring hyperparameters that enhance model robustness across different datasets and real-world scenarios. Shankarappa et al. 18 put forward two hybrid bio-inspired optimization approaches for fine-tuning hyperparameters to recognize student activities in an online examination setting. Lius Carlos et al. 19 examined two Estimation of Distribution Algorithms for the hyperparameter tuning of SVM and demonstrated that these algorithms retain the effectiveness of exhaustive strategies in identifying optimal hyperparameters. Godínez-Bautista et al. 20 pointed out that bio-inspired algorithms offer greater efficiency for the hyperparameter tuning of SVM in numerous classification problems, and showed that the effectiveness of these algorithms is statistically comparable. Conversely, the No Free Lunch (NFL) theorem 21 asserts that, averaged over all optimization problems, no single algorithm outperforms all others. This theorem highlights the context-dependent nature of algorithmic performance: the effectiveness of an algorithm is inherently tied to the specific characteristics of the problem it addresses. Consequently, an algorithm that performs exceptionally well in one scenario may not achieve the same level of results in another. This emphasizes the importance of selecting and tailoring algorithms to the distinct requirements and attributes of each optimization problem. Therefore, the creation of new meta-heuristic, life-inspired algorithms is crucial for tackling specialized optimization problems and for boosting the effectiveness of existing algorithms.
In recent years, there has been a notable rise in the adoption of life-inspired methods for the automated detection of breast cancer, reflecting their growing importance and effectiveness in medical diagnostics.22–24 This study introduces the Camel-SVM predictive model, designed to categorize breast cancer patients into distinct subgroups based on clinical and pathological characteristics obtained from oncological or tertiary care institutions. Additionally, the proposed model can assess the likelihood of recurrence or non-recurrence events by analysing clinicopathological parameters. It can also differentiate breast cancer patients by evaluating digitized images of tumor masses, which capture the characteristics of cell nuclei. The Camel Algorithm (CA) with traveling behaviour, a life-inspired meta-heuristic algorithm, was first introduced in 2016 by M. K. Ibrahim and R. S. Ali. 25 Camels are known for their ability to survive and navigate in harsh desert conditions, where resources are scarce and the environment is challenging. The Camel traveling behaviour algorithm is distinctive because it draws inspiration from the adaptive and efficient behaviour of camels. Its strengths lie in its dynamic adaptability, its balance between exploration and exploitation, and its robustness in handling complex, high-dimensional, and dynamic problems. These characteristics make the Camel algorithm a strong contender among bio-inspired algorithms, particularly in scenarios where other algorithms might struggle with convergence, diversity, or adaptability.
The main contributions of this study to the categorization of breast cancer subtypes are as follows. First, it highlights the challenges associated with breast cancer detection, particularly the limitations of traditional imaging techniques and the urgent need for automated solutions. Second, it designs a hybrid Camel-SVM predictive model that works in three steps: in the first step, the Camel algorithm with traveling behaviour was harnessed to optimize the hyperparameters of SVM, with the goal of identifying the most effective combination; in the second step, the Camel algorithm was applied again to extract a subset of pertinent clinicopathological features; and in the third step, the selected features were used by the hyperparameter-tuned SVM classifier to identify breast cancer subgroups. Third, it evaluates the classification performance of the proposed Camel-SVM model against four other well-established hybrid life-inspired models: DE-SVM, GW-SVM, PSO-SVM, and GA-SVM. The Camel-SVM hybrid model exhibited better classification performance than the other models when evaluated across five different hospital-based datasets of patients screened for breast cancer. The hybrid models were assessed in terms of the number of selected features, best fitness value, execution time, ROC curve, and precision-recall curve. Thus, the newly created life-inspired hybrid model can be deployed as an alternative diagnostic tool that can recognize the increasing intricacy of breast cancer subtypes.
The rest of the article is structured as follows. Section 2 provides background on the research area and highlights the challenges associated with breast cancer detection. Section 3 outlines the methodology and the proposed model. Section 4 covers the results obtained and their analysis. Section 5 presents a discussion of the key findings. Lastly, Section 6 offers conclusions.
Related work
Globally, breast cancer is the most prevalent cancer diagnosis among women. It develops when abnormal cells in the breast proliferate rapidly, resulting in the formation of a lump or tumor. The global prevalence of breast cancer has surged rapidly, and estimates for 2030 indicate a substantial increase in cases, especially in developing countries. 26 Early diagnosis and appropriate treatment are crucial for improving the chances of patient survival and recovery. 27 The literature shows that numerous imaging techniques are used to support early breast cancer detection, with mammography widely recognized as the standard diagnostic method.28,29 Detecting, localizing, and grading breast lesions in mammograms is often done manually, which is time-consuming and heavily dependent on the radiologist's expertise and fatigue level. Moreover, the large number of mammography images generated every day places a greater burden on radiologists and raises the likelihood of misdiagnosis. Consequently, computer-aided diagnosis (CAD) systems can significantly reduce radiologists’ effort while improving diagnostic accuracy.
CAD supports radiologists in differentiating normal tissue from abnormal tissue and aids in diagnosing various pathological conditions. However, variations in the shape and texture of calcifications and masses, along with the presence of blood vessels and muscle fibres, can hinder precise detection. This limitation emphasizes the need for an innovative computational model to achieve more precise breast cancer classification. Another disadvantage of mammography is that it exposes patients to low doses of X-rays, which carry potential health risks. Despite being the most popular imaging method, mammography has limited sensitivity in dense breast tissue, which is more prevalent in younger women.30,31 In clinical settings, ultrasonography is now used as an alternative to mammography. However, the varying textural features of low-quality ultrasound images often result in inconsistent performance on new test cases. Infrared thermography, another imaging modality, has shown significant potential in supporting breast cancer detection and diagnosis.32–34 However, infrared thermography measures only the surface temperature and cannot detect tumors located deep within the breast tissue, so it may miss cancers that are not close to the skin surface. Furthermore, compared with other diagnostic techniques such as mammography or ultrasound, infrared thermography has poor sensitivity and specificity, which can lead to false positives or false negatives through an erroneous distinction between benign and malignant tumors. These limitations indicate that diagnostic performance can be improved beyond that of conventional visual imaging evaluations, 35 and they further inspired us to design an ML model that incorporates various clinicopathological features of patients to detect breast cancer, using patient datasets collected from different oncological or tertiary care settings.
The CAD system leverages ML methods to carry out classification tasks. The use of ML tools in medical diagnosis is steadily increasing, and some of these techniques have demonstrated excellent results in classification problems, aiding medical experts in identifying diseases. Life-inspired algorithms provide strong search capabilities, adaptability, and the ability to identify high-quality feature subsets, making them highly valuable in ML and data analysis. González-Patiño et al. 36 employed both bat algorithms and genetic algorithms to segment mammographic images, using the Dunn index as the optimization function. In this process, each individual is represented by a grey level, which corresponds to different intensity values within the image. The grey levels are updated iteratively to maximize the Dunn index; a higher index indicates better segmentation. González-Patiño et al. 37 designed an innovative immune-based classification algorithm named AISAC-MMD, specifically designed to handle multiclass, mixed, and incomplete datasets seamlessly. One of the main merits of this algorithm is its low computational complexity, which makes it highly efficient for various applications. Ali et al. 38 created an optimized model for the early detection and diagnosis of breast cancer by combining a meta-learning algorithm with sophisticated deep learning approaches; meta ensemble learning techniques were used to aggregate the outputs of several convolutional neural networks (CNNs), enhancing the classification accuracy of the model. Caroline et al. 39 used two bio-inspired optimization approaches, GA and PSO, to optimize the hyperparameters and design the fully connected layers of three advanced CNN architectures: VGG-16, ResNet-50, and DenseNet-201. Ting et al. 40 developed the CNN-BCC algorithm to classify breast cancer lesions based on medical imaging data and attained high classification accuracy. Their method was evaluated using a real-world dataset comprising 221 patients, who were divided into three categories: malignant, benign, and healthy. Acharya et al. 41 suggested a novel approach that integrates advanced image preprocessing techniques with deep learning algorithms to refine the accuracy of breast cancer classification across multiple datasets. Sun et al. 42 presented a novel approach that uses genetic algorithms to design the architectures and initialize the connection weights of deep convolutional neural networks. This approach is specifically tailored to enhance performance in image classification tasks by evolving network structures and weight configurations, improving adaptability and accuracy. Bilal et al. 43 integrated an improved version of the quantum-inspired binary Grey Wolf Optimizer with the radial basis function kernel of SVM, aiming to enhance the classification accuracy of breast cancer by optimizing the SVM parameters. The efficacy of this hybrid approach was determined by comparing its classification performance with that of other popular optimization methods, viz. GA and PSO. Analysing patient data and selecting suitable doctors present certain challenges, but developing intelligent systems and computational methods with ML for classification can ultimately help medical professionals overcome these obstacles. 44 As scientists increasingly acknowledge the importance of making decisive choices in disease treatment, computer-aided diagnostic tools have become indispensable. 45
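The Dunn index that González-Patiño et al. maximize can be illustrated with a small sketch. Several variants of the index exist. This one divides the smallest distance between points of different clusters by the largest distance within any cluster, and it treats each pixel intensity as a one-dimensional point. Encoding a candidate solution as a single grey-level threshold is an assumption made only for illustration, and the pixel values are invented. Neither is taken from ref. 36.

```python
import numpy as np
from scipy.spatial.distance import cdist


def dunn_index(values, labels):
    """Dunn index: smallest between-cluster distance / largest cluster diameter."""
    points = np.asarray(values, dtype=float).reshape(len(values), -1)
    labels = np.asarray(labels)
    clusters = [points[labels == k] for k in np.unique(labels)]
    # Diameter of a cluster = largest pairwise distance inside it.
    max_diameter = max(cdist(c, c).max() for c in clusters)
    # Separation = smallest distance between points of two different clusters.
    min_separation = min(
        cdist(a, b).min()
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
    )
    return min_separation / max_diameter


# Toy pixel intensities (invented): a dark group and a bright group.
pixels = np.array([10, 12, 11, 200, 205, 198])
for threshold in (11, 100):
    labels = (pixels > threshold).astype(int)  # hypothetical threshold encoding
    print(threshold, dunn_index(pixels, labels))
```

A threshold that separates the two intensity groups cleanly yields a much larger index than one that cuts through the dark group. This is the behaviour an optimizer exploits when it searches grey levels to maximize the index.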
Methodology
Datasets
This study used five datasets, primarily featuring clinical, pathological, and socio-demographic data from patients who underwent breast cancer treatment at tertiary care hospitals or specialized oncology centres. Breast cancer diagnosis follows a structured process. It begins with a comprehensive clinical evaluation, followed by imaging procedures such as breast ultrasound and mammography. Tissue may then be sampled from the suspected areas by fine needle aspiration cytology (FNAC) or biopsy and analysed microscopically to verify the presence of malignancy. The datasets are described as follows.
The first dataset 46 comprises features extracted from the digitized image of a fine needle aspirate (FNA) taken from a breast mass. It is among the benchmark breast cancer datasets available in the UCI ML repository and has been used in numerous research studies.
The second breast cancer dataset 47 includes clinicopathological factors related to the recurrence or non-recurrence of breast cancer in patients diagnosed at the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. This multivariate dataset consists of 286 cases, each characterized by nine predominantly categorical factors covering clinical, pathological, and demographic information. It is also sourced from the UCI ML repository.
The third dataset 48 covers a cohort of 905 patients who received treatment for breast cancer during 2009–2014 at the National Institute of Oncology in Rabat, Morocco. Patients were classified into two groups according to their molecular subtype: triple-negative breast cancer (TNBC) and non-TNBC. The clinical, pathological, and demographic parameters of the patients in this retrospective study were obtained from Biostudies, 49 a public repository.
The fourth dataset 50 includes 85 breast cancer cases, confirmed by both histopathology and cytology, that were treated at the NKP Salve Institute of Medical Sciences and Research Centre in Nagpur, India, between 2012 and 2014. All patients were enrolled prospectively in this longitudinal study, and a case sheet was used to collect demographic information, risk factors, and clinical profiles. Patients were further categorized into TNBC and non-TNBC groups according to their hormonal status.
The fifth dataset 51 comprises 251 patients with histologically confirmed breast cancer, treated at the Radiotherapy Department of Lagos University Teaching Hospital in Nigeria. This non-interventional study involved female patients over the age of 18 who were treated at the outpatient clinic between July 2017 and July 2019. Patients were divided into two groups according to their molecular subtype: 119 patients (47.4%) were classified as TNBC, and the remaining 132 patients (52.6%) belonged to the non-TNBC category.
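The description of the first dataset matches the Wisconsin Diagnostic Breast Cancer (WDBC) data, and scikit-learn bundles a copy of it. The sketch below assumes that correspondence and loads the data as a pandas DataFrame in the layout used throughout this study, with rows as patients and columns as features. The other four datasets are not bundled with any library. They would have to be read from their own sources, for example with pandas.read_csv.

```python
# Load scikit-learn's bundled copy of the WDBC data as pandas objects.
from sklearn.datasets import load_breast_cancer

bunch = load_breast_cancer(as_frame=True)
X = bunch.data    # DataFrame: one row per patient, one column per nucleus feature
y = bunch.target  # Series of class labels: 0 = malignant, 1 = benign

print(X.shape)
print(y.value_counts().sort_index())
```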
Existing hybrid life-inspired models
In ML model development, hyperparameter tuning and feature selection are crucial for ensuring that the model fits the data effectively while retaining the ability to generalize to unseen data. Both processes address key aspects of model performance and complexity management and help build robust and effective models. Hyperparameters are parameters that users specify to regulate the learning process and influence how the model is trained. The purpose of hyperparameter tuning is to refine these settings to attain optimal model performance. Feature selection refers to choosing the most significant and relevant features from the dataset to improve the efficiency and accuracy of the model. It directly affects the model's ability to learn and generalize, improving accuracy and reducing computational complexity. The existing hybrid models address both aspects during model development.
Differential Evolution (DE) 52 is a stochastic, population-based evolutionary algorithm used for the global optimization of nonlinear and non-differentiable functions in continuous spaces. Like other population-based, life-inspired algorithms, DE begins with an initial set of candidate solutions. To improve these candidates iteratively, mutations are introduced into the population, and the candidate solutions with the lowest objective function values are retained. DE offers advantages over widely used life-inspired methods because it handles nonlinear and non-differentiable multi-dimensional objective functions efficiently and requires only a few control parameters to direct the minimization process. As a result, recent literature shows extensive application of DE in feature selection53–59 and hyperparameter tuning.60–62 The DE-SVM model63,64 is often used because SVM is a robust machine learning method, well suited to classification tasks, that handles diverse medical datasets and uncovers complex relationships within them. The grey wolf optimizer (GWO) 65 is a population-based meta-heuristic algorithm that simulates the hunting behaviour and social hierarchy of grey wolves in the wild. It models the wolves' cooperative hunting and social dominance, in which alpha, beta, delta, and omega wolves play distinct roles. By simulating the encircling, tracking, and attacking behaviours of the wolves, GWO effectively explores and exploits the search space, making it a robust tool for optimization problems.
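The DE-SVM idea mentioned above can be sketched with SciPy's differential_evolution. In this sketch, DE searches log10(C) and log10(gamma) of an RBF-kernel SVM and scores each candidate by 5-fold cross-validated accuracy on the WDBC data bundled with scikit-learn. The accuracy is negated because DE minimizes. The search ranges, kernel, population size and iteration budget are assumptions chosen so that the example runs quickly. They are not the configuration used in refs. 63 and 64.

```python
from scipy.optimize import differential_evolution
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)


def objective(params):
    """Negative mean CV accuracy of an RBF SVM at (log10 C, log10 gamma)."""
    log_c, log_gamma = params
    model = make_pipeline(
        StandardScaler(),
        SVC(kernel="rbf", C=10.0 ** log_c, gamma=10.0 ** log_gamma),
    )
    return -cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()


result = differential_evolution(
    objective,
    bounds=[(-2, 3), (-4, 1)],  # assumed log10 ranges for C and gamma
    popsize=5,                  # 5 * 2 parameters = 10 population members
    maxiter=10,
    seed=0,
    polish=False,               # skip the gradient-based polishing step
)
best_c, best_gamma = 10.0 ** result.x
print(f"C={best_c:.3g}, gamma={best_gamma:.3g}, CV accuracy={-result.fun:.3f}")
```

Searching on a log scale lets DE's mutation step move C and gamma across several orders of magnitude, which matches how these SVM hyperparameters are usually explored.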
GWO has been effectively applied in feature selection66–71 and hyperparameter tuning72,73 owing to its capability to update solutions during the optimization process. Kumar et al. 74 combined an advanced version of the GWO algorithm with SVM to identify the most relevant tumor features for accurately classifying benign and malignant tumors. The method was compared with several state-of-the-art and recently developed breast cancer classification techniques on the Wisconsin Diagnostic Breast Cancer (WDBC) benchmark dataset. PSO, 75 introduced by Eberhart and Kennedy, is a population-based, stochastic optimization method that mimics the social dynamics and collective behaviour observed in bird flocks, where individuals refine their movements by considering both their own experiences and those of their neighbours. The PSO-SVM model has diverse applications, including remote sensing of optical images, 76 feature selection and parameter optimization,77–79 and various engineering problems. 80 The Genetic Algorithm (GA), introduced by John Holland, 81 is derived from the biological process of natural selection. It mimics the evolutionary principle of survival of the fittest, in which candidate solutions evolve over generations through mechanisms such as selection, crossover, and mutation, ultimately leading to optimized solutions for complex problems. The GA-SVM hybrid model integrates the evolutionary search capabilities of GA with the robust classification and regression abilities of SVM and has improved performance in various recent applications, viz. hospitalization expense modelling, 82 sleep/wake classification, 83 maize variety detection, 84 fluid identification, 85 and landslide susceptibility analysis. 86
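The PSO update described above can be sketched as a minimal global-best PSO, in which each particle combines its own best position with the best position found by the swarm. In this variant, the whole swarm acts as every particle's neighbourhood. The inertia weight w, the coefficients c1 and c2, and the other settings are common textbook choices assumed here. They are not values from refs. 75–80. In a PSO-SVM, func would instead return, for example, the negative cross-validated accuracy of an SVM at a position such as (log10 C, log10 gamma).

```python
import numpy as np


def pso(func, lower, upper, n_particles=20, n_iter=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best PSO that minimizes func inside box bounds."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest_pos = pos.copy()                       # each particle's own best
    pbest_val = np.array([func(p) for p in pos])
    g = np.argmin(pbest_val)
    gbest_pos, gbest_val = pbest_pos[g].copy(), pbest_val[g]  # swarm best

    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        vel = (w * vel                            # inertia
               + c1 * r1 * (pbest_pos - pos)      # own experience
               + c2 * r2 * (gbest_pos - pos))     # neighbours' experience
        pos = np.clip(pos + vel, lower, upper)    # stay inside the bounds
        vals = np.array([func(p) for p in pos])
        improved = vals < pbest_val
        pbest_pos[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = np.argmin(pbest_val)
        if pbest_val[g] < gbest_val:
            gbest_pos, gbest_val = pbest_pos[g].copy(), pbest_val[g]
    return gbest_pos, gbest_val


# Usage on the sphere function, whose minimum is 0 at the origin.
best_x, best_f = pso(lambda x: float(np.sum(x ** 2)), [-5, -5], [5, 5])
print(best_x, best_f)
```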
The proposed model
The Camel Algorithm (CA) with traveling behaviour is inspired by the efficient, adaptable, and resource-conscious behaviour of camels, which makes it particularly well suited to complex optimization challenges. The concept is modelled on the various routes and strategies that camels use to find essential resources such as water and food across vast, arid deserts. Several factors and operators define the CA procedure, viz. temperature effects, resource availability, camel endurance, random movement, camel visibility, group dynamics, and termination conditions. The notations T, S, and E represent temperature, supply, and endurance, respectively. Temperature is the main random factor influencing a traveling camel's journey and significantly affects its endurance. It may also differ from one camel to another, as each camel navigates a distinct sector of the desert (search space). For any camel j, the current temperature Tnow can be represented as:
At the first step of the journey,
At the first step of the journey,
Here,
The camel algorithm, which leverages the traveling behaviour of camels, might be considered superior to other life-inspired algorithms in certain contexts owing to several distinctive features. Camels are adapted to navigate and survive in harsh desert environments, and this can be metaphorically translated into the algorithm's capability to navigate challenging and complex optimization landscapes. The camel's traveling behaviour involves both exploration of new territories and exploitation of known resources. This dual capability can help the algorithm strike a balance between exploring the search space and exploiting well-established solutions, preventing premature convergence to local optima. Camels are known for efficient resource management, particularly the conservation of water and energy. This trait can be translated into the algorithm's ability to manage computational resources effectively, which could reduce computational costs and speed up convergence. The traveling patterns of camels can inspire dynamic pathfinding strategies in the algorithm, allowing it to adjust its search path according to the current state of the search space. This adaptability can lead to more efficient searches than the static path strategies used in other algorithms. The resilience and robustness of camels in extreme conditions might be mirrored in the algorithm's ability to perform well under various problem constraints and uncertainties. Given the camel's ability to travel long distances, the algorithm might also scale well to large-scale optimization problems.
These unique features make the camel algorithm a valuable tool for feature selection and tuning of hyperparameters in ML settings.
The proposed model centres on a machine learning framework that integrates the camel algorithm with SVM, with particular emphasis on optimizing hyperparameters and selecting relevant features. These two aspects are crucial for achieving high model performance and robust generalization, and they directly affect the model's predictive accuracy and its adaptability across datasets. Detailed literature covering Support Vector Machines (SVM) and their advantages over several other classification algorithms is available in refs. 87–89. The entire study was implemented in Python 3.9.12, provided through the Anaconda distribution platform. Pandas is a comprehensive Python toolkit for data manipulation and analysis. It offers robust data structures such as Series and DataFrame, which support a variety of data operations, including data cleaning, transformation, aggregation, and time-series analysis. The Pandas library was imported to load each dataset and store it as a DataFrame. Each DataFrame corresponds to a feature matrix, where rows represent the anonymized identities of patients and columns correspond to the sociodemographic, clinical, and pathological parameters associated with each patient. NumPy (short for “Numerical Python”), a Python library for computing with and processing single- and multi-dimensional arrays, was also imported. Scikit-learn 90 is an open-source Python library that provides a unified interface for a variety of machine learning algorithms, along with tools for data preprocessing, cross-validation, and visualization. The SimpleImputer class was used to handle missing data through different imputation strategies: the mean, median, or most frequent value of each column, or a specified constant value.
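The column-wise imputation described above can be illustrated with a minimal sketch on a small hypothetical DataFrame. The column names and values are invented. The pairing of each column with a strategy is an assumption for illustration and does not reflect the study's configuration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical patient records with missing entries (np.nan).
df = pd.DataFrame({
    "age": [45.0, np.nan, 62.0, 51.0],            # numeric: fill with the median
    "tumor_size_mm": [20.0, 35.0, np.nan, 10.0],  # numeric: fill with the mean
    "menopause": ["pre", "post", np.nan, "post"], # categorical: most frequent value
})

df[["age"]] = SimpleImputer(strategy="median").fit_transform(df[["age"]])
df[["tumor_size_mm"]] = SimpleImputer(strategy="mean").fit_transform(df[["tumor_size_mm"]])
df[["menopause"]] = SimpleImputer(strategy="most_frequent").fit_transform(df[["menopause"]])
print(df)
```

The fourth option mentioned in the text, a fixed value, corresponds to SimpleImputer(strategy="constant", fill_value=...). In a full pipeline, each imputer should be fitted on the training split only, so that statistics from the test set do not leak into training.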
StandardScaler, a scikit-learn preprocessing class, standardizes the features by centering them to a mean of zero and scaling them to unit variance, which ensures consistent feature scaling across the dataset. The train_test_split function was used to divide each dataset randomly in a 7:3 ratio, assigning 70% of the data to training and reserving the remaining 30% as the test set. The model was fitted on the training set, from which it learns the patterns in the data. After training, the model's performance was evaluated on the test set to confirm its capacity to generalize to new, unseen data. A validation set, drawn as a subset of the training set, was used to assess model performance while optimizing its parameters. For SVM hyperparameter tuning, the kernel, C, and gamma were optimized. The candidate kernel functions (radial basis function, sigmoid, linear, and polynomial) and arrays of C and gamma values evenly spaced on a logarithmic scale were stored in a dictionary with the keys kernel, C, and gamma, and the optimal combination of values was identified from this dictionary. The Camel algorithm with traveling behaviour was applied to optimize the SVM hyperparameters, aiming to discover the most effective combination. To use CA effectively for tuning the hyperparameters of the SVM classifier, its own parameters must be configured carefully. These include the population size, burden factor, dying rate, visibility, initial supply, initial endurance, and the minimum and maximum temperature thresholds. The parameters used were:
population_size=50, burden_factor=0.25, death_rate=0.5, visibility=0.5, supply_init=10, endurance_init=10, min_temperature=-10, max_temperature=10
where population_size is the number of camels in the population, burden_factor is the load or weight each camel carries, death_rate is the dying rate of camels, visibility is the range within which a camel can see, supply_init is the initial supply of food and water, endurance_init is the initial endurance, and min_temperature and max_temperature are the minimum and maximum temperatures that affect the range of camel activity.
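The tuning step can be sketched as a search space plus an objective that a population-based optimizer such as the CA would minimize. This is a minimal sketch under stated assumptions, not the study's implementation: the C and gamma ranges, the encoding of a camel's position as three numbers in [0, 1] (the hypothetical decode helper), and the use of 5-fold cross-validation on the training split as the validation set are all assumptions, and the CA's own update rules (burden, supply, endurance, temperature) are not reproduced. Dataset 1 (WDBC) is loaded from scikit-learn's bundled copy.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical search space: the kernel names follow the text, but the C and
# gamma ranges used in the study are not reported, so these bounds are assumptions.
PARAM_GRID = {
    "kernel": ["rbf", "sigmoid", "linear", "poly"],
    "C": np.logspace(-3, 3, 13),      # 1e-3 ... 1e3, evenly spaced in log10
    "gamma": np.logspace(-4, 1, 11),  # 1e-4 ... 1e1, evenly spaced in log10
}


def decode(position):
    """Map one camel's continuous position in [0, 1]^3 to an SVM configuration."""
    position = np.clip(np.asarray(position, dtype=float), 0.0, 1.0 - 1e-9)
    return {
        name: values[int(p * len(values))]
        for p, (name, values) in zip(position, PARAM_GRID.items())
    }


def fitness(position, X, y, cv=5):
    """Value to minimize: 1 - mean cross-validated accuracy on the training data."""
    model = make_pipeline(StandardScaler(), SVC(**decode(position)))
    return 1.0 - cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()


X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# An optimizer such as the CA would call fitness() for every camel in every
# iteration; two fixed positions are scored here only to show the interface.
for position in ([0.1, 0.5, 0.55], [0.6, 0.5, 0.55]):
    print(decode(position), round(fitness(position, X_train, y_train), 3))
```

Keeping the scaler inside the pipeline means each cross-validation fold is standardized with statistics from its own training part, so the fitness is not inflated by information from the validation fold.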
The SVM classifier, configured with the optimal hyperparameter values, was subsequently trained on the dataset to develop the model. In the second step, the Camel algorithm was employed to find a pertinent subset of clinicopathological traits for the SVM classifier. The objective function of this wrapper-based feature selection maximizes classification accuracy while penalizing the number of selected features. In the third step, the tuned SVM classifier was applied to classify the breast cancer subtype using the selected features. Hence, the predictive model incorporates two essential stages of the model-building process: hyperparameter tuning and feature selection. By combining these techniques, the model aims to improve both interpretability and performance. Hyperparameter tuning optimizes model parameters for more precise predictions, while feature selection refines the input data to focus on the most informative features, resulting in a more efficient and interpretable model. Figure 1 illustrates a flowchart of the development methodology of the Camel-SVM predictive model.

Flowchart of CA-SVM.
Outlined below are the essential algorithmic steps of the proposed Camel-SVM predictive model, as implemented in Python.
Step 1: Import the dataset into a pandas DataFrame with rows = patient identities and columns = the patients' sociodemographic, clinical, and pathological parameters.
Step 2: Handle missing values with SimpleImputer and standardize the data with StandardScaler.
Step 3: Store the class labels as an (m × 1) target array.
Step 4: Split the data into training and test sets in a 7:3 ratio with the train_test_split() function.
Step 5: Import the Camel algorithm.
Step 6: Use the CA to choose the best kernel from ["rbf", "sigmoid", "linear", "poly"] and the best C and gamma values.
Step 7: Print best_parameters of the SVM classifier.
Step 8: Apply the CA in the SVM feature selection function and fit it on the x_train and y_train data.
Step 9: Test and display best_features and best_fitness.
Step 10: Calculate subset_accuracy (selected features) and feature_accuracy (all features).
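Steps 1–4 and the all-feature accuracy of Step 10 can be sketched with scikit-learn alone; the CA-specific Steps 5–9 are omitted because the study's CA implementation is not shown. This sketch uses scikit-learn's copy of Dataset 1 (WDBC) and the linear kernel with C = 2.61 reported for that dataset in the results. The median imputer, the stratified split, and the random seed are assumptions; WDBC has no missing values, so the imputer is a no-op here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Step 1: load Dataset 1 (WDBC) as a DataFrame; rows = patients, columns = 30 features.
data = load_breast_cancer(as_frame=True)
X_df = data.data

# Step 3: class labels as a 1-D target array. scikit-learn encodes WDBC as
# 0 = malignant, 1 = benign, so the labels are flipped to match the text (1 = malignant).
y = (data.target == 0).astype(int).to_numpy()

# Step 4: 7:3 train/test split; stratification is an added assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X_df, y, test_size=0.3, stratify=y, random_state=42
)

# Step 2 as pipeline stages, so imputation and scaling are fitted on training data only.
# Kernel and C are the values reported for Dataset 1 in the results.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    SVC(kernel="linear", C=2.61),
)
model.fit(X_train, y_train)
print("all-feature test accuracy:", round(model.score(X_test, y_test), 3))
```

Placing SimpleImputer and StandardScaler inside the pipeline, rather than applying them before the split, keeps test-set statistics out of preprocessing. The accuracy printed by this sketch is not expected to match the figures reported in the study.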
This study investigated datasets from five oncology centres or tertiary care hospitals, covering patients diagnosed with breast cancer who presented distinct clinicopathological traits. The first dataset comprises 569 unique breast cancer cases, each described by 30 attributes derived from digitized fine-needle aspirate (FNA) images, along with an ID and a target diagnosis identifying the case as benign or malignant. The second dataset comprises 286 instances divided into two classes, one with 201 instances and the other with 85. Each instance is characterized by 9 attributes of both linear and nominal types, providing a varied feature set for analysis. Each case represents a unique patient with associated clinical data intended for research on predicting breast cancer recurrence. The third dataset covers a cohort of 905 patients treated for breast cancer in a comprehensive research initiative by Mouh et al. (2020), which began in 2009 at the National Institute of Oncology in Rabat, Morocco, and continued until 2014. That study aimed to analyse various clinicopathological characteristics, treatment strategies, and outcomes among the patients. The fourth dataset, from Aktar et al., covers a cohort of 85 patients diagnosed with breast cancer, with their diagnoses confirmed by both histopathological and cytological examination. Of these, 37 patients (43.5%) were classified as having TNBC, while the remaining 48 were classified as non-TNBC. The fifth dataset comes from a non-interventional study conducted in Nigeria, which included 251 patients with histologically confirmed breast cancer who sought treatment at outpatient clinics between July 2017 and July 2019. This dataset provides insight into the clinical characteristics and treatment experiences of these patients at the time of diagnosis.
The results section refers to these datasets as Dataset 1, Dataset 2, Dataset 3, Dataset 4, and Dataset 5, respectively.
Performance evaluation of the Camel-SVM predictive model
The Camel-SVM predictive model was rigorously examined using multiple performance metrics. These included the confusion matrix, from which a detailed classification report was derived, and the number of selected features, used to gauge model simplicity. Additionally, the best fitness value of the fitness function was recorded, along with code execution time to measure computational efficiency. Mean squared error (MSE) for error measurement, log loss for evaluating probabilistic predictions, and graphical analyses such as the ROC curve and the precision-recall curve, which assess discrimination and the balance between precision and recall, were also used.
A confusion matrix is an N × N table, where N is the number of classes, that summarizes a classification model's performance by cross-tabulating actual and predicted classes. It offers detailed insight into model accuracy by displaying the counts of correct and incorrect predictions for each class. For a binary problem, the matrix comprises four components: True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN). From these components, metrics such as accuracy, precision, recall, and F1 score can be calculated and presented, together with the support (the number of actual instances of each class), in a classification report that gives a comprehensive view of the model's classification performance. For simplicity, malignant and benign instances are labelled 1 and 0 in dataset 1. In dataset 2, recurrence-events and non-recurrence-events are labelled 1 and 0. For datasets 3, 4, and 5, TNBC and non-TNBC samples are likewise labelled 1 and 0, respectively. Figures 2 and 3 present the classification reports of the Camel-SVM model for datasets 1–3 and datasets 4–5, respectively. They illustrate the hybrid model's ability to distinguish malignant from benign cases, recurrence from non-recurrence events, and TNBC from non-TNBC patients. The performance indicators, such as accuracy, precision, recall (sensitivity), and F1-score, suggest that the model generalizes well across diverse datasets, which is crucial for its robustness and clinical applicability.
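The relationship between the confusion-matrix counts and the metrics in a classification report can be checked on a small example. The labels below are invented for illustration. Specificity is computed by hand because scikit-learn's classification report does not list it under that name (it equals the recall of class 0).

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical test-set labels (1 = malignant/recurrence/TNBC, 0 = the other class).
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 0])

# With labels=[0, 1], ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)  # sensitivity
specificity = tn / (tn + fp)
f1 = 2 * precision * recall / (precision + recall)

print(tn, fp, fn, tp)
print(classification_report(y_true, y_pred, digits=3))
```

Here TN = 5, FP = 1, FN = 1, and TP = 3, giving an accuracy of 0.8, a precision and recall of 0.75 for class 1, and a specificity of 5/6.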

Classification report for datasets 1, 2, and 3.

Classification report for datasets 4 and 5.
The CA was further employed to retrieve the best possible subset of clinical and pathological features, and the list of selected features and their corresponding fitness values was obtained from the resulting solution vector. To evaluate the objective function, the continuous solution vector is first converted into binary format, with a threshold value of 0.5 determining which features are selected. The objective function was optimized by maximizing classification accuracy while penalizing the number of selected features. Table 1 summarizes the feature selection performance of the Camel-SVM model across the five datasets, showing a substantial reduction in the number of selected features compared with the total number of features in each dataset. This reduction indicates the model's capability to filter out less relevant features, streamlining the datasets for more efficient classification.
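A wrapper fitness of this kind can be sketched as follows. The 0.5 threshold comes from the text; the strict ">" comparison, the weight ALPHA = 0.99 between classification error and the fraction of features kept, the 5-fold cross-validation, and the helper names to_mask and selection_fitness are hypothetical, since the study does not report its exact objective. Lower values are better.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

ALPHA = 0.99  # hypothetical weight on the error term; not reported in the study


def to_mask(position, threshold=0.5):
    """Binarize a continuous solution vector: feature j is kept if position[j] > threshold."""
    return np.asarray(position) > threshold


def selection_fitness(position, X, y, estimator, alpha=ALPHA, cv=5):
    """Value to minimize: weighted CV error plus the fraction of features kept."""
    mask = to_mask(position)
    if not mask.any():  # an empty feature subset cannot be evaluated
        return 1.0
    accuracy = cross_val_score(estimator, X[:, mask], y, cv=cv).mean()
    return alpha * (1.0 - accuracy) + (1.0 - alpha) * mask.sum() / mask.size


X, y = load_breast_cancer(return_X_y=True)
svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=2.61))

rng = np.random.default_rng(1)
position = rng.random(X.shape[1])  # one camel's position (random, for illustration)
print(int(to_mask(position).sum()), "of", X.shape[1], "features kept")
print("fitness:", round(selection_fitness(position, X, y, svm), 4))
```

With ALPHA close to 1, accuracy dominates and the feature-count term mainly breaks ties between subsets of similar accuracy; a smaller ALPHA would push the optimizer toward sparser subsets.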
Camel-SVM model feature selection on five datasets.
Dataset1 = Breast Cancer Wisconsin (Diagnostic).
Dataset2 = University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia.
Dataset3 = National Institute of Oncology, Rabat, Morocco breast cancer dataset.
Dataset4 = NKP Salve Institute of Medical Sciences and Research Centre; Nagpur.
Dataset5 = Lagos university, Nigeria breast cancer dataset.
Mean squared error (MSE), a widely used evaluation metric and training objective in machine learning, guides model optimization toward more accurate predictions. It quantifies the average of the squared differences between predicted and actual values, providing an indicator of the model's prediction accuracy.
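The formula for Mean Squared Error (MSE) is as follows:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}$$
where n = number of patients used as test cases in a dataset, yi = actual class label of patient i, and ŷi = predicted class label of patient i.
In this formula, each error term (yi − ŷi) is squared before averaging, so positive and negative errors do not cancel and larger errors are penalized more heavily.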
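As a worked check of the formula: when the predictions ŷi are hard class labels (0 or 1), each squared error is either 0 or 1, so the MSE equals the fraction of misclassified patients, that is, 1 − accuracy. The labels below are invented for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_squared_error

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])  # two of eight patients misclassified

mse_manual = np.mean((y_true - y_pred) ** 2)   # direct application of the formula
mse_sklearn = mean_squared_error(y_true, y_pred)
error_rate = 1 - accuracy_score(y_true, y_pred)
print(mse_manual, mse_sklearn, error_rate)
```

All three values are 0.25, which is why, for class-label predictions, MSE carries the same information as accuracy.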
The performance of the Camel-SVM model was also evaluated using logarithmic loss (Log Loss), a key metric for classification problems that quantifies the discrepancy between predicted probabilities and actual outcomes. Log Loss is computed by averaging, over all patients, the negative logarithm of the probability that the model assigns to each patient's actual class. A Log Loss of 0 signifies a perfect match between the predicted probabilities and the actual outcomes, whereas higher values reflect greater deviation between them.
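$$\text{Log Loss} = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log p_i + \left(1 - y_i\right)\log\left(1 - p_i\right)\right]$$
where n = total number of cancer patients, yi = actual class label (0 or 1) of patient i, and pi = predicted probability that patient i belongs to class 1, so that each bracketed term is the logarithm of the probability assigned to the patient's actual class.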
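The Camel-SVM model produced Log Loss values of 0.09 on dataset 1, 0.60 on dataset 2, 0.34 on dataset 3, 0.71 on dataset 4, and 0.61 on dataset 5. The low values on datasets 1 and 3 indicate that the predicted probabilities closely match the actual outcomes. For reference, a classifier that always predicts a probability of 0.5 has a Log Loss of ln 2 ≈ 0.69, so the values for datasets 2, 4, and 5 indicate considerably weaker probabilistic predictions on these datasets.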
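The formula can be checked against scikit-learn's log_loss on a few invented probabilities. An SVC only yields probabilities when constructed with probability=True (Platt scaling); how the study obtained its probabilities is not stated. The last line computes the ln 2 reference value of an uninformative 0.5 prediction.

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([1, 0, 1, 0])
p_pos = np.array([0.9, 0.2, 0.6, 0.4])  # hypothetical P(y = 1) for each patient

# Direct application of the Log Loss formula.
manual = -np.mean(y_true * np.log(p_pos) + (1 - y_true) * np.log(1 - p_pos))
print(round(manual, 4), round(log_loss(y_true, p_pos), 4))

# A constant prediction of 0.5 gives ln 2, a useful reference value for binary tasks.
baseline = log_loss(y_true, np.full(len(y_true), 0.5))
print(round(baseline, 4))
```

A 1-D array passed to log_loss is interpreted as the probability of the positive class, which matches the definition of pi above.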
The ROC curve is a graph that visualizes the performance of a machine learning classifier across different threshold settings. It is generated by computing the true positive rate (TPR) and false positive rate (FPR) at various thresholds and plotting TPR on the y-axis against FPR on the x-axis. The Area Under the Curve (AUC) summarizes the ROC curve and measures a two-class classifier's ability to distinguish between the classes. A two-class classifier is considered effective only if its AUROC score is greater than 0.5, ideally approaching 1. An AUROC score below 0.5 indicates that the model performs worse than random chance, systematically ranking negative cases above positive ones, so its predictions are unreliable as produced. When the AUC is 0.5, the classifier cannot distinguish between positive and negative class labels: the model either predicts classes randomly or assigns the same class to every patient, offering no useful predictive capability. Figure 4 depicts the ROC curves of the Camel-SVM predictive model on the five datasets. A diagonal line from the bottom left to the top right of the plot defines a no-skill classifier across all thresholds, with an AUC of 0.5. A model with perfect skill has a curve that starts at the bottom left, rises vertically to the top left, and then extends horizontally to the top right, where it meets the no-skill line at a TPR of 1.91 At a false positive rate of 0.1, the ROC curves for datasets 1, 2, 3, and 5 reached a sensitivity of 1, capturing a large portion of the area under the curve before meeting the no-skill line. For dataset 4, however, the ROC curve lies below the no-skill line at some of the initial thresholds, then crosses above it to reach a sensitivity of 1 before finally converging with the no-skill line.
A potential reason for this could be the small size of dataset 4, which may not contain enough samples to represent the full range of possible input values. To mitigate this limitation, the 10-fold cross-validation method was employed across all datasets.
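A minimal ROC/AUC sketch with invented labels and scores follows; for an SVC, the output of decision_function can serve as the score. It also illustrates the point about AUC below 0.5: reversing the scores gives 1 − AUC, so a model that ranks the classes backwards falls below the no-skill value of 0.5.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])  # hypothetical classifier scores

# TPR and FPR at every distinct threshold; the curve runs from (0, 0) to (1, 1).
fpr, tpr, thresholds = roc_curve(y_true, scores)
auc = roc_auc_score(y_true, scores)

# Reversing the ranking mirrors the curve below the no-skill diagonal.
auc_reversed = roc_auc_score(y_true, -scores)
print(round(auc, 3), round(auc_reversed, 3))
```

Here 8 of the 9 positive-negative pairs are ranked correctly, so the AUC is 8/9 ≈ 0.889, and the reversed scores give 1/9.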

ROC curve.
Precision-Recall (PR) curves depict the trade-off between the positive predictive value (precision) and the true positive rate (recall) of a predictive model across different probability thresholds. ROC curves are best suited to datasets with balanced class distributions, whereas PR curves are more informative for imbalanced datasets. In a PR curve, precision is plotted on the y-axis and recall on the x-axis, with each point along the curve representing a distinct threshold. High precision is achieved when there are few false positives, while high recall is attained when false negatives are minimal. Figure 5 illustrates the PR curves for the five datasets. A PR curve near the top-right corner signifies a strong classifier with high precision and high recall across thresholds. Conversely, a curve close to the bottom-left corner indicates poor performance, with both low precision and low recall, suggesting that the model struggles to correctly identify positive cases. The PR curve of dataset 1 stays close to the top-right corner, well above the baseline (the precision of a no-skill classifier, equal to the proportion of positive cases) and without overlapping it, resembling a strong classifier across thresholds. The PR curves for datasets 2, 3, 4, and 5 start below the baseline at a recall of 0.0, then follow a zigzag path upward and to the right before eventually settling near the baseline.
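A PR-curve sketch with invented labels and scores is shown below. The no-skill baseline is the prevalence of the positive class (3/8 here), and average precision summarizes the curve in a single number, much as the AUC summarizes a ROC curve.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 0])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.3, 0.05])  # hypothetical scores

# Precision and recall at every threshold; scikit-learn appends the point (recall 0, precision 1).
precision, recall, thresholds = precision_recall_curve(y_true, scores)
ap = average_precision_score(y_true, scores)

# A no-skill classifier has precision equal to the share of positive cases.
baseline = y_true.mean()
print(round(ap, 4), baseline)
```

The two top-scored cases are positive, so precision stays at 1 up to a recall of 2/3; the average precision is 11/12 ≈ 0.917, well above the 0.375 baseline.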

PR curve.
The performance of the proposed predictive model was evaluated against several well-established life-inspired hybrid models, namely DE-SVM, GWO-SVM, PSO-SVM, and GA-SVM, whose key principles were explained in the preceding section. Tables 2–6 compare several metrics between the Camel-SVM predictive model and the existing hybrid models on the five datasets. The Camel-SVM predictive model achieved classification accuracies of 95.9%, 70.9%, 88%, 57.6%, and 94.7% on datasets 1 through 5, respectively, demonstrating superior performance across the datasets. Tables 2–6 also compare the number of features selected by the Camel-SVM model for each corresponding dataset; details of the feature selection process are provided in the previous section. For the CA with traveling behaviour, the best fitness value represents the optimal solution identified during the search (traveling) process, reflecting how effectively the algorithm explores and exploits the search space. The Camel-SVM model attained best fitness values of 0.06, 0.21, 0.15, 0.25, and 0.09 on datasets 1 to 5, respectively. Python provides a built-in module, timeit, that offers a straightforward way to measure the execution time of small code snippets by running them many times, from which an average time per run can be derived. The Camel-SVM model had the lowest average execution time among all models on datasets 2, 4, and 5, whereas its execution times on datasets 1 and 3 were comparatively higher. Notably, the results of all models were evaluated using 10-fold cross-validation to improve the robustness of the comparison across the five multi-centre datasets.
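The evaluation protocol can be sketched with 10-fold cross-validation and timeit. The stratified, shuffled folds, the seed, the use of the Dataset 1 hyperparameters, and timing the whole cross-validation run are assumptions; the study does not state exactly which code block it timed. timeit.repeat returns the total time of number runs for each repetition, so dividing by number gives a per-run time; the timeit documentation recommends taking the minimum over repetitions rather than the mean.

```python
import timeit

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=2.61))

# 10 folds that preserve the class ratio; shuffling with a fixed seed is an assumption.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"10-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Each of the 3 repetitions runs the full 10-fold evaluation n_runs times.
n_runs = 3
times = timeit.repeat(lambda: cross_val_score(model, X, y, cv=cv), repeat=3, number=n_runs)
print(f"time per 10-fold run (best of 3): {min(times) / n_runs:.3f} s")
```

Reporting the spread of the ten fold accuracies alongside their mean shows how stable each model is across folds, which matters most for the smaller datasets.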
Using the search space of kernel functions and logarithmically spaced C and gamma values described in the Methods, the Camel algorithm with traveling behaviour automatically evolved the SVM hyperparameters for each dataset. Datasets 1 and 5 produced linear SVM kernels with C values of 2.61 and 0.14, respectively. Datasets 2 and 3 generated RBF kernels, with C values of 1.0 and 4.61 and gamma values of 0.26 and 0.21, respectively. Dataset 4, however, evolved a sigmoid kernel function with a C value of 0.82. A higher C value penalizes misclassified training samples more heavily, so the CA favours configurations that minimize training errors at the cost of a narrower margin, while a smaller gamma value gives the RBF kernel a broader similarity radius, so that more distant points are still treated as similar. The two largest C values (datasets 3 and 1) were obtained on datasets with high classification accuracy, consistent with the CA prioritizing fewer misclassified cases; however, dataset 5 reached high accuracy with C = 0.14, so C alone does not account for the differences in accuracy. Thus, the newly developed life-inspired hybrid model can serve as an alternative diagnostic tool to assess the growing complexity of breast cancer, helping clinicians deliver more effective treatment to patients.
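The gamma interpretation follows from the RBF kernel, K(x, x′) = exp(−γ‖x − x′‖²): for a fixed distance, a smaller γ gives a larger similarity, so each support vector influences a wider neighbourhood. The snippet evaluates the kernel for the two γ values reported above and for γ = 1, using two invented points at squared distance 2.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

a = np.array([[0.0, 0.0]])
b = np.array([[1.0, 1.0]])  # squared Euclidean distance to a is 2

similarity = {}
for gamma in (0.21, 0.26, 1.0):
    # rbf_kernel computes exp(-gamma * ||a - b||^2) for every pair of rows.
    similarity[gamma] = rbf_kernel(a, b, gamma=gamma)[0, 0]
    print(gamma, round(similarity[gamma], 4))
```

The similarity drops from about 0.657 at γ = 0.21 to 0.595 at γ = 0.26 and 0.135 at γ = 1, so the smaller γ evolved for dataset 3 treats the same pair of patients as more alike.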
Performance metric comparisons between Camel-SVM predictive model and other existing models on dataset 1.
Dataset1 = Breast Cancer Wisconsin (Diagnostic).
SVM = Support Vector Machine.
DE-SVM = Differential Evolution-SVM model.
GWO-SVM = Grey Wolf optimization-SVM model.
PSO-SVM = Particle Swarm Optimization-SVM model.
GA-SVM = Genetic Algorithm-SVM model.
Performance metric comparisons between Camel-SVM predictive model and other existing models on dataset 2.
Dataset2 = University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia.
SVM = Support Vector Machine.
DE-SVM = Differential Evolution-SVM model.
GWO-SVM = Grey Wolf optimization-SVM model.
PSO-SVM = Particle Swarm Optimization-SVM model.
GA-SVM = Genetic Algorithm-SVM model.
Rbf = Radial Basis Function.
Performance metric comparisons between Camel-SVM predictive model and other existing models on dataset 3.
Dataset3 = National Institute of Oncology, Rabat, Morocco breast cancer dataset.
SVM = Support Vector Machine.
DE-SVM = Differential Evolution-SVM model.
GWO-SVM = Grey Wolf optimization-SVM model.
PSO-SVM = Particle Swarm Optimization-SVM model.
GA-SVM = Genetic Algorithm-SVM model.
Rbf = Radial Basis Function.
Performance metric comparisons between Camel-SVM predictive model and other existing models on dataset 4.
Dataset4 = NKP Salve Institute of Medical Sciences and Research Centre; Nagpur.
SVM = Support Vector Machine.
DE-SVM = Differential Evolution-SVM model.
GWO-SVM = Grey Wolf optimization-SVM model.
PSO-SVM = Particle Swarm Optimization-SVM model.
GA-SVM = Genetic Algorithm-SVM model.
Performance metric comparisons between Camel-SVM predictive model and other existing models on dataset 5.
Dataset5 = Lagos university, Nigeria breast cancer dataset.
SVM = Support Vector Machine.
DE-SVM = Differential Evolution-SVM model.
GWO-SVM = Grey Wolf optimization-SVM model.
PSO-SVM = Particle Swarm Optimization-SVM model.
GA-SVM = Genetic Algorithm-SVM model.
Statistical tools such as heatmaps and correlation matrices help determine the correlation between clinicopathological markers, which facilitates risk assessment and classification in medical research, including studies on breast cancer. Heatmaps are data visualization tools that use colour gradients to illustrate the strength of the dependency between clinicopathological attributes in a dataset. The heatmaps used blue and red shades to represent positive and negative correlations, respectively, between the clinicopathological components, with darker shades indicating correlations of greater magnitude. The dark blue diagonal of each heatmap represents the correlation of each variable with itself, which is always 1. Figures 6, 7, 8, and 9 display the correlation heatmaps for datasets 2, 3, 4, and 5, respectively. In dataset 2, strong positive associations were observed between age and menopause, and between invading nodes and node-caps. Furthermore, tumor size was positively correlated with invading nodes, node-caps, and degree of malignancy. Irradiation and class also showed a strong positive association. In dataset 3, age, menopause, the number of full-term pregnancies, hormone therapy, lymph nodes, tumor size and surgical type, and tumor progression showed positive correlations. In the heatmap of dataset 4, a strong association was evident between size at presentation, duration in months, and tumor quadrant. Additionally, tumor size was strongly correlated with the type of surgery, adjuvant chemotherapy, and radiotherapy, and a notable relationship was observed between node stage and TNBC/non-TNBC status. According to the heatmap for dataset 5, age was positively correlated with menopausal status, nutritional status, hypertension, and the presence of comorbidities.
Moreover, histological type, disease stage, and the presence of metastasis were correlated with one another.
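A correlation heatmap is drawn from a matrix such as the one computed below. The data are invented, and Pearson correlation on ordinal codes is an assumption, since the study does not state which coefficient it used (method="spearman" is an alternative for ordinal variables). Each variable pair is listed once and ranked by absolute correlation to surface the strongest associations.

```python
import numpy as np
import pandas as pd

# Hypothetical ordinal-encoded clinicopathological variables for eight patients.
df = pd.DataFrame({
    "age_group": [2, 3, 4, 5, 3, 4, 6, 5],
    "menopause": [0, 0, 1, 1, 0, 1, 1, 1],
    "tumor_size": [3, 5, 4, 6, 2, 7, 5, 6],
    "inv_nodes": [0, 1, 0, 2, 0, 3, 1, 2],
})

corr = df.corr(method="pearson")  # symmetric matrix with 1.0 on the diagonal

# Rank each variable pair once (upper triangle, diagonal excluded) by |correlation|.
cols = corr.columns
row_idx, col_idx = np.triu_indices(len(cols), k=1)
pairs = pd.Series(
    corr.to_numpy()[row_idx, col_idx],
    index=[f"{cols[a]} ~ {cols[b]}" for a, b in zip(row_idx, col_idx)],
).sort_values(key=np.abs, ascending=False)

print(corr.round(2))
print(pairs.round(2))
```

The matrix can then be rendered with, for example, seaborn.heatmap(corr, cmap="RdBu", vmin=-1, vmax=1); the "RdBu" colormap maps negative values to red and positive values to blue, matching the colour convention described above.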

Heatmap on dataset 2.

Heatmap on dataset 3.

Heatmap on dataset 4.

Heatmap on dataset 5.
Another statistical test, the chi-square test, is used to measure the goodness of fit between observed and expected data or to ascertain whether two categorical variables are significantly associated. The purpose of this test is to assess whether the disparity between observed and expected frequencies can be attributed to random chance or indicates a notable relationship between the clinicopathological variables being studied. For each cell of the contingency table, the difference between the observed and expected frequencies is squared and divided by the expected frequency, and these contributions are summed over all cells:
$$\tau = \sum_{i}\sum_{j}\frac{\left(O_{ij} - E_{ij}\right)^{2}}{E_{ij}}$$
The statistic is then compared with a chi-square distribution whose degrees of freedom depend on the number of feature labels and class labels, (r − 1)(c − 1) for a table with r rows and c columns, to obtain the p-value.
In this context, τ denotes the chi-square value, Oij the observed frequency, and Eij the expected frequency of the clinicopathological variable, computed as the product of the corresponding row and column totals divided by the total number of observations. The chi-square test was executed in Python 3.11.2. The output includes the chi-square score, chi-square p-value, F-score, F-score p-value, and the mutual information between the clinicopathological variables. In dataset 2, the clinicopathological variables that showed statistical significance (p < 0.05) include tumor size, invading nodes, node caps, degree of malignancy, and irradiation. In dataset 3, hormone therapy and progression (metastasis/relapse) were statistically significant (p < 0.05). The chi-square test on dataset 4 identified duration in months, axillary mass, duration of breastfeeding in months, tumor stage, and node stage as statistically significant clinicopathological parameters (p < 0.05). In dataset 5, patient height, BMI, family history of breast cancer, comorbidities, allergies, and hormone receptor status were statistically significant (p < 0.05). These findings indicate that breast cancer outcomes and subtypes are associated with multiple risk factors, including hormone therapy, metastasis/relapse, and hormone receptor status. For further information, the original studies48,50,51 provide an in-depth statistical analysis of clinicopathological factors in both the TNBC and non-TNBC subgroups.
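The statistic can be computed directly from a contingency table and checked against SciPy. The 2 × 3 table below is invented (class versus degree of malignancy); correction=False disables the Yates continuity correction, which SciPy applies only when there is one degree of freedom, so it is stated explicitly here for clarity.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Hypothetical contingency table: rows = recurrence / no recurrence,
# columns = degree of malignancy 1, 2, 3.
observed = np.array([[10, 25, 40],
                     [50, 80, 35]])

# Expected frequency of each cell = row total * column total / grand total.
row_tot = observed.sum(axis=1, keepdims=True)
col_tot = observed.sum(axis=0, keepdims=True)
expected = row_tot * col_tot / observed.sum()

tau = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = chi2.sf(tau, dof)  # upper-tail probability of the chi-square distribution

stat, p, dof_scipy, expected_scipy = chi2_contingency(observed, correction=False)
print(round(tau, 3), dof, p_value)
```

For feature screening, scikit-learn's sklearn.feature_selection module provides chi2, f_classif, and mutual_info_classif, which return chi-square scores with p-values, F-scores with p-values, and mutual information estimates, respectively; whether the study used these functions is not stated.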
Clinical significance refers to the practical importance of a medical factor in improving patient outcomes and informing clinical decisions. From the preceding results, tumor size, tumor grade, hormone receptor status, and hormone therapy emerged as the clinically significant factors in breast cancer diagnosis. Tumor size is a critical factor in cancer staging (such as the T stage in TNM staging) and helps guide treatment decisions, including the feasibility of surgery and the need for chemotherapy or radiation therapy. It is also a crucial prognostic factor, as larger tumors are typically associated with a poorer prognosis owing to their greater potential for metastasis. Tumor grade plays a pivotal role in assessing prognosis and guiding decisions about the aggressiveness of treatment; higher-grade tumors are linked to an increased risk of recurrence and metastasis. Hormone receptors, namely the estrogen receptor (ER) and progesterone receptor (PR), are proteins located in or on breast cancer cells. The combined assessment of ER, PR, and HER2 (a growth factor receptor) status helps categorize breast cancer into distinct molecular subtypes. Knowing the hormone receptor status enables clinicians to make evidence-based decisions, improving patient outcomes and guiding the selection of effective, individualized treatment strategies. Hormone therapy is another clinically significant factor in discriminating TNBC from non-TNBC breast cancers. TNBC tends to be more aggressive and has a higher risk of recurrence than hormone receptor-positive subtypes, and the absence of effective hormone therapy poses a significant clinical challenge. Patients with TNBC often rely on alternative treatment strategies, such as chemotherapy, immunotherapy, or targeted therapies like PARP inhibitors, particularly in cases with BRCA mutations. In conclusion, the presence or absence of hormone receptor expression defines the clinical significance of hormone therapy in breast cancer management, highlighting a major difference between the TNBC and non-TNBC subtypes.
Discussion
The Camel-SVM predictive model outperformed four widely recognized models, DE-SVM, GWO-SVM, PSO-SVM, and GA-SVM, in terms of classification accuracy. The hybrid models were additionally analysed for their selected feature count, best fitness value, and execution time, as well as their ROC and precision-recall curves. The results indicate shorter execution times on datasets 2, 4, and 5, higher AUC scores, and lower MSE and Log Loss values. Moreover, the ROC and PR curves further support the overall classification ability of the model. The PR curve offered more insight into model performance on the positive class, especially for imbalanced datasets. Statistical analysis revealed positive correlations among various clinicopathological parameters, as shown by the heatmaps, and the chi-square test was used to identify statistically significant features. From a clinical perspective, tumor size, tumor grade, hormone receptor status, and hormone therapy were identified as significant features following the application of the Camel-SVM predictive model to five breast cancer datasets. The advantage of analysing breast cancer across several datasets, most of which include clinical, pathological, and demographic features, lies in identifying clinically significant factors that ultimately support patient diagnosis and treatment management. This reinforces the idea that the Camel-SVM predictive model can act as an alternative diagnostic tool to navigate the increasing complexity of breast cancer, optimize treatment strategies, and offer personalized care for various patient populations.
The strong performance metrics achieved by the Camel-SVM model indicate its capability to make precise predictions, supporting its reliability and applicability in real-world scenarios. High values of metrics such as precision, recall, AUC, and F1-score suggest that the model can generalize to unseen data and produce robust, reliable results. This creates opportunities to deploy the model in practical applications, where it can assist with tasks such as medical diagnosis, decision-making, or optimization in various sectors, tailored to the needs of each domain. It also supports the suitability of the selected algorithm, features, and overall methodology used in model development. The medical significance of the Camel-SVM model's high classification accuracy in breast cancer detection lies in its ability to discriminate reliably between malignant and benign tumors. It can classify patients into specific subtypes, such as TNBC versus non-TNBC, and has the potential to predict the risk of recurrence versus non-recurrence, aiding more tailored treatment strategies and follow-up care. By recognizing patterns in patient data, including tumor size, grade, and molecular characteristics, the model can help recommend personalized treatment plans and optimize therapy options based on individual patient profiles. With its high accuracy, the Camel-SVM model can serve as a valuable support tool for clinicians, helping to reduce the risk of human error and ensuring that treatment decisions are informed by reliable, data-driven insights.
To our knowledge, the application of the Camel-SVM model to breast cancer detection has not been investigated in the existing literature. While similar strategies have been used for tasks such as in silico toxicity prediction, this study is a novel attempt to investigate its effectiveness for breast cancer diagnosis. The research of Bamba et al. 92 addresses the complexities of toxicity prediction by integrating CA with an SVM model, specifically targeting the NR-AhR toxicity type. In that study, CA was used exclusively for feature selection, and the optimal SVM kernel was determined on the Tox21 Data Challenge dataset. In contrast, our study applied CA in two consecutive steps, first for hyperparameter tuning of the SVM model and then for feature selection, an approach that yielded better results than the other hybrid models compared. Al-Waily et al. 93 made several significant modifications to the original CA to improve its performance, with a particular focus on complex engineering optimization problems. Their modified CA adaptively adjusts the exploration phase according to the quality of the current solution, allowing more exploration in the initial stages and shifting the focus to exploitation as it nears the optimal solution. Demiral et al. 94 applied an enhanced variant of the Camel Algorithm to the Traveling Salesman Problem and demonstrated its effectiveness on a series of well-established benchmark datasets; their modified CA outperformed Simulated Annealing (SA), Tabu Search (TS), the Genetic Algorithm (GA), and the original CA on 60% of the analysed datasets. Demiral et al. 95 also applied the modified CA to labour management by combining it with widely used heuristics, such as a constructive heuristic.
To refine the results, the approach incorporated local search techniques, achieving efficient and reasonable solutions for labour management tasks within acceptable CPU times across various randomly generated datasets. However, the distinctive mechanics of the CA may involve parameter or behavioural settings that are challenging to implement or tune, which could hinder its wider adoption.
Our study applied the Camel-SVM predictive model to breast tumor classification and compared its performance against four well-established hybrid models. However, numerous other metaheuristic hybrid models in this domain remain unexplored in this research. Breast cancer analysis using hospital-collected datasets with clinicopathological features often involves limited dataset sizes. Small samples often lack sufficient diversity to explore subgroup differences, restricting the depth and scope of the analysis. Hormone receptor data were also scarce, which could potentially be attributed to financial constraints faced by patients in economically disadvantaged regions such as parts of Africa and India. To partially mitigate these limitations, a 10-fold cross-validation approach was used for all datasets. This study classifies the TNBC and non-TNBC subtypes of breast cancer but does not explore the several other TNBC variants that exist. The Camel Algorithm is relatively new compared with well-established algorithms such as GA and PSO. Consequently, the lack of in-depth comparative studies of its effectiveness across different problem domains has limited its visibility and credibility within the research community. This study aimed to demonstrate the effectiveness of the CA by hybridizing it with SVM to advance breast cancer detection, while acknowledging the potential limitations of this research outlined above.
Conclusion
This study highlights the effectiveness of applying the CA to optimize an SVM for breast cancer detection. The hybrid CA-SVM model shows promising performance, using the optimization capabilities of the CA to improve the predictive accuracy of the SVM. By addressing both feature selection and parameter optimization, the proposed approach offers a reliable method for distinguishing between TNBC and non-TNBC subtypes. Although this research underscores the potential of the CA-SVM model, further exploration with larger and more diverse datasets, as well as comparisons with other meta-heuristic hybrid models, is needed to establish its robustness and generalizability.
This study provides key insights into the clinical relevance of tumor characteristics such as size, grade, and hormone receptor status, and of the impact of hormone therapy. The findings highlight the critical role these factors play in understanding disease progression, tailoring treatment strategies, and predicting patient outcomes. Tumor size and grade correlated with disease severity, while hormone receptor status and the effectiveness of hormone therapy emerged as important factors in personalized treatment planning. These results emphasize the need to integrate such clinical parameters into routine diagnostic and therapeutic frameworks to enhance patient care. Further research with diverse datasets and larger sample sizes could build on these findings and offer a deeper understanding of their implications across varied patient populations.
Footnotes
Acknowledgments
The authors would like to express their sincere gratitude to the Institute of Engineering and Management, University of Engineering and Management Kolkata, for providing the necessary facilities to carry out this research.
Ethical consideration
Five datasets of breast cancer patients were obtained from publicly accessible repositories: the UCI ML Repository and BioStudies. Therefore, ethical clearance was not required.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Grant-in-Aid Project of the IEM-UEM Group through Grant No. IEM(N)/2025/S/08-G113. The authors sincerely appreciate this support, which contributed significantly to the execution and advancement of the present study.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
