Abstract
The segmentation of cancerous tumours, particularly brain tumours, is of paramount importance in medicine because it determines the extent of tumour lesions. However, conventional segmentation approaches are less effective at delineating the exact extent of brain tumours and are time-consuming, which makes the task laborious for clinicians. In this study, we propose an automatic segmentation method based on convolutional neural networks (CNNs). The new model uses the ResNet50 architecture for detection and DrvU-Net, an architecture derived from U-Net with adjustments adapted to the characteristics of medical imaging data, for segmentation. It was applied to a publicly available brain image dataset from TCGA-LGG and TCIA. Following an in-depth comparison with other recent studies, our model demonstrated its effectiveness in the detection and segmentation of brain tumours, reaching an accuracy of 96%, a Dice Similarity Coefficient (DSC) of 94%, an Intersection over Union (IoU) of 89% and a Tversky coefficient of 91.5%.
Introduction
Brain tumours represent a major challenge in both neurology and oncology [1], complicating the diagnosis, treatment and management of patients. Their variety in terms of histological type, location and clinical behaviour makes them particularly difficult to classify and treat [2]. In addition, brain tumours can cause a wide range of neurological symptoms, from headaches and visual disturbances to severe cognitive and motor deficits, which further emphasises the need for accurate and rapid identification [3, 4]. In oncology, the ability to characterise brain tumours accurately directly influences treatment options and patient prognoses [5].
Brain tumours can be classified into two main categories: malignant and benign [6]. Malignant tumours, also known as brain cancers, are characterised by rapid and invasive growth, with the potential to metastasise to other parts of the brain or body. They include common types such as glioblastoma multiforme, which is notorious for its resistance to treatment and poor prognosis [7]. Benign tumours, on the other hand, grow more slowly and are usually localised, exerting pressure on surrounding brain structures without infiltrating them. Although they may cause symptoms and require treatment, benign tumours tend to have a better prognosis than their malignant counterparts [8]. Distinguishing between malignant and benign tumours is crucial in guiding treatment decisions and predicting clinical outcome, underlining the importance of accurate classification of brain images to differentiate between these two categories of tumour [9].
Accurate classification of brain tumours as malignant or benign is of paramount importance in clinical management, guiding the treatment plan and predicting patient prognosis [10]. Malignant tumours often require an aggressive therapeutic approach, including surgery, radiotherapy and chemotherapy, to control tumour growth and prolong survival [11]. In contrast, benign tumours can often be managed more conservatively, with surgery sometimes necessary to relieve symptoms or prevent complications [12]. Accurate classification also allows the risk of tumour recurrence to be predicted and the response to treatment to be assessed, which is essential for adjusting clinical management over time. By providing a sound basis for medical decision-making, accurate classification of brain tumours helps to optimise clinical outcomes and improve patients’ quality of life [13].
Automatic segmentation of brain tumours using convolutional neural networks (CNNs) offers several significant advantages over traditional manual methods [14]. Firstly, CNNs can process large amounts of brain imaging data with remarkable efficiency, considerably speeding up segmentation compared with manual methods, which are often tedious and time-consuming [15]. Furthermore, CNNs can learn to recognise and segment complex anatomical structures and brain tumours with high accuracy, leveraging large training datasets to improve their performance [16]. This ability to generalise from diverse training data allows CNNs to adapt to a variety of tumour types and clinical scenarios, offering considerable flexibility in clinical application [17]. In addition, automatic CNN segmentation is reproducible and less prone to inter-operator variability than manual segmentation, ensuring greater consistency and reliability in the interpretation of brain images [18]. Taken together, these advantages make automatic CNN segmentation a significant advance in brain imaging analysis and a valuable tool for the accurate characterisation and quantification of brain tumours in a clinical context [19].
Automatic brain tumour segmentation techniques have evolved considerably in recent years thanks to advances in machine learning [20], in particular convolutional neural networks (CNNs) and deep learning [21]. These approaches exploit the ability of neural networks to extract complex features from large amounts of imaging data, enabling accurate and robust segmentation of brain tumours, even in complex cases with tissue heterogeneity and imaging artefacts [22].
Related work
Several previous studies have addressed the segmentation of medical images, especially brain images, using CNN-based machine learning techniques. For example, Havaei et al. (2017) proposed a brain tumour segmentation method based on a patch-wise 2D CNN architecture. Using the BRATS-2013 dataset, they obtained a Dice coefficient (DC) of 0.88 for both the input-cascade and local-cascade CNN models. However, the study did not report other important metrics such as intersection over union (IoU), pixel accuracy or Jaccard distance [23].
Pereira et al., worked on the same dataset but found small 3
Rajinikanth et al. (2018) proposed a method to segment brain tumours from two-dimensional MRI images of the BRATS 2015 dataset. The MRI data are first preprocessed with a Tsallis entropy supervised thresholding technique, and the tumour cross-section is then identified with a regularised level-set technique. The experiments reported a DSC of 0.89 [29]. Brosch et al. applied a semantic segmentation approach using a 3D convolutional neural network (CNN) on the MICCAI 2008 and ISBI 2015 datasets to segment multiple sclerosis (MS) lesions, obtaining Dice coefficients of 0.84 and 0.68, respectively [30].
Zeineldin et al. (2022), in the BraTS 2022 Challenge, introduced an ensemble of deep learning frameworks (DeepSeg, nnU-Net and DeepSCAN) for the automatic segmentation of gliomas in pre-operative MRI scans. The ensemble achieved Dice scores of 0.9294, 0.8788 and 0.8803 for the whole tumour, tumour core and enhancing tumour, respectively, and outperformed other methods in the final evaluations on both the BraTS testing dataset and an unseen Sub-Saharan Africa dataset [31].
Noreen et al. (2023) investigated brain tumour classification with an array of deep- and machine-learning techniques, including softmax, Random Forest, Support Vector Machine (SVM), K-Nearest Neighbours and an ensemble method, and compared the outcomes with existing methods. The Inception-v3 model performed best, attaining a test accuracy of 94.34%, which the authors suggest could support clinical applications in brain tumour analysis [32].
Reza et al. (2023) aimed to devise a classification approach that is more accurate, cost-effective and self-training, using an extensive collection of authentic data rather than augmented data. A customised VGG-16 (Visual Geometry Group) architecture was used to classify 10,153 MRI images into three classes (glioma, meningioma and pituitary tumour). The network achieved an overall accuracy of 99.5% and precision of 99.4% for gliomas, 96.7% for meningiomas and 100% for pituitary tumours [33].
Srinivasan et al. (2024) proposed a hybrid deep CNN model that uses the Grey Wolf Optimizer and rough-set theory for feature selection, achieving a classification accuracy of 98% on high-resolution brain MRI images, with higher training accuracy and lower error rates than traditional methods [34].
Methodology
Descriptive organizational chart of the adopted model.
In our study, we used a homogeneous and relevant dataset, TCGA-LGG and TCIA, to build a specific CNN model. This model uses ResNet50 for classification and a refined DrvU-Net architecture for the segmentation of brain tumours visualised by MRI. We also applied the CLAHE technique to improve the quality of the medical images, and data augmentation to generate additional images. The results were then compared with those of previous studies to assess the final results objectively. Each step of the study is explained in detail below, and Figure 1 summarises all of them.
In our study, we utilized a set of images from TCGA (The Cancer Genome Atlas) and TCIA (The Cancer Imaging Archive) [35, 36]. Initially, 120 patients with lower-grade malignant tumors of the nervous system were identified in TCGA. However, ten patients were excluded from this dataset as they required specific informed consent for genomic information. Consequently, the final cohort included in the analysis comprised 110 patients.
The imaging data used in our research were obtained from The Cancer Imaging Archive (TCIA); example images are shown in Fig. 2. This archive holds the images linked to the TCGA patients and is funded by the National Cancer Institute. On inspecting the remaining 110 patients, we found that six lacked the pre-contrast sequence, nine lacked the post-contrast sequence, and 101 had all relevant sequences. Our study therefore uses the 101 complete records, each comprising subdirectories of images and their associated masks.
Data preprocessing
Random sample images with their corresponding mask.
Example images from TCGA-LGG and TCIA dataset.
Data preprocessing is an indispensable step when training deep learning models, as it helps the model perform the selected task smoothly and accurately [37]. In our study, the images produced by the MRI system are of poor quality and difficult to exploit directly because of artefacts such as Gaussian noise and salt-and-pepper noise [38]. We therefore applied two procedures to all images before using them for automatic classification and segmentation. The first is data augmentation, which generates additional images so that the results are more reliable and the algorithm learns the image content better [39]. The second improves image contrast to facilitate accurate segmentation [40]. Figure 3 illustrates the state of the images in our dataset.
Parameter setting for traditional data augmentation method
Examples of images generated through data augmentation.
Data augmentation stands out as a pre-eminent strategy for generating many images and compensating for a lack of data, particularly when data are difficult to access [41]. In this study, various augmentations were applied to systematically increase the number of images. The parameters adopted for our data preparation are given in Table 1 and illustrated in Fig. 4.
Example of an application of the CLAHE technique.
The CLAHE (Contrast Limited Adaptive Histogram Equalisation) technique is an image processing method widely used to improve image contrast. Unlike classic histogram equalisation [42], which redistributes pixel intensities using a single histogram for the whole image, CLAHE adjusts contrast locally: it computes a separate intensity mapping for each small region (tile) and therefore takes local variations in contrast into account. This makes it possible to enhance contrast in dark areas while avoiding overexposure in bright areas [43]. In addition, CLAHE clips each local histogram at a set limit before equalisation, which prevents excessive noise amplification. Image detail is thus better preserved and relevant features are highlighted, which is particularly beneficial in medical imaging such as our MRI study, where the accuracy of tumour segmentation and classification is crucial. Figure 5 shows an image before and after applying CLAHE [44].
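To make the contrast-limiting and local-mapping steps concrete, here is a compact NumPy sketch of CLAHE for 8-bit greyscale images. It is not the implementation used in the study: the 8 × 8 tile grid and the clip factor of 2.0 (expressed as a multiple of the mean histogram bin count, in the spirit of OpenCV's `clipLimit`) are assumed values. Each pixel's output is bilinearly interpolated between the mappings of the four nearest tile centres so that tile borders do not show. In practice, OpenCV's `cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))` followed by `.apply(image)` is the usual route.

```python
import numpy as np

def _tile_lut(tile, clip_factor, n_bins=256):
    """Clipped-histogram equalisation mapping (a look-up table) for one tile."""
    hist = np.bincount(tile.ravel(), minlength=n_bins).astype(np.float64)
    # Clip limit as a multiple of the mean bin count (never below 1).
    limit = max(1.0, clip_factor * tile.size / n_bins)
    excess = np.sum(np.maximum(hist - limit, 0.0))
    hist = np.minimum(hist, limit) + excess / n_bins  # redistribute clipped counts
    cdf = np.cumsum(hist)
    return cdf / cdf[-1] * (n_bins - 1)

def _neighbours(coords, centres):
    """Two nearest tile centres along one axis and the interpolation weight."""
    n = len(centres)
    lo = np.clip(np.searchsorted(centres, coords, side="right") - 1, 0, n - 1)
    hi = np.minimum(lo + 1, n - 1)
    span = np.maximum(centres[hi] - centres[lo], 1e-12)
    wt = np.where(hi > lo, (coords - centres[lo]) / span, 0.0)
    return lo, hi, np.clip(wt, 0.0, 1.0)

def clahe(image, tiles=(8, 8), clip_factor=2.0):
    """Contrast Limited Adaptive Histogram Equalisation for a 2D uint8 image."""
    img = np.asarray(image, dtype=np.uint8)
    h, w = img.shape
    ys = np.linspace(0, h, tiles[0] + 1).astype(int)   # tile row boundaries
    xs = np.linspace(0, w, tiles[1] + 1).astype(int)   # tile column boundaries
    luts = np.array([[_tile_lut(img[ys[i]:ys[i + 1], xs[j]:xs[j + 1]], clip_factor)
                      for j in range(tiles[1])] for i in range(tiles[0])])
    cy = (ys[:-1] + ys[1:] - 1) / 2.0                  # tile centre rows
    cx = (xs[:-1] + xs[1:] - 1) / 2.0                  # tile centre columns
    i0, i1, wy = _neighbours(np.arange(h), cy)
    j0, j1, wx = _neighbours(np.arange(w), cx)
    I0, I1, WY = i0[:, None], i1[:, None], wy[:, None]
    J0, J1, WX = j0[None, :], j1[None, :], wx[None, :]
    # Bilinear blend of the four neighbouring tile mappings, per pixel.
    out = ((1 - WY) * ((1 - WX) * luts[I0, J0, img] + WX * luts[I0, J1, img])
           + WY * ((1 - WX) * luts[I1, J0, img] + WX * luts[I1, J1, img]))
    return np.clip(np.round(out), 0, 255).astype(np.uint8)
```

Lowering `clip_factor` towards 1 limits the contrast gain (and the noise amplification); raising it makes each tile behave more like plain histogram equalisation.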
ResNet-50 is a deep convolutional neural network (CNN) architecture that has become very popular owing to its efficiency and performance across a wide range of computer vision tasks. The "50" in its name refers to its 50 layers. What distinguishes ResNet-50 from other architectures is its use of residual blocks, which make it possible to build deeper networks while mitigating the vanishing-gradient problem. Each residual block contains a skip connection that bypasses one or more convolution layers, easing the flow of the gradient during training. Thanks to this innovation, ResNet-50 can be trained successfully on massive datasets such as ImageNet, achieving strong accuracy and generalisation. Because of its depth and residual design, ResNet-50 is often used as a base model for transfer learning in many computer vision applications [45, 46].
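The residual idea can be sketched in a few lines of NumPy. The block below follows the bottleneck layout used in ResNet-50 (1×1 reduction, 3×3 convolution, 1×1 expansion, then a skip connection that adds the input back), but, as a simplifying assumption, it omits batch normalisation and strided downsampling, and the weight shapes are illustrative.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv1x1(x, w):
    """1x1 convolution: x has shape (H, W, C_in), w has shape (C_in, C_out)."""
    return x @ w

def conv3x3(x, w):
    """3x3 convolution, stride 1, zero padding 1; w has shape (3, 3, C_in, C_out)."""
    h, wd, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, wd, w.shape[3]))
    for dy in range(3):
        for dx in range(3):
            out += xp[dy:dy + h, dx:dx + wd] @ w[dy, dx]
    return out

def bottleneck_block(x, w_reduce, w_mid, w_expand):
    """Residual bottleneck: 1x1 reduce -> 3x3 -> 1x1 expand, plus identity skip."""
    f = relu(conv1x1(x, w_reduce))
    f = relu(conv3x3(f, w_mid))
    f = conv1x1(f, w_expand)
    return relu(f + x)  # the skip connection adds the input back before the ReLU
```

Because the block computes relu(F(x) + x), the derivative of F(x) + x with respect to x is ∂F/∂x + I: the identity term gives the gradient a direct path through the block. With all weights at zero the block reduces to relu(x), so additional blocks can start close to an identity mapping instead of degrading the signal.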
VGG-16 architecture
The VGG-16 model is a convolutional neural network (CNN) widely used for image classification. With its 16 layers, including 3
VGG-19 architecture
The VGG-19 model is an extension of the VGG-16, adding three extra convolution layers. With a total of 19 layers, it offers an even deeper architecture for image classification. Like its predecessor, it uses 3
DrvU-Net architecture
The proposed model implements an architecture derived from U-Net, a convolutional neural network widely used for image segmentation, particularly in medical applications such as brain tumour segmentation. The architecture is divided into three key parts: the encoder, the transition (bottleneck) and the decoder.
In the encoder, image features are extracted by convolution layers, each followed by pooling layers that progressively reduce the spatial resolution while preserving important information. In the transition block, the extracted features are then consolidated by further convolution layers.
In the decoder, the consolidated features are progressively restored to the original resolution by transposed convolution layers. At each stage, information from the corresponding encoder layers is also incorporated through residual or skip connections to improve segmentation accuracy.
The model outputs a segmentation map in which each pixel is assigned a probability between 0 and 1 of belonging to the tumour; thresholding this map classifies each pixel as tumour or non-tumour. The model can then be trained on a brain image dataset to perform automatic brain tumour segmentation on new images.
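The encoder-transition-decoder flow described above can be traced with a toy NumPy forward pass. This is a hypothetical, heavily reduced sketch, not DrvU-Net itself: a single 1×1 channel-mixing layer stands in for each block of convolutions, there is only one resolution level, the skip connection is a U-Net-style channel concatenation, and the weights in `params` are random placeholders rather than trained values.

```python
import numpy as np

def max_pool2x2(x):
    """2x2 max pooling, stride 2, on an (H, W, C) array (odd edges are dropped)."""
    h, w, c = x.shape
    x = x[:h - h % 2, :w - w % 2]
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def conv_transpose2x2(x, w):
    """2x2 transposed convolution, stride 2: (H, W, C_in) -> (2H, 2W, C_out)."""
    h, wd, _ = x.shape
    out = np.zeros((2 * h, 2 * wd, w.shape[3]))
    for dy in range(2):
        for dx in range(2):
            out[dy::2, dx::2] = x @ w[dy, dx]   # each input pixel fills a 2x2 patch
    return out

def sigmoid(z):
    return 0.5 * (1.0 + np.tanh(0.5 * z))      # numerically stable logistic function

def tiny_unet_forward(x, params):
    """One-level encoder / transition / decoder pass returning an (H, W) probability map."""
    e1 = np.maximum(x @ params["enc"], 0.0)     # encoder features     (H, W, C)
    p1 = max_pool2x2(e1)                        # downsampled          (H/2, W/2, C)
    b = np.maximum(p1 @ params["mid"], 0.0)     # transition           (H/2, W/2, 2C)
    u1 = conv_transpose2x2(b, params["up"])     # upsampled            (H, W, C)
    d1 = np.concatenate([u1, e1], axis=-1)      # skip connection      (H, W, 2C)
    logits = d1 @ params["head"]                # per-pixel score      (H, W, 1)
    return sigmoid(logits)[..., 0]              # tumour probability per pixel
```

For a 16 × 16 single-channel input with 4 feature channels, `params` would hold arrays of shape (1, 4), (4, 8), (2, 2, 8, 4) and (8, 1) for `enc`, `mid`, `up` and `head`; the returned map has the input's spatial size, as the segmentation output must.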
Results
Evaluation criteria
To evaluate the performance of the adopted model, we used the metrics most widely adopted in studies evaluating CNN-based models. The evaluation criteria used in this study are defined below:
Accuracy: The ratio of correctly predicted instances to the total instances in the dataset, often expressed as a percentage. It is a measure of overall model correctness.
Precision: Also known as Positive Predictive Value, it is the ratio of True Positives to the sum of True Positives and False Positives. It represents the accuracy of positive predictions.
Dice Similarity Coefficient: A metric used to measure the similarity between two sets. In the context of image segmentation, it quantifies the agreement between the predicted and true segmentation masks.
Intersection over Union (IoU): Also known as Jaccard Index, it is the ratio of the intersection of the predicted and true sets to their union. It is commonly used in image segmentation tasks.
Tversky index: An evaluation measure that compares the similarity between two sets and is used to assess a segmentation model against a reference or ground truth. It takes into account both the overlap between the segmented and reference regions and the disagreement between them, weighting false positives and false negatives separately.
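The criteria above can all be computed from pixel-wise counts of true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN). The sketch below is a minimal NumPy version under stated assumptions: the probability map is binarised at a threshold of 0.5, the Tversky weights are α = 0.7 on FN and β = 0.3 on FP (a common choice, not taken from the study), and the focal Tversky loss uses the soft (probability-based) index with an assumed exponent γ = 0.75. Recall (sensitivity) is included because it is reported in the results. With α = β = 0.5 the Tversky index reduces to the Dice coefficient, and with α = β = 1 it reduces to IoU.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Pixel-wise (TP, FP, FN, TN) for two binary masks (bool or 0/1 arrays)."""
    t = np.asarray(y_true, dtype=bool)
    p = np.asarray(y_pred, dtype=bool)
    return (int(np.sum(t & p)), int(np.sum(~t & p)),
            int(np.sum(t & ~p)), int(np.sum(~t & ~p)))

def segmentation_metrics(y_true, prob_map, threshold=0.5, alpha=0.7, beta=0.3, eps=1e-7):
    y_pred = np.asarray(prob_map) >= threshold        # binary map from probabilities
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),  # (TP+TN) / all pixels
        "precision": tp / (tp + fp + eps),            # TP / (TP+FP)
        "recall": tp / (tp + fn + eps),               # sensitivity, TP / (TP+FN)
        "dice": 2 * tp / (2 * tp + fp + fn + eps),    # 2|A∩B| / (|A|+|B|)
        "iou": tp / (tp + fp + fn + eps),             # |A∩B| / |A∪B|
        "tversky": tp / (tp + alpha * fn + beta * fp + eps),
    }

def focal_tversky_loss(y_true, prob_map, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-7):
    """Soft Tversky index computed on probabilities, then loss = (1 - TI) ** gamma."""
    t = np.asarray(y_true, dtype=np.float64)
    p = np.asarray(prob_map, dtype=np.float64)
    tp = np.sum(t * p)
    fn = np.sum(t * (1.0 - p))
    fp = np.sum((1.0 - t) * p)
    ti = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return (1.0 - ti) ** gamma
```

Scores can be computed per image and averaged, or from counts pooled over the whole test set; the two conventions give slightly different numbers, so it helps to state which one a table uses.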
The study showed promising results for the brain tumour detection task. The DrvU-Net model based on the ResNet50 architecture detected brain tumours, in both tumour and non-tumour cases, more effectively than the DrvU-Net models based on the VGG-19 and VGG-16 architectures. The confusion matrices (Table 4 and Fig. 9) highlight the effectiveness of the (DrvU-Net & ResNet50) model in terms of precision, i.e. the reliability of its positive predictions, with a precision of 0.96, whereas the (DrvU-Net & VGG-16) and (DrvU-Net & VGG-19) models reach precisions of 0.87 and 0.76, respectively. Accuracy likewise highlights the ability of the (DrvU-Net & ResNet50) model to classify cases correctly, with accuracy values of 0.92, 0.87 and 0.96 for the refined (DrvU-Net & VGG-16), (DrvU-Net & VGG-19) and (DrvU-Net & ResNet50) models, respectively. Regarding the evolution of loss and accuracy during training, Figs 6, 7 and 8 show that the (DrvU-Net & ResNet50) model is more stable than the other two, all three having been trained with the same parameters (Table 3).
Data distribution and model training parameters for classification
Training results of DrvU-Net model with various CNN architectures
Performance metrics of DrvU-Net model with different CNN architectures on the test set
Data distribution and model training parameters for segmentation
Accuracy and loss graph for DrvU-Net & VGG-16.
Accuracy and loss graph for DrvU-Net & VGG-19.
Accuracy and loss graph for DrvU-Net & ResNet50.
Test confusion matrix for DrvU-Net model with (VGG-16, VGG-19, Resnet-50).
In the brain image segmentation study, all three models use the DrvU-Net architecture for segmentation, combined with VGG-16, VGG-19 or ResNet50 for tumour detection. The weights from the classification phase of each architecture are used to assess each model's segmentation effectiveness: the model generates a probability map giving, for each pixel of the input image, the probability that it belongs to the tumour region. A binary segmentation map is then derived from each model's output and used to evaluate the three models. The model combining ResNet50 and DrvU-Net proved particularly effective at detecting and accurately segmenting brain tumours. In terms of accuracy, it classified markedly more cases correctly than the models based on VGG-16 and VGG-19, with accuracy values of 96%, 92% and 87%, respectively (Table 5).
All model results
The ResNet50-based DrvU-Net model also identified the largest proportion of positive tumour cases: its sensitivity was 93%, compared with 89% for VGG-16 and 88% for VGG-19 (Table 6).
In this comparative study, we evaluated the performance of three different models using Dice and Jaccard coefficients. These metrics are essential for quantifying the similarity between the segments predicted by the models and the reference segments (see Fig. 13).
The Dice coefficient is a similarity measure widely used in medical image segmentation to assess model accuracy. The Dice coefficient of 95% for the ResNet50-based DrvU-Net model shows that it segments brain tumours very close to the ground truth, which is crucial in clinical applications that require accurate delineation of tumour boundaries for diagnosis and treatment planning. In contrast, the Dice coefficient of 82% for the VGG-19-based model indicates lower accuracy than ResNet50, probably because its plain (non-residual) architecture captures the subtle details of brain tumours less well. The VGG-16-based model, with a Dice coefficient of 88%, lies between the two, better than VGG-19 but below ResNet50; this difference may stem from the specific layer configurations and the way each model processes image features.
The Jaccard coefficient (IoU) is another key metric for assessing segmentation models. A Jaccard coefficient of 90% for the ResNet50-based DrvU-Net model indicates substantial overlap between the predicted segmentation and the ground truth; this is consistent with the Dice results and confirms that ResNet50 captures tumour extent accurately. By contrast, the IoU of 69% for the VGG-19-based model reflects its difficulty in matching the ground truth and the limitations of its architecture compared with ResNet50. The VGG-16 model, with an IoU of 79%, outperforms VGG-19 but remains below ResNet50, suggesting that, although it segments tumours with reasonable accuracy, it still misses some regions that ResNet50 captures.
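Because both coefficients are built from the same counts, Dice (D) and IoU (J) are linked by J = D / (2 − D), which follows from substituting D = 2TP / (2TP + FP + FN) into J = TP / (TP + FP + FN). The short check below applies this identity to the Dice values reported above; each predicted IoU rounds to the reported one. The identity is exact for a single pair of masks or for pooled pixel counts and only approximate when per-image scores are averaged, so small discrepancies would not by themselves indicate an error.

```python
def iou_from_dice(dice):
    """IoU implied by a Dice value: J = D / (2 - D)."""
    return dice / (2.0 - dice)

# (Dice, IoU) pairs as reported in the text above.
reported = {"ResNet50": (0.95, 0.90), "VGG-16": (0.88, 0.79), "VGG-19": (0.82, 0.69)}
for name, (dice, iou) in reported.items():
    print(f"{name}: IoU from Dice {iou_from_dice(dice):.3f}, reported {iou:.2f}")
```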
Focal Tversky accuracy and loss for DrvU-Net + VGG-16.
The results show that the ResNet50-based DrvU-Net model outperforms the models based on VGG-19 and VGG-16 in terms of both Dice and Jaccard coefficients. Its high Dice (95%) and Jaccard (90%) values indicate accurate and reliable brain tumour segmentation, which is essential for clinical applications. The lower performance of the VGG-16 and VGG-19 models highlights the importance of choosing the right architecture for critical segmentation tasks, and this analysis supports the choice of the ResNet50-based model in our study. The Tversky index (Figs 10, 11 and 12) points in the same direction, although the margins are small: 90.83% for ResNet50, 90.73% for VGG-16 and 90.17% for VGG-19.
Focal Tversky accuracy and loss for DrvU-Net + VGG-19.
Focal Tversky accuracy and loss for DrvU-Net + ResNet50.
Segmentation of data sets and model prediction of non-tumour or tumour regions (VGG16, VGG19, Resnet50).
Comparative analysis of achieved results
Abbreviations: IoU, intersection over union; IoU_B, IoU for background; IoU_P, IoU for polyp; mIoU, mean IoU. Note: The bold values indicate higher values.
In this study, we developed a model that uses DrvU-Net, an architecture derived from U-Net, for segmentation and the ResNet50 architecture for detection, applied to brain images from the TCGA-LGG and TCIA dataset. The study integrated two techniques: data augmentation, and Contrast Limited Adaptive Histogram Equalisation (CLAHE) to improve image contrast. The results are more promising for both the classification and the segmentation tasks than those of models based on the VGG-16 and VGG-19 architectures. Our model also compares favourably with previous studies that had a similar segmentation objective, whether on the TCGA-LGG and TCIA dataset or on other datasets containing the same type of medical images. Despite the relatively small size of the dataset, the results met our expectations. Table 7 provides a comprehensive comparison between previous models and ours.
The study demonstrates the ability of the proposed DrvU-Net model to segment brain images effectively, helped by the CLAHE technique, which improved image quality. The resulting performance is in line with what is expected of CNN-based models for both classification and segmentation. Data augmentation also proved important during training, as it efficiently generated new images despite the limited availability of data.
Ethical approval
This work does not involve any experiments on humans or animals. It solely relies on data in the form of image sets, as discussed in the Data section.
Availability of supporting data
In our study, we used a publicly accessible dataset of MRI images from TCGA; all pertinent information is given in the "DATA" section of the article.
Competing interests
This work was carried out as part of the preparation of Mr. Halloum Kamal's doctoral thesis under the supervision of Professor Ez-Zahraouy Hamid. The project began three years ago with the objective of developing a system for treating cancerous tumours using CNN algorithms. This objective encompasses the following studies:
Enhancement of the quality of medical images using the CLAHE and HE techniques.
Classification of medical images improved with the CLAHE technique.
Segmentation of tumour images using the U-Net architecture.
Treatment of cancerous tumours through advanced prior X-ray simulation.
Over the past three years, we have successfully published an article entitled “Reconstruction of a 3D medical image from pre-processed 2D DICOM slices: Clinical application.” In our upcoming articles, we plan to further delve into the remaining studies.
Funding
This work was not funded by any agency, institution or university. The authors received no funds, grants or other support for conducting this study or preparing this manuscript.
Authors’ contributions
Kamal Halloum: Lead author, responsible for the entire study.
Hamid Ezzahraouy: Professor at the Faculty of Sciences, Mohammed V University in Rabat, and the supervisor of this work.
Employment
This work originates solely from a doctoral thesis and has not been endorsed or adopted by any organisation or institution.
Footnotes
Acknowledgments
We express our heartfelt gratitude to everyone who supported this project and believed in its success, including the first author's parents and family. We also sincerely thank Professor Ez-Zahraouy Hamid, the supervisor of this work, for his guidance, corrections and feedback, which contributed greatly to its completion. Finally, we thank the Faculty of Sciences for providing the excellent laboratory equipment at Lamscis, where this work was conducted.
Conflict of interest
The authors declare they have no financial interests.
