Abstract
Colorectal polyps are a common precursor to colorectal carcinoma. In recent years, deep learning researchers have shown increasing interest in developing automated neoplasm detection systems that assist clinicians in detecting diminutive and inconspicuous polyps. Precise segmentation of these neoplasms in medical images is essential for early detection and intervention. While current research focuses on improving segmentation performance and achieving new state-of-the-art results, a comprehensive analysis of the factors that influence the performance of neoplasm segmentation models has yet to be conducted. In this study, we investigate the impact of color space on the performance of colorectal neoplasm segmentation networks. We employ three pre-trained semantic segmentation architectures, U-NET, DeepLabV3, and Pyramid Attention Network (PAN), to elucidate the relationship between the color space of input images and model performance. We examine this relationship using four color spaces: 1) RGB, 2) HSV, 3) HSL, and 4) CIEL*A*B*. Four publicly available datasets, Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, and ETIS-LaribPolypDB, are used for training and testing. Our findings indicate that the choice of color space can significantly influence the performance of colorectal neoplasm segmentation networks.
Keywords
Introduction
Colorectal polyps, abnormal tissue growths in the colon resulting from uncontrolled cell proliferation, present a significant health concern.1–3 While most polyps are benign, untreated cases may progress to colorectal cancer (CRC), the third leading cause of cancer-related mortality.4–6 Detecting colorectal polyps is crucial for preventing colorectal cancer, as most cancers originate from adenomatous polyps. Early detection and removal of these polyps significantly reduce cancer risk. However, the variability in polyp size, shape, and color, particularly for small or flat lesions, presents challenges in traditional colonoscopy. Automated polyp detection systems, especially those leveraging color space transformations and deep learning segmentation, offer valuable assistance by enhancing diagnostic accuracy and reducing missed polyp rates, which can be as high as 25%. 5 For doctors, these systems alleviate cognitive burden and improve the quality of colonoscopies, allowing for more reliable polyp identification. This ultimately leads to more standardized care and better clinical outcomes. For patients, early detection translates into less invasive treatments, lower cancer risk, and improved overall prognosis. In this context, our empirical study on color space transformations in polyp segmentation contributes to advancing these automated systems, promoting more effective colorectal cancer prevention.
Colonoscopy remains the gold standard for the detection and diagnosis of colorectal polyps. The procedure comprises two primary stages: bowel preparation and endoscopic examination. Bowel preparation involves ingesting a prescribed solution that clears faecal matter and optimizes visualization during the procedure. During the endoscopic examination, a colonoscope (a flexible tube equipped with a light source, a camera, and, where needed, polypectomy tools) is inserted via the rectum to enable direct visualization and assessment of the colonic mucosa. 7
Despite colonoscopy's prominence, research indicates miss rates for polyps between 6% and 28%, with the ascending colon showing the highest prevalence of missed lesions.8–11 Factors like endoscopist fatigue, equipment limitations, and inadequate training contribute to these inaccuracies, underscoring the need for improved detection methods.7,12
Recognizing this challenge, deep learning researchers have turned to automated polyp detection algorithms to reduce miss rates.13–15 However, existing efforts have predominantly focused on enhancing detection accuracy, often overlooking the critical relationship between algorithmic performance and variables such as color space, model architecture, and image size.
Understanding how factors such as color space selection, image preprocessing techniques, and feature extraction methods affect model performance is just as important for improving accuracy. Such insights can pave the way for more robust and reliable colorectal polyp detection models, ultimately benefiting patient outcomes.
Our research addresses this gap by investigating the relatively unexplored question of color space selection in colorectal polyp detection. Unlike prior studies that concentrated primarily on algorithmic enhancements, we systematically evaluate several segmentation architectures across multiple color spaces.
We explore the relationship between the color space of colonoscopic images and the efficacy of advanced semantic segmentation models, including DeepLabV3, 16 U-NET, 17 and Pyramid Attention Network (PAN). 18 Each model is trained on colonoscopy images represented in four distinct color spaces: RGB, HSV, HSL, and CIEL*A*B*. To overcome the limited number of training samples, we apply transfer learning, replacing each model's encoder with a variant pre-trained on the ImageNet dataset. 19
Using the publicly available colonoscopy image datasets Kvasir-SEG, 20 CVC-ClinicDB, 21 ETIS-LaribPolypDB, 22 and CVC-ColonDB, 23 we conduct an extensive analysis of each model's performance and behavior. We examine the predicted masks in each color space to determine which color space(s) yield the best performance and to understand the reasons behind the observed variations. Our evaluation relies on standard metrics such as mean Intersection over Union (mIoU) and the Dice coefficient.
Our findings reveal that, under certain conditions, models struggle to accurately identify specific polyps in images represented in the commonly used RGB color space, highlighting the significance of our research. These insights have substantial implications for future work in the field and may guide the development of tailored pre-processing techniques that achieve acceptable performance levels. The remainder of this manuscript is organized into five sections: Background & Literature Review, Methodology, Experimental Results, Discussion and Analysis, and Conclusion.
Background & literature review
Overview
This section offers a comprehensive introduction to the challenge of polyp segmentation using deep learning techniques. We conduct a review of existing studies to gain insights into the methodologies employed by polyp segmentation models in identifying and delineating polyps. Furthermore, we delve into the significance of color space selection in medical imaging and investigate the broader association between color space choice and the performance of segmentation networks.
Automatic polyp segmentation
Polyp segmentation has long been a subject of interest, driven largely by the growing emphasis on automating medical processes to alleviate the workload of healthcare professionals. Polyp segmentation models typically employ deep learning techniques. At inference time, the model takes an endoscopy image as input and outputs a binary mask that predicts the polyp's location. During training, the model uses both the input image and the ground-truth mask to learn to identify and delineate polyp regions accurately. A typical polyp segmentation network follows an encoder-decoder architecture, in which the encoder extracts features and the decoder generates the binary mask. The performance of a colorectal polyp segmentation model is evaluated with a range of metrics, including Intersection over Union (IoU), Dice coefficient, precision, recall, and F1 score. These metrics are favored because they account for both true positive and false positive predictions, offering a more informative assessment of segmentation accuracy. In contrast, pixel accuracy, while occasionally used, weights all pixels equally, which can be misleading for imbalanced datasets (where background pixels far outnumber polyp pixels) or when the focus is on accurately delineating polyp regions amid complex backgrounds. Consequently, IoU and the Dice coefficient are often preferred because they are sensitive to the delineation of specific regions of interest.24,25 Figure 1 provides a visual representation of this standard polyp segmentation network.

Building a polyp segmentation network: in this framework, the encoder undertakes the task of learning and extracting pertinent features from input images. Concurrently, the decoder's role is to generate a prediction mask, typically in the form of a binary segmentation mask, to delineate regions as either polypoid or non-polypoid.
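The two overlap metrics can be computed directly from a pair of binary masks. The following NumPy sketch is illustrative rather than the evaluation code used in any of the studies discussed here; the helper name `iou_and_dice` and the smoothing constant `eps` are assumptions. For binary masks, the Dice coefficient is identical to the pixel-level F1 score.

```python
import numpy as np


def iou_and_dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> tuple[float, float]:
    """Return (IoU, Dice) for two binary masks of the same shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    # eps keeps both scores defined (and equal to 1) when both masks are empty
    iou = (intersection + eps) / (union + eps)
    dice = (2 * intersection + eps) / (total + eps)
    return float(iou), float(dice)


# Toy 4x4 example: prediction covers 4 pixels, ground truth 6, overlap 4
pred = np.zeros((4, 4), dtype=np.uint8)
pred[1:3, 1:3] = 1
target = np.zeros((4, 4), dtype=np.uint8)
target[1:3, 1:4] = 1
print(iou_and_dice(pred, target))  # IoU = 4/6, Dice = 8/10
```

The example shows why the two scores differ for the same prediction: IoU divides the overlap by the union (6 pixels), while Dice doubles the overlap and divides by the summed mask sizes (10 pixels).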
Over the past few years, numerous polyp segmentation models and methods have emerged. In a notable study, 26 the authors introduced an efficient encoder-decoder model enhanced with an attention mechanism. The model can segment polyps at diverse scales thanks to an atrous spatial pyramid pooling module and skip connections bridging the encoder and decoder. The authors trained and validated the model on the Kvasir-SEG and CVC-ClinicDB datasets, achieving a Dice coefficient of 0.905 and an Intersection over Union (IoU) of 0.880.
In another study, 27 researchers introduced a DeepLabV3 network with a ResNet101 backbone for segmenting polyps of various shapes and types. The model was evaluated exclusively on the Kvasir-SEG dataset, without explicit reference to external datasets or cross-validation procedures. To mitigate overfitting, the authors applied augmentation techniques, including color jitter and flips. The study reported an Intersection over Union (IoU) of 0.901 and a Dice coefficient of 0.940.
Taking a different approach, the authors of 28 introduced a method based on fully convolutional networks (FCN) for detecting polyps under adverse conditions, including intense white-light reflection and various obstructions. They leveraged transfer learning and trained and evaluated the network on the CVC-EndoSceneStill and Kvasir-SEG datasets, obtaining an Intersection over Union (IoU) of 0.8666 and a Dice coefficient of 0.8680.
In another approach, the authors of 29 introduced BLE-net, a model designed for precise polyp segmentation that addresses a limitation often encountered in existing U-Net and other U-shaped networks: ineffective boundary delineation. BLE-net comprises dedicated boundary enhancement and boundary learning modules and adopts ResNeXt101 as its backbone. The model was trained and evaluated on both the Kvasir-SEG and CVC-ClinicDB datasets, achieving an IoU of 0.878 and a Dice coefficient of 0.926 on CVC-ClinicDB, and an IoU of 0.854 and a Dice coefficient of 0.907 on Kvasir-SEG.
The study in 30 introduces a method that integrates a Swin Transformer with EfficientNet 31 for polyp detection, reporting a mean Dice coefficient of 0.906, an IoU of 0.842, a mean weighted F-measure of 0.88, and a mean absolute error of 0.001.
Similarly, the authors of 32 propose an automatic polyp segmentation model based on the SegNet 33 architecture, which achieves an average IoU of 81.7% across multiple datasets.
In 34, the authors explore polyp detection using pre-trained VGG16 and MobileNet models combined with preprocessing techniques and a sliding-window approach. This yielded competitive performance on cross-validation datasets, with precision, recall, and F1 score of 91.9%, 89.0%, and 0.90, respectively. The work in 35 introduces NeutSS-PLS, a saliency detection network designed to handle specular regions, achieving a precision of 92.30% and an F1 score of 92.40%.
Lastly, the authors of 36 present sECA-NET, a framework that combines a CNN and an RPN for polyp detection and segmentation, reporting a precision of 94.9%, a recall of 96.9%, and an F1 score of 95.9% across diverse datasets. To conclude this overview of the polyp segmentation landscape, the studies highlighted above exemplify the variety of approaches and innovations in this field. Collectively, they underscore the significance of polyp segmentation as a critical task in medical imaging and demonstrate ongoing efforts to improve the accuracy and robustness of segmentation models, addressing challenges related to scale, boundary delineation, and difficult imaging conditions. Our investigation of the relationship between color space choice and segmentation performance builds on the foundation laid by these studies.
To date, few studies have examined how color space selection affects the performance of semantic segmentation models, particularly for the precise segmentation of colorectal polyps, and the relationship between segmentation performance and color space choice remains largely unexplored in the academic literature.
Numerous studies have explored the interplay between color space selection and image segmentation, though they often focus on color-based segmentation rather than segmentation guided by ground-truth masks. For instance, in 37, the authors compared four color spaces, RGB, YCbCr, XYZ, and HSV, and examined their implications for image segmentation. They applied several segmentation techniques, including K-means, fuzzy C-means, region growing, and graph cut, to images from the Berkeley database, and evaluated performance using peak signal-to-noise ratio (PSNR), mean squared error (MSE), and accuracy. They conducted experiments on the individual components of each color space and used a voting mechanism to ensemble the results from these components.
The study found that, while all four color spaces yielded the same PSNR and MSE values, the algorithms performed best when using the value (V) component of HSV and the luma (Y) component of YCbCr. Furthermore, the voting mechanism, which aggregated results from different color components, outperformed the single-component method and yielded superior segmentation results.
In 38, the authors carried out an extensive investigation into the impact of color space on image segmentation, which also included a comprehensive survey of the use of different color spaces in image segmentation. They employed the fuzzy c-means clustering algorithm and explored multiple color spaces, including RGB, CMY, HSI, HSV, HSL, L1L2L3, CIEL*A*B*, CIELuv, CIEXYZ, YIQ, YUV, YCbCr, CIELch, LMS, and LSLM. Their findings confirm that the performance and speed of image segmentation are closely linked to the choice of color space. Notably, CIELuv was associated with the lowest computational cost, highlighting its efficiency for image segmentation.
In 39, the authors examined image segmentation methods based on color space selection, focusing on two prominent color spaces, RGB and HSV, as well as a hybrid color space that combines RGB with HSV. Their results show that the hybrid color space outperformed the individual color spaces. The authors also acknowledge that RGB may not be universally optimal for image segmentation, underscoring the importance of judicious color space selection. The segmentation algorithm used is not specified; the experiments were conducted on images of hand gestures.
In conclusion, while several studies have explored the influence of color space selection on image segmentation in various contexts, there remains a notable gap in our understanding of how color space choice impacts the performance of semantic segmentation models, particularly concerning its application to the precise segmentation of colorectal polyps. The studies reviewed here have primarily focused on color-based segmentation and have demonstrated the importance of judicious color space selection in achieving optimal segmentation results. This research gap serves as the driving force behind our study, which seeks to elucidate this relationship and provide valuable insights into the selection of color spaces for improving the accuracy and reliability of colorectal polyp segmentation. The subsequent sections will delve into our methodology and experimental findings.
In summary, a color space is a specific organization of colors that defines how colors are represented, perceived, and processed by a computer system. It maps the range of colors in an image into a format that can be interpreted for analysis, such as RGB, HSV, or CIELAB. In medical imaging, and particularly in detecting neoplasms like colorectal polyps, the choice of color space can dramatically influence the model's ability to distinguish polyps from surrounding tissues. Neoplasms often have subtle variations in color, texture, and brightness compared to healthy tissue, making color space selection critical for enhancing these differences in an image.
Different color spaces emphasize various aspects of an image, such as hue, saturation, or lightness, which can make small but important features of polyps more visible. For instance, RGB focuses on combining primary colors but may struggle in low-contrast situations, while HSV and CIELAB help separate color from brightness, making it easier for the model to detect polyps even under challenging conditions like varying light or complex backgrounds. By transforming images into a color space that highlights these subtle differences, AI models can improve detection accuracy, particularly in difficult cases like small, flat, or poorly lit polyps.
We hypothesize that leveraging alternative color spaces, such as HSV, HSL, and CIELAB, will enhance the performance of AI-based polyp detection models by improving contrast and color differentiation between polyps and surrounding tissue, thus increasing the detection rate of neoplastic lesions in diverse clinical settings.
Methodology
Overview
To thoroughly examine the impact of color space on colorectal polyp segmentation performance, a series of experiments were conducted utilizing three distinct semantic segmentation architectures: U-NET, DeepLabV3, and Pyramid Attention Network (PAN). Four variations of each architecture were generated, with each variation trained and evaluated on a specific color space. The color spaces investigated in this study are RGB, HSL, HSV, and CIEL*A*B*. Each architecture-color space combination was assigned a unique identifier to facilitate model management and organization. A description of all 12 models, including their architecture and associated color space, is presented in Table 1.
A list of all the models used in this study.
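The 12-model design can be summarized as a small experiment grid in which only the architecture and the color space vary. In this sketch, identifiers other than RGB_UNET, RGB_DeepLabV3 and RGB_PAN are hypothetical and simply follow that naming pattern; the shared settings are those stated elsewhere in this paper (ResNet50 encoder pre-trained on ImageNet, 224 × 224 inputs, Dice loss). Epochs and learning rate are left out of this sketch.

```python
from itertools import product

ARCHITECTURES = ("UNET", "DeepLabV3", "PAN")
COLOR_SPACES = ("RGB", "HSV", "HSL", "LAB")  # "LAB" is shorthand for CIEL*A*B*

# Settings shared by every model, so that only architecture and color space differ
SHARED = {"encoder": "resnet50", "pretrained_on": "ImageNet", "input_size": 224, "loss": "dice"}

# Hypothetical identifiers following the RGB_UNET naming pattern
EXPERIMENTS = {
    f"{space}_{arch}": {"architecture": arch, "color_space": space, **SHARED}
    for space, arch in product(COLOR_SPACES, ARCHITECTURES)
}

for name in EXPERIMENTS:
    print(name)
```

Keeping the shared settings in one dictionary makes it explicit that every model differs from the others in exactly two fields.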
Prior to training, we pre-processed the dataset by resizing all samples to a uniform dimension of 224 × 224, scaling the pixel values to the range 0 to 1, and normalizing them using ImageNet's channel-wise mean and standard deviation. The samples were then converted to the color space expected by each model. After pre-processing, the samples were fed to a segmentation network and trained over several epochs, and the resulting training and validation accuracies and losses were recorded for subsequent analysis. An example of this process is illustrated in Figure 2.

A summary of the color conversion process. In this example we convert an RGB image to HSL to illustrate the process.
In this work, four public datasets are employed for training and testing: Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, and ETIS-LaribPolypDB. All samples from these datasets are combined and allocated as follows: 80% for training, 10% for validation, and 10% for testing. To maintain the integrity of the training and validation processes, precautions are taken to ensure no overlap between the testing set and the training or validation subsets.
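A random 80/10/10 split with disjoint subsets can be sketched as follows. The seed, the use of a single shuffled permutation, and floor rounding are assumptions, since the paper does not describe how samples were assigned; `split_indices` is a hypothetical helper. With 2108 samples, flooring 10% gives 210 validation and 210 test samples, leaving 1688 for training.

```python
import numpy as np


def split_indices(n_samples: int, val_frac: float = 0.1, test_frac: float = 0.1, seed: int = 0):
    """Shuffle sample indices once and cut them into disjoint train/val/test subsets."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    n_val = int(n_samples * val_frac)
    n_test = int(n_samples * test_frac)
    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]
    train_idx = indices[n_test + n_val:]  # the remainder (about 80%) goes to training
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = split_indices(2108)
print(len(train_idx), len(val_idx), len(test_idx))  # 1688 210 210
```

Because all three subsets are slices of one permutation, no index can appear in more than one subset, which is the non-overlap property the text requires.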
The Kvasir-SEG dataset contains 1000 colonoscopy images with corresponding ground-truth masks, showing polyps of diverse shapes, colors, and sizes. Each image was annotated and validated by experienced gastroenterologists. Image resolutions range from 332 × 487 to 1920 × 1072 pixels, and all 1000 images are included in our combined dataset. The CVC-ClinicDB dataset consists of 612 static polyp frames extracted from 31 colonoscopy sequences at a fixed resolution of 384 × 288 pixels; it is widely used for testing and validating colorectal polyp segmentation methods. Additionally, the CVC-ColonDB dataset comprises 300 samples, and ETIS-LaribPolypDB includes 196 samples. In total, the combined dataset comprises 2108 samples (1000 + 612 + 300 + 196). A summary of the dataset composition and sample count is provided in Table 2.
A breakdown of the training, validation and testing portions.
Pre-processing is an essential preliminary step before training any deep learning model, as it eases the training process. In this study, we apply several pre-processing steps to prepare the data for training. First, all samples are resized to a uniform dimension of 224 × 224, which reduces training time and computational cost. After resizing, pixel values are normalized so that every value falls within the range 0 to 1, which ensures a consistent distribution across all images. The normalization is defined in Equation 1.
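A minimal NumPy sketch of the two pre-processing steps described above (resizing to 224 × 224 and scaling 8-bit pixel values to [0, 1]) follows. Division by 255 is assumed as the normalization, since Equation 1 is not reproduced here, and the nearest-neighbour resize is a dependency-free stand-in for a library resize; the interpolation method used by the authors is not stated. `resize_nearest` and `preprocess` are hypothetical names.

```python
import numpy as np


def resize_nearest(image: np.ndarray, size: int = 224) -> np.ndarray:
    """Nearest-neighbour resize of an H x W x C array to size x size."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return image[rows[:, None], cols[None, :]]


def preprocess(image_uint8: np.ndarray, size: int = 224) -> np.ndarray:
    """Resize an 8-bit RGB image and scale its pixel values to [0, 1]."""
    resized = resize_nearest(image_uint8, size)
    return resized.astype(np.float32) / 255.0


# A random frame with CVC-ClinicDB's 384 x 288 resolution (height 288, width 384)
frame = np.random.default_rng(0).integers(0, 256, size=(288, 384, 3), dtype=np.uint8)
x = preprocess(frame)
print(x.shape, x.min(), x.max())
```

In practice a library resize with bilinear or area interpolation would usually be preferred for images, while nearest-neighbour interpolation is the usual choice for ground-truth masks because it keeps them binary.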

Samples of the HSL-converted images.
As the samples are already in RGB color format, we use the images as they are when training RGB_UNET, RGB_DeepLabV3 and RGB_PAN models. RGB, HSV, HSL, and CIELAB are distinct color spaces, each with unique strengths for medical image processing. RGB, the standard color space for digital devices, combines red, green, and blue intensities but often struggles to separate color from brightness, limiting its ability to detect subtle polyp features. HSV and HSL address this by decoupling color from brightness, making them better suited for identifying polyps with similar brightness to surrounding tissues but different hues. HSV emphasizes color contrast, while HSL offers enhanced sensitivity to brightness variations, which is crucial in variable lighting conditions during colonoscopy.
CIELAB, on the other hand, is designed to approximate human color perception and separates lightness from two color-opponent dimensions, making it effective for detecting subtle color differences. This perceptually uniform color space allows better discrimination of polyps that are similar in hue but differ in tone. These four color spaces were chosen for testing because they offer diverse approaches to handling color and brightness, which may help improve the accuracy and reliability of AI-based polyp detection systems under different clinical conditions.
RGB to HSL conversion
The transformation of an image from the RGB color space to the HSL color space involves several computational steps. First, the RGB components must be normalized from their original range of 0–255 to a range of 0–1. This normalization is performed during the pre-processing stage and yields new values, which we denote as R′, G′, and B′.
Subsequently, the maximum and minimum values among R′, G′, and B′ are computed, and hue (H), saturation (S), and lightness (L) are derived from them.
The procedure for converting RGB samples to the HSV color space closely resembles the RGB to HSL conversion: hue (H) is computed in the same way, and the two conversions differ in how saturation (S) and value (V) are computed. Saturation (S) is obtained using Equation 8, while value (V) is calculated via Equation 9.
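The RGB to HSL and RGB to HSV conversions can be sketched with NumPy using the standard textbook formulas. With C_max, C_min and Δ = C_max − C_min computed per pixel from the normalized channels R′, G′ and B′, lightness is L = (C_max + C_min)/2 and HSL saturation is Δ/(1 − |2L − 1|), whereas HSV uses V = C_max and S = Δ/C_max; hue is the same piecewise function in both. This is a minimal sketch, not the authors' implementation: the paper does not say which library it used or how channels were scaled for the network (OpenCV, for example, stores 8-bit hue in [0, 180]), so returning hue in degrees and S, L, V in [0, 1] is an assumption. Python's standard-library `colorsys` module serves only as a per-pixel reference check.

```python
import colorsys

import numpy as np


def _hue_degrees(r, g, b, cmax, delta):
    """Hue in degrees [0, 360); identical for HSL and HSV."""
    safe = np.where(delta == 0, 1.0, delta)  # avoid division by zero for greys
    hue = np.select(
        [cmax == r, cmax == g],
        [((g - b) / safe) % 6, (b - r) / safe + 2],
        default=(r - g) / safe + 4,
    )
    return np.where(delta == 0, 0.0, 60.0 * hue)


def rgb_to_hsl(rgb: np.ndarray) -> np.ndarray:
    """Convert (..., 3) RGB values in [0, 1] to HSL (H in degrees, S and L in [0, 1])."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    cmax, cmin = rgb.max(axis=-1), rgb.min(axis=-1)
    delta = cmax - cmin
    lightness = (cmax + cmin) / 2
    denom = 1 - np.abs(2 * lightness - 1)
    saturation = np.where(delta == 0, 0.0, delta / np.where(denom == 0, 1.0, denom))
    return np.stack([_hue_degrees(r, g, b, cmax, delta), saturation, lightness], axis=-1)


def rgb_to_hsv(rgb: np.ndarray) -> np.ndarray:
    """Convert (..., 3) RGB values in [0, 1] to HSV (H in degrees, S and V in [0, 1])."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    cmax = rgb.max(axis=-1)
    delta = cmax - rgb.min(axis=-1)
    saturation = np.where(cmax == 0, 0.0, delta / np.where(cmax == 0, 1.0, cmax))
    return np.stack([_hue_degrees(r, g, b, cmax, delta), saturation, cmax], axis=-1)


# Spot-check one pixel against the standard library (colorsys reports hue as a fraction of a turn)
pixel = np.array([[0.8, 0.3, 0.1]])
print(rgb_to_hsl(pixel)[0], colorsys.rgb_to_hls(0.8, 0.3, 0.1))
```

The two functions share the hue helper, which makes the point of the paragraph above concrete: only the saturation formula and the third channel differ between HSL and HSV.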

Samples of the HSV-converted images.
Converting RGB images to the CIEL*A*B* color space follows a different procedure from the HSL and HSV conversions. First, the RGB values must be linearized by applying gamma correction, as given in Equation 10.
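A minimal NumPy sketch of the full RGB to CIEL*A*B* chain follows, assuming sRGB input in [0, 1] and the D65 reference white; the paper does not state its illuminant or library, and the names below are hypothetical. The sketch applies the standard sRGB linearization, maps linear RGB to CIE XYZ, normalizes by the white point, and applies the CIE nonlinearity f(t). L* is returned in [0, 100] and a*, b* are left unscaled; how the authors rescaled these channels before feeding them to the network is not specified.

```python
import numpy as np

# Standard sRGB (D65) to CIE XYZ matrix and the D65 reference white (assumed)
SRGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])
D65_WHITE = np.array([0.95047, 1.0, 1.08883])


def srgb_to_linear(c: np.ndarray) -> np.ndarray:
    """Undo the sRGB gamma curve (the linearization step) for values in [0, 1]."""
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)


def rgb_to_lab(rgb: np.ndarray) -> np.ndarray:
    """Convert (..., 3) sRGB values in [0, 1] to CIEL*a*b* (L* in [0, 100])."""
    xyz = srgb_to_linear(rgb) @ SRGB_TO_XYZ.T  # linear RGB -> XYZ, per pixel
    t = xyz / D65_WHITE  # normalize by the reference white
    delta = 6 / 29
    # Cube root above the threshold, linear segment below it (avoids an infinite slope at 0)
    f = np.where(t > delta ** 3, np.cbrt(t), t / (3 * delta ** 2) + 4 / 29)
    L = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)


print(rgb_to_lab(np.array([1.0, 0.0, 0.0])))  # roughly L*=53.2, a*=80.1, b*=67.2
```

White maps to approximately (100, 0, 0) and black to (0, 0, 0), which is a quick sanity check that the white point and the offset of 16 are consistent.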

Samples of the CIEL*A*B*-converted images.
To investigate the impact of image color space on the performance of semantic segmentation models in detecting colorectal polyps, we trained three distinct architectures: U-Net, DeepLabV3 and Pyramid Attention Network (PAN). For each architecture, four variants were produced by training on one of the four color spaces discussed in the previous subsection. This resulted in a total of 12 models.
In this study, we have selected U-Net, DeepLabV3, and Pyramid Attention Network (PAN) for their established effectiveness in the realm of medical image segmentation, particularly in the context of colorectal neoplasm detection.
Each of these models offers distinct advantages that enhance its ability to accurately segment polyps from surrounding tissue, which is crucial for early diagnosis and intervention. U-Net is renowned for its success in biomedical image segmentation, owing to its encoder-decoder architecture with skip connections.
This design allows U-Net to maintain high-resolution feature maps while simultaneously capturing context from deeper layers. As a result, it excels in detecting polyps of various sizes, particularly small or flat lesions that may be easily overlooked in conventional analyses. U-Net's ability to effectively balance local and global features makes it a valuable choice for this study. DeepLabV3 builds upon the strengths of atrous convolutions to capture multi-scale contextual information, making it especially adept at handling images where polyps may be situated in complex or cluttered backgrounds. The model's capability to maintain spatial resolution while integrating wider contextual cues allows for improved segmentation performance, which is critical in identifying subtle polyp features that could otherwise blend with adjacent tissue.
Pyramid Attention Network (PAN) introduces a unique attention mechanism that focuses on both local and global features across different layers of the network. By employing a pyramid structure, PAN enhances the model's ability to refine boundaries and prioritize relevant regions within an image. This is particularly beneficial for polyp detection, as it helps to distinguish neoplasms from their surroundings more effectively.
To ensure that performance differences were attributable to color space rather than to other factors, we kept hyperparameters and configurations consistent across all 12 models during training. The number of epochs was fixed at 20, and the Adam optimizer was used with a learning rate of 0.0001. Instead of the conventional binary cross-entropy loss, we used Dice loss, which directly optimizes the overlap between predicted and ground-truth masks and is therefore more relevant to our problem domain. Dice loss is defined in Equation 16.
The U-NET architecture was introduced in 2015 and has since become one of the most widely used segmentation networks in medical image analysis. Its U-shaped design, combined with skip connections between the downsampling and upsampling layers, enables it to capture a broad range of features. The model comprises a contracting (downsampling) path and an expansive (upsampling) path. The contracting path follows the typical architecture of a convolutional network: it consists of repeated applications of two 3 × 3 unpadded convolutions, each followed by a rectified linear unit (ReLU), and a 2 × 2 max pooling operation with stride 2 for downsampling. In this study, we use a ResNet50 encoder pre-trained on the ImageNet dataset to extract features from the images; a pre-trained encoder helps mitigate overfitting. The design of U-NET is illustrated in Figure 6.
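Equation 16 is not reproduced in this section. One common soft Dice loss formulation, given here as an assumption rather than as the authors' exact equation, is:

```latex
\mathcal{L}_{\mathrm{Dice}} = 1 - \frac{2\sum_{i=1}^{N} p_i\, g_i + \epsilon}{\sum_{i=1}^{N} p_i + \sum_{i=1}^{N} g_i + \epsilon}
```

where p_i ∈ [0, 1] is the predicted probability for pixel i, g_i ∈ {0, 1} is its ground-truth label, N is the number of pixels, and ε is a small smoothing constant. For example, with p = (0.9, 0.8, 0.1, 0.2), g = (1, 1, 0, 0) and ε = 0, the ratio is 2(1.7)/(2.0 + 2) = 0.85, giving a loss of 0.15. Because the loss is computed from probabilities rather than thresholded masks, it remains differentiable and can be minimized by gradient descent.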

The original design of the U-NET network. The diagram is taken from the original paper. 17
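The effect of the unpadded convolutions and stride-2 pooling described above can be traced with simple arithmetic. This sketch assumes the original U-Net input size of 572 × 572 and five resolution levels, as in the original U-Net paper; it does not describe the configuration used in this study, where a pre-trained ResNet50 encoder replaces the original contracting path. `unet_contracting_sizes` is a hypothetical helper.

```python
def unet_contracting_sizes(input_size: int = 572, levels: int = 5) -> list[int]:
    """Trace the spatial size through the original U-Net contracting path.

    Each level applies two unpadded 3x3 convolutions (each trims 2 pixels);
    every level except the last is followed by a 2x2 max pool with stride 2.
    """
    sizes = []
    size = input_size
    for level in range(levels):
        size -= 2  # first 3x3 unpadded convolution
        size -= 2  # second 3x3 unpadded convolution
        sizes.append(size)
        if level < levels - 1:
            size //= 2  # 2x2 max pooling with stride 2 halves the size
    return sizes


print(unet_contracting_sizes())  # [568, 280, 136, 64, 28]
```

The shrinkage caused by unpadded convolutions is why the original U-Net output is smaller than its input and why its skip connections crop the encoder feature maps before concatenation.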
DeepLabV3 is another widely used semantic segmentation architecture, introduced in 2017. It builds upon its predecessor, DeepLabv2, 32 through several key modifications. To address the challenge of segmenting objects at multiple scales, it uses modules that apply atrous convolution in cascade or in parallel, capturing multi-scale context through multiple atrous rates. In addition, the Atrous Spatial Pyramid Pooling (ASPP) module from DeepLabv2 is augmented with image-level features that encode global context and further improve performance. We use ResNet50 43 as the backbone of this network and employ transfer learning. Figure 7 illustrates the design of a DeepLabV3 network.

The original design of the DeepLabV3 network. The diagram is taken from the original paper. 20
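To make the role of atrous rates concrete: the span covered by a dilated kernel grows as k + (k − 1)(r − 1), while the number of weights stays at k × k. The rates 6, 12 and 18 below are those proposed for DeepLabV3's ASPP module at output stride 16 in the original DeepLabV3 paper; the rates used in this study's configuration are not stated, so treat them as illustrative. `effective_kernel_size` is a hypothetical helper.

```python
def effective_kernel_size(kernel_size: int, rate: int) -> int:
    """Span covered by a dilated (atrous) kernel: k + (k - 1) * (rate - 1)."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


# Rate 1 is an ordinary convolution; 6, 12 and 18 are the ASPP rates at output stride 16
for rate in (1, 6, 12, 18):
    print(rate, effective_kernel_size(3, rate))
# 1 3
# 6 13
# 12 25
# 18 37
```

Running several such branches in parallel lets ASPP look at context windows of very different sizes without adding parameters per branch beyond a single 3 × 3 kernel.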
Pyramid Attention Network (PAN) is a semantic segmentation network designed to exploit global contextual information in an image. In contrast to many existing approaches, PAN combines an attention mechanism with a spatial pyramid structure to extract dense features for pixel labelling, avoiding the need for complex dilated convolutions and hand-designed decoder networks. A key component of this architecture is the Feature Pyramid Attention (FPA) module, which fuses features from three pyramid scales using a U-shaped structure similar to that of a Feature Pyramid Network. We use a pre-trained ResNet50 as the encoder. Figure 8 presents the architecture of a typical PAN network.

The original design of the Pyramid attention network. The diagram is taken from the original paper. 25
Experimental results
Overview
In this section, we report the findings of our experimental study comparing the performance of three pre-trained deep learning models (UNET, DeepLabV3, and Pyramid Attention Network) applied to images in four color spaces (RGB, HSV, HSL, and CIEL*A*B*) for the semantic segmentation of colorectal polyps.
Our results indicate that models trained on images represented in the commonly used RGB color space generally achieved higher performance metrics compared to those trained on images represented in other color spaces. However, we also observed instances in which RGB-based models failed to generate reliable segmentation masks. These instances are presented and discussed in this section and analyzed further in the next section.
Additionally, we provide a comprehensive analysis of our testing dataset and present visual examples of the segmentation masks generated by each model. All experiments were conducted on a single machine equipped with an NVIDIA Quadro P5000 GPU and 16 GB of RAM, using PyTorch as the primary deep learning framework. Additional packages, including NumPy, Matplotlib, and scikit-learn, were used for data analysis and visualization.
The semantic segmentation models used in this study (UNET, DeepLabV3, and Pyramid Attention Network) were initialized with weights pre-trained on the ImageNet dataset, providing a robust foundation for feature extraction. Each model was trained for 15 epochs with a learning rate of 0.001, using the Adam optimizer for weight updates. For the binary segmentation task, the models were configured for binary prediction, with a sigmoid activation scaling each output to the range 0 to 1. The result is a per-pixel probability of belonging to the polyp class, to which a threshold is applied to obtain the final segmentation mask. These implementation details, including the hyperparameter settings and computational resources, are provided to support reproducibility and to facilitate future research on colorectal polyp detection.
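The sigmoid-and-threshold step can be sketched as follows. The paper does not report the threshold value, so 0.5 is an assumption, and `logits_to_mask` is a hypothetical helper name. NumPy is used here for a self-contained example; in a PyTorch pipeline the same operation would be applied to the model's output tensor.

```python
import numpy as np


def logits_to_mask(logits: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Turn single-channel logits into a binary polyp mask (1 = polyp)."""
    probabilities = 1.0 / (1.0 + np.exp(-logits))  # sigmoid maps logits to (0, 1)
    return (probabilities >= threshold).astype(np.uint8)


logits = np.array([[-3.0, -0.2], [0.0, 4.0]])
print(logits_to_mask(logits))
# [[0 0]
#  [1 1]]
```

With a threshold of 0.5, the comparison is equivalent to checking `logits >= 0`, which avoids computing the exponential; the explicit sigmoid is kept here to mirror the description above.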
Dataset
Our testing dataset consists of 210 samples randomly selected from the four datasets described below. None of these samples were used for training or validation. The testing dataset contains a diverse range of images, including several challenging samples with complex polyp structures and varying image quality. Examples of these samples are illustrated in Figure 9.

Random samples taken from the testing dataset.
In preparing the testing dataset for our experiments, we applied the same pre-processing steps utilized for the training and validation datasets. This included resizing the images to a standardized resolution and performing data normalization to ensure consistent pixel intensity values across all images. The datasets employed in this study—Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, and ETIS-LaribPolypDB—are all well-established resources for training and testing AI-based polyp detection models, and their distinct characteristics render them particularly suitable for developing and evaluating segmentation algorithms.
The Kvasir-SEG dataset comprises 1000 polyp images captured during colonoscopy procedures, covering a wide variety of polyp sizes, shapes, and textures. Its high-resolution images were taken under diverse lighting conditions and are representative of real-world clinical scenarios. They range from small, flat lesions to larger, easily identifiable polyps, which helps models trained on this dataset generalize to the variety of cases encountered during routine colonoscopies. Because it includes both straightforward and complex images, Kvasir-SEG provides a rich training ground for improving detection accuracy across varied environments.
The CVC-ClinicDB dataset consists of 612 images of polyps, exhibiting a similarly wide range of polyp types, characterized by significant diversity in size, color, and texture. What distinguishes CVC-ClinicDB is its inclusion of more challenging polyp instances, such as those captured from unusual angles or under suboptimal lighting conditions, which are common during real-time colonoscopy procedures. This dataset is well-suited for testing models, as it closely simulates the challenges encountered in clinical settings, making it an excellent benchmark for evaluating the robustness of AI models under varying conditions.
CVC-ColonDB contributes an additional 300 images, focusing on polyp frames extracted from colonoscopy videos. This dataset is characterized by its real-world conditions and includes samples with varying degrees of blur, illumination, and occlusion, which reflect the typical challenges encountered during colonoscopy. The diversity in this dataset further enhances its utility for training and validating segmentation algorithms, as it forces models to learn how to identify polyps in less-than-ideal scenarios.
Finally, the ETIS-LaribPolypDB consists of 196 images, primarily featuring smaller polyps that present unique detection challenges. The dataset is particularly useful for training models to recognize subtle features in polyp morphology and for handling cases where polyps may be partially obscured or blended into the surrounding tissue. The inclusion of this dataset allows for a more comprehensive evaluation of the models, as it tests their ability to detect polyps that are difficult to identify.
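The hold-out procedure for the testing dataset can be sketched as follows, assuming the 210 test images were drawn uniformly at random from the pooled images of the four datasets (1000 + 612 + 300 + 196 = 2108). The validation fraction, the random seed, and the helper name `split_indices` are hypothetical; the study does not report how the remaining images were divided between training and validation.

```python
import numpy as np

# Image counts per dataset, as described above.
DATASETS = {"Kvasir-SEG": 1000, "CVC-ClinicDB": 612,
            "CVC-ColonDB": 300, "ETIS-LaribPolypDB": 196}


def split_indices(n_total, n_test=210, val_fraction=0.1, seed=0):
    """Shuffle once, hold out n_test images, then split the rest into train/val.

    val_fraction and seed are hypothetical choices for illustration.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_total)
    test = order[:n_test]
    rest = order[n_test:]
    n_val = int(round(val_fraction * len(rest)))
    return rest[n_val:], rest[:n_val], test


# One (dataset, local index) entry per image in the pooled collection.
pooled = [(name, i) for name, n in DATASETS.items() for i in range(n)]
train_idx, val_idx, test_idx = split_indices(len(pooled))
print(len(train_idx), len(val_idx), len(test_idx))  # 1708 190 210
```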
In this subsection, we present the evaluation metrics employed to assess the performance of our deep learning models in the task of semantic segmentation of colorectal polyps. These metrics provide a quantitative measure of the accuracy and reliability of the segmentation masks generated by each model-color space combination. In our analysis, we considered several commonly used metrics including precision, recall, F1 score, Dice coefficient, and Intersection-Over-Union (IoU).
Precision measures the proportion of true positive predictions among all positive predictions made by the model. Recall measures the proportion of true positive predictions among all actual positive instances in the data.
The F1 score is the harmonic mean of precision and recall, providing a balanced measure of the model's performance. The Dice coefficient measures the similarity between two sets, in this case the predicted segmentation mask and the ground truth mask. The Intersection-Over-Union (IoU) metric measures the overlap between the predicted and ground truth masks as a proportion of their union. The mathematical formulations for these metrics are provided in Equations 17–21.
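The five metrics can be computed from pixel counts of true positives (TP), false positives (FP), and false negatives (FN), as in the minimal NumPy sketch below; the function name and the small `eps` guard against division by zero are assumptions, not the study's implementation (with this guard, an image whose prediction and ground truth are both empty scores 0, which is only one of several conventions). Two identities are worth noting. For a single pair of binary masks, the pixel-level F1 score and the Dice coefficient are algebraically identical, both equal to 2TP/(2TP + FP + FN), so differences between them in reported tables typically arise from how scores are aggregated. Per image, IoU (the Jaccard index) equals Dice/(2 − Dice).

```python
import numpy as np


def segmentation_scores(pred, gt, eps=1e-7):
    """Pixel-level precision, recall, F1, Dice, and IoU for one pair of binary masks."""
    pred = np.asarray(pred).astype(bool)
    gt = np.asarray(gt).astype(bool)
    tp = np.sum(pred & gt)    # polyp pixels correctly predicted
    fp = np.sum(pred & ~gt)   # background pixels predicted as polyp
    fn = np.sum(~pred & gt)   # polyp pixels that were missed
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    dice = 2 * tp / (2 * tp + fp + fn + eps)
    iou = tp / (tp + fp + fn + eps)  # also called the Jaccard index
    return {"precision": precision, "recall": recall, "f1": f1,
            "dice": dice, "iou": iou}


gt = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 0]])
pred = np.array([[1, 1, 1], [1, 0, 0], [0, 0, 0]])
scores = segmentation_scores(pred, gt)
# TP = 3, FP = 1, FN = 1: precision = recall = F1 = Dice = 0.75, IoU = 0.6
```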
This subsection presents a detailed comparison of the performance of all 12 models. The scores of each model are tabulated in Tables 3, 4, 5, and 6, and the training and validation losses and accuracies of each model are shown in Figures 10, 11, and 12. These figures visualize the learning process of each model and allow a more in-depth analysis of their performance.
The calculated scores of the RGB-based models.
The calculated scores of the HSV-based models.
The calculated scores of the HSL-based models.
The calculated scores of the LAB-based models.

Training and validation loss of the four U-NET models.

Training and validation loss of the four PAN models.

Training and validation loss of the four DeepLabV3 models.

Samples of the RGB_UNET (Left) misidentified polyps compared to HSL_UNET (rows 1 and 2, right) and HSV_UNET (row 3, right).

Samples of the RGB_UNET (Left) misidentified polyps compared to HSL_UNET (rows 1 and 2, right) and HSV_UNET (row 3, right).

Samples of the RGB_UNET (Left) misidentified non-polyp regions compared to LAB_UNET (Right).
The data presented in Tables 3, 4, 5, and 6 reveal that the RGB-based networks generally outperformed their counterparts in other color spaces. Specifically, the three RGB-based models achieved an average Intersection over Union (IoU) score of 0.751, indicating more accurate predictions and making RGB the preferred color space for colorectal polyp segmentation among those tested. In contrast, the HSV-based models exhibited the lowest performance among all models, with an average IoU score of 0.675.
Despite the generally superior performance of the RGB-based models, our analysis revealed occasional instances in which certain factors hindered their performance. To understand these factors, we compared the failed predictions of the RGB-based model with those of the best-performing network on the same images. The results of this comparison are presented in Figures 13, 14, and 15, which point to common factors that may affect the performance of RGB-based models.
Figure 16 presents an evaluation of the deep learning models for polyp segmentation across multiple color spaces, using the Dice coefficient, Jaccard index (equivalent to IoU), F1 score, precision, and recall to assess the accuracy and reliability of each model. The first visualization shows the performance of the three models, DeepLabV3, PAN, and UNET, on these five metrics in the RGB color space, with individual bars allowing a direct comparison of their efficacy; higher values indicate better performance. This view highlights the strengths and weaknesses of each model when processing RGB images. Table 7 presents the statistical analysis of our findings.
Figure 17 presents a qualitative comparison between the ground truth segmentation masks and the predicted outputs of the segmentation models across four color spaces—RGB, HSV, HSL, and LAB. The figure visually contrasts the accuracy of polyp boundary delineation between the models, highlighting areas where the predictions closely match or diverge from the actual polyp regions.
Overview
In this section, we discuss the results of our study in detail. We analyze the performance of the different models and color spaces, consider the factors that may have contributed to their relative strengths and weaknesses, and highlight potential avenues for future research. Because RGB_UNET performed best among the RGB-based models (compared with DeepLabV3 and PAN), we focus our analysis on the UNET variants.
Paired t-test p-values for model comparisons across color spaces

Evaluation of deep learning models for effective polyp detection in diverse color spaces.
Initially, we observed that although the RGB-based models outperformed their counterparts in most scenarios, all three failed to detect polyps accurately under specific conditions.
One such condition was the presence of white-light glare on the polyps. A closer look at the three RGB architectures shows that polyps with white-light glare pose a considerable challenge to RGB-based models, whereas both the HSV_UNET and HSL_UNET models handle this problem effectively.
As shown in Figure 13, these models accurately identified polyps in colonoscopy images despite the presence of white-light reflection. The difficulty of the RGB models is likely due to the RGB color space encoding color as red, green, and blue intensities: a specular highlight raises all three channels together, which can make it difficult to distinguish the reflection from the true color of the underlying tissue. These findings have important implications for improving the accuracy and reliability of polyp identification in colonoscopy images, especially for challenging samples with white-light glare.

Comparison of models and color spaces.
We hypothesize that HSL_UNET and HSV_UNET outperformed RGB_UNET on these samples because the HSL and HSV color spaces represent color in terms of hue, saturation, and lightness or brightness, which can provide a more robust representation of color information in the presence of white-light reflection.
This allows the HSL and HSV models to distinguish more effectively between white-light reflection and the true color of the underlying lesions.
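The hypothesis above can be made concrete at the level of single pixels. The sketch below uses Python's standard `colorsys` module to convert two illustrative 8-bit RGB values (hypothetical, not sampled from the datasets) to HSV: a reddish mucosa pixel and a near-white specular highlight. The highlight ends up with very high value and near-zero saturation, so it separates from tissue along those two channels, while its hue is essentially arbitrary because it depends on tiny channel differences. The `looks_like_glare` rule and its thresholds are hypothetical and only illustrate the separation; they are not part of the models studied here, which receive the full converted image.

```python
import colorsys

# Illustrative 8-bit RGB values (hypothetical, not taken from the datasets).
TISSUE = (200, 90, 80)    # reddish mucosa
GLARE = (250, 248, 245)   # near-white specular highlight


def to_hsv(rgb):
    """Convert an 8-bit RGB triple to HSV with all channels in [0, 1]."""
    return colorsys.rgb_to_hsv(*(c / 255 for c in rgb))


def looks_like_glare(rgb, v_min=0.9, s_max=0.15):
    """Hypothetical rule: bright (high V) and nearly colorless (low S)."""
    _, s, v = to_hsv(rgb)
    return v >= v_min and s <= s_max


for name, px in [("tissue", TISSUE), ("glare", GLARE)]:
    h, s, v = to_hsv(px)
    print(f"{name}: H={h:.3f} S={s:.3f} V={v:.3f} glare={looks_like_glare(px)}")
```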
In addition to effectively mitigating the impact of white-light reflection on polyp identification, our analysis revealed that the HSV and HSL-based models also demonstrated superior performance compared to their RGB-based counterparts when processing colonoscopy images containing visible blood vessels. This is a significant finding, as the misidentification of blood vessels as polyps has been another commonly reported issue in existing work.35,36,44,45
The subsequent visualizations provide a transposed view of the performance metrics for each model across the color spaces (RGB, HSV, HSL, LAB). Annotations within each panel portray the scores clearly, and darker shades indicate higher scores, enabling a quick visual assessment of which model performs best on each metric in each color space and making trends and patterns easy to identify. This view emphasizes the impact of color space selection on model performance and can guide future model development.
The bar plot of average performance by color space shows the mean score of each model in each color space, with error bars representing the standard deviation of the scores. The error bars indicate the variability and consistency of each model's performance: smaller error bars mean more consistent scores and therefore more reliable predictions. This figure summarizes the overall performance trends of each model across color spaces and indicates which model is the most robust and reliable for polyp detection.
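The summary statistics behind the bar-plot error bars above, and behind the box plots described in the next paragraph, can be sketched in NumPy as follows. The sample standard deviation (`ddof=1`), the 1.5 × IQR whisker rule (similar to Matplotlib's default `boxplot` behavior), the function name, and the example scores are assumptions for illustration; the scores are not taken from the study.

```python
import numpy as np


def summarize_scores(scores):
    """Mean/std for error bars, and quartiles/whiskers/outliers for a box plot."""
    s = np.asarray(scores, dtype=float)
    q1, median, q3 = np.percentile(s, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = s[(s >= low_fence) & (s <= high_fence)]
    return {
        "mean": s.mean(),
        "std": s.std(ddof=1),                       # sample standard deviation
        "median": median,
        "iqr": iqr,
        "whiskers": (inside.min(), inside.max()),   # most extreme non-outliers
        "outliers": s[(s < low_fence) | (s > high_fence)].tolist(),
    }


# Hypothetical per-image Dice scores for one model in one color space.
stats = summarize_scores([0.70, 0.72, 0.74, 0.75, 0.76, 0.78, 0.40])
```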
Finally, the distribution of scores for all models across color spaces is shown as box plots, in which each box spans the interquartile range (IQR) and a line inside the box marks the median. Whiskers extend to the most extreme scores lying within 1.5 times the IQR of the lower and upper quartiles, illustrating the variability in performance. This representation conveys the range of scores and identifies any outliers, giving a more nuanced view of model performance than averages alone. Returning to the blood-vessel cases, our results suggest that alternative color spaces such as HSV and HSL can provide a more robust representation of color information in colonoscopy images, leading to improved identification of polyps and other relevant features. Several examples of the RGB_UNET model incorrectly identifying blood vessels as polyps are presented in Figure 14.
We hypothesize that the misidentification of blood vessels as polyps by the RGB_UNET model may be attributed to the limitations of the RGB color space in representing color information in colonoscopy images. Specifically, because RGB mixes intensity and chromatic information in all three channels, subtle variations in color and intensity can be difficult to separate. As a result, blood vessels may be misidentified as polyps because of their similar color and intensity characteristics in the RGB color space.
Furthermore, we observed that the RGB_UNET model was prone to misidentifying certain non-polyp regions as polyps. These regions were mostly located in parts of the colon whose features resemble those of polyps. In contrast, the LAB_UNET model correctly excluded these polyp-like regions. Figure 15 compares the predictions made by RGB_UNET and LAB_UNET on images containing such regions.
This behavior suggests that the CIEL*A*B* color space may be more effective than the RGB color space in distinguishing polyps from polyp-like regions in colonoscopy images. A likely reason is that CIEL*A*B* was designed to be approximately perceptually uniform, so that the Euclidean distance between two colors roughly corresponds to the perceived difference between them. This can provide a more robust representation of color information in colonoscopy images, allowing for improved discrimination between polyps and other relevant features.
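The Euclidean distance mentioned above is the CIE 1976 color difference, ΔE*ab, between two colors with coordinates (L*1, a*1, b*1) and (L*2, a*2, b*2). Because CIEL*A*B* is only approximately uniform, later formulas such as CIE94 and CIEDE2000 add weighting terms to this distance; the plain form is shown here for reference.

```latex
\Delta E^{*}_{ab} = \sqrt{\left(L^{*}_{2} - L^{*}_{1}\right)^{2} + \left(a^{*}_{2} - a^{*}_{1}\right)^{2} + \left(b^{*}_{2} - b^{*}_{1}\right)^{2}}
```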
In addition to the observed limitations of the RGB_UNET model, we also found that the HSL_UNET, HSV_UNET and LAB_UNET models were less effective in detecting irregularly shaped polyps, particularly those with an elongated shape.
One possible explanation for the RGB-based models' advantage in detecting irregularly shaped polyps is that the RGB color space may represent the visual features associated with these shapes more effectively than other color spaces such as HSL, HSV, and CIEL*A*B*.
To improve AI-based polyp detection systems, supporting multiple color spaces within these models offers a promising direction. Different color spaces can emphasize different polyp features, such as edges, textures, and subtle color differences, that standard RGB-based systems may miss. Our findings suggest that alternative color spaces such as HSV, HSL, and CIELAB can provide advantages in specific challenging cases, including white-light glare, visible blood vessels, and polyp-like non-polyp regions, even though RGB performed best overall.
The RGB color space, defined by red, green, and blue components, is an additive model primarily suited for devices like monitors and cameras. However, it does not align with human visual perception, which is more attuned to differences in hue, saturation, and brightness rather than the raw intensity values of primary colors. RGB also struggles to separate brightness information from color information, which becomes problematic in medical images, particularly in colonoscopy, where illumination and reflection play a key role.
One fundamental issue with the RGB color space is its sensitivity to variations in lighting conditions, as it integrates both color and intensity information directly. For example, specular reflections—common in colonoscopy—lead to high-intensity values across all channels, making it difficult to differentiate between tissue surfaces and reflective artifacts. Literature highlights that this inability to disentangle lightness from chromatic information leads to errors in segmentation tasks involving non-uniform lighting conditions.
The HSV (Hue, Saturation, Value) and HSL (Hue, Saturation, Lightness) color spaces offer an alternative approach by decomposing color information in a manner more consistent with human vision. Hue represents the color itself, saturation indicates the purity or vividness of the color, and value (or lightness in HSL) captures brightness separately from chromatic content. The key advantage of HSV and HSL over RGB lies in their ability to isolate brightness (value or lightness) from color. In medical imaging, this means that changes in illumination, such as those caused by the bright light of the colonoscope or by specular reflections, mainly affect the “Value” or “Lightness” channel, leaving the hue and saturation components more consistent and reflective of the actual tissue characteristics. This separation is particularly beneficial when segmenting polyps, as it reduces the likelihood that bright reflections will confuse the model.
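The separation of brightness from chromatic content can be checked numerically. The sketch below converts an RGB image to HSV or HSL with Python's standard `colorsys` module and compares the same reddish surface under full and half illumination; the helper name `convert_image` and the pixel values are hypothetical. In HSV, halving the illumination leaves hue and saturation unchanged and halves only V. In HSL, lightness halves but saturation also shifts, so for a pure change in intensity the separation is cleaner in HSV. Note that `colorsys`, like OpenCV's `COLOR_RGB2HLS` conversion, stores HSL channels in H, L, S order, which is why the space is sometimes labeled HLS.

```python
import colorsys

import numpy as np


def convert_image(rgb, space):
    """Convert an H x W x 3 float RGB image in [0, 1] to HSV or HLS.

    Uses the standard-library colorsys module pixel by pixel: slow, but
    dependency-free. For "hls" the channel order is H, L, S.
    """
    fn = {"hsv": colorsys.rgb_to_hsv, "hls": colorsys.rgb_to_hls}[space]
    img = np.asarray(rgb, dtype=float)
    flat = img.reshape(-1, 3)
    out = np.array([fn(*px) for px in flat])
    return out.reshape(img.shape)


# The same reddish surface under full and half illumination (hypothetical values).
bright = np.array([[[0.8, 0.4, 0.3]]])
dim = bright * 0.5

hsv_bright, hsv_dim = convert_image(bright, "hsv")[0, 0], convert_image(dim, "hsv")[0, 0]
hls_bright, hls_dim = convert_image(bright, "hls")[0, 0], convert_image(dim, "hls")[0, 0]
print("HSV:", hsv_bright.round(3), hsv_dim.round(3))
print("HLS:", hls_bright.round(3), hls_dim.round(3))
```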
Several studies in image processing have demonstrated that models trained in the HSV or HSL color space can outperform their RGB counterparts in tasks requiring object distinction under variable lighting conditions.37,46–48 By isolating hue, these models can more readily recognize boundaries and features that brightness would otherwise mask in the RGB space. Moreover, the hue, saturation, and brightness decomposition of HSV and HSL is closer than RGB to the way humans describe color, which may benefit segmentation tasks that rely on subtle color distinctions.
The LAB color space's robustness comes from its explicit separation of luminance (L*) from chromaticity (a* and b*). In scenarios like colonoscopy, where variations in lighting can obscure critical polyp features, LAB can preserve the color differences while mitigating the influence of brightness. Studies in medical image analysis have shown that models operating in LAB can effectively distinguish between tissues of similar luminance but different chromatic properties, offering improved polyp detection, particularly in flat or small lesions where texture and subtle color differences are key.
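To make the luminance/chromaticity split concrete, the sketch below converts sRGB values in [0, 1] to CIEL*A*B* using the standard sRGB transfer curve, the sRGB-to-XYZ matrix, and the D65 reference white. It is a self-contained illustration; the study does not state which library performed its conversion. scikit-image's `skimage.color.rgb2lab` performs the same D65 conversion, whereas OpenCV's `cv2.cvtColor(img, cv2.COLOR_RGB2LAB)` rescales the channels for 8-bit images (L* to 0–255, with a* and b* offset by 128), so input scaling should be checked when comparing pipelines.

```python
import numpy as np

# sRGB (D65) -> CIE XYZ matrix and the D65 reference white.
M = np.array([[0.4124564, 0.3575761, 0.1804375],
              [0.2126729, 0.7151522, 0.0721750],
              [0.0193339, 0.1191920, 0.9503041]])
WHITE_D65 = np.array([0.95047, 1.00000, 1.08883])


def srgb_to_lab(rgb):
    """Convert sRGB values in [0, 1] (shape ... x 3) to CIEL*A*B*."""
    rgb = np.asarray(rgb, dtype=float)
    # 1) Undo the sRGB transfer curve to obtain linear light.
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # 2) Linear RGB -> XYZ, normalized by the reference white.
    xyz = (lin @ M.T) / WHITE_D65
    # 3) Cube-root compression with a linear segment near black.
    delta = 6 / 29
    f = np.where(xyz > delta ** 3, np.cbrt(xyz), xyz / (3 * delta ** 2) + 4 / 29)
    L = 116 * f[..., 1] - 16             # luminance, from Y only
    a = 500 * (f[..., 0] - f[..., 1])    # green-red axis
    b = 200 * (f[..., 1] - f[..., 2])    # blue-yellow axis
    return np.stack([L, a, b], axis=-1)


print(srgb_to_lab([1.0, 0.0, 0.0]).round(1))  # pure sRGB red
```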
These theoretical advantages of HSV, HSL, and LAB are consistent with the specific failure cases analyzed above. Because these color spaces dissociate brightness from chromatic components, they can handle common visual challenges in medical imaging, such as specular reflection, lighting variability, and subtle color differences between tissues. Accordingly, although the RGB-based models achieved the best scores overall, the HSV- and HSL-based models handled glare and visible blood vessels better, and the LAB-based model was better at rejecting polyp-like non-polyp regions, as the color separation in these spaces provides a clearer distinction between tissue types. This feature representation aligns with the fundamental visual distinctions made by the human visual system, making these color spaces well suited to clinical applications such as colonoscopy.
Incorporating multiple color spaces into AI models enables better feature extraction, allowing the system to adapt to diverse visual environments commonly encountered in real clinical settings, such as variations in lighting or the presence of specular reflections. This can lead to improved segmentation performance and more accurate neoplasm detection, addressing key limitations in current detection systems.
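One simple way to combine color spaces, in line with the ensemble approach proposed as future work in the Conclusion, is late fusion: each color-space-specific model produces a per-pixel polyp probability map, the maps are averaged (optionally with weights), and the average is thresholded. The sketch below is a hypothetical illustration, not a method evaluated in this study; the function name, the equal default weights, and the 0.5 threshold are assumptions. An alternative is early fusion, in which the converted channels (for example, RGB, HSV, and CIEL*A*B*) are concatenated into a single multi-channel input for one network.

```python
import numpy as np


def fuse_color_space_predictions(prob_maps, weights=None, threshold=0.5):
    """Late fusion: weighted mean of per-pixel polyp probabilities, then threshold.

    prob_maps: list of equally shaped arrays, one per color-space model.
    """
    probs = np.stack([np.asarray(p, dtype=float) for p in prob_maps], axis=0)
    if weights is None:
        weights = np.ones(len(prob_maps))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()             # normalize to sum to 1
    fused = np.tensordot(weights, probs, axes=1)  # weighted mean over models
    return (fused >= threshold).astype(np.uint8), fused


# Hypothetical 1 x 2 probability maps from RGB-, HSV-, and LAB-based models.
mask, fused = fuse_color_space_predictions([[[0.9, 0.2]], [[0.3, 0.7]], [[0.6, 0.4]]])
```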
Moving forward, integrating multi-color space processing into deep learning frameworks should be a priority. Validating these models in clinical environments and ensuring compatibility with various endoscopic equipment will further enhance their utility in hospitals and clinics, improving early detection of polyps and reducing the rates of missed neoplasms. Ultimately, the inclusion of diverse color spaces can make AI-assisted colonoscopy systems more robust, ensuring their real-world applicability and effectiveness in colorectal cancer prevention.
By examining Figure 17, we observe several instances where the models struggled to accurately distinguish between polyp and non-polyp regions. Several failure cases were identified, primarily related to over-segmentation and under-segmentation, as well as misclassification caused by image artifacts and challenging anatomical features. Over-segmentation was a recurring issue, particularly in the HSV and LAB color spaces, where non-polyp regions, such as surrounding mucosa or shaded areas, were incorrectly classified as polyp tissue. This tendency was especially evident in images with inconsistent lighting or shadowing, where the models appeared to misinterpret these visual cues as part of the polyp, leading to false positives.
Conversely, under-segmentation was observed in instances where the models failed to capture the full extent of the polyp, particularly in low-contrast regions where the polyp boundaries were less distinguishable from the surrounding tissue. This issue was more prevalent in the RGB and LAB color spaces, suggesting that the models had difficulty identifying subtle textural differences, resulting in incomplete or fragmented polyp segmentation. The models also exhibited sensitivity to specular highlights and reflections, with bright regions often being misclassified as polyps, further contributing to segmentation inaccuracies. This was particularly notable in the HSV and LAB outputs, where reflection artifacts caused distortions in the predicted polyp boundaries.
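Over- and under-segmentation can also be quantified per image rather than judged visually. In the hypothetical sketch below, over-segmentation is the false-positive area relative to the ground-truth polyp area, and under-segmentation is the fraction of the polyp that was missed (equal to 1 − recall). The function name and the normalization by polyp area are assumptions for illustration; the observations above come from qualitative inspection of the segmentation outputs.

```python
import numpy as np


def over_under_segmentation(pred, gt):
    """Return (over, under) for one binary prediction and ground-truth mask.

    over:  false-positive pixels divided by ground-truth polyp pixels
    under: missed polyp pixels divided by ground-truth polyp pixels (1 - recall)
    """
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    polyp_area = max(int(gt.sum()), 1)  # avoid division by zero on empty masks
    over = int((pred & ~gt).sum()) / polyp_area
    under = int((~pred & gt).sum()) / polyp_area
    return over, under


gt = np.zeros((4, 4), dtype=bool)
gt[1:3, 1:3] = True     # 4-pixel polyp
pred = np.zeros((4, 4), dtype=bool)
pred[1, 1:4] = True     # 2 polyp pixels and 1 background pixel
pred[3, 0:2] = True     # 2 more background pixels
over, under = over_under_segmentation(pred, gt)  # over = 0.75, under = 0.5
```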
Moreover, the models demonstrated limitations in segmenting smaller polyps and polyps with complex textures, where finer details were often lost, resulting in coarser segmentations. This challenge was particularly evident in the LAB and HSL color spaces, indicating that these models may lack the necessary sensitivity to capture intricate polyp boundaries. Overall, the figure reveals that performance varied across color spaces, with RGB generally providing more consistent results, while HSV and LAB models exhibited more frequent segmentation errors. These observations highlight the need for further refinement in model architecture and preprocessing techniques to improve robustness against variations in lighting, reflections, and polyp texture complexity.
The statistical analysis using paired t-tests highlights significant differences in model performance metrics across different color spaces utilized in segmentation tasks. The RGB-based models demonstrated superior performance compared to HSV and HSL counterparts, as evidenced by significant p-values across multiple evaluation metrics, including Precision, IoU, and Dice score.
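For reference, the paired t statistic underlying such comparisons is the mean of the per-pair score differences divided by its standard error, with n − 1 degrees of freedom. The dependency-free sketch below computes it; `scipy.stats.ttest_rel(a, b)` returns the same statistic together with the two-sided p-value. The pairing unit used for Table 7 (for example, per-image scores on the 210 test images) is not stated here, so the example scores and the function name `paired_t` are hypothetical.

```python
import math
import statistics


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched score lists a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length score lists with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD, n - 1 denominator
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical matched Dice scores of an RGB-based and an HSV-based model.
rgb_scores = [0.80, 0.75, 0.90, 0.70]
hsv_scores = [0.78, 0.70, 0.85, 0.69]
t, df = paired_t(rgb_scores, hsv_scores)  # t is about 3.15 with df = 3
```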
In terms of precision, the RGB models significantly outperformed both the HSV and HSL models (p < 0.05), indicating that RGB-based models produce a higher proportion of true positives among all predicted positives. However, no significant difference was observed between the RGB and LAB models (p = 0.237), suggesting that the LAB color space may offer comparable precision, although this requires further investigation.
The recall metric showed significant differences in performance between RGB and both HSL (p = 0.009) and LAB (p = 0.047) models, indicating that RGB models are more effective in identifying all relevant instances. The comparison with HSV approached significance (p = 0.054), reflecting a potential advantage of RGB in capturing true positives. Similar to the precision findings, RGB models outperformed both HSV (p = 0.010) and HSL (p = 0.014) models in terms of IoU, indicating a greater overlap between predicted and ground truth regions. The comparison with LAB (p = 0.100) did not reach significance, suggesting the need for further analysis to clarify this relationship.
The Dice score results further corroborate the superior performance of RGB models, with statistically significant differences observed against the HSV (p = 0.002) and HSL (p = 0.016) models. The comparison with LAB did not reach significance (p = 0.130), suggesting that LAB-based models may perform comparably and merit further exploration.
Comparisons among HSV, HSL, and LAB models yielded predominantly non-significant p-values, indicating similar performance levels among these color spaces. The observed similarities suggest that these models may not leverage color information as effectively as their RGB counterparts, underscoring the importance of color space selection in deep learning-based segmentation tasks. In conclusion, the findings emphasize the quantitative efficacy of RGB color space for model performance in segmentation tasks while revealing potential avenues for future exploration of the LAB color space. This comprehensive statistical evaluation provides a foundational understanding of the color space's influence on model effectiveness, guiding subsequent research and applications in the domain.
Contextualizing color space selection with state-of-the-art models
While this study primarily focuses on the impact of different color spaces on the performance of polyp segmentation models, it is important to position these findings within the context of existing state-of-the-art models. Several recent studies have made notable advancements in polyp detection and segmentation by using more sophisticated architectures such as Swin Transformers, EfficientNet, and attention mechanisms. These models have achieved high accuracy and generalizability across diverse datasets, as the following examples show. In one recent study,36 the authors present a hybrid model that combines a Swin Transformer with EfficientNet and achieves high performance across multiple datasets, with a mean Dice coefficient of 0.906, an IoU of 0.842, and a mean weighted F-measure of 0.88. Compared with the RGB-based UNET model in our study, which achieves a Dice score of 0.88 and an IoU of 0.87, these results highlight the role that advanced architectures play in improving segmentation outcomes. However, our findings suggest that color space selection, specifically RGB, plays a critical role in maintaining model performance, and that careful selection of color space can yield competitive results even without advanced architectural modifications.
Further, another study38 proposes a SegNet-based polyp segmentation model that reports an average IoU of 81.7% across various datasets. By comparison, the UNET model in our study achieves an IoU of 87% with RGB and 71% with LAB, suggesting that model architecture alone is not the sole determinant of performance. Our results suggest that the choice of color space can offer benefits comparable to those of more complex models, particularly with RGB or LAB inputs.
Additionally, NeutSS-PLS, a model introduced in an earlier study,35 incorporates a saliency detection mechanism specifically designed to handle specular reflections, achieving high precision (92.30%) and F1 score (92.40%). Although our models include no dedicated mechanism for handling specular reflections, the performance of RGB-based models such as UNET (precision: 90%, F1 score: 91%) suggests that color space selection could complement such models, potentially improving their robustness when combined with attention mechanisms or saliency-based approaches.
While our study highlights the importance of color space in improving segmentation accuracy, these findings should be viewed as complementary to state-of-the-art architectural improvements. Future research could explore integrating color space optimizations with models such as Swin-EfficientNet or NeutSS-PLS to enhance performance further. Combining the ability of advanced models to capture complex features with an optimized color space could offer a more holistic approach to improving the accuracy and reliability of polyp detection and segmentation.
Exploring segmentation challenges: Effects of image quality on model performance
A common observation across all models and color spaces is the persistent difficulty in accurately detecting extremely blurred, hidden, and small polyps.
Blurred polyps likely result from suboptimal imaging conditions during the colonoscopy procedure, where motion artifacts and variations in the focal plane reduce clarity. These factors impede the model's ability to discern polyp boundaries, leading to incomplete or inaccurate segmentations. For instance, polyps that are substantially out of focus tend to blend into the surrounding mucosal texture, causing the models to misclassify or overlook them entirely.
Furthermore, hidden polyps, often obscured by folds of the colonic wall or overlapping structures, present a substantial challenge. The models appear to struggle in these scenarios, as the contextual information required to identify these concealed polyps is insufficient. This limitation suggests a need for enhanced model training with a focus on identifying polyps in occluded scenarios, potentially through the incorporation of synthetic data that mimics these difficult conditions.
The detection of small polyps is another notable issue, as the models frequently fail to identify lesions below a certain size threshold. This may result from inadequate feature extraction capabilities when processing smaller image regions, particularly in color spaces where subtle color variations indicative of polyps are less pronounced. The analysis underscores the necessity for models to incorporate multi-scale feature extraction techniques that could enhance sensitivity to smaller polyp sizes, thus improving overall detection rates.
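The size effect described above can be checked by grouping per-image recall by polyp size, measured as the ground-truth mask area as a fraction of the image. The sketch below is hypothetical: the bin edges (1% and 5% of the image area), the group names, and the function name are assumptions, and the study does not report a specific size threshold.

```python
import numpy as np


def recall_by_polyp_size(preds, gts, edges=(0.01, 0.05)):
    """Mean per-image recall for small, medium, and large polyps.

    Size is the fraction of image pixels covered by the ground-truth mask.
    Images without any polyp pixels are skipped.
    """
    groups = {"small": [], "medium": [], "large": []}
    for pred, gt in zip(preds, gts):
        pred = np.asarray(pred, dtype=bool)
        gt = np.asarray(gt, dtype=bool)
        if not gt.any():
            continue
        recall = (pred & gt).sum() / gt.sum()
        area = gt.mean()
        if area < edges[0]:
            groups["small"].append(recall)
        elif area < edges[1]:
            groups["medium"].append(recall)
        else:
            groups["large"].append(recall)
    return {name: float(np.mean(r)) if r else None for name, r in groups.items()}


shape = (20, 20)
gt_small, gt_medium, gt_large = (np.zeros(shape, dtype=bool) for _ in range(3))
gt_small[0, :2] = True        # 2 px, 0.5% of the image
gt_medium[0:2, 0:5] = True    # 10 px, 2.5%
gt_large[0:10, 0:10] = True   # 100 px, 25%
pred_small = np.zeros(shape, dtype=bool)   # misses the small polyp
pred_medium = np.zeros(shape, dtype=bool)
pred_medium[0, 0:5] = True                 # finds half of the medium polyp
pred_large = gt_large.copy()               # finds all of the large polyp
result = recall_by_polyp_size([pred_small, pred_medium, pred_large],
                              [gt_small, gt_medium, gt_large])
```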
Limitations
One major limitation of this study is that we did not consider additional color spaces such as CMYK (Cyan, Magenta, Yellow, and Key), the CIE 1964 U*V*W* color space (CIEUVW), and HSLuv. Further research is needed to evaluate semantic segmentation models in other color spaces to fully understand the relationship between color space selection and model performance in colorectal polyp detection.
Another limitation relates to the choice of the UNET, DeepLabV3, and Pyramid Attention Network models, which may have influenced the results of this study. Each of these models has its own strengths and limitations, and its performance may vary depending on the characteristics of the data and the specific task at hand.
Future research could compare these models with other state-of-the-art models for colorectal polyp semantic segmentation to further validate and extend these findings.
Future work
Future research should explore color spaces beyond those examined in this study. Our results show that the choice among RGB, HSV, HSL and CIEL*A*B* influences the performance of segmentation models for colorectal polyp detection; other color representations, such as YCbCr, XYZ and various perceptually uniform color spaces, may offer additional benefits. Such studies should measure the impact of each alternative on model accuracy and robustness. In addition, learned color space transformations, trained jointly with the segmentation network, could further improve detection performance by allowing models to adaptively learn the most informative color features. By broadening the scope of color space analysis, researchers can contribute to more effective AI-driven diagnostic tools and, ultimately, better clinical outcomes in real-world settings.
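YCbCr and XYZ are both fixed, closed-form transforms of RGB, so evaluating them would only require converting the input images before training. The sketch below illustrates the two conversions under stated assumptions; it is not the conversion code used in this study. It assumes sRGB-encoded input scaled to [0, 1], uses the full-range ITU-R BT.601 (JPEG/JFIF-style) YCbCr definition, and uses the standard sRGB-to-XYZ matrix for a D65 white point, rounded to four decimals.

```python
"""Illustrative RGB -> YCbCr (BT.601, full range) and sRGB -> CIE XYZ (D65)."""


def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 YCbCr for RGB in [0, 1].

    Returns Y in [0, 1] and Cb, Cr centred on 0.5 (i.e. 128 for 8-bit data).
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.5 + (b - y) / 1.772  # 1.772 = 2 * (1 - 0.114)
    cr = 0.5 + (r - y) / 1.402  # 1.402 = 2 * (1 - 0.299)
    return y, cb, cr


def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def srgb_to_xyz(r, g, b):
    """Convert sRGB in [0, 1] to CIE XYZ with a D65 white point (white -> Y = 1)."""
    rl, gl, bl = (srgb_to_linear(c) for c in (r, g, b))
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl
    return x, y, z


if __name__ == "__main__":
    pixel = (0.8, 0.4, 0.4)  # hypothetical reddish pixel
    print("YCbCr:", tuple(round(v, 4) for v in rgb_to_ycbcr(*pixel)))
    print("XYZ:  ", tuple(round(v, 4) for v in srgb_to_xyz(*pixel)))
```

A learned color transformation would generalize this idea: instead of fixed coefficients like those above, the per-pixel mixing weights would be trainable parameters (equivalent to a 1x1 convolution on the input channels) optimized jointly with the segmentation network.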
Conclusion
In this study, we examined the impact of color space selection on the performance of semantic segmentation algorithms for colorectal polyp detection. We trained three semantic segmentation architectures, namely U-Net, DeepLabV3 and the Pyramid Attention Network (PAN), using each of four color spaces, namely RGB, HSV, HSL and CIEL*A*B*, resulting in a total of 12 models. To mitigate overfitting and ensure a fair comparison, we employed transfer learning and kept the configuration, including the number of epochs, learning rate and optimizer type, consistent across all models. Our results indicate that although the RGB-based models generally outperformed their counterparts in other color spaces, they struggled to detect polyps under certain real-world conditions. The three well-known obstacles that most affected the RGB models were white-light reflections, blood vessels with polyp-like features, and regions of the colon that resemble polyps. As future work, we plan to investigate whether an ensemble approach utilizing multiple color spaces can improve colorectal polyp segmentation.
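The experimental grid and the proposed ensemble can be sketched as follows. This is a hypothetical illustration rather than the code used in this study: the shared hyperparameters are left as placeholders (the study's actual values are not restated here), and the ensemble shown is only one simple option, averaging the per-pixel foreground probabilities predicted by models trained on different color spaces for the same image and thresholding the mean at 0.5.

```python
"""Hypothetical sketch: the 3 x 4 experiment grid and a mean-probability ensemble."""
from itertools import product

ARCHITECTURES = ("U-Net", "DeepLabV3", "PAN")
COLOR_SPACES = ("RGB", "HSV", "HSL", "CIEL*A*B*")

# Placeholders: one shared configuration reused for every run, so that only the
# architecture and the input color space vary. Fill in the study's actual settings.
SHARED_CONFIG = {"epochs": None, "learning_rate": None, "optimizer": None}


def experiment_grid():
    """Return one run specification per (architecture, color space) pair."""
    return [
        {"architecture": arch, "color_space": space, **SHARED_CONFIG}
        for arch, space in product(ARCHITECTURES, COLOR_SPACES)
    ]


def ensemble_masks(prob_maps, threshold=0.5):
    """Average per-pixel foreground probabilities from several models, then threshold.

    prob_maps: list of equally sized 2D lists of probabilities in [0, 1],
    e.g. one map per color space for the same input image.
    """
    n_models = len(prob_maps)
    height, width = len(prob_maps[0]), len(prob_maps[0][0])
    mean = [
        [sum(m[i][j] for m in prob_maps) / n_models for j in range(width)]
        for i in range(height)
    ]
    binary = [[1 if p >= threshold else 0 for p in row] for row in mean]
    return mean, binary


if __name__ == "__main__":
    print(len(experiment_grid()), "runs")
    # Hypothetical 2x2 probability maps for one image, one per color space.
    maps = [
        [[0.9, 0.2], [0.6, 0.1]],  # e.g. RGB model
        [[0.8, 0.4], [0.4, 0.2]],  # e.g. HSV model
        [[0.7, 0.6], [0.8, 0.0]],  # e.g. CIEL*A*B* model
    ]
    mean, binary = ensemble_masks(maps)
    print(binary)
```

Averaging probabilities is the simplest fusion rule; majority voting over binary masks or a learned fusion layer are alternatives that an ensemble study could compare.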
Acknowledgements
The authors would like to thank Swinburne University of Technology (Sarawak Campus) for providing the necessary resources to carry out this study.
Ethical approval
The datasets used in this study are all publicly available online for research purposes. No experiments on human subjects were conducted during this study.
Consent for publication
All the authors reviewed the results, approved the final version of the manuscript and agreed to publish it.
Consent to participate
Not applicable.
Author contribution
All authors contributed to the study conception and design equally. The first draft of the manuscript was written by Khaled ELKarazle and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declare that they have no conflict of interest to report regarding the present study. In addition, the authors did not receive any specific funding to carry out this study.
Data availability
The datasets used in this study are all publicly available for research purposes and have been cited accordingly.
