Abstract
Cholangiocarcinoma (CCA) is a type of cancer that forms in the bile ducts, which carry digestive fluid from the liver. CCA is a primary liver cancer that most commonly presents in patients aged 60 to 69 years, and it is difficult to diagnose at an early stage. Hyperspectral (HS) imaging is an advanced imaging technique that combines spectroscopy with conventional imaging, and it is an emerging field of study that can be used for early CCA detection. HS imaging captures images across many spectral bands, forming a three-dimensional data cube often called a hyperspectral data cube. In this study, two U-Net based models, namely U-Net and DenseUNet, were used to perform semantic segmentation on HS images of CCA tissues. A band-selective approach was employed to derive a subset of meaningful bands based on the spectrum plot of the HS image. The HS images were further preprocessed with Principal Component Analysis (PCA). The models were evaluated by computing accuracy, AUC (area under the ROC curve), sensitivity, and specificity. The proposed U-Net and DenseUNet models reported overall accuracies of 73.47% and 77.09%, respectively. The DenseUNet model outperforms the U-Net model on every evaluation metric. The proposed models were also compared with other state-of-the-art (SOTA) models trained on various HS datasets. This study explores the application of HS imaging in carcinoma detection, and its findings could be used to further enhance the approach.
Introduction
Cholangiocarcinoma (CCA), also known as Choledochal cancer, is a type of cancer that forms in the slender tubes (bile ducts) that carry the digestive fluid bile. Bile ducts connect the liver to the gallbladder and to the small intestine. CCA is the primary cancer of the bile ducts. It originates from the malignant transformation of cholangiocytes, which are the epithelial cells lining the biliary system [1]. CCA is generally categorized into two types based on its location within the biliary tree: intrahepatic and extrahepatic. Intrahepatic CCA develops within the liver parenchyma, forming distinct mass lesions and often displaying advanced clinical symptoms. Extrahepatic CCA originates in the larger bile ducts, including the left and right hepatic ducts, the common hepatic duct, and the common bile duct. CCA is a significant contributor to primary liver cancers, accounting for roughly 10% to 25% of these malignancies globally [2]. CCA rarely occurs before the age of 40; the typical age at presentation is between 60 and 69 years [3]. Because early-stage CCA produces few symptoms and the tumors are difficult to access, CCA is often diagnosed at an advanced stage. Advanced diagnostic techniques like fluorescence in situ hybridization (FISH) and mutational analysis have become crucial for accurate diagnosis [4]. Early diagnosis is essential to improve the patient's chances of survival. The primary method for diagnosing CCA is histopathological analysis of choledochal tissues stained with hematoxylin and eosin (HE); sample micrographs of Choledochal tissues stained with HE are depicted in Fig. 1.
Micrographs of distinct categories of Cholangiocarcinoma obtained from [5] (a) Tissue without cancer regions (b) Tissue with partial cancer regions (c) Tissue entirely affected by cancer.
Hyperspectral imaging (HSI) is a non-ionizing sensing technique with the ability to capture diffuse reflectance spectra across the visible (VIS) and near-infrared (NIR) wavelength range [6]. Hyperspectral imaging involves capturing multiple images across a range of adjacent spectral bands, allowing the reflectance spectrum to be reconstructed for every pixel. This process results in the acquisition of three-dimensional hypercube information. The spatially resolved spectra provide valuable tissue diagnostic data, facilitating non-invasive monitoring of biopsies, histological and fluorometric analysis, and a better understanding of diseases. The acquired data form a hyperspectral cube, where two dimensions represent the spatial extent of the location and the third dimension represents the spectral content [7]. HSI has found applications in numerous fields, such as archaeology [8, 9, 10], vegetation and water resource control [11, 12, 13, 14, 15, 16], food quality control [17, 18, 19, 20], and many more. HSI is an advanced imaging technique that combines spectroscopy and imaging. It collects spectral information at each pixel of a 2D detector array, resulting in a 3D dataset containing both spectral and spatial information; this dataset is often referred to as a hypercube [21]. HS imaging can be utilized to provide intra-operative feedback to the surgeon for objective assessment of cancer [22]. Studies show the effectiveness of HS imaging in noninvasive tissue analysis and in the detection of cancer in ex vivo human tissue samples from the breast [23, 24, 25], skin [26, 27, 28, 29, 30, 31], colon [6, 32], brain [33], etc.
Conventional algorithms like Support Vector Machines [34], K Nearest Neighbors [35], Convolutional Neural Networks [36], and other image processing algorithms have been used on RGB (Red Green Blue) color space images for carcinoma detection. HS images require a substantial amount of memory, which means that applying image processing techniques to them is resource-intensive. For example, in the dataset used in this study, the average size of an HS image is about 150 megabytes (MB). These images contain multiple bands representing spectral information, which increases dataset dimensionality. Therefore, preprocessing is necessary to reduce dimensionality and make HS images compatible with conventional algorithms and models.
In this study, we explore the capabilities of U-Net based models on HS images of Choledochal tissues. This study proposes two semantic segmentation models, namely U-Net and DenseUNet, which were trained on HS images preprocessed with Principal Component Analysis (PCA). The proposed models are also compared with other state-of-the-art (SOTA) models trained on various HS datasets.
The literature review conducted for this study indicates that hyperspectral (HS) imaging is an emerging field of study that demonstrates enhanced performance compared to RGB color space images. However, the storage requirements of HS images are enormous, which makes them difficult to work with; image processing techniques are therefore needed before a conventional model can be trained on HS image data. In [37], the authors investigated the segmentation of rat bile duct carcinoma from hyperspectral images (HSI) using the Otsu algorithm (OTSU) and a support vector machine (SVM). Their study demonstrated the potential of HSI in detecting liver tumors, contributing to automated tumor detection techniques and paving the way for improved surgical outcomes and future applications in image-guided interventions for liver cancer. Utilizing a combination of spectral and spatial data analysis, the study in [23] employed a broadband hyperspectral imaging technique based on the U-Net [38] architecture that identifies breast cancers effectively.
In [24], Aboughaleb et al. investigated the efficacy of Hyperspectral Imaging (HSI) combined with advanced processing techniques for diagnosing ex-vivo breast cancer. Their study revealed distinct optical responses in breast tissue properties, utilizing K-means clustering to differentiate between malignant and normal tissue. Results demonstrated high sensitivity (95%) and specificity (96%) in discriminating tumor regions from normal tissue, suggesting the potential of HSI for improving surgical outcomes compared to conventional methods. In [26], the authors presented a hyperspectral imaging (HSI) technique that provides a non-invasive tool for skin cancer diagnosis. Their research demonstrates the effectiveness of HS imaging in detecting and classifying pigmented skin lesions (PSLs), achieving high sensitivity (87.5%) and specificity (100%) in discriminating between benign and malignant PSLs. Their study underscores the potential of HSI to assist dermatologists in real-time diagnosis during clinical practice, offering a significant advancement in skin cancer detection. Manni et al. [6] assessed the potential of Hyperspectral Imaging (HSI) for automating colon cancer detection during surgery. Their study employed a spectral-spatial patch-based classification approach on six ex-vivo specimens, which demonstrated promising results with a sensitivity of 0.88 and a specificity of 0.78. Comparison with deep learning approaches highlights the superiority of their hybrid CNN method, paving the way for improved surgical outcomes with HSI guidance.
The study in [32] proposed a machine learning and hyperspectral imaging technique for automatic colon and esophagogastric cancer recognition. It highlights the effectiveness of 3D Convolutional Neural Networks (3DCNN), which reached a high ROC-AUC of 0.93, suggesting the potential of this approach in clinical practice. Jansen-Winkeln et al. [39] explored the potential of combining Hyperspectral Imaging (HSI) with artificial intelligence algorithms for automatic colorectal cancer (CRC) detection. Using a four-layer perceptron neural network, they achieved a sensitivity of 86% and a specificity of 95% in distinguishing cancerous or adenomatous tissue from healthy mucosa. Additionally, HSI revealed significant perfusion parameter differences related to tumor staging and neoadjuvant therapy, suggesting its ability to detect chemotherapy-induced biological changes. Khan et al. [40] provided a review of the surge of deep learning applications in medical hyperspectral image analysis. Addressing a gap in the literature, the paper explores how deep learning methods are utilized for classification, segmentation, and detection in this domain. By synthesizing current research, the authors identify challenges and propose strategies for future advancements, offering valuable insights for researchers in the field. Tsai et al. [41] proposed a method combining Hyperspectral Imaging (HSI) with deep learning for early esophageal cancer detection. Using a single-shot multibox detector (SSD)-based system, they achieved 88% accuracy with white-light endoscopic images (WLI) and 91% with narrow-band endoscopic images (NBI). Compared to RGB images, this approach shows a 5% increase in accuracy for both WLI and NBI, suggesting substantial improvement in cancer detection precision. Urbanos et al. [33] explored the synergy between supervised machine learning (ML) methods and hyperspectral imaging (HSI) techniques for brain cancer classification.
Their study utilized HSI and ML algorithms like SVM, Random Forest (RF), and CNN to differentiate healthy and tumor tissues during brain tumor surgery. Results showed overall accuracies ranging from 60% to 95%, indicating promising potential for ML-assisted diagnosis and surgical guidance in brain cancer. Wang et al. (2021) [25] proposed a PCA-U-Net method for segmenting breast cancer nests from hyperspectral images. By combining unsupervised principal component analysis with the U-Net neural network, the approach achieved an 87.14% segmentation accuracy, offering potential for aiding pathologists in diagnosing breast cancer lesions and advancing tumor diagnosis.
A method for staging skin cancer, focusing on squamous cell carcinoma (SCC), using Hyperspectral Microscopic Imaging (HMI) and machine learning was developed in [27]. The study highlights the importance of early detection due to increasing global incidence. The authors optimized their approach, achieving a staging accuracy of 0.952.
Agrawal et al. [7] proposed a lossy compression method for hyperspectral images, employing modified convolutional autoencoders with attention layers. The encoder-decoder architecture, tested on images taken from Airborne Visible / Infrared Imaging Spectrometer (AVIRIS), Reflective Optics System Imaging Spectrometer (ROSIS), and NASA EO1, achieves up to a 5% increase in Peak Signal to Noise Ratio (PSNR) and up to 200 times higher compression ratio compared to existing methods, addressing the challenge of processing large hyperspectral datasets efficiently. La Salvia et al. conducted a study [44], which utilized Hyperspectral Imaging (HSI) to automate glioblastoma segmentation during surgery. Their AI-based approach, employing deep learning techniques, improves processing times for real-time segmentation. Evaluated against ground truths, their method enhances the gold-standard machine learning pipeline for intraoperative glioblastoma delineation. Mohamed et al. [45] introduced the Automated Laryngeal Cancer Detection and Classification using a Dwarf Mongoose Optimization Algorithm with Deep Learning (ALCAD-DMODL) technique for automating the detection and classification of laryngeal cancer (LCA). This method combined deep learning with the Dwarf Mongoose Optimization Algorithm to enhance accuracy in identifying LCA from throat region images, surpassing existing approaches in performance metrics. In the study [28], the authors investigated the use of hyperspectral imaging (HSI) combined with deep learning to classify skin cancer lesions. Utilizing the ISIC dataset, they trained models using YOLOv5 on both HSI and RGB images. Results showed that the HSI model outperformed the RGB model in identifying squamous cell carcinoma (SCC) features, with a recall rate of 0.794. This study highlights the potential of HSI technology for improving skin cancer classification accuracy. 
In [46], the authors addressed the challenge of early melanoma diagnosis using hyperspectral imaging and deep learning. Their study, encompassing samples from 50 melanoma and nevus patients, achieved promising classification accuracies of 89% and 98% for one-dimensional and two-dimensional data, respectively. This approach shows potential for improving diagnostic precision in distinguishing between melanoma and nevus, offering a non-invasive alternative to traditional histological methods. The studies in [29, 30] focus on parallelizing HS processing methods using CUDA to expedite classification, emphasizing the need for efficient disease detection. Results showed significant improvements in classification times with parallel SVM and XGBoost algorithms, affirming GPUs’ suitability for hyperspectral image analysis.
Huang et al. [31] pioneered the use of AI and hyperspectral imaging for identifying skin lesions, notably for distinguishing Mycosis fungoides (MF) from psoriasis (PsO) and atopic dermatitis (AD). Using a dataset of 1659 skin images, the authors developed a multi-frame AI algorithm that achieved high accuracy in lesion segmentation and classification. Their study highlights the potential of AI and HSI in dermatological diagnostics, offering a noninvasive and efficient approach for early detection of skin conditions. In summary, the literature reviewed provides valuable insights into HS imaging. However, it is important to acknowledge a limitation observed in the existing literature: the enormous storage requirements of HS images make them difficult to work with.
Dataset description
In this study, we utilized a secondary dataset, the Multidimensional Choledoch Database [5], to train the proposed models for semantic segmentation. The Multidimensional Choledoch Database contains both microscopy hyperspectral images and RGB color space images of cholangiocarcinoma tissues stained with HE (hematoxylin and eosin). All images in the dataset were meticulously labelled by experienced pathologists to generate annotation files, which are further processed to form the ground truth maps used for training.
HS images in the dataset with their respective ground truth masks: (a) Sample with full cancer regions (N). (b) Sample with no cancer regions (P). (c) Sample with partial cancerous regions (L) [47].
The images in the dataset can be categorized into three types: L (samples with partial cancer regions, with annotation files), N (samples with full cancer regions), and P (samples without cancer regions). The dataset contains 880 samples of multidimensional images captured from choledoch tissues of 174 patients. Among these multidimensional images, 689 scenes contain partial cancer areas, 49 scenes depict complete cancer areas, and 142 scenes do not feature any cancer areas. The annotations are stored in “.xml” files, which contain the coordinates of the polygons that represent cancerous regions. These “.xml” files must be converted into binary masks that can be used for training and testing. The binary masks serve as ground truth maps, as can be seen in Fig. 2: the white regions in a mask represent the cancerous regions, while the black regions indicate the non-cancerous regions of the tissue. In the following sections we discuss the image acquisition system and the image formats.
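The conversion from annotation polygons to a binary mask can be sketched as follows. This is a minimal illustration rather than the procedure used for the database: it assumes the polygon vertices have already been parsed from the “.xml” file into a list of (x, y) pairs (the schema of those files is not described here), and the function name `polygon_to_mask` is hypothetical. Pixels whose centres fall inside the polygon are set to white (255) using the even-odd rule.

```python
import numpy as np


def polygon_to_mask(vertices, height, width):
    """Rasterize one polygon, given as (x, y) vertices, into a 0/255 mask.

    A pixel is marked as cancerous when its centre lies inside the polygon
    according to the even-odd (ray casting) rule.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    px = xs + 0.5  # x coordinate of each pixel centre
    py = ys + 0.5  # y coordinate of each pixel centre
    inside = np.zeros((height, width), dtype=bool)
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        if y0 == y1:
            continue  # a horizontal edge never crosses a horizontal ray
        # True where the edge spans the height of the pixel centre.
        crosses = (y0 > py) != (y1 > py)
        # x position at which the edge reaches the height of the pixel centre.
        x_at_y = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
        # Each crossing to the right of the pixel toggles inside/outside.
        inside ^= crosses & (px < x_at_y)
    return inside.astype(np.uint8) * 255


# Hypothetical annotation: a square region on an 8 x 8 image.
square = [(2, 2), (6, 2), (6, 6), (2, 6)]
mask = polygon_to_mask(square, height=8, width=8)
```

When an annotation file holds several polygons, the individual masks can be combined with `np.maximum` so that any pixel covered by at least one polygon stays white.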
The imaging system comprises a microscope (Nikon 80i, Nikon Corp.), an acousto-optic tunable filter (AOTF) adapter (VA310-.37.80-L, Brimrose Corp.), an SPF Model AOTF controller (VFI130-140SPFB2C2exSTS, Brimrose Corp.), a gray scientific complementary metal oxide semiconductor detector (sCMOS, Dhyana 400D, Tucsen Corp.), a color charge coupled device detector (color CCD, DigiRetina 16, Tucsen Corp.), and a personal computer [5]. The hyperspectral images in the dataset are captured using the system depicted in Fig. 3.
Schematic of the image acquisition system for capturing HS images extracted from [5].
Single-band images are acquired by sCMOS with wavelength ranging from 550 nm to 1000 nm, utilizing narrow bandwidth via the AOTF [5]. These images contain two-dimensional spatial data and one-dimensional spectral data. They can be visualized as a three-dimensional cube.
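A short sketch of this cube layout, and of the storage it implies, is given below. The cube geometry used here (1024 x 1280 pixels, 60 bands, 16-bit unsigned samples) is an assumption and is not stated in the text above; it is chosen because it reproduces the average size of about 150 MB per image quoted in the introduction.

```python
import numpy as np

# Assumed cube geometry (illustrative, not taken from the text above):
# 1024 x 1280 pixels, 60 spectral bands, 16-bit unsigned samples.
height, width, bands = 1024, 1280, 60
sample_dtype = np.dtype(np.uint16)

# Storage needed for one uncompressed cube.
size_bytes = height * width * bands * sample_dtype.itemsize
size_mib = size_bytes / 2**20
print(f"{size_mib:.1f} MiB per cube")

# A small stand-in cube shows the (rows, columns, bands) layout.
cube = np.zeros((4, 5, bands), dtype=sample_dtype)
pixel_spectrum = cube[2, 3, :]  # the full spectrum of one pixel
band_image = cube[:, :, 10]     # the single-band image of one wavelength
```

Slicing along the last axis yields one grayscale image per wavelength, while fixing a pixel position yields the spectrum that later sections use for band selection.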
The dataset contains the microscopy HS images, which are stored in two formats, namely “.hdr” and “.raw” files. The “.hdr” files contain important descriptions of the “.raw” files. Some of the important parameters stored in the “.hdr” file are the “band
Storage details of ‘.raw’ files in the database [47]
In this section, we discuss the methods and models used in this study. We begin by describing the Principal Component Analysis (PCA) algorithm, the U-Net model, and the DenseUNet model. We also detail the methods employed in this study and discuss the rationale behind selecting them.
Dimensionality reduction using principal component analysis (PCA)
As evident from Table 1, HS images have large storage requirements, which makes processing them challenging. Due to the large number of bands present, HS images often suffer from the “curse of dimensionality” phenomenon [48]. Numerous bands within hyperspectral images frequently exhibit strong correlation. PCA is one of the oldest and simplest techniques used to reduce the dimensionality of a dataset while preserving as much “variability” as possible. PCA serves as a descriptive tool that does not rely on distributional assumptions. It is an adaptive exploratory method suitable for analyzing numerical data of various types. The PCA transformation is a linear conversion of the original image bands into a collection of new, uncorrelated features.
For an
In this manner, the individual variables
In hyperspectral images, many bands frequently display strong correlations. PCA applies a linear transformation to the original bands of the image to produce a new collection of uncorrelated features. These new features are determined by the eigenvectors of the image covariance matrix, where each eigenvalue denotes the variance along the direction of its corresponding eigenvector. A very small number of principal components can be used to capture a significant portion of the variation in the image.
Contribution rate of each of principal component after applying PCA.
In [47], we applied PCA to the HS image and retained only the first component, which retains around 80.95% of the original variance, as can be seen in Fig. 4. This study extends that approach by retaining the top three components, which together retain around 96.96% of the original variance.
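The eigen-decomposition described above, together with the contribution rate of each component, can be sketched with NumPy as follows. This is an illustrative implementation on a small synthetic cube, not the code used in this study; the function name `pca_reduce` and the synthetic data are assumptions made for the example.

```python
import numpy as np


def pca_reduce(cube, n_components=3):
    """Project a (rows, cols, bands) cube onto its leading principal components.

    Returns the reduced cube and the contribution rate (explained variance
    ratio) of every component, sorted from largest to smallest.
    """
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands).astype(np.float64)
    pixels -= pixels.mean(axis=0)             # centre each band
    cov = np.cov(pixels, rowvar=False)        # bands x bands covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]         # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()
    scores = pixels @ eigvecs[:, :n_components]
    return scores.reshape(rows, cols, n_components), ratio


# Synthetic cube with strongly correlated bands (illustrative data only).
rng = np.random.default_rng(0)
base = rng.normal(size=(32, 32, 1))
cube = base * np.linspace(1.0, 2.0, 8) + 0.1 * rng.normal(size=(32, 32, 8))

reduced, ratio = pca_reduce(cube, n_components=3)
retained = ratio[:3].sum()  # fraction of variance kept by the top three
```

Because the synthetic bands are nearly scaled copies of one another, almost all of the variance falls on the first component. On real HS images the contribution rates are spread over more components, which is the motivation for retaining three of them here.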
The U-Net [38] model is a popular model that was originally introduced for biomedical image segmentation. U-Net falls under the category of Fully Convolutional Networks (FCNs), which are neural networks that contain only convolutional layers. The architecture of U-Net makes it capable of performing semantic segmentation with very few training images while yielding more precise segmentation.
Architecture of a classical U-Net model [38].
The U-Net architecture comprises a contracting pathway and an expanding pathway, resulting in a symmetrical model structure. The contracting path comprises a sequence of convolutions, each succeeded by a rectified linear unit (ReLU) and a max-pooling operation with a stride of 2 for downsampling. At every downsampling step the number of feature channels is doubled. In the expansion phase of the model, the feature map is upsampled, then subjected to a 2
FCNs based on encoder (contracting) and decoder (expanding) architectures usually have millions of parameters and suffer from the vanishing gradient problem. This is due to the large depth of these networks, in which the signal needs to backpropagate across many layers.
Network architecture of DenseUNet [50].
DenseUNet [50] is a modified version of the classical U-Net architecture that uses Dense Blocks (DB) to create a densely connected U-Net architecture. Similar to U-Net, the architecture of DenseUNet consists of contracting and expanding paths. DenseUNet consists of four major core blocks: Down Transition Block (DTB), Up Transition Block (UTB), bottleneck, and Dense Block (DB). The DBs are added to the network to address the vanishing gradient problem. The DTB consists of two layers: 2
In this section we describe the various methods and approaches employed in the proposed method. Preprocessing of HS images is an important step for training the Fully Convolutional Networks (FCNs) models. This section describes the various methods employed for preprocessing the HS images. Further, we also discuss the configuration details of the FCNs in great detail.
Data preprocessing
(a) Spectrum plot for cancerous and non-cancerous regions in a sample HS image. (b) Contribution rate for the top 3 components after applying PCA.
Hyperspectral Image preprocessing pipeline.
The proposed method consists of several stages that take place sequentially. The first stage is to preprocess the HS images; this stage is common to both models. A band-selective approach is implemented, wherein specific bands are chosen based on the spectrum curve of the hyperspectral (HS) image. As depicted in Fig. 7(a), bands 10 to 50 were selected from the original HS image because the spectral plot shows the highest intensity in this range. The preprocessing steps are depicted schematically in Fig. 8. Preprocessing starts with the original hyperspectral data cubes, which are first normalized by removing the mean and scaling to unit variance. This normalization standardizes the dataset, which makes further processing more consistent. After normalization, the next stage applies Principal Component Analysis (PCA), an effective algorithm for dimensionality reduction. The storage requirements for hyperspectral images are enormous, which makes HS images difficult to preprocess and makes training models on them resource-intensive. Initially, the hyperspectral (HS) images have dimensions of
We used a tile size of 256 pixels with a stride of 256 pixels, which ensures that the tiles do not overlap; each preprocessed HS image yields 20 tiles. These tiles have a dimension of
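The non-overlapping tiling step can be sketched as follows. The spatial size of 1024 x 1280 pixels and the three channels (one per retained principal component) are assumptions made for this example; they are consistent with the 20 tiles per image reported above but are not stated in the text. The function name `tile_image` is hypothetical.

```python
import numpy as np


def tile_image(image, tile=256, stride=256):
    """Cut a (rows, cols, channels) image into square tiles.

    With stride equal to the tile size the tiles do not overlap. Any border
    narrower than one tile is discarded.
    """
    rows, cols = image.shape[:2]
    tiles = []
    for top in range(0, rows - tile + 1, stride):
        for left in range(0, cols - tile + 1, stride):
            tiles.append(image[top:top + tile, left:left + tile])
    return np.stack(tiles)


# Assumed size of a preprocessed image: 1024 x 1280 pixels, 3 PCA components.
preprocessed = np.zeros((1024, 1280, 3), dtype=np.float32)
tiles = tile_image(preprocessed)
print(tiles.shape)  # (20, 256, 256, 3)
```

The same function can be applied to the binary ground truth mask of each image so that every tile keeps its matching label tile.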
In this study, we propose a U-Net based FCN that performs semantic segmentation on the preprocessed HS image patches. In [47], a classical U-Net model was utilized to perform semantic segmentation on hyperspectral (HS) images depicting Choledochal Cancer tissues. That model was effective in segmenting the images and demonstrated commendable performance across diverse evaluation metrics. This study proposes a similar U-Net model; however, the proposed model was trained on image patches, which made it less computationally intensive. The model was trained with the hyperparameters shown in Table 2. The “n_filters” parameter refers to the number of filters, also known as kernels or channels, applied to the input data. Each filter is responsible for detecting specific patterns or features in the input data. Increasing the number of filters allows the CNN to learn more diverse and complex patterns from the input data, potentially improving its ability to extract meaningful features and make accurate predictions [36]. The “Dropout Rate” parameter is used for regularizing the FCN. By training the model with various dropout rates and evaluating its performance, we identified 0.5 as the optimal choice for avoiding overfitting while maximizing effectiveness.
Hyperparameters for U-Net model
Parameter information for the proposed U-Net model
A convolution layer requires two important parameters: the kernel size and the number of channels. The kernel size sets the dimensions of the filter matrix. The number of channels defines the depth on which the convolutional layer operates. The parameter information for the convolutional layers used in the proposed classical U-Net model is given in Table 3. The U-Net primarily conducts feature extraction via convolutional and pooling layers, whereas the upsampling process is predominantly achieved through inversion techniques. The model takes an input image of size
U-Net is regarded as the state-of-the-art (SOTA) method for biomedical segmentation [38]. However, the problem of vanishing gradients restricts U-Net’s training capability; moreover, U-Net often includes millions of learnable parameters, which require substantial computational resources. Like U-Net, the DenseUNet architecture features contracting and expanding paths. DenseUNet, a state-of-the-art method, integrates advancements from both U-Net and DenseNet. DenseUNet requires relatively fewer parameters, which makes training the model less resource-intensive; this is discussed further in the subsequent section.
Parameter information about the proposed DenseUNet model
The exact configuration of the DenseUNet model used in this study is shown in Table 4. The input layer takes an image of dimensions
The Dense Block (DB) contains “L” layers. The value of “L” determines the depth or complexity of the Dense Block, which in turn affects the expressive power and representational capacity of the neural network architecture. The growth rate parameter “g” is a key hyperparameter in architectures like DenseUNet. It determines the number of additional feature maps produced by each layer within a block when transitioning from one layer to the next.
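The effect of “L” and “g” on the number of feature maps can be illustrated with a short calculation. The sketch assumes the standard dense connectivity rule, in which every layer receives the concatenation of the block input and all earlier layer outputs; the exact rule used by the DenseUNet of [50] may differ, and the values of the block input, “L”, and “g” below are hypothetical.

```python
def dense_block_channels(in_channels, num_layers, growth_rate):
    """Channel bookkeeping for one Dense Block under dense concatenation.

    Layer i (counting from 0) sees the block input plus the g feature maps
    produced by each of the i layers before it.
    """
    layer_inputs = [in_channels + i * growth_rate for i in range(num_layers)]
    out_channels = in_channels + num_layers * growth_rate
    return layer_inputs, out_channels


# Hypothetical block: 32 input channels, L = 4 layers, growth rate g = 16.
layer_inputs, out_channels = dense_block_channels(32, num_layers=4, growth_rate=16)
print(layer_inputs)   # [32, 48, 64, 80]
print(out_channels)   # 96
```

Each layer only has to learn g new feature maps while still seeing everything computed before it, which is one reason densely connected networks can stay small in parameter count.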
Hyperparameters used for the proposed DenseUNet model
Furthermore, DenseUNet yields results comparable to conventional methods with reduced pre- and post-processing requirements [50]. The proposed DenseUNet model was trained on the same dataset of HS image patches. The DenseUNet model incorporates Dropout layers within the DTB, UTB, and Bottleneck blocks, which eliminates the need to define the Dropout Rate hyperparameter explicitly for the model. As with U-Net, the “n_filter” parameter is the number of filters applied to the input data. In this study we used a classical DenseUNet model, which was trained using the hyperparameters shown in Table 5.
The dataset was partitioned such that 70% was allocated for training the models, while the remaining 30% was reserved for testing. An additional 10% of the training dataset was used for validating the model at the end of each epoch, which keeps the evaluation metrics consistent throughout the training phase; this validation strategy is called “epoch-wise validation”. Data augmentation was applied to the patch dataset by randomly flipping the images horizontally and vertically, which helps the models generalize to unseen data. Both models were trained using a “P100” GPU accelerator on the Kaggle platform. The models were optimized using Adam, a popular algorithm for gradient-based optimization of machine learning models [51]. To evaluate the performance of the segmentation models, we computed the accuracy, area under the curve (AUC), sensitivity, and specificity of both models. The models were trained on the spectral patches created by the hyperspectral image preprocessing pipeline.
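The partitioning and augmentation described above can be sketched as follows. This is one possible reading rather than the exact procedure used: it assumes the validation patches are held out from the training portion, and the function names, the random seed, and the total of 1000 patches are hypothetical.

```python
import numpy as np


def split_indices(n_samples, test_frac=0.30, val_frac=0.10, seed=0):
    """Shuffle sample indices and split them into train, validation and test.

    The validation set is taken as a fraction of the training portion.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_frac))
    test, train_all = indices[:n_test], indices[n_test:]
    n_val = int(round(len(train_all) * val_frac))
    val, train = train_all[:n_val], train_all[n_val:]
    return train, val, test


def random_flip(image, mask, rng):
    """Apply the same random horizontal and vertical flips to image and mask."""
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]  # horizontal flip
    if rng.random() < 0.5:
        image, mask = image[::-1, :], mask[::-1, :]  # vertical flip
    return image, mask


# Hypothetical dataset of 1000 patches.
train, val, test = split_indices(1000)
print(len(train), len(val), len(test))  # 630 70 300
```

Flipping the image and its mask together is essential for segmentation, because the ground truth must stay aligned with the pixels it labels.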
Performance of the proposed models on the discussed evaluation metrics
Learning curve for U-Net: (a) Loss curve (b) Accuracy curve.
Learning curve for DenseUNet: (a) Loss curve (b) Accuracy curve.
The performance metrics of both models in the different phases are shown in Table 6. The U-Net and DenseUNet models achieved accuracies of 73.47% and 77.09%, respectively, on the testing split. The learning curves for the U-Net and DenseUNet models, comprising the loss and accuracy curves, are shown in Figs. 9 and 10, respectively.
The validation loss steadily decreases as training proceeds; similarly, the accuracy curve steadily increases throughout the training phase. This trend is evident in both models. Early stopping based on the training loss was employed while training both models, so that training halts once the training loss stops improving. The performance of the models is consistent across the dataset splits, which suggests that the models do not overfit.
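The early stopping rule can be sketched in a framework-independent way as shown below. The patience of three epochs and the loss values are hypothetical; the patience actually used in training is not stated above.

```python
def early_stopping_epoch(losses, patience=3, min_delta=0.0):
    """Return the index of the epoch at which training would stop.

    Training stops once the monitored loss has failed to improve on its best
    value by more than min_delta for `patience` consecutive epochs. If that
    never happens, the last epoch index is returned.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(losses) - 1


# Hypothetical training losses: improvement stalls after the third epoch.
losses = [0.9, 0.7, 0.6, 0.61, 0.60, 0.62, 0.5]
stop = early_stopping_epoch(losses, patience=3)
print(stop)  # 5
```

In this example the later loss of 0.5 is never reached because training has already stopped, which shows the trade-off a small patience value introduces.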
In this section we discuss the results generated by the proposed models. The DenseUNet model scores slightly higher than the U-Net model on the discussed evaluation metrics across all dataset splits, indicating its effectiveness in performing accurate semantic segmentation. The U-Net achieves an overall accuracy of 73.47% on the testing dataset, while DenseUNet achieves an overall accuracy of 77.09%. The Receiver Operating Characteristic - Area Under the Curve (ROC-AUC) is a metric used to evaluate binary classification models. It quantifies the overall performance of the model across all possible threshold values. The AUC value ranges between 0 and 1, where a higher value indicates better performance. The ROC-AUC for U-Net is 72.27%, while DenseUNet achieves an AUC value of 80.10%. These AUC values indicate that both models can separate the cancerous and non-cancerous regions of a tissue, with DenseUNet doing so more reliably.
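For reference, the four evaluation metrics can be computed from per-pixel labels and scores as in the sketch below. The labels and scores are made-up illustrative values, not results from this study, and the AUC is computed through its rank interpretation (the probability that a randomly chosen cancerous pixel receives a higher score than a randomly chosen non-cancerous one) rather than by integrating an ROC curve.

```python
def segmentation_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity from binary labels (1 = cancer)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up pixel labels and predicted probabilities for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = segmentation_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, sensitivity, and specificity depend on the threshold chosen (0.5 here), whereas the AUC summarizes the ranking quality over all thresholds, which is why the two can disagree.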
Comparison of proposed models with SOTA models
Comparisons of memory size (in MB) of the proposed models.
As seen in Fig. 11, the trained weights of DenseUNet occupy around 2.34 MB, whereas those of U-Net occupy around 7.02 MB. DenseUNet has 609,377 trainable parameters, while U-Net has 1,839,621. DenseUNet therefore has roughly one third as many trainable parameters as U-Net, which makes it considerably more memory-efficient, yet it outperforms U-Net on the discussed evaluation metrics.
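The reported memory sizes are consistent with a simple estimate from the parameter counts. The sketch below assumes that each trainable parameter is stored as a 32-bit float, which is a common default but is not stated above, and it ignores any metadata stored alongside the weights.

```python
def weights_size_mib(n_params, bytes_per_param=4):
    """Approximate size of the stored weights in MiB (2**20 bytes)."""
    return n_params * bytes_per_param / 2**20


unet_params = 1_839_621
denseunet_params = 609_377

unet_size = weights_size_mib(unet_params)
denseunet_size = weights_size_mib(denseunet_params)
ratio = unet_params / denseunet_params

print(f"U-Net:     {unet_size:.2f} MiB")       # 7.02 MiB
print(f"DenseUNet: {denseunet_size:.2f} MiB")  # 2.32 MiB
print(f"U-Net has {ratio:.2f}x the parameters of DenseUNet")  # 3.02x
```

The estimate reproduces the 7.02 MB reported for U-Net and comes close to the 2.34 MB reported for DenseUNet; the small gap for DenseUNet is plausibly due to file overhead or non-trainable values saved with the weights.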
This section further discusses and interprets the performance of the models and compares them with other SOTA models. In [5, 47], results were reported for three conventional algorithms, namely Support Vector Machine (SVM), Neural Net (NN), and PCA-based U-Net, trained on HS images of Choledochal tissues; these models achieved overall accuracies of 93.75%, 94.27%, and 61.95%, respectively. Additionally, a YOLOv5 model trained on HS images of skin cancer achieved an accuracy of 78.70%. Although accuracy is an important evaluation metric that indicates the correctness of a model, accuracy alone cannot be used to assess the overall capabilities of a given model. For image segmentation tasks, other important evaluation metrics include specificity, sensitivity, and AUC (area under the curve).
A comparison of the proposed models with other SOTA models trained on various datasets is shown in Table 7. The comparison reveals that the segmentation capabilities of the proposed models are comparable with those of other models. The results also indicate that the patch-based image segmentation approach is superior to training the model on entire images [47]. The proposed models score relatively well on the discussed evaluation metrics; between the two, the DenseUNet model outperforms the U-Net model.
In conclusion, our research highlights the potential of HS imaging and advanced segmentation models for early detection of CCA. While the dataset limitations may have impacted performance to some extent, access to a more extensive dataset could further improve model accuracy. Refining these techniques and expanding dataset access could significantly enhance diagnostic capabilities and patient outcomes in CCA detection.
Future discussion
In this section, we explore the potential future applications and challenges that may emerge from the technologies employed in this study. Hyperspectral (HS) images provide enhanced accuracy compared to conventional RGB color space images. However, capturing HS images necessitates a significant investment in proprietary hardware setups, which are costly and not readily accessible. Hence, there is a need to develop cost-effective and easily accessible hardware devices to facilitate the advancement of this technology. HS images also demand extensive storage, which makes processing them time-consuming. Therefore, there is a need for software algorithms capable of processing HS images efficiently, thereby simplifying their use. In summary, the findings of this study lay the groundwork for ongoing advancements and refinements in the field, paving the way for a more effective and accessible implementation of hyperspectral imaging technology in various applications.
