Fine-grained automatic left ventricle segmentation via ROI-based Tri-Convolutional neural networks

Abstract

BACKGROUND:

Left ventricle segmentation (LVS) is crucial to the assessment of cardiac function. Cardiovascular disease is the leading cause of death worldwide and poses a significant health threat. In recent years, LVS has gained considerable attention because it enables the measurement of vital parameters such as myocardial mass, end-diastolic volume, and ejection fraction. However, manually segmenting cardiac images to evaluate these parameters is time-consuming and labour-intensive for clinicians diagnosing heart disease, and it may reduce diagnostic accuracy.

OBJECTIVE/METHODS:

This paper proposes a combination of different deep neural networks for semantic segmentation of the left ventricle based on Tri-Convolutional Networks (Tri-ConvNets) to obtain highly accurate segmentation. Cardiac MRI (CMRI) images are first pre-processed to remove noise artefacts and enhance image quality; ROI-based extraction is then performed in three stages to accurately identify the LV. The extracted features are given as input to three different deep learning structures to segment the LV efficiently: the contour edges are processed by a standard ConvNet, the contour points by a Fully ConvNet, and the noise-free images are divided into patches for pixel-wise operations in ConvNets.

RESULTS/CONCLUSIONS:

The proposed Tri-ConvNets model achieves Jaccard indices of 0.9491 $\pm$ 0.0188 on the Sunnybrook dataset and 0.9497 $\pm$ 0.0237 on the York dataset, and Dice indices of 0.9419 $\pm$ 0.0178 on the ACDC dataset and 0.9414 $\pm$ 0.0247 on the LVSC dataset. The experimental results also show that the proposed Tri-ConvNets model is faster and requires fewer resources than state-of-the-art models.

Keywords

Left ventricle; convolutional neural networks; deep learning; region of interest; MRI

1. Introduction

The heart is the powerhouse of the human circulatory system and one of the most crucial organs in the body. Heart disease significantly affects human well-being and longevity, with coronary heart disease (CHD) ranking among the leading causes of mortality worldwide [1]. Understanding how the heart works is therefore pivotal for advancing the prevention and treatment of heart disease. Medical scanners have advanced rapidly in recent years; examples used in cardiac image analysis include multi-slice cardiac CT, cardiac magnetic resonance imaging (MRI), and three-dimensional and one-dimensional ultrasound scans [2, 3].

Cardiac ultrasound images play a crucial role in evaluating physiological indicators of the heart and, combined with deep learning, in diagnosing cardiac conditions [4]. Because it is easy to operate, reliable, and practical, cardiac ultrasound is a powerful non-invasive method for the comprehensive assessment of cardiac function. The left ventricle supplies blood to the systemic circulation. Measurements such as left ventricular end-diastolic volume, end-systolic volume, ejection fraction, and stroke volume help characterize changes in the left ventricle [5, 6]. These measurements support a quantitative approach to preventing and treating heart disease, thereby reducing the risks and mortality associated with cardiovascular disease. Because the left ventricle is central to cardiac function, signs associated with it play a crucial role in diagnosing heart disease [7], and obtaining precise information about it is essential for subsequent prognosis. However, the morphology of the left ventricle varies across echocardiographic views, so these views must be classified before the left ventricle is detected. This detection step significantly reduces the time doctors spend searching for pertinent information in large volumes of echocardiographic data [8, 9].

Clinical LV parameters are commonly derived from LV segmentations of short-axis MRI scans [10]. Manual delineation, however, is time-consuming and subject to both intra-observer and inter-observer variability. Clinicians find that manually segmenting images to evaluate cardiac function takes considerable time and effort, and repeated manual segmentation reduces the reliability of the diagnosis [11]. A fully automated method is therefore needed to help clinicians work more effectively.
Deep learning models such as CNNs [12], LSTMs [13], and YOLO [14] have contributed to the detection of many types of disease. In this paper, we propose a combination of different deep neural networks for semantic LV segmentation based on Tri-Convolutional Networks (Tri-ConvNets) to obtain highly accurate segmentation. The key contributions of the proposed model are as follows.

Initially, the CMRI images are pre-processed to remove noise artefacts and enhance image quality. RoI-based extraction is then carried out in three stages to accurately identify the LV in the noise-free CMRI.

The first stage generates a bounding box around the LV, the second extracts the contour edges of the LV with a Canny edge detector, and the third marks the contour points using mask thresholding.

The extracted features are then given as input to three different deep learning structures to segment the LV efficiently. While a Transformer is used to extract global features from cardiac MR images, the Tri-ConvNets capture fine-grained details and coarse-grained semantics at all scales.

Afterwards, the contour edges are processed by a standard ConvNet, the contour points by a Fully ConvNet, and the noise-free images are converted into patches for pixel-wise operations in ConvNets.

Finally, the three ConvNets perform different operations and produce different segmentation results; these results are concatenated and used to compute a circular similarity measure that yields the final LV segmentation.

The rest of the paper is organized as follows. Section 2 reviews the relevant literature, Section 3 describes the proposed Tri-ConvNets for LV segmentation in depth, Section 4 presents the experimental results, and Section 5 discusses them. Section 6 concludes the work and suggests directions for further study.

2. Literature survey

Diagnosing coronary artery disease requires examining functional heart parameters such as regional indices, ejection fraction, and end-systolic volume. Segmentation of the left ventricle in cardiac magnetic resonance images has recently attracted considerable interest as a first step towards this goal, and a number of artificial-intelligence-based computer vision techniques have been widely employed to determine the exact position of the LV.

In 2019, Hsu et al. [15] presented a fast active shape model combined with a region-based convolutional neural network to automatically identify, track, and segment the LV in cardiac images. An enhanced adaptive anisotropic diffusion filter is used to sharpen image contours and reduce noise. The method achieves an LV detection rate of 0.88.

In 2019, Hu et al. [16] proposed a robust algorithm to speed up automated LV segmentation in cardiac MRI. Its key components are ROI extraction and coarse CNN segmentation of the LV images. With SegNet, the Jaccard indices for the endocardial and epicardial contours are 0.80 and 0.76, respectively. Diverse sources of image variability can hinder the analysis, and unexpected variations make the assessment of extreme slices challenging.

In 2020, Amer et al. [17] proposed ResDUnet, a U-Net-based deep learning framework that incorporates dilated convolutions. To handle variation in LV size and shape, feature maps produced by cascaded dilated convolutions are fused into U-Net's feature-synthesis path. The model achieves a Dice similarity of 0.951 for LV segmentation.

In 2020, Yang et al. [18] used RetinaNet on multi-view echocardiography to recognize A2C, A3C, and A4C views and locate the LV. LV recognition performance was measured with the mIoU metric, giving 0.858, 0.794, and 0.838 for A2C, A3C, and A4C, respectively. A feature pyramid network (FPN) is used for the classification task.

In 2020, Wu et al. [19] used a CNN to locate the ROI and U-Net networks to segment the LV, training and testing on cardiac MRI data from the MICCAI 2009 LV segmentation challenge. The reported Hausdorff distance (HD), volumetric overlap error (VOE), and Dice metric (DM) are 3.641, 0.053, and 0.951, respectively.

In 2020, Leclerc et al. [20] proposed LU-Net to improve 2D echocardiographic segmentation. The study uses the CAMUS dataset: ROIs centred on the ground-truth segmentation masks are selected by hand, and the images are cropped to these ROIs to generate new data that are analysed with the standard U-Net architecture. The technique also estimates left ventricular end-diastolic and end-systolic volumes, with a mean correlation of 0.96 and a mean absolute error of 7.6 ml.

In 2020, Abdeltawab et al. [21] presented a deep learning method for the automatic segmentation and measurement of the left ventricle from cardiac cine MRI data. An FCN is used to reliably localize the centre point of the LV blood pool, and the LV cavity and myocardium are then segmented within the resulting ROI. Validated on the ACDC-2017 dataset, the framework yields accurate estimates of cardiac parameters and a Dice similarity of 98.13% for LV segmentation.

Figure 1.

Schematic illustration of the proposed Tri-ConvNets model.

In 2021, Amer et al. [22] proposed a fully automated LV segmentation approach that handles variations in LV boundaries and shape while accurately delineating the ventricle. Their network, ResDUnet, is built on the U-Net architecture and integrates cascaded dilated convolutions to extract features at multiple scales. ResDUnet reports a Dice improvement of 0.95%.

In 2023, Irshad et al. [23] introduced the Light U-Net model, which combines a histogram-based image enhancement method with LV segmentation. After enhancement, the images are fed into a lightweight U-Net encoder-decoder architecture. Validated on the MICCAI 2009 dataset, the approach achieves a Dice coefficient of 97.7%.

In 2023, Li et al. [24] created EchoEFNet, a multi-task network for LV classification and segmentation. A ResNet50 backbone extracts high-dimensional features while preserving spatial characteristics, and the biplane Simpson's method is used to estimate the LVEF precisely. Evaluated on the CAMUS and CMUEcho datasets, the method yields Dice coefficients of 0.936 and 0.936, respectively.

The studies reviewed above show that DL methods are a basic building block of most modern LV segmentation approaches. However, these methods are computationally expensive, time-consuming, and require long training phases, which makes them less suitable for real-time segmentation. Their limited ability to adapt to different MRI acquisition conditions is a further obstacle. This paper presents a novel Tri-ConvNets approach to address these problems.

3. Proposed method

In this paper, we propose a combination of different deep neural networks for semantic segmentation of the left ventricle based on Tri-Convolutional Networks (Tri-ConvNets) to obtain highly accurate segmentation. First, the CMRI images are pre-processed to remove noise artefacts and enhance image quality.

RoI-based extraction is then carried out in three stages to accurately identify the LV in the noise-free CMRI: a bounding box is created around the LV, the contour edges of the LV are extracted with a Canny edge detector, and the contour points are marked using mask thresholding. These features are given as input to three different deep learning structures to segment the LV efficiently. The Tri-ConvNets can capture both fine-grained and coarse-grained semantic information at all scales. The contour edges are processed by a standard ConvNet, the contour points by a Fully ConvNet, and the noise-free images are converted into patches for pixel-wise operations in ConvNets. The three ConvNets produce different segmentation results, which are concatenated and used to compute a circular similarity measure that yields the final LV segmentation. Figure 1 illustrates the proposed Tri-ConvNets model.

3.1 Adaptive bilateral histogram equalization filter (ABIHE)

In this stage, an adaptive bilateral (AB) filter and a histogram equalization (HE) filter are used to pre-process the input images, minimizing noise distortion and improving image quality. The AB filter first smooths the image, after which HE is applied to further improve its quality. The pre-processing of the left ventricle images is illustrated in Fig. 2.

Figure 2.

Schematic illustration of the pre-processing pipeline for the left ventricle.

3.1.1 Adaptive bilateral filter

The bilateral filter reduces noise while preserving edges by computing each output pixel as a weighted average of its neighbours, where the weights depend on both spatial distance and intensity similarity. Its application produces smoother images that retain essential structure, as illustrated in Fig. 2. Because of the intensity-dependent weighting, pixels on the opposite side of an edge contribute little to the filtered value, so object boundaries are preserved while noise is reduced. The filtered pixel value is defined mathematically in Eq. (1).

$\displaystyle H_{(o,l)}={\cal N}_{ab}(o,l)\sum_{u=-3\sigma}^{3\sigma}\sum_{v=-3\sigma}^{3\sigma}\delta_{ab}(o,l,u,v)\,{\cal I}(o+u,l+v)$ (1)

Here, ${\cal N}_{ab}$ is the normalization factor of the bilateral filter, $\delta_{ab}(o,l,u,v)$ is the weight assigned to the neighbour at offset $(u,v)$ of pixel $(o,l)$, and ${\cal I}$ is the input image. The weights are given by the Gaussian smoothing function ${f}_{z}(u,v)$, which allows the neighbouring pixel values to be weighted flexibly in the overall average, as expressed in Eq. (2).

$\displaystyle\delta_{ab}(o,l,u,v)=f_{z}(u,v)$ (2)

${\cal N}_{ab}$ is a normalization factor equal to the reciprocal of the sum of the weights; it converts the weighted sum in Eq. (1) into a weighted average, as shown in Eq. (3).

$\displaystyle{\cal N}_{ab}(o,l)=\frac{1}{\sum_{u=-3\sigma}^{3\sigma}\sum_{v=-3\sigma}^{3\sigma}\delta_{ab}(o,l,u,v)}$ (3)

The bilateral filter is a nonlinear noise-reduction filter whose output depends on both the spatial and the intensity relationships between pixels. Equation (4) gives the spatial weight of a filter pixel as a function of its distance from the centre pixel.

$\displaystyle g(o-l)=\frac{1}{2}\exp\left(-\frac{(o-l)^{2}}{2\tau}\right)$ (4)

In Eq. (4), $o-l$ is the distance between pixel positions $o$ and $l$, and $2\tau$ controls the width of the spatial kernel. The range kernel assigns weights to pixels according to their intensity difference, as given in Eq. (5).

$\displaystyle T\big(h(k)-h(l)\big)=\frac{1}{2}\exp\left(-\frac{\big(h(k)-h(l)\big)^{2}}{2\tau_{\varsigma}}\right)$ (5)

Here, $h(\cdot)$ is the intensity of the input image at a given pixel, and $2\tau_{\varsigma}$ controls the width of the range kernel. The denoised images are then passed to the HE filter to further improve the quality of the original images.
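The filter described by Eqs. (1) to (5) can be sketched in NumPy as follows. This is a minimal illustration of the standard bilateral filter, not the authors' implementation. We assume each weight is the product of the spatial kernel of Eq. (4) and the range kernel of Eq. (5) (the usual formulation), drop the constant 1/2 prefactors because the normalization of Eq. (3) cancels them, treat the window half-width $3\sigma$ and the spatial spread as the same parameter, and use our own parameter names `sigma_s` and `sigma_r`. The adaptive (AB) variant, which would vary these parameters per pixel, is not shown.

```python
import numpy as np


def bilateral_filter(img, sigma_s=1.0, sigma_r=0.1):
    """Standard bilateral filter for a 2-D grayscale image (illustrative sketch).

    sigma_s: spatial spread; the window spans -3*sigma_s..3*sigma_s as in Eq. (1).
    sigma_r: range (intensity) spread.
    """
    img = np.asarray(img, dtype=np.float64)
    radius = int(np.ceil(3 * sigma_s))
    padded = np.pad(img, radius, mode="reflect")
    h, w = img.shape
    weighted_sum = np.zeros_like(img)
    weight_total = np.zeros_like(img)
    for u in range(-radius, radius + 1):
        for v in range(-radius, radius + 1):
            # Spatial weight (cf. Eq. 4): depends only on the offset (u, v).
            w_spatial = np.exp(-(u * u + v * v) / (2 * sigma_s ** 2))
            # Neighbour I(o+u, l+v) for every pixel at once.
            neighbour = padded[radius + u:radius + u + h, radius + v:radius + v + w]
            # Range weight (cf. Eq. 5): small when intensities differ strongly,
            # so pixels across an edge barely contribute.
            w_range = np.exp(-((neighbour - img) ** 2) / (2 * sigma_r ** 2))
            weight = w_spatial * w_range
            weighted_sum += weight * neighbour
            weight_total += weight
    # Dividing by the summed weights is the normalization N_ab of Eq. (3).
    return weighted_sum / weight_total
```

With a large `sigma_r` the range weights become nearly constant and the filter degrades to plain Gaussian smoothing; a small `sigma_r` keeps sharp intensity steps, such as the LV border, intact.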

3.1.2 Histogram equalization

Pre-processing is applied to the acquired images to preserve crucial information about the LV. In the filtering step, each output pixel is computed as a weighted combination of input pixels. The images are first processed with the ABIHE filter, which enhances image quality while preserving vital information; such filters are effective at refining images that exhibit poor contrast, noise, and variable local characteristics. The pre-processing covers noise reduction and scaling, and its aim is to improve image quality and prepare the images for subsequent processing.

Histogram equalization is a key image processing technique for improving contrast. It redistributes the intensity distribution of an image, making details in darker and lighter regions more visible and distinguishable. This is particularly important in medical imaging, where image clarity and contrast directly affect the accuracy of diagnosis and analysis. For the segmentation of the left ventricle from CMRI images in the proposed model, histogram equalization plays a pivotal role: by enhancing contrast, it makes the boundaries between different regions of the heart more distinct, which is crucial for accurate edge detection and segmentation.

The ABIHE filter operates on the immediate neighbourhood of each pixel: it assigns weights that determine the influence of each neighbouring pixel on the processed pixel and uses them to compute a weighted average of the pixel values in the neighbourhood. Histogram equalization then enhances contrast by redistributing pixel intensities; although beneficial in many image analysis scenarios, it is not universally required. The histogram equalization module equalizes each sub-histogram individually using Eq. (6), and the final output image is obtained by summing all the resulting sub-images.

$\displaystyle D=\begin{cases} e\ast l, & 0\leqslant w<z\\ f\ast(w-t)+d, & x\leqslant w<z\\ g\ast(w-o)+b, & y\leqslant w<e-1 \end{cases}$ (6)

The parameters $e$, $f$, $g$ and $d$ are set so that the output covers the entire dynamic range of the image. Let $b$ denote the input image; the calculated intensity range defines the input range. The enhanced images are then passed to the Region of Interest (RoI) extraction stage, which extracts the features relevant for segmentation.
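The text does not specify how the sub-histograms are formed. As an assumption, the sketch below follows brightness-preserving bi-histogram equalization (BBHE): the histogram is split at the mean intensity, each part is equalized separately within its own intensity range, and the two sub-images are merged. The function name and the `split` parameter are ours, and the mapping is the standard CDF-based equalization rather than the piecewise form of Eq. (6).

```python
import numpy as np


def sub_histogram_equalize(img, levels=256, split=None):
    """Equalize two sub-histograms separately (BBHE-style sketch, not the authors' code).

    img: 2-D integer array with values in [0, levels - 1].
    split: intensity separating the lower and upper sub-histograms (default: mean).
    """
    img = np.asarray(img, dtype=np.int64)
    if split is None:
        split = int(img.mean())
    out = np.zeros_like(img)
    # Each sub-histogram is mapped back onto its own range [lo, hi].
    for lo, hi, mask in ((0, split, img <= split),
                         (split + 1, levels - 1, img > split)):
        vals = img[mask]
        if vals.size == 0:
            continue
        hist = np.bincount(vals - lo, minlength=hi - lo + 1)
        cdf = np.cumsum(hist) / vals.size  # normalized cumulative histogram
        out[mask] = lo + np.round(cdf[vals - lo] * (hi - lo)).astype(np.int64)
    # The two sub-images occupy disjoint pixels, so "summing" them is this merge.
    return out
```

Keeping each half inside its own intensity range prevents dark pixels from being pushed into the bright range, which is why bi-histogram schemes tend to preserve mean brightness better than global equalization.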

3.2 Region of interest (RoI) extraction

In this phase, a RoI extraction technique is used to detect the left ventricle for segmentation. Features are then extracted from the raw pixel intensities by three different deep learning approaches, after which the test images are segmented. An RoI is an area of an image that is useful for a specific objective; RoI extraction reduces computation time and cost while improving accuracy. In CMRI, the region surrounding and enclosing the left ventricle is taken as the RoI of an LV image.

R-CNN was a significant step forward in object detection and one of the first models to demonstrate the capabilities of deep learning for this task. The R-CNN architecture used here consists of three key components: the backbone, the Region Proposal Network (RPN), and the RoI network. The backbone applies convolutions to the input image to extract relevant semantic features. In the first stage, the RPN generates proposals that may contain foreground objects; in the second, the RoI network refines these proposals and performs segmentation. Both the RPN and the RoI network operate on the convolutional feature maps produced by the backbone.

Left ventricle segmentation in noise-free CMRI relies on a three-stage RoI-based extraction process for accurate identification. First, a bounding box is created around the LV; next, a Canny edge detector extracts the contour edges of the LV; finally, the contour points are identified by mask thresholding. These extracted features serve as input to three distinct deep learning structures, collectively referred to as Tri-ConvNets, that segment the LV efficiently. After the RoI network stage refines the image region, RoI Align is used to obtain improved feature maps, which are then fed into the mask layer for image segmentation. Equations (7) and (8) give the losses of the RPN layer and the RoI layer, respectively; together they make up the R-CNN training loss. The RPN loss is the sum of the region segmentation loss ${\cal Q}_{l}$ and the regression loss ${\cal P}_{l}$ over the proposals ${\cal N}_{i}$.

$\displaystyle\textit{RPN}_{\textit{ls}}=\sum_{i}\left[{\cal Q}_{l}({\cal N}_{i})+{\cal P}_{l}({\cal N}_{i})\right]$ (7)

The RoI layer loss has three components: the segmentation loss ${\cal Q}_{l}$, the regression loss ${\cal P}_{l}$, and the classification loss ${\cal E}_{l}$. The overall RoI layer loss is calculated as follows:

$\displaystyle\textit{ROI}_{l}={\cal Q}_{l}+{\cal P}_{l}+{\cal E}_{l}$ (8)

The RoI procedure uses bilinear interpolation to transform feature maps of different scales to a single scale. The segmentation (mask) layer then segments the region of interest, with the aim of achieving high RoI recognition accuracy on the datasets used.
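The bilinear resampling performed by RoI Align can be illustrated with a simplified NumPy sketch. It is not the authors' code: we assume a single-channel feature map, box coordinates given as continuous `(y_min, x_min, y_max, x_max)` in feature-map units, and one bilinear sample at the centre of each output bin (full RoI Align implementations usually average several samples per bin). Both function names are hypothetical.

```python
import numpy as np


def bilinear(feat, y, x):
    """Bilinearly interpolate a 2-D feature map at fractional position (y, x)."""
    h, w = feat.shape
    y = min(max(y, 0.0), h - 1.0)  # clamp to the valid range
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * feat[y0, x0] + (1 - dy) * dx * feat[y0, x1]
            + dy * (1 - dx) * feat[y1, x0] + dy * dx * feat[y1, x1])


def roi_align(feat, box, out_size=(7, 7)):
    """Simplified RoI Align: resample a box of any size to a fixed out_size grid.

    box = (y_min, x_min, y_max, x_max), continuous coordinates (never rounded).
    """
    y_min, x_min, y_max, x_max = box
    oh, ow = out_size
    bin_h = (y_max - y_min) / oh
    bin_w = (x_max - x_min) / ow
    out = np.empty(out_size)
    for i in range(oh):
        for j in range(ow):
            # One sample at the centre of bin (i, j).
            out[i, j] = bilinear(feat, y_min + (i + 0.5) * bin_h,
                                 x_min + (j + 0.5) * bin_w)
    return out
```

Because the box is never rounded to integer cells, small regions such as the LV keep sub-pixel alignment with the feature map, which is the motivation for RoI Align over earlier RoI pooling.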

3.2.1 Extraction of bounding boxes through region of interest (ROI)

In medical image analysis, extraction of the LV is a vital step for applications such as quantitative analysis and clinical diagnosis. The bounding box pattern in left ventricle extraction uses bounding boxes to define and extract the ROI containing the left ventricle from medical images. Pre-processing steps are applied beforehand to enhance image quality. Once the approximate location of the left ventricle is identified, a bounding box, a rectangular area enclosing the target object, is defined around it. The coordinates of the bounding box $\left({{o},{p},{q},{r}}\right)$ are determined from the detected location, and the pixels within the box are extracted to form the ROI. This step essentially isolates the left ventricle from the rest of the image and provides a well-defined region for analysis. Once the bounding box has been generated, the contour edges and contour points are estimated using the RoI extraction approach.

The Tri-ConvNets then process the CMRI images for precise segmentation of the left ventricle. First, the contour edges are processed by a standard ConvNet, whose convolutional layers detect and enhance the edges of the left ventricle, which are critical for understanding its shape and boundaries. Next, the contour points, which delineate the exact shape of the left ventricle, are processed by a Fully Convolutional Network (Fully ConvNet), which handles spatial data well for semantic segmentation and allows a detailed mapping of the left ventricle's contours. Finally, to achieve pixel-level precision, the noise-free images are divided into smaller patches, which are processed by ConvNets designed for pixel-wise operations, enabling fine-grained analysis of the individual pixel attributes within each patch. By leveraging the strengths of different ConvNet architectures, each tailored to a specific aspect of the segmentation task, this approach segments the left ventricle efficiently and accurately.
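A minimal sketch of the bounding-box step follows, assuming the detector output is available as a binary mask of the approximate LV location (the paper does not state its format). We read the $(o,p,q,r)$ convention as (row_min, col_min, row_max, col_max) with inclusive bounds, which is our assumption, and the optional `margin` as well as both function names are hypothetical.

```python
import numpy as np


def bounding_box(mask, margin=0):
    """Tight box (row_min, col_min, row_max, col_max) around the non-zero pixels of a
    2-D mask, enlarged by `margin` pixels and clipped to the image (inclusive bounds)."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        raise ValueError("mask is empty")
    h, w = mask.shape
    box = (max(rows.min() - margin, 0), max(cols.min() - margin, 0),
           min(rows.max() + margin, h - 1), min(cols.max() + margin, w - 1))
    return tuple(int(v) for v in box)


def crop_roi(img, box):
    """Extract the pixels inside an inclusive bounding box to form the ROI."""
    r0, c0, r1, c1 = box
    return img[r0:r1 + 1, c0:c1 + 1]
```

A small margin keeps the full LV wall inside the crop even when the detected location is slightly off, at the cost of a few extra background pixels.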

3.2.2 Extraction of contour edges through ROI

The contour of an object in an image is the outline or boundary that separates it from the background, and the contour edge pattern describes the pattern formed by the object's edges. The Canny edge detector, a standard edge detection algorithm in image processing, is used here to extract the LV from cardiac images. Because edge detection is sensitive to image noise, the input image is first converted to grayscale and then smoothed to suppress noise so that edges can be located reliably; the smaller the smoothing kernel, the less visible the blur. The Gaussian filter commonly used for this purpose is expressed as follows.

$\displaystyle g(x,y)=\frac{1}{2\pi\tau^{2}}\exp\left(-\frac{x^{2}+y^{2}}{2\tau^{2}}\right)$ (9)

In Eq. (9), $x$ and $y$ are the horizontal and vertical distances from the centre of the kernel, and $\tau$ is the standard deviation of the Gaussian.

The Canny operator identifies both edges and their directions. The gradient points in the direction of the greatest intensity change, and its direction can be normalized as a unit vector. The horizontal and vertical components of the gradient are computed first, followed by the gradient magnitude (${\Upsilon}$) and gradient angle (${z}$):

$\displaystyle\Upsilon=\sqrt{X_{x}^{2}+X_{y}^{2}}$ (10)

$\displaystyle z=\arctan\left(\frac{X_{y}}{X_{x}}\right)$ (11)

Here, $X_{x}$ and $X_{y}$ are the horizontal and vertical gradient components of the image. Finally, the Canny edge detector uses these quantities to detect the contour edges of the LV.
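Eqs. (9) to (11) can be sketched in plain NumPy. This covers only the smoothing and gradient steps; a complete Canny detector would add non-maximum suppression and hysteresis thresholding, which are omitted here. We assume Sobel kernels for the gradient components $X_x$ and $X_y$ (the paper does not name the derivative operator) and use `np.arctan2` in place of the plain arctangent of Eq. (11) so that $X_x = 0$ is handled safely. In practice a library routine such as OpenCV's `cv2.Canny` would normally be used.

```python
import numpy as np


def gaussian_kernel(tau):
    """Sampled 2-D Gaussian of Eq. (9) with standard deviation tau, truncated at 3*tau."""
    radius = int(np.ceil(3 * tau))
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2 * tau ** 2)) / (2 * np.pi * tau ** 2)
    return k / k.sum()  # renormalize so the truncated kernel sums to 1


def filter2d(img, kernel):
    """Same-size 2-D cross-correlation with an odd square kernel and edge padding."""
    r = kernel.shape[0] // 2
    padded = np.pad(img, r, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w))
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


SOBEL_X = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]])


def gradient_magnitude_angle(img, tau=1.0):
    """Gaussian smoothing (Eq. 9), then gradient magnitude (Eq. 10) and angle (Eq. 11)."""
    smooth = filter2d(np.asarray(img, dtype=float), gaussian_kernel(tau))
    gx = filter2d(smooth, SOBEL_X)    # horizontal component X_x (right minus left)
    gy = filter2d(smooth, SOBEL_X.T)  # vertical component X_y (bottom minus top)
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx)        # safe when X_x == 0
    return magnitude, angle
```

On a vertical intensity step the magnitude peaks on the two columns adjacent to the step and the angle there is close to zero, meaning the gradient points across the edge from dark to bright.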

3.2.3 Extraction of contour points through ROI

In this stage, the contour points are detected after the contour edges. Contour points delineate the outer border of the left ventricle: they lie on the boundary of the left ventricle in the image and together form a closed curve representing that boundary. In the proposed model, they are marked by mask thresholding, and they are essential for quantitative analysis and diagnosis in cardiac imaging. The bounding boxes, contour edges, and contour points extracted by the RoI extraction model are then fed into three deep learning networks for LV segmentation.
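Mask thresholding followed by boundary extraction can be sketched as below. The threshold value and the 4-neighbour definition of a boundary pixel are our assumptions, since the paper does not give them, and the function name is hypothetical. A library routine such as OpenCV's `cv2.findContours` would additionally return the points in curve order.

```python
import numpy as np


def contour_points(roi, threshold):
    """Threshold the RoI into a binary mask and return the (row, col) coordinates of
    boundary pixels: foreground pixels with at least one 4-connected background neighbour."""
    mask = np.asarray(roi) > threshold
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    # A pixel is interior if its up, down, left and right neighbours are all foreground.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    boundary = mask & ~interior
    return np.argwhere(boundary)
```

For a filled 5 x 5 square this returns the 16 pixels on its perimeter and excludes the 3 x 3 interior, i.e. the closed curve described above.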

3.3 Tri-convolutional networks (ConvNets)

In this phase, the extracted features are fed into three different deep learning structures to segment the LV efficiently. Generalized features are extracted from the cardiac MRI images with a Transformer, while the Tri-ConvNets extract coarse-grained semantics and fine-grained features at all scales. This addresses the poor segmentation accuracy caused by blurred LV edges. The contour edges are processed by a standard ConvNet, the contour points by a Fully ConvNet, and the noise-free images are converted into patches for pixel-wise operations in ConvNets. The three ConvNets produce different segmentation results, which are concatenated and used to compute a circular similarity measure that yields the final LV segmentation. The rationale for using Tri-ConvNets for semantic segmentation of the left ventricle is to exploit the complementary strengths of different network architectures to address the multifaceted challenges of medical image segmentation. Each component is tailored to one aspect of the segmentation process: standard ConvNets excel at identifying and enhancing contour edges, capturing the boundary details essential for accurate shape delineation; Fully Convolutional Networks (Fully ConvNets) process the contour points, providing precise mappings of the ventricle's contours by handling spatial data efficiently; and ConvNets designed for pixel-wise operations on image patches allow detailed pixel-level analysis, ensuring high-resolution segmentation. This combination enables a more robust and nuanced analysis of cardiac magnetic resonance imaging (CMRI) images, improving the accuracy and efficiency of left ventricle segmentation.
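The patch step of the pixel-wise ConvNet branch is not specified in detail. The sketch below assumes non-overlapping square patches with zero padding at the bottom and right borders, plus the inverse operation that stitches per-patch outputs back into a full-size map before the three branch outputs are combined. The patch size and both function names are hypothetical.

```python
import numpy as np


def to_patches(img, patch):
    """Split an H x W image into non-overlapping patch x patch tiles, zero-padding the
    bottom/right border. Returns an array of shape (rows, cols, patch, patch)."""
    h, w = img.shape
    pad_h, pad_w = -h % patch, -w % patch  # padding to reach a multiple of `patch`
    padded = np.pad(img, ((0, pad_h), (0, pad_w)))
    H, W = padded.shape
    return padded.reshape(H // patch, patch, W // patch, patch).swapaxes(1, 2)


def from_patches(tiles, shape):
    """Inverse of to_patches: stitch per-patch (e.g. pixel-wise) outputs back into an
    image and crop away the padding."""
    n_rows, n_cols, p, _ = tiles.shape
    full = tiles.swapaxes(1, 2).reshape(n_rows * p, n_cols * p)
    return full[:shape[0], :shape[1]]
```

Because the tiling is an exact rearrangement of pixels, any pixel-wise prediction made per patch maps back to the same image coordinates.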

3.3.1 Standard ConvNets

Contour edge segmentation with a standard CNN extracts boundaries and edges from an input image. A standard CNN for this task typically consists of several layers: convolutional layers capture local patterns and features, and fully connected layers at the end of the network map the learned features to predictions. The architecture of the standard ConvNet is shown in Fig. 3.

Figure 3.

Architecture of standard convolutional network.

Let $\alpha$ denote the input image and $\gamma$ the output image. The convolution operation is given in Eq. (12), where ${z}$ is the convolution kernel (weight matrix), ${p}$ is the bias, and $\ast$ denotes convolution. A non-linear activation is then applied; with the ReLU function this is written ${{\cal Q}}^{\prime}=f({\cal Q})$. Max pooling then outputs the maximum value within each local region, ${A}=\max_{pl}({{\cal Q}}^{\prime})$.

$\displaystyle{\cal Q}=\alpha\ast{z}+{p}$ (12)

For contour edge segmentation, the final layers of the network are designed to produce a binary segmentation map indicating the presence or absence of edges. The output can be obtained using a sigmoid activation function.

$\displaystyle z^{\prime}=\pi({g}^{\prime}\cdot\alpha+p^{\prime})$ (13)

In Eq. (13), $g^{\prime}$ and $p^{\prime}$ are the weight and bias of the final layer, and $\pi$ is the sigmoid function. The loss function used during training is typically the binary cross-entropy loss, which measures the dissimilarity between the predicted output and the ground-truth segmentation map. In summary, the standard CNN for contour edge segmentation applies convolution and pooling operations to extract hierarchical features, followed by fully connected layers that make the final predictions; the network is trained to minimize the binary cross-entropy loss so that edges are segmented accurately.
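A single-channel forward pass through Eqs. (12) and (13) and the binary cross-entropy loss can be sketched in NumPy. This is an illustration, not the paper's network: the kernel, pooling size, and output weights are arbitrary assumptions, and, as in most deep learning libraries, the "convolution" is computed as cross-correlation.

```python
import numpy as np


def conv2d_valid(alpha, z, p):
    """Eq. (12) for one channel: 'valid' sliding-window product of input alpha with
    kernel z plus bias p (cross-correlation, as in most deep learning libraries)."""
    kh, kw = z.shape
    h, w = alpha.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(alpha[i:i + kh, j:j + kw] * z) + p
    return out


def relu(q):
    """Q' = f(Q) with f = ReLU."""
    return np.maximum(q, 0.0)


def max_pool(q, size=2):
    """A = max over each non-overlapping size x size window (leftover rows/cols dropped)."""
    h = q.shape[0] // size * size
    w = q.shape[1] // size * size
    return q[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))


def sigmoid(x):
    """pi in Eq. (13)."""
    return 1.0 / (1.0 + np.exp(-x))


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean BCE between predicted edge probabilities and a binary ground-truth map."""
    pred = np.clip(pred, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))
```

For example, the 1 x 2 kernel `[[-1, 1]]` responds only where intensity steps up from left to right, and applying `sigmoid(g_prime * feature + p_prime)` with hypothetical scalar weights turns such responses into per-pixel edge probabilities that `binary_cross_entropy` compares with the ground truth.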

3.3.2 Fully CNN

A fully convolutional network (FCN) for contour point segmentation is a deep learning architecture designed for pixel-wise segmentation tasks. FCNs are well suited to such tasks because they preserve spatial information by replacing fully connected layers with convolutional layers. Their key advantage is that they can handle input images of arbitrary size and produce output segmentation maps with the same spatial dimensions as the input. The architecture of the Fully ConvNet is shown in Fig. 4.

Let ${\Upsilon}$ represent the input image and $\xi$ the ground-truth map. The encoder consists of convolutional layers ${\cal F}_{\varepsilon}$ that encode spatial information, and the decoder consists of transposed convolutional layers ${\cal F}_{d}$ that upsample the feature maps. The final segmentation map is generated by applying a softmax activation function ${\cal F}_{\cal S}$, which gives class probabilities for each pixel, as expressed in Eq. (14).

$\displaystyle\hat{\xi}={\cal F}_{\cal S}({\cal F}_{d}({\cal F}_{\varepsilon}(\Upsilon)))$ (14)
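The shape-preserving encoder-decoder idea behind Eq. (14) can be illustrated with a toy sketch. As an assumption, the encoder ${\cal F}_{\varepsilon}$ is reduced to a single 2 $\times$ 2 max pooling with stride 2 and the decoder ${\cal F}_{d}$ to a single stride-2 transposed convolution with a 2 $\times$ 2 kernel, so an input with even height and width comes back at its original size. A real FCN stacks many learned layers; the function names here are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def encode(x: Matrix) -> Matrix:
    """Toy encoder: 2 x 2 max pooling with stride 2 halves each spatial dimension."""
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, len(x[0]) - 1, 2)]
            for i in range(0, len(x) - 1, 2)]


def transposed_conv_2x2(x: Matrix, kernel: Matrix) -> Matrix:
    """Toy decoder: stride-2 transposed convolution with a 2 x 2 kernel.

    Each input value scatters value * kernel into its own 2 x 2 output block,
    so an h x w input becomes 2h x 2w (blocks do not overlap at stride 2).
    """
    h, w = len(x), len(x[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            for u in range(2):
                for v in range(2):
                    out[2 * i + u][2 * j + v] += x[i][j] * kernel[u][v]
    return out


if __name__ == "__main__":
    x = [[1, 2, 0, 0], [3, 4, 0, 0], [0, 0, 5, 6], [0, 0, 7, 8]]
    restored = transposed_conv_2x2(encode(x), [[1.0, 1.0], [1.0, 1.0]])
    print(len(restored), len(restored[0]))  # 4 4, same size as the input
```

The decoder restores the resolution but not the fine detail lost by pooling, which is one reason practical FCNs add skip connections or learned decoder weights.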

For training, a cross-entropy loss is generally used, which measures the dissimilarity between the predicted segmentation map $\hat{\xi}$ and the ground truth $\xi$.

$\displaystyle{\cal Q}\left(\hat{\xi},\xi\right)=-\frac{1}{e}\sum_{p=1}^{e}\sum_{q=1}^{\varrho}\xi_{p,q}\log(\hat{\xi}_{p,q})$ (15)

Here, $e$ is the number of pixels, $\varrho$ is the number of classes, and $\xi_{p,q}$ and $\hat{\xi}_{p,q}$ are the ground-truth and predicted probabilities for pixel ${p}$ and class ${q}$, respectively. After the images are segmented with the fully convolutional network, contest patches are generated using pixelwise CNNs.
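The softmax ${\cal F}_{\cal S}$ and the pixel-averaged cross-entropy of Eq. (15) can be sketched as follows, assuming the predictions and one-hot ground truth are stored as one row of $\varrho$ class values per pixel. The small `eps` floor on the probabilities is an added assumption that avoids log(0); the same computation gives Eq. (17) when pixels are indexed by $(p,q)$.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """F_S for one pixel: class scores -> class probabilities (max-shifted for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def pixel_cross_entropy(pred: Sequence[Sequence[float]],
                        target: Sequence[Sequence[float]],
                        eps: float = 1e-12) -> float:
    """Eq. (15): -(1/e) * sum over pixels p and classes q of xi[p][q] * log(xi_hat[p][q]).

    pred and target are e x rho nested lists (one row per pixel, one column per class);
    target rows are one-hot ground-truth labels.
    """
    e = len(pred)
    loss = 0.0
    for xi_hat_p, xi_p in zip(pred, target):
        loss -= sum(t * math.log(max(y, eps)) for y, t in zip(xi_hat_p, xi_p))
    return loss / e
```

With one-hot targets only the predicted probability of the true class contributes, so the loss is the average negative log-probability assigned to the correct label.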

Table 1

Hyperparameter of proposed Tri-ConvNets

Hyperparameter	Range
Learning rate	0.0001 to 0.1
Batch size	16, 32, 64, 128
Number of epochs	10, 50, 100
Pooling size	(2 $\times$ 2), (3 $\times$ 3)
Activation function	ReLU, Sigmoid, Tanh
Dropout rate	0.0 to 0.5

Figure 4.

Architecture of fully ConvNet.

3.3.3 Pixelwise CNN

Pixelwise CNNs are a class of DL models designed for image segmentation tasks such as contest patches segmentation. They are effective at capturing local dependencies within images, which makes them suitable for tasks where the relationship between neighbouring pixels is crucial. In contest patches segmentation, the goal is to classify each pixel into distinct classes indicating the presence or absence of specific features in the image. A pixelwise CNN stacks multiple convolutional layers to progressively learn hierarchical representations of the input image. The architecture of the pixelwise convolutional network is shown in Fig. 5.
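Converting an image into patches can be sketched as follows. The section does not give the patch size or say whether patches overlap, so this sketch assumes non-overlapping square patches of a hypothetical size `k` and drops any border that does not fill a whole patch; `from_patches` reassembles the cropped image to show that no pixel is lost or reordered.

```python
from typing import List

Matrix = List[List[float]]


def to_patches(image: Matrix, k: int) -> List[Matrix]:
    """Split an image into non-overlapping k x k patches in row-major order.

    Rows/columns that do not fill a whole patch are dropped (an assumption;
    padding the image would be an equally valid choice).
    """
    h, w = len(image), len(image[0])
    return [[row[j:j + k] for row in image[i:i + k]]
            for i in range(0, h - k + 1, k)
            for j in range(0, w - k + 1, k)]


def from_patches(patches: List[Matrix], n_cols: int) -> Matrix:
    """Inverse of to_patches for the cropped region, given n_cols patches per patch row."""
    k = len(patches[0])
    image = []
    for start in range(0, len(patches), n_cols):
        band = patches[start:start + n_cols]
        for r in range(k):
            image.append([v for patch in band for v in patch[r]])
    return image
```

Overlapping patches with a stride smaller than `k` are a common alternative, since they reduce artefacts at patch borders at the cost of more computation.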

Figure 5.

Architecture of pixel wise convolutional network.

Each convolutional layer employs a receptive field that captures information from a local neighbourhood of pixels. The output of a pixelwise convolutional layer can be expressed in Eq. (16).

$\displaystyle{d}^{\prime}_{p,q}=\xi\left(\omega\times{d}_{p,q}+\varsigma\right)$ (16)

In Eq. (16), ${d}_{p,q}$ denotes the input pattern at pixel location $(p,q)$ and ${d}^{\prime}_{p,q}$ the corresponding output, $\omega$ and $\varsigma$ denote the convolutional filter parameters and the bias, and $\xi(\cdot)$ is the activation function. For contest patches segmentation, the final layer of the pixelwise CNN is often a softmax layer, which outputs a probability distribution over the classes for each pixel.
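Equation (16) applied at every pixel can be sketched as a zero-padded ("same") convolution, so that each output pixel depends only on its local neighbourhood and the output keeps the input size. The odd kernel size, the zero padding and the ReLU default for $\xi$ are assumptions, and the function names are hypothetical.

```python
from typing import Callable, List

Matrix = List[List[float]]


def relu(x: float) -> float:
    """Assumed default activation xi(x) = max(0, x)."""
    return max(0.0, x)


def pixelwise_conv(d: Matrix, omega: Matrix, varsigma: float,
                   xi: Callable[[float], float] = relu) -> Matrix:
    """Eq. (16) at every pixel: d'[p][q] = xi(sum(omega * neighbourhood of d[p][q]) + varsigma).

    Zero padding keeps the output the same size as the input, so every pixel gets
    its own prediction. omega must be square with an odd side length.
    """
    k = len(omega)
    r = k // 2
    h, w = len(d), len(d[0])
    out = []
    for p in range(h):
        row = []
        for q in range(w):
            s = 0.0
            for u in range(k):
                for v in range(k):
                    pp, qq = p + u - r, q + v - r
                    if 0 <= pp < h and 0 <= qq < w:  # pixels outside the image count as 0
                        s += omega[u][v] * d[pp][qq]
            row.append(xi(s + varsigma))
        out.append(row)
    return out
```

A per-pixel softmax over several such output channels would then give the class probabilities described above.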

Because segmentation is a pixelwise classification problem, the cross-entropy loss is commonly used to evaluate the dissimilarity between the predicted and ground-truth probability distributions, as expressed in Eq. (17).

$\displaystyle\delta=-\frac{1}{\Gamma}\sum_{p,q}\sum_{o}\upsilon_{p,q,o}\log\left(\hat{\upsilon}_{p,q,o}\right)$ (17)

Here, ${\Gamma}$ is the number of pixels, $\upsilon_{p,q,o}$ is the ground-truth probability of class $o$ at pixel $(p,q)$, and $\hat{\upsilon}_{p,q,o}$ is the predicted probability of class $o$ at the same pixel. The network is trained to minimize this loss, optimizing its parameters to accurately segment contest patches from the input images.

Table 1 lists the hyperparameters and their ranges, which are crucial for tuning the performance of the neural network models. The learning rate, which controls how much the model's weights are adjusted at each update, varies from 0.0001 to 0.1, spanning very fine to larger updates. The batch size (16, 32, 64 or 128) sets the number of samples used per weight update, and the pooling size (2 $\times$ 2 or 3 $\times$ 3) sets the window of the pooling layers. The number of epochs (10, 50 or 100) is the number of times the entire dataset is passed through the network, which determines the extent of training. The activation function (ReLU, sigmoid or tanh) determines how each neuron's input is transformed, and each option has characteristics suited to different tasks. Finally, the dropout rate, ranging from 0.0 to 0.5, is the proportion of neurons randomly ignored during training; it helps prevent overfitting and promotes generalization.
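One way to explore the ranges in Table 1 is sketched below. The paper does not state its search procedure, so this sketch is an assumption: it enumerates every combination of the discrete hyperparameters (4 $\times$ 3 $\times$ 2 $\times$ 3 = 72 configurations) and draws the learning rate log-uniformly and the dropout rate uniformly within their ranges, using a fixed seed for reproducibility. Training and model selection are out of scope.

```python
import itertools
import math
import random
from typing import Dict, Iterator

# Discrete choices and continuous ranges copied from Table 1.
BATCH_SIZES = [16, 32, 64, 128]
EPOCHS = [10, 50, 100]
POOL_SIZES = [(2, 2), (3, 3)]
ACTIVATIONS = ["relu", "sigmoid", "tanh"]
LR_RANGE = (1e-4, 1e-1)
DROPOUT_RANGE = (0.0, 0.5)


def sample_configs(seed: int = 0) -> Iterator[Dict[str, object]]:
    """Yield one config per discrete combination, with learning rate and dropout sampled.

    Log-uniform sampling of the learning rate is an assumption that spreads trials
    evenly across orders of magnitude (1e-4 ... 1e-1).
    """
    rng = random.Random(seed)
    lo, hi = math.log10(LR_RANGE[0]), math.log10(LR_RANGE[1])
    for batch, epochs, pool, act in itertools.product(BATCH_SIZES, EPOCHS, POOL_SIZES, ACTIVATIONS):
        yield {
            "batch_size": batch,
            "epochs": epochs,
            "pool_size": pool,
            "activation": act,
            "learning_rate": 10 ** rng.uniform(lo, hi),
            "dropout": rng.uniform(*DROPOUT_RANGE),
        }


if __name__ == "__main__":
    configs = list(sample_configs(seed=1))
    print(len(configs))  # 72
```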

4. Results and discussion

In this work, the Sunnybrook and York datasets are used for LV segmentation. The Sunnybrook Cardiac Data is a dataset widely used in cardiac image analysis research for tasks such as LV segmentation and motion estimation in cardiac MRI scans. Cardiovascular disorders can be diagnosed non-invasively using cardiovascular magnetic resonance imaging (CMRI). CMRI is frequently used to evaluate the functional integrity of the left and right ventricles and to detect structural alterations in the heart, and CMRI scans are routinely used to obtain clinical parameters of the LV, such as ejection fraction and LV volumes. Tables 6 and 7 list the segmentation accuracy on the Sunnybrook dataset, and Tables 3 and 4 list the segmentation accuracy on the York dataset.

The experiments were run in the Spyder IDE, launched from Anaconda Navigator, on Windows 10. The PC used an Intel Core i5 processor at 2.10 GHz with 16 GB of RAM.

Figure 6.

Sample segmentation results (a) Sunnybrook dataset and (b) York dataset.

Figure 7.

Experimental result of the proposed model.

Figure 6 displays the segmentation results of the proposed Tri-ConvNets model. The pre-processed images are taken as input to the proposed network and processed by each convolutional layer to segment the LV. Figure 6(a) shows a segmentation result from the Sunnybrook dataset and Fig. 6(b) a segmentation result from the York dataset.

The experimental results of the proposed model are shown in Fig. 7. The input image (column 1) is pre-processed with the ABIHE filter to remove noise artefacts and enhance image quality (column 2). ROI-based extraction is then performed in three stages to identify and segment the LV; the segmented results are shown in column 3.

4.1 Evaluation metrics

The reliability of the LV segmentation is assessed using accuracy, PSNR, MSE, MAE, the Dice index (DI) and the Jaccard index (JI). These measures are computed from the standard counts of true positives ($T_{r}Pt^{+}$), true negatives ($T_{r}Nt^{-}$), false positives ($F_{s}Pt^{+}$) and false negatives ($F_{s}Nt^{-}$), as given below. PSNR is used in image analysis to rate the quality of enhanced (denoised) images.

$\displaystyle A=\frac{T_{r}\textit{Pt}^{+}+T_{r}\textit{Nt}^{-}}{T_{r}\textit{Pt}^{+}+T_{r}\textit{Nt}^{-}+F_{s}\textit{Pt}^{+}+F_{s}\textit{Nt}^{-}}\times 100$ (18)

$\displaystyle\textit{PSNR}=10\log_{10}\left(\frac{r^{2}}{\textit{MSE}}\right)$ (19)

$\displaystyle\textit{MSE}=\frac{1}{pq}\sum_{t=1}^{q}\sum_{f=1}^{p}\left(y(f,t)-z(f,t)\right)^{2}$ (20)

$\displaystyle\textit{MAE}=\frac{1}{p}\sum_{o=1}^{p}\left|g_{o}-\hat{g}_{o}\right|$ (21)

Here, $r$ is the maximum possible pixel value, $y(f,t)$ and $z(f,t)$ are the pixel values of the two $p\times q$ images being compared, and $g_{o}$ and $\hat{g}_{o}$ are the reference and estimated values of the $o$-th of $p$ samples.
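Equations (18) to (21) can be computed directly, as in the minimal sketch below. It assumes 2-D images given as nested lists and, for 8-bit images, a peak value $r = 255$ in the PSNR; returning infinity when the two images are identical (MSE = 0) is an assumed convention. Function names are hypothetical.

```python
import math
from typing import Sequence

Image = Sequence[Sequence[float]]


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Eq. (18): percentage of correctly classified pixels."""
    return (tp + tn) / (tp + tn + fp + fn) * 100


def mse(y: Image, z: Image) -> float:
    """Eq. (20): mean squared pixel difference between two images of equal size p x q."""
    n = sum(len(row) for row in y)
    return sum((a - b) ** 2 for ry, rz in zip(y, z) for a, b in zip(ry, rz)) / n


def psnr(y: Image, z: Image, r: float = 255.0) -> float:
    """Eq. (19): 10 log10(r^2 / MSE); r = 255 assumes 8-bit images."""
    err = mse(y, z)
    return math.inf if err == 0 else 10 * math.log10(r ** 2 / err)


def mae(g: Sequence[float], g_hat: Sequence[float]) -> float:
    """Eq. (21): mean absolute difference over the p values."""
    return sum(abs(a - b) for a, b in zip(g, g_hat)) / len(g)
```

Because PSNR is a decreasing function of MSE for a fixed $r$, a filter with higher PSNR against the same reference necessarily has lower MSE.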

The DI is a common metric for evaluating image segmentation algorithms. It measures the overlap between the segmented region and the ground-truth region, with higher values indicating better segmentation accuracy. The JI (intersection over union) is a closely related statistic that measures the similarity between the predicted and ground-truth regions.

$\displaystyle\textit{DI}=\frac{2T_{\textit{pos}}}{F_{\textit{pos}}+2T_{\textit{pos}}+F_{\textit{ng}}}$ (22)

$\displaystyle\textit{JI}=\frac{T_{\textit{pos}}}{T_{\textit{pos}}+F_{\textit{ng}}+F_{\textit{pos}}}$ (23)
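Equations (22) and (23) can be evaluated from flattened binary masks as sketched below; returning 1.0 when both masks are empty is an assumed convention, and the function names are hypothetical. Since $\textit{DI}=2\textit{JI}/(1+\textit{JI})$, or equivalently $\textit{JI}=\textit{DI}/(2-\textit{DI})$, the Jaccard index of a single prediction never exceeds its Dice index.

```python
from typing import Sequence, Tuple


def confusion(pred: Sequence[int], truth: Sequence[int]) -> Tuple[int, int, int]:
    """Return (T_pos, F_pos, F_ng) for flattened binary masks, where 1 marks an LV pixel."""
    tp = sum(1 for a, b in zip(pred, truth) if a == 1 and b == 1)
    fp = sum(1 for a, b in zip(pred, truth) if a == 1 and b == 0)
    fn = sum(1 for a, b in zip(pred, truth) if a == 0 and b == 1)
    return tp, fp, fn


def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Eq. (22): DI = 2 T_pos / (F_pos + 2 T_pos + F_ng)."""
    tp, fp, fn = confusion(pred, truth)
    denom = fp + 2 * tp + fn
    return 1.0 if denom == 0 else 2 * tp / denom  # both masks empty: assumed perfect agreement


def jaccard(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Eq. (23): JI = T_pos / (T_pos + F_ng + F_pos)."""
    tp, fp, fn = confusion(pred, truth)
    denom = tp + fn + fp
    return 1.0 if denom == 0 else tp / denom


if __name__ == "__main__":
    pred, truth = [1, 1, 0, 0], [1, 0, 1, 0]
    print(dice(pred, truth), jaccard(pred, truth))  # 0.5 and one third
```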

True positives and negatives of the sample images are denoted by ${T}_{\textit{pos}}$ and $T_{\textit{ng}}$ , while false positives and negatives are indicated by ${F}_{\textit{pos}}$ and ${F}_{\textit{ng}}$ .

4.2 Comparison analysis for pre-processing filters

Table 2
Analysis of the classic denoising filters performance comparisons

Filter	PSNR (dB)	MSE	MAE
Mean	31.25	0.28	0.89
Median	39.17	0.69	68.03
Histogram equalization	45.22	0.71	64.83
Bilateral filter	46.32	0.83	67.42
Proposed ABIHE filter	52.21	0.98	70.31

Table 2 compares the proposed ABIHE filter with classic denoising filters (mean, median, histogram equalization and bilateral) in terms of PSNR, MSE and MAE. The ABIHE filter obtains the highest PSNR (52.21 dB), ahead of the bilateral filter (46.32 dB) and histogram equalization (45.22 dB). Figure 8 visualizes the comparison of denoising filters.

Table 3

Segmentation accuracy of York dataset with MR images

Algorithm	Sensitivity	Specificity	PPV	NPV
Tri-ConvNets	0.9792 $\pm$ 0.0017	0.9936 $\pm$ 0.0001	0.9880 $\pm$ 0.0071	0.9992 $\pm$ 0.0071
CNN $+$ U-Net	0.9726 $\pm$ 0.0025	0.9971 $\pm$ 0.0019	0.9819 $\pm$ 0.0106	0.9989 $\pm$ 0.0106
CPR	0.9662 $\pm$ 0.0062	0.9952 $\pm$ 0.0030	0.9761 $\pm$ 0.0142	0.9985 $\pm$ 0.0142
MA-Shape	0.9643 $\pm$ 0.0175	0.9923 $\pm$ 0.0009	0.9700 $\pm$ 0.0178	0.9981 $\pm$ 0.0178
DLDP	0.9577 $\pm$ 0.0123	0.9908 $\pm$ 0.0086	0.9640 $\pm$ 0.0211	0.9977 $\pm$ 0.0211
Omega-Net	0.9508 $\pm$ 0.0216	0.9900 $\pm$ 0.0032	0.9584 $\pm$ 0.0251	0.9973 $\pm$ 0.0251
FCN	0.9432 $\pm$ 0.0230	0.9941 $\pm$ 0.0062	0.9519 $\pm$ 0.0284	0.9969 $\pm$ 0.0284

4.3 Comparison analysis on York dataset

On the York dataset, the proposed Tri-ConvNets approach achieves the highest sensitivity, PPV and NPV of the compared techniques (Table 3) and the highest Dice and Jaccard indices (Table 4). Its LV segmentation performance is reflected in the highest PPV (0.9880) and Jaccard index (0.9497).

Table 4
York dataset segmentation validation with Dice and Jaccard indices

Algorithm	Dice Index	Jaccard Index
Tri-ConvNets	0.9403 $\pm$ 0.0071	0.9497 $\pm$ 0.0071
CNN $+$ U-Net	0.9365 $\pm$ 0.0106	0.9496 $\pm$ 0.0106
CPR	0.9333 $\pm$ 0.0142	0.9480 $\pm$ 0.0142
MA-Shape	0.9332 $\pm$ 0.0178	0.9479 $\pm$ 0.0178
DLDP	0.9315 $\pm$ 0.0211	0.9476 $\pm$ 0.0211
Omega-Net	0.9322 $\pm$ 0.0251	0.9475 $\pm$ 0.0251
FCN	0.9323 $\pm$ 0.0284	0.9466 $\pm$ 0.0284

Evaluation metrics play a crucial role in assessing the performance of medical image segmentation algorithms, and sensitivity and specificity are commonly used in this context. Sensitivity measures the proportion of actual positive (LV) pixels correctly identified by the algorithm; high sensitivity is essential in medical imaging because missing true positives can have serious consequences for patient care. Specificity quantifies the ability of the algorithm to correctly identify negative (background) pixels, so that true negatives are not misclassified as positive. Sensitivity thus emphasizes minimizing false negatives, while specificity focuses on minimizing false positives, and a balance between the two is needed for accurate and reliable segmentation. PPV is the fraction of pixels predicted as LV that truly belong to the LV, and NPV is the fraction of pixels predicted as background that are truly background.
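Sensitivity, specificity, PPV and NPV follow directly from the pixel-level confusion counts, as in the minimal sketch below (the function name is hypothetical). Because background pixels usually far outnumber LV pixels in a short-axis slice, TN is typically large, which tends to push specificity and NPV close to 1.

```python
from typing import Dict


def confusion_rates(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV from pixel-level confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true LV pixels that were found
        "specificity": tn / (tn + fp),  # share of background pixels kept as background
        "ppv": tp / (tp + fp),          # share of predicted LV pixels that are truly LV
        "npv": tn / (tn + fn),          # share of predicted background that is truly background
    }


if __name__ == "__main__":
    # Hypothetical counts for one slice: 100 LV pixels, 900 background pixels.
    print(confusion_rates(tp=90, tn=880, fp=20, fn=10))
```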

Figure 9 shows close agreement between the automated and manual segmentation volumes on the York dataset MR images. Figure 6 depicts sample segmentation results of short-axis MR images from the Sunnybrook and York datasets.
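Agreement of the kind plotted in Figs. 9 and 10 is commonly summarized by a least-squares line and Pearson's correlation coefficient. The sketch below assumes that per-subject LV volumes from manual and automatic segmentation are available as two equal-length lists; the function name is hypothetical, and any volumes passed to it in an example are illustrative, not values from the paper.

```python
import math
from typing import Sequence, Tuple


def volume_agreement(manual: Sequence[float], auto: Sequence[float]) -> Tuple[float, float, float]:
    """Fit auto = slope * manual + intercept by least squares and return (slope, intercept, r)."""
    n = len(manual)
    if n < 2 or n != len(auto):
        raise ValueError("need two equal-length lists with at least two volumes")
    mx, my = sum(manual) / n, sum(auto) / n
    sxx = sum((x - mx) ** 2 for x in manual)
    syy = sum((y - my) ** 2 for y in auto)
    sxy = sum((x - mx) * (y - my) for x, y in zip(manual, auto))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r
```

A slope near 1, an intercept near 0 and $r$ near 1 together indicate that automatic volumes track manual ones without systematic bias; $r$ alone cannot detect a constant offset.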

Table 5

Performance comparison of existing techniques with proposed model

Author	Methods	Dice Index	Jaccard Index
Zhang et al. [25]	FC-Dense Net	0.9466 $\pm$ 0.0120	0.9415 $\pm$ 0.0110
Gayathri et al. [26]	ES-FCN	0.9427 $\pm$ 0.0120	0.9449 $\pm$ 0.0120
Proposed model	Tri-ConvNets	0.9403 $\pm$ 0.0071	0.9457 $\pm$ 0.0071

Figure 8.

Comparison of classic denoising filters.

Table 6

Comparison of segmentation accuracy with MR images from Sunnybrook dataset

Algorithm	Sensitivity	Specificity	PPV	NPV
Tri-ConvNets	0.9832 $\pm$ 0.0068	0.9895 $\pm$ 0.0007	0.9861 $\pm$ 0.0080	0.9992 $\pm$ 0.0080
CNN $+$ U-Net	0.9834 $\pm$ 0.0023	0.9931 $\pm$ 0.0083	0.9844 $\pm$ 0.0090	0.9991 $\pm$ 0.0090
CPR	0.9788 $\pm$ 0.0085	0.9928 $\pm$ 0.0008	0.9826 $\pm$ 0.0101	0.9990 $\pm$ 0.0101
MA-Shape	0.9715 $\pm$ 0.0052	0.9904 $\pm$ 0.0046	0.9810 $\pm$ 0.0111	0.9989 $\pm$ 0.0111
DLDP	0.9740 $\pm$ 0.0098	0.9954 $\pm$ 0.0073	0.9793 $\pm$ 0.0120	0.9988 $\pm$ 0.0120
Omega-Net	0.9710 $\pm$ 0.0078	0.9960 $\pm$ 0.0027	0.9775 $\pm$ 0.0130	0.9988 $\pm$ 0.0130
FCN	0.9722 $\pm$ 0.0077	0.9915 $\pm$ 0.0056	0.9757 $\pm$ 0.0140	0.9986 $\pm$ 0.0140

Table 5 compares the proposed model with existing techniques on the York dataset, reporting the mean and standard deviation of the Dice and Jaccard indices. Tri-ConvNets attains the highest Jaccard index (0.9457 $\pm$ 0.0071) with the smallest standard deviation, whereas FC-Dense Net [25] reports the highest Dice index (0.9466 $\pm$ 0.0120).

4.4 Comparison analysis on Sunnybrook dataset

On the Sunnybrook dataset, Tri-ConvNets segments the LV region more effectively than the compared techniques on most criteria: it achieves the highest Jaccard index (0.9491, Table 7) and the highest PPV (0.9861, Table 6).

Table 7
Sunnybrook dataset segmentation estimation with dice and Jaccard indices

Algorithm	Dice index	Jaccard index
Tri-ConvNets	0.9389 $\pm$ 0.0080	0.9491 $\pm$ 0.0080
CNN $+$ U-Net	0.9376 $\pm$ 0.0090	0.9489 $\pm$ 0.0090
CPR	0.9371 $\pm$ 0.0101	0.9490 $\pm$ 0.0101
MA-Shape	0.9358 $\pm$ 0.0111	0.9489 $\pm$ 0.0111
DLDP	0.9349 $\pm$ 0.0120	0.9487 $\pm$ 0.0120
Omega-Net	0.9344 $\pm$ 0.0130	0.9489 $\pm$ 0.0130
FCN	0.9330 $\pm$ 0.0140	0.9486 $\pm$ 0.0140

Figure 9.

Linear regression plot depicting the automatic-segmentation volumes against manually obtained volumes with MR images from York dataset.

Table 8

Performance comparison of existing techniques with proposed model

Author	Methods	Dice index	Jaccard index
Seo et al. [27]	2D U-Net	0.9451 $\pm$ 0.0120	0.9435 $\pm$ 0.0110
Habijan et al. [28]	CNN	0.9434 $\pm$ 0.0120	0.9469 $\pm$ 0.0120
Bekkouche et al. [29]	U-Net	0.9434 $\pm$ 0.0120	0.9431 $\pm$ 0.0120
Proposed model	Tri-ConvNets	0.9389 $\pm$ 0.0080	0.9491 $\pm$ 0.0080

Figure 10.

Linear regression plot depicting the automatic-segmentation volumes against manually obtained volumes with MR images from Sunnybrook dataset.

Tables 6 and 7 list the segmentation accuracy on the Sunnybrook dataset. Figure 10 shows close agreement between the automated and manual segmentation volumes on the Sunnybrook dataset MR images.

Table 8 compares the proposed model with existing techniques on the Sunnybrook dataset. Tri-ConvNets attains the highest Jaccard index (0.9491 $\pm$ 0.0080) with the smallest standard deviation, whereas the 2D U-Net of Seo et al. [27] reports the highest Dice index (0.9451 $\pm$ 0.0120).

5. Discussion

Table 2 compares the denoising filters using PSNR, MSE and MAE. Among the classic filters and the proposed ABIHE filter, ABIHE attains the highest PSNR (52.21 dB), about 5.9 dB above the bilateral filter (46.32 dB), indicating the strongest enhancement of image quality before segmentation.

Table 3 presents the segmentation accuracy of the compared algorithms on the York dataset CMRI images. Tri-ConvNets achieves the highest sensitivity (0.9792), PPV (0.9880) and NPV (0.9992), reflecting highly reliable segmentation predictions, with a specificity of 0.9936. CNN $+$ U-Net, CPR, MA-Shape, DLDP, Omega-Net and FCN all have lower sensitivity, PPV and NPV than Tri-ConvNets, although CNN $+$ U-Net (0.9971), CPR (0.9952) and FCN (0.9941) report slightly higher specificity.

Table 4 presents the segmentation validation results on the York dataset using the Dice and Jaccard indices. Tri-ConvNets achieves the highest Dice index (0.9403) and Jaccard index (0.9497), indicating the greatest overlap between the predicted and ground-truth segmentations. CNN $+$ U-Net, CPR, MA-Shape, DLDP, Omega-Net and FCN remain competitive, with closely clustered but lower Dice and Jaccard scores.

In the comparison with published methods on the York dataset (Table 5), Tri-ConvNets has the highest Jaccard index and the lowest standard deviation, although its Dice index (0.9403) is below those reported for FC-Dense Net (0.9466) and ES-FCN (0.9427).

Table 6 compares the LV segmentation accuracy of the algorithms on MR images from the Sunnybrook dataset. Tri-ConvNets achieves a sensitivity of 0.9832 $\pm$ 0.0068 and a specificity of 0.9895 $\pm$ 0.0007, together with the highest PPV (0.9861) and NPV (0.9992), indicating that it reliably identifies the left ventricle and excludes non-ventricular regions. All the compared algorithms (CNN+U-Net, CPR, MA-Shape, DLDP, Omega-Net and FCN) report higher specificity than Tri-ConvNets, and all except CNN+U-Net (0.9834) report lower sensitivity.

Table 7 presents the segmentation performance of various algorithms on the Sunnybrook Dataset, as evaluated by the Dice and Jaccard indices, which are common metrics for assessing the accuracy of semantic segmentation in medical imaging. The Tri-ConvNets approach achieves the highest Dice Index score of 0.9389 with a standard deviation of $\pm$ 0.0080 and a Jaccard Index score of 0.9491 with the same standard deviation, indicating a highly accurate and consistent segmentation of the left ventricle compared to other methods. The CNN+U-Net, CPR, MA-Shape, DLDP, Omega-Net, and FCN algorithms follow closely, with their performance scores also being high but slightly lower than Tri-ConvNets in segmenting the left ventricle from cardiac MR images.

6. Conclusion

In this paper, a combination of different deep neural networks based on Tri-ConvNets is proposed for accurate semantic segmentation of the left ventricle. CMRI images are first pre-processed to remove noise artefacts and enhance image quality, and ROI-based extraction is then performed in three stages to accurately identify the LV. The extracted features are given as input to three different DL structures to segment the LV efficiently: the contour edges are processed in the standard ConvNet, the contour points are processed using the Fully ConvNet, and the noise-free images are converted into patches on which the ConvNets perform pixel-wise operations. The three ConvNets produce different segmentation results, which are concatenated to compute the circular similarity and obtain the final LV segmentation. The proposed Tri-ConvNets model achieves Jaccard indices of 0.9491 $\pm$ 0.0188 on the Sunnybrook dataset and 0.9497 $\pm$ 0.0237 on the York dataset, and Dice indices of 0.9419 $\pm$ 0.0178 on the ACDC dataset and 0.9414 $\pm$ 0.0247 on the LVSC dataset. Future work could enhance model interpretability and generalization by integrating attention mechanisms and self-supervised learning, and could explore multi-modal fusion to improve the robustness and accuracy of left ventricle segmentation for clinical applications. A limitation is that the performance of the preprocessing and ROI-based extraction stages may vary across datasets and imaging conditions, which could limit the model's applicability to a wider range of CMRI images or require additional adjustment and validation for each new dataset.

Ethical approval

My research guide reviewed and ethically approved this manuscript for publishing in this Journal.

Human and animal rights

This article does not contain any studies with human or animal subjects performed by any of the authors.

Funding

None.

Availability of data and material

Data sharing is not applicable to this article as no new data were created or analyzed in this research.

Informed consent

I certify that I have explained the nature and purpose of this study to the above-named individual, and I have discussed the potential benefits of this study participation. The questions the individual had about this study have been answered, and we will always be available to address future questions.

Footnotes

Acknowledgments

The author would like to express heartfelt gratitude to the supervisor for his guidance and unwavering support during this research.

Conflict of interest

The author declares no conflict of interest.

References

1. Rashid, Qureshi, Noor, Yaseen, Sheikh, MAA, Malik. Anxiety and depression in heart failure: An updated review. Current Problems in Cardiology. 2023; 48(11): 101987.

2. Sundarasekar, Appathurai. Efficient brain tumor detection and classification using magnetic resonance imaging. Biomedical Physics & Engineering Express. 2021; 7(5): 055007.

3. Mabel Rose, Vasuki, Bhavana. Yolo-Vehicle: Realtime Vehicle Licence Plate Detection and Character Recognition Using Yolov7 Network. International Journal of Data Science and Artificial Intelligence. 2024; 2(1): 27-34.

4. Fenil, Manogaran, Vivekananda, Thanjaivadivel, Jeeva, Ahilan, AJCN. Real time violence detection framework for football stadium comprising of big data analysis and deep learning through bidirectional LSTM. Computer Networks. 2019; 151: 191-200.

5. Steinhaus, Lubitz, Noseworthy, Kramer. Exercise interventions in patients with implantable cardioverter-defibrillators and cardiac resynchronization therapy: A systematic review and meta-analysis. Journal of Cardiopulmonary Rehabilitation and Prevention. 2019; 39(5): 308.

6. Ramaswamy, Joe Patrick Gnanaraj, Chandra Sekar, Muthukumaran. Analysis of Distribution Line in Link with Substation using GSM Technology. International Conference on Sustainable Communication Networks and Application (ICSCNA). Theni, India, 2023; 526-528.

7. Hassan, Palaskas, Agha, Iliescu, Lopez-Mattei, Chen, Zheng, Yusuf. Carcinoid heart disease: a comprehensive review. Current Cardiology Reports. 2019; 21: 1-7.

8. Pastore, Mandoli, Aboumarie, Santoro, Bandera, D’Andrea, Benfari, Esposito, Evola, Sorrentino, Cameli. Basic and advanced echocardiography in advanced heart failure: an overview. Heart Failure Reviews. 2020; 25: 937-948.

9. Joyce, Buoso, Stoeck, Kozerke. Rapid inference of personalised left-ventricular meshes by deformation-based differentiable mesh voxelization. Medical Image Analysis. 2022; 79: 102445.

10. Curiale, Colavecchia, Mato. Automatic quantification of the LV function and mass: A deep learning approach for cardiovascular MRI. Computer Methods and Programs in Biomedicine. 2019; 169: 37-50.

11. Ribeiro, Nunes. Left Ventricle Segmentation in Cardiac MR: A Systematic Mapping of the Past Decade. ACM Computing Surveys (CSUR). 2022; 54(11s): 1-38.

12. Sivasankari, Shunmugathammal, Appathurai, Kavitha. High-Throughput and Power-Efficient Convolutional Neural Network Using One-Pass Processing Elements. Journal of Circuits, Systems and Computers. 2022; 31(13): 2250226.

13. Fenil, Manogaran, Vivekananda, Thanjaivadivel, Jeeva, Ahilan, AJCN. Real time violence detection framework for football stadium comprising of big data analysis and deep learning through bidirectional LSTM. Computer Networks. 2019; 151: 191-200.

14. Gayathri, Ajitha Gladis, Angel Mary. Real time masked face recognition using deep learning based yolov4 network. International Journal of Data Science and Artificial Intelligence. 2023; 1(1): 26-32.

15. Hsu. Automatic left ventricle recognition, segmentation and tracking in cardiac ultrasound image sequences. IEEE Access. 2019; 7: 140524-140533.

16. Pan, Wang, Yin. Automatic segmentation of left ventricle from cardiac MRI via deep learning and region constrained dynamic programming. Neurocomputing. 2019; 347: 139-148.

17. Yang, Xiao, Liu, Sun, Guo, Cui, Sun, Zhang, Yang. Deep RetinaNet for dynamic left ventricle detection in multiview echocardiography classification. Scientific Programming. 2020; 2020: 1-6.

18. Fang, Lai. Left ventricle automatic segmentation in cardiac MRI using a combined CNN and U-net approach. Computerized Medical Imaging and Graphics. 2020; 82: 101719.

19. Amer, Zolgharni, Janan. ResDUnet: Residual dilated UNet for left ventricle segmentation from echocardiographic images. In 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). 2020; 2019-2022. IEEE.

20. Leclerc, Smistad, Østvik, Cervenansky, Espinosa, Espeland, Berg, EAR, Belhamissi, Israilov, Grenier, Lartizien. LU-Net: a multistage attention network to improve the robustness of segmentation of left ventricular structures in 2-D echocardiography. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control. 2020; 67(12): 2519-2530.

21. Abdeltawab, Khalifa, Taher, Alghamdi, Ghazal, Beache, Mohamed, Keynton, El-Baz. A deep learning-based approach for automatic segmentation and quantification of the left ventricle from cardiac cine MR images. Computerized Medical Imaging and Graphics. 2020; 81: 101717.

22. Amer, Janan. ResDUnet: A deep learning-based left ventricle segmentation method for echocardiography. IEEE Access. 2021; 9: 159755-159763.

23. Irshad, Yasmin, Sharif, Rashid, Sharif, Kadry. A Novel Light U-Net Model for Left Ventricle Segmentation Using MRI. Mathematics. 2023; 11(14): 3245.

24. Wang, Cao, Feng, Yang. EchoEFNet: Multi-task deep learning network for automatic calculation of left ventricular ejection fraction in 2D echocardiography. Computers in Biology and Medicine. 2023; 156: 106705.

25. Zhang, Guo, Wang. Segmentation of biventricle in cardiac cine MRI via nested capsule dense network. PeerJ Computer Science. 2022; 8: e1146.

26. Gayathri, Maheswari, Venkatesh, Appathurai. Automatic left ventricle segmentation via edge-shape feature-based fully convolutional neural network. International Journal of Imaging Systems and Technology.

27. Seo, Mariano, Beckfield, Madenur, Reina, Bobar, Nguyen, Altintas. Cardiac MRI image segmentation for left ventricle and right ventricle using deep learning. arXiv preprint arXiv:1909.08028. 2019.

28. Habijan, Leventić, Galić, Babin. Estimation of the left ventricle volume using semantic segmentation. In 2019 International Symposium ELMAR. 2019; 39-44. IEEE.

29. Bekkouche, Merzoug, Hadjila, Daoud. Segmentation of the Left Ventricle Using Improved UNET Neural Networks. 2023.
