Coronary artery segmentation in CCTA images based on multi-scale feature learning

Abstract

BACKGROUND:

Coronary artery segmentation is a prerequisite in computer-aided diagnosis of Coronary Artery Disease (CAD). However, segmentation of coronary arteries in Coronary Computed Tomography Angiography (CCTA) images faces several challenges. Current segmentation approaches do not address these challenges effectively and still suffer from problems such as the need for manual interaction or low segmentation accuracy.

OBJECTIVE:

A Multi-scale Feature Learning and Rectification (MFLR) network is proposed to tackle the challenges and achieve automatic and accurate segmentation of coronary arteries.

METHODS:

The MFLR network introduces a multi-scale feature extraction module in the encoder to effectively capture contextual information under different receptive fields. In the decoder, a feature correction and fusion module is proposed, which employs high-level features containing multi-scale information to correct and guide low-level features, achieving fusion between the two-level features to further improve segmentation performance.

RESULTS:

The MFLR network achieved the best performance on the dice similarity coefficient, Jaccard index, Recall, F1-score, and 95% Hausdorff distance, for both in-house and public datasets.

CONCLUSION:

Experimental results demonstrate the superiority and good generalization ability of the MFLR approach. This study contributes to the accurate diagnosis and treatment of CAD, and it also informs other segmentation applications in medicine.

Keywords

Coronary artery segmentation, CCTA, multi-scale feature, feature fusion, feature correction

1 Introduction

The function of coronary arteries is to provide oxygen and nutrients to the heart, to maintain its powerful continuous beating. Currently, Coronary Artery Disease (CAD) is a predominant contributor to global mortality [1, 2]. Coronary Computed Tomography Angiography (CCTA) has an extensive application in the diagnosis of CAD due to its ability to produce high-resolution images, its non-invasive nature, and its cost-effectiveness [3, 4]. Based on the segmented coronary arteries in CCTA images, the stenosis degree of the vessels can be quantified, the fractional flow reserve value [5] can be calculated, and the anatomical names of coronary arteries can be identified [6], which assist radiologists in judging the severity of CAD and providing reasonable treatment plans. Accurate coronary artery segmentation is crucial for diagnosing and treating CAD. Manual segmentation by radiologists is time-consuming and laborious, and research on automatic segmentation approaches is urgently needed.

Segmentation of coronary arteries in CCTA faces several challenges. The first is a severe inter-class imbalance problem, as shown in Fig. 1 (a) and (b). The coronary arteries (the red labels in Fig. 1 (a) and (b)) occupy an extremely small proportion (usually less than 0.1%) of the whole CCTA image, so the segmented coronary arteries are prone to disruption. The second is the intra-class imbalance problem, as shown in Fig. 1 (a). The diameter of the proximal branches near the ascending aorta is larger than that of the other branches, so the smaller branches (e.g., the distal branches) are easily omitted by a segmentation approach. Third, the coronary artery branches are scattered across the CCTA images (as shown in Fig. 1 (b)), and their shape and structure vary significantly from person to person, as shown in Fig. 1 (c) and (d). This makes it difficult for an algorithm to identify coronary arteries accurately.

Fig. 1

Challenges in coronary artery segmentation. (a) and (b) present the coronary arteries from a cross-section perspective, and (c) and (d) show the 3D coronary arteries. The red labels in the sub-figures indicate coronary arteries. In Fig. 1 (a), the yellow arrow and green arrow show the proximal and distal branches, respectively, and it can be seen that the proximal branch has a larger diameter than the other branches (e.g. the distal branch).

Some researchers adopted traditional segmentation approaches, such as the threshold approach [7], the tracking method [8, 9], and the level-set algorithm [10, 11], to depict the coronary artery contour. These traditional approaches do not require substantial computing resources. However, they suffer from problems such as low segmentation accuracy, the need for manual intervention, and the need to construct a complex model. Shams et al. [7] first applied Hessian-based filtering to enhance the arteries, and then the Otsu threshold approach [12] was exploited to delineate the vessels. Although the threshold approach is easy to execute, the presence of many tissues in CCTA images makes it difficult to achieve high segmentation accuracy with thresholding alone. Chen et al. [8] combined the geometric moment approach and snake algorithm to track arteries. Zhou et al. [9] exploited the region-growing approach to track vessels. However, for these tracking algorithms, the seed points for the coronary arteries need to be placed manually. Khokhar et al. [10] and Ge et al. [11] integrated the curvature feature constraint and the area constraint, respectively, into the level-set approach to segment coronary arteries. These algorithms usually require a complex model and constraints to obtain promising performance. Gao et al. [13] first combined the Hough transform and the level-set approach to segment the aorta, then the region-growing algorithm was applied to obtain the connection between the coronary artery and aorta, and finally the arteries were segmented by the projection and dynamic programming approaches. Although the arteries segmented by Gao’s approach agree well with the ground truth, the execution process of the algorithm is complex.

With the improvement of computer hardware, researchers have increasingly adopted deep learning to delineate coronary arteries [14]. Moeskops et al. [15] showed that a single convolutional neural network is suitable for depicting the contour of arteries. Kong et al. [16] combined the Fully Convolutional Network (FCN) [17] with the recurrent neural network to realize the segmentation of arteries. Shen et al. [18] integrated an attention gate into the FCN to attenuate irrelevant regions, and the segmentation results were further refined using the level-set algorithm. Tian et al. [19] first adopted the V-shaped network to segment coronary arteries, and then the region-growing approach was exploited to smooth the contour of the arteries. However, Shen’s and Tian’s approaches treated the level-set and region-growing algorithms, respectively, as an additional post-processing step, thereby increasing the complexity of the algorithm execution. The expression of multi-scale features can help networks capture objects of different sizes in images and better learn the contextual information contained in images [20, 21], thereby improving the performance of tasks such as detection [22, 23] and small target segmentation [24]. Zhu et al. [25] presented a Feature Fusion Network (FFNet) to segment coronary arteries. The FFNet introduces dilated convolution [26] at the bottom of Unet [27] to utilize multi-scale information and applies a deep supervision strategy at the output to further improve performance. Dual Attention Unet (DAUnet) [28] exploited a dual-attention mechanism, which fuses features between adjacent levels in the skip connection, to enhance the identification of vessels. Dong et al. [29] proposed a Coronary Artery Segmentation Network (CAS-Net), which extracts multi-scale information in one of the layers of the encoder. However, the exploitation of multi-scale information in these existing networks is insufficient, and there is still room for improvement in their segmentation accuracy. This has prompted us to explore the performance of multi-scale feature learning in coronary artery segmentation.

The contributions of this paper are as follows: (1) To address the challenges of coronary artery segmentation in CCTA images, this study proposes a novel Multi-scale Feature Learning and Rectification (MFLR) network to achieve automatic segmentation of coronary arteries; (2) The MFLR network introduces multi-scale modules in both the encoder and decoder to fully collect the contextual information in the image and better capture the scale and shape changes of coronary arteries; (3) Experimental results show that the MFLR network outperforms the comparison approaches, which supports the usefulness and necessity of fully exploiting multi-scale features in coronary artery segmentation.

2 Method

Dilated convolution can expand the field of view by setting a large dilation rate (DR), and expressive features can be collected by using multiple dilated convolutions with different DRs [30, 31]. The shortcut connection in Res-Net [32] is beneficial for the propagation of gradients in the network and can improve the reusability of features [33, 34]. Taking guidance from the dilated convolution and Res-Net, the MFLR network was presented to solve the challenges faced in coronary artery segmentation. The structure of the MFLR network is shown in Fig. 2. The MFLR network mainly consists of two parts, i.e., the left encoder and the right decoder. In the encoder, we designed a Multi-scale Feature Extraction (MFE) module that employs dilated convolutions to produce rich feature representations from multiple receptive fields. In the decoder, a Feature Rectification and Fusion (FRF) module is proposed, which exploits high-level features with richer semantic information to guide and correct low-level features [35], and achieves the aggregation between the two-level features.

Fig. 2

Diagram of multi-scale feature learning and rectification network.

2.1 Multi-scale feature extraction module

To strengthen the learning and extraction of multi-scale features, the MFE module is proposed in the encoder, and its structure is shown in Fig. 3. For the input feature map, the MFE module first performs a 3 × 3 × 3 convolution with a DR of 1, Instance Normalization (IN), and Rectified Linear Unit (ReLU) activation. The input feature map is passed through the shortcut connection and summed with the convolved feature map to promote gradient propagation and feature reuse. Then, a 1 × 1 × 1 convolution, IN, and ReLU are applied to fuse the two feature maps. This process can be formulated as:

Fig. 3

Multi-scale feature extraction module.

$f_{1} = \mathrm{ReLU}(\mathrm{IN}(\mathrm{Conv}_{1}(f_{input} + \mathrm{ReLU}(\mathrm{IN}(\mathrm{Conv}_{3,1}(f_{input}))))))$ (1)

where $f_{input}$ represents the input feature map, $\mathrm{Conv}_{3,1}$ denotes the dilated convolution with kernel size 3 × 3 × 3 and DR 1, $\mathrm{Conv}_{1}$ is the convolution with kernel size 1 × 1 × 1, and $f_{1}$ denotes the first fused feature map.

Next, a process similar to Equation (1) is repeated with different DRs, i.e., for the feature map after the aggregation, the 3 × 3 × 3 convolution with a DR of 2, 3, and 4 in turn, IN, ReLU activation, summation, and 1 × 1 × 1 convolution are performed:

$f_{2} = \mathrm{ReLU}(\mathrm{IN}(\mathrm{Conv}_{1}(f_{1} + \mathrm{ReLU}(\mathrm{IN}(\mathrm{Conv}_{3,2}(f_{1}))))))$ (2)

$f_{3} = \mathrm{ReLU}(\mathrm{IN}(\mathrm{Conv}_{1}(f_{2} + \mathrm{ReLU}(\mathrm{IN}(\mathrm{Conv}_{3,3}(f_{2}))))))$ (3)

$f_{4} = \mathrm{ReLU}(\mathrm{IN}(\mathrm{Conv}_{1}(f_{3} + \mathrm{ReLU}(\mathrm{IN}(\mathrm{Conv}_{3,4}(f_{3}))))))$ (4)

where $f_{2}$, $f_{3}$, and $f_{4}$ represent the second, third, and fourth fused feature maps, respectively. These fused feature maps contain information from adjacent scales. The main reason for setting dilation rates of 1, 2, 3, and 4 is to avoid the grid effect caused by dilated convolution [36]. Finally, the input feature map is concatenated with the four fused feature maps, and 1 × 1 × 1 convolution, IN, and ReLU activation are performed to integrate the multi-scale information under receptive fields of various sizes:

$f_{output} = \mathrm{ReLU}(\mathrm{IN}(\mathrm{Conv}_{1}(\mathbb{C}(f_{input}, f_{1}, f_{2}, f_{3}, f_{4}))))$ (5)

where $\mathbb{C}$ represents the feature concatenation operation and $f_{output}$ denotes the output feature map of the MFE module.
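The computation in Equations (1) to (5) can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `MFEBlock` and `conv_in_relu` are hypothetical, the channel count is kept unchanged through the block, and the padding is chosen so that every convolution preserves the spatial size (neither detail is specified in the text).

```python
import torch
import torch.nn as nn


def conv_in_relu(in_ch, out_ch, kernel_size, dilation=1):
    """Conv3d -> InstanceNorm3d -> ReLU; padding keeps the spatial size (assumption)."""
    padding = dilation * (kernel_size // 2)
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size, padding=padding, dilation=dilation),
        nn.InstanceNorm3d(out_ch),
        nn.ReLU(),
    )


class MFEBlock(nn.Module):
    """Illustrative multi-scale feature extraction block following Eqs. (1)-(5)."""

    def __init__(self, channels, dilation_rates=(1, 2, 3, 4)):
        super().__init__()
        # One dilated 3x3x3 branch and one 1x1x1 fusion layer per dilation rate.
        self.dilated = nn.ModuleList(
            [conv_in_relu(channels, channels, 3, d) for d in dilation_rates]
        )
        self.fuse = nn.ModuleList(
            [conv_in_relu(channels, channels, 1) for _ in dilation_rates]
        )
        # Final 1x1x1 fusion over the input plus all fused maps (Eq. 5).
        self.out = conv_in_relu(channels * (len(dilation_rates) + 1), channels, 1)

    def forward(self, f_input):
        feats = [f_input]
        f = f_input
        for dilated, fuse in zip(self.dilated, self.fuse):
            # Shortcut sum followed by 1x1x1 fusion (Eqs. 1-4).
            f = fuse(f + dilated(f))
            feats.append(f)
        return self.out(torch.cat(feats, dim=1))
```

Calling `MFEBlock(4)(torch.randn(1, 4, 8, 8, 8))` returns a tensor of the same shape as the input, since every layer in this sketch preserves both the channel count and the spatial size.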

2.2 Feature correction and fusion module

Coronary arteries are scattered in CCTA images and the segmentation of the arteries suffers from severe inter-class and intra-class imbalance problems. Therefore, for coronary artery segmentation, it is necessary to learn richer contextual information to obtain superior segmentation performance [37, 38]. To further exploit multi-scale information, we proposed the FRF module, whose structure is shown in Fig. 4.

Fig. 4

Feature rectification and fusion module.

Let $f_{high}$ and $f_{low}$ represent the input high-level and low-level feature maps, respectively. Generally speaking, the features in high-level feature maps are more abstract and have smaller spatial resolution, but have a larger receptive field. In contrast, low-level feature maps retain more detailed information in the image, but the receptive field is relatively small. For high-level feature maps, the proposed FRF module adopts Global Average Pooling (GAP) [35], 3 × 3 × 3 convolution, and 1 × 1 × 1 convolution to obtain $f_{multi}$, which can capture richer scale information [39]:

$f_{multi} = \mathrm{Up}(\mathrm{IN}(\mathrm{Conv}_{1}(f_{high}))) + \mathrm{Up}(\mathrm{IN}(\mathrm{Conv}_{3}(f_{high}))) + \mathrm{Up}(\mathrm{IN}(\mathrm{Conv}_{3}(\mathrm{GAP}(f_{high}))))$ (6)

where $\mathrm{Conv}_{3}$ represents convolution with kernel size 3 × 3 × 3, GAP is the global average pooling with size 2 × 2 × 2 and stride 2, and Up denotes the trilinear interpolation operator that gives $f_{multi}$ the same resolution as the low-level feature map.

After the GAP, 3 × 3 × 3 convolution, and 1 × 1 × 1 convolution operations, $f_{multi}$ captures information at different scales. Then, a 1 × 1 × 1 convolution fuses the information from different scales, and the attention map $f_{att}$ is obtained through the Sigmoid activation function:

$f_{att} = \mathrm{Sig}(\mathrm{IN}(\mathrm{Conv}_{1}(f_{multi})))$ (7)

where Sig denotes the Sigmoid activation function. The attention map is then multiplied with the low-level feature map to guide and rectify the low-level features, and the output feature map $f_{out}$ of the FRF module is finally obtained:

$f_{out} = \mathrm{IN}(\mathrm{Conv}_{3}(f_{low})) \otimes f_{att}$ (8)
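Equations (6) to (8) can be sketched in PyTorch as shown below. This is a hedged illustration rather than the authors' code: the names `FRFBlock` and `conv_in` are hypothetical, the high-level branches are assumed to project to the low-level channel count so that the element-wise product in Equation (8) is well defined, and the pooling described in the text (size 2 × 2 × 2, stride 2) is modelled with `nn.AvgPool3d`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_in(in_ch, out_ch, kernel_size):
    """Conv3d followed by InstanceNorm3d; padding keeps the spatial size (assumption)."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size, padding=kernel_size // 2),
        nn.InstanceNorm3d(out_ch),
    )


class FRFBlock(nn.Module):
    """Illustrative feature rectification and fusion block following Eqs. (6)-(8)."""

    def __init__(self, high_ch, low_ch):
        super().__init__()
        self.branch_1x1 = conv_in(high_ch, low_ch, 1)
        self.branch_3x3 = conv_in(high_ch, low_ch, 3)
        self.pool = nn.AvgPool3d(kernel_size=2, stride=2)
        self.branch_pool = conv_in(high_ch, low_ch, 3)
        self.att = conv_in(low_ch, low_ch, 1)
        self.low = conv_in(low_ch, low_ch, 3)

    def forward(self, f_high, f_low):
        size = tuple(f_low.shape[2:])

        def up(t):
            # Trilinear interpolation to the low-level resolution.
            return F.interpolate(t, size=size, mode="trilinear", align_corners=False)

        # Eq. (6): sum of three upsampled branches computed from f_high.
        f_multi = (
            up(self.branch_1x1(f_high))
            + up(self.branch_3x3(f_high))
            + up(self.branch_pool(self.pool(f_high)))
        )
        # Eq. (7): attention map with values in (0, 1).
        f_att = torch.sigmoid(self.att(f_multi))
        # Eq. (8): rectify the low-level features with the attention map.
        return self.low(f_low) * f_att
```

With `f_high` of shape `(1, 8, 4, 4, 4)` and `f_low` of shape `(1, 4, 8, 8, 8)`, `FRFBlock(8, 4)(f_high, f_low)` has the shape of `f_low`.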

2.3 Combo loss function

Because the coronary arteries in CCTA images are small targets, the combination of Dice Loss [40] and Cross-Entropy (CE) Loss, which is suitable for small target segmentation [41], was adopted as the loss function, i.e.,

$L_{total} = L_{Dice} + L_{CE}$ (9)

In Equation (9), $L_{Dice}$ and $L_{CE}$ represent the Dice Loss and the CE Loss, respectively. The Dice Loss is defined as:

$L_{Dice} = 1 - \frac{2 \sum_{i = 1}^{N} y_{i} \tilde{y}_{i}}{\sum_{i = 1}^{N} y_{i} + \sum_{i = 1}^{N} \tilde{y}_{i}}$ (10)

In Equation (10), $y_{i}$ denotes the label value at position i, $\tilde{y}_{i}$ represents the network prediction value at position i, and N is the total number of voxels in the image. $L_{Dice}$ measures the degree of overlap between the image labels and the network predictions. $L_{CE}$ is defined as:

$L_{CE} = - \frac{1}{N} \sum_{i = 1}^{N} [y_{i} \log (\tilde{y}_{i}) + (1 - y_{i}) \log (1 - \tilde{y}_{i})]$ (11)

In Equation (11), $y_{i}$ denotes the label at position i, $\tilde{y}_{i}$ represents the predicted probability at position i, and N denotes the total number of voxels in the CCTA image.
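As a small worked example of Equations (9) to (11), the sketch below evaluates the combo loss on flat lists of binary labels and predicted probabilities. The function names and the small constant `eps` (added for numerical stability) are assumptions for illustration and do not appear in the equations.

```python
import math


def dice_loss(y_true, y_pred, eps=1e-7):
    """Eq. (10); eps guards against an empty denominator (assumption)."""
    inter = sum(t * p for t, p in zip(y_true, y_pred))
    denom = sum(y_true) + sum(y_pred)
    return 1.0 - (2.0 * inter) / (denom + eps)


def ce_loss(y_true, y_pred, eps=1e-7):
    """Eq. (11); probabilities are clipped to [eps, 1 - eps] to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)


def combo_loss(y_true, y_pred):
    """Eq. (9): unweighted sum of the Dice and cross-entropy terms."""
    return dice_loss(y_true, y_pred) + ce_loss(y_true, y_pred)


labels = [1, 0, 0, 1]
uncertain = [0.5, 0.5, 0.5, 0.5]   # Dice term 0.5, CE term ln(2)
print(combo_loss(labels, uncertain))
```

For the uniform prediction of 0.5 the Dice term is 0.5 and the CE term is ln 2, while a prediction equal to the labels drives both terms towards zero.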

3 Experiments

3.1 Dataset

There are two datasets in this study. The first one is the in-house dataset. 81 CCTA scans from the General Hospital of North Theater Command, Shenyang, China were collected. Their average age, highest age, and lowest age are 56, 80, and 28 years, respectively. Among 81 subjects, 41 are males and 40 are females. The research conducted in this study received approval from the Biology and Medical Ethics Committee of Northeastern University under the ethical review approval number NEU-EC-2023B018 S. The size of each scan in the X and Y directions is 512 × 512, and it consists of 300∼400 slice images in the Z direction. The distance between two adjacent slice images is 0.45 mm. The whole data set was partitioned randomly, with 57 cases for training and 24 cases for testing. The second one is the public dataset, i.e., ASOCA [42], which has 40 CCTA scans. The 40 cases were randomly divided, with 28 cases used for training and 12 cases used for testing.

To reduce interference, improve the contrast of coronary arteries in the images, and enhance the generalization ability of the model, pre-processing was performed in two steps. First, the CT values of the voxels in the collected 3D data were truncated to [–230, 760] HU. Then, the volume images were normalized with Z-score normalization, i.e.,

$img_{nor} = \frac{img - img_{mean}}{img_{std}}$ (12)

where $img$ is the volume image after the truncation, $img_{mean}$ represents the mean value of the volume image, $img_{std}$ denotes the standard deviation of the volume image, and $img_{nor}$ represents the image after the normalization. Figure 5 compares the images before and after the pre-processing. According to Fig. 5, after the pre-processing, the contrast of coronary arteries (e.g., the left anterior descending branch) is enhanced, and the interference from the background area is reduced.
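The two pre-processing steps can be sketched as follows. For brevity the example works on a flat list of HU values; on real volumes the same steps would be applied with array operations. The function name `preprocess`, the use of the population standard deviation, and the handling of a constant input are assumptions, since the text does not specify them.

```python
import math


def preprocess(hu_values, hu_min=-230.0, hu_max=760.0):
    """Truncate HU values to [hu_min, hu_max], then apply Z-score normalization (Eq. 12)."""
    clipped = [min(max(float(v), hu_min), hu_max) for v in hu_values]
    mean = sum(clipped) / len(clipped)
    # Population standard deviation (assumption; the text does not say which form is used).
    std = math.sqrt(sum((v - mean) ** 2 for v in clipped) / len(clipped))
    if std == 0.0:
        # Constant input: return zeros instead of dividing by zero (assumption).
        return [0.0 for _ in clipped]
    return [(v - mean) / std for v in clipped]


# Air (-1000 HU) and dense material (3000 HU) are truncated before normalization.
normalized = preprocess([-1000, 0, 500, 3000])
print(normalized)
```

After this step the values have zero mean and unit variance, and the original ordering of intensities inside the truncation window is preserved.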

Fig. 5

Comparison of images before (a) and after (b) pre-processing. The yellow arrow indicates the left anterior descending branch, and the orange box represents the background area with no coronary arteries.

3.2 Implementation details

The experiments are implemented using Python 3.6 and PyTorch 1.10.1 + cu111 on an NVIDIA GeForce RTX 3090 GPU. The Batch Size (BS) and Patch Size (PS) are set to 2 and [224, 224, 128], respectively. The network is trained with the stochastic gradient descent optimizer. The initial Learning Rate ($LR_{init}$) is set to 0.1, and the learning rate at iteration t is $LR_{init} \times (1 - \frac{t}{t_{\max}})^{0.9}$, where t and $t_{\max}$ are the current and final iterations, respectively; $t_{\max}$ is set to 30000. Data augmentation techniques, including random flipping, random rotation, and random cropping, are used to enhance the diversity of the dataset. During inference, the sliding-window approach is used to obtain the segmented coronary arteries.
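The polynomial learning-rate decay described above can be written as a short function. The name `poly_lr` is hypothetical; the defaults mirror the values stated in the text (initial rate 0.1, 30000 iterations, exponent 0.9).

```python
def poly_lr(t, lr_init=0.1, t_max=30000, power=0.9):
    """Learning rate at iteration t: lr_init * (1 - t / t_max) ** power."""
    return lr_init * (1.0 - t / t_max) ** power


# The rate starts at lr_init and decays smoothly to zero at t_max.
for t in (0, 7500, 15000, 22500, 30000):
    print(t, poly_lr(t))
```

In a PyTorch training loop, one way to apply such a schedule (an assumption, not a detail given in the text) is to assign `poly_lr(t)` to each parameter group's `lr` before every optimizer step.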

3.3 Comparative experimental results

To verify the segmentation performance of the proposed MFLR network, this study compared it with mainstream networks in the field of medical image segmentation, i.e., the 3D U-Net [43], VoxResNet [44], ResUnet [45], CAS-Net [29], CS2-Net [46], FFNet [25], and DAUnet [28]. The evaluation indexes are the Dice Similarity Coefficient (DSC), Jaccard Index (JI), Recall, Precision, F1-score, and 95% Hausdorff Distance (HD95). Based on the in-house dataset, the quantitative comparison results between these methods are shown in Table 1. According to Table 1, among the seven comparison algorithms, CS2-Net achieved the highest DSC, JI, and F1-score. Compared with CS2-Net, the MFLR method, owing to its more comprehensive utilization of the multi-scale information in the images, improved DSC, JI, Recall, Precision, and F1-score by 0.92, 1.38, 0.27, 1.48, and 0.85 percentage points, respectively, and reduced HD95 by 12.1997 mm.

Table 1
Comparison of the MFLR network and various algorithms on the in-house dataset. ↑ indicates the higher the better, and ↓ represents the lower the better

Method	Metrics
	DSC↑	JI↑	Recall↑	Precision↑	F1-score↑	HD95(mm)↓
3D U-Net	0.8331	0.7147	0.7974	0.8812	0.8372	17.6535
VoxResNet	0.7847	0.6481	0.7926	0.7917	0.7921	51.9750
ResUnet	0.8349	0.7177	0.8069	0.8734	0.8388	10.2587
CAS-Net	0.8376	0.7219	0.8122	0.8724	0.8412	19.3556
CS2-Net	0.8404	0.7257	0.8273	0.8629	0.8447	21.1403
FFNet	0.8172	0.6921	0.7916	0.8534	0.8213	10.5104
DAUnet	0.7698	0.6291	0.8279	0.7306	0.7762	69.5240
MFLR	0.8496	0.7395	0.8300	0.8777	0.8532	8.9406
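For reference, the overlap metrics in Table 1 can be computed from voxel counts of true positives (TP), false positives (FP), and false negatives (FN) using their standard definitions, as sketched below. The function names are hypothetical. The text does not state how the F1-score column was obtained; the tabulated values are, however, consistent with the harmonic mean of the tabulated Precision and Recall, which is what the second function checks.

```python
def overlap_metrics(tp, fp, fn):
    """Standard definitions of DSC, JI, Recall, and Precision from voxel counts."""
    return {
        "DSC": 2 * tp / (2 * tp + fp + fn),
        "JI": tp / (tp + fp + fn),
        "Recall": tp / (tp + fn),
        "Precision": tp / (tp + fp),
    }


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and Recall taken from the MFLR and 3D U-Net rows of Table 1.
print(round(f1_score(0.8777, 0.8300), 4))  # 0.8532, the tabulated MFLR F1-score
print(round(f1_score(0.8812, 0.7974), 4))  # 0.8372, the tabulated 3D U-Net F1-score
```

This would also explain why the DSC and F1-score columns differ slightly, although the two coincide for a single binary mask: the F1-score appears to be derived from the averaged Precision and Recall rather than averaged per case.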

Figure 6 shows the segmentation results of various algorithms from a cross-section perspective. In Fig. 6, the brown-yellow region denotes the overlap between the predicted results of the network and the ground truth. The proposed algorithm obtained more brown-yellow regions overall, which indicates that it achieved better segmentation performance than the other methods. Because of the multi-scale modules, the MFLR network can segment vessels with both larger and smaller diameters. Figure 7 shows the 3D prediction results of various algorithms. As shown by the blue circles in Fig. 7, the MFLR network can better identify small branches and segment more complete coronary arteries than the other algorithms.

Fig. 6

On the in-house dataset, segmentation results in cross-section. The coronary artery slice images in the basal and apical directions are displayed in the first and third columns, respectively, while the coronary artery slice images between the basal and apical directions are shown in the second column. Green represents the ground truth, red indicates the predicted results, and brown-yellow denotes the overlap region between the predicted results and the ground truth.

Fig. 7

On the in-house dataset, 3D segmentation results of different approaches. The areas highlighted by the blue circles indicate regions where the proposed method obtains more complete branches compared with the comparison approaches. The areas highlighted by the yellow circles indicate regions where the proposed method fails to detect vessels when compared with the ground truth.

Table 2 presents the quantitative results of different algorithms on the ASOCA dataset. It indicates that the MFLR network obtains the best performance on DSC, JI, Recall, F1-score, and HD95.

Table 2

Comparison of the MFLR approach and various algorithms on the ASOCA dataset. ↑ indicates the higher the better, and ↓ represents the lower the better

Method	Metrics
	DSC↑	JI↑	Recall↑	Precision↑	F1-score↑	HD95(mm)↓
3D U-Net	0.8340	0.7193	0.8191	0.8657	0.8418	23.7614
VoxResNet	0.8228	0.7013	0.8401	0.8186	0.8292	27.5326
ResUnet	0.8343	0.7193	0.8207	0.8653	0.8424	16.3879
CAS-Net	0.8384	0.7232	0.8578	0.8340	0.8457	20.5001
CS2-Net	0.8383	0.7249	0.8620	0.8254	0.8433	20.6974
FFNet	0.7945	0.6635	0.7626	0.8533	0.8054	17.4479
DAUnet	0.7732	0.6361	0.8035	0.7704	0.7866	42.9207
MFLR	0.8511	0.7432	0.8714	0.8415	0.8562	16.3660

Figures 8 and 9 show the 2D and 3D segmentation results of various algorithms on the ASOCA dataset, respectively. From the figures, it can be seen that the MFLR network segments more complete vessels than the comparative algorithms, which proves the effectiveness of the MFLR algorithm in addressing the challenges of coronary artery segmentation.

Fig. 8

On the ASOCA dataset, segmentation results in cross-section. The coronary artery slice images in the basal and apical directions are displayed in the first and third columns, respectively, while the coronary artery slice images between the basal and apical directions are shown in the second column. Green represents the ground truth, red indicates the predicted results, and brown-yellow denotes the overlap region between the predicted results and the ground truth.

Fig. 9

3D segmentation results of different approaches on the ASOCA dataset. The areas highlighted by the blue circles indicate regions where the proposed network obtains more complete branches compared with the comparison approaches. The areas highlighted by the yellow circles indicate regions where the proposed network fails to detect vessels when compared with the ground truth.

3.4 Results of the ablation experiment

Table 3 presents the results of using different numbers of dilated convolutions in the MFE module. The table shows that the MFE module with 4 dilated convolutions and DRs of {1, 2, 3, 4} scores better on every metric than the MFE module with 3 dilated convolutions and DRs of {1, 2, 3}. This is because the increase in the number of dilated convolutions enables the network to leverage information at more scales in the feature map. However, the overall performance of the MFE module with 5 dilated convolutions is not as good as that of the MFE module with 4 dilated convolutions. This may be mainly due to the introduction of some useless information by a larger receptive field [29]. Therefore, in this study, the MFE module uses 4 dilated convolutions with DRs of 1, 2, 3, and 4, respectively.

Table 3
Number of dilation convolutions in the MFE module. ↑ indicates the higher the better, and ↓ represents the lower the better

Dilation convolutions	Metrics
	DSC↑	JI↑	Recall↑	Precision↑	F1-score↑	HD95(mm)↓
DR = {1, 2, 3}	0.8442	0.7313	0.8213	0.8761	0.8478	9.1229
DR = {1, 2, 3, 4}	0.8470	0.7355	0.8269	0.8762	0.8508	8.9406
DR = {1, 2, 3, 4, 5}	0.8465	0.7347	0.8249	0.8769	0.8501	12.8326
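To make the receptive-field argument concrete, the sketch below computes the receptive field (along one axis) of a stack of stride-1 dilated convolutions, using the standard rule that a kernel of size k with dilation d enlarges the field by (k - 1) · d. The function name is hypothetical, and the calculation covers a single MFE module only; it ignores the down-sampling between encoder layers, and the 1 × 1 × 1 fusion convolutions do not enlarge the field.

```python
def receptive_field(dilation_rates, kernel_size=3):
    """Receptive field along one axis of stacked stride-1 dilated convolutions."""
    rf = 1
    for d in dilation_rates:
        rf += (kernel_size - 1) * d
    return rf


for rates in ((1, 2, 3), (1, 2, 3, 4), (1, 2, 3, 4, 5)):
    print(rates, receptive_field(rates))
# (1, 2, 3) -> 13, (1, 2, 3, 4) -> 21, (1, 2, 3, 4, 5) -> 31
```

Under these assumptions, adding the fifth dilated convolution widens the field inside one module from 21 to 31 voxels per axis, which is the kind of enlargement the paragraph above suggests may bring in irrelevant context.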

Table 4 shows the results of the ablation experiment, where the Cascade Dilated Convolution (CDC) module [30] is composed of dilation convolutions with dilation rates of {1, 2, 3, 4}, and there is no feature fusion operation between adjacent dilated convolutions in the CDC module. In Table 4, EN (1-5) represents the use of the corresponding module in each layer of the encoder, while DE (1-4) denotes the use of the corresponding module in each layer of the decoder. From Table 4, it can be seen that the method proposed in this paper (which combines the MFE module and FRF module) achieved the optimal segmentation performance.

Table 4

Ablation study. ↑ indicates the higher the better

Method	CDC module	MFE module	FRF module	Metrics
				DSC↑	JI↑	Recall↑	Precision↑
Baseline	—	—	—	0.8331	0.7147	0.7974	0.8812
Model 1	EN (1-5)	—	—	0.8392	0.7238	0.8122	0.8767
Model 2	—	EN (1-5)	—	0.8470	0.7355	0.8269	0.8762
Model 3	—	—	DE (1-4)	0.8403	0.7257	0.8259	0.8632
Model 4	EN (1-5)	—	DE (1-4)	0.8430	0.7297	0.8179	0.8783
Model 5	—	EN (1-5)	DE (1-4)	0.8496	0.7395	0.8300	0.8777

4 Discussion

In recent years, deep learning has found widespread application in the field of medical image segmentation. In this study, the MFLR network was designed to fully learn the contextual information in images and better capture the scale and shape changes of coronary arteries. From Tables 1 and 2, it can be seen that: (1) CAS-Net extracts multi-scale features in one layer of the encoder and performs better than the 3D U-Net; however, its collection of multi-scale features is insufficient, so it does not obtain the best performance; (2) CS2-Net adopts the attention mechanism, but due to the lack of multi-scale feature learning, its performance is still not as good as that of the MFLR network. In Table 4, the CDC module cascades multiple dilated convolutions with different DRs to encode multi-scale context. However, the feature fusion between different scales in the CDC module is insufficient, so the MFE module outperforms the CDC module. Although the MFLR network obtains better overall performance than the existing approaches, the yellow circles in Figs. 7 and 9 show that some vessels segmented by the MFLR network remain incomplete. We attribute this observation primarily to inter-class imbalance. Given that the majority of voxels in CCTA scans represent the background, the network tends to gather more information from the background than from the foreground. Although we have adopted a multi-scale strategy to relieve the influence of the class imbalance, the challenge still exists. In future work, approaches that can further alleviate the class imbalance, e.g., extracting the heart VOI [47, 48] and designing a loss function that is more suitable for small targets [49, 50], will be studied with the aim of further enhancing the network’s performance.

5 Conclusion

This study designs a multi-scale feature learning and rectification network to automatically segment coronary arteries. The network gathers multi-scale information from different fields of view in the encoder. In the decoder, it introduces a feature correction and fusion module, which guides low-level features using high-level features that carry multi-scale information. The experimental results show that the proposed network achieves the best performance on most of the evaluation metrics compared with other state-of-the-art networks, on both the in-house and public datasets. This suggests that the proposed network has potential for use in clinical practice. In the future, additional types of blood vessels in CCTA images will be segmented to validate and enhance the overall performance of the proposed network.

6 Supplementary material

Based on the in-house dataset, we analyzed the effects of the $LR_{init}$, PS, BS, and number of network layers in the MFLR network. Table 5 shows the quantitative results with BS = 2 and PS = [96, 96, 96], with the $LR_{init}$ set to 0.001, 0.01, and 0.1, respectively. Table 5 indicates that a larger $LR_{init}$ leads to higher metrics for the MFLR approach. If the $LR_{init}$ is greater than 0.1, a gradient explosion problem occurs. Therefore, the largest $LR_{init}$ is set to 0.1. Table 6 shows the metrics with BS = 2 and PS = [224, 224, 128], and the same conclusion as in Table 5 can be drawn. These results support setting $LR_{init}$ = 0.1 in the MFLR network.

Table 5

The influence of the $LR_{init}$ for the MFLR network with BS = 2, PS = [96, 96, 96]. ↑ indicates the higher the better, and ↓ represents the lower the better

$LR_{init}$	Metrics
	DSC↑	JI↑	Recall↑	Precision↑	F1-score↑	HD95(mm)↓
0.001	0.5429	0.3764	0.5403	0.5614	0.5506	138.6780
0.01	0.7227	0.5687	0.7161	0.7434	0.7295	89.8975
0.1	0.7651	0.6226	0.7644	0.7804	0.7723	54.8105

Table 6

The influence of the $LR_{init}$ for the MFLR network with BS = 2, PS = [224, 224, 128]. ↑ indicates the higher the better, and ↓ represents the lower the better

$LR_{init}$	Metrics
	DSC↑	JI↑	Recall↑	Precision↑	F1-score↑	HD95(mm)↓
0.001	0.7357	0.5844	0.6905	0.8044	0.7431	44.6401
0.01	0.8201	0.6959	0.7860	0.8676	0.8248	20.6026
0.1	0.8496	0.7395	0.8300	0.8777	0.8532	8.9406

Then, we analyzed the influence of different PS values for the MFLR approach with $LR_{init}$ = 0.1 and BS = 2, with the PS set to [96, 96, 96], [128, 128, 128], and [224, 224, 128], respectively. Table 7 shows that the MFLR algorithm achieves better performance as the PS increases. This may be mainly because a larger PS contains more complete context information from the CCTA image. Because of the limitation of GPU RAM, the maximum PS is set to [224, 224, 128].

Table 7

The influence of the PS for the MFLR network with $LR_{init}$ = 0.1 and BS = 2. ↑ indicates the higher the better, and ↓ represents the lower the better

PS	Metrics
	DSC↑	JI↑	Recall↑	Precision↑	F1-score↑	HD95(mm)↓
[96, 96, 96]	0.7651	0.6226	0.7644	0.7804	0.7723	54.8105
[128, 128, 128]	0.8157	0.6902	0.8093	0.8325	0.8207	26.8681
[224, 224, 128]	0.8496	0.7395	0.8300	0.8777	0.8532	8.9406

Based on Table 7, we further increased the BS, and the results are shown in Table 8. Compared with Table 7, increasing the BS resulted in higher evaluation metrics for the experiments with PS = [96, 96, 96] and PS = [128, 128, 128]. However, these results are still not as good as those obtained with $LR_{init}$ = 0.1, BS = 2, and PS = [224, 224, 128]. Therefore, in the proposed network, $LR_{init}$ = 0.1, BS = 2, and PS = [224, 224, 128] are utilized. Additionally, to avoid network normalization failure and the gradient problem, BS = 1 is not used in this paper.

Table 8

Influence of the BS and PS on the MFLR network with LR_init = 0.1. ↑ indicates that higher is better, and ↓ indicates that lower is better.

BS	PS	Metrics
		DSC↑	JI↑	Recall↑	Precision↑	F1-score↑	HD95(mm)↓
16	[96, 96, 96]	0.8204	0.6967	0.8095	0.8411	0.8250	22.0225
6	[128, 128, 128]	0.8366	0.7202	0.8232	0.8588	0.8406	10.2954
2	[224, 224, 128]	0.8496	0.7395	0.8300	0.8777	0.8532	8.9406
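
The three BS and PS combinations in Table 8 feed a broadly similar number of voxels to the network per iteration, which is one way to read the choice of BS under a fixed GPU memory budget. The sketch below only does that arithmetic. Treating the input voxel count as a proxy for memory use is a simplifying assumption made here for illustration (actual memory also depends on the architecture and its activations), and `voxels_per_batch` is a hypothetical helper.

```python
from math import prod

# (BS, PS) settings copied from Table 8.
SETTINGS = [
    (16, (96, 96, 96)),
    (6, (128, 128, 128)),
    (2, (224, 224, 128)),
]


def voxels_per_batch(batch_size: int, patch_size: tuple) -> int:
    """Number of input voxels in one training batch (single channel assumed)."""
    return batch_size * prod(patch_size)


for bs, ps in SETTINGS:
    print(f"BS={bs:2d}, PS={list(ps)}: {voxels_per_batch(bs, ps):,} voxels per batch")
```

Under this rough proxy, the three settings lie between about 12.6 and 14.2 million voxels per batch, so the comparison in Table 8 is made at a roughly comparable input volume per iteration.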

With LR_init = 0.1, BS = 2, and PS = [224, 224, 128], we also analyzed the influence of the number of layers in the proposed network, as shown in Table 9. Performance improves as the number of layers increases, probably because a deeper network has a stronger learning ability. When the number of layers is set to 6, however, the down-sampling causes a dimension problem. Therefore, the number of layers in the MFLR network is set to 5.

Table 9

Influence of the number of layers on the MFLR network. ↑ indicates that higher is better, and ↓ indicates that lower is better.

Number of layers	Metrics
	DSC↑	JI↑	Recall↑	Precision↑	F1-score↑	HD95(mm)↓
3	0.8385	0.7230	0.8192	0.8669	0.8424	15.2583
4	0.8462	0.7343	0.8289	0.8716	0.8497	11.5798
5	0.8496	0.7395	0.8300	0.8777	0.8532	8.9406
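
One plausible reading of the dimension problem at 6 layers is that repeated halving of the patch eventually produces an odd dimension. The sketch below illustrates this for PS = [224, 224, 128]. It assumes stride-2 down-sampling that halves every dimension, and the function names are hypothetical. How many down-sampling steps correspond to a given number of layers, and whether this is the exact problem encountered, is not stated in this section.

```python
def max_exact_halvings(patch_size):
    """Largest number of times every dimension can be halved exactly."""
    dims = list(patch_size)
    count = 0
    while all(d % 2 == 0 for d in dims):
        dims = [d // 2 for d in dims]
        count += 1
    return count


def sizes_after_downsampling(patch_size, num_downsamplings):
    """Patch size after each stride-2 step; stops once a dimension is odd."""
    sizes = [tuple(patch_size)]
    for _ in range(num_downsamplings):
        current = sizes[-1]
        if any(d % 2 != 0 for d in current):
            break  # a further halving would give a non-integer size
        sizes.append(tuple(d // 2 for d in current))
    return sizes


PS = (224, 224, 128)
print("exact halvings possible:", max_exact_halvings(PS))
for step, size in enumerate(sizes_after_downsampling(PS, 6)):
    print(f"after {step} down-sampling step(s): {size}")
```

Since 224 = 2^5 × 7, the patch can be halved exactly five times, down to (7, 7, 4). A sixth halving would require splitting a dimension of size 7.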

Use of AI tools declaration

The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

Conflict of interest

The authors declare there is no conflict of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 62273082, No. 61773110, and No. 11801065), the Natural Science Foundation of Liaoning Province (No. 20170540312 and No. 2021-YGJC-14), the Basic Scientific Research Project (Key Project) of Liaoning Provincial Department of Education (LJKZ00042021), and Fundamental Research Funds for the Central Universities (No. N2119008). This work was also supported by the Shenyang Science and Technology Plan Fund (No. 21-104-1-24, No. 20-201-4-10, and No. 201375), and the Member Program of Neusoft Research of Intelligent Healthcare Technology, Co. Ltd. (No. MCMP062002).
