Development and evaluation of deep learning models for automatic coronary stenosis segmentation in X-ray angiography

Abstract

Accurate segmentation of stenosis in X-ray angiography (XRA) images is crucial for the objective assessment of stenosis severity and subsequent treatment planning in coronary artery disease. Current clinical practice primarily relies on subjective visual evaluation, which suffers from significant inter-observer variability. In this work, we propose a deep learning model enhanced with a novel Hybrid Context-Aware Attention (HCA) module. HCA employs a parallel dual-pathway design that integrates global inter-channel attention and grouped multi-scale spatial aggregation. This integration enhances feature discriminability and spatial-context modeling, leading to more accurate and anatomically consistent stenosis segmentation in XRA. Evaluated on three independent datasets, our method outperforms existing approaches on most metrics, including F1 score, Recall, IoU, HD95, and ASSD, while remaining competitive in Precision. Ablation and attention visualization studies further confirm the contribution of the designed module to reducing segmentation errors and enhancing focus on stenotic regions. These findings demonstrate that the proposed model is an effective and generalizable approach for stenosis segmentation in XRA, with the potential to support standardized assessment in clinical practice.

Keywords

coronary artery stenosis segmentation, XRA, deep learning, hybrid context-aware attention

Introduction

Accurate assessment of coronary artery stenosis is fundamental for the diagnosis of coronary artery disease and the planning of interventional therapies.¹ In clinical practice, X-ray angiography (XRA) is the preferred modality for stenosis evaluation and revascularization planning, due to its accessibility and real-time imaging capabilities.^2,3 However, the current clinical workflow relies heavily on visual estimation of stenosis from XRA images. This subjective approach introduces significant inter-observer variability, which can directly affect treatment decisions and patient outcomes.^4,5 The growing adoption of robot-assisted percutaneous coronary intervention (PCI) further underscores the need for automated, precise, and quantitative stenosis analysis.^6–9 Despite progress, accurate automatic stenosis segmentation remains highly challenging due to complex vascular morphology, non-uniform contrast enhancement, vessel overlap, and subtle stenosis appearances.

Recent studies have proposed various methods to address these issues. Han et al.¹⁰ proposed a transformer-based framework for stenosis detection in coronary XRA sequences, using spatio-temporal feature extraction and long-range dependency modeling. Wang et al.¹¹ proposed an enhanced YOLO model for detecting and classifying coronary artery stenosis in XRA. Molenaar et al.¹² used a U-Net with a ResNet-101 encoder to segment coronary arteries in XRA and detect stenosis via vessel diameter profiling. While several methods can effectively identify lesion regions,^13–16 they often lack the pixel-level delineation needed to quantify key morphological features such as vessel diameter. Since these measurements are essential for clinical assessment, recent research has shifted toward pixel-level stenosis segmentation. The 2023 ARCADE Challenge introduced the coronary artery stenosis segmentation task based on XRA and publicly released a large-scale annotated dataset.¹⁷ For this task, Lee et al.¹⁸ achieved top performance using a semi-supervised YOLOv8m model. Subsequently, Abedin et al.¹⁹ proposed DenseSelfMA-Net, which employs a DenseNet121 encoder and integrates Self-ONN into multi-scale attention modules in the decoder for enhanced feature representation. Lalinia et al.²⁰ further improved the Swin UNETR model by integrating adapter-efficient tuning and vessel-aware processing. Despite these advances, current methods still face limitations in accuracy and lack thorough validation across independent multi-center datasets.

To address these challenges, we build upon the adaptability of the YOLO series in related medical imaging tasks,^21–24 and propose an enhanced deep learning model for automated coronary stenosis segmentation in XRA. The model is designed to improve both accuracy and generalization across diverse datasets. The main contributions of this study are as follows:

We propose a novel Hybrid Context-Aware Attention (HCA) module. Integrated into a YOLOv11-based segmentation architecture, it jointly captures channel-wise dependencies and multi-scale spatial context to enhance relevant features and suppress noise, thereby improving localization accuracy.

We provide comprehensive multi-center validation. Extensive experiments on three independent datasets demonstrate the model's superior and consistent segmentation performance compared to state-of-the-art methods.

Materials and methods

Datasets

To evaluate the model's performance across diverse data sources, we used three coronary XRA image datasets: two public datasets and one local dataset.

Local dataset: This dataset comprises 557 XRA images from 254 patients (182 male, 72 female; mean age 63 ± 10 years) at a local hospital, with approval from the institutional ethics committee. For each patient, trained radiologists selected 1–3 angiographic sequences (from eight standard views) that clearly displayed stenosis in one of the major coronary branches: the left anterior descending (LAD), left circumflex (LCX), or right coronary artery (RCA). From each sequence, 1–2 key frames with optimal contrast opacification were extracted, resulting in a final set of images with resolutions ranging from 512 × 512 to 1016 × 1016 pixels. Importantly, while the frames selected for annotation follow clinical practice, the underlying angiographic sequences were collected consecutively without preselection. The dataset covers a wide range of clinical conditions, including single-, double-, and triple-vessel diseases, varying stenosis severity (mild, moderate, severe), and morphology (focal, diffuse), as well as diverse angiographic views including left anterior oblique, right anterior oblique, cranial, caudal, and others.

ARCADE-stenosis dataset: We utilized a subset of the public ARCADE dataset,¹⁷ which contains 1500 coronary XRA images (512 × 512 pixels), each annotated with at least one stenotic region. The dataset is officially split into 1000 training, 200 validation, and 300 test images. To ensure clinical diversity, it maintains a balanced distribution of lesions across the three major coronary arteries (LAD, LCX, RCA), with all stenotic regions annotated using polygonal labels.

ICA-stenosis dataset: This subset was derived from the public ICA dataset.²⁵ From the initial 616 images, we applied exclusion criteria based on clinical relevance and image quality: images without significant stenosis (n = 179), with occlusion (n = 2), overlapping vascular structures (n = 15), pseudostenosis due to RCA curvature (n = 4), or poor image quality (n = 36) were removed, yielding a final set of 380 images for evaluation.

Data annotation

Under cardiology expert guidance, trained radiologists performed pixel-level annotations of coronary stenosis for the local and ICA-stenosis datasets using ITK-SNAP 3.8.0.²⁶ Representative ground truth annotations from the public ARCADE-stenosis dataset and the local dataset are shown in Figure 1.

Figure 1.

XRA images and corresponding stenosis annotations. (a, b) Examples from the ARCADE-stenosis dataset showing annotated lesions in the left and right coronary arteries, respectively. (c, d) Corresponding examples from the local dataset (see online version for color distinctions).

Method overview

The overall workflow of the proposed method is illustrated in Figure 2. It consists of three main stages: preparation of multi-center datasets, development and training of a stenosis segmentation model, and comprehensive evaluation of the model across all datasets with multiple evaluation metrics.

Figure 2.

The workflow of this study.

Network architecture

The architecture of the proposed model, as shown in Figure 3, comprises three main components: the backbone, the neck, and the head. The backbone serves as the primary feature extractor, transforming the input image into a hierarchy of multi-scale feature maps through successive downsampling. The neck fuses and refines these multi-scale features before forwarding them to the head. The head simultaneously generates a bounding box and a segmentation mask for each target based on the refined features. To enhance feature discriminability, we design the Hybrid Context-Aware Attention (HCA) module and place it between the neck and the head. This placement allows the module to operate on fused multi-scale features, directly refining task-relevant representations before final prediction while avoiding interference with earlier feature extraction or fusion processes. The module adaptively refines features by jointly modeling global channel dependencies and multi-scale spatial context, thereby highlighting stenosis-relevant regions while suppressing irrelevant backgrounds and healthy vessel patterns.

Figure 3.

The overall architecture of the proposed model, consisting of three components: the backbone, the neck, and the head. (a)-(e) detail the inner structures of the Conv, Bottleneck, C3k, C3k2, and Segment blocks, respectively.

Hybrid context-aware attention module

Stenosis occupies a minimal area in coronary XRA images, resulting in a severe class imbalance that biases models towards the majority background class, leading to reduced sensitivity and accuracy. To address this, we propose a Hybrid Context-Aware Attention (HCA) module. As shown in Figure 4, the HCA comprises two parallel branches: the Channel-Context Modeling (CCM) branch and the Spatial-Scale Modeling (SSM) branch. The CCM branch captures global inter-channel dependencies to enhance discriminative features, while the SSM branch models multi-scale spatial-contextual correlations to focus on relevant structures. Their outputs are fused to adaptively highlight stenosis-relevant patterns and suppress irrelevant backgrounds. This parallel design lets the model refine features along the channel and spatial dimensions simultaneously without compromising either, which enhances sensitivity to small stenosis regions.

Figure 4.

The proposed Hybrid Context-Aware Attention (HCA) module for feature refinement. (a) Overall structure. (b) Channel-context modeling (CCM) branch. (c) Spatial-scale modeling (SSM) branch.

The CCM branch processes the input $X \in \mathbb{R}^{C \times H \times W}$ through three separate 1 × 1 convolutions to generate the projections $V$, $V'$, and $Q$, while $Q'$ is obtained by reshaping $X$. Here, $V = C_{v}(X)$, $V' = C_{v'}(X)$, $Q = C_{q}(X)$, and $Q' = f_{res}(X)$. It computes channel attention through a sequence of operations formulated as:

$$\begin{aligned} \hat{V} &= C_{w}\left(f_{usq}\left(f_{res}(V) \otimes \sigma_{sm}(f_{res}(V'))\right)\right) \\ \hat{Q} &= f_{res}\left(f_{usq}\left(\sigma_{sm}(f_{res}(Q))\right) \otimes f_{usq}(Q')\right) \\ CH_{W} &= \sigma_{sig}\left(f_{ln}(\hat{V})\right) + \hat{Q} \\ CH_{W}' &= C_{t2}\left(\sigma_{relu}\left(f_{ln}\left(C_{t1}(CH_{W})\right)\right)\right) \end{aligned}$$

(1)

where $C_{v}$, $C_{v'}$, and $C_{q}$ are 1 × 1 convolution layers with different output channels, while $C_{w}$, $C_{t1}$, and $C_{t2}$ have distinct input and output channel dimensions; $\sigma_{sm}$, $\sigma_{sig}$, and $\sigma_{relu}$ denote Softmax, Sigmoid, and ReLU activation functions, respectively; $f_{usq}$, $f_{res}$, and $f_{ln}$ denote Unsqueeze, Reshape, and LayerNorm operations, respectively; $\otimes$ denotes matrix multiplication.

In parallel, the SSM branch splits the input $X$ into $G$ groups along the channel dimension. For each group $X_{g}$, it extracts multi-scale features via adaptive pooling and a 3 × 3 convolution, followed by cross-interaction to produce spatial weights. The process is summarized as:

$$\begin{aligned} M_{1} &= \sigma_{sig}\left\{f_{sp}\left[C_{1 \times 1}\left(f_{cat}(P_{w}(X_{g}), P_{h}(X_{g}))\right)\right]\right\} \odot X_{g} \\ M_{2} &= C_{3 \times 3}(X_{g}) \\ M_{11} &= \sigma_{sm}\left(f_{res}\left(P_{g}(f_{gn}(M_{1}))\right)\right) \\ M_{12} &= f_{res}\left(f_{gn}(M_{1})\right) \\ M_{21} &= f_{res}(M_{2}) \\ M_{22} &= \sigma_{sm}\left(f_{res}(P_{g}(M_{2}))\right) \\ SP_{W} &= \sigma_{sig}\left(M_{11} \otimes M_{21} + M_{22} \otimes M_{12}\right) \\ X_{2} &= f_{res}\left(SP_{W} \odot X_{g}\right) \end{aligned}$$

(2)

where $C_{1 \times 1}$ and $C_{3 \times 3}$ are 1 × 1 and 3 × 3 convolutional layers, respectively; $P_{w}$, $P_{h}$, and $P_{g}$ are average pooling layers; $f_{cat}$, $f_{sp}$, and $f_{gn}$ represent concatenation, split, and group normalization operations, respectively; $\odot$ represents element-wise multiplication; $M_{1}$ and $M_{2}$ denote intermediate feature maps, and $M_{11}$, $M_{12}$, $M_{21}$, and $M_{22}$ are their transformed versions. Finally, the outputs of both branches are combined to produce the refined feature map:

$$X' = (CH_{W}' \odot X) + X_{2}$$

(3)
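As a minimal sketch of the fusion step in Equation (3), the following pure-Python snippet scales each channel of $X$ by a channel weight and adds the spatial-branch output. It assumes that $CH_{W}'$ reduces to one scalar per channel that is broadcast over the spatial dimensions; the function name `fuse_hca` and the toy values are illustrative and not taken from the original implementation.

```python
def fuse_hca(x, channel_weights, x2):
    """Equation (3): X' = (CH'_W * X) + X_2 for one C x H x W feature map.

    Assumption: CH'_W holds one scalar per channel, broadcast over H x W.
    """
    if not (len(x) == len(channel_weights) == len(x2)):
        raise ValueError("channel counts of X, CH'_W and X_2 must match")
    fused = []
    for x_c, w_c, x2_c in zip(x, channel_weights, x2):
        # Scale every pixel of this channel, then add the SSM output.
        fused.append([
            [w_c * a + b for a, b in zip(row_x, row_x2)]
            for row_x, row_x2 in zip(x_c, x2_c)
        ])
    return fused


# Toy input with C = 2 channels and H = W = 2 (hypothetical values).
x = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]]]
channel_weights = [0.5, 2.0]  # stands in for CH'_W
x2 = [[[0.0, 0.0], [1.0, 1.0]], [[-1.0, 0.0], [0.0, 1.0]]]  # stands in for X_2
refined = fuse_hca(x, channel_weights, x2)
```

Because the two branches are combined by addition, either term can dominate at a given location, which is consistent with the parallel design described above.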

Loss functions

The proposed model's loss function integrates four components for joint optimization: classification loss $L_{cls}$, bounding box loss $L_{box}$,²⁷ distribution focal loss $L_{dfl}$,²⁸ and mask loss $L_{mask}$. The total loss is defined as:

$$L_{total} = \lambda_{c} \cdot L_{cls} + \lambda_{b} \cdot L_{box} + \lambda_{d} \cdot L_{dfl} + \lambda_{m} \cdot L_{mask}$$

(4)

Following the baseline implementation,²⁴ the weighting coefficients are set to $\lambda_{c} = 0.5$, $\lambda_{b} = 7.5$, $\lambda_{d} = 1.0$, and $\lambda_{m} = 7.5$.
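The weighted sum in Equation (4) can be illustrated with a short sketch. The coefficients are those stated above; the dictionary keys, the function name `total_loss`, and the component values are hypothetical and chosen only for illustration.

```python
# Weights from the text: lambda_c, lambda_b, lambda_d, lambda_m.
LOSS_WEIGHTS = {"cls": 0.5, "box": 7.5, "dfl": 1.0, "mask": 7.5}


def total_loss(components, weights=LOSS_WEIGHTS):
    """Weighted sum of the four loss components, as in Equation (4)."""
    missing = set(weights) - set(components)
    if missing:
        raise KeyError(f"missing loss components: {sorted(missing)}")
    return sum(weights[name] * components[name] for name in weights)


# Hypothetical per-batch component values, not results from the paper.
example = {"cls": 0.8, "box": 0.2, "dfl": 1.0, "mask": 0.4}
loss = total_loss(example)  # 0.5*0.8 + 7.5*0.2 + 1.0*1.0 + 7.5*0.4
```

With these weights, the box and mask terms contribute most strongly for components of similar magnitude.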

Here, the classification loss $L_{cls}$ is the standard binary cross-entropy loss for stenosis presence classification. It is defined as follows:

$$L_{cls} = -\frac{1}{N} \sum_{i=1}^{N} \left[y_{i} \log(p_{i}) + (1 - y_{i}) \log(1 - p_{i})\right]$$

(5)

where $N$ represents the total number of samples; $y_{i}$ denotes the ground-truth label of the $i$-th sample; $p_{i}$ denotes the predicted probability of the $i$-th sample belonging to the positive class.
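A minimal sketch of Equation (5) is given below. The clamping constant `eps` is an assumption added for numerical safety and does not appear in the formula.

```python
import math


def binary_cross_entropy(labels, probs, eps=1e-7):
    """Mean binary cross-entropy over N samples, as in Equation (5).

    eps clamps probabilities away from 0 and 1 (an added assumption).
    """
    if not labels or len(labels) != len(probs):
        raise ValueError("labels and probs must be non-empty and equal in length")
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)


# An uninformative prediction of 0.5 for every sample gives log(2).
uninformative = binary_cross_entropy([1, 0], [0.5, 0.5])
confident = binary_cross_entropy([1, 0], [0.9, 0.1])
```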

The bounding box loss $L_{box}$ adopts the complete intersection over union (CIoU) formulation²⁷ to accurately localize lesions by considering overlap, center distance, and aspect ratio. It is defined as:

$$\begin{aligned} L_{box} &= 1 - IoU + \frac{\rho^{2}(b, b^{gt})}{C^{2}} + \alpha v \\ IoU &= \frac{|B \cap B^{gt}|}{|B \cup B^{gt}|} \\ v &= \frac{4}{\pi^{2}} \left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2} \\ \alpha &= \frac{v}{(1 - IoU) + v} \end{aligned}$$

(6)

where $B$ and $B^{gt}$ denote the predicted bounding box and ground-truth box, respectively; $b$ and $b^{gt}$ represent the center coordinates of the predicted and ground-truth boxes, respectively; $\rho^{2}(b, b^{gt})$ denotes the squared Euclidean distance between the two centers; $C$ represents the diagonal length of the smallest enclosing rectangle that covers both bounding boxes. The terms $w^{gt}$ and $h^{gt}$ are the width and height of the ground-truth box, and $w$ and $h$ are those of the predicted box; $v$ measures the aspect-ratio discrepancy between the two boxes, and $\alpha$ is its trade-off weight.
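The terms of Equation (6) can be traced in a small sketch. Boxes are assumed to be given as corner coordinates `(x1, y1, x2, y2)` with positive width and height; this representation and the function name `ciou_loss` are illustrative choices, not details from the source.

```python
import math


def ciou_loss(pred, gt):
    """CIoU loss of Equation (6) for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    w, h = px2 - px1, py2 - py1
    w_gt, h_gt = gx2 - gx1, gy2 - gy1

    # Overlap term: IoU = |B n B_gt| / |B u B_gt|.
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    iou = inter / (w * h + w_gt * h_gt - inter)

    # Center-distance term: rho^2 / C^2.
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 \
        + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    c2 = (max(px2, gx2) - min(px1, gx1)) ** 2 \
        + (max(py2, gy2) - min(py1, gy1)) ** 2

    # Aspect-ratio term: alpha * v.
    v = (4 / math.pi ** 2) * (math.atan(w_gt / h_gt) - math.atan(w / h)) ** 2
    denom = (1 - iou) + v
    alpha = v / denom if denom > 0 else 0.0  # identical boxes give 0/0

    return 1 - iou + rho2 / c2 + alpha * v


same = ciou_loss((0, 0, 2, 2), (0, 0, 2, 2))
shifted = ciou_loss((0, 0, 2, 2), (1, 1, 3, 3))  # IoU = 1/7, rho^2/C^2 = 2/18
```

For the shifted pair the aspect ratios match, so only the overlap and center-distance terms contribute.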

To further refine regression, we employ the distribution focal loss $L_{dfl}$,²⁸ which encourages the predicted discrete distribution to concentrate on the two bins nearest the ground-truth value. It is defined as:

$$L_{dfl} = -\left((y_{i+1} - y) \log(S_{i}) + (y - y_{i}) \log(S_{i+1})\right)$$

(7)

where $y$ denotes the ground-truth target value, while $y_{i}$ and $y_{i+1}$ represent its left and right nearest discrete bin boundaries, respectively; $S_{i}$ and $S_{i+1}$ represent the predicted probabilities output by the network at $y_{i}$ and $y_{i+1}$.
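The following sketch evaluates Equation (7) for a single regression target. It assumes unit-spaced integer bins $0, 1, \ldots, n-1$, so that $y_{i} = \lfloor y \rfloor$ and $y_{i+1} = y_{i} + 1$; the bin layout, the clamping constant `eps`, and the example values are assumptions for illustration.

```python
import math


def distribution_focal_loss(probs, y, eps=1e-12):
    """Equation (7) for one target y and probabilities over integer bins.

    Assumptions: bins are 0..n-1 with unit spacing; eps guards log(0).
    """
    n = len(probs)
    if n < 2 or not 0 <= y <= n - 1:
        raise ValueError("need at least two bins and a target inside the bin range")
    i = min(int(math.floor(y)), n - 2)  # left bin y_i; right bin is i + 1
    w_left = (i + 1) - y                # (y_{i+1} - y)
    w_right = y - i                     # (y - y_i)
    return -(w_left * math.log(max(probs[i], eps))
             + w_right * math.log(max(probs[i + 1], eps)))


# Hypothetical distributions over four bins with target y = 1.75.
loss_spread = distribution_focal_loss([0.1, 0.2, 0.6, 0.1], 1.75)
loss_peaked = distribution_focal_loss([0.01, 0.25, 0.73, 0.01], 1.75)
```

The second distribution places more mass on the two bins adjacent to the target and therefore receives the smaller loss.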

Finally, the mask loss $L_{mask}$ applies pixel-wise binary cross-entropy within the ground truth mask region, promoting precise boundary delineation and regional consistency for stenosis segmentation. It is formally defined as:

$$\begin{aligned} L_{mask} &= \frac{1}{N} \sum_{(x, y) \in mask} BCE\left(p(x, y), g(x, y)\right) \\ BCE &= -\left[g \log(p) + (1 - g) \log(1 - p)\right] \end{aligned}$$

(8)

where $N$ is the number of pixels in the ground-truth mask, and “mask” denotes the ground-truth mask region. $p(x, y)$ and $g(x, y)$ represent the predicted probability of being the target class and the ground-truth label at position $(x, y)$, respectively.

Implementation details

The model was trained, validated, and tested on the pre-split ARCADE-stenosis dataset (1500 images). To assess generalization, performance was further evaluated on two test sets: an independent local dataset and a second public coronary stenosis dataset. All XRA images were uniformly preprocessed by resizing to 640 × 640 pixels, converting to RGB, scaling to [0, 1], and normalizing using the mean and standard deviation to ensure input consistency. Model weights were initialized from COCO pre-trained checkpoints.²⁹ Extensive online augmentation was applied during training, including random rotation (±45°), scaling (0.5 to 1.5), horizontal flipping, HSV adjustments (hue: ±2.7°; saturation: 0.3 to 1.7; value: 0.6 to 1.4), translation (max shift ratio: 0.3), random cropping (crop ratio: 0.8), and median blur (7 × 7 kernel). We used SGD with a base learning rate of 0.01, weight decay of 5 × 10⁻⁴, and a cosine annealing scheduler with warm restarts. Training ran for 100 epochs with a batch size of 4, implemented in PyTorch 2.0.0 on a server with four NVIDIA RTX 4090 GPUs.
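The learning-rate schedule described above can be sketched as follows. The base learning rate (0.01) and the 100 training epochs come from the text; the restart period of 25 epochs and the minimum learning rate of 0 are assumptions, since the source does not report them. PyTorch offers a scheduler of this kind (`torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`), although the source does not state which implementation was used.

```python
import math


def cosine_warm_restart_lr(epoch, base_lr=0.01, min_lr=0.0, period=25):
    """Cosine annealing that restarts every `period` epochs.

    period and min_lr are assumed values; only base_lr is from the text.
    """
    t_cur = epoch % period  # epochs elapsed since the last restart
    cosine = math.cos(math.pi * t_cur / period)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)


# Learning rate for each of the 100 training epochs.
schedule = [cosine_warm_restart_lr(epoch) for epoch in range(100)]
```

Within each period the rate decays smoothly from the base value toward the minimum and then jumps back to the base value at the restart.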

Evaluation metrics

Model performance was evaluated using six standard evaluation metrics. Precision, Recall, F1 score, and IoU are computed at the pixel level, while HD95 and ASSD are boundary-based metrics computed from segmentation boundaries.³⁰ For each image, true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) are counted by pixel-wise comparison between the prediction and the ground truth. Based on these values, the corresponding evaluation metrics for the image are computed. The final reported result is the average of these per-image metric values across the entire test set.

Precision, Recall, F1 score, and IoU are region-based metrics that reflect overall segmentation accuracy. Precision reflects the proportion of predicted stenosis regions that overlap with ground truth. Recall indicates the proportion of actual stenosis regions that are correctly detected by the model. The F1 score combines both metrics to provide a balanced assessment of the model's overall performance. IoU measures the normalized overlap between predicted and ground-truth stenosis regions. Their definitions are:

$$\begin{aligned} Precision &= \frac{TP}{TP + FP} \\ Recall &= \frac{TP}{TP + FN} \\ F1 &= \frac{2TP}{2TP + FP + FN} \\ IoU &= \frac{TP}{TP + FP + FN} \end{aligned}$$

(9)

where TP, FP, and FN denote the numbers of true positive, false positive, and false negative pixels, respectively.
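The per-image computation and averaging described in this section can be sketched as below. Masks are represented as nested lists of 0/1 values; returning 0.0 when a denominator is zero is an assumed convention that the source does not specify, and all function names are illustrative.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN by pixel-wise comparison of two binary masks."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    return tp, fp, fn


def safe_div(num, den):
    # Assumed convention: an undefined ratio counts as 0.0.
    return num / den if den else 0.0


def image_metrics(pred, truth):
    """Equation (9) evaluated for a single image."""
    tp, fp, fn = confusion_counts(pred, truth)
    return {
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
        "iou": safe_div(tp, tp + fp + fn),
    }


def dataset_metrics(pairs):
    """Average the per-image metric values over the test set."""
    per_image = [image_metrics(pred, truth) for pred, truth in pairs]
    return {key: sum(m[key] for m in per_image) / len(per_image)
            for key in per_image[0]}


# Two toy images: one partial overlap (TP=1, FP=1, FN=1) and one exact match.
pairs = [
    ([[1, 1, 0], [0, 0, 0]], [[0, 1, 1], [0, 0, 0]]),
    ([[1, 0], [0, 0]], [[1, 0], [0, 0]]),
]
summary = dataset_metrics(pairs)
```

Averaging per-image values weights every image equally regardless of lesion size, which differs from pooling pixel counts over the whole test set.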

HD95 (95% Hausdorff Distance) and ASSD (Average Symmetric Surface Distance) are boundary-based metrics that assess the geometric accuracy of segmentation boundaries. HD95 quantifies the boundary distance error by calculating the 95th percentile of the set of all minimum distances between the predicted boundary points and the ground-truth boundary points (in both directions). ASSD measures the average distance between the predicted and ground-truth boundaries. They are defined as:

$$\begin{aligned} HD95 &= \text{95th percentile of } \left\{\min_{g \in G} d(p, g) : p \in P\right\} \cup \left\{\min_{p \in P} d(p, g) : g \in G\right\} \\ ASSD &= \frac{1}{|P| + |G|} \left(\sum_{p \in P} \min_{g \in G} d(p, g) + \sum_{g \in G} \min_{p \in P} d(p, g)\right) \end{aligned}$$

(10)

where $P$ and $G$ represent the boundary pixels of the predicted segmentation and ground truth, respectively, and $d(p, g)$ denotes the Euclidean distance between points $p$ and $g$. These boundary-based metrics assess boundary quality, which is important for downstream clinical quantitative analysis. Since the ARCADE-stenosis and ICA-stenosis datasets do not provide pixel spacing information, HD95 and ASSD are reported in pixels for all three test sets, including the local dataset, to ensure consistency.
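Equation (10) can be illustrated with a brute-force sketch that takes boundary pixels as lists of `(row, col)` coordinates. Boundary extraction is not shown, and the percentile uses linear interpolation between order statistics, which is an assumed convention because the source does not name one.

```python
import math


def directed_min_distances(src, dst):
    """For each point in src, the distance to its nearest point in dst."""
    return [min(math.dist(s, d) for d in dst) for s in src]


def percentile(values, q):
    """q-th percentile with linear interpolation (assumed convention)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def hd95_assd(pred_boundary, gt_boundary):
    """HD95 and ASSD in pixels for two non-empty boundary point sets."""
    if not pred_boundary or not gt_boundary:
        raise ValueError("both boundaries must contain at least one point")
    pooled = (directed_min_distances(pred_boundary, gt_boundary)
              + directed_min_distances(gt_boundary, pred_boundary))
    hd95 = percentile(pooled, 95)
    assd = sum(pooled) / (len(pred_boundary) + len(gt_boundary))
    return hd95, assd


# Two parallel three-pixel boundaries that are one pixel apart.
hd95, assd = hd95_assd([(0, 0), (0, 1), (0, 2)], [(1, 0), (1, 1), (1, 2)])
```

The pairwise search is quadratic in the number of boundary points; it is adequate for small masks, whereas larger images would normally use a distance transform.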

Statistical analysis

The Wilcoxon signed-rank test was performed to compare the proposed model with each comparative method, based on the results from five repeated runs with different random seeds. Comparisons were made using F1, Precision, Recall, IoU, HD95, and ASSD on each test set separately. Given the limited number of runs, the non-parametric test was adopted without assuming normality of the distribution. Statistical significance was defined as p < 0.05, and the reported results are presented as mean ± standard deviation.
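For small samples, the Wilcoxon signed-rank test can be computed exactly by enumerating all sign assignments, as in the sketch below. It pairs one value per run for two methods and assumes no zero differences and no tied absolute differences. The source does not state whether a one-sided or two-sided test was used, so both p-values are returned; the example scores are hypothetical, not results from this study.

```python
from itertools import product


def wilcoxon_signed_rank_exact(a, b):
    """Exact paired Wilcoxon signed-rank test by full enumeration.

    Returns (W+, one-sided p for a > b, two-sided p).
    Assumes no zero differences and no ties among |differences|.
    """
    diffs = [x - y for x, y in zip(a, b)]
    abs_diffs = [abs(d) for d in diffs]
    if 0 in abs_diffs or len(set(abs_diffs)) != len(abs_diffs):
        raise ValueError("zero or tied differences are not handled in this sketch")
    n = len(diffs)

    # Rank the absolute differences from 1 (smallest) to n (largest).
    order = sorted(range(n), key=lambda i: abs_diffs[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)

    # Null distribution: every rank is positive or negative with equal chance.
    null = [sum(r for r, s in zip(range(1, n + 1), signs) if s)
            for signs in product((0, 1), repeat=n)]
    p_greater = sum(w >= w_plus for w in null) / len(null)
    p_less = sum(w <= w_plus for w in null) / len(null)
    return w_plus, p_greater, min(1.0, 2 * min(p_greater, p_less))


# Hypothetical F1 scores from five seeds for two methods.
ours = [0.69, 0.70, 0.68, 0.695, 0.685]
other = [0.66, 0.66, 0.66, 0.66, 0.66]
w_plus, p_one_sided, p_two_sided = wilcoxon_signed_rank_exact(ours, other)
```

For five pairs that all favor the same method, this enumeration gives a one-sided p-value of 1/32 and a two-sided p-value of 2/32.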

Results

Segmentation performance and efficiency comparison

To evaluate the effectiveness of the proposed model, we conducted comparative experiments with various segmentation networks.^24,31–38 All models were trained and evaluated under identical conditions, using the same dataset, preprocessing pipeline, data augmentation strategies, and training settings. All experiments were repeated five times with different random seeds, and the results are reported as mean ± standard deviation. As shown in Tables 1 and 2, on the public ARCADE-stenosis dataset our proposed method achieved the best region-based scores (F1: 0.6893 ± 0.0079, IoU: 0.5623 ± 0.0072) and boundary-based scores (HD95: 46.26 ± 1.74 pixels, ASSD: 18.34 ± 1.29 pixels) among all compared approaches, while YOLOv8 obtained a higher Precision (0.7464 vs. 0.7350). On the local dataset, our method achieved the best performance on F1, Recall, IoU, HD95, and ASSD, with the only exception being Precision, where YOLOv8 was marginally higher. The Wilcoxon signed-rank test confirmed significant improvements across most comparisons. The only non-significant comparisons were Precision against YOLOv8 on both datasets, ASSD against nnUNetv2 on ARCADE-stenosis, and Precision against LightM-Unet on the local dataset. Notably, F1 and IoU improvements were significant across all comparisons (p < 0.05), indicating more accurate stenosis localization. The proposed method also achieved an effective balance between accuracy and efficiency. As illustrated in Figure 5, it surpassed computationally heavy models such as U-Mamba-Bot and Mask-Conv2 in F1 score while requiring fewer floating-point operations (lower GFLOPs) and parameters (smaller model size). At the same time, it remained more accurate than lightweight models such as LightM-Unet, which sacrificed performance for speed. These results confirm its ability to provide high performance with moderate computational demands.

Figure 5.

Relationship between model performance (F1 score), computational cost (GFLOPs), and model size on the ARCADE-stenosis dataset. Colors indicate model categories; bubble size represents model size (number of parameters); center point positions correspond to F1 score and GFLOPs values.

Table 1.

Performance comparison across methods on ARCADE-stenosis dataset.

Method	F1	Precision	Recall	IoU	HD95 (pixels)	ASSD (pixels)
Swin-Unet³¹	0.4278 ± 0.0367*	0.5064 ± 0.0150*	0.4424 ± 0.0583*	0.3051 ± 0.0286*	93.58 ± 8.81*	34.85 ± 9.03*
nnUNetv2³²	0.6126 ± 0.0047*	0.6926 ± 0.0107*	0.6277 ± 0.0091*	0.4705 ± 0.0046*	66.35 ± 1.82*	18.43 ± 1.24
LightM-Unet³³	0.6131 ± 0.0054*	0.6878 ± 0.0218*	0.6298 ± 0.0124*	0.4713 ± 0.0044*	68.77 ± 1.90*	19.60 ± 1.08*
U-Mamba-Bot³⁴	0.6118 ± 0.0079*	0.6826 ± 0.0161*	0.6306 ± 0.0245*	0.4701 ± 0.0062*	68.63 ± 1.76*	19.99 ± 1.61*
Mask R-CNN³⁵	0.4726 ± 0.0324*	0.5041 ± 0.0607*	0.5500 ± 0.0984*	0.3337 ± 0.0290*	125.00 ± 18.32*	36.48 ± 5.34*
Mask-Conv2³⁶	0.5204 ± 0.0293*	0.5682 ± 0.0285*	0.5624 ± 0.0487*	0.3860 ± 0.0213*	96.59 ± 9.25*	37.41 ± 10.39*
YOLOv8³⁷	0.6609 ± 0.0065*	0.7464 ± 0.0103	0.6470 ± 0.0067*	0.5333 ± 0.0074*	56.24 ± 3.11*	23.81 ± 2.73*
YOLOv11²⁴	0.6526 ± 0.0142*	0.7030 ± 0.0178*	0.6564 ± 0.0163*	0.5301 ± 0.0126*	65.21 ± 7.21*	34.62 ± 5.93*
YOLOv13³⁸	0.6421 ± 0.0072*	0.6857 ± 0.0136*	0.6575 ± 0.0085*	0.5175 ± 0.0068*	66.05 ± 4.52*	32.49 ± 3.77*
Ours	0.6893 ± 0.0079	0.7350 ± 0.0149	0.6894 ± 0.0079	0.5623 ± 0.0072	46.26 ± 1.74	18.34 ± 1.29

*p < 0.05 compared with our method.

Table 2.

Performance comparison across methods on local dataset.

Method	F1	Precision	Recall	IoU	HD95 (pixels)	ASSD (pixels)
Swin-Unet³¹	0.3891 ± 0.0265*	0.3992 ± 0.0196*	0.4749 ± 0.0566*	0.2598 ± 0.0191*	101.51 ± 9.26*	36.23 ± 9.36*
nnUNetv2³²	0.4580 ± 0.0181*	0.5077 ± 0.0049*	0.4854 ± 0.0373*	0.3233 ± 0.0143*	115.39 ± 7.69*	44.23 ± 4.05*
LightM-Unet³³	0.4284 ± 0.0753*	0.4900 ± 0.0559	0.4457 ± 0.0829*	0.2999 ± 0.0601*	121.83 ± 13.55*	45.95 ± 8.59*
U-Mamba-Bot³⁴	0.4559 ± 0.0206*	0.4997 ± 0.0157*	0.4843 ± 0.0272*	0.3230 ± 0.0162*	116.38 ± 7.79*	45.25 ± 5.45*
Mask R-CNN³⁵	0.4004 ± 0.0655*	0.3811 ± 0.0204*	0.5117 ± 0.1507*	0.2707 ± 0.0473*	143.78 ± 30.17*	52.07 ± 26.19*
Mask-Conv2³⁶	0.4614 ± 0.0432*	0.4787 ± 0.0417*	0.5046 ± 0.0470*	0.3288 ± 0.0347*	119.88 ± 22.8*	60.02 ± 18.39*
YOLOv8³⁷	0.5572 ± 0.0062*	0.5292 ± 0.0039	0.6360 ± 0.0126*	0.4063 ± 0.0053*	63.84 ± 2.57*	23.69 ± 2.15*
YOLOv11²⁴	0.5738 ± 0.0064*	0.5110 ± 0.0101*	0.6944 ± 0.0080*	0.4216 ± 0.0061*	62.93 ± 3.33*	25.59 ± 2.60*
YOLOv13³⁸	0.5802 ± 0.0094*	0.5119 ± 0.0128*	0.7093 ± 0.0116*	0.4263 ± 0.0073*	55.89 ± 5.83*	19.89 ± 3.88*
Ours	0.5901 ± 0.0040	0.5216 ± 0.0055	0.7179 ± 0.0125	0.4338 ± 0.0029	53.43 ± 5.51	16.47 ± 3.00

*p < 0.05 compared with our method.

Visual analysis

Figures 6 and 7 present segmentation results from different models on coronary XRA images. Visual evaluation indicates that our model provided competitive performance across diverse stenosis types, including focal, diffuse, bifurcation, and multi-vessel diseases. The model generated bounding boxes and segmentation masks in parallel based on shared feature representations. This approach effectively reduced interference from non-target areas, ensuring precise localization and anatomically consistent segmentation. In contrast, direct segmentation approaches such as nnUNetv2, Swin-Unet, and LightM-Unet often produced excessive fragmentation and false-positive predictions in non-stenosis regions, leading to a notable decline in accuracy. Compared to existing YOLO-series models, our model reduced segmentation errors and produced more complete stenosis segmentations.

Figure 6.

Visualization comparison of segmentation results on ARCADE-stenosis dataset. Column 1 shows the ground truth; Columns 2–6 represent the outputs of different methods. Red boxes indicate segmentation errors (see online version for color distinctions).

Figure 7.

Visualization comparison of segmentation results on local dataset. Column 1 shows the ground truth; Columns 2–6 represent the outputs of different methods. Red boxes indicate segmentation errors (see online version for color distinctions).

Ablation study

To evaluate the contribution of the HCA module, we conducted ablation experiments using YOLOv11 as the baseline, with five repeated runs using different random seeds to assess model stability. Four model variants were evaluated: the baseline, the baseline with only the CCM branch, the baseline with only the SSM branch, and the baseline with the integrated HCA module. As summarized in Tables 3 and 4, the CCM branch consistently improved model performance across both datasets. On the ARCADE-stenosis dataset, it increased the F1 score from 0.6526 to 0.6759, reduced HD95 from 65.21 to 50.45 pixels, and reduced ASSD from 34.62 to 20.50 pixels. On the local dataset, it improved F1 from 0.5738 to 0.5809 and reduced ASSD from 25.59 to 20.21 pixels. The SSM branch showed improvement on the ARCADE-stenosis dataset, increasing F1 to 0.6712 and reducing HD95 to 52.73 pixels, but yielded a slightly lower F1 on the local dataset (0.5669 vs. 0.5738 for the baseline). Notably, the full HCA module combining both branches achieved the best performance across all metrics, with F1 reaching 0.6893 on ARCADE-stenosis and 0.5901 on the local dataset, indicating that CCM and SSM are complementary and their joint use produces a synergistic effect. The low standard deviations across the five repeated runs indicate the stability of the proposed model.

Table 3.

Ablation study results on ARCADE-stenosis dataset.

Method	F1	Precision	Recall	IoU	HD95 (pixels)	ASSD (pixels)
Baseline	0.6526 ± 0.0142	0.7030 ± 0.0178	0.6564 ± 0.0163	0.5301 ± 0.0126	65.21 ± 7.21	34.62 ± 5.93
Baseline + CCM	0.6759 ± 0.0058	0.7271 ± 0.0110	0.6782 ± 0.0129	0.5472 ± 0.0056	50.45 ± 1.83	20.50 ± 1.33
Baseline + SSM	0.6712 ± 0.0040	0.7157 ± 0.0061	0.6796 ± 0.0037	0.5444 ± 0.0042	52.73 ± 3.06	22.19 ± 3.01
Baseline + HCA	0.6893 ± 0.0079	0.7350 ± 0.0149	0.6894 ± 0.0079	0.5623 ± 0.0072	46.26 ± 1.74	18.34 ± 1.29

Table 4.

Ablation study results on local dataset.

Method	F1	Precision	Recall	IoU	HD95 (pixels)	ASSD (pixels)
Baseline	0.5738 ± 0.0064	0.5110 ± 0.0101	0.6944 ± 0.0080	0.4216 ± 0.0061	62.93 ± 3.33	25.59 ± 2.60
Baseline + CCM	0.5809 ± 0.0117	0.5110 ± 0.0120	0.7133 ± 0.0115	0.4266 ± 0.0100	56.39 ± 4.81	20.21 ± 2.83
Baseline + SSM	0.5669 ± 0.0046	0.5070 ± 0.0061	0.6906 ± 0.0068	0.4136 ± 0.0037	62.98 ± 3.12	23.02 ± 2.44
Baseline + HCA	0.5901 ± 0.0040	0.5216 ± 0.0055	0.7179 ± 0.0125	0.4338 ± 0.0029	53.43 ± 5.51	16.47 ± 3.00

Figure 8 presents representative segmentation results from the ablation study. The baseline model showed missed detections and false positives, particularly in regions of mild stenosis, complex branching, or low contrast. After integrating the HCA module, false positives in non-stenotic areas were effectively suppressed, while segmentation of focal and mild stenosis was improved. This led to enhanced localization accuracy, structural consistency, and a reduction in both missed detections and over-segmentation, producing more anatomically plausible results.

Figure 8.

Visualization comparison of stenosis segmentation results in the ablation study. Blue and red boxes indicate false negative and false positive results, respectively (see online version for color distinctions).

Comparison of different attention mechanisms

To evaluate the effectiveness of the proposed HCA module, we compared it against several representative attention modules, including GC,³⁹ PSA,⁴⁰ EMA,⁴¹ and DarkIR,⁴² within the same baseline framework under identical training conditions. All experiments were repeated five times with different random seeds to ensure stability, and the results are reported as mean ± standard deviation. We retained the default structures from the original papers to evaluate plug-and-play performance. As summarized in Tables 5 and 6, the HCA module achieved the best F1, Precision, Recall, IoU, HD95, and ASSD scores among all evaluated attention mechanisms on both datasets. These results demonstrate the effectiveness of the HCA module within the YOLOv11 framework under unified training settings. Figure 9 further shows that the high-activation regions (red) in the HCA Grad-CAM visualizations exhibit the closest overlap with the ground truth, indicating precise localization. Meanwhile, responses in complex vascular bifurcations and background regions remain low (blue), demonstrating improved specificity.

Figure 9.

Comparison of Grad-CAM from different attention modules. The first column shows the ground truth; subsequent columns display attention activation maps (red: high, blue: low). Red boxes indicate erroneous activations in non-stenosis regions (see online version for color distinctions).

Table 5.

Performance comparison of different attention mechanisms on ARCADE-stenosis dataset.

Method	F1	Precision	Recall	IoU	HD95 (pixels)	ASSD (pixels)
GC	0.6666 ± 0.0107	0.7206 ± 0.0107	0.6685 ± 0.0100	0.5391 ± 0.0105	51.53 ± 3.81	20.53 ± 2.70
PSA	0.6471 ± 0.0041	0.7018 ± 0.0075	0.6483 ± 0.0099	0.5221 ± 0.0039	61.83 ± 2.79	29.02 ± 2.30
EMA	0.6712 ± 0.0040	0.7157 ± 0.0061	0.6796 ± 0.0037	0.5444 ± 0.0042	52.73 ± 3.06	22.19 ± 3.01
DarkIR	0.6787 ± 0.0057	0.7165 ± 0.0098	0.6887 ± 0.0060	0.5530 ± 0.0050	51.63 ± 2.74	23.33 ± 2.58
HCA	0.6893 ± 0.0079	0.7350 ± 0.0149	0.6894 ± 0.0079	0.5623 ± 0.0072	46.26 ± 1.74	18.34 ± 1.29

Table 6.

Performance comparison of different attention mechanisms on the local dataset.

Method	F1	Precision	Recall	IoU	HD95 (pixels)	ASSD (pixels)
GC	0.5738 ± 0.0063	0.5109 ± 0.0089	0.6986 ± 0.0058	0.4197 ± 0.0055	59.19 ± 2.96	20.95 ± 2.33
PSA	0.5784 ± 0.0068	0.5164 ± 0.0078	0.6991 ± 0.0073	0.4238 ± 0.0063	58.86 ± 3.88	20.79 ± 2.73
EMA	0.5669 ± 0.0046	0.5070 ± 0.0061	0.6906 ± 0.0068	0.4136 ± 0.0037	62.98 ± 3.12	23.02 ± 2.44
DarkIR	0.5641 ± 0.0058	0.5007 ± 0.0040	0.6893 ± 0.0110	0.4137 ± 0.0046	66.44 ± 3.80	29.62 ± 3.29
HCA	0.5901 ± 0.0040	0.5216 ± 0.0055	0.7179 ± 0.0125	0.4338 ± 0.0029	53.43 ± 5.51	16.47 ± 3.00
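For readers less familiar with the two boundary metrics in Tables 5 and 6, the sketch below computes HD95 and ASSD in pixels for a pair of small binary masks. It is an illustration under stated assumptions, not the evaluation code used in this study: boundaries are taken as foreground pixels with a 4-connected background neighbour, HD95 is taken as the larger of the two directed 95th-percentile distances (some toolkits pool both directions instead), and both masks are assumed to be non-empty. All function names are hypothetical.

```python
import math


def boundary_points(mask):
    """Foreground pixels that touch background or the image edge (4-connectivity)."""
    h, w = len(mask), len(mask[0])
    points = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            if any(not (0 <= y < h and 0 <= x < w) or not mask[y][x]
                   for y, x in neighbours):
                points.append((r, c))
    return points


def directed_distances(src, dst):
    """Distance from each point in src to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def hd95_assd(pred, gt):
    """Return (HD95, ASSD) in pixels for two non-empty binary masks."""
    bp, bg = boundary_points(pred), boundary_points(gt)
    d_pg = directed_distances(bp, bg)
    d_gp = directed_distances(bg, bp)
    hd95 = max(percentile(d_pg, 95), percentile(d_gp, 95))
    assd = (sum(d_pg) + sum(d_gp)) / (len(d_pg) + len(d_gp))
    return hd95, assd


def square_mask(col_start, size=6):
    """Toy mask: a 2x2 foreground square on rows 1-2 starting at col_start."""
    return [[1 if 1 <= r <= 2 and col_start <= c <= col_start + 1 else 0
             for c in range(size)] for r in range(size)]


gt = square_mask(1)
pred = square_mask(3)          # same square shifted two pixels to the right
print(hd95_assd(pred, gt))     # (2.0, 1.5)
```

The brute-force nearest-point search is quadratic in the number of boundary pixels, which is acceptable for a toy example; a distance transform would be the usual choice for full-size angiograms.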

External test

To further test the generalization and clinical applicability of the proposed method, we evaluated the model trained solely on ARCADE-stenosis data on the public ICA-stenosis dataset. All experiments were repeated five times with different random seeds. As shown in Table 7, our method achieved the best F1 score, Recall, IoU, and boundary metrics (HD95 and ASSD) among all compared approaches, with competitive Precision. The differences from our method were statistically significant (p < 0.05) for every metric and every compared method, except for Precision relative to nnUNetv2 and U-Mamba-Bot, the only two methods with higher Precision than ours. These results indicate better generalization capability than the compared methods.

Table 7.

External validation on the ICA-stenosis dataset.

Method	F1	Precision	Recall	IoU	HD95 (pixels)	ASSD (pixels)
Swin-Unet	0.2943 ± 0.0410*	0.3483 ± 0.0113*	0.3412 ± 0.0712*	0.1968 ± 0.0291*	125.88 ± 15.27*	58.29 ± 16.88*
nnUNetv2	0.3951 ± 0.0085*	0.4758 ± 0.0109	0.4016 ± 0.0124*	0.2819 ± 0.0059*	120.05 ± 4.92*	67.42 ± 5.57*
LightM-Unet	0.3597 ± 0.0561*	0.4268 ± 0.0370*	0.3749 ± 0.0558*	0.2539 ± 0.0469*	129.35 ± 7.23*	74.07 ± 13.52*
U-Mamba-Bot	0.4126 ± 0.0214*	0.4720 ± 0.0200	0.4300 ± 0.0294*	0.2978 ± 0.0170*	119.55 ± 6.79*	68.61 ± 5.83*
Mask R-CNN	0.3293 ± 0.0714*	0.3192 ± 0.0330*	0.4168 ± 0.1383*	0.2255 ± 0.0505*	158.14 ± 31.42*	79.97 ± 35.43*
Mask-Conv2	0.3525 ± 0.0638*	0.3733 ± 0.0505*	0.3846 ± 0.0826*	0.2545 ± 0.0496*	155.83 ± 22.28*	97.60 ± 19.90*
YOLOv8	0.4893 ± 0.0060*	0.4501 ± 0.0047*	0.5831 ± 0.0135*	0.3638 ± 0.0047*	93.57 ± 5.16*	55.78 ± 4.74*
YOLOv11	0.4962 ± 0.0138*	0.4237 ± 0.0121*	0.6402 ± 0.0187*	0.3687 ± 0.0110*	93.30 ± 6.86*	58.93 ± 5.38*
YOLOv13	0.4815 ± 0.0077*	0.4170 ± 0.0077*	0.6144 ± 0.0176*	0.3560 ± 0.0053*	94.63 ± 5.85*	56.23 ± 4.67*
Ours	0.5320 ± 0.0070	0.4538 ± 0.0055	0.6818 ± 0.0176	0.3962 ± 0.0052	72.35 ± 3.45	38.78 ± 2.65

*p < 0.05 compared with our method.
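To make the size of the external-test gains concrete, the short sketch below computes the relative change of our method against the YOLOv11 row of Table 7. The mean values are copied from the table; treating the YOLOv11 row as the baseline reflects our reading that HCA is integrated into that framework, and the helper name `relative_gain` is hypothetical. Because only run-averaged means are compared, the percentages are descriptive and carry no information about significance.

```python
# Mean values copied from Table 7 (external ICA-stenosis test).
TABLE7 = {
    "YOLOv11": {"F1": 0.4962, "IoU": 0.3687, "HD95": 93.30, "ASSD": 58.93},
    "Ours": {"F1": 0.5320, "IoU": 0.3962, "HD95": 72.35, "ASSD": 38.78},
}

# Distance metrics improve when they decrease; overlap metrics when they increase.
LOWER_IS_BETTER = {"HD95", "ASSD"}


def relative_gain(metric, baseline="YOLOv11", model="Ours"):
    """Relative improvement of `model` over `baseline`, in percent."""
    base = TABLE7[baseline][metric]
    new = TABLE7[model][metric]
    change = (base - new) if metric in LOWER_IS_BETTER else (new - base)
    return 100.0 * change / base


for name in ("F1", "IoU", "HD95", "ASSD"):
    print(f"{name}: {relative_gain(name):.1f}%")
# F1: 7.2%
# IoU: 7.5%
# HD95: 22.5%
# ASSD: 34.2%
```

On this reading, the improvement on the external set is proportionally larger for the boundary metrics than for the overlap metrics.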

Discussion

Accurate coronary stenosis segmentation is essential for objective assessment of stenosis severity. This study proposed an improved segmentation model enhanced with a Hybrid Context-Aware Attention (HCA) module, which refines features through dual complementary pathways. Comparative experiments demonstrated that the proposed model achieved competitive overall performance across multiple evaluation metrics on three multi-center datasets. On the public ARCADE-stenosis dataset, our model achieved the best overall segmentation performance. It also maintained this leading performance on both the local dataset and the external dataset, demonstrating consistent generalization.

The model's segmentation performance across multi-center data can be attributed in part to the strong object detection ability of the YOLO framework. This strength is reflected in the leading results of this framework in the coronary stenosis segmentation task of the ARCADE Challenge.⁴³ Our experimental results confirmed that this framework selection led to higher segmentation accuracy: the YOLO-based models performed competitively against both traditional segmentation models such as U-Net and nnUNet and more recent architectures such as Mamba-based networks. Two-stage architectures (e.g., Mask R-CNN) perform detection before segmentation and are therefore prone to cascaded error propagation. In contrast, our single-stage YOLOv11-based framework performs bounding box regression and mask segmentation in parallel from shared feature representations, which avoids this issue. To further enhance this ability, we integrated the HCA module into the YOLOv11 framework. Its CCM branch enhances feature discriminability through global inter-channel attention, while the SSM branch aggregates multi-scale context, enabling precise localization and effective background suppression. Grad-CAM visualizations show that HCA adaptively assigns higher weights to stenosis regions, emphasizing anatomically relevant structures. Ablation studies confirmed the module's contribution: its inclusion consistently improved F1 scores and reduced errors in complex regions, and the full integration of the HCA module provided the best overall performance.

A current semi-automatic method that incorporates expert input can achieve high boundary precision,⁴⁴ but it requires manual initialization and expert interpretation to locate the stenosis. To enable efficient clinical workflows, this study focused on fully automatic segmentation to support subsequent quantitative clinical assessment. The proposed model represents a substantive step in this direction, achieving competitive performance in segmenting stenotic lesions on multi-center datasets.

Several limitations of the proposed method require further investigation. First, while the model demonstrates consistent generalization across datasets, its segmentation performance declined on the external test set. This decline likely stems from the wide variation in coronary anatomy, including vessel tortuosity, branching patterns, and plaque morphology. Additionally, our model has only been evaluated on diagnostically optimal frames, and its performance on suboptimal frames (e.g., frames with motion blur or poor contrast) has not been assessed. Therefore, training with larger and more diverse multi-center cohorts is essential to capture the full range of anatomical variations and ensure reliable segmentation in real-world practice. Second, the current segmentation remains at a coarse anatomical level, as it relies on polygon-level annotations from the public training datasets. While this level of segmentation is sufficient for lesion detection, it is inadequate for precise stenosis quantification, such as calculating the diameter stenosis percentage or minimal lumen area. To mitigate the impact of coarse annotations, we incorporated multiple loss functions during training to guide the model toward more accurate boundary delineation. However, coarse annotations still limit the upper bound of boundary segmentation performance, and pixel-level stenosis boundary annotation is required for clinically accurate measurements. Future re-annotation of the public dataset with pixel-level stenosis boundaries could improve the segmentation accuracy of our method. This would help establish a correlation between segmentation metrics and clinically actionable parameters, thereby facilitating automated stenosis quantification.

Conclusion

In this study, we proposed a deep learning model for coronary artery stenosis segmentation in XRA images, which integrates a novel Hybrid Context-Aware Attention (HCA) module. The dual-pathway design of HCA combines global inter-channel attention with grouped multi-scale spatial aggregation, thereby enhancing feature discriminability and spatial-context modeling in stenotic regions. Evaluation across three independent multi-center datasets demonstrated that our model achieved competitive segmentation accuracy compared to existing approaches. These findings validate the effectiveness of our architecture for stenosis segmentation and highlight its potential to support the development of future automated assessment tools. In future work, we will focus on improving segmentation precision for accurate quantitative measurement and on expanding the training data with multi-center cohorts to improve generalizability.

Footnotes

Acknowledgments

This work was supported by Guizhou Provincial Basic Research Program (Qiankehejichu-ZK[2021]478, Qiankehejichu MS[2026]563), the Youth Science and Technology Talent Growth Project of Common University in Guizhou Province (Qianjiaohe-KY[2021]180), the National Natural Science Foundation of China (81660298), and the Funding for the Excellent Reserve Talents in the Discipline of Affiliated Hospital of Guizhou Medical University (gyfyxkyc-2023-13).

Ethics approval

Ethical approval for this retrospective study was granted by the Ethics Committee of Guizhou Medical University (Approval No.: 2023-208), which also waived the need for informed consent due to the use of anonymized data.

Author contributions

Manli Zhang: Methodology, Writing – original draft, Validation, Visualization. Fangyan Li: Data collection, Data annotation, Image analysis, Theoretical guidance. Haijun Guo: Image analysis, Professional medical guidance, Annotation supervision. Jian He: Data collection, Data annotation. Menghua Yang: Data collection, Data annotation. Yuehong Miao: Data analysis. Fan Yang: Funding acquisition, Project administration, Methodology, Supervision, Writing – review & editing. Pinggui Lei: Funding acquisition, Supervision, Resources, Writing – review & editing.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (grant number 81660298), the Guizhou Provincial Basic Research Program (grant numbers Qiankehejichu MS[2026]563 and Qiankehejichu-ZK[2021]478), the Funding for the Excellent Reserve Talents in the Discipline of Affiliated Hospital of Guizhou Medical University (grant number gyfyxkyc-2023-13), and the Youth Science and Technology Talent Growth Project of Common University in Guizhou Province (grant number Qianjiaohe-KY[2021]180).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

The public dataset is accessible through the cited reference. The local clinical dataset and the associated analysis code are available from the corresponding author upon reasonable request, subject to privacy and compliance agreements.

ORCID iDs

Fan Yang

Pinggui Lei

References

1. Knuuti, Wijns, Saraste, et al. 2019 ESC guidelines for the diagnosis and management of chronic coronary syndromes: the task force for the diagnosis and management of chronic coronary syndromes of the European Society of Cardiology (ESC). Eur Heart J 2020; 41: 407–477.

2. Kravarioti, Chaito, Ouardouz, et al. Noninvasive assessment of coronary artery disease: recent techniques, diagnostic accuracy, and clinical implications for modern cardiology–A narrative review. Health Sci Rep 2025; 8: e70536.

3. Najjar. Clinical applications, safety profiles, and future developments of contrast agents in modern radiology: a comprehensive review. iRADIOLOGY 2024; 2: 430–468.

4. Gurav, Revaiah, Tsai, et al. Coronary angiography: a review of the state of the art and the evolution of angiography in cardio therapeutics. Front Cardiovasc Med 2024; 11: 1468888.

5. Shivaie, Tohidi, Loganathan, et al. Interobserver variability of coronary stenosis characterized by coronary angiography: a single-center (Toronto general hospital) retrospective chart review by staff cardiologists. Vasc Health Risk Manag 2024; 20: 359–368.

6. Wagener, Onuma, Sharif, et al. Features and limitations of robotically assisted percutaneous coronary intervention (R-PCI): a systematic review of R-PCI. J Clin Med 2024; 13: 5537.

7. Rodriguez, Pappas, Le Hong, et al. Cardiac imaging for the detection of ischemia: current status and future perspectives. Expert Rev Med Devices 2025; 22: 581–594.

8. Hoppe, Kellnar, Esser, et al. Artificial intelligence in coronary angiography: benchmarking the diagnostic accuracy of ChatGPT-4o against interventional cardiologists. Open Heart 2025; 12: e003316.

9. Iuvara, Franzino, Carciotto, et al. Coronary intravascular imaging: a comprehensive review of techniques, applications, and future directions. Medicina (B Aires) 2025; 61: 2019.

10. Han, et al. Coronary artery stenosis detection via proposal-shifted spatial-temporal transformer in X-ray angiography. Comput Biol Med 2023; 153: 106546.

11. Wang, Liang, et al. Integrated deep learning model for automatic detection and classification of stenosis in coronary angiography. Comput Biol Chem 2024; 112: 108184.

12. Molenaar, Hebbo, Selder, et al. Deep learning–based segmentation of coronary arteries and stenosis detection in X-ray coronary angiography. JACC Adv 2025; 4: 102360.

13. Ovalle-Magallanes, Avina-Cervantes, Cruz-Aceves, et al. LRSE-Net: lightweight residual squeeze-and-excitation network for stenosis detection in X-ray coronary angiography. Electronics (Basel) 2022; 11: 3570.

14. Chen, Zhang, Yang, et al. A coronary artery stenosis detection method based on coarse-to-fine network structure. In: 2023 8th IEEE International Conference on Network Intelligence and Digital Content, 2023, pp.51–55.

15. Jayasree, Koteswara Rao. Deep belief Bayesian joint conditional detection of coronary artery plaque and stenosis in X-ray angiography images. Multimed Tools Appl 2025; 84: 9963–9983.

16. Huang, Luo, Wei, et al. Deep learning model for coronary artery segmentation and quantitative stenosis detection in angiographic images. Med Phys 2025; 52: e17970.

17. Popov, Amanturdieva, Zhaksylyk, et al. Dataset for automatic region-based coronary artery disease diagnostics using X-ray angiography images. Sci Data 2024; 11: 20.

18. Lee, Shin, Lee Y-H, et al. SSASS: Semi-supervised approach for stenosis segmentation. arXiv preprint 2023; 2311.10281.

19. Abedin AJM, Sarmun, Mushtak, et al. Enhanced coronary artery segmentation and stenosis detection: leveraging novel deep learning techniques. Biomed Signal Process Control 2025; 109: 108023.

20. Lalinia, Almasganj, Seyyedsalehi. Stenosis detection of coronary arteries in X-ray angiography images using Swin UNETR with self-supervised pre-training approach and PEFT strategy. Biomed Signal Process Control 2026; 112: 108617.

21. Akgül, Kozan Hİ, Akyürek, et al. Automated stenosis detection in coronary artery disease using yolov9c: enhanced efficiency and accuracy in real-time applications. J Real-Time Image Proc 2024; 21: 177.

22. Ren, Jing, et al. LASF: a local adaptive segmentation framework for coronary angiogram segments. Health Inf Sci Syst 2025; 13: 19.

23. Sutradhar, Fahad, Raiaan MAK, et al. Cervical spine fracture detection utilizing YOLOv8 and deep attention-based vertebrae classification ensuring XAI. Biomed Signal Process Control 2025; 101: 107228.

24. Jocher, Qiu. Ultralytics YOLO11, https://github.com/ultralytics/ultralytics (2024, accessed 2 August 2024).

25. Zhao, Vij, Malhotra, et al. Automatic extraction and stenosis evaluation of coronary arteries in invasive coronary angiograms. Comput Biol Med 2021; 136: 104667.

26. Yushkevich, Piven, Hazlett, et al. User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. NeuroImage 2006; 31: 1116–1128.

27. Zheng, Wang, Liu, et al. Distance-IoU loss: faster and better learning for bounding box regression. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2020, pp.12993–13000.

28. Wang, et al. Generalized focal loss: towards efficient representation learning for dense object detection. IEEE Trans Pattern Anal Mach Intell 2023; 45: 3139–3153.

29. Lin, Maire, Belongie, et al. Microsoft coco: common objects in context. In: Proceedings of the European Conference on Computer Vision, 2014, pp.740–755.

30. Pedregosa, Varoquaux, Gramfort, et al. Scikit-learn: machine learning in python. J Mach Learn Res 2011; 12: 2825–2830.

31. Cao, Wang, Chen, et al. Swin-Unet: unet-like pure transformer for medical image segmentation. In: Computer Vision – ECCV 2022 Workshops 2023, pp.205–218.

32. Isensee, Jaeger, Kohl SAA, et al. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods 2021; 18: 203–211.

33. Liao, Zhu, Wang, et al. LightM-UNet, https://github.com/MrBlankness/LightM-UNet (2024, accessed 1 January 2025).

34. Wang. U-Mamba, https://github.com/bowang-lab/U-Mamba (2024, accessed 1 January 2025).

35. Gkioxari, Dollár, et al. Mask R-CNN. In: 2017 IEEE International Conference on Computer Vision, 2017, pp.2980–2988.

36. Pokhrel, Bhandari, Vazquez, et al. ConvNeXtv2 Fusion with Mask R-CNN for automatic region based coronary artery stenosis detection for disease diagnosis. arXiv preprint 2023; 2310.04749.

37. Jocher, Chaurasia, Qiu. Ultralytics YOLOv8. https://github.com/ultralytics/ultralytics (2023, accessed 1 June 2024).

38. Lei, et al. YOLOv13. https://github.com/iMoonLab/yolov13 (2025, accessed 1 July 2025).

39. Cao, Lin, et al. GCNet: non-local networks meet squeeze-excitation networks and beyond. In: 2019 IEEE/CVF International Conference on Computer Vision Workshop, 2019, pp.1971–1980.

40. Liu, Fan, et al. Polarized self-attention: towards high-quality pixel-wise mapping. Neurocomputing 2022; 506: 158–167.

41. Ouyang, Zhang, et al. Efficient multi-scale attention module with cross-spatial learning. In: ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing, 2023, pp.1–5.

42. Feijoo, Benito, Garcia, et al. DarkIR, https://github.com/cidautai/DarkIR (2025, accessed 1 August 2025).

43. ARCADE Challenge. Final phase stenosis detection algorithm submission leaderboard, https://arcade.grand-challenge.org/evaluation/final-phase-stenosis-detection-algorithm-submission/leaderboard/ (2023, accessed 2 December 2025).

44. Mahendiran, Thanou, Senouf, et al. Angiopy segmentation: an open-source, user-guided deep learning tool for coronary artery segmentation. Int J Cardiol 2025; 418: 132598.