Graph theory and its potential in the automatic detection of left bundle branch block

Abstract

Accurate differentiation between Left Bundle Branch Block (LBBB) and its strict subtype (sLBBB) is essential for optimizing patient selection for Cardiac Resynchronization Therapy (CRT), yet remains clinically challenging. This study proposes and compares two graph-theory-based pipelines for the automated classification of 12-lead electrocardiograms (ECGs) into Healthy, LBBB, and sLBBB categories. Functional connectivity graphs were constructed from three inter-lead measures: Pearson correlation, cross-correlation, and phase difference. The first approach combines Graph Signal Processing (GSP) with machine learning. Graph filtering was performed via spectral decomposition of the Laplacian matrix, selecting dominant eigenmodes and reconstructing the signals through the inverse Graph Fourier Transform, thereby integrating spatial and temporal features. The second approach converts connectivity matrices into grayscale images that are classified by a Convolutional Neural Network (CNN), and incorporates Explainable AI (XAI) via Grad-CAM to visualize inter-lead interactions and enhance model transparency. The GSP-based method using phase difference and a Support Vector Machine achieved the highest performance (mean balanced accuracy $= 0.8317$), while the CNN-based approach with cross-correlation images reached $0.7646$ and offers improved interpretability. Both methods distinguished pathological from healthy cases, but precise classification between LBBB and sLBBB remains challenging. These results highlight the complementary value of graph-based ECG analysis and support future hybrid models for CRT stratification.

Keywords

Left bundle branch block; cardiac resynchronization therapy; graph Fourier transform; convolutional neural networks

1. Introduction

The heart functions as a hemodynamic pump whose performance relies on the coordinated activation of its electrical conduction system, allowing synchronous contraction and efficient myocardial tissue perfusion. However, abnormalities such as Left Bundle Branch Block (LBBB) disrupt this sequence, causing delayed activation of the left ventricle. This dyssynchrony not only impairs systolic performance (an approximately 20% reduction) in patients with heart failure but also contributes to the gradual decline of cardiac function.¹ Furthermore, the underlying pathophysiology often involves structural and functional remodeling, potentially leading to abnormal conduction-induced cardiomyopathy.² Given its prevalence and prognostic significance, a comprehensive understanding and accurate classification of LBBB are essential for improving patient stratification and guiding treatment strategies.³

Cardiac Resynchronization Therapy (CRT) has proven beneficial in treating ventricular dyssynchrony,⁴ improving morbidity and mortality in selected patients. Nevertheless, a persistent clinical challenge lies in the accurate selection of CRT candidates, as not all patients respond favorably to the therapy (approximately 40% are considered nonresponders).^5,6 To address this limitation, Strauss et al. introduced stricter electrocardiographic criteria to define strict LBBB (sLBBB), identifying a subgroup with greater mechanical dyssynchrony^7,8 and, therefore, a higher likelihood of benefiting from CRT.^9,10 Consequently, optimizing patient selection remains an active area of research, with ongoing work on biomarkers and advanced imaging techniques.¹¹

In recent decades, artificial intelligence (AI), and particularly Deep Learning (DL), has revolutionized the analysis of electrocardiogram (ECG) signals.¹² Various studies and systematic reviews highlight the potential of models such as Convolutional Neural Networks (CNNs) and other DL architectures for the automatic detection of arrhythmias, the identification of structural heart disease, and the prediction of clinical outcomes from ECG.¹³ These models can identify subtle patterns, often imperceptible to the human eye, demonstrating utility in classifying various cardiac conditions and predicting events such as sudden cardiac death using novel signal processing models.¹⁴

However, the application of DL in cardiology faces significant challenges. One of the main issues is clinical interpretability: many DL models function as “black boxes,” making it difficult to understand how they reach their conclusions, which limits trust and adoption in clinical practice.^15,16 Furthermore, while CNNs are powerful for analyzing data with a grid-like structure (such as images), they may not optimally capture the complex temporal and spatial interdependencies inherent in multi-lead ECG signals. In response to these limitations, recent research has focused on hybrid techniques that combine different DL architectures (e.g., CNN-Transformer, CNN-LSTM¹⁷) or integrate advanced signal processing to improve diagnostic accuracy for arrhythmias and other cardiac pathologies.

Moreover, a growing body of research explores advanced neural network training techniques,¹⁸ imputation methods for time series,¹⁹ explainability through saliency maps,²⁰ neural architecture search,²¹ spatio-temporal deep learning,²² and handling imbalanced data.²³ These efforts highlight the ongoing drive to refine computational methods applicable to complex biomedical data like ECGs. Alongside these, newer classification approaches are being developed, such as Neural Dynamic Classification,²⁴ Dynamic Ensemble Learning,²⁵ Finite Element Machines for fast learning,²⁶ and particularly self-supervised learning methods which show promise in leveraging unlabeled biosignal data.^27,28 Although these hybrid approaches have shown promising results in general ECG classification, their specific application for the fine differentiation between LBBB and sLBBB, a crucial problem for CRT selection, has been less explored.^29,30

Concurrently, Graph Theory has emerged as a powerful paradigm for modeling complex systems and intricate relationships in biomedical data.³¹ In the analysis of physiological signals like EEG and fNIRS, graph-based approaches, including Graph Neural Networks (GNNs), have enabled the capture of functional and structural connectivity patterns (related techniques also investigate structural complexity in brain activity using weighted graphs³²). Although its application in cardiology is less frequent than in neuroscience, some pioneering studies have suggested its potential. Building on this, Qiang et al.³³ proposed a combined CNN and GNN approach to leverage temporal and spatial features of the ECG. Other recent studies have explored GNNs or hybrid graph-temporal approaches for arrhythmia detection, modeling inter-lead interactions.^34,35 Ultimately, these graph-based methods offer a natural way to represent the relationships between ECG leads, potentially capturing complex spatio-temporal dynamics missed by traditional methods.³⁶

Addressing the critical need for interpretability, Explainable Artificial Intelligence (XAI) techniques, such as gradient-based attribution methods like Grad-CAM (Gradient-weighted Class Activation Mapping),³⁷ SHAP,³⁸ or LIME, are becoming important tools for validating and understanding DL models in medical applications. These techniques allow visualization of which parts of the input are most influential for the model’s decision, providing a visual justification that can be evaluated by clinicians.^39,40 Specifically in ECG analysis, XAI helps bridge the gap between high-performance models and clinical trust.⁴¹

The research problem addressed in this study is the need for automatic, accurate, and interpretable methods to classify ECGs into healthy subjects, patients with LBBB, and patients with sLBBB, with the ultimate goal of improving risk stratification and the selection of candidates for CRT. Existing methods, although advanced,^42,43 often lack the specificity necessary to distinguish sLBBB or do not provide sufficient clinical interpretability.

This study introduces and evaluates two distinct methodological pathways that leverage Graph Theory for the automatic classification of ECG signals into healthy, LBBB, and sLBBB categories. The first pathway employs Graph Signal Processing (GSP) to extract features from graph representations of inter-lead ECG connectivity, which are subsequently used to train traditional machine learning (ML) classifiers. The second pathway transforms graph connectivity information into image representations that serve as input for a Convolutional Neural Network (CNN) model. Crucially, we incorporate explainability into the CNN pathway using visualization techniques, specifically employing the Grad-CAM algorithm to highlight key activation zones.

Although previous studies, such as Macas Ordoñez et al.,⁴⁴ have addressed explainability in models for LBBB detection, our second pathway offers a potentially more intuitive interpretation by applying XAI to visual representations derived from graph-modeled interactions.

The main contribution of this work lies in the proposal and comparative evaluation of these two graph-based approaches for the challenging LBBB/sLBBB classification task. While previous works like Reznichenko et al.⁴² and Karatzia et al.⁴³ have explored automated LBBB detection, and others like Macas Ordoñez et al.⁴⁴ have incorporated explainability, our study uniquely investigates: (1) a GSP-ML pipeline leveraging features derived from the graph spectral domain for classification, and (2) a hybrid Graph-CNN approach focused on the fine-grained LBBB/sLBBB distinction using images derived from inter-lead adjacency matrices, providing visual explanations via Grad-CAM applied to these structural representations. We aim to improve diagnostic accuracy and, potentially, CRT candidate selection by leveraging both temporal and spatial (inter-lead) characteristics of the ECG in a synergistic manner across these explored methodologies, assessing their respective strengths in terms of performance and interpretability. This comparative investigation represents an advance by exploring different ways to combine the representational power of graphs with feature extraction and classification capabilities, addressing the clinical need for both accuracy and interpretability.

2. Materials and methods

Graph Signal Processing (GSP) offers a complementary framework for analyzing data with inherent network structures, such as the relationships between ECG leads.³¹ Left Bundle Branch Block (LBBB), as an electrical conduction abnormality altering spatio-temporal activation patterns,⁴⁵ may be well-suited to graph-based analysis that models inter-lead functional connectivity.³⁵ This study proposes and evaluates a methodology integrating graph theory and two distinct classification approaches for automated LBBB and strict LBBB (sLBBB) detection from 12-lead electrocardiogram (ECG) signals. We hypothesize that representing inter-lead functional connectivity as a graph, and subsequently processing this graph representation, can capture discriminative patterns for LBBB more effectively than analyzing individual lead morphologies alone.

2.1. Data acquisition and cohort definition

2.1.1. Data sources and initial cohort selection

Electrocardiogram (ECG) data from two primary sources were utilized for this study, from which distinct patient cohorts were established:

Chapman University, Shaoxing People’s Hospital and Ningbo First Hospital ECG Database⁴⁶: This database was the source for the Healthy Control group. It provides 10-second, 12-lead ECG recordings sampled at a frequency ($f_{s}$) of 500 Hz and stored in .mat format. From this database, an initial cohort of $N_{Healthy} = 299$ recordings annotated as “normal sinus rhythm” was selected. This selection was subsequently confirmed by cardiologist review, which verified the absence of significant abnormalities.

MADIT-CRT (Multicenter Automatic Defibrillator Implantation Trial with Cardiac Resynchronization Therapy) Dataset⁴⁷: Accessed via The Telemetric and Holter ECG Warehouse (THEW), this dataset provided data for the Left Bundle Branch Block (LBBB) and strict LBBB (sLBBB) cohorts. These recordings are also 10-second, 12-lead ECGs, originally sampled at $f_{s} = 1000$ Hz and provided in ISHNE format.

2.1.2. Cohort definition and characteristics

The study cohorts were defined as follows, based on expert annotations provided within the source databases:

Healthy Control (Healthy): Comprising $N_{Healthy} = 299$ individuals, as described above, characterized by normal sinus rhythm without significant ECG abnormalities.

LBBB Cohort: This group consisted of $N_{LBBB} = 192$ recordings from the MADIT-CRT dataset, classified as “LBBB” but not meeting the strict criteria for sLBBB. The classification relied on expert interpretations within the database (“ISCE LBBB unblinded final with comments”), which considered criteria such as QRS duration, V1/V2 morphology, and the presence of mid-QRS notching or slurring.⁹

Strict LBBB Cohort (sLBBB): This group included $N_{sLBBB} = 301$ recordings from the MADIT-CRT dataset, classified as “strict LBBB” according to the same expert criteria and source documentation.⁹

Crucially, this study adopted these pre-existing expert diagnostic labels for cohort assignment.

2.1.3. Dataset heterogeneity and potential confounding factors

The utilization of distinct data sources introduces a degree of heterogeneity. Specifically, the Healthy Control cohort was sourced from a general ECG database, whereas the LBBB and sLBBB cohorts were derived from the MADIT-CRT trial. The MADIT-CRT trial participants were patients with heart failure,¹⁰ which represents a significant difference from the healthy control population. This disparity is a potential source of confounding variables. However, the methodological approach of this study was designed to focus on isolating and analyzing electrophysiological features. This limitation is further acknowledged and discussed in the Discussion section of this work.

2.1.4. Ethical considerations and data privacy

All data utilized in this study were obtained from publicly accessible repositories (Chapman University database) or through permissioned access to research databases (THEW for MADIT-CRT). The original data acquisition processes for both source databases received ethical approval from their respective Institutional Review Boards. Furthermore, all patient data were anonymized prior to their inclusion in these repositories and before being accessed for this study.

2.2. ECG signal preprocessing and representative beat standardization

The raw 12-lead ECG recordings from both the Healthy and LBBB/sLBBB datasets underwent several preprocessing steps to obtain standardized, representative heartbeat signals suitable for subsequent analysis. These steps were performed independently for each subject and each lead.

Baseline Wander Removal: To remove low-frequency baseline wander, an approach equivalent to zero-phase high-pass filtering was implemented. The baseline component was first estimated by applying a 4th-order Butterworth low-pass filter with a cutoff frequency ( $f_{c}$ ) of $0.5$ Hz to the raw electrocardiogram signals. Forward-backward digital filtering ensured zero phase distortion in this estimation. The resulting estimated baseline was then subtracted from the original signal, effectively attenuating signal components below $0.5$ Hz and yielding the baseline-corrected signal.
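The baseline-removal step can be sketched as follows. This is a minimal illustration, assuming SciPy's Butterworth design in second-order sections and `sosfiltfilt` for the forward-backward pass; the function name `remove_baseline` is hypothetical and not taken from the original pipeline.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def remove_baseline(ecg, fs, fc=0.5, order=4):
    """Subtract a zero-phase low-pass estimate of baseline wander.

    ecg: array of shape (n_leads, n_samples) or (n_samples,).
    fs: sampling frequency in Hz (500 for Chapman, 1000 for MADIT-CRT).
    """
    # 4th-order Butterworth low-pass at fc, in second-order sections for stability
    sos = butter(order, fc, btype="low", fs=fs, output="sos")
    # Forward-backward filtering yields a zero-phase baseline estimate
    baseline = sosfiltfilt(sos, ecg, axis=-1)
    # Subtracting the baseline attenuates components below fc
    return ecg - baseline


# Synthetic 10-s signal: slow drift with offset plus a 10 Hz component
fs = 500
t = np.arange(10 * fs) / fs
drift = 0.5 + np.sin(2 * np.pi * 0.1 * t)
fast = 0.2 * np.sin(2 * np.pi * 10 * t)
corrected = remove_baseline(drift + fast, fs)
```

Subtracting the low-pass estimate, rather than applying a high-pass filter directly, makes the 0.5 Hz cutoff explicit and keeps the baseline available for inspection.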

R-Peak Detection and Beat Segmentation: Individual heartbeats were extracted from the aforementioned baseline-corrected signals. R-peaks were identified using an established ECG processing toolkit.⁴⁸ The first and last detected R-peaks in each recording were discarded to avoid boundary effects from incomplete beats. Fixed-size windows were then segmented around each valid R-peak: 400 samples (175 samples pre-R-peak, 225 samples post-R-peak) for the Healthy dataset (originally sampled at $500$ Hz), and 800 samples (350 samples pre-R-peak, 450 samples post-R-peak) for the LBBB/sLBBB dataset (originally sampled at $1000$ Hz).
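A minimal sketch of the segmentation rule, using the window sizes stated above. The R-peak indices are taken as given (the toolkit of reference 48 is not reproduced), and skipping windows that would run past the signal edges is an assumption, since the text does not say how such beats were handled. The name `segment_beats` is hypothetical.

```python
import numpy as np

# Pre/post R-peak window sizes (samples) from the text, keyed by sampling rate
WINDOWS = {500: (175, 225), 1000: (350, 450)}


def segment_beats(ecg, r_peaks, pre, post):
    """Cut fixed windows [r - pre, r + post) around R-peaks of one lead.

    The first and last detected R-peaks are discarded, as in the text.
    Windows extending past the signal edges are skipped (assumption).
    """
    beats = []
    for r in r_peaks[1:-1]:
        start, stop = r - pre, r + post
        if start >= 0 and stop <= len(ecg):
            beats.append(ecg[start:stop])
    return np.array(beats)


fs = 500
ecg = np.random.default_rng(0).standard_normal(5000)
r_peaks = np.arange(100, 5000, 500)  # hypothetical detector output
pre, post = WINDOWS[fs]
beats = segment_beats(ecg, r_peaks, pre, post)
print(beats.shape)  # (8, 400)
```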

Representative Beat Generation via Woody Alignment: To obtain a stable, noise-reduced morphology for each lead, the Woody alignment method⁴⁹ was applied to the ensemble of segmented beats from that lead. This iterative procedure involves aligning each segmented beat to an evolving template by maximizing their cross-correlation function, and then averaging all aligned beats to update the template for the next iteration. This process was repeated for five iterations. The final averaged template served as the representative heartbeat morphology for that specific lead and subject.
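The Woody procedure described above can be sketched in a few lines of NumPy. This is an illustrative version, not the authors' code: the maximum lag searched (`max_lag=20` samples) and the zero-filled shifting are assumptions, because the text specifies only the cross-correlation criterion and the five iterations.

```python
import numpy as np


def shift(x, lag):
    """Shift a 1-D signal by `lag` samples with zero fill (positive = delay)."""
    out = np.zeros_like(x)
    if lag > 0:
        out[lag:] = x[:-lag]
    elif lag < 0:
        out[:lag] = x[-lag:]
    else:
        out[:] = x
    return out


def woody_average(beats, n_iter=5, max_lag=20):
    """Align beats (n_beats, n_samples) to an evolving template and average them."""
    template = beats.mean(axis=0)
    lags = np.arange(-max_lag, max_lag + 1)
    for _ in range(n_iter):
        aligned = []
        for beat in beats:
            # Cross-correlation between the shifted beat and the current template
            scores = [np.dot(shift(beat, lag), template) for lag in lags]
            aligned.append(shift(beat, lags[int(np.argmax(scores))]))
        aligned = np.array(aligned)
        # The mean of the aligned beats becomes the template for the next pass
        template = aligned.mean(axis=0)
    return template, aligned


# Toy check: identical Gaussian "QRS" pulses with jittered positions
n = np.arange(400)
base = np.exp(-((n - 200) ** 2) / (2 * 10.0 ** 2))
beats = np.array([shift(base, s) for s in (-5, -2, 0, 3, 6)])
template, aligned = woody_average(beats)
```

Each pass re-aligns the original beats rather than the previously shifted ones, so alignment errors do not accumulate across iterations.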

Resampling to Standardized Frequency: Following the generation of representative heartbeats for each of the 12 leads (initially 400 or 800 samples long, reflecting their original sampling rates of $500$ Hz and $1000$ Hz, respectively), these beats were resampled to a common frequency. This resampling step was intentionally performed after critical preprocessing stages like R-peak detection, beat segmentation, and representative beat alignment. Conducting these initial stages at the original, higher sampling rates leveraged their greater temporal resolution, contributing to more precise morphological characterization and robust representative beat generation.

A target frequency ($f_{s,\mathrm{final}}$) of 200 Hz was chosen for standardization. Downsampling to 200 Hz, rather than standardizing to one of the original higher frequencies, substantially reduced data dimensionality and the subsequent computational demands of the graph-based and machine learning pipelines. This frequency (Nyquist frequency of 100 Hz) was judged to adequately preserve the QRS characteristics that are critical for LBBB analysis. Fourier-based resampling was used for this step. Because both segmentation windows span 0.8 s (400 samples at 500 Hz and 800 samples at 1000 Hz), the resampling yielded standardized 12-lead representative beats of $N_{\mathrm{final}} = 160$ samples each. These $12 \times 160$ signals (organized as a matrix $X \in \mathbb{R}^{12 \times 160}$ per subject) formed the input for the subsequent classification pathways.
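As a sketch of the resampling step, `scipy.signal.resample` performs Fourier-domain resampling; whether the authors used this exact function is an assumption. The helper `to_standard_rate` is hypothetical and derives the target length from the original rate, so both cohorts end up at 160 samples.

```python
import numpy as np
from scipy.signal import resample

FS_FINAL = 200  # standardized sampling rate (Hz)


def to_standard_rate(beats, fs):
    """Fourier-resample representative beats (n_leads, n_samples) to 200 Hz."""
    n_final = int(round(beats.shape[-1] * FS_FINAL / fs))  # 400 @ 500 Hz or 800 @ 1000 Hz -> 160
    return resample(beats, n_final, axis=-1)


# A 5 Hz sinusoid spans exactly 4 cycles in 0.8 s, so Fourier resampling is exact
t500, t1000, t200 = (np.arange(n) / fs for n, fs in ((400, 500), (800, 1000), (160, 200)))
y500 = to_standard_rate(np.tile(np.sin(2 * np.pi * 5 * t500), (12, 1)), 500)
y1000 = to_standard_rate(np.tile(np.sin(2 * np.pi * 5 * t1000), (12, 1)), 1000)
```

Fourier resampling treats each beat as one period of a periodic signal, so beats whose first and last samples differ markedly can show small edge ripples.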

2.2.1. Classification pathways overview

Starting from the standardized 12-lead representative beats ( $X$ ), two distinct pathways were investigated for classifying subjects into Healthy, LBBB, or sLBBB categories, as depicted in Figure 1:

Feature extraction using Graph Signal Processing (GSP) followed by traditional Machine Learning (ML) classification.

Transformation of graph connectivity information into image representations for classification by a Convolutional Neural Network (CNN), incorporating explainability analysis.

The subsequent sections detail the specific methods employed within each pathway.

Figure 1.

Schematic representation of the proposed method.

2.3. Pathway 1: GSP feature extraction and ML classification

This pathway focuses on extracting features from the graph spectral domain of the standardized 12-lead representative beats.

Initial Signal Conditioning for Pathway 1: The standardized representative beats $X$ first underwent an additional zero-phase band-pass filtering stage, employing a 4th-order Butterworth filter with a passband of $0.05$–$40$ Hz. For connectivity measures known to be sensitive to signal amplitude variations, specifically Pearson correlation and phase difference, the band-pass filtered signals were subsequently subjected to a logarithmic amplitude normalization. Conversely, for cross-correlation analysis, which primarily relies on waveform morphology, the band-pass filtered signals were used directly without this logarithmic transformation. The resulting conditioned signals from this stage are denoted as $X^{'}$ for the subsequent GSP steps.
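The conditioning branch can be sketched as below, a minimal version assuming SciPy's `butter`/`sosfiltfilt`. Note that SciPy's `butter(4, [low, high], btype="bandpass")` produces an 8th-order band-pass (order 4 per edge); whether "4th-order" in the text follows this convention is an assumption. The function names are hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 200  # standardized sampling rate (Hz)


def bandpass(X, low=0.05, high=40.0, order=4, fs=FS):
    """Zero-phase Butterworth band-pass along the time axis of (12, n) beats."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, X, axis=-1)


def signed_log(X):
    """Sign-preserving log compression: sign(x) * log(1 + |x|)."""
    return np.sign(X) * np.log1p(np.abs(X))


def condition(X, method):
    """Return X' for a connectivity method: 'PC', 'PD' (log-normalized) or 'CC'."""
    Xf = bandpass(X)
    return signed_log(Xf) if method in ("PC", "PD") else Xf


X = np.random.default_rng(1).standard_normal((12, 160))
Xp = condition(X, "PC")
Xc = condition(X, "CC")
```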

Graph Construction: Adjacency and Laplacian Matrices: Functional connectivity graphs $G = (V, E)$ were constructed for each subject, where $V$ represents the 12 ECG leads. Three types of weighted adjacency matrices $A \in \mathbb{R}^{12 \times 12}$ were computed using the appropriately conditioned signals $X^{'}$. The specific computations for these adjacency matrices are detailed further in the description of Pathway 2. For each adjacency matrix, the normalized graph Laplacian $L$ was then derived as $L = I - D^{-1/2} A D^{-1/2}$, where $I$ is the identity matrix and $D$ is the diagonal degree matrix with entries $D_{ii} = \sum_{j} A_{ij}$.
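The normalized Laplacian can be computed as in the following sketch. Two choices here are assumptions rather than details from the text: negative weights (possible for Pearson correlation) are replaced by their absolute values, and self-loops are removed, since the document states only that the diagonal is zeroed and negative values are "appropriately handled".

```python
import numpy as np


def normalized_laplacian(A):
    """L = I - D^{-1/2} W D^{-1/2}, with W = |A| and a zeroed diagonal (assumptions)."""
    W = np.abs(np.asarray(A, dtype=float))
    np.fill_diagonal(W, 0.0)
    deg = W.sum(axis=1)
    # Guard isolated nodes (zero degree) against division by zero
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.where(deg > 0, deg, 1.0)), 0.0)
    return np.eye(len(W)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]


rng = np.random.default_rng(0)
M = rng.uniform(-1, 1, size=(12, 12))
A = (M + M.T) / 2  # symmetric and signed, like a correlation matrix
L = normalized_laplacian(A)
evals = np.linalg.eigvalsh(L)
```

For a connected graph with non-negative weights, the eigenvalues of this Laplacian lie in $[0, 2]$ and the smallest is zero, which is why the next steps can use the eigenvalue ordering as a notion of graph frequency.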

Graph Fourier Transform (GFT): The GFT was used to decompose the conditioned 12-lead signal matrix $X^{'}$ onto the spectral basis defined by the eigenvectors $U$ of the corresponding graph Laplacian $L$. The eigenvectors $U$ and eigenvalues $\Lambda$ were obtained from the eigendecomposition $L = U \Lambda U^{T}$. The resulting GFT coefficients, forming a matrix $\hat{X}^{'} \in \mathbb{R}^{12 \times 160}$, were computed as $\hat{X}^{'} = U^{T} X^{'}$.
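The GFT amounts to a change of basis along the lead axis, applied independently at each time sample. A minimal sketch with NumPy's symmetric eigensolver follows; the function names and the random graph are illustrative only.

```python
import numpy as np


def gft_basis(L):
    """Eigendecomposition L = U diag(lam) U^T, eigenvalues in ascending order."""
    lam, U = np.linalg.eigh(L)
    return lam, U


def gft(U, X):
    """Graph Fourier Transform of a (12, n_samples) signal: X_hat = U^T X."""
    return U.T @ X


def igft(U, X_hat):
    """Inverse GFT: X = U X_hat."""
    return U @ X_hat


# Illustrative 12-node graph with non-negative symmetric weights
rng = np.random.default_rng(0)
W = rng.uniform(0, 1, size=(12, 12))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)
L = np.eye(12) - W / np.sqrt(np.outer(d, d))
X = rng.standard_normal((12, 160))
lam, U = gft_basis(L)
X_hat = gft(U, X)
```

Because $U$ is orthonormal, the transform preserves energy and the full inverse recovers $X$ exactly; filtering only arises once some coefficients are discarded.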

Graph Filtering via Eigenvector Selection: Filtering in the graph spectral domain was performed by selecting a subset of $k = 6$ eigenvectors, forming a matrix $U_{\mathrm{filt}} \in \mathbb{R}^{12 \times 6}$. These eigenvectors corresponded to the smallest non-zero eigenvalues (specifically, eigenvalues $\lambda > 10^{-10}$) of the respective Laplacian matrix. This value of $k$ was chosen based on preliminary analyses and common GSP practices, aiming to retain dominant graph spectral components while reducing noise.

Inverse Graph Fourier Transform (iGFT): The signal was reconstructed in the vertex domain using only the GFT coefficients corresponding to the $k = 6$ selected eigenvectors. Let $\hat{X}_{\mathrm{filt}}^{'} \in \mathbb{R}^{6 \times 160}$ denote these selected GFT coefficients (i.e., the rows of $\hat{X}^{'}$ corresponding to the eigenvectors in $U_{\mathrm{filt}}$). The graph-filtered signal in the vertex domain, $X_{\mathrm{filt}}^{'} \in \mathbb{R}^{12 \times 160}$, was then obtained by applying the iGFT: $X_{\mathrm{filt}}^{'} = U_{\mathrm{filt}} \hat{X}_{\mathrm{filt}}^{'}$. This procedure yielded three distinct sets of graph-filtered signals per subject, one for each connectivity method.
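The eigenvector selection and iGFT steps combine into a single projection, $X_{\mathrm{filt}}^{'} = U_{\mathrm{filt}} U_{\mathrm{filt}}^{T} X^{'}$. The sketch below follows the stated rule ($k = 6$, $\lambda > 10^{-10}$). The function name and the random test graph are illustrative, and ties between repeated eigenvalues are not treated specially.

```python
import numpy as np


def graph_filter(L, X, k=6, tol=1e-10):
    """Keep the k eigenvectors with the smallest eigenvalues above tol, then apply the iGFT."""
    lam, U = np.linalg.eigh(L)             # ascending eigenvalues
    idx = np.flatnonzero(lam > tol)[:k]    # smallest non-zero eigenvalues
    U_filt = U[:, idx]                     # (12, k)
    X_hat_filt = U_filt.T @ X              # selected GFT coefficients, (k, n)
    return U_filt @ X_hat_filt             # back to the vertex domain, (12, n)


rng = np.random.default_rng(0)
W = rng.uniform(0, 1, size=(12, 12))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)
L = np.eye(12) - W / np.sqrt(np.outer(d, d))
X = rng.standard_normal((12, 160))
X_filt = graph_filter(L, X)
```

Excluding the zero-eigenvalue eigenvector removes the component that is shared by all leads in proportion to $D^{1/2}\mathbf{1}$, so the retained part emphasizes how the leads differ from one another.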

Feature Extraction and Dimensionality Reduction: Each $12 \times 160$ graph-filtered signal matrix $X_{\mathrm{filt}}^{'}$ was flattened into a 1920-dimensional feature vector. Principal Component Analysis (PCA) was then applied to reduce the dimensionality of these feature vectors to $n_{\mathrm{components}} = 50$.

Classification Models and Evaluation: Four types of ML classifiers were trained using the 50-dimensional PCA features: Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel, k-Nearest Neighbors (KNN) with $k = 5$ neighbors, Gaussian Naive Bayes (GaussianNB), and a Gradient Boosting machine (specifically, LightGBM⁵⁰). Model evaluation was performed using a repeated (200 iterations) stratified train-test split methodology (70% for training, 30% for testing). To handle class imbalance, three strategies were applied exclusively to the training set of each split: random oversampling, random undersampling, or using classifier-inherent class weighting. Performance was primarily reported as the average Balanced Accuracy across the 200 iterations on the test sets.
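The evaluation loop for the best-performing configuration (PCA to 50 components followed by an RBF-kernel SVM, with random oversampling) can be sketched with scikit-learn. This is an illustration under stated assumptions, not the authors' code: PCA is fitted on each training split only, oversampling is implemented by hand rather than with a dedicated library, and the synthetic features merely stand in for the flattened graph-filtered beats.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def random_oversample(X, y, rng):
    """Keep all rows and duplicate minority-class rows up to the majority count."""
    classes, counts = np.unique(y, return_counts=True)
    idx = []
    for c in classes:
        ids = np.flatnonzero(y == c)
        extra = rng.choice(ids, counts.max() - ids.size, replace=True)
        idx.append(np.concatenate([ids, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]


def repeated_holdout(features, y, n_iter=200, test_size=0.3, n_components=50):
    """Mean balanced accuracy over repeated stratified 70/30 splits."""
    rng = np.random.default_rng(0)
    scores = []
    for it in range(n_iter):
        X_tr, X_te, y_tr, y_te = train_test_split(
            features, y, test_size=test_size, stratify=y, random_state=it)
        X_tr, y_tr = random_oversample(X_tr, y_tr, rng)  # training set only
        model = make_pipeline(PCA(n_components=n_components), SVC(kernel="rbf"))
        model.fit(X_tr, y_tr)
        scores.append(balanced_accuracy_score(y_te, model.predict(X_te)))
    return float(np.mean(scores))


# Toy data: 0 = Healthy, 1 = LBBB, 2 = sLBBB, with 12 x 160 = 1920 features
rng = np.random.default_rng(42)
y = np.repeat([0, 1, 2], [60, 40, 50])
features = rng.standard_normal((y.size, 12 * 160)) + 2.0 * y[:, None]
score = repeated_holdout(features, y, n_iter=3)
```

The classifier-weighting alternative mentioned above corresponds to skipping `random_oversample` and passing `class_weight="balanced"` to `SVC`.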

2.4. Pathway 2: Graph-CNN classification and explainability

This pathway involved transforming inter-lead connectivity information into image representations for classification by a Convolutional Neural Network (CNN), starting from the standardized 12-lead representative beats ($X \in \mathbb{R}^{12 \times 160}$, sampled at 200 Hz).

Band-Pass Filtering: Initially, each standardized representative beat underwent zero-phase band-pass filtering using a 4th-order Butterworth filter. This filter, with a passband of $0.05$–$40$ Hz, was applied to mitigate noise, consistent with standard ECG analysis practices.⁵¹ Let the output of this step be termed the filtered representative beat.

Connectivity Matrix Calculation with Tailored Normalization: Adjacency matrices ($A \in \mathbb{R}^{12 \times 12}$), representing inter-lead connectivity, were computed from the filtered representative beats using three distinct methods, with the input signal representation adapted to each:

Pearson Correlation (PC) and Phase Difference (PD): As these methods can be sensitive to amplitude variations, the filtered representative beats first underwent a logarithmic amplitude normalization prior to adjacency matrix calculation. This transformation preserves the sign of each sample while applying the natural logarithm to one plus its absolute value, $\tilde{x} = \operatorname{sign}(x) \log(1 + |x|)$. The Pearson correlation coefficient⁵² or the FFT-based average phase difference was then calculated between pairs of these normalized lead signals to form the adjacency matrix $A_{\mathrm{PC}}$ or $A_{\mathrm{PD}}$, respectively.

Cross-Correlation (CC): To preserve the original waveform morphology, the filtered representative beats were used directly without logarithmic normalization. The maximum value of the standard cross-correlation function between pairs of leads, normalized by signal length, was used to populate the adjacency matrix $A_{\mathrm{CC}}$.
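The three connectivity measures can be sketched as follows. PC and CC follow the descriptions above directly. The text does not define the "FFT-based average phase difference" precisely, so `phase_difference_adjacency` is a hypothetical stand-in: it takes the mean absolute wrapped phase difference over real-FFT bins, which yields a dissimilarity in $[0, \pi]$ rather than a similarity.

```python
import numpy as np


def signed_log(X):
    """Sign-preserving log compression: sign(x) * log(1 + |x|)."""
    return np.sign(X) * np.log1p(np.abs(X))


def pearson_adjacency(X):
    """A_PC[i, j]: Pearson correlation between log-normalized leads i and j."""
    return np.corrcoef(signed_log(X))


def cross_correlation_adjacency(X):
    """A_CC[i, j]: maximum of the full cross-correlation divided by signal length."""
    n_leads, n = X.shape
    A = np.empty((n_leads, n_leads))
    for i in range(n_leads):
        for j in range(n_leads):
            A[i, j] = np.correlate(X[i], X[j], mode="full").max() / n
    return A


def phase_difference_adjacency(X):
    """Hypothetical PD: mean |wrapped phase difference| over rFFT bins of log-normalized leads."""
    F = np.fft.rfft(signed_log(X), axis=-1)
    n_leads = X.shape[0]
    A = np.empty((n_leads, n_leads))
    for i in range(n_leads):
        for j in range(n_leads):
            A[i, j] = np.mean(np.abs(np.angle(F[i] * np.conj(F[j]))))
    return A


X = np.random.default_rng(2).standard_normal((12, 160))  # stand-in filtered beats
A_pc = pearson_adjacency(X)
A_cc = cross_correlation_adjacency(X)
A_pd = phase_difference_adjacency(X)
```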

Graph Representation as Image: The calculated adjacency matrix $A$ (from PC, CC, or PD) was transformed into a single-channel grayscale image suitable for CNN input. To compress variations in connectivity strength, a logarithmic scaling was first applied: $\log(1 + A)$ for non-negative $A$ or, more generally, $\log(1 + |A|)$ when $A$ could contain negative values, with appropriate handling of the sign to maintain interpretability. Min-max normalization then scaled the values to the range [0, 1], and the resulting $12 \times 12$ matrix was resized to $224 \times 224$ pixels using bilinear interpolation.⁵³
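A minimal sketch of the matrix-to-image conversion follows. It assumes that $\log(1 + |A|)$ is applied in all cases, which discards the sign of negative correlations (the authors' sign handling is not specified). It also uses `scipy.ndimage.zoom` with `order=1` (linear spline interpolation, i.e. bilinear in 2-D) as a stand-in for whatever resizing library was actually used.

```python
import numpy as np
from scipy.ndimage import zoom


def adjacency_to_image(A, size=224):
    """Log-scale, min-max normalize and bilinearly resize a 12 x 12 matrix."""
    img = np.log1p(np.abs(A))                      # assumption: sign discarded
    lo, hi = img.min(), img.max()
    img = (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)
    factors = (size / img.shape[0], size / img.shape[1])
    # order=1 gives bilinear interpolation; "nearest" avoids padding with zeros at edges
    return zoom(img, factors, order=1, mode="nearest")


M = np.random.default_rng(3).uniform(-1, 1, size=(12, 12))
A = (M + M.T) / 2
img = adjacency_to_image(A)
```

Because each of the 144 matrix entries becomes a block of roughly $18.7 \times 18.7$ pixels, spatial positions in the image map back to specific lead pairs, which is what later makes the Grad-CAM heatmaps readable in terms of inter-lead interactions.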

CNN Architecture: The CNN architecture employed consisted of an input layer accepting $224 \times 224 \times 1$ grayscale images, followed by five convolutional blocks. Each block comprised a 2D convolutional layer ( $3 \times 3$ kernel, with padding to maintain feature map size, and an increasing number of filters: 32, 64, 128, 256, and 512, respectively for each block), a LeakyReLU activation function (with a negative slope $α = 0.1$ ), and a max-pooling layer ( $2 \times 2$ pool size). After the convolutional blocks, the feature maps were flattened. This was followed by a fully connected dense layer (1024 units, ReLU activation), a dropout layer (rate of 0.4 for regularization⁵⁴), and finally an output dense layer (3 units, corresponding to Healthy, LBBB, sLBBB classes) with softmax activation.
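To make the architecture concrete, the following plain-Python sketch traces feature-map sizes and trainable parameter counts. It assumes "same" padding for the $3 \times 3$ convolutions and $2 \times 2$ max-pooling with stride 2; LeakyReLU and dropout add no parameters. The function name is hypothetical.

```python
def cnn_shapes_and_params(input_hw=224, in_ch=1, filters=(32, 64, 128, 256, 512),
                          dense_units=1024, n_classes=3, kernel=3):
    """Trace feature-map sizes and trainable parameters of the described CNN."""
    hw, ch, params, shapes = input_hw, in_ch, 0, []
    for f in filters:
        params += (kernel * kernel * ch + 1) * f   # conv weights + biases
        ch = f
        hw //= 2                                   # 2x2 max-pooling halves H and W
        shapes.append((hw, hw, ch))
    flat = hw * hw * ch                            # size after flattening
    params += (flat + 1) * dense_units              # 1024-unit dense layer
    params += (dense_units + 1) * n_classes         # softmax output layer
    return shapes, flat, params


shapes, flat, total = cnn_shapes_and_params()
print(shapes[-1], flat, total)  # (7, 7, 512) 25088 27262211
```

Under these assumptions the first dense layer holds about 25.7 million of the roughly 27.3 million parameters (about 94%), which suggests why dropout before the output layer matters for a dataset of a few hundred recordings.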

CNN Training and Evaluation with Cross-Validation: A 5-fold stratified cross-validation strategy was employed for robust model evaluation. Within each fold, the data was split into training and testing sets. The training set was balanced using random oversampling. The CNN model was compiled using the Adam optimization algorithm (learning rate $1 \times 10^{- 4}$ ) and categorical cross-entropy loss. Models were trained for a maximum of 40 epochs with a batch size of 64, using early stopping and learning rate reduction on plateau to optimize training and prevent overfitting. Performance was evaluated on the unseen test set of each fold, and results were aggregated.
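The fold structure, with oversampling confined to each training split, can be sketched independently of the network itself. Here `make_model` is a hypothetical factory returning any object with `fit`/`predict`. A scikit-learn logistic regression is swapped in purely to keep the example runnable; the Keras CNN would need a wrapper that handles one-hot labels, epochs, and callbacks.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import StratifiedKFold


def oversample(X, y, rng):
    """Keep all rows and duplicate minority-class rows up to the majority count."""
    classes, counts = np.unique(y, return_counts=True)
    idx = np.concatenate([
        np.concatenate([ids, rng.choice(ids, counts.max() - ids.size, replace=True)])
        for ids in (np.flatnonzero(y == c) for c in classes)
    ])
    return X[idx], y[idx]


def stratified_cv_with_oversampling(X, y, make_model, n_splits=5, seed=0):
    """Per-fold balanced accuracy; the test fold is never resampled."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    rng = np.random.default_rng(seed)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        X_tr, y_tr = oversample(X[train_idx], y[train_idx], rng)
        model = make_model()
        model.fit(X_tr, y_tr)
        scores.append(balanced_accuracy_score(y[test_idx], model.predict(X[test_idx])))
    return scores


rng = np.random.default_rng(7)
y = np.repeat([0, 1, 2], [60, 40, 50])
X = rng.standard_normal((y.size, 20)) + 3.0 * y[:, None]
scores = stratified_cv_with_oversampling(X, y, lambda: LogisticRegression(max_iter=1000))
```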

Explainability using Grad-CAM: To enhance clinical interpretability, the Gradient-weighted Class Activation Mapping (Grad-CAM) technique³⁷ was applied. Grad-CAM produces heatmaps highlighting image regions (i.e., pairwise lead interactions in the graph image) most influential for a given class prediction.
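The final Grad-CAM combination step is framework-independent and can be sketched in NumPy: channel weights are the spatially averaged gradients of the class score, and the heatmap is the ReLU of the weighted sum of activation maps. This sketch assumes the activations and gradients of the last convolutional layer have already been obtained through the deep learning framework's automatic differentiation; the toy arrays below only illustrate the arithmetic.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Combine (H, W, K) activations and class-score gradients into a heatmap."""
    weights = gradients.mean(axis=(0, 1))                         # alpha_k per channel
    cam = np.maximum((activations * weights).sum(axis=-1), 0.0)   # ReLU(sum_k alpha_k A_k)
    peak = cam.max()
    return cam / peak if peak > 0 else cam                        # scale to [0, 1]


acts = np.zeros((7, 7, 2))
acts[2, 3, 0] = 5.0      # channel 0 fires at one location
acts[..., 1] = 1.0       # channel 1 is uniformly active
grads = np.zeros((7, 7, 2))
grads[..., 0] = 1.0      # channel 0 raises the class score
grads[..., 1] = -0.1     # channel 1 lowers it
heatmap = grad_cam(acts, grads)
```

In practice, the heatmap would be upsampled to $224 \times 224$ and overlaid on the connectivity image, so that bright regions point to the lead pairs that drove the prediction.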

Baseline Spectrogram Model for Comparison: To contextualize the graph-CNN performance, a baseline model using a CNN on single-lead ECG spectrograms was developed. For each subject, the representative beat from Lead I (160 samples at 200 Hz) was transformed into a spectrogram via Short-Time Fourier Transform (STFT), using a Hann window of 64 samples and 50% overlap to provide adequate time-frequency resolution for QRS complex morphology. The resulting spectrogram was resized to $224 \times 224$ pixels and fed into a simplified CNN architecture. This model was trained and evaluated using the same 5-fold cross-validation and balancing strategies as the graph-CNNs.
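The baseline spectrogram input can be sketched with `scipy.signal.stft` using the stated parameters (Hann window, 64 samples, 50% overlap, 200 Hz). Using the STFT magnitude without log scaling, and resizing with `scipy.ndimage.zoom`, are assumptions; the text does not specify these details.

```python
import numpy as np
from scipy.ndimage import zoom
from scipy.signal import stft


def lead_spectrogram_image(beat, fs=200, nperseg=64, size=224):
    """Magnitude STFT (Hann, 64 samples, 50% overlap) resized to size x size."""
    f, t, Z = stft(beat, fs=fs, window="hann", nperseg=nperseg, noverlap=nperseg // 2)
    S = np.abs(Z)                                  # assumption: linear magnitude
    img = zoom(S, (size / S.shape[0], size / S.shape[1]), order=1, mode="nearest")
    return f, t, img


beat = np.random.default_rng(3).standard_normal(160)  # stand-in for a Lead I beat
f, t, img = lead_spectrogram_image(beat)
```

With a 64-sample window at 200 Hz, the frequency resolution is 3.125 Hz across 33 bins, while a 160-sample beat yields only a handful of time frames. Much of the $224 \times 224$ image is therefore interpolated, which is worth keeping in mind when comparing this baseline with the graph images.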

2.5. Model evaluation metrics

The performance of both classification pathways was evaluated on their respective test sets using a comprehensive set of standard metrics suitable for multi-class classification, including: Accuracy, Balanced Accuracy, Precision, Recall (Sensitivity), F1-score, Confusion Matrix, and Area Under the Receiver Operating Characteristic Curve (AUC). Balanced Accuracy was chosen as the primary metric for comparison in Pathway 1 due to its robustness in the presence of class imbalance.

Accuracy: The overall proportion of correctly classified instances.

Balanced Accuracy: The average of recall obtained on each class, providing a better assessment of performance across classes, irrespective of their size. This was the primary metric for Pathway 1 comparison.

Precision, Recall (Sensitivity), F1-score: Calculated per class (Healthy, LBBB, sLBBB) to evaluate performance on each specific category.

Precision: The proportion of true positive predictions among all instances predicted as positive for a class.

Recall (Sensitivity): The proportion of actual positive instances for a class that were correctly identified.

F1-score: The harmonic mean of precision and recall, balancing both metrics.

Confusion Matrix: A table of counts cross-tabulating true classes against predicted classes, from which the true positives, false positives, false negatives, and true negatives of each class can be derived. Aggregated confusion matrices (summed across folds or iterations) were analyzed. The True Positive Rate (TPR, equivalent to Recall) and the False Negative Rate (FNR = 1 - TPR) were also considered.

Area Under the Receiver Operating Characteristic Curve (AUC): Calculated per class using a one-vs-rest (OvR) strategy. It measures the model’s ability to distinguish between classes across various decision thresholds, with a value of 1.0 representing perfect discrimination.

These metrics collectively provide a robust assessment of the models’ classification capabilities and their effectiveness in handling the specific challenges of LBBB detection.
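The metrics listed above map directly onto scikit-learn functions. In the sketch below, per-class AUC is computed one-vs-rest by binarizing the labels for each class; the toy labels and scores are illustrative only.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             confusion_matrix, precision_recall_fscore_support,
                             roc_auc_score)

LABELS = [0, 1, 2]  # Healthy, LBBB, sLBBB


def evaluate(y_true, y_pred, y_score):
    """Multi-class metrics; y_score holds per-class probabilities (n, 3)."""
    cm = confusion_matrix(y_true, y_pred, labels=LABELS)
    tpr = np.diag(cm) / cm.sum(axis=1)            # per-class recall
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=LABELS, zero_division=0)
    auc_ovr = [roc_auc_score((y_true == c).astype(int), y_score[:, c]) for c in LABELS]
    return {"accuracy": accuracy_score(y_true, y_pred),
            "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
            "confusion_matrix": cm, "precision": precision, "recall": recall,
            "f1": f1, "tpr": tpr, "fnr": 1.0 - tpr, "auc_ovr": auc_ovr}


y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2])   # one LBBB case predicted as sLBBB
y_score = np.array([[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.6, 0.3],
                    [0.1, 0.3, 0.6], [0.1, 0.2, 0.7], [0.2, 0.1, 0.7]])
m = evaluate(y_true, y_pred, y_score)
```

In this toy case accuracy is 5/6, while balanced accuracy is (1 + 0.5 + 1)/3 ≈ 0.833. The single LBBB error weighs more in the balanced metric because the LBBB class has only two members, illustrating why balanced accuracy suits the imbalanced cohorts.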

3. Results and discussion

This study investigates two distinct pathways leveraging graph theory to discern spatio-temporal ECG features potentially missed by conventional analyses for the automated detection of LBBB and sLBBB. The overall methodological framework is depicted in Figure 1.

3.1. Pathway 1: Graph-based filtering and machine learning classification

The first pathway employed graph signal processing (GSP) for robust feature extraction from 12-lead ECGs, followed by classification using established machine learning (ML) algorithms. This approach aims to harness the inter-lead relationships to build discriminative feature sets.

Graph Construction and Spectral Basis. For both pathways explored in this work, functional connectivity graphs were constructed to model the interactions between the 12 ECG leads. Three distinct methods were used to compute the weighted adjacency matrices ($A \in \mathbb{R}^{12 \times 12}$): Pearson Correlation (PC), which measures linear correlation; Cross-Correlation (CC), which identifies temporal shifts and morphological similarities; and Phase Difference (PD),⁵⁵ which captures phase synchrony between lead signals.

The graph structures derived from these adjacency matrices are visualized in Figure 2. These graphs depict the ECG leads as nodes and the computed inter-lead connectivity strengths (weights of the adjacency matrix $A$) as weighted edges. For illustrative purposes, connections with weights exceeding a threshold of 0.4 are highlighted in red to emphasize dominant interactions, while weaker connections are shown in gray. This threshold-based highlighting is used solely for visualization in Figure 2 and does not influence the subsequent processing steps: the full, unthresholded adjacency matrices (after setting the diagonal to zero and handling negative values as appropriate for each connectivity method) are used to compute the graph Laplacian in Pathway 1. The considerable structural differences among the graphs generated by PC, CC, and PD confirm that each method offers a distinct perspective on inter-lead dynamics. Compared with approaches that use mutual information for adjacency,³⁵ our chosen methods (PC, CC, PD) specifically target linear correlation, waveform similarity including time lags, and phase relationships, respectively. These aspects are particularly pertinent for characterizing the conduction delays and altered activation sequences inherent in LBBB and sLBBB.

Figure 2.

Graphs Representing ECG Lead Interactions for Different Patient Groups (Healthy, LBBB, sLBBB) and Connectivity Methods (PC, CC, PD). Connections with weights exceeding 0.4 are highlighted in red, general connections in gray. The structure varies significantly based on the group and the chosen connectivity metric.

From these adjacency matrices, the normalized graph Laplacians ( $L = I - D^{- 1 / 2} A D^{- 1 / 2}$ ) were derived. The Laplacian matrix is fundamental in GSP as its eigendecomposition provides the spectral basis (eigenvectors and eigenvalues) for graph-based filtering and analysis.^32,56 Figure 3 illustrates representative Laplacian matrices computed using PC, CC, and PD for subjects from the Healthy, LBBB, and sLBBB cohorts. Visual inspection of these Laplacians reveals distinct patterns across groups and connectivity methods. For instance, the Healthy group often exhibits patterns (e.g., predominantly yellow tones in the provided visualization scheme) suggestive of synchronized and temporally aligned cardiac conduction. Conversely, LBBB and sLBBB groups tend to show patterns (e.g., more green and blue tones) indicative of desynchronization, temporal misalignments, and phase instability, which are characteristic of conduction abnormalities. These qualitative differences underscore that each connectivity metric, as reflected in both the adjacency-based graph structures and their corresponding Laplacians, captures unique aspects of inter-lead electrophysiological relationships, forming a diverse basis for subsequent feature extraction.
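A minimal NumPy sketch of the normalized Laplacian and its eigendecomposition follows. Taking absolute values of the adjacency weights is our assumption for one way to handle negative Pearson correlations (the text does not specify the exact rule). `np.linalg.eigh` returns the eigenvalues of a symmetric matrix in ascending order together with orthonormal eigenvectors, which form the spectral basis $U$.

```python
import numpy as np


def normalized_laplacian(A: np.ndarray) -> np.ndarray:
    """L = I - D^{-1/2} W D^{-1/2}, with W = |A| and a zero diagonal (assumed handling)."""
    W = np.abs(np.asarray(A, dtype=float))
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)                                 # weighted node degrees
    d_inv_sqrt = np.zeros_like(d)
    d_inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])       # isolated nodes keep 0
    return np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]


# Example: a random symmetric 12 x 12 connectivity matrix with values in [-1, 1]
rng = np.random.default_rng(1)
M = rng.uniform(-1, 1, size=(12, 12))
A = (M + M.T) / 2
L = normalized_laplacian(A)
eigvals, U = np.linalg.eigh(L)   # ascending eigenvalues; eigenvectors are the columns of U
```

For non-negative weights the eigenvalues of this Laplacian lie in $[0, 2]$, and a connected graph has exactly one zero eigenvalue, which is why the low-pass step below refers to the smallest non-zero eigenvalues.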

Figure 3.

Laplacian Matrices of ECG Lead Relationships for Healthy, LBBB, and sLBBB Groups. Visual differences (overall hues, and red dots marking strong connections) reflect varying levels of synchronization and temporal alignment captured by the different connectivity methods (Pearson Correlation, Cross-Correlation, Phase Difference).

Graph filtering, a core GSP technique, transforms signals by leveraging the relational structure of the graph. For ECG analysis, this means that the filtering process considers not only the individual lead morphologies but also how these leads are interconnected (Figure 2), as defined by the Laplacian (Figure 3). Filtering operators, designed based on the Laplacian’s spectral properties (eigenvectors and eigenvalues),⁵⁶ can selectively attenuate or amplify signal components based on their variation across the graph. This allows, for example, low-pass graph filtering to emphasize smooth, global patterns and reduce noise, or high-pass filtering to enhance localized variations.
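In the standard spectral formulation, with the eigendecomposition $L = U \Lambda U^{T}$ of the normalized Laplacian, a graph filter is specified by a frequency response $h(\lambda)$ applied to each eigenvalue. An ideal low-pass filter that keeps a selected set of modes $\mathcal{S}$ reduces to the GFT, truncation, and inverse-GFT steps used in this pathway (a high-pass filter would instead keep the modes with large $\lambda_k$):

```latex
\begin{aligned}
L &= U \Lambda U^{T}, \qquad \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_{12}), \\
\tilde{X} &= U\, h(\Lambda)\, U^{T} X = U\, h(\Lambda)\, \hat{X}, \qquad \hat{X} = U^{T} X, \\
h(\lambda_k) &= \begin{cases} 1, & k \in \mathcal{S} \\ 0, & \text{otherwise} \end{cases}
\;\Longrightarrow\;
\tilde{X} = U_{\mathrm{filt}}\, U_{\mathrm{filt}}^{T} X = U_{\mathrm{filt}}\, \hat{X}_{\mathrm{filt}} .
\end{aligned}
```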

GSP-ML Pipeline Implementation and Classification Outcomes. The GSP-ML pipeline for Pathway 1 was implemented in Python, primarily utilizing libraries such as NumPy,⁵² SciPy,⁵⁷ Scikit-learn,⁵⁸ and Imbalanced-learn.⁵⁹ The process for each subject’s 12-lead representative beat ( $X \in \mathbb{R}^{12 \times 160}$ ) involved:

Computing the Graph Fourier Transform (GFT) of $X$ using the eigenvectors of the corresponding Laplacian $L$ , i.e., $\hat{X} = U^{T} X$ .⁵⁶

Applying a low-pass filter in the graph spectral domain by selecting and retaining only the $N_{modes} = 6$ GFT coefficients corresponding to the smallest non-zero eigenvalues of $L$ . This step emphasizes global, smoother patterns of electrical activity across the leads.

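Reconstructing the filtered signal in the original (vertex) domain via the inverse GFT (iGFT) using only the selected components: $X_{\mathrm{filt}} = U_{\mathrm{filt}} \hat{X}_{\mathrm{filt}}$ , where $U_{\mathrm{filt}} \in \mathbb{R}^{12 \times 6}$ holds the selected eigenvectors and $\hat{X}_{\mathrm{filt}} = U_{\mathrm{filt}}^{T} X$ the corresponding GFT coefficients.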

Flattening the resulting $12 \times 160$ filtered signal matrix $X_{\mathrm{filt}}$ into a 1920-dimensional feature vector.

Applying Principal Component Analysis (PCA)⁶⁰ for dimensionality reduction, retaining the top $N_{PCA} = 50$ principal components. These 50 components typically captured approximately 95% of the variance in the GSP-derived features and constituted the final feature set for classification.
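The five steps above can be sketched with NumPy alone. This is an illustrative sketch, not the study's code: the eigenvalue tolerance that decides which eigenvalues count as non-zero is our assumption, PCA is computed via an SVD of the centred feature matrix (the same decomposition that scikit-learn's `PCA` uses), and in a real evaluation we assume PCA would be fitted on the training portion of each split only. The synthetic Laplacians are built with non-negative weights.

```python
import numpy as np


def gsp_lowpass_features(X: np.ndarray, L: np.ndarray, n_modes: int = 6, tol: float = 1e-10) -> np.ndarray:
    """GFT -> keep the n_modes smallest non-zero eigenvalues -> iGFT -> flatten."""
    eigvals, U = np.linalg.eigh(L)                    # ascending eigenvalues
    keep = np.flatnonzero(eigvals > tol)[:n_modes]    # smallest non-zero modes
    U_filt = U[:, keep]                               # 12 x n_modes
    X_hat_filt = U_filt.T @ X                         # retained GFT coefficients
    X_filt = U_filt @ X_hat_filt                      # inverse GFT, 12 x 160
    return X_filt.ravel()                             # 1920-dimensional feature vector


def fit_pca(F: np.ndarray, n_components: int = 50):
    """PCA via SVD of the centred features; returns mean, components, explained-variance ratios."""
    mean = F.mean(axis=0)
    _, S, Vt = np.linalg.svd(F - mean, full_matrices=False)
    ratio = S**2 / np.sum(S**2)
    return mean, Vt[:n_components], ratio[:n_components]


def apply_pca(F: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    return (F - mean) @ components.T


# Synthetic example: 80 subjects, each with its own random non-negative graph
rng = np.random.default_rng(2)
feats = []
for _ in range(80):
    M = rng.uniform(0, 1, size=(12, 12))
    W = (M + M.T) / 2
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(12) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    feats.append(gsp_lowpass_features(rng.standard_normal((12, 160)), L))
F = np.vstack(feats)                                  # 80 x 1920
mean, comps, ratio = fit_pca(F, n_components=50)
Z = apply_pca(F, mean, comps)                         # 80 x 50 reduced features
```

Because the filter is an orthogonal projection onto the retained eigenvectors, applying it twice leaves the signal unchanged. This is a quick sanity check for any implementation of this step.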

The initial cohort consisted of $N_{total} = 792$ recordings. After all preprocessing steps outlined in Section 2.2, including the removal of 6 records that failed critical stages such as the GSP processing, the final dataset used for this pathway comprised 786 valid ECG recordings. This final cohort was distributed as: 300 Healthy (38.17%), 184 LBBB (23.41%), and 302 sLBBB (38.42%). The minor shifts in individual class counts from the initial selection are due to the specific records that were excluded.

For robust performance evaluation, this dataset underwent $N = 200$ iterations of stratified random 70%/30% train/test splitting, with stratification preserving the class proportions in both the training and test sets of every iteration. Four ML classifiers were evaluated: Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel, k-Nearest Neighbors (KNN, with $k = 5$ ), Gaussian Naive Bayes (NB), and Gradient Boosting (GB, implemented via LightGBM⁵⁰). Given the class imbalance in the dataset, three balancing strategies were applied exclusively to the training set of each iteration: random oversampling of minority classes, random undersampling of majority classes, and classifier-inherent class weighting (class_weight='balanced' for SVM and GB). Because KNN and NB offer no such weighting option, the third strategy amounts, for these two classifiers, to training without explicit over- or undersampling and is labeled 'no_balance'.
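One iteration of this protocol can be sketched as below. The split draws `round(0.3 * n_c)` test subjects per class, giving 90 Healthy, 55 LBBB, and 91 sLBBB (236 per split), consistent with the row totals of Table 2 divided by 200. The over- and undersampling helpers mimic the default behaviour of imbalanced-learn's random samplers, and the weights follow scikit-learn's documented `class_weight='balanced'` heuristic, `n_samples / (n_classes * bincount(y))`. All function names are hypothetical, and classifier fitting is omitted.

```python
import numpy as np


def stratified_split(y, test_frac, rng):
    """Random split that draws round(test_frac * n_c) test subjects from every class c."""
    test_parts = []
    for c in np.unique(y):
        members = rng.permutation(np.flatnonzero(y == c))
        test_parts.append(members[: int(round(test_frac * len(members)))])
    test = np.sort(np.concatenate(test_parts))
    train = np.setdiff1d(np.arange(len(y)), test)
    return train, test


def random_oversample(idx, y, rng):
    """Keep all training indices and add duplicates of minority-class indices up to the majority count."""
    classes, counts = np.unique(y[idx], return_counts=True)
    parts = []
    for c in classes:
        members = idx[y[idx] == c]
        extra = rng.choice(members, size=counts.max() - len(members), replace=True)
        parts.append(np.concatenate([members, extra]))
    return np.concatenate(parts)


def random_undersample(idx, y, rng):
    """Draw, without replacement, as many indices per class as the smallest class has."""
    classes, counts = np.unique(y[idx], return_counts=True)
    return np.concatenate([rng.choice(idx[y[idx] == c], size=counts.min(), replace=False) for c in classes])


def balanced_class_weights(y_train):
    """scikit-learn's 'balanced' heuristic: n_samples / (n_classes * count_per_class)."""
    counts = np.bincount(y_train)
    return len(y_train) / (len(counts) * counts)


# One iteration on the Pathway 1 cohort: 0 = Healthy, 1 = LBBB, 2 = sLBBB
y = np.repeat([0, 1, 2], [300, 184, 302])
rng = np.random.default_rng(42)
train, test = stratified_split(y, 0.3, rng)
os_idx = random_oversample(train, y, rng)     # training indices after oversampling
us_idx = random_undersample(train, y, rng)    # training indices after undersampling
weights = balanced_class_weights(y[train])    # per-class weights for class weighting
```

Only `train` is ever rebalanced; the test indices keep the original class distribution, which is what makes balanced accuracy on `test` meaningful.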

Table 1 summarizes the classification performance, reporting the Mean Balanced Accuracy (BalAcc)⁶¹ and its Standard Deviation (SD) over the 200 iterations. BalAcc was chosen as the primary performance metric due to its suitability for imbalanced datasets. Each entry in the table reflects the performance achieved using the best-performing balancing strategy for that specific combination of connectivity method and classifier. The overall best-performing combination is Phase Difference with SVM and class weighting.

Table 1.

Classification results for pathway 1 (GSP-ML).

Conn. Method	Clsf.	Mean BalAcc	SD	Best Bal. Strat.
Pearson Corr.	SVM	0.8000	0.0226	CW
	KNN	0.7682	0.0246	OS
	GB	0.7691	0.0249	CW
	NB	0.7253	0.0270	NoBal
Cross Corr.	SVM	0.8220	0.0231	CW
	KNN	0.7714	0.0249	OS
	GB	0.7777	0.0221	OS
	NB	0.7200	0.0283	NoBal
Phase Diff.	SVM	0.8317	0.0214	CW
	KNN	0.7855	0.0248	OS
	GB	0.7917	0.0226	CW
	NB	0.7461	0.0279	NoBal

Balanced accuracy (mean $\pm$ SD) is reported for each connectivity method and classifier, using its best-performing balancing strategy over 200 iterations.

Abbreviations: SVM: Support Vector Machine; KNN: K-Nearest Neighbors; GB: Gradient Boosting; NB: Naive Bayes; CW: class_weight; OS: oversampling; NoBal (Strategy): no_balance.

3.2. Discussion of GSP-ML pathway results

The classification results obtained via the GSP-ML pathway (Table 1) demonstrate the utility of integrating graph theory with machine learning for automated LBBB and sLBBB detection from 12-lead ECGs.³² The performance metrics provide several key insights.

Impact of Connectivity Metrics. A notable finding is the differential performance associated with the choice of connectivity metric, a distinction also visually suggested by the diverse graph structures in Figures 2 and 3. Graphs derived from Phase Difference (PD) and Cross-Correlation (CC) consistently outperformed those based on Pearson Correlation (PC). The highest overall Mean Balanced Accuracy of $0.8317 \pm 0.0214$ was achieved using PD-derived graph features classified by an SVM. CC-based features also yielded strong performance (Mean BalAcc $0.8220 \pm 0.0231$ with SVM), while the best PC-based result was slightly lower (Mean BalAcc $0.8000 \pm 0.0226$ with SVM).

This suggests that metrics capturing temporal dynamics and synchrony—such as phase relationships (PD) or time-lagged similarities (CC)—are more effective in distinguishing LBBB and sLBBB conduction patterns than simpler linear correlation (PC). These observations are consistent with the pathophysiology of LBBB, which involves significant electrical dyssynchrony and altered ventricular activation sequences,^3,45 potentially leading to mechanical dyssynchrony.⁷

Classifier Performance Comparison. Among the ML algorithms evaluated on the GSP-PCA features, SVM with an RBF kernel consistently delivered the highest Balanced Accuracy across all three connectivity types. This underscores SVM’s capability in handling complex, potentially high-dimensional feature spaces and defining effective non-linear decision boundaries.⁵⁸ Gradient Boosting (GB), an ensemble learning technique known for its robustness,^25,50 was generally the second most effective classifier. In contrast, KNN and Gaussian NB yielded comparatively lower performance for this specific task and feature set.

Importance of Handling Class Imbalance. The investigation into balancing strategies confirmed their critical role in achieving optimal performance with imbalanced datasets.⁵⁹ For SVM and GB, applying some form of balancing, particularly the class_weight=’balanced’ option, generally led to improved Mean Balanced Accuracy compared to training on the imbalanced data directly. This strategy was part of the top-performing models (Table 1). Recent advancements in handling imbalanced multi-label classification also offer potential avenues for exploration.²³

Persistent Challenge: LBBB versus sLBBB Discrimination. Despite the promising overall accuracies, distinguishing between the LBBB and sLBBB classes remains the most significant challenge for this pathway. Figure 4 displays typical ECG morphologies for Healthy, LBBB, and sLBBB subjects in leads V4, V5, and V6. While the Healthy group shows a normal QRS duration (<120 ms, highlighted by a red circle), both LBBB and sLBBB exhibit a prolonged QRS duration (>120 ms). Critical differentiating features, such as QRS notching (marked by red circles), are subtle and vary in presentation, as reflected in criteria such as those of Strauss et al.⁹

Figure 4.

Temporal and Morphological Analysis of Electrocardiographic Signals in Healthy, LBBB, and sLBBB Groups. Examples shown for leads V4, V5, and V6. Red circles highlight key features: normal QRS duration (<120 ms) in Healthy vs. prolonged QRS (>120 ms) in LBBB/sLBBB, and characteristic QRS notches in LBBB/sLBBB.

The graph-based approach aims to capture these complex inter-lead relationships (Figure 2) to better detect conduction delays and accentuate morphological differences by analyzing synchrony.

However, for the best-performing model in this pathway (PD + SVM + class_weight=’balanced’), further analysis of the detailed per-class metrics reveals a mean F1-score for the LBBB class of $0.6822$ . This is considerably lower than the F1-score for the Healthy class ( $0.9881$ ) and the sLBBB class ( $0.8299$ ). Similarly, the mean Area Under the ROC Curve (AUC) for the LBBB class ( $0.8959$ ) was lower than for Healthy ( $0.9987$ ) and sLBBB ( $0.9332$ ). This difficulty is quantitatively detailed in the aggregated confusion matrix from the 200 test splits, presented in Table 2.

Table 2.

Aggregated confusion matrix for the top-performing model in pathway 1 (PD-SVM).

	Pred. Healthy	Pred. LBBB	Pred. sLBBB
True Healthy	17926	74	0
True LBBB	354	7210	3436
True sLBBB	5	2841	15354

Rows represent true labels and columns predicted labels over 200 test splits.

The matrix clearly shows a higher rate of confusion between LBBB and sLBBB (e.g., 3436 LBBB instances misclassified as sLBBB, and 2841 sLBBB instances misclassified as LBBB). This difficulty likely arises from the inherent morphological similarities between LBBB and sLBBB (Figure 4), suggesting that the GSP-PCA features, while informative, may not fully encapsulate all subtle differentiating characteristics defined by strict criteria.⁹
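The per-class figures can be cross-checked against Table 2 with a few lines of NumPy. Because every split contains the same number of test subjects per class (the row totals are 200 × 90, 200 × 55, and 200 × 91), pooled per-class recall equals the mean per-split recall, so the pooled balanced accuracy reproduces the reported 0.8317. Pooled F1-scores (about 0.988, 0.683, and 0.830) only approximate the reported per-split means, because F1 is not linear in the counts.

```python
import numpy as np

# Aggregated confusion matrix from Table 2 (rows: true labels, columns: predicted labels)
classes = ["Healthy", "LBBB", "sLBBB"]
cm = np.array([
    [17926,   74,     0],
    [  354, 7210,  3436],
    [    5, 2841, 15354],
])

recall = np.diag(cm) / cm.sum(axis=1)        # per-class sensitivity
precision = np.diag(cm) / cm.sum(axis=0)     # per-class positive predictive value
f1 = 2 * precision * recall / (precision + recall)
bal_acc = recall.mean()                      # balanced accuracy = mean per-class recall

for name, r, p, f in zip(classes, recall, precision, f1):
    print(f"{name:8s} recall={r:.4f} precision={p:.4f} F1={f:.4f}")
print(f"Balanced accuracy (pooled) = {bal_acc:.4f}")
```

The LBBB row shows the asymmetry discussed above: its recall (7210 / 11000) is the lowest of the three classes, and most of its errors fall into the sLBBB column.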

Furthermore, the heterogeneity in data sources (Healthy controls from a general ECG database versus LBBB/sLBBB patients from the MADIT-CRT trial who had heart failure¹⁰) is a potential confounding factor. The model might inadvertently learn population-level differences beyond pure LBBB electrophysiology, which could affect its generalization to other LBBB populations. Accurate LBBB vs. sLBBB differentiation is clinically crucial for optimizing Cardiac Resynchronization Therapy (CRT) patient selection.⁶²

Robustness and Comparative Significance. The cross-validation strategy involving 200 stratified train-test splits indicates good robustness for the top-performing models, as evidenced by the relatively small standard deviations in Balanced Accuracy (e.g., $\pm 0.0214$ for the best model, Table 1).

The achieved Balanced Accuracy of up to $0.8317$ demonstrates the potential of the GSP-ML pipeline for differentiating these ECG categories. While the distinction from the Healthy class benefits overall accuracy, this aspect should be interpreted in light of the aforementioned dataset differences. Nevertheless, this performance represents a substantial improvement over simpler heuristic baselines relying solely on QRS duration or basic morphology criteria.⁹ However, a comprehensive assessment of this pathway’s true standing requires direct benchmarking against a wider range of contemporary state-of-the-art LBBB/sLBBB detection algorithms reported in the literature.^12,63

Contribution of GSP Feature Extraction. This pathway effectively demonstrates the benefit of explicitly modeling inter-lead electrophysiological relationships as graph structures.³² The application of GSP-based filtering, particularly with connectivity defined by PD and CC, appears successful in extracting physiologically relevant features reflecting cardiac synchrony and dyssynchrony. The low-pass graph filtering (retaining $N_{modes} = 6$ components) likely emphasizes global activation patterns and coherent variations across leads, rather than focusing on purely local, single-lead features.⁵⁶

Limitations of the GSP-ML Pathway. While promising, this pathway has several limitations. The primary one, as discussed, is the residual difficulty in robustly discriminating LBBB from sLBBB. Secondly, the features generated through GFT followed by PCA, while effective, offer limited direct clinical interpretability; understanding precisely which electrophysiological phenomena are captured by the top principal components can be challenging. Thirdly, while the 200-split validation demonstrates robustness on the current dataset, generalization to entirely new, unseen datasets from different clinical settings or demographics requires further investigation. The aforementioned dataset heterogeneity is a key aspect of this limitation.

Concluding Remarks and Future Directions for Pathway 1. The GSP-ML pathway proves to be a viable and robust method for automated LBBB/sLBBB classification, achieving a Mean Balanced Accuracy of up to $0.8317$ . The results highlight the superiority of connectivity measures that capture temporal and phase dynamics (PD, CC) and the effectiveness of SVM classifiers for this task. Future research for this pathway should focus on several areas:

Enhanced Feature Engineering: Exploring combinations of features derived from different connectivity methods (PC, CC, PD) simultaneously, or incorporating other graph-theoretical measures.

Advanced Classifiers: Benchmarking against and potentially integrating more recent and sophisticated supervised machine learning algorithms. This includes exploring Neural Dynamic Classification,²⁴ Dynamic Ensemble Learning algorithms,²⁵ Finite Element Machines for fast learning,²⁶ and techniques leveraging self-supervised learning.^27,28

Interpretability: Investigating methods to improve the interpretability of GSP-derived features.

Comparative Studies: Conducting direct comparisons with a broader range of existing LBBB/sLBBB detection algorithms, including those based on different signal processing or machine learning paradigms.¹²

Alternative Graph Representations and GNNs: While Pathway 2 explores CNNs on graph images, future extensions of Pathway 1 could directly employ Graph Neural Networks (GNNs) on the constructed graphs,^21,33,34 which are inherently designed to learn from graph-structured data. Alternative ECG representations, such as those explored by Zhou et al.²² or Alonso et al.,¹⁹ could also be considered in a graph context.

Addressing these areas could further enhance classification accuracy, particularly for the challenging LBBB vs. sLBBB discrimination, and improve the clinical applicability of GSP-based ECG analysis.

3.3. Pathway 2: Graph-CNN classification and explainability

The second pathway investigated in this study involved transforming the inter-lead functional connectivity information into 2D image representations, which were subsequently classified using a Convolutional Neural Network (CNN). This approach aimed to harness the potent hierarchical feature extraction capabilities of CNNs for graph-derived images⁶⁴ and to enhance model transparency through established explainability techniques. The dataset specifically utilized for this pathway, after ensuring valid adjacency matrix generation for all subjects across the three connectivity methods (Pearson Correlation, Cross-Correlation, and Phase Difference), comprised 786 recordings (Healthy: 300, LBBB: 184, sLBBB: 302). The performance of CNNs operating on these graph-derived images was evaluated against a baseline CNN that processed single-lead spectrogram images, representing a direct ECG segment-to-image conversion strategy.

Table 3.
Classification results for pathway 2 (Graph-CNNs and baseline spectrogram CNN).

Metric	Pearson	Cross-Corr.	PhaseDiff	Spectrogram
BalAcc	0.76	0.76	0.72	0.74
Acc	0.79	0.79	0.75	0.78
F1-H	0.97	0.97	0.93	0.95
F1-LBBB	0.56	0.57	0.53	0.48
F1-sLBBB	0.74	0.74	0.70	0.77
AUC-OvR	0.91	0.91	0.89	0.90
SD-BalAcc	0.03	0.05	0.04	0.03
SD-Acc	0.03	0.05	0.04	0.03

Metrics are reported as means rounded to two decimals. Among the graph-based CNNs, the Cross-Correlation model attains the highest BalAcc (0.7646, versus 0.7616 for Pearson Correlation; both round to 0.76).

Abbreviations: BalAcc: Balanced Accuracy; Acc: Accuracy; F1-H: F1-score Healthy; F1-LBBB: F1-score LBBB; F1-sLBBB: F1-score sLBBB; AUC-OvR: macro-averaged AUC (One-vs-Rest); SD: Standard Deviation.

Image Representation of Graph Connectivity. A critical step in this pathway was the conversion of the $12 \times 12$ weighted adjacency matrix for each subject ( $A_{P C}$ , $A_{C C}$ , or $A_{P D}$ ), derived from appropriately preprocessed 12-lead representative beats (using log-normalized signals for PC/PD and band-pass filtered signals for CC, as detailed in the Connectivity Matrix Calculation paragraph within Section 2.4), into a single-channel grayscale image. This transformation was performed as follows:

Logarithmic Scaling: The transformation $A^{'} = \log (1 + A)$ was applied to each adjacency matrix. This is a common technique to manage skewed data, stabilize variance, and reduce the disproportionate impact of very high connectivity values, thereby making relative differences in connectivity strengths more apparent before subsequent normalization for image generation.³²

Min-Max Normalization: The logarithmically scaled matrix $A^{'}$ was then normalized to the range [0, 1]. This is a standard preprocessing step for image data, ensuring that input pixel values to the CNN are within a consistent and well-behaved range.

Resizing: The resultant $12 \times 12$ normalized matrix was resized to $224 \times 224$ pixels using bilinear interpolation (using cv2.INTER_LINEAR, as specified in the Graph Representation as Image paragraph within Section 2.4). Bilinear interpolation was selected as it provides a smoother image output compared to nearest-neighbor methods and represents a computationally efficient compromise for increasing spatial resolution for standard CNN input dimensions.
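The three steps above can be sketched as follows. If OpenCV is installed, `cv2.resize(A_norm, (224, 224), interpolation=cv2.INTER_LINEAR)` is the call the text refers to; the NumPy function below is a dependency-free stand-in that follows the same pixel-centre convention but may differ from OpenCV's fixed-point arithmetic in the last decimals. The helper names are hypothetical, and `log1p` requires all weights to exceed -1.

```python
import numpy as np


def resize_bilinear(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinear resize with the half-pixel-centre mapping used by cv2.INTER_LINEAR (edges clamped)."""
    in_h, in_w = img.shape
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, in_h - 1), np.minimum(x0 + 1, in_w - 1)
    wy, wx = (ys - y0)[:, None], (xs - x0)[None, :]
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bottom = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bottom * wy


def adjacency_to_image(A: np.ndarray, size: int = 224) -> np.ndarray:
    """log(1 + A) -> min-max to [0, 1] -> bilinear upsampling to size x size."""
    A_log = np.log1p(A)                               # requires A > -1
    span = A_log.max() - A_log.min()
    A_norm = (A_log - A_log.min()) / span if span > 0 else np.zeros_like(A_log)
    return resize_bilinear(A_norm, size, size)


# Example with a synthetic symmetric 12 x 12 adjacency matrix
rng = np.random.default_rng(3)
M = rng.uniform(0, 1, size=(12, 12))
A = (M + M.T) / 2
np.fill_diagonal(A, 0.0)
img = adjacency_to_image(A)                           # 224 x 224 grayscale image
```

Each output pixel is a convex combination of neighbouring matrix entries, so the image stays in [0, 1] and remains symmetric, but it also blends adjacent lead pairs. This is the interpolation effect noted among the limitations of this pathway.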

It is important to note that, as in Pathway 1 (where thresholding was used only to visualize the graphs), no explicit value-based thresholding was applied to the adjacency matrices in Pathway 2 prior to their conversion into images. All calculated connectivity values (after the described scaling and normalization) were preserved to enable the CNN to learn potentially complex patterns from the full spectrum of inter-lead relationships. The specific parameters for these image generation steps (logarithmic scaling, min-max normalization, bilinear interpolation) were chosen based on common practices in signal-to-image conversion for deep learning and on initial exploratory analyses, rather than on an exhaustive optimization of the image generation process itself, which could be an avenue for future refinement.

Performance of Graph-CNN Models and Baseline. The classification efficacy of Pathway 2 was rigorously evaluated using a 5-fold stratified cross-validation strategy. The detailed performance metrics for the CNNs operating on images derived from Pearson Correlation (PC), Cross-Correlation (CC), and Phase Difference (PD) adjacency matrices, alongside the baseline spectrogram CNN, are presented in Table 3. The CNN architecture employed for the graph-image classifiers is detailed in Section 2.4 (under the CNN Architecture paragraph).
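Stratified 5-fold assignment for the 786 subjects can be sketched without extra dependencies. scikit-learn's `StratifiedKFold` provides the same service; the round-robin dealing below is an illustrative assumption, not necessarily how the folds in this study were formed. As the text states for this pathway, oversampling is applied inside each fold to the training indices only.

```python
import numpy as np


def stratified_kfold_labels(y: np.ndarray, n_splits: int = 5, seed: int = 0) -> np.ndarray:
    """Give each subject a fold id; within each class, shuffled subjects are dealt out round-robin."""
    rng = np.random.default_rng(seed)
    fold = np.empty(len(y), dtype=int)
    for c in np.unique(y):
        members = rng.permutation(np.flatnonzero(y == c))
        fold[members] = np.arange(len(members)) % n_splits
    return fold


y = np.repeat([0, 1, 2], [300, 184, 302])    # 0 = Healthy, 1 = LBBB, 2 = sLBBB
fold = stratified_kfold_labels(y)
test_counts = []
for k in range(5):
    train, test = np.flatnonzero(fold != k), np.flatnonzero(fold == k)
    # Oversampling, CNN training and early stopping would use `train` only; `test` is held out.
    test_counts.append(np.bincount(y[test], minlength=3))
test_counts = np.array(test_counts)           # 5 x 3 class counts per test fold
```

Every subject is tested exactly once, so the per-fold confusion matrices can be summed into a single aggregated matrix over all 786 subjects.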

Among the CNNs leveraging graph-derived images, the model utilizing images from Cross-Correlation (CC) adjacency matrices demonstrated the highest Mean Balanced Accuracy ( $0.7646 \pm 0.0475$ ). This performance was marginally superior to the Pearson Correlation (PC) based CNN ( $0.7616 \pm 0.0261$ ). The Phase Difference (PD) based CNN exhibited a comparatively lower Mean Balanced Accuracy ( $0.7233 \pm 0.0360$ ).

The baseline model, which employed a simpler CNN architecture on spectrograms generated from a single ECG lead (Lead I), achieved a Mean Balanced Accuracy of $0.7358 \pm 0.0285$ . This baseline is representative of fiducial-free, direct ECG segment-to-2D image conversion methods, an increasingly explored strategy in automated ECG analysis that avoids reliance on precise wave delineation and has shown promise in various studies.⁶⁵ The graph-CNNs, particularly those based on CC and PC, showed competitive, and in terms of balanced accuracy, slightly superior performance compared to this specific single-lead spectrogram baseline, indicating that the encoded inter-lead relationship patterns within the graph-derived images provide valuable discriminative information.

Analysis of Classification Performance and Comparison with Pathway 1. The best-performing graph-CNN model (Cross-Correlation CNN) yielded a Mean Accuracy of $0.7938 \pm 0.0485$ . While this model identified the Healthy class with high efficacy (F1-score: $0.9721$ ), the accurate differentiation between LBBB and sLBBB classes remained the most significant challenge, as reflected by their respective F1-scores of $0.5742$ for LBBB and $0.7383$ for sLBBB. This difficulty is further detailed by the aggregated confusion matrix for the Cross-Correlation CNN model presented in Figure 5.

Figure 5.

Performance of the Cross-Correlation CNN model detailed via an aggregated confusion matrix from 5-fold cross-validation (total N=786 subjects: 300 Healthy, 184 LBBB, 302 sLBBB). The matrix displays absolute counts for true versus predicted labels. Corresponding True Positive Rates (TPR) and False Negative Rates (FNR) for each class are also presented graphically.

The confusion matrix highlights considerable overlap between the LBBB and sLBBB classes. Specifically, 76 out of 184 LBBB instances (summed across the 5 test folds) were misclassified as sLBBB, while 69 out of 302 sLBBB instances were misclassified as LBBB. This underscores that the subtle distinctions underpinning the strict LBBB criteria⁹ are not fully resolved by the CNN when operating on these image-based representations of graph connectivity.

When juxtaposing the outcomes of Pathway 2 with those of Pathway 1 (GSP-ML), the latter’s optimal model (PD + SVM) achieved a notably higher Mean Balanced Accuracy ( $0.8317 \pm 0.0214$ ) compared to the best Graph-CNN model from Pathway 2 (CC-CNN, $0.7646 \pm 0.0475$ ). This disparity suggests that, for the current dataset and chosen model configurations, the features extracted via GSP and classified by a conventional SVM were more effective in overall class discrimination. Nevertheless, Pathway 2 offers a distinct methodological paradigm, particularly concerning the potential for visual interpretation of the CNN’s learned features.

Explainability using Grad-CAM. To address the critical need for model interpretability in clinical AI applications, Gradient-weighted Class Activation Mapping (Grad-CAM)³⁷ was implemented. Grad-CAM generates heatmaps that identify the regions of an input image (here, the specific inter-lead connectivities within the $12 \times 12$ graph image) that were most influential in the CNN's decision for a given class. Figure 6 illustrates representative Grad-CAM heatmaps, rendered with a JET colormap, for correctly classified examples from the Healthy, LBBB, and sLBBB groups across the three graph connectivity methods. Saliency mapping techniques of this kind are increasingly recognized as valuable for introspecting deep learning models in medical image analysis and related fields.²⁰
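For reference, Grad-CAM weights each feature map $F^{k}$ of a chosen convolutional layer by the spatially averaged gradient of the class score $y^{c}$, $\alpha_{k}^{c} = \frac{1}{Z}\sum_{i,j} \partial y^{c} / \partial F^{k}_{ij}$, and forms the map $\mathrm{ReLU}(\sum_{k} \alpha_{k}^{c} F^{k})$. We write $F^{k}$ rather than the customary $A^{k}$ to avoid a clash with the adjacency matrix $A$. The framework-agnostic NumPy sketch below implements only this combination step. It assumes that the activations and gradients are extracted elsewhere (e.g., with the training framework's automatic differentiation) and that the final upsampling to $224 \times 224$ is done separately.

```python
import numpy as np


def grad_cam(feature_maps: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Combine (K, h, w) activations and their gradients w.r.t. one class score into a heatmap in [0, 1]."""
    alpha = gradients.mean(axis=(1, 2))                               # pooled gradients, shape (K,)
    cam = np.maximum(np.tensordot(alpha, feature_maps, axes=1), 0.0)  # ReLU(sum_k alpha_k F^k)
    peak = cam.max()
    return cam / peak if peak > 0 else cam                            # scale to [0, 1] for display


# Toy example with two 2 x 2 feature maps
maps = np.stack([np.ones((2, 2)), np.array([[1.0, 0.0], [0.0, 0.0]])])
grads = np.stack([np.ones((2, 2)), -np.ones((2, 2))])   # pooled weights alpha = [1, -1]
heatmap = grad_cam(maps, grads)
```

The ReLU keeps only evidence in favour of the target class. This is why diffuse, low-contrast maps (as reported for Healthy subjects) can arise when no single connectivity region dominates the class score.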

Figure 6.

Grad-CAM heatmaps of CNN activation zones. Columns define subject class (Healthy, LBBB, sLBBB); rows define the connectivity metric (Pearson Correlation, Cross-Correlation, or Phase Difference) for the input graph image. Displayed examples are correctly classified instances with high prediction confidence. JET colormap (red indicates high importance, blue low) on $12 \times 12$ inter-lead connectivity images (axes: standard ECG leads I-V6).

Qualitatively, these heatmaps visually indicate which pairwise lead interactions the CNN focused on. For Healthy subjects, the activations across all methods are generally diffuse, possibly reflecting the overall coherence of normal cardiac conduction without strong localized points of discriminative stress for the CNN. However, for LBBB and sLBBB classifications, more distinct patterns emerge (Figure 6).

Specifically, images derived from Pearson Correlation and Phase Difference tend to elicit CNN activations that emphasize widespread, global connections, particularly highlighting interactions between frontal plane (limb) leads and precordial leads, as well as strong inter-limb lead activations. Phase Difference, consistent with its sensitivity to temporal phase shifts, generally shows broader and more intense activation patterns across these connections compared to Pearson Correlation, potentially capturing more extensive alterations in electrical synchrony. In contrast, images from Cross-Correlation appear to guide the CNN to focus on patterns that include strong inter-precordial lead relationships (e.g., within V1-V3 or V4-V6, and between these groups), alongside significant interactions between these precordial leads (especially V4-V6) and lateral limb leads like I and aVL. This might reflect the CNN leveraging CC’s strength in identifying morphological similarities and temporal shifts, which are key in localizing altered ventricular activation sequences in LBBB and sLBBB.⁴⁵ These observed differences in activation foci across connectivity metrics underscore that each method provides a unique electrophysiological perspective that the CNN learns to utilize for discrimination. This visual feedback, while requiring careful interpretation, is a valuable step towards demystifying the CNN’s decision process for this complex classification task.

Discussion of Pathway 2 Strengths, Limitations, and Broader Context. Pathway 2's methodology of transforming graph structures into an image domain for CNN-based classification represents an innovative strategy. A primary strength of this approach is its inherent potential for direct visual interpretability of model decisions through techniques like Grad-CAM, thereby addressing the crucial requirement for transparency in AI models intended for clinical settings. The adoption of a 5-fold stratified cross-validation scheme for the 786 subjects (Healthy: 300, LBBB: 184, sLBBB: 302) provides an initial measure of the models' robustness and generalization capability on the available data. Random oversampling of the training set within each fold was used to counter class imbalance, and regularization methods (Dropout, and Early Stopping with restoration of the best weights) were implemented to mitigate the risk of overfitting, a common challenge in training complex neural networks, especially on moderate-sized datasets.¹⁸

Despite these strengths, Pathway 2 exhibits several limitations. Firstly, the overall classification performance, particularly in the challenging task of distinguishing LBBB from sLBBB, did not surpass that achieved by Pathway 1. This suggests that the conversion to an image format, while enabling the use of CNNs, might lead to a loss of some nuanced graph topological information that GSP-ML could exploit, or that the chosen CNN architecture could be further optimized for this specific type of image data. Secondly, the transformation of a compact $12 \times 12$ adjacency matrix into a much larger $224 \times 224$ image via bilinear interpolation, while standard for many CNN inputs, could potentially introduce interpolated artifacts or obscure very fine-grained structural details. The performance of the CNN is also intrinsically linked to the discriminative quality of the input graph images; if the underlying connectivity metric does not effectively capture class-specific patterns, the subsequent CNN classification will invariably be constrained.

Furthermore, as with Pathway 1, the heterogeneity of the data sources (Healthy controls from a general ECG database versus LBBB/sLBBB patients with pre-existing heart failure from the MADIT-CRT trial¹⁰) constitutes a potential confounding factor that could impact the models’ generalization to different patient populations. Although class imbalance was addressed during the training phase through oversampling and class weighting strategies, its residual impact on multi-class performance, especially when dealing with subtly different subclasses like LBBB and sLBBB, warrants continuous consideration. Advanced techniques, such as those employing fuzzy deep neural networks for imbalanced multi-label classification problems, might offer further improvements in such scenarios.²³

A comprehensive comparative analysis against a broader spectrum of state-of-the-art LBBB/sLBBB detection methodologies, particularly those employing deep learning, is still an essential future step. Our baseline (single-lead spectrogram CNN) offers one point of reference, aligning with fiducial-free ECG imaging approaches.⁶⁵ The observation that graph-CNNs (notably CC-CNN and PC-CNN) achieved competitive or slightly improved balanced accuracy over this specific baseline suggests the added value of incorporating multi-lead inter-relationships, even when transformed into an image.

Concluding Remarks and Future Directions for Pathway 2. The Graph-CNN pathway (Pathway 2) successfully demonstrated the feasibility of classifying LBBB and its strict variant by translating inter-lead cardiac electrophysiological connectivity into an image format suitable for CNNs, achieving a peak Mean Balanced Accuracy of $0.7646$ with Cross-Correlation derived images. Although this performance was quantitatively lower than the GSP-ML approach of Pathway 1, Pathway 2 offers significant advantages in terms of providing visual insights into model decision-making via Grad-CAM, thereby enhancing interpretability. The precise differentiation between LBBB and sLBBB remains the primary area for improvement. Future research avenues for this pathway include:

Advanced CNN Architectures and Training Strategies: Investigating more sophisticated CNN architectures, potentially incorporating attention mechanisms or transformer-based models adapted for image data. Furthermore, Self-Supervised Learning (SSL) techniques could be explored for pre-training the CNN component on larger, unlabeled ECG datasets, which may lead to more robust and generalizable feature representations.²⁷,²⁸

Alternative Graph and Signal-to-Image Mappings: Exploring other ways of transforming graph-derived connectivity information or multi-lead ECG signals into image representations. These could include alternative normalization and scaling strategies for adjacency matrices, interpolation algorithms for image resizing that better preserve salient features, or direct signal-to-image conversion techniques. Examples of the latter include constructing spatio-temporal image matrices by concatenating the 12 lead segments (yielding dimensions of $12 \times N_{\text{samples}}$) prior to resizing, or using methods such as Recurrence Plots to generate textural images that encode the complex temporal dynamics of the signals.

Direct Graph Neural Network (GNN) Application: An important research direction, consistent with advances in graph machine learning, is the direct application of Graph Neural Networks (GNNs) to the constructed functional connectivity graphs (PC, CC, PD).²¹,³⁴,³⁵ This approach would remove the image conversion step, potentially better preserving graph topological information and allowing the use of architectures designed specifically for graph-structured data.

Enhanced Explainability Techniques: Augmenting Grad-CAM with other explainability methods, such as SHAP (SHapley Additive exPlanations),³⁸ could provide more granular, feature-level (or “pixel-level” for the graph images) importance scores, offering deeper insights into the model’s behavior.

Robust Validation and Comprehensive Benchmarking: Conducting rigorous testing on larger, diverse, and fully external datasets is imperative to ascertain the true generalizability of the models. Additionally, systematic benchmarking against a wider array of contemporary LBBB/sLBBB detection algorithms reported in the literature is necessary.

Hybrid Model Development: Investigating the potential of fusing features learned via the GSP-ML pathway (Pathway 1) with those extracted by the Graph-CNN pathway (Pathway 2) to create synergistic hybrid models that might offer enhanced classification accuracy and robustness.
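The hybrid fusion described in the last of these directions could take a feature-level (early) or decision-level (late) form. The sketch below is illustrative rather than a design from the study: the GSP feature vector and the CNN penultimate-layer embedding are random placeholder arrays of hypothetical widths (30 and 64), each block is z-scored with training statistics and divided by the square root of its width so that the wider block does not dominate distance-based classifiers such as the SVM, and the late-fusion weight `alpha` is a hypothetical parameter.

```python
import numpy as np

def fit_zscore(X):
    """Per-feature mean and standard deviation estimated on training data only."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0                       # guard against constant features
    return mu, sd

def early_fusion(blocks_train, blocks_test):
    """Standardize each feature block with its own training statistics, then concatenate."""
    fused_tr, fused_te = [], []
    for Xtr, Xte in zip(blocks_train, blocks_test):
        mu, sd = fit_zscore(Xtr)
        scale = np.sqrt(Xtr.shape[1])       # gives every block a total variance of 1
        fused_tr.append((Xtr - mu) / sd / scale)
        fused_te.append((Xte - mu) / sd / scale)
    return np.hstack(fused_tr), np.hstack(fused_te)

def late_fusion(prob_a, prob_b, alpha=0.5):
    """Weighted average of the class-probability outputs of the two pathways."""
    return alpha * prob_a + (1.0 - alpha) * prob_b

rng = np.random.default_rng(1)
# Placeholder features: 30 GSP features and a 64-d CNN embedding per ECG
gsp_tr, gsp_te = rng.standard_normal((80, 30)), rng.standard_normal((20, 30))
cnn_tr, cnn_te = rng.standard_normal((80, 64)), rng.standard_normal((20, 64))
F_tr, F_te = early_fusion([gsp_tr, cnn_tr], [gsp_te, cnn_te])
print(F_tr.shape, F_te.shape)  # -> (80, 94) (20, 94)

p_gsp = np.array([[0.7, 0.2, 0.1]])         # hypothetical Healthy/LBBB/sLBBB probabilities
p_cnn = np.array([[0.5, 0.1, 0.4]])
print(late_fusion(p_gsp, p_cnn))            # averaged probabilities, still summing to 1
```

Early fusion lets a single classifier learn interactions between spectral and image-derived features, whereas late fusion keeps both pathways independent and only combines their outputs.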

Systematic exploration of these directions could significantly refine the Graph-CNN approach, improving its discriminatory capabilities for challenging LBBB sub-typing and bolstering its potential for clinical translation.
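Among the directions listed above, the two signal-to-image mappings proposed under Alternative Graph and Signal-to-Image Mappings can be prototyped with NumPy alone. This is a minimal sketch, not the authors’ pipeline: it assumes global min-max scaling of the whole segment to 8-bit grayscale, hand-written nearest-neighbour resizing (OpenCV’s `cv2.resize` would be the usual choice), a hypothetical 64 × 64 output size, and a recurrence plot without time-delay embedding, $R_{ij} = \Theta(\varepsilon - |x_i - x_j|)$. The threshold $\varepsilon$, the synthetic signal and the helper names are hypothetical.

```python
import numpy as np

def to_uint8(img):
    """Min-max scale an array to the 0..255 grayscale range."""
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros(img.shape, dtype=np.uint8)
    return np.round(255 * (img - lo) / (hi - lo)).astype(np.uint8)

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize by index arithmetic (rows/columns are repeated or skipped)."""
    rows = (np.arange(out_h) * img.shape[0] / out_h).astype(int)
    cols = (np.arange(out_w) * img.shape[1] / out_w).astype(int)
    return img[rows[:, None], cols[None, :]]

def leads_to_image(segment, size=64):
    """Map a (12, N_samples) multi-lead segment to a size x size grayscale image."""
    return resize_nearest(to_uint8(segment), size, size)

def recurrence_plot(x, eps):
    """Binary recurrence plot R[i, j] = 1 if |x_i - x_j| <= eps (no delay embedding)."""
    dist = np.abs(x[:, None] - x[None, :])
    return (dist <= eps).astype(np.uint8)

# Hypothetical 12-lead segment: phase-shifted sinusoids, N_samples = 500
t = np.linspace(0, 1, 500)
segment = np.stack([np.sin(2 * np.pi * 3 * t + 0.3 * k) for k in range(12)])
img = leads_to_image(segment)
rp = recurrence_plot(segment[0], eps=0.1)
print(img.shape, img.dtype, rp.shape)  # -> (64, 64) uint8 (500, 500)
```

Nearest-neighbour upsampling of the 12 lead rows simply repeats each lead, which keeps lead boundaries crisp; smoother interpolation would blend neighbouring leads, which is one of the trade-offs that direction proposes to study.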

4. Conclusions

This study developed and comparatively evaluated two distinct graph-theory-based methodologies for the automated classification of 12-lead ECG signals into healthy, Left Bundle Branch Block (LBBB), and strict LBBB (sLBBB) categories. The overarching goal was to enhance risk stratification and aid in the selection of candidates for Cardiac Resynchronization Therapy (CRT).

The first pathway, leveraging Graph Signal Processing (GSP) for feature extraction and employing traditional machine learning (ML) classifiers, demonstrated superior overall classification accuracy. Specifically, a Support Vector Machine (SVM) model, utilizing features derived from graphs constructed via Phase Difference (PD) connectivity, achieved the highest mean Balanced Accuracy of $0.8317$ . This approach underscored the efficacy of graph spectral features, particularly those derived from connectivity metrics capturing temporal and phase dynamics, in discriminating altered cardiac conduction patterns.
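The spectral graph filtering behind this pathway (Laplacian eigendecomposition, selection of dominant eigenmodes and reconstruction through the inverse Graph Fourier Transform) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors’ implementation: the Phase Difference weight is approximated by the phase locking value of Hilbert instantaneous phases (a measure discussed by Aydore et al.⁵⁵), “dominant” is taken to mean the $k$ modes with the largest spectral energy, and $k = 3$, the synthetic signal and the helper names are hypothetical. With $L = D - A = U \Lambda U^{\top}$, the GFT is $\hat{X} = U^{\top} X$ and the filtered signal is $\tilde{X} = U_K \hat{X}_K$.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal along the last axis (the Hilbert-transform construction)."""
    n = x.shape[-1]
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x, axis=-1) * h, axis=-1)

def phase_adjacency(X):
    """Assumed PD weight: phase locking value |mean(exp(j*(phi_a - phi_b)))| per lead pair."""
    z = np.exp(1j * np.angle(analytic_signal(X)))
    A = np.abs(z @ z.conj().T) / X.shape[1]
    np.fill_diagonal(A, 0.0)                # no self-loops
    return A

def gft_filter(X, A, k):
    """Keep the k graph eigenmodes with the most spectral energy and invert the GFT."""
    L = np.diag(A.sum(axis=1)) - A          # combinatorial Laplacian L = D - A
    eigvals, U = np.linalg.eigh(L)          # L = U diag(eigvals) U^T, U orthonormal
    X_hat = U.T @ X                         # graph Fourier transform (one column per sample)
    energy = np.sum(X_hat ** 2, axis=1)     # energy carried by each eigenmode
    keep = np.argsort(energy)[::-1][:k]     # indices of the k dominant modes
    X_rec = U[:, keep] @ X_hat[keep]        # inverse GFT restricted to those modes
    return X_rec, eigvals[keep], energy[keep]

# Hypothetical 12-lead segment (leads x samples): phase-shifted sinusoids plus noise
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 500, endpoint=False)
X = np.stack([np.sin(2 * np.pi * 5 * t + 0.2 * i) for i in range(12)])
X = X + 0.05 * rng.standard_normal(X.shape)

A = phase_adjacency(X)
X_rec, lam, feats = gft_filter(X, A, k=3)
print(A.shape, X_rec.shape, feats.shape)  # -> (12, 12) (12, 500) (3,)
```

The returned mode energies (or statistics of `X_rec`) are one possible feature vector for a classifier such as the SVM; keeping all 12 modes reproduces the input exactly because $U$ is orthonormal.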

The second pathway transformed inter-lead functional connectivity matrices into 2D image representations for classification by a Convolutional Neural Network (CNN), integrating eXplainable AI (XAI) through Gradient-weighted Class Activation Mapping (Grad-CAM). While this pathway yielded a mean Balanced Accuracy of $0.7646$ with Cross-Correlation-derived images, its significant contribution lies in providing visual interpretability. Grad-CAM highlighted the specific inter-lead interactions most influential to the CNN’s diagnostic decisions, thereby enhancing model transparency.
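The image construction and the Grad-CAM computation can be sketched in NumPy. Grad-CAM weights each feature map $A^k$ of the last convolutional layer by $\alpha_k^c = \frac{1}{Z}\sum_{i,j} \partial y^c / \partial A^k_{ij}$ and keeps $\mathrm{ReLU}(\sum_k \alpha_k^c A^k)$. This is a minimal sketch, assuming the Cross-Correlation weight is the peak absolute normalized cross-correlation across lags (the exact definition is not restated in this section) and that the feature maps and their gradients have already been obtained from a deep learning framework’s automatic differentiation, which is not shown. Array sizes, the 2 × 2 upsampling factor and the helper names are hypothetical.

```python
import numpy as np

def cross_correlation_matrix(X):
    """Peak |normalized cross-correlation| over all lags for every pair of leads."""
    Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    n_leads, n = Z.shape
    C = np.eye(n_leads)
    for a in range(n_leads):
        for b in range(a + 1, n_leads):
            cc = np.correlate(Z[a], Z[b], mode="full") / n   # values lie in [-1, 1]
            C[a, b] = C[b, a] = np.abs(cc).max()
    return C

def to_grayscale(C):
    """Map connectivity values in [0, 1] to an 8-bit grayscale image."""
    return np.round(255 * np.clip(C, 0.0, 1.0)).astype(np.uint8)

def grad_cam(activations, gradients):
    """Grad-CAM: average gradients per channel, weight the maps, sum, ReLU, scale to [0, 1]."""
    alpha = gradients.mean(axis=(1, 2))                       # one weight per channel
    cam = np.maximum(np.tensordot(alpha, activations, axes=1), 0.0)
    return cam / cam.max() if cam.max() > 0 else cam

rng = np.random.default_rng(0)
X = rng.standard_normal((12, 400))            # hypothetical 12-lead segment
img = to_grayscale(cross_correlation_matrix(X))

# Hypothetical last-convolution outputs for that 12 x 12 image: 8 channels of 6 x 6 maps
acts = rng.random((8, 6, 6))
grads = rng.standard_normal((8, 6, 6))
cam = grad_cam(acts, grads)
cam_leads = np.kron(cam, np.ones((2, 2)))     # nearest upsampling: cell (i, j) <-> leads (i, j)
print(img.shape, cam_leads.shape)  # -> (12, 12) (12, 12)
```

Because each pixel of the connectivity image corresponds to one lead pair, the upsampled heatmap can be read directly as the importance of individual inter-lead interactions.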

A critical finding common to both pathways is that while the models effectively distinguished healthy subjects from those with LBBB or sLBBB, the precise differentiation between the LBBB and sLBBB subtypes remains the most substantial challenge. This study contributes to the field by demonstrating the distinct advantages and trade-offs of these graph-based approaches: the GSP-ML pathway offers higher diagnostic accuracy, while the Graph-CNN pathway provides crucial interpretability for potential clinical adoption.
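Balanced accuracy, the metric reported throughout, is the mean of per-class recall, so confusion confined to the LBBB and sLBBB rows lowers it even when Healthy cases are almost always recognized. The sketch below uses an illustrative confusion matrix with made-up counts (not the study’s results) and merges the two LBBB classes to contrast the three-class score with the Healthy-versus-pathological score; the helper names are hypothetical.

```python
import numpy as np

def balanced_accuracy(cm):
    """Mean per-class recall; rows are true classes, columns are predicted classes."""
    cm = np.asarray(cm, dtype=float)
    recall = np.diag(cm) / cm.sum(axis=1)
    return recall.mean(), recall

def merge_classes(cm, groups):
    """Sum confusion-matrix rows and columns according to lists of class indices."""
    cm = np.asarray(cm, dtype=float)
    return np.array([[cm[np.ix_(gr, gc)].sum() for gc in groups] for gr in groups])

# Illustrative counts only; class order: Healthy, LBBB, sLBBB
cm = [[48, 1, 1],
      [2, 30, 18],
      [1, 15, 34]]

bacc3, recall = balanced_accuracy(cm)          # recalls 0.96, 0.60, 0.68 -> bacc approx. 0.747
cm2 = merge_classes(cm, [[0], [1, 2]])         # Healthy vs any LBBB: [[48, 2], [3, 97]]
bacc2, _ = balanced_accuracy(cm2)              # (0.96 + 0.97) / 2 = 0.965
print(round(float(bacc3), 3), round(float(bacc2), 3))  # -> 0.747 0.965
```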

Future research should focus on refining these methodologies. This includes exploring hybrid models that synergize the strengths of both pathways, incorporating more advanced graph representation learning techniques such as direct Graph Neural Network (GNN) applications, and conducting rigorous validation on larger, more diverse, and external datasets. Such advancements are essential to translate these promising techniques into robust clinical tools capable of improving CRT candidate selection and, ultimately, patient outcomes.
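The direct GNN application mentioned here can be illustrated with a single graph-convolution layer using the symmetric-normalized GCN propagation rule $H' = \sigma(\hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2} H W)$ with $\hat{A} = A + I$. This is an illustrative sketch, not one of the cited architectures: the adjacency is assumed to be the absolute Pearson correlation between leads, node features and weights are random placeholders (no training is shown), and the mean readout and helper names are hypothetical.

```python
import numpy as np

def normalized_adjacency(A):
    """GCN normalization D^-1/2 (A + I) D^-1/2, with D the degree matrix of A + I."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_norm, H, W):
    """One graph convolution: aggregate neighbouring leads, project, apply ReLU."""
    return np.maximum(A_norm @ H @ W, 0.0)

rng = np.random.default_rng(0)
n_leads, n_feat, n_hidden = 12, 8, 4

# Placeholder inputs: |Pearson correlation| adjacency and random per-lead features
signals = rng.standard_normal((n_leads, 200))
A = np.abs(np.corrcoef(signals))
np.fill_diagonal(A, 0.0)                 # self-loops are added inside the normalization
H = rng.standard_normal((n_leads, n_feat))
W = rng.standard_normal((n_feat, n_hidden))

H1 = gcn_layer(normalized_adjacency(A), H, W)
graph_embedding = H1.mean(axis=0)        # mean readout: one vector per ECG
print(H1.shape, graph_embedding.shape)   # -> (12, 4) (4,)
```

Unlike the image-based pathway, the layer operates on the 12-node graph itself, so the connectivity weights enter the computation directly rather than through pixel intensities.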

Footnotes

Acknowledgments

This project was partially funded by the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement No. 899287 (NeuraViPeR), by the European Commission MSCA-SE EPISTEAM, and by the Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo (CYTED) (through Red 225RT0169). This research has also been funded by a PhD scholarship from the National Council of Science and Technology (CONICET) and by Grant 26-DI-FEIRNNR-2023 from Universidad Nacional de Loja (Ecuador).

Funding

The author(s) received no financial support for the research, authorship and/or publication of this article.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Moya, Buytaert, Penicka, et al. State-of-the-art: Noninvasive assessment of left ventricular function through myocardial work. J Am Soc Echocardiogr 2023; 36: 1027–1042.

2. Huizar, Kaszala, Tan, et al. Abnormal conduction-induced cardiomyopathy: JACC review topic of the week. J Am Coll Cardiol 2023; 81: 1192–1200.

3. Tan, Witt, et al. Left bundle branch block. Circ Arrhythm Electrophysiol 2020; 13: e008239.

4. Puvrez, Duchenne, Donal, et al. Mechanical dyssynchrony as a selection criterion for cardiac resynchronization therapy: Design of the AMEND-CRT trial. ESC Heart Fail 2024; 11: 4390–4399.

5. Daubert, Behar, Martins, et al. Avoiding non-responders to cardiac resynchronization therapy: a practical guide. Eur Heart J 2017; 38: 1463–1472.

6. Ortega, Barja, Logarzo, et al. Non-selective His bundle pacing with a biphasic waveform: enhancing septal resynchronization. EP Europace 2017; 20: 816–822.

7. Andersson, Wieslander, et al. Left ventricular mechanical dyssynchrony by cardiac magnetic resonance is greater in patients with strict vs nonstrict electrocardiogram criteria for left bundle-branch block. Am Heart J 2013; 165: 956–963.

8. Macas, Ferrández, Orellana, et al. A model of mechanical dyssynchrony based on ECG features. In: Ballina FE, Armentano R, Acevedo RC et al. (eds.) Advances in bioengineering and clinical engineering. SABI 2023, IFMBE Proceedings, Vol. 114, 2024, pp. 88–94. Cham: Springer. ISBN 978-3-031-61973-1. DOI: 10.1007/978-3-031-61973-1_9.

9. Strauss, Selvester, Wagner. Defining left bundle branch block in the era of cardiac resynchronization therapy. Am J Cardiol 2011; 107: 927–934.

10. Zusterzeel, Vicente, Ochoa-Jimenez, et al. The 43rd international society for computerized electrocardiology (ISCE) ECG initiative for the automated detection of strict left bundle branch block (LBBB). J Electrocardiol 2018; 51: S25–S30.

11. Stoiculescu, Hadareanu, et al. Refining cardiac resynchronization therapy: a comprehensive review on the role of advanced multimodality imaging. Front Cardiovasc Med 2024; 11: 1406899.

12. Martis, Acharya, Adeli. Current methods in electrocardiogram characterization. Comput Biol Med 2014; 48: 133–149.

13. Ansari, Mourad, Qaraqe, et al. Deep learning for ECG arrhythmia detection and classification: an overview of progress for period 2017–2023. Front Physiol 2023; 14: 1246746.

14. Amezquita-Sanchez, Valtierra-Rodriguez, Adeli, et al. A novel wavelet transform-homogeneity model for sudden cardiac death prediction using ECG signals. J Med Syst 2018; 42: 176.

15. Ayano, Schwenker, Dufera, et al. Interpretable machine learning techniques in ECG-based heart disease classification: A systematic review. Diagnostics (Basel) 2022; 13: 111.

16. Górriz, Álvarez Illán, Álvarez Marquina, et al. Computational approaches to explainable artificial intelligence: Advances in theory, applications and trends. Inf Fusion 2023; 100: 101945.

17. Madan, Singh, et al. A hybrid deep learning approach for ECG-based arrhythmia classification. Bioengineering (Basel) 2022; 9: 152.

18. Michailidis, Gkelios, et al. Neuro-distributed cognitive adaptive optimization for training neural networks in a parallel and asynchronous manner. Integr Comput Aided Eng 2023; 31: 19–41.

19. Alonso, Morán, Pérez, et al. Gap imputation in related multivariate time series through recurrent neural network-based denoising autoencoder. Integr Comput Aided Eng 2023; 31: 157–172.

20. Fan, Song, et al. Look inside 3D point cloud deep neural network by patch-wise saliency map. Integr Comput Aided Eng 2023; 31: 197–212.

21. Malkova, Amini, Denis, et al. Neural architecture search for radio map reconstruction with partially labeled data. Integr Comput Aided Eng 2024; 31: 285–305.

22. Zhou, Fan, Neri. A spatio-temporal fusion deep learning network with application to lightning nowcasting. Integr Comput Aided Eng 2024; 31: 233–247.

23. Succetti, Rosato, Panella. Multi-label classification with imbalanced classes by fuzzy deep neural networks. Integr Comput Aided Eng 2024; 32: 25–38.

24. Rafiei, Adeli. A new neural dynamic classification algorithm. IEEE Trans Neural Netw Learn Syst 2017; 28: 3074–3083.

25. Alam KMR, Siddique, Adeli. A dynamic ensemble learning algorithm for neural networks. Neural Comput Appl 2020; 32: 8675–8690.

26. Pereira, Piteri, Souza, et al. FEMa: a finite element machine for fast learning. Neural Comput Appl 2020; 32: 6393–6404.

27. Rafiei, Gauthier, Adeli, et al. Self-supervised learning for electroencephalography. IEEE Trans Neural Netw Learn Syst 2024; 35: 1457–1471.

28. Rafiei, Gauthier, Adeli, et al. Self-supervised learning for near-wild cognitive workload estimation. J Med Syst 2024; 48: 107.

29. Yang, Gregg, Babaeizadeh. Detection of strict left bundle branch block by neural network and a method to test detection consistency. Physiol Meas 2020; 41: 025005.

30. Sadeghi, Rezaee, Hajati. Deep conv-attention model for diagnosing left bundle branch block from 12-lead electrocardiograms. arXiv preprint arXiv:2212.04936, 2023.

31. Calazans MAA, Ferreira FABS, Santos FAN, et al. Machine learning and graph signal processing applied to healthcare: A review. Bioengineering (Basel) 2024; 11: 671.

32. Ahmadlou, Adeli. Complexity of weighted graph: A new technique to investigate structural complexity of brain activities with applications to aging and autism. Neurosci Lett 2017; 650: 103–108.

33. Qiang, Dong, Liu, et al. Conv-RGNN: An efficient convolutional residual graph neural network for ECG classification. Comput Methods Programs Biomed 2024; 257: 108406.

34. Han, Zhang, et al. An arrhythmia intelligent recognition method based on a multimodal information and spatio-temporal hybrid neural network model. Comput Mater Continua 2025; 82: 3443–3465.

35. Andayeshgar, Abdali-Mohammadi, Sepahvand, et al. Developing graph convolutional networks and mutual information for arrhythmic diagnosis based on multi-channel ECG signals. Int J Environ Res Public Health 2022; 19: 10511.

36. Ordoñez BCM, Villavicencio DVO, Ochoa MAS, et al. Application of Graph Fourier Transform (GFT) in the Diagnosis of Left Bundle Branch Block (LBBB) from Electrocardiographic (ECG) Signals. Lecture Notes in Computer Science, Vol. 14675, 2024, pp. 495–503. Cham: Springer. ISBN 978-3-031-61136-0. DOI: 10.1007/978-3-031-61137-7_46.

37. Selvaraju, Cogswell, Das, et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization. Int J Comput Vis 2019; 128: 336–359.

38. Lundberg, Lee. A unified approach to interpreting model predictions. In: Guyon I, Von Luxburg U, Bengio S et al. (eds.) Advances in neural information processing systems, Vol. 30, 2017. Curran Associates, Inc.

39. Hicks, Isaksen, Thambawita, et al. Explaining deep neural networks for knowledge discovery in electrocardiogram analysis. Sci Rep 2021; 11: 10949.

40. Vilone, Longo. Explainable artificial intelligence: A systematic review. arXiv preprint arXiv:2006.00093, 2020.

41. Hughes, Olgin, Avram, et al. Performance of a convolutional neural network and explainability technique for 12-lead electrocardiogram interpretation. JAMA Cardiol 2021; 6: 1285–1295.

42. Reznichenko, Whitaker, et al. Comparing ECG lead subsets for heart arrhythmia/ECG pattern classification: Convolutional neural networks and random forest. CJC Open 2024; 7: 176–186.

43. Karatzia, Aung, Aksentijevic. Artificial intelligence in cardiology: Hope for the future and power for the present. Front Cardiovasc Med 2022; 9: 945726. DOI: 10.3389/fcvm.2022.945726.

44. Macas Ordoñez, Garrigós, Martínez, et al. An explainable machine learning system for left bundle branch block detection and classification. Integr Comput Aided Eng 2024; 31: 43–58.

45. Pérez-Riera, Barbosa-Barros, de Rezende Barbosa MPC, et al. Left bundle branch block: Epidemiology, etiology, anatomic features, electrovectorcardiography, and classification proposal. Ann Noninvasive Electrocardiol 2019; 24: e12572.

46. Zheng, Zhang, Danioko, et al. A 12-lead electrocardiogram database for arrhythmia research covering more than 10,000 patients. Sci Data 2020; 7: 48.

47. Moss, Brown, Cannom, et al. Multicenter automatic defibrillator implantation trial-cardiac resynchronization therapy (MADIT-CRT): design and clinical protocol. Ann Noninvasive Electrocardiol 2005; 10: 34–43.

48. BioSPPy Developers. BioSPPy: Biosignal processing in Python - ECG module. https://biosppy.readthedocs.io/en/stable/biosppy.signals.html (2023, accessed 24 April 2025).

49. Woody. Characterization of an adaptive filter for the analysis of variable latency neuroelectric signals. Med Biol Eng 1967; 5: 539–554.

50. Meng, Finley, et al. LightGBM: A highly efficient gradient boosting decision tree. In: Advances in neural information processing systems, Vol. 30, 2017, pp. 3146–3154.

51. Kligfield, Gettes, Bailey, et al. Recommendations for the standardization and interpretation of the electrocardiogram: Part I: The electrocardiogram and its technology. A scientific statement from the American heart association electrocardiography and arrhythmias committee, council on clinical cardiology; the American college of cardiology foundation; and the heart rhythm society. Endorsed by the international society for computerized electrocardiology. Circulation 2007; 115: 1306–1324.

52. Harris, Millman, van der Walt, et al. Array programming with NumPy. Nature 2020; 585: 357–362.

53. Bradski, Kaehler. Learning OpenCV: Computer vision with the OpenCV library. O’Reilly Media Inc., 2008.

54. Srivastava, Hinton, Krizhevsky, et al. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014; 15: 1929–1958.

55. Aydore, Pantazis, Leahy. A note on the phase locking value and its properties. Neuroimage 2013; 74: 231–244.

56. Shuman, Narang, Frossard, et al. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Process Mag 2013; 30: 83–98.

57. Virtanen, Gommers, Oliphant, et al. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nat Methods 2020; 17: 261–272.

58. Pedregosa, Varoquaux, Gramfort, et al. Scikit-learn: Machine learning in Python. J Mach Learn Res 2011; 12: 2825–2830.

59. Lemaître, Nogueira, Aridas. Imbalanced-learn: A python toolbox to tackle the curse of imbalanced datasets in machine learning. J Mach Learn Res 2017; 18: 1–5.

60. Jolliffe, Cadima. Principal component analysis: a review and recent developments. Philos Trans A Math Phys Eng Sci 2016; 374: 20150202.

61. Brodersen, Gallusser, Koechlin, et al. The balanced accuracy and its posterior distribution. In: Proceedings of the 20th international conference on pattern recognition (ICPR), 2010, pp. 3121–3124. IEEE. DOI: 10.1109/ICPR.2010.764.

62. Ponnusamy, Vijayaraman, Ellenbogen. Left bundle branch block-associated cardiomyopathy: A new approach. Arrhythm Electrophysiol Rev 2024; 13: e15.

63. Sankari, Adeli. HeartSaver: a mobile cardiac monitoring system for auto-detection of atrial fibrillation, myocardial infarction, and atrio-ventricular block. Comput Biol Med 2011; 41: 211–220.

64. Guo. Deep learning and electrocardiography: systematic review of current techniques in cardiovascular disease diagnosis and management. Biomed Eng Online 2025; 24: 23.

65. Gregg, Bailey, et al. Left bundle branch block detection in 12-Lead ECG using End-to-End deep learning with explainability. In: Proc. Computing in Cardiology (CinC). DOI: 10.22489/CinC.2024.067.