An explainable machine learning framework for identifying left bundle branch block mechanisms via vectorcardiographic features

Abstract

Left Bundle Branch Block (LBBB) diagnosis is crucial for patient stratification and for selecting individuals who are likely to respond to Cardiac Resynchronization Therapy (CRT). This research investigates the pathophysiological distinction between LBBB and strict LBBB (sLBBB) with a view to optimizing diagnostic criteria and therapy. ECG signals were transformed into the vectorcardiographic (VCG) domain, where QRS loops were divided into two halves at the time of the velocity peak, computed from the discrete derivatives of the x, y, and z leads. From each half, angles and norms were extracted in all VCG planes, along with the intervals between the VCG peak-velocity time and the VCG fiducial points. These features were used to train machine learning models to classify subjects into Healthy, LBBB, and sLBBB categories. The analysis identified the four most significant features for the discrimination task: (1, 2) peak-velocity time relative to QRS onset/offset, (3) maximum norm of the early QRS loop in the frontal plane, and (4) QRS angle of the early QRS loop in the horizontal plane. These features preserved essential differences in conduction dynamics and electrical disturbances among the three groups. In particular, the time from velocity peak to QRS offset was the most discriminative feature, with progressive prolongation from Healthy to LBBB to sLBBB. This reduced 4-feature set achieved an accuracy of 0.85 and an F1-score of 0.83, comparable to 15-feature models. Finally, integrating explainable artificial intelligence (xAI) into these simplified models enabled the derivation of transparent diagnostic rules for LBBB, improving clinical interpretability and supporting more reliable diagnostic decisions.

Keywords

LBBB diagnosis; cardiac impulse propagation velocity; VCG; CRT

1. Introduction

The definition of strict left bundle branch block (sLBBB) was first introduced by Strauss in 2011 based on the analysis of the ECG phenotypes of patients who significantly benefited from cardiac resynchronization therapy (CRT) in the MADIT-CRT trial.¹ This subgroup exhibited specific characteristics, including a QRS duration $\geq$ 140 ms in men or $\geq$ 130 ms in women, QS- or rS-configurations of the QRS complex in leads V1 and V2, and mid-QRS notching or slurring in $\geq$ 2 of leads V1, V2, V5, V6, I, and aVL.² Since then, the diagnosis of strict LBBB has emerged as a potent predictor of CRT efficacy.

The objective of this study is to investigate the pathophysiological aspects associated with left bundle branch block (LBBB) and strict left bundle branch block (sLBBB), aiming to refine treatment strategies and improve clinical outcomes in patients affected by these conditions.

Incorporating findings on maximum conduction velocity might provide important insights. Determining whether peak conduction velocity is delayed in LBBB compared with sLBBB could point to a more proximal site of block in one of these conditions, with profound clinical implications. Confirming such distinctions would deepen our understanding not only of the underlying mechanisms but also of their associations with underlying heart diseases.

While refining treatment strategies may be a longer-term outcome, the immediate value lies in advancing our physiological understanding of LBBB and sLBBB. This could pave the way for more targeted therapies, thereby enhancing clinical practice and patient outcomes.

1.1. Related work and contribution

The differentiation of Left Bundle Branch Block (LBBB) from its strict subtype (sLBBB) is a critical challenge in cardiology, as it directly impacts the selection of patients for Cardiac Resynchronization Therapy (CRT).¹ While strict electrocardiographic criteria were introduced to identify patients more likely to respond to CRT, their application can be subjective and may not fully capture the underlying electrophysiological dynamics.³ Moreover, a more accurate LBBB diagnosis might help identify candidates for other cardiac stimulation schemes, such as His bundle pacing or other biventricular configurations.^4,5

To address these challenges, many studies have turned to machine learning (ML) and deep learning (DL) for the automated analysis of ECG signals.^6–10 The field has seen significant advances in computational methods for cardiac monitoring, ranging from mobile cardiac monitoring systems⁸ to sophisticated prediction models for sudden cardiac death.^9,10 Initial efforts in automated sLBBB detection, such as those presented in the International Society for Computerized Electrocardiology (ISCE) initiative, reported accuracies around 82%,² demonstrating the potential of computational approaches.

However, a significant limitation of many advanced models is their “black box” nature. This lack of transparency makes it difficult for clinicians to understand the reasoning behind a diagnosis, which hinders clinical trust and adoption.^11,12 In response, the field of Explainable Artificial Intelligence (xAI) has become crucial across various medical domains, including EEG analysis for impulsivity classification,¹³ cognitive impairment assessment through clock drawing tests,¹⁴ neonatal seizure detection,¹⁵ and Alzheimer’s disease evaluation.¹⁶

Recent works have begun to incorporate explainability into LBBB classification. For instance, Macas et al. (2024) developed a system using a reduced set of bio-inspired features derived from the vectorcardiogram (VCG), such as QRS-T angles and areas. They employed XAI techniques like SHAP to provide feature-level explanations, successfully identifying the most influential parameters for a ternary classification task ( $H e a l t h y$ , $L B B B$ , $s L B B B$ ) and achieving an accuracy of up to 82.63%. While providing valuable insights into feature importance, this approach’s explanations are based on feature attribution scores, which can still be complex to interpret directly in a clinical workflow.¹⁷

Concurrently, novel signal representation methods and advanced learning algorithms have been explored. Graph theory has emerged as a powerful paradigm to model the complex inter-lead relationships within a 12-lead ECG. In this line, Macas Ordóñez et al. (2025) proposed and compared two distinct graph-based methodologies. The first approach, using Graph Signal Processing (GSP) and a Support Vector Machine, achieved superior diagnostic accuracy (mean balanced accuracy of 0.8317) by leveraging features from the graph spectral domain, although these features can be difficult to interpret clinically. The second approach converted connectivity matrices into images for a Convolutional Neural Network (CNN), incorporating xAI via Grad-CAM to visualize the inter-lead interactions influencing the model’s decision. While this second method offered enhanced visual interpretability, its accuracy was lower, and like other xAI methods, it did not generate simple, direct clinical rules.¹⁸

The field has also seen advances in neural dynamic classification algorithms,¹⁹ finite element machines for fast learning,²⁰ dynamic ensemble learning,²¹ and self-supervised learning approaches for electrophysiological data.²² However, while these previous works have made significant strides, a gap remains in developing a method that is both highly accurate and provides explanations in the form of simple, verifiable rules specifically for LBBB differentiation.

Our current study addresses this gap by introducing a framework that leverages a novel set of VCG features based on conduction velocity dynamics, specifically the timing of the peak velocity ( $t_{V_{m a x}}$ ). Our main contribution is the subsequent application of advanced xAI techniques, such as the Anchors method, to distill the model’s complex decision-making process into a concise set of high-precision, human-readable rules. This approach moves beyond feature importance scores or complex spectral analysis to provide clinicians with transparent, physiologically grounded, and directly applicable criteria for distinguishing between $H e a l t h y$ , $L B B B$ , and $s L B B B$ patients, bridging the gap between predictive accuracy and true clinical interpretability.

Relative to our 2023 ICAE study (Macas et al., 2023), the present work introduces: (i) a velocity-based VCG loop segmentation (d1/d2) that yields physiologically grounded features; (ii) a 4-feature model that attains performance comparable to the 15-feature set while improving interpretability; (iii) consolidated, high-precision Anchor rules that provide global rule sets from local explanations; and (iv) by-class xAI analyses and spectral clustering that align with Strauss-defined phenotypes. Together, these advances reduce model complexity, improve transparency, and sharpen physiological insight.

Figure 1.

Methodology workflow: preprocessing, feature extraction/selection, ML, and XAI.

2. Materials and methods

The overall workflow of the methodology, from data preprocessing to model interpretation, is summarized in Figure 1. Two databases were used in this study. The first database, E-OTH-12-0602-024,²³ was obtained from the Telemetric and Holter ECG Warehouse (THEW) as part of the initiative of the International Society for Computerized Electrocardiology (ISCE) in 2018.² This database comprises 602 ECG records from the MADIT-CRT clinical trial, conducted at the University of Rochester (Rochester, NY). It includes 331 cases of strict left bundle branch block (sLBBB), 192 cases of incomplete LBBB, and 79 cases of other cardiopathies ($noLBBB$); each recording lasts 10 seconds and is sampled at 1 kHz with an amplitude resolution of 3.75 μV. The $noLBBB$ class was excluded from the analysis because it contained extremely heterogeneous pathologies. In addition, a preliminary analysis revealed 41 duplicate cases in this dataset, 36 of which corresponded to the $LBBB$ and $sLBBB$ classes used in this study. We identified these 36 exact duplicate pairs and retained only one instance per pair, resulting in 184 $LBBB$ and 302 $sLBBB$ records. Due to the anonymization requirements of the database, it was not possible to determine whether any records belonged to the same patient at different stages of the pathology (e.g., progressing from $LBBB$ to $sLBBB$).

The second database, a Large Scale 12-lead Electrocardiogram Database for Arrhythmia Study from PhysioNet,²⁴ includes ECGs from 45,152 patients in resting conditions, encompassing 64 different types of arrhythmias as well as healthy ECG records. The data were collected from Shaoxing People’s Hospital and Ningbo First Hospital. The ECGs are sampled at 500 Hz and have a duration of 10 seconds. From this database, 299 randomly selected ECG records of healthy subjects were extracted.

2.1. ECG Preprocessing

To ensure consistency in the analysis, all signals were resampled to 200 Hz. The ECG data were transformed into the vectorcardiographic (VCG) space using the inverse Dower matrix,²⁵ and the derived VCG signals were delineated using a wavelet-based algorithm implemented with the WT-delineator library in Python.²⁶ This algorithm identified the onset and offset of the QRS complex, enabling the construction of QRS loops. For ECGs with abnormal morphologies, manual corrections were frequently necessary to ensure accurate delineation. For each patient, a unique mean loop was obtained by averaging the loops across all beats. From these mean loops, a set of 15 features was derived, as explained in the following sections.
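The two preprocessing steps above can be sketched as follows, assuming the ECG leads are held as NumPy arrays in a dict keyed by lead name (a hypothetical layout). The coefficients are the inverse Dower matrix as commonly reported in the literature (Edenbrandt and Pahlm, 1988) for leads V1 to V6, I and II; they should be checked against reference 25 before use. Resampling here uses SciPy's polyphase `resample_poly`, which may differ from the resampler the authors used.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

# Inverse Dower matrix as commonly reported (Edenbrandt & Pahlm, 1988).
# Rows: X, Y, Z. Columns follow LEAD_ORDER.
LEAD_ORDER = ["V1", "V2", "V3", "V4", "V5", "V6", "I", "II"]
INVERSE_DOWER = np.array([
    [-0.172, -0.074,  0.122,  0.231, 0.239, 0.194,  0.156, -0.010],
    [ 0.057, -0.019, -0.106, -0.022, 0.041, 0.048, -0.227,  0.887],
    [-0.229, -0.310, -0.246, -0.063, 0.055, 0.108,  0.022,  0.102],
])


def resample_to(signal: np.ndarray, fs_in: int, fs_out: int = 200) -> np.ndarray:
    """Polyphase resampling of a (n_samples, n_leads) array from fs_in to fs_out."""
    g = gcd(fs_out, fs_in)
    return resample_poly(signal, up=fs_out // g, down=fs_in // g, axis=0)


def ecg_to_vcg(leads: dict) -> np.ndarray:
    """Synthesize X, Y, Z from the 8 independent ECG leads; returns (n_samples, 3)."""
    ecg = np.column_stack([np.asarray(leads[name], dtype=float) for name in LEAD_ORDER])
    return ecg @ INVERSE_DOWER.T  # (n, 8) @ (8, 3) -> (n, 3)
```

With these settings, the 1 kHz THEW records are resampled with `up=1, down=5` and the 500 Hz PhysioNet records with `up=2, down=5`, so a 10 s recording from either source ends up with 2000 samples.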

Figure 2.

a) Morphological segmentation of the VCG lead $z$ according to the three fiducial points $Q R S_{o n s e t}$ , $Q R S_{o f f s e t}$ and $t_{V_{max}}$ . Vectorcardiographic loops segmented into $d_{1}$ and $d_{2}$ intervals for b) $H e a l t h y$ , c) $L B B B$ , and d) $s L B B B$ patients.

2.1.1. Cardiac velocity and loop segmentation

Cardiac velocity $V(t)$ was computed as the norm of the discrete derivatives of the VCG leads $x(t)$, $y(t)$, and $z(t)$. Afterwards, the time to peak, $t_{V_{max}} = \arg\max_{t} V(t)$, was estimated. This time marks the instant at which the instantaneous velocity reaches its maximum amplitude and was used to partition the VCG loops into two halves, as explained below.

Once the time of maximum velocity $t_{V_{max}}$ within the QRS complex was found, the segmentation proceeded as follows. The first segment, $d_{1}$, spanned from the onset of the QRS complex to $t_{V_{max}}$. This period is characterized by rapid depolarization and high velocity, reflecting early ventricular activation. The second segment, $d_{2}$, extended from $t_{V_{max}}$ to the offset of the QRS complex. This interval represents the slower, late phase of ventricular depolarization. The separation of these segments is detailed in Figure 2(a), where the segmentation of lead $z$ is depicted according to the three fiducial points $t_{QRS_{onset}}$, $t_{QRS_{offset}}$ and $t_{V_{max}}$. Correspondingly, the segmentation of the loops is shown in Figure 2(b) for the $xz$ plane of a $Healthy$ subject, while the $LBBB$ and $sLBBB$ counterparts are shown in Figures 2(c) and 2(d), respectively. The peak-velocity intervals, maximum norms and QRS angles were then computed for each half of the VCG waveforms and loops, yielding 15 features in total.
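The velocity computation and the $d_{1}$/$d_{2}$ split can be sketched as below, assuming a mean QRS loop stored as an `(n_samples, 3)` array sampled at 200 Hz, with QRS onset and offset given as sample indices. A forward difference (`np.diff`) is used as the discrete derivative; the exact differentiation scheme is not stated in the text, so this is an assumption. Both segments include the $t_{V_{max}}$ sample, matching the closed intervals of Eq. (4).

```python
import numpy as np


def peak_velocity_segmentation(vcg: np.ndarray, qrs_on: int, qrs_off: int, fs: float = 200.0):
    """Locate t_Vmax inside the QRS and split the loop into d1 and d2.

    vcg: (n_samples, 3) array with columns x, y, z.
    qrs_on, qrs_off: QRS onset/offset sample indices (qrs_off < n_samples).
    Returns (t_vmax, velocity, d1, d2), where d1/d2 are slices into vcg.
    """
    # Forward-difference derivative of each lead, scaled to units per second.
    deriv = np.diff(vcg, axis=0) * fs
    # Cardiac velocity V(t): Euclidean norm of the derivative vector.
    velocity = np.linalg.norm(deriv, axis=1)
    # Peak search restricted to the QRS complex.
    t_vmax = qrs_on + int(np.argmax(velocity[qrs_on:qrs_off]))
    d1 = slice(qrs_on, t_vmax + 1)   # QRS onset .. t_Vmax (inclusive)
    d2 = slice(t_vmax, qrs_off + 1)  # t_Vmax .. QRS offset (inclusive)
    return t_vmax, velocity, d1, d2
```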

2.1.2. Peak velocity intervals

Based on the time of peak velocity $t_{V_{max}}$, three temporal parameters were constructed. The first temporal parameter was $t_{V_{max}}$ itself, while the remaining two, $t_{V_{max}-QRS_{onset}}$ and $t_{QRS_{offset}-V_{max}}$, reference the time of peak velocity to the fiducial points $t_{QRS_{onset}}$ and $t_{QRS_{offset}}$, respectively:

\begin{aligned} t_{V_{max}-QRS_{onset}} & = t_{V_{max}} - t_{QRS_{onset}} \end{aligned}

(1)

\begin{aligned} t_{QRS_{offset}-V_{max}} & = t_{QRS_{offset}} - t_{V_{max}} \end{aligned}

(2)
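Eqs. (1) and (2) reduce to index differences once the fiducial points are known. The helper below is a hypothetical convenience function that converts them to milliseconds; at 200 Hz each sample spans 5 ms, which bounds the temporal resolution of all three features. Expressing $t_{V_{max}}$ itself relative to the start of the averaged beat window is an assumption, since the text does not state its reference.

```python
def peak_velocity_intervals(qrs_on: int, t_vmax: int, qrs_off: int, fs: float = 200.0) -> dict:
    """Temporal features in ms from sample indices (resolution 1000 / fs ms)."""
    ms_per_sample = 1000.0 / fs
    return {
        # Assumed reference: start of the averaged beat window.
        "t_vmax_ms": t_vmax * ms_per_sample,
        # Eq. (1): t_Vmax - t_QRS_onset
        "t_vmax_minus_qrs_onset_ms": (t_vmax - qrs_on) * ms_per_sample,
        # Eq. (2): t_QRS_offset - t_Vmax
        "qrs_offset_minus_t_vmax_ms": (qrs_off - t_vmax) * ms_per_sample,
    }


# Example: onset at sample 20, peak at sample 39, offset at sample 60 (200 Hz).
features = peak_velocity_intervals(20, 39, 60)
```

In this example the two intervals are 95 ms and 105 ms, i.e., 19 and 21 samples of 5 ms each.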

Figure 3 depicts the intervals constructed from the VCG fiducial points ($QRS_{onset}$ and $QRS_{offset}$) and the time of peak velocity $t_{V_{max}}$. In this particular example, lead $z$ (green) is shown alongside cardiac velocity (blue). Notice the lengthening of the $QRS_{offset} - t_{V_{max}}$ interval and the shortening of the $t_{V_{max}} - QRS_{onset}$ interval with conduction impairment.

Figure 3.

Comparison of VCG lead $z$ morphology (top traces) and electrical conduction velocity constructed over the norm of the $x, y, z$ discrete derivatives (bottom traces) for representative patients: a) $H e a l t h y$ , b) $L B B B$ , and c) $s L B B B$ . Panel a) illustrates the segmentation. The time from peak velocity to QRS offset ( $t_{Q R S_{o f f s e t}} - t_{V_{max}}$ ) visibly increases from $H e a l t h y$ to $s L B B B$ .

2.1.3. Maximum Norms

For each time $t_{j}$ within the intervals $d_{1}$ and $d_{2}$ , the magnitude of the VCG vector in the $x y$ plane, $‖ v_{x y} (t_{j}) ‖$ , was calculated as:

‖ v_{x y} (t_{j}) ‖ = \sqrt{x (t_{j})^{2} + y (t_{j})^{2}}

(3)

The calculated norms were divided into two groups corresponding to the $d_{1}$ and $d_{2}$ intervals and the respective maximum norm for each interval was obtained as follows:

\begin{aligned} ‖ v_{d1xy}^{max} ‖ & = \max_{t_{j}} ‖ v_{xy} (t_{j}) ‖ \quad \text{for } t_{j} \in [t_{QRS_{onset}}, t_{V_{max}}] \\ ‖ v_{d2xy}^{max} ‖ & = \max_{t_{j}} ‖ v_{xy} (t_{j}) ‖ \quad \text{for } t_{j} \in [t_{V_{max}}, t_{QRS_{offset}}] \end{aligned}

(4)

These maximum norms represent the dominant vectors in $d_{1}$ and $d_{2}$ . Similar computations were applied to the $xz$ and $yz$ planes.
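A sketch of Eqs. (3) and (4) generalized to the three planes, assuming the loop is an `(n_samples, 3)` array with columns x, y, z and that $d_{1}$ and $d_{2}$ are given as Python `slice` objects over its rows. Besides each maximum norm, the function returns the corresponding dominant vector, which is the input to the QRS angles defined in the next subsection.

```python
import numpy as np

# Column pairs defining each VCG plane (x = 0, y = 1, z = 2).
PLANES = {"xy": (0, 1), "xz": (0, 2), "yz": (1, 2)}


def max_norms(vcg: np.ndarray, d1: slice, d2: slice) -> dict:
    """Return {(segment, plane): (max_norm, dominant_vector)} for d1/d2 in every plane."""
    out = {}
    for seg_name, seg in (("d1", d1), ("d2", d2)):
        for plane, (a, b) in PLANES.items():
            proj = vcg[seg][:, [a, b]]                # planar projection of the segment
            norms = np.hypot(proj[:, 0], proj[:, 1])  # Eq. (3) for every t_j
            k = int(np.argmax(norms))                 # Eq. (4): dominant sample
            out[(seg_name, plane)] = (float(norms[k]), proj[k])
    return out
```

The six returned norms correspond to the six maximum-norm features (two segments times three planes).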

2.1.4. QRS Angles

The angle $φ_{d_{1}}$ between the dominant vector $v_{d 1 x y}^{max}$ , extending from the origin of the VCG loop to the point of maximum norm $‖ v_{d 1 x y}^{max} ‖$ , and the horizontal axis, was determined using the dot product:

φ_{d_{1} x y} = (\frac{180}{π}) \arccos (\frac{⟨ v_{d_{1} x y}^{max}, i ⟩}{‖ v_{d_{1} x y}^{max} ‖ \cdot ‖ i ‖})

where $i$ represents the unit vector along the horizontal axis $x$. The angle $φ_{d_{2}}$ was calculated similarly for the vector $v_{d2}$, and the same computation was applied in the $xz$ and $yz$ planes.
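The angle definition translates directly into code. The sketch assumes the dominant vector is a nonzero 2-D array in the plane's own coordinates, with the first coordinate playing the role of the horizontal axis. Because `arccos` returns values in [0°, 180°], vectors mirrored about the horizontal axis yield the same angle; the clipping step only guards against rounding slightly outside [-1, 1].

```python
import numpy as np


def qrs_angle_deg(dominant_vector) -> float:
    """Angle in degrees between a nonzero 2-D dominant vector and the unit vector i."""
    v = np.asarray(dominant_vector, dtype=float)
    i = np.array([1.0, 0.0])  # unit vector along the horizontal axis
    cos_phi = np.dot(v, i) / (np.linalg.norm(v) * np.linalg.norm(i))
    # Rounding can push cos_phi marginally outside [-1, 1]; clip before arccos.
    return float(np.degrees(np.arccos(np.clip(cos_phi, -1.0, 1.0))))
```

For instance, `[1, 1]` and `[1, -1]` both give 45°, illustrating that this definition does not encode the sense of rotation.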

Figure 2 shows a representative example of loop segmentation based on peak velocity for the three classes. Notice how the $d_{2}$ segment (blue) lengthens with pathology, from $Healthy$ to $sLBBB$. Also note the characteristic rotation of the loop in the $xz$ plane for the $sLBBB$ class.

2.2. Data conditioning for machine learning models

We started from a tabular dataset of 785 rows and 16 columns. Each row represented a unique patient, characterized by 15 features and a label column indicating the class, encoded as 0, 1, and 2 for $Healthy$, $LBBB$, and $sLBBB$, respectively. The class distribution was reasonably balanced (299 $Healthy$, 184 $LBBB$, and 302 $sLBBB$), with $LBBB$ as the smallest class; no synthetic resampling or class-weighting technique was applied, in order to preserve the natural composition of the clinically validated dataset.

Six of the fifteen features were the maximum norms in $d_{1}$ and $d_{2}$ in the $xy$, $xz$ and $yz$ planes. Another six were the QRS angles in $d_{1}$ and $d_{2}$ in every plane, and the last three were time-related: the time of peak velocity $t_{V_{max}}$, the interval from QRS onset to $t_{V_{max}}$ ($t_{V_{max}-QRS_{onset}}$), and the interval from $t_{V_{max}}$ to QRS offset ($QRS_{offset} - t_{V_{max}}$).

Following the feature engineering stage, we proceeded to a multicollinearity detection phase. Multicollinearity is detrimental to machine learning models as it can lead to unstable and unreliable estimates, inflated standard errors, and difficulties in interpreting the importance of individual predictors. To address this, we conducted a correlation analysis of the features and calculated the Variance Inflation Factor (VIF). VIF quantifies how much the variance of a regression coefficient increases due to multicollinearity among predictors, thereby measuring the extent of redundancy in the model.²⁷ High VIF values indicate strong correlations between predictors and suggest the need for feature removal or transformation to stabilize model interpretation.
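The VIF of feature $j$ is $1/(1-R_j^2)$, where $R_j^2$ is the coefficient of determination obtained by regressing feature $j$ on all the other features. A NumPy-only sketch, independent of any particular statistics package, is shown below; values near 1 indicate no redundancy, while large values flag features that are almost linear combinations of the rest.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from an OLS fit of column j on the other columns."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Design matrix: intercept plus every other feature.
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out
```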

2.3. Statistical analysis

To assess statistical differences in the QRS angles $φ_{d_{1}}$ and $φ_{d_{2}}$ in every plane, the maximum norms $‖ v_{d 1 x y}^{max} ‖$ and $‖ v_{d 2 x y}^{max} ‖$ in every plane, and the $t_{V_{max}}$ intervals among $Healthy$ subjects and patients with $LBBB$ or $sLBBB$, we performed a non-parametric Kruskal-Wallis test. When the Kruskal-Wallis test showed significant differences, Dunn's test with Holm-Bonferroni correction was used for post-hoc pairwise comparisons. Statistical significance was set at $p < 0.05$.
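A sketch of this testing pipeline, using SciPy's `kruskal` for the omnibus test and a hand-written Dunn's test (rank-based z-statistics with the standard tie correction) followed by the Holm step-down adjustment. Dedicated packages such as scikit-posthocs provide equivalent routines; this implementation is illustrative and is not the authors' code.

```python
from itertools import combinations

import numpy as np
from scipy.stats import kruskal, norm, rankdata


def holm(pvals) -> np.ndarray:
    """Holm-Bonferroni step-down adjustment of a list of p-values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    adjusted = np.empty(m)
    running_max = 0.0
    for rank, idx in enumerate(np.argsort(p)):
        running_max = max(running_max, (m - rank) * p[idx])  # enforce monotonicity
        adjusted[idx] = min(1.0, running_max)
    return adjusted


def kruskal_dunn(groups: dict):
    """Kruskal-Wallis omnibus test, then Dunn's pairwise z-tests with Holm adjustment.

    groups: {name: 1-D array}. Returns (H, p_kw, {(a, b): adjusted p}).
    """
    names = list(groups)
    samples = [np.asarray(groups[k], dtype=float) for k in names]
    h, p_kw = kruskal(*samples)
    pooled = np.concatenate(samples)
    n_total = pooled.size
    ranks = rankdata(pooled)  # mid-ranks for tied values
    _, ties = np.unique(pooled, return_counts=True)
    tie_term = np.sum(ties**3 - ties) / (12.0 * (n_total - 1))
    edges = np.cumsum([0] + [s.size for s in samples])
    mean_rank = {k: ranks[edges[i]:edges[i + 1]].mean() for i, k in enumerate(names)}
    size = dict(zip(names, (s.size for s in samples)))
    pairs, raw = [], []
    for a, b in combinations(names, 2):
        var = (n_total * (n_total + 1) / 12.0 - tie_term) * (1.0 / size[a] + 1.0 / size[b])
        z = (mean_rank[a] - mean_rank[b]) / np.sqrt(var)
        pairs.append((a, b))
        raw.append(2.0 * norm.sf(abs(z)))  # two-sided p-value
    return h, p_kw, dict(zip(pairs, holm(raw)))
```

Calling `kruskal_dunn({"Healthy": ..., "LBBB": ..., "sLBBB": ...})` on one feature returns the omnibus statistic and the three Holm-adjusted pairwise p-values.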

3. Results

The segmentation of the VCG waveforms and loops produced clearly distinct patterns across $Healthy$, $LBBB$ and $sLBBB$ patients. In healthy subjects, peak velocity occurred in the late portion of the QRS complex, on average 50 ms before QRS offset, which produced $d_{1}$ intervals much longer than their $d_{2}$ counterparts. Conversely, peak velocity occurred earlier in $LBBB$ patients and earlier still in $sLBBB$ patients, on average 91 ms and 134 ms before QRS offset, respectively. Figures 3(a) to 3(c) compare lead $z$ and electrical conduction velocity for the representative patients. Once again, the progressively longer $QRS_{offset} - t_{V_{max}}$ interval is evident from $Healthy$ to $sLBBB$ patients. A significantly extended $d_{2}$ interval might indicate a more proximal disruption in electrical conduction. These findings highlight differences in electrical conduction dynamics among the groups, consistent with the variations observed in the previously described VCG loops.

In addition, to compare the $t_{V_{max}}$-based loop segmentation among the three classes, Figure 2 presents a paradigmatic segmented VCG loop for each cardiac condition: a $Healthy$ subject (Figure 2(b)), an $LBBB$ patient (Figure 2(c)), and an $sLBBB$ patient (Figure 2(d)). In panel 2(b), $d_{1}$ is the longest segment, reflecting preserved fast activation during ventricular depolarization. In contrast, panel 2(c) shows a shorter $d_{1}$ interval, indicating an early onset of conduction delay in this representative $LBBB$ subject. Consistently, panel 2(d) shows an even shorter $d_{1}$ segment, signifying even earlier conduction impairment in a paradigmatic $sLBBB$ patient. The opposite occurred with the $d_{2}$ intervals. Panel 2(b) presents the shortest duration, indicating a rapid transition from $t_{V_{max}}$ to the QRS offset. Conversely, the $d_{2}$ segment is longer in the $LBBB$ patient (panel 2(c)) and longer still in the $sLBBB$ patient (panel 2(d)), suggesting a progressively slower transition. These findings highlight critical differences in QRS complex dynamics among the groups.

Consistently, Figure 4 generalizes Figure 3 to the entire population. Here, $QRS_{offset} - t_{V_{max}}$ increases from $Healthy$ to $sLBBB$, passing through the $LBBB$ class. This measure differed significantly in all pairwise comparisons among the three classes by Dunn's test with Bonferroni correction, following the Kruskal-Wallis test.

Figure 4.

Boxplot representation of time to peak velocity with respect to QRS offset ( $Q R S_{o f f s e t} - t_{V_{max}}$ ) for $H e a l t h y$ , $L B B B$ , and $s L B B B$ patients. $^{*}$ $p < 0.05$ among all classes, from Dunn’s test with Bonferroni correction, following the Kruskal-Wallis test.

Figure 5a shows the boxplots of the maximum norms in the $d_{1}$ (QRS onset to $t_{V_{max}}$) and $d_{2}$ ($t_{V_{max}}$ to QRS offset) segments. These norms were compared among $LBBB$, $sLBBB$, and $Healthy$ subjects using the Kruskal-Wallis test followed by Dunn's post-hoc test. For $d_{1}$, the Kruskal-Wallis test yielded $H = 364.84$ ($p < 5.98 \times 10^{-80}$), indicating significant differences between the groups. Dunn's test with Holm-Bonferroni correction for $d_{1}$ revealed significant differences between $Healthy$ and $LBBB$ subjects (p = 7.24e-30), $Healthy$ and $sLBBB$ (p = 1.65e-78), and $LBBB$ and $sLBBB$ (p = 3.06e-06). For $d_{2}$, the Kruskal-Wallis test showed $H = 93.98$ ($p < 3.91 \times 10^{-21}$). Dunn's test with Holm-Bonferroni correction indicated significant differences between $Healthy$ and $LBBB$ (p = 1.36e-12) and between $Healthy$ and $sLBBB$ (p = 5.26e-19), while no significant difference was observed between $LBBB$ and $sLBBB$ (p = 1.00).

Figure 5.

Boxplots for physiological features in $Healthy$, $LBBB$, and $sLBBB$ classes. (a) Dunn's test with Bonferroni correction following Kruskal-Wallis. d1: H = 364.84, p < 1.00e-79; d2: H = 93.98, p < 1.00e-20. Significant differences noted for $Healthy$ vs $LBBB$ (p = 7.24e-30) and $Healthy$ vs $sLBBB$ (p = 1.65e-78) in d1; $Healthy$ vs $LBBB$ (p = 1.36e-12) and $Healthy$ vs $sLBBB$ (p = 5.26e-19) in d2. (b) Significant differences (p < 0.05) for $Healthy$ vs $LBBB$ (p = 2.48e-59) and $Healthy$ vs $sLBBB$ (p = 1.06e-79), but not $LBBB$ vs $sLBBB$ (p = 1.00).

Analogously, Figure 5b illustrates the statistical analysis of QRS angles in the $xz$ plane among the study groups ($Healthy$, left bundle branch block ($LBBB$), and strict left bundle branch block ($sLBBB$)), evaluated in the two segments $d_{1}$ and $d_{2}$. For $d_{1}$, the Kruskal-Wallis test yielded $H = 434.57$ ($p < 4.32 \times 10^{-95}$), indicating significant differences among groups. Dunn's post-hoc tests showed significant differences between $Healthy$ and $sLBBB$ (p = 1.06e-79) and between $Healthy$ and $LBBB$ (p = 2.48e-59), but not between $LBBB$ and $sLBBB$ (p = 1.00). For $d_{2}$, however, the analysis found no significant differences (p = 0.39).
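As a consistency check, under the null hypothesis the Kruskal-Wallis statistic is approximately chi-square distributed with $k-1 = 2$ degrees of freedom for three groups, so each reported $H$ maps directly to a p-value. For two degrees of freedom the survival function is simply $e^{-H/2}$, which reproduces the p-values quoted above up to rounding.

```python
import math

from scipy.stats import chi2

# Kruskal-Wallis H statistics reported for the three-group comparisons.
REPORTED_H = {"norm d1": 364.84, "norm d2": 93.98, "angle xz d1": 434.57}

for label, h in REPORTED_H.items():
    p_scipy = chi2.sf(h, df=2)   # survival function, k - 1 = 2 degrees of freedom
    p_closed = math.exp(-h / 2)  # closed form valid for 2 degrees of freedom
    print(f"{label}: H = {h:.2f}, p = {p_scipy:.2e} (closed form {p_closed:.2e})")
```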

3.1. Identification of relevant bio-inspired features

In classification models, two or more highly correlated features can cause several issues that affect model performance and interpretation. Our multicollinearity analysis revealed high correlations ($|r| > 0.75$ in some cases) among several maximum norms and among several angles. Consequently, we decided to eliminate some of these features. To determine which ones to remove, we fitted several baseline models (based on XGBoost and Random Forest) and performed a feature importance analysis using various methods (SHAP feature importance,²⁸ Random Forest split-entropy importance²⁹ and univariate selection tests,³⁰ among others). This comprehensive approach retained the most informative features while mitigating the effects of multicollinearity on our models.
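The two-stage redundancy filter described in this section (drop one feature of each pair with $|r| > 0.75$, then discard features with VIF > 50) can be sketched as follows. The text does not specify which member of a correlated pair is dropped; here the later-listed feature is removed, which is an arbitrary assumption. The VIF is read from the diagonal of the inverse correlation matrix, an identity equivalent to $1/(1-R_j^2)$ for a regression with intercept.

```python
import numpy as np


def prune_redundant(X: np.ndarray, names: list, r_max: float = 0.75, vif_max: float = 50.0) -> list:
    """Return the names of the features kept after correlation and VIF filtering."""
    n_feat = X.shape[1]
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = list(range(n_feat))
    for i in range(n_feat):
        if i not in keep:
            continue
        for j in range(i + 1, n_feat):
            # Assumption: drop the later-listed member of each highly correlated pair.
            if j in keep and corr[i, j] > r_max:
                keep.remove(j)
    # VIF_j = [R^{-1}]_jj, where R is the correlation matrix of the kept features.
    r_kept = np.atleast_2d(np.corrcoef(X[:, keep], rowvar=False))
    vif = np.diag(np.linalg.inv(r_kept))
    return [names[k] for k, v in zip(keep, vif) if v <= vif_max]
```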

Feature elimination was conducted in multiple complementary steps to ensure statistical rigor. First, pairwise correlations were inspected, and one feature of each pair with $| r | > 0.75$ was removed. The Variance Inflation Factor (VIF) was then computed for the remaining features, and only those with extremely high VIF values (VIF > 50) were excluded to preserve potentially informative but correlated features for further analysis. Subsequently, several independent methods were applied to evaluate feature relevance:

Random Forest feature importance, which ranks predictors according to their mean contribution to impurity reduction across all decision trees, reflecting how frequently and effectively each variable is used to split the data.

SHAP values, which quantify each feature’s mean marginal contribution to the model prediction, allowing class-specific interpretability.

Univariate selection tests (ANOVA), which assess the statistical association between each feature and the output class.

Recursive Feature Elimination (RFE) with cross-validation, a backward selection procedure that iteratively removes the least important features based on model performance. The cross-validation accuracy curve exhibited a clear inflection (“elbow”) at four features, indicating that this number achieved nearly the highest accuracy while minimizing model complexity.

Additionally, kernel density estimates (KDE) by class were inspected to confirm discriminative patterns, particularly between $L B B B$ and sLBBB. The final subset of four features was selected by consensus among these methods, prioritizing variables recurrently ranked as most important and physiologically interpretable. This multi-method approach aligns with the recommendations of Guyon and Elisseeff,³¹ who emphasize that there is no single universally valid framework for feature selection. Instead, combining complementary statistical, wrapper, and embedded methods provides a more robust and generalizable basis for selecting relevant predictors.

From this analysis, we extracted a reduced set of just 4 features that achieved classification metrics comparable to those of the full 15-feature dataset, as explained in the next section. The reduced subset comprised:

the time from QRS onset to peak velocity ($t_{V_{max}-QRS_{onset}}$)

the time from peak velocity to QRS offset ($QRS_{offset} - t_{V_{max}}$)

the maximum norm of the $d_{1}$ segment in the $x y$ plane ( $‖ v_{d 1 x y}^{max} ‖$ )

the angle of the $d_{1}$ dominant vector in the $x z$ plane ( $φ_{d_{1} x z}$ )

3.2. Models

To evaluate the performance of various machine learning models on our 15-feature dataset, we employed an automated machine learning (AutoML) tool, PyCaret,³² to systematically assess a wide range of algorithms. Tree-based ensemble models such as Random Forest, Gradient Boosting, and XGBoost were prioritized because extensive empirical evidence shows that they outperform deep neural networks on structured, low-dimensional tabular datasets typical of biomedical studies.^33–35 Deep learning architectures were not adopted because our dataset comprised fewer than one thousand subjects, a regime in which such models tend to overfit and offer limited interpretability. The experimental setup used 10-fold cross-validation (CV) together with an 80/20 train-test split. Ensemble decision-tree models and gradient boosting algorithms clearly outperformed the other algorithms in this classification task. Specifically, the Gradient Boosting Classifier (GBC) achieved the highest overall performance, with an accuracy of 0.8324 (matched by its sensitivity), a precision of 0.8382 and an F1-score of 0.8297. The same procedure was applied to the reduced 4-feature dataset described in Section 3.1, with the results shown in Table 1. The outcomes were remarkably similar to those obtained with the full dataset, suggesting that the reduced 4-feature model retains the discriminatory power of the full feature set.
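PyCaret automates this comparison; the sketch below reproduces the general protocol with plain scikit-learn instead (it is not PyCaret's API), on a subset of the models listed in Table 1. It assumes that the 10-fold CV runs on the stratified 80% training split and that recall, precision and F1 are weighted averages. Under weighted averaging, recall equals accuracy by construction, because each class recall is weighted by that class's share of samples, which would explain why the Accuracy and Recall columns of Table 1 coincide.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import ExtraTreesClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split

MODELS = {
    "et": ExtraTreesClassifier(random_state=0),
    "gbc": GradientBoostingClassifier(random_state=0),
    "rf": RandomForestClassifier(random_state=0),
    "lda": LinearDiscriminantAnalysis(),
}
SCORING = ["accuracy", "recall_weighted", "precision_weighted", "f1_weighted"]


def compare_models(X, y, seed: int = 0):
    """80/20 stratified hold-out; 10-fold CV on the training part; rank by mean accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=seed)
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    rows = []
    for name, model in MODELS.items():
        res = cross_validate(model, X_tr, y_tr, cv=cv, scoring=SCORING)
        # Row layout: (abbrev., accuracy, recall, precision, F1), as in Table 1.
        rows.append((name, *(float(np.mean(res[f"test_{s}"])) for s in SCORING)))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows, (X_te, y_te)
```

The held-out 20% is returned untouched so that it can serve for the final evaluation of the tuned model.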

Table 1.
Comparison of 4-feature machine learning models.

Abbrev.	Model	Accuracy	Recall	Prec.	F1
et	Extra Trees Classifier	0.8343	0.8343	0.8331	0.8281
gbc	Gradient Boosting Classifier	0.8270	0.8270	0.8286	0.8244
rf	Random Forest Classifier	0.8269	0.8269	0.8258	0.8211
lda	Linear Discriminant Analysis	0.8216	0.8216	0.8151	0.8135
catboost	CatBoost Classifier	0.8178	0.8178	0.8183	0.8142
xgboost	Extreme Gradient Boosting	0.8124	0.8124	0.8134	0.8095
lightgbm	Light Gradient Boosting Machine	0.8033	0.8033	0.8074	0.8030
qda	Quadratic Discriminant Analysis	0.7997	0.7997	0.7989	0.7896
nb	Naive Bayes	0.7906	0.7906	0.7893	0.7832
ada	Ada Boost Classifier	0.7687	0.7687	0.7820	0.7707
dt	Decision Tree Classifier	0.7505	0.7505	0.7606	0.7528
ridge	Ridge Classifier	0.7233	0.7233	0.6297	0.6456
lr	Logistic Regression	0.7160	0.7160	0.6753	0.6823
knn	K Neighbors Classifier	0.6811	0.6811	0.6496	0.6576
svm	SVM - Linear Kernel	0.6141	0.6141	0.5567	0.5328

The reduction in dimensionality from 15 features to a mere 4, achieved with just a small decrease in classification accuracy, represents a significant advancement in our research. Firstly, it facilitates the development of an explainable model, wherein a concise set of rules can be derived to elucidate individual classification outcomes. This interpretability is particularly important for medical practitioners and specialists who need to comprehend the reasoning behind each prediction. Secondly, the reduced feature space enables a more tractable analysis of feature importance and their global impact on the classification process. To this end, we employed state-of-the-art interpretability techniques such as SHAP (SHapley Additive exPlanations) and analogous methodologies, as will be seen in the next sections.

To obtain the final models, an ensemble of decision trees (Random Forest) was optimized through a hyperparameter search over 864 candidate configurations using 5-fold cross-validation (CV). The results of this fine-tuning for the original (15-feature) and reduced (4-feature) datasets are presented in Table 2. Compared with the 15-feature model, the 4-feature model showed small absolute decreases across metrics (e.g., accuracy 0.87 vs 0.85; macro-F1 0.85 vs 0.83). Once again, the metrics of the two optimized models are remarkably similar, with the 15-feature model being slightly superior.

Table 2.

Comparison of final finetuned models.

Model	Class	Precision	Sensitivity	F1-score	Accuracy (overall)
rf-15feat	0 (Healthy)	0.94	1.00	0.97	0.87
	1 (LBBB)	0.77	0.65	0.71
	2 (sLBBB)	0.84	0.87	0.85
rf-4feat	0 (Healthy)	0.95	0.97	0.96	0.85
	1 (LBBB)	0.72	0.70	0.71
	2 (sLBBB)	0.83	0.83	0.83
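The tuning step described above (grid search with 5-fold CV, then per-class metrics as in Table 2) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic stand-ins for the four features, the deliberately tiny 8-configuration grid is hypothetical (the 864-configuration grid is not specified here), and macro-F1 is assumed as the selection score.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic stand-in for the 4-feature dataset (0=Healthy, 1=LBBB, 2=sLBBB).
X, y = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical, deliberately small grid (2 x 2 x 2 = 8 configurations).
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
    "min_samples_leaf": [1, 3],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="f1_macro",  # assumed selection metric (macro-F1)
)
search.fit(X_train, y_train)

# Per-class precision / recall (sensitivity) / F1 plus overall accuracy.
y_pred = search.predict(X_test)
report = classification_report(y_test, y_pred, output_dict=True)
print(search.best_params_)
print(classification_report(y_test, y_pred, digits=2))
```

Stratified folds keep the class proportions similar in every fold, which matters when one class (here LBBB) is the hardest to separate.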

Although combining weaker classifiers in a meta-algorithmic or ensemble framework could theoretically enhance predictive accuracy, such approaches were not pursued in this work, since our main objective was to maximize accuracy while preserving interpretability. By restricting the model to a compact set of 15 features, further reduced to only 4 with a small performance loss, we ensured compatibility with explainability techniques such as SHAP values and Anchors, which provide a clearer physiological interpretation of model decisions. Increasing model complexity would likely reduce this clarity and hinder the extraction of clinically meaningful rules.

3.3. Explainability

Explainable artificial intelligence (xAI) was used to gain insight into the individual contributions of the four features retained in the final model, and was a central component of the study design. Beyond enhancing interpretability, the goal was to derive rule-based insights grounded in cardiac electrophysiology, enabling clinicians to relate the model's decisions to measurable conduction phenomena. With these tools, we defined the range of values of each feature associated with each class, and quantified feature importance both globally and per class. We also obtained sets of rules that explain the model's decisions for individual instances and for the dataset as a whole. These rules proved very useful, since a physician can apply them directly and simply, without running any computational algorithm.

Using xAI, we found that the most important feature for sLBBB classification was $QRS_{offset} - t_{V_{max}}$, whereas $φ_{d_1 xz}$ was the most important feature for LBBB classification.

To interpret the decision process of the classification models, we used SHAP (SHapley Additive exPlanations) values, which quantify the contribution of each feature to the model's output for each individual prediction. Based on cooperative game theory, SHAP assigns each feature an additive importance value equal to a weighted average of the change in model output when that feature is added to every possible subset of the other features.

For each patient, the model prediction $f(x)$ was decomposed as

$$f(x) = E[f(x)] + \sum_{i} \mathrm{SHAP}_{i},$$

where $E[f(x)]$ is the mean model output (baseline prediction) and $\mathrm{SHAP}_{i}$ represents the contribution of the $i$-th feature to the deviation from the mean prediction.

In this study, SHAP values were computed using the TreeExplainer algorithm for the Random Forest model trained on the four selected features: $t_{V_{max}} - t_{QRS_{onset}}$, $QRS_{offset} - t_{V_{max}}$, $‖v_{d_1 xy}^{max}‖$ and $φ_{d_1 xz}$.
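The additive decomposition above can be checked numerically with a brute-force Shapley computation, which is feasible for only four features (16 feature subsets). This is an illustrative sketch, not TreeExplainer: it assumes synthetic data, a small Random Forest, a background set of 50 samples defining $E[f(x)]$, and "missing" features replaced by background values (an interventional value function). Its values will generally differ from TreeExplainer's default path-dependent estimates, but both satisfy the same additivity property.

```python
from itertools import combinations
from math import factorial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(
    n_samples=200, n_features=4, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=1,
)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
background = X[:50]   # reference samples defining the baseline E[f(x)]
x = X[100]            # instance ("patient") to explain
target_class = 2      # explain the predicted probability of class 2

def value(subset):
    """Mean class probability with features in `subset` fixed to x and the
    remaining features taken from the background samples."""
    Z = background.copy()
    idx = list(subset)
    Z[:, idx] = x[idx]
    return model.predict_proba(Z)[:, target_class].mean()

def exact_shapley(n_features):
    """Exact Shapley values by enumerating every coalition of the other features."""
    phi = np.zeros(n_features)
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            w = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for S in combinations(others, size):
                phi[i] += w * (value(S + (i,)) - value(S))
    return phi

phi = exact_shapley(X.shape[1])
base = value(())  # E[f(x)] over the background set
f_x = model.predict_proba(x.reshape(1, -1))[0, target_class]
print("baseline:", base, "sum of contributions:", phi.sum(), "f(x):", f_x)
```

The printed baseline plus the sum of contributions reproduces $f(x)$, which is exactly the decomposition $f(x) = E[f(x)] + \sum_i \mathrm{SHAP}_i$ for one class output.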

Figure 6 shows feature importance by class for two different models, with the contribution to each class grouped and labeled in the legend. For each feature, the length of the bar segment assigned to a class (Healthy, LBBB or sLBBB) is proportional to that feature's discriminative power for the class. The ranking was largely independent of the model: both XGBoost and Random Forest identified $QRS_{offset} - t_{V_{max}}$ as the most important feature for sLBBB classification and $φ_{d_1 xz}$ as the most important feature for LBBB. For the third class, Healthy, however, the two models disagreed on the most important feature.

Figure 6.

Feature importances by class for a) Random Forest and b) XGBoost classifiers. Predictive power is represented by the length of the bars for each class (grouped and labeled in the legend). Note the consistency of feature importance across both models.

Explainable techniques also make it possible to infer the most likely class of a given observation from the range of values taken by each of the four features. This relationship is illustrated by the SHAP dependence plots, in which distinct classes occupy different regions of the feature space.

To further explore these inter-class differences, Figure 7 presents the distribution of $QRS_{offset} - t_{V_{max}}$ across the three classes. The distribution shifts progressively toward higher values from Healthy to LBBB and then to sLBBB. In particular, the Healthy class is characterized by values between 0 and 0.08 s, the LBBB class by values between 0.08 and 0.10 s, and the sLBBB class by values from 0.10 to 0.16 s. These ranges are consistent with the simplified decision rules identified for the sLBBB class (see Table 3).
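As a purely illustrative sketch, the class ranges quoted above can be written as a univariate lookup on $QRS_{offset} - t_{V_{max}}$ (in seconds). This is not the trained model: it ignores the other three features, the boundaries (0.08 s and 0.10 s) are taken from the ranges stated in the text, and the Figure 7 caption gives slightly different cut-points, so values near a boundary are genuinely ambiguous.

```python
import numpy as np

CLASSES = np.array(["Healthy", "LBBB", "sLBBB"])
# Boundaries (s) from the ranges quoted in the text:
# Healthy < 0.08 s <= LBBB < 0.10 s <= sLBBB
BOUNDARIES = [0.08, 0.10]

def most_likely_class(qrs_offset_minus_tvmax):
    """Univariate lookup on QRS_offset - t_Vmax (seconds); ignores the other features."""
    bins = np.digitize(qrs_offset_minus_tvmax, BOUNDARIES)  # 0, 1 or 2
    return CLASSES[bins]

print(most_likely_class([0.05, 0.09, 0.13]))
```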

Figure 7.

Dependence plots for feature $QRS_{offset} - t_{V_{max}}$ in the 4-feature Random Forest model, for each of the three classes. Positive SHAP values indicate that the corresponding feature values increase the model's likelihood of predicting that class. Here, small values of $QRS_{offset} - t_{V_{max}}$ (below 0.07 s) are positively associated with the Healthy class, moderate values (0.07–0.11 s) contribute to LBBB, and large values (above 0.12 s) contribute positively to sLBBB prediction while decreasing the probability of the other classes.

Table 3.

Anchors for class sLBBB.

Anchor	Precision	Coverage	# samples
$[QRS_{offset} - t_{V_{max}} > 0.12]$	0.974359	0.2237	154
$[-7.94 < φ_{d_1 xz} \leq 63.25,\ QRS_{offset} - t_{V_{max}} > 0.08,\ t_{V_{max}} - QRS_{onset} \leq 0.04,\ ‖v_{d_1 xy}^{max}‖ \leq 0.40]$	0.960317	0.0507	37
$[φ_{d_1 xz} \leq 37.37,\ QRS_{offset} - t_{V_{max}} > 0.08,\ t_{V_{max}} - QRS_{onset} \leq 0.04,\ ‖v_{d_1 xy}^{max}‖ \leq 0.40]$	0.955446	0.0221	11
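The three sLBBB anchors in Table 3 are plain conjunctions of thresholds, so they can be evaluated directly as predicates. The sketch below assumes the units implied by the text (seconds for the two time intervals, degrees for $φ_{d_1 xz}$); the units of $‖v_{d_1 xy}^{max}‖$ are assumed to match the dataset's, and the `Sample` container and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    tvmax_minus_onset: float   # t_Vmax - QRS_onset (s)
    offset_minus_tvmax: float  # QRS_offset - t_Vmax (s)
    norm_d1_xy: float          # ||v_d1xy^max|| (dataset units, assumed)
    phi_d1_xz: float           # phi_d1xz (degrees)

def slbbb_anchors(s: Sample) -> list:
    """Evaluate the three sLBBB anchors of Table 3 for one sample."""
    # Conditions shared by the second and third anchors.
    common = (s.offset_minus_tvmax > 0.08
              and s.tvmax_minus_onset <= 0.04
              and s.norm_d1_xy <= 0.40)
    return [
        s.offset_minus_tvmax > 0.12,
        -7.94 < s.phi_d1_xz <= 63.25 and common,
        s.phi_d1_xz <= 37.37 and common,
    ]

def predicted_slbbb_by_anchor(s: Sample) -> bool:
    """True if any sLBBB anchor holds for the sample."""
    return any(slbbb_anchors(s))
```

A sample that satisfies none of the anchors is not thereby assigned another class; it simply falls outside the coverage of these three rules.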

3.4. Clustering

To evaluate whether these four physiological features group naturally in a way that mirrors the Strauss criteria, we applied a Spectral Clustering algorithm to them with three clusters. The result is displayed in Figure 8, where samples are colored by class/cluster for the feature pairs {$t_{V_{max}} - QRS_{onset}$, $QRS_{offset} - t_{V_{max}}$} (upper panels) and {$φ_{d_1 xz}$, $‖v_{d_1 xy}^{max}‖$} (lower panels). Visually, the clusters found by spectral clustering closely resembled the classes defined by the Strauss criteria. To quantify this natural grouping, we computed the centroid of each cluster and compared it with the centroid of the corresponding Strauss class. For the Healthy class, the true-class and cluster centroids coincided for both feature pairs (upper and lower panels). In contrast, the true-class and cluster centroids for LBBB (class 1) were the most distant (Figure 8, black square markers).
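The centroid comparison described above can be sketched as follows. Because spectral clustering returns arbitrary cluster ids, each cluster must first be matched to a true class; this sketch assumes a maximum-overlap matching via the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`), synthetic blob data in place of the four VCG features, and a nearest-neighbor affinity. None of these choices are confirmed as the authors' settings.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 3 groups in a 4-feature space (true labels 0, 1, 2).
X, y_true = make_blobs(n_samples=300, n_features=4, centers=3,
                       cluster_std=1.0, random_state=0)
X = StandardScaler().fit_transform(X)

clusters = SpectralClustering(
    n_clusters=3, affinity="nearest_neighbors", n_neighbors=10, random_state=0
).fit_predict(X)

# Cluster ids are arbitrary: match clusters to classes by maximal overlap.
overlap = np.zeros((3, 3), dtype=int)
for c, k in zip(y_true, clusters):
    overlap[c, k] += 1
rows, cols = linear_sum_assignment(-overlap)  # maximize matched samples
cluster_of_class = dict(zip(rows.tolist(), cols.tolist()))

# Euclidean distance between each true-class centroid and its matched cluster centroid.
distances = {}
for c in range(3):
    true_centroid = X[y_true == c].mean(axis=0)
    cluster_centroid = X[clusters == cluster_of_class[c]].mean(axis=0)
    distances[c] = float(np.linalg.norm(true_centroid - cluster_centroid))
print(distances)
```

A small distance for a class means the unsupervised grouping reproduces that class; a large one, as reported for LBBB, means the cluster absorbs samples from neighboring classes.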

Figure 8.

Spectral clustering of Healthy, LBBB and sLBBB patients based on the four physiological features. Samples are colored by class: Healthy (class/cluster 0), LBBB (class/cluster 1) and sLBBB (class/cluster 2). Centroids are marked by solid squares. Note that the true-class and cluster centroids nearly coincide for classes/clusters 0 and 2, but not for class/cluster 1.

3.5. Anchors

The anchor method³⁶ is an xAI technique that explains individual predictions of any black-box classification model. Anchors provide easily understood, rule-like explanations by highlighting the parts of the input data that are sufficient to “anchor” the model's prediction: as long as the rule holds, changes in the feature values not included in its predicates do not, with high probability, alter the prediction.

The method is explicitly designed to be highly precise: when a generated rule is satisfied, the model returns the same prediction with a (customizable) precision level.

As an example, this is the anchor obtained for sample 1 in our dataset:

\begin{aligned} &\text{IF } (QRS_{offset} - t_{V_{max}} \leq 0.04 \text{ AND } φ_{d_1 xz} \leq 37.37^{\circ}) \text{ THEN} \\ &\text{PREDICT: Healthy} \\ &\text{WITH PRECISION: } 0.9845 \\ &\text{AND COVERAGE: } 0.2428 \end{aligned}

Precision is the fraction of samples satisfying the anchor for which the model returns the anchored prediction. In this case, approximately 98% of the samples for which the anchor holds are classified as Healthy. Coverage is the proportion of instances that satisfy the anchor conditions. By taking coverage into account, Anchors aim to produce explanations that are not only precise but also general enough to be meaningful and actionable for human interpretation.
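The two quantities can be computed empirically for any rule over a finite dataset, as in the minimal sketch below. It assumes the rule is a Python predicate over a feature row and that `y_pred` holds the model's predictions; the toy data are hypothetical. Note that the Anchors algorithm itself estimates precision on perturbed samples drawn under the rule rather than only on the observed data.

```python
import numpy as np

def anchor_precision_coverage(X, y_pred, rule, target):
    """Precision: fraction of rule-satisfying samples predicted as `target`.
    Coverage: fraction of all samples that satisfy the rule."""
    mask = np.array([rule(row) for row in X], dtype=bool)
    coverage = mask.mean()
    precision = (y_pred[mask] == target).mean() if mask.any() else float("nan")
    return precision, coverage

# Hypothetical toy data: columns = [QRS_offset - t_Vmax (s), phi_d1xz (deg)]
X = np.array([[0.03, 10.0], [0.02, 30.0], [0.03, 50.0], [0.09, 20.0], [0.13, 40.0]])
y_pred = np.array(["Healthy", "Healthy", "Healthy", "LBBB", "sLBBB"])

# Rule from the example anchor: QRS_offset - t_Vmax <= 0.04 AND phi_d1xz <= 37.37
rule = lambda row: row[0] <= 0.04 and row[1] <= 37.37
precision, coverage = anchor_precision_coverage(X, y_pred, rule, "Healthy")
print(precision, coverage)
```

Here two of the five samples satisfy the rule (coverage 0.4), and both are predicted Healthy (precision 1.0).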

Anchor rules were generated in a stratified cross-validation setting to maximize coverage subject to a minimum precision, and were then consolidated by removing duplicates and retaining the rules with precision above 0.95 and the highest coverage. The thresholds (e.g., $QRS_{offset} - t_{V_{max}} > 0.12$ s; $φ_{d_1 xz} \leq 37.37^{\circ}$) arise from this optimization, not from manual tuning. Table 3 shows the precision, coverage, and number of supporting samples for each rule.

However, anchor explanations provide local interpretability: they explain individual predictions rather than the overall behavior of the model. To derive global explanations from these local, anchor-based interpretations, we employed a novel consolidation approach. First, we generated anchors for all samples in the dataset using the Anchor algorithm, which yielded a total of 210 distinct anchors. These local explanations were then aggregated into a reduced set of consolidated anchors representative of the entire dataset.

The rule consolidation process involved several steps: removing duplicates and the anchors of incorrectly predicted samples, and retaining only rules with over 95% precision and the highest coverage. This resulted in just 27 rules (10, 14 and 3 for the Healthy, LBBB and sLBBB classes, respectively). The consolidated anchors for the sLBBB class are shown in Table 3, together with their precision, coverage and the number of sLBBB samples satisfying each rule. For example, 154 samples (coverage 0.22) are classified as sLBBB, with a precision of 0.97, based solely on the condition $QRS_{offset} - t_{V_{max}} > 0.12$ s. This is consistent with Figure 6, which identifies $QRS_{offset} - t_{V_{max}}$ as the most important feature for the sLBBB class. Note also that with a more relaxed threshold on $QRS_{offset} - t_{V_{max}}$, the remaining three features must be incorporated as well, producing rules with lower coverage. These feature combinations essentially state that (i) the angle from the horizontal in the xz plane should lie roughly within the first quadrant, which anatomically corresponds to the posterior left quadrant ($-7.94 < φ_{d_1 xz} \leq 63.25$); (ii) the velocity peak should occur closer to the beginning than to the end of the QRS complex ($QRS_{offset} - t_{V_{max}} > 0.08$ and $t_{V_{max}} - QRS_{onset} \leq 0.04$); and (iii) the maximum instantaneous vector of the first loop segment in the xy plane should not exceed 0.40 ($‖v_{d_1 xy}^{max}‖ \leq 0.40$). It is worth remarking that the third rule in Table 3 admits negative angles for $φ_{d_1 xz}$, which might seem inconsistent with the sLBBB definition. However, this rule covers just 11 patients, represented by the outliers shown in Figure 5b (sLBBB class). This reminds us that anchors only explain the model's predictions based on the available data, whether or not those data are “good enough”. Anchors simply construct a predictive rule from these data and constrain its applicability with a coverage score.
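The consolidation steps listed above (drop anchors of misclassified samples, drop duplicates, keep rules with precision above 0.95, rank by coverage within each class) can be sketched as a filter over anchor records. This is a hypothetical reconstruction of the described procedure, not the authors' code: the `Anchor` record, its field names, and the example anchors (precision and coverage values partly borrowed from Table 3 and the sample 1 anchor) are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Anchor:
    predicates: frozenset  # e.g. frozenset({"QRS_offset - t_Vmax > 0.12"})
    label: str             # class predicted for the explained sample
    precision: float
    coverage: float
    correct: bool          # was the explained sample correctly classified?

def consolidate(anchors, min_precision=0.95):
    """Hypothetical consolidation of local anchors into per-class global rules."""
    unique = {}
    for a in anchors:
        # Drop anchors of misclassified samples and low-precision anchors.
        if not a.correct or a.precision <= min_precision:
            continue
        key = (a.predicates, a.label)  # duplicate = same predicates, same class
        if key not in unique or a.coverage > unique[key].coverage:
            unique[key] = a
    by_class = {}
    for a in unique.values():
        by_class.setdefault(a.label, []).append(a)
    # Within each class, list the most general (highest-coverage) rules first.
    return {label: sorted(rules, key=lambda r: r.coverage, reverse=True)
            for label, rules in by_class.items()}

r1 = frozenset({"QRS_offset - t_Vmax > 0.12"})
r2 = frozenset({"QRS_offset - t_Vmax <= 0.04", "phi_d1xz <= 37.37"})
result = consolidate([
    Anchor(r1, "sLBBB", 0.974, 0.224, True),
    Anchor(r1, "sLBBB", 0.974, 0.224, True),                     # duplicate
    Anchor(frozenset({"phi_d1xz <= 37.37"}), "sLBBB", 0.90, 0.30, True),  # low precision
    Anchor(frozenset({"QRS_offset - t_Vmax <= 0.04"}), "Healthy", 0.99, 0.20, False),  # misclassified
    Anchor(r2, "Healthy", 0.9845, 0.2428, True),
])
print({label: len(rules) for label, rules in result.items()})
```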

4. Discussion

Attempts to improve the response to CRT by enhancing the stimulation waveform, using biphasic configurations⁴ and designs inspired by neurostimulators,³⁷ have not been successful. Today, a heavily explored approach for refining CRT aims at improving the diagnosis of LBBB by means of the stricter definition proposed empirically by Strauss.¹ The strict LBBB criteria defined by Strauss in 2011 focus on specific electrocardiographic (ECG) characteristics that predict a better response to CRT. These criteria help identify the patients most likely to benefit from CRT, namely those with a true conduction block in the left bundle branch rather than other causes of prolonged QRS duration. They were designed to differentiate between regular and “strict” LBBB based on patient outcomes from the MADIT-CRT trial.

In this paper, we sought VCG-based variables that reveal the pathophysiological differences, if any, underlying the Strauss criteria. To do this, we split the QRS loop into two segments at the time of peak velocity. This procedure divided the QRS loop into a first segment, inscribed during preserved conduction, and a second one, inscribed during much slower, non-specific conduction. On these two segments, we analyzed the time of peak velocity referred to the QRS fiducial points ($QRS_{offset} - t_{V_{max}}$, $t_{V_{max}} - t_{QRS_{onset}}$), and the angles ($φ_{d_1 xz}$, $φ_{d_1 xy}$, $φ_{d_1 yz}$) and moduli ($‖v_{d_1 xy}^{max}‖$, $‖v_{d_1 xz}^{max}‖$, $‖v_{d_1 yz}^{max}‖$) of the instantaneous vector on both loop segments in each plane. Among these variables, the same four parameters provided the best classification performance with both the XGBoost and the Random Forest models: $QRS_{offset} - t_{V_{max}}$, $t_{V_{max}} - t_{QRS_{onset}}$, $‖v_{d_1 xy}^{max}‖$ and $φ_{d_1 xz}$ (see Figure 6).

From the vectorcardiographic point of view, a $QRS_{offset} - t_{V_{max}}$ that increases progressively from Healthy subjects to sLBBB, passing through LBBB, reflects the depolarization sequence of the three groups in a very intuitive way (Figure 4). Under normal conditions, the electrical activation of both ventricles starts from their endocardial regions and spreads to the rest of the myocardium until depolarization is completed in the basal septal and basal posterolateral segments of the left ventricle (LV), 80 ms after it started. Conversely, in advanced left bundle branch block, the activation sequence starts in the right ventricle and must progress towards the LV through the interventricular septum (transseptal barrier) until it reaches the LV endocardium, which takes about 50 ms. The impulse takes another 40–50 ms, from the moment it reaches the left Purkinje system and the endocardium of the high septum and basal posterolateral segment, to complete the depolarization of these segments, 140–150 ms after the process started. According to Pérez-Riera et al.,³⁸ four phases are recorded in the vectorcardiographic analysis of the ventricular activation sequence during LBBB. Phase III corresponds to the activation of the high septum and the basal posterolateral segment of the LV, and is responsible for the initial apex of the R wave in the left leads of the electrocardiogram and for the nadir of the S wave in $V_1$–$V_2$. This vector is inscribed slowly and largely accounts for the delay of the mid-final forces of the ECG (50 ms). The last phase (IV) corresponds to the afferent branch of the QRS loop and is also slowly inscribed. In the ECG, it represents the second apex of the R wave in the left leads and the second nadir of the S wave in $V_1$–$V_2$. These last two phases explain the notching or slurring in these electrocardiographic leads. In the present work, $QRS_{offset} - t_{V_{max}}$ captures this mid-final delay following phase II of the VCG and quantifies it precisely. For LBBB, in which the mid-late delay (phases III–IV) is less pronounced, residual conduction in the left bundle branch, left ventricular hypertrophy, or a combination of left anterior hemiblock of the His bundle with left ventricular hypertrophy could explain the findings for this group in the present work.³⁹

Furthermore, the lack of statistical significance observed for the $d_2$ angles (p = 0.39), in stark contrast to the highly significant $d_1$ angles, suggests that the primary ventricular conduction disturbances differentiating these groups occur early in the depolarization process. The $d_1$ segment captures this initial, anomalous activation pathway, while the $d_2$ segment represents the later, slower non-specific conduction phase, which appears to be morphologically similar (in the xz plane) across all LBBB types once the main block is established.
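The segmentation recalled above can be sketched from the three VCG leads. This is a hypothetical illustration, not the authors' implementation: it assumes QRS onset/offset sample indices are already delineated, takes the velocity as the Euclidean norm of the first differences of x, y and z, places $t_{V_{max}}$ at the start of the fastest step, and defines $‖v_{d_1 xy}^{max}‖$ and $φ_{d_1 xz}$ as the largest $d_1$ vector in each plane (angle via `atan2(z, x)`). Lead orientation and sign conventions, especially for z, vary between VCG systems.

```python
import numpy as np

def split_qrs_features(x, y, z, fs, onset, offset):
    """Hypothetical sketch: split the QRS loop at the spatial-velocity peak and
    return (t_Vmax - onset, offset - t_Vmax, ||v_d1xy^max||, phi_d1xz in degrees)."""
    qrs = np.column_stack([x, y, z])[onset:offset + 1]
    # Spatial velocity from discrete derivatives of the x, y, z leads.
    velocity = np.linalg.norm(np.diff(qrs, axis=0), axis=1) * fs
    k = int(np.argmax(velocity))  # sample index of V_max within the QRS
    t_vmax_minus_onset = k / fs
    offset_minus_t_vmax = (len(qrs) - 1 - k) / fs
    d1 = qrs[:k + 1]  # early segment of the loop (up to the velocity peak)
    # Largest d1 vector projected on the frontal (xy) plane.
    norm_d1_xy = np.linalg.norm(d1[:, :2], axis=1).max()
    # Angle of the largest d1 vector in the horizontal (xz) plane.
    i_xz = int(np.argmax(np.linalg.norm(d1[:, [0, 2]], axis=1)))
    phi_d1_xz = np.degrees(np.arctan2(d1[i_xz, 2], d1[i_xz, 0]))
    return t_vmax_minus_onset, offset_minus_t_vmax, norm_d1_xy, phi_d1_xz

# Synthetic loop (hypothetical): slow rise, one fast jump, slow return.
fs = 1000.0
x = np.zeros(101)
x[:30] = np.arange(30) * 0.01
x[30:] = 1.0 - np.arange(71) * 0.01
y, z = 0.5 * x, 0.2 * x
features = split_qrs_features(x, y, z, fs, onset=0, offset=100)
print(features)
```

In this toy loop the velocity peak falls at sample 29, so the peak occurs 29 ms after onset and 71 ms before offset.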

On the other hand, the different AI tools implemented in this work support the same concept: the LBBB class appears to be an intermediate state between the Healthy and sLBBB conditions. This is reflected in several outcomes. First, all AI models yielded the poorest classification metrics for this class (see Table 2). Second, it showed the lowest importance values (see Figure 6). Finally, it displayed the most distant centroids in the clustering analysis (Figure 8), in which the four physiological features were grouped in an unsupervised manner and contrasted with the actual Strauss classes. Figure 8 shows that the features clustered similarly to the actual classes, with true-class and cluster centroids fairly close for all classes except LBBB (class 1/cluster 1). This suggests that the LBBB class may be heterogeneous rather than a single pathophysiological entity. Future work should explore sub-phenotyping within LBBB using richer clinical covariates.

It is worth noting that the ternary classification performance of the models tested here, in terms of accuracy, sensitivity and precision, is comparable to the binary classification performance compiled by the International Society for Computerized Electrocardiology in its initiative for the automated detection of strict LBBB.² The best accuracy reported among the 7 participating groups was 82% (with 69% sensitivity and 87% precision).⁴⁰ Moreover, with only these 4 features, we obtained better metrics than a previous work that used sets of 7 or 19 physiological features in a ternary classification scheme.¹⁷ It is crucial to build good models when attempting to explain them: explainable AI is useful and reliable only when the underlying AI models are themselves useful and reliable.¹²

4.1. Study limitations and future work

It is also important to contextualize our methodological approach. The primary goal of this study was to isolate and validate a set of novel VCG features as fundamental pathophysiological markers. To this end, the analysis was based on a single representative beat per patient. This establishes a clear baseline: by controlling for inter-beat dynamics, the observed discriminatory power can be attributed to the proposed features themselves. While this approach suffices for the stated objective, the analysis of multiple beats is a valuable next step, and the study of beat-to-beat variability builds directly on the present results. Analyzing the variance of our features could reveal “meta-features” carrying clinical information about the stability of the block or the severity of the pathology, a promising line of future research that stems directly from our findings.

Finally, a novel xAI approach was explored to provide a simplified set of rules for quickly classifying the three classes from the four physiological features (see Table 3). Note that Anchors do not necessarily include all features in the dataset, but focus on a select few. This gives the rules governing model predictions a sense of hierarchical importance. Moreover, consolidated Anchor rules identify subsets of feature values that are sufficient conditions for the model to return a particular classification, providing a more granular understanding of feature interactions and their impact on model outcomes. We believe that this technique can enhance the transparency and interpretability of AI models used in electrocardiology. However, this study is retrospective and draws on two existing datasets with subsampling and occasional manual delineation, so its generalizability beyond these cohorts is unknown. In particular, it would be desirable to correlate the temporal features $t_{V_{max}} - QRS_{onset}$ and $QRS_{offset} - t_{V_{max}}$ with the activation patterns recorded during coronary mapping in an electrophysiological study, to achieve a more robust experimental validation. Thus, before any clinical implementation, prospective, multi-center validation with pre-registered analysis is required, including calibration, decision-curve analysis, and assessment of net clinical benefit.

Additionally, future work will include a systematic analysis of misclassified cases to investigate whether these errors reflect physiological overlap or outlier behavior within the $L B B B$ and $s L B B B$ classes. Such analysis may help refine feature boundaries and improve the physiological interpretability of the classification model.

5. Conclusions

This study advances the understanding of the physiological differences between regular and strict left bundle branch block by analyzing vectorcardiographic data with machine learning and explainable AI techniques, in line with recent research in the International Journal of Neural Systems that emphasizes explainable AI in medical applications.¹³⁻¹⁶ Our work connects to the journal's recent focus on transparent machine learning for clinical decision support across various medical domains, from EEG-based impulsivity classification to cognitive impairment assessment and neonatal seizure detection.

This explainable AI framework offers several contributions to clinical practice. First, it improves diagnostic discrimination: the model identifies four key physiological features (two conduction velocity-related time intervals, the maximum norm of the early VCG loop in the frontal plane, and the QRS angle in the horizontal plane) that allow robust differentiation between LBBB, sLBBB and Healthy subjects, with accuracy comparable to more complex models but with greater interpretability. Second, it generates simple and verifiable clinical rules: thanks to explainable artificial intelligence (xAI) techniques, clear and practically applicable rules are derived that facilitate clinical decision-making without depending on “black box” models, similar to recent xAI approaches published in this journal.¹⁴,¹⁵ In particular, these four features compose a set of three simple rules for the sLBBB class, with a mean precision of 96% over the samples fulfilling each rule.

Our methodological approach builds upon established computational cardiology research⁷,⁸ while advancing the field through explainable AI. For future extensions of this research, we plan to explore recently developed powerful classification algorithms such as Neural Dynamic Classification,¹⁹ Finite Element Machines for fast learning,²⁰ Dynamic Ensemble Learning Algorithms,²¹ and self-supervised learning approaches²² that have shown promising results in electrophysiological signal analysis. These advanced computational techniques could further enhance the accuracy and robustness of our classification framework while maintaining clinical interpretability.

The clinical implications of this work include potential improvements in selecting CRT candidates by accurately identifying sLBBB, which best predicts response to CRT, thereby helping select patients with the greatest likelihood of benefit and avoiding unnecessary interventions. More precise classification may also help identify candidates for other cardiac pacing modalities, such as His bundle pacing. While this work remains a retrospective analysis requiring prospective validation, it provides a quantitative and transparent framework that could improve the diagnosis of LBBB and sLBBB, facilitating accurate selection of CRT candidates and promoting personalized, evidence-based medicine.

The submission of this work to the International Journal of Neural Systems is justified by its alignment with the journal’s recent emphasis on explainable AI in healthcare, its connection to advanced neural systems approaches for medical signal analysis, and its contribution to the growing body of research that bridges machine learning innovation with clinical applicability in cardiovascular medicine.

Footnotes

Acknowledgments

This project was funded in part by grants DTS19/00175 and PDC2022-133952-100 funded by the Spanish “Ministerio de Ciencia, Innovación y Universidades”, by the European Union's Horizon 2020 Research and Innovation Programme MSCA-SE EPISTEAM, and by the Programa Iberoamericano de Ciencia/Tecnología para el Desarrollo (CYTED) (Red 225RT0169). This research has been funded by a PhD scholarship from the National Council of Science and Technology (CONICET).

Funding

The author(s) received no financial support for the research, authorship and/or publication of this article.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

ORCID iDs

Javier Garrigós

José Manuel Ferrández

María Paula Bonomini

References

1. Strauss, Selvester, Wagner. Defining left bundle branch block in the era of cardiac resynchronization therapy. Am J Cardiol 2011; 107: 927–934.

2. Zusterzeel, Vicente, Ochoa-Jimenez, et al. The 43rd International Society for Computerized Electrocardiology ECG initiative for the automated detection of strict left bundle branch block. J Electrocardiol 2018; 51: S25–S30.

3. Pérez-Riera, Barbosa-Barros, Daminello-Raimundo, et al. Re-evaluating the electro-vectorcardiographic criteria for left bundle branch block. Ann Noninvas Electrocardiol 2019; 24: e12644.

4. Ortega, Barja, Logarzo, et al. Nonselective His bundle pacing with a biphasic waveform: enhancing septal resynchronization. EP Europace 2018; 20: 816–822.

5. Ortega, Logarzo, Barja, et al. Novel implant technique for septal pacing: a noninvasive approach to nonselective His bundle pacing. J Electrocardiol 2020; 63: 35–40.

6. Ansari, Mourad, Qaraqe, et al. Deep learning for ECG arrhythmia detection and classification: an overview of progress for period 2017–2023. Front Physiol 2023; 14: 1246746.

7. Martis, Acharya, Adeli. Current methods in electrocardiogram characterization. Comput Biol Med 2014; 48: 133–149.

8. Sankari, Adeli. HeartSaver: a mobile cardiac monitoring system for auto-detection of atrial fibrillation, myocardial infarction and atrio-ventricular block. Comput Biol Med 2011; 41: 211–220.

9. Amezquita-Sanchez, Valtierra-Rodriguez, Adeli, et al. A novel wavelet transform-homogeneity model for sudden cardiac death prediction using ECG signals. J Med Syst 2018; 42: 176.

10. Murugappan, Murugesan, Jerritta, et al. Sudden cardiac arrest (SCA) prediction using ECG morphological features. Arabian J Sci Eng 2021; 46: 947–961.

11. Ayano, Schwenker, Dufera, et al. Interpretable machine learning techniques in ECG-based heart disease classification: a systematic review. Diagnostics 2023; 13: 111.

12. Górriz, Álvarez Illán, Álvarez Marquina, et al. Computational approaches to explainable artificial intelligence: advances in theory, applications and trends. Information Fusion 2023; 100: 101945.

13. Hüpen, Habel, Shymanskaya, et al. Impulsivity classification using EEG power and explainable machine learning. Inter J Neural Syst 2023; 33: 2350006.

14. Jiménez-Mesa, Arco, Valentí-Soler, et al. Using explainable artificial intelligence in the clock drawing test to reveal the cognitive impairment pattern. Inter J Neural Syst 2023; 33: 2350015.

15. Raeisi, Khazaei, Tamburro, et al. A class-imbalance aware and explainable spatio-temporal graph attention network for neonatal seizure detection. Inter J Neural Syst 2023; 33: 2350036.

16. Mercaldo, Di Giammarco, Ravelli, et al. Alzheimer's disease evaluation through visual explainability by means of convolutional neural networks. Inter J Neural Syst 2024; 34: 2450007.

17. Macas, Garrigós, Martínez, et al. An explainable machine learning system for left bundle branch block detection and classification. Integr Comput Aided Eng 2024; 31: 43–58.

18. Macas Ordóñez BdC, Orellana Villavicencio D, Suing Ochoa, et al. Graph theory and its potential in the automatic detection of left bundle branch block. Integr Comput Aided Eng 2025; 32: 424–442.

19. Rafiei, Adeli. A new neural dynamic classification algorithm. IEEE Trans Neural Networks Learn Syst 2017; 28: 3074–3083.

20. Pereira, Piteri, Souza, et al. FEMa: a finite element machine for fast learning. Neural Comput Appl 2020; 32: 6393–6404.

21. Alam, Siddique, Adeli. A dynamic ensemble learning algorithm for neural networks. Neural Comput Appl 2020; 32: 8675–8690.

22. Rafiei, Gauthier, Adeli, et al. Self-supervised learning for electroencephalography. IEEE Trans Neural Networks Learn Syst 2024; 35: 1457–1471.

23. Moss, Brown, Cannom, et al. Multicenter Automatic Defibrillator Implantation Trial-Cardiac Resynchronization Therapy (MADIT-CRT): design and clinical protocol. Ann Noninvas Electrocardiol 2005; 10: 34–43.

24. Zheng, Guo, Chu. A large scale 12-lead electrocardiogram database for arrhythmia study (version 1.0.0). http://physionet.org/content/ecg-arrhythmia/1.0.0/, 2022.

25. Edenbrandt, Pahlm. Vectorcardiogram synthesized from a 12-lead ECG: superiority of the inverse Dower matrix. J Electrocardiol 1988; 21: 361–367.

26. Ledezma. WTdelineator, 2021. https://github.com/caledezma/WTdelineator.

27. Daniel, Wood. Fitting Equations to Data: Computer Analysis of Multifactor Data. New York: John Wiley & Sons, 1971.

28. Lundberg, Lee. A unified approach to interpreting model predictions. In: Guyon I, Luxburg UV, Bengio S et al. (eds.) Advances in Neural Information Processing Systems 30. Curran Associates, Inc., 2017, pp.4765–4774.

29. Menze, Kelm, Masuch, et al. A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data. BMC Bioinformat 2009; 10: 213.

30. Jovic, Brkic, Bogunovic. A review of feature selection methods with applications. In: 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2015, pp.1200–1205.

31. Guyon, Elisseeff. An introduction to variable and feature selection. J Mach Learn Res 2003; 3: 1157–1182.

32. Ali. PyCaret: an open source, low-code machine learning library in Python, 2020. https://www.pycaret.org. PyCaret version 1.0.

33. Grinsztajn, Oyallon, Varoquaux. Why do tree-based models still outperform deep learning on typical tabular data? In: Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022), 2022. https://arxiv.org/abs/2207.08815.

34. Shwartz-Ziv, Armon. Tabular data: deep learning is not all you need. arXiv preprint 2021; arXiv:2106.03253.

35. Borisov, Leemann, Seßler, et al. Deep neural networks and tabular data: a survey. arXiv preprint 2021; arXiv:2110.01889.

36. Ribeiro, Singh, Guestrin. Anchors: high-precision model-agnostic explanations. In: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32, 2018.

37. Ferrandez, Liano, Bonomini, et al. A customizable multi-channel stimulator for cortical neuroprosthesis. In: 2007 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2007, pp.4707–4710. DOI: 10.1109/IEMBS.2007.4353390.

38. Pérez-Riera, Ferreira, Ferreira Filho, et al. Electrovectorcardiographic diagnosis of left septal fascicular block: anatomic and clinical considerations. Ann Noninvas Electrocardiol 2011; 16: 196–207.

39. Grant, Dodge. Mechanisms of QRS complex prolongation in man: left ventricular conduction disturbances. Am J Med 1956; 20: 834–852.

40. Smisek, Viscor, Jurak, et al. Fully automatic detection of strict left bundle branch block. J Electrocardiol 2018; 51: S31–S34.