Abstract
Background:
Multiple sclerosis (MS) is an autoimmune inflammatory disease of the central nervous system characterized by demyelination and neurodegeneration processes. It leads to different clinical courses and degrees of disability that need to be anticipated by the neurologist for personalized therapy. Recently, machine learning (ML) techniques have reached a high level of performance in brain disease diagnosis and/or prognosis, but the decision process of a trained ML system is typically nontransparent. Using brain structural connectivity data, a fully automatic ensemble learning model, augmented with an interpretable model, is proposed for the estimation of MS patients' disability, measured by the Expanded Disability Status Scale (EDSS).
Materials and Methods:
An ensemble of four boosting-based models (GBM, XGBoost, CatBoost, and LightBoost) organized following a stacking generalization scheme was developed using diffusion tensor imaging (DTI)-based structural connectivity data. In addition, an interpretable model based on conditional logistic regression was developed to explain the best performances in terms of white matter (WM) links for three classes of EDSS (low, medium, and high).
Results:
The ensemble model reached excellent level of performance (root mean squared error of 0.92 ± 0.28) compared with single-based models and provided a better EDSS estimation using DTI-based structural connectivity data compared with conventional magnetic resonance imaging measures associated with patient data (age, gender, and disease duration). Used for interpretation of the estimation process, the counterfactual method showed the importance of certain brain networks, corresponding mainly to left hemisphere WM links, connecting the left superior temporal with the left posterior cingulate and the right precuneus gray matter regions, and the interhemispheric WM links constituting the corpus callosum. Also, a better accuracy estimation was found for the high disability class.
Conclusion:
The combination of advanced ML models and sensitive techniques such as DTI-based structural connectivity demonstrated to be useful for the estimation of MS patients' disability and to point out the most important brain WM networks involved in disability.
Impact statement
An ensemble of “boosting” machine learning (ML) models was more performant than single models to estimate disability in multiple sclerosis.
Diffusion tensor imaging (DTI)-based structural connectivity led to better performance than conventional magnetic resonance imaging.
An interpretable model, based on counterfactual perturbation, highlighted the most relevant white matter fiber links for disability estimation.
These findings demonstrated the clinical interest of combining DTI, graph modeling, and ML techniques.
Introduction
Multiple Sclerosis (MS) is an immune-mediated inflammatory disease of the central nervous system, which causes demyelination and neurodegeneration with variable degrees of patients' disability (Ghasemi et al., 2016). While its etiology is still unknown (Polman et al., 2011), MS is the primary cause of nontraumatic neurological disability in young adults (Goodin, 2014). Its natural history includes a large heterogeneity of clinical symptoms and disease courses leading to cognitive deficits and irreversible disability (Lynch et al., 2005). In about 85% of cases, disease onset is characterized by a first acute episode called clinically isolated syndrome (CIS) and/or a relapsing-remitting course (RRMS), followed by a secondary-progressive course (SPMS), while the remaining 15% of patients starts directly from a primary-progressive course (PPMS) (Lublin et al., 2014). The clinical course of the disease and the risk to develop permanent disability are very different from one patient to another. Thus, to propose a personalized medical care and therapy, the neurologist's challenge is to predict the disease evolution based on early clinical, biological, and imaging markers available from disease onset.
The Expanded Disability Status Scale (EDSS) is the main test for quantifying disability and monitoring the disease progression over time (Kurtzke, 1983). Evaluated clinically by the neurologist, EDSS is an ordinal nonlinear rating scale that defines different levels of disability from 1 to 10, reflecting mostly walking and motor disability, and to a lesser extent cognitive impairment. In particular, eight functional systems are evaluated: pyramidal, cerebellar, brainstem, sensory, bowel and bladder, visual, cerebral, and other. These functional systems represent eight different areas of the central nervous system and each ranges from 0 (normal) to 5 or 6 (maximal impairment). These functional system grades, indications of mobility and restrictions in daily life, are used to define the EDSS as an ordinal measurement (Bushnik, 2011). For EDSS scores lower than 3, the disability remains moderate, while it becomes significant, affecting daily activities, for EDSS values of 3 to 4.5. With values over 5, the walking impairment becomes very severe and permanent (Van Munster and Uitdehaag, 2017). However, such neurological evaluation is limited by inter-rater reliability (Meyer-Moock et al., 2014).
MS disability prediction constitutes one of the most relevant aspects due to its impact on the decision therapy (Kuvalekar et al., 2015). Few machine learning (ML) methods have already shown their prognostic value based on conventional magnetic resonance imaging (MRI) (Kolcav et al., 2020, Law et al., 2019, Zhao et al., 2017), defined as measures for assessing inflammation (new and enlarging T2w lesions, gadolinium-enhanced T1w lesions), measures of acute and chronic disease (total T2 volume), and measures of degeneration (global brain and spinal cord atrophy, and chronic black holes on T1w images). Combined, they provide a unique insight into the dynamics of MS lesion development and the long-term pathological consequences on the central nervous system (Traboulsee et al., 2005). Tousignant et al. (2019) proposed an automatic end-to-end deep learning framework, based on parallel convolutional pathways, using multicenter MRI data images obtained in RRMS patients. Washimkar and Chede (2015) also used conventional MRI data images to identify lesions responsible for cognitive decline and physical disability. However, the poor correlation between inflammatory markers such as lesion load and patients' disability remains an issue (Declan and Trip, 2017).
More advanced MRI techniques, such as diffusion tensor imaging (DTI), was developed to provide a better characterization of the brain damages in MS, based on its great sensitivity to detect microarchitectural alterations in normal appearing white matter (WM) tissue (Jütten et al., 2019). Furthermore, new modeling approaches have been proposed to characterize the brain network organization by means of graph theory (Guo et al., 2017, Yong and Alan, 2010). These graph models consist of nodes, based on the parcellation of brain gray matter (GM) regions, and edges, determined by the underlying links between the network nodes. In brain structural connectivity, these links are defined by the extraction of WM fibers using DTI tractography (Basser et al., 2000). These new approaches laid the foundation for a better characterization of microarchitectural damages in MS (Bakshi et al., 2008). Such brain structural connectivity data were also used for ML techniques and provided excellent performances for the classification of MS clinical courses (Kocevar et al., 2016; Marzullo et al., 2019a) and the estimation of EDSS (Marzullo et al., 2019b). Nevertheless, the ensemble-based approach, where several models are combined together, demonstrated to be even more efficient compared with single-based models (Liu et al., 2004; Tsoumakas and Vlahavas, 2007). However, one of the drawbacks of ML models is their lack of transparency. Thus, several approaches were recently proposed (Gilpin et al., 2018) for model interpretability, first, to evaluate the accuracy of the decision strategy, and second, to understand the biological basis of the best results. This information is crucial for a better understanding of the pathological processes, their potential therapy, and to convey more trust in the results (Witten and Eibe, 2020).
In this work, we propose first a meta-learning approach, composed of four boosting base models organized in a stacking generalization scheme, to perform a cross-sectional estimation of disability (EDSS) in MS patients of different clinical courses. Second, a global and local interpretability model is developed to pin-down the brain connections that are most likely responsible for explaining different degrees of disability. The proposed approach benefits from both the sensitivity of DTI to detect microscopic WM alterations and the brain network connectivity model to point out disruptions of fiber links related to disability. In our knowledge, this is the first attempt to apply interpretable ML techniques to correlate brain structural connectivity with MS disability.
Materials and Methods
MRI acquisition
MRI examination was performed on a 1.5T Siemens Sonata system (Siemens Medical Solution, Erlangen, Germany) using an 8-channel head-coil. The conventional MRI protocol consisted of 3D T1-weighted MPRAGE (magnetization prepared rapid gradient echo) sequence (repetition time/echo time/time for inversion (TR/TE/TI) = 1970/3.93/1100 ms, flip angle = 15°, field of view (FOV) = 256 × 256 mm, voxel size = 1 × 1 × 1 mm3, acquisition time = 4.62 min) and a FLAIR (fluid attenuated inversion recovery) sequence (TR/TE/TI = 8000/105/2200 ms, flip angle = 150°, matrix size = 192 × 256, FOV = 240 × 240 mm, slice thickness = 3 mm, voxel size = 0.9 × 0.9 × 3 mm, acquisition time = 4.57 min). For DTI, an axial 2D spin-echo echo planar sequence was used (TE/TR = 86/6900 ms; 2 × 24 directions of diffusion gradient; b = 1000 s·mm−2; spatial resolution of 2.5 × 2.5 × 2.5 mm3).
Ninety MS patients distributed in four clinical courses (12 CIS, 30 RRMS, 20 PPMS, 28 SPMS) were recruited, following the (McDonald et al., 2001) diagnosis criteria, in the MS clinic of Lyon Neurological Hospital. Also, 15 healthy controls (HC) subjects, age- and gender-matched with MS patients, were included. Each patient received a clinical visit and an MRI examination, longitudinally, every 6 months during the first 3 years and then every year for the following 4 years. Patient information is provided in Table 1. A total of 528 scans were used in this study.
Data Set Information by Multiple Sclerosis Clinical Profiles (HC, CIS, RR, SP, PP)
Data set information partition by MS clinical profiles. Age and DD are reported as average of years (standard deviation in parenthesis). The median was considered for the EDSS score (range in parenthesis). Time between scans was reported in days and the average was considered (standard deviation in parenthesis).
CIS, clinically isolated syndrome; DD, disease duration; EDSS, Expanded Disability Status Scale; HC, healthy controls; MS, multiple sclerosis; PP primary progressive; RR, relapsing remitting; SP, secondary progressive.
To offer a feasible interpretation of the model behavior, the output space (EDSS score) was partitioned into three levels of disability (low [1–3], medium (3–4.5], and high (4.5–7]) defined by our neurologist (F.D.D.) in agreement with the literature (Kurtzke, 1983). In Table 2, the information of the data set partitioned by EDSS classes is also provided.
Information on the Data Set by EDSS Groups (Low—Medium—High)
Data set information partition by disability classes (low, medium, high). Age and DD are reported as average of years (standard deviation in parenthesis). Time between scans was reported in days and the average was considered (range in parenthesis).
EDSS, Expanded Disability status scale.
The proportion (%) of MS phenotypes in the three disability groups was distributed as follows: low: 100 HC and CIS, 58 RR; medium: 39 RR, 60 SP, and 58 PP; high: 40 SP and 36 PP. Inclusion criteria specified that excluded from this study are patients with contraindications to MRI (cardiac valve not compatible with MRI, claustrophobia, etc.), pregnant patients, patients requiring premedication for the examination, and patients with severe associated pathology or neurological disease other than MS, or any disabling condition that may interfere with the measurement of clinical disability and brain atrophy. Highly disabled patients with EDSS over 7 were also excluded from the study to select patients with less brain damages. This study was approved by the local ethics committee (CPP Sud-Est IV n° 2004-085) and the French national agency for medicine and health products safety (ANSM). Written informed consents were obtained from all patients before study initiation.
Image and graph processing
First, brain segmentation was performed on 3D T1-weighted images. Each voxel was classified into four classes: WM, cortical GM, subcortical GM, and cerebrospinal fluid using the FreeSurfer image analysis suite (Reuter et al., 2012). From GM maps, 84 regions (nodes) were identified based on the atlas Desikan for parcellation (Desikan et al., 2006) applied in common space. Second, DTI images were processed using the FMRIB Software Library (FSL; Jenkinson et al., 2012). Eddy current correction and nonbrain voxel stripping were performed as preprocessing steps. In each voxel of WM maps, the diffusion orientation distribution function (dODF) was computed using MRtrix software (Tournier et al., 2012). The maximum spherical harmonics of order-k was calculated by
Model conception
A stacked ensemble model, composed of multiple gradient boosting algorithms, was developed to estimate the disability status of MS patients, using their structural connectome. A thorough explanation of each phase of the workflow is given in the following section. The ANOVA statistical test of hypothesis was used to evaluate statistical differences (α < 0.05) in the estimation error between MS disability classes (low, medium, and high). In addition, to demonstrate the superiority of our approach, a comparison between the connectome data and conventional MRI measures was performed using three additional controlling factors (age, gender, and disease duration). Conventional MRI measures included WM T2w-lesion volume, lateral ventricle volume, cortical and subcortical GM volume, black hole T1w-lesion volume, obtained from T1w and FLAIR images using the MSmetrix software (Jain et al., 2015) developed by icometrix (Leuven, Belgium). Besides, our model included the development of an interpretability model based on the counterfactual concept and a conditional logistic regression. Figure 1 depicts the complete workflow of our model.

Complete model pipeline. The left box represents the training process of the stacked ensemble model, while the right box represents the interpretability model. (1) Starting from original data (WM connectivity matrix), the stacked ensemble model is trained to estimate the EDSS score. Notice that the original data set is only used for training and validating the model. (2) For each holdout set during the cross-validation, the original data set is perturbed to generate valid counterfactual WM connectivity matrix samples. (3) The perturbed (counterfactual) sample is used to probe the trained ensemble model and a corresponding counterfactual EDSS score is obtained (y_perturbed). (4) A statistical analysis based on the conditional logistic regression model is performed to estimate the β coefficients describing the importance of each link in the WM connectivity matrix of the generated counterfactual sample. (5) Conditioning on the disability class (low—medium—high), the β coefficients are averaged to obtain the final score of importance. EDSS, Expanded Disability Status Scale. WM, white matter.
The ensemble model is trained using the brain structural connectome data set in its adjacency matrix form (referred to as “original data” in Fig. 1). As shown in Figure 1, the green box encloses the training process for which an estimation of the EDSS score is obtained. Once the training process is completed, the final ensemble model is ready to be interpreted by the statistical model. The red box encloses the interpretability model. For each holdout sample, a graph perturbation, based on conditional probability distribution, is performed to obtain valid counterfactual samples. The perturbed (counterfactual) data set is used as input for the trained ensemble model to estimate the counterfactual EDSS score (
Stacking generalization model for EDSS estimation
An ensemble model is an advanced ML approach where multiple single-base models are assembled to better predict an outcome. As long as the base models are diverse and independent, the variance of the prediction error decreases when the ensemble approach is used (Kotu, 2015). To combine the base models together, a stacked generalization scheme was applied. Basically, a meta-learning algorithm is built to learn how best to combine the predictions from multiple single-base ML algorithms. In our implementation, linear regression was used as the higher level model (Wolpert, 1992). The ensemble model is composed of four different gradient boosting algorithms. ML boosting techniques use subsequent sets of weak learners added together such that the residual error obtained from the previous learner is reduced. It relies on the intuition that the best possible next model, when combined with previous models, minimizes the overall prediction error. Gradient boosting exploits differentiable loss functions by using gradient descent approximation to minimize the objective function when adding subsequent learners. Moreover, the function that estimates
where K represents the number of iterations,
where
while the loss function is defined as the squared-error L2
loss:
Most often, a regularization term is added to the final loss function to avoid overfitting. In this context, the minimization objective becomes:
where
where T defines the number of leaves in the tree and
Model interpretability
Following the first process of disability estimation, a counterfactual model was implemented to uncover the causal link between brain connections and disability. A counterfactual explanation describes a causal situation in which, if a specific event X had not occurred, a consequent event Y would not occur. In practice, counterfactual thinking requires a hypothetical situation that has in fact never happened. In other words, it provides a what-if explanation for model output. In this work, a counterfactual-based model is proposed with the aim of explaining the global and local behaviors of a general trained black-box model. This approach aims at investigating first, which features (graph connections) are more relevant for disability estimation (global), and second, how this role is varying with the level of disability (local). In other words, a local explanation is obtained by looking for features that explain a confined part of the output space. Similar to the global analysis, a local counterfactual explanation of a prediction can be interpreted as the smallest change to the feature values (hereafter feature perturbation) needed to obtain the output of interest.
Global interpretability
Global interpretability is first obtained by identifying the smallest number of important features that best explain the output prediction. For this purpose, we exploited the characteristics of the tree-based boosting methods, where the importance was calculated for each feature, at a single-tree level and averaged across all the trees of the ensemble (Hastie et al., 2009) as specified in the following equation:
where
In this formulation,
This approach is repeated for each boosting-based learner in the meta-model. For each base learner, only the features with a score of importance greater than the fifth percentile of not-null feature values are selected. Finally, to define a unique set of features, the union of all base-learner sets is considered. A threshold of importance, obtained through Equation (7), was set up to a value of 10%, corresponding to the exponential increase of the cumulative relative importance.
Local interpretability
A local counterfactual method refers to the technique that can explain the prediction results of the ML model depending on certain conditions. Three requirements need to be met:
1. prediction obtained from counterfactual should be as close as possible to the objective output;
2. counterfactual should be as similar as possible to the original instance, which translates in changing as few features as possible; and
3. counterfactual instance should have feature values that are likely.
For each disability level, a truncated normal distribution is assumed as described in Equation (8)
if
To satisfy the requirements for good counterfactuals, a conditional probability distribution was considered for applying perturbations. The mathematical formulation is given in Equation (9).
where
which implies that the number of connections between x 1-th and the x 2-th percentiles is equally likely.
The goal is to perturb the lowest number of edges to increase (or decrease) the likelihood of a subject to make him/her more (or less) likely to pertain to a certain disability class of interest. This process ensures the first two requirements for a good counterfactual. To satisfy the third requirement, the conditional probability distribution was used to perturb the original input. This corroborates the idea of a highly likely counterfactual since it lies inside the manifold of the true data. An iterative process was then applied where, for each subject, one thousand counterfactuals were generated conditioning on the disability class. The causal effect of each edge onto the probability of pertaining to a predefined class of disability was obtained by conditional logistic regression [Eq. (10)].
where
Parameter setting and model evaluation
The model performance was evaluated using a 10-fold cross-validation strategy. At each cycle, nine folds were used for training and validation, using an 80–20% split strategy (nested cross-validation), determining the best hyperparameters for each base model by using a grid-search method, and the last fold was holdout for testing the performance of the final model. For each base model, only a limited number of parameters have been tuned. For all the boosting models, 300 trees were used except for the XGBoost model for which 500 trees were used. The learning rate was imposed to 0.2, while the maximum depth parameter was imposed to 3 for all the models, but for XGBoost 5 was used. All other parameters were left unchanged at their default values. Concerning the counterfactual model, the parameter
Results were evaluated by calculating the root mean squared error (RMSE) [Eq. (11)]:
which measures the weighted average root squared difference between the estimated values and the measured value. Moreover, to avoid possible misinterpretations of the true predicting power due to outliers' errors, the mean absolute deviation error (MADE) was also considered [Eq. (12)]:
In both cases, the parameter
Results
Stacking generalization model performance
The evaluation of the models' performances was assessed on both single and ensemble models using two types of data: the connectome-based data on one hand and on the other hand, conventional data including conventional MRI measures and patients' information (age, gender, and disease duration) (Table 3).
Ablation Study: Performance Comparison Between Single-Base Models and Stacking Ensemble
Performance comparison between single-base models and the stacked ensemble model. The values in the table report the average RMSE score (standard deviation in parenthesis) calculated over the same 10 folds of the cross-validation. LR was also considered for comparison. The first line of the table (Graphs) reports the performance comparison based on the connectome data. In the second line of the table (MRI+demographics), eight clinical measures were used for estimating EDSS (T2w lesion volume, T1w black hole volume, cortical GM volume, deep GM volume, lateral ventricles, age, sex, and disease duration). Last line reports the p-value comparison using the nonparametric Wilcoxon Mann–Whitney test.
GBM, gradient boosting machine; GM, gray matter; LR, linear regression; MRI, magnetic resonance imaging; RMSE, root mean squared error.
When comparing the two types of data, the performances were significantly different for all models with the exception of linear regression (p > 0.05), demonstrating the superiority of the DTI connectome data compared with the conventional data. Second, the stacking generalization model showed its superiority with respect to all single-base models by an overall RMSE of 0.92 (±0.28) and a MADE score equal to 0.8 (±0.7). This later suggested that a longer tail on the right-hand side of the error distribution was present. A more detailed performance analysis of the ensemble model with graph data was conducted along with the disability levels. As shown in Figure 2, the error distribution was reported for low, medium, and high disability classes.

Box plot of the MADE by class of disability. MADE, mean absolute deviation error.
A longer tail can be detected, especially for patients with low disability level, suggesting a more difficult estimation in this range of EDSS. Conversely, a lower estimation error was obtained for the higher level of disability (EDSS of 4.5–7). Significant differences between EDSS groups were observed. Figure 3 depicts the error distribution by MS clinical phenotypes. Coherently with what is shown in Figure 2, a higher estimation error is observed for the early stage of the disease, namely the CIS group.

Box plot of the MADE by class of multiple sclerosis phenotypes.
Figure 4 depicts the average EDSS for the measured and estimated disability score obtained from the stacking generalization model.

Comparison between measured (red) and estimated (blue) EDSS score. The two lines are obtained by, first, sorting the measured and estimated EDSS scores with respect to the test set for each of the 10 folds of the cross-validation. Then, for each sample, the average measured and estimated scores are computed obtaining an average measured and estimated distribution.
The two lines were obtained by computing the average measured and estimated scores by sorting them with respect to the holdout set for each of the 10 folds of the cross-validation. In Figure 4, we can notice that the two averaged (and standard deviations) curves (measured and estimated) are overlapping.
Model interpretability
The stacking generalization model was completed by an interpretability model to provide global and local explanations of the estimation process. Figure 5 reports the exponential increase of the cumulative relative importance of brain connections, while the red line defines the imposed threshold.

Global interpretability: sorted relative importance of brain connections expressed by indexed (idx) edges for readability purpose.
First, the global interpretability helped to identify brain connections playing an important role in disability estimation. These connections included frontal, temporal, and parietal cortical GM regions as well as subcortical GM regions. In particular, the highest number of connections corresponded to frontal/frontal, frontal/parietal, temporal/temporal, and temporal/parietal links. A visual representation of these connections is shown in Figure 6.

Global interpretability. Connectome representation of the most important edges for EDSS estimation. Nodes with high importance are depicted in a dark, while nodes with lower importance are depicted in white gray. The thickness of the connections lines represents edge importance.
Second, local interpretability was analyzed based on disability levels. Links with importance higher than the 95th percentile threshold were selected for visual purposes (Table 4). The left bank superior temporal sulcus represented the most important connection for the EDSS estimation in all of the three disability classes. Considering a threshold of importance of 0.5, in two EDSS classes (low and medium) out of the three, numerous WM links between GM regions were highlighted, including the left bank superior temporal sulcus with the left posterior cingulate and the right precuneus, confirming the importance of these connections for disability-level discrimination.
Brain Area Connection Importance by Class of Disability
Most Important Brain Connections and Relative Importance for Three Classes of Expanded Disability Status Scale Score: Low ([1–3]), Medium ([3–4.5]), and High ([4.5–7]). In gray, the interhemispheric connections are highlighted, while in bold, the reference maximal scores of importance are emphasized.
To better characterize the connections playing an important role in the estimation process, the average number of WM fibers was calculated (Table 5) and the number of fibers participating in the estimation of each disability level was compared. Thirteen out of 19 connections resulted to be significantly different, pointing to a severe reduction in fibers' number in relation to an increased level of disability. This finding confirmed that the axonal neurodegeneration measured by DTI is directly related to the disability progression. For this reason, the ANOVA statistical test of hypothesis was repeated considering all connections for a baseline comparison. Only 10% and 7% of the total WM fibers resulted statistically significant at a 5% and 1% confidence level, respectively, confirming that the model focused on the most damaged WM fibers for estimating disability.
Brain Area Number of Connections by Class of Disability
Mean number of white matter fiber connections for the 95th percentile most important edges divided into three classes of EDSS score: low ([1–3]), medium ([3-4.5]), and high ([4.5–7]). In gray, the interhemispheric connections are highlighted, while in bold, statistically significant deteriorations are emphasized by means of one-way ANOVA test of hypothesis (*10%, **5%, ***1% confidence level).
Finally, two observations are highlighted in Tables 4 and 5. First, by considering the 95th percentile of the most important brain connections, 78% are localized in the left hemisphere. Second, interhemispheric connectivity is well represented, amounting to 45% over the total connectivity. For comparison, the overall percentage average of interhemispheric connections, calculated over all the MS patients, amounts to 14%.
Discussion
MS is a neurological disease for which symptoms and severity can vary widely from patient to patient and progress at different rates. While the progression is rather slow and variable in RRMS patients, PPMS and SPMS patients are subjected to a progressive increase of disability, starting from the onset or later, respectively. To investigate the relationship between brain connectivity and disability, an ensemble of boosting methods was developed to estimate EDSS.
First, our method provided excellent performances (RMSE 0.9 ± 0.09) and outperformed the results obtained previously by Marzullo et al. (2019b), for which a convolutional neural network was considered (RMSE 1.08 ± 0.09).
Second, these findings demonstrated that DTI-based structural connectivity data represent an interesting alternative for disability estimation to conventional data including conventional MRI measures, such as the lesion load or tissue atrophy, with patient information such as age, gender, and disease duration. Indeed, it is known that lesion load is moderately correlated with the clinical status of the patient (Barkhof, 2002). For the characterization of microstructural WM alterations in MS patients, DTI is known to be more sensitive than conventional MRI as well as other patient information. Thus, the rapid technical improvements of MRI should facilitate the clinical accessibility of DTI acquisition, which is more cumbersome to realize in clinical routine.
Third, when observing the error distribution, a clear difference in performance, between low and high disability status, was found (F-score: 19.54). This is expected since higher levels of disability correspond to more degenerated WM fiber links, which are easily detectable by the model, based on the sensitivity of the DTI-based structural connectome. Also, the evaluation of low EDSS values is known to be cumbersome (Ebers, 2006) due to the highly variable clinical symptoms, which results in less accuracy. Indeed, a minimum of 1.0 U difference in EDSS is needed to define a significant clinical change (Goldman et al., 2010). Thus, automated ML methods may overcome this limitation by providing a more reliable estimation of disability based on the sensitivity of brain structural connectivity data. Moreover, by appropriately customizing the loss function to be optimized by the meta-learning model, the skewness of the EDSS can be taken into account and the optimization focus might be shifted toward lower scores. Indeed, the comparison between the holdout test estimation and the true EDSS score provided a visual evaluation of the goodness of fit, with standard deviations overlapping in most of the EDSS ranges (Fig. 4). This observation confirmed that our model was able to reach a high level of performance along the entire estimation range. Notwithstanding, the proposed approach is not intended to replace the current practice but rather to provide an additional tool for the neurologist to perform a more precise evaluation of a patient's status.
On top of the stacking ensemble model, our approach provided a global and local interpretability model to better identify the brain WM connections leading to predictive model performance. The global analysis highlighted the importance of large WM connections linking the frontal, temporal, and parietal GM regions. In particular, brain links between the frontal/frontal, frontal/parietal, temporal/temporal, and temporal/parietal GM regions correlate with the main brain networks usually damaged in MS patients (Andica et al., 2019; Bonavita et al., 2013; Han et al., 2017) and confirmed the involvement of these areas in the WM degeneration of MS patients (Curral et al., 2012; He et al., 2009). In addition, these regions may be predictive for discriminating MS phenotypes, especially between RRMS and SPMS patients, which represents one of the most interesting and difficult tasks for the automatic classification of MS profiles. In fact, previous studies highlighted that both global and regional cortical atrophies were observed to be more prominent in the left hemisphere relative to the right hemisphere (Narayana et al., 2012). Moreover, reduced mean cortical thickness, particularly in the frontal and temporal regions, may be a predictor of epilepsy in RRMS patients (Calabrese et al., 2012). Also, a significant reduction in GM thickness was observed in progressive stages of the MS disease, particularly in the parietal and precentral cortex (Chen et al., 2004). These results are coherent with our findings since more than half of our patients followed a progressive course.
Aside from the global analysis, the local interpretation focused on the importance of certain links depending on the level of disability. The main fiber links implied are connecting the left banks of superior temporal sulcus and the left caudal anterior cingulate GM regions. Indeed, the left bank thickness has been found to correlate with cognitive performance and motor skills in MS patients (Achiron et al., 2013; Pan et al., 2001), and the thickness reduction of the anterior cingulate was associated with loss in verbal and figural fluency as well as with hypoenergetic cognitive functioning in MS patients (Geisseler et al., 2016; Parisi et al., 2014).
In summary, the local analysis demonstrated the importance of the left and interhemispheric connections for the disability estimation process, reflecting more severe damages in the left hemisphere (Preziosa et al., 2017). These findings are concordant with the hypothesis that brain hemispheres can have different susceptibilities to damage accumulation (Thompson et al., 2003), particularly the left hemisphere, which is dominant for both handedness and language functions. This finding was already observed in neurodegenerative diseases such as Alzheimer's (Filippi et al., 1995). The importance of interhemispheric connections for disability estimation confirmed also that the corpus callosum is significantly affected in MS (Evangelou et al., 2000; Ozturk et al., 2010).
Notwithstanding, this study has few limitations. First, on the methodological aspects, the connectome analysis was performed using the Desikan atlas, which provides a gross parcellation of brain GM. In fact, large areas of the GM are considered for parcellation, which limits the interpretability of our results to specific WM tracts. However, the use of a more detailed parcellation atlas such as Glaser atlas will significantly increase the dimensionality of the problem, providing a higher chance of overfitting the training set. To prevent such limit, one should increase the size of the data set and/or exploit some a priori knowledge to preselect the most important GM regions.
Second, inflammation and neurodegeneration mechanisms are known to occur differently among the MS clinical courses, inflammation being predominant in the early stages of the disease (CIS and RRMS), while neurodegeneration (axonal and neuronal loss) mainly affects progressive courses (PPMS and SPMS). Thus, the distinction between these two main pathological mechanisms of MS would be of great interest to better understand the relationship between brain connectivity and patient disability. However, the EDSS distribution of our MS patients, both early stages and progressive courses, was very skewed toward an EDSS score of 4, due to the inclusion criteria. Unfortunately, this limitation prevented us to further analyze the differences between these two classifications.
Third, the EDSS score is mainly reliant on physical disability and in particular lower limb function. Since the MS clinical population is in general heterogeneous, not all patients may present predominantly motor symptoms, but rather so-called soft symptoms, such as fatigue, cognitive decline, and mood/anxiety disturbance, in agreement with the most important connections detected in this study, linking GM areas involved in higher order cognitive rather than movement-related functions. Thus, the EDSS may not be an absolute reliable measure of disability for the whole population of patients with MS, especially those with mild severity (Meyer-Moock et al., 2014; Saccà et al., 2017). For this purpose, the multiple sclerosis functional composite (MSFC) score, which takes into account other functional and cognitive capacities, is also performed and might better reflect the global disability of MS patients, especially those with moderate symptoms. Notwithstanding, the EDSS score was considered in this work due to its greater use in clinical practice (Amato and Ponziani 1999). Finally, it is well known that the spinal cord lesions have a significant role in patient disability, and thus, the study of only brain data constitutes probably a partial analysis of the issue.
Conclusion
In this work, we demonstrated the capability of our model to reach a good level of performance for disability estimation using DTI-based connectome data. To better understand the model decision pathway and identify its neural basis, a counterfactual model was implemented considering three levels of disability. A clear left hemispheric lateralization of WM fiber link degeneration, as well as interhemispheric connections, such as the corpus callosum, was detected and deemed to be relevant by our interpretability model along with frontal, temporal, and parietal GM regions.
As future work, we plan to extend the model by incorporating morphometric GM features and repeat the analysis considering other disability scales such as the MSFC score. Finally, we aim to extend this work by applying our model directly on large WM fiber bundles to precisely pin down the WM connections directly involved in the MS disability process.
Footnotes
Authors' Contributions
B.B. performed the analysis and wrote the article. A.M., C.S., and D.S.-M. supervised the writing of the article. C.S. performed the preprocessing. D.S.-M. and F.D.-D. coordinated the study and performed the data acquisitions.
Author Disclosure Statement
No competing financial interests exist.
Funding Information
This study is funded by the following projects: European Research Council within the grant 813120-2018 of Marie Skłodowska-Curie Innovative Training Networks (ITN) of the Horizon 2020 through the INSPiRE-MED project. French National Research Agency (ANR) within the national program “Investissements d'Avenir” through the OFSEP project (ANR-10-COHO-002).
