Abstract
Background:
Mild cognitive impairment (MCI) patients are at a high risk of developing Alzheimer’s disease and related dementias (ADRD) at an estimated annual rate above 10%. It is clinically and practically important to accurately predict MCI-to-dementia conversion time.
Objective:
To accurately predict MCI-to-dementia conversion time by using easily available clinical data.
Methods:
The dementia diagnosis often falls between two clinical visits, and such a survival outcome is known as interval-censored data. We utilized the semi-parametric model and the random forest model for interval-censored data in conjunction with a variable selection approach to select important measures for predicting the conversion time from MCI to dementia. Two large AD cohort data sets were used to build, validate, and test the predictive model.
Results:
We found that the semi-parametric model can improve the prediction of the conversion time for patients with MCI-to-dementia conversion, and it also has good predictive performance for all patients.
Conclusions:
Interval-censored data should be analyzed with models developed for interval-censored data to improve model performance.
INTRODUCTION
Alzheimer’s disease (AD) is a progressive neurological disorder that predominantly impairs cognitive abilities such as memory, executive function, and behavior.1,2 It is the leading cause of dementia among the elderly.3,4 Patients with mild cognitive impairment (MCI) are at a higher risk of developing dementia in later years than healthy normal controls. Some MCI patients remain stable for a long time, while more than 10% progress to dementia annually.5 Early identification of MCI patients who are at risk of conversion to dementia (MCI-to-dementia conversion), together with a sound estimate of the likely timeline of this progression, is critically important. Such knowledge helps shape clinical decision-making, determine the most beneficial interventions for specific patients, and assist families in planning for future care if needed.6,7
Current research in predicting dementia encounters several challenges. First, numerous models depend on sophisticated biomarkers obtained from positron emission tomography (PET) scans or cerebrospinal fluid (CSF) analyses. These are not widely accessible due to their cost, invasiveness, and the specialized setting required for their use.8,9 Models that use blood biomarker data alone do not predict as well as models that use clinical data. Adding blood biomarker data to a model with clinical data may improve the predictive performance, although the improvement is small.10 Consequently, the utilization of easily obtainable clinical data is crucial, as it provides wider and more immediate applicability. Second, most studies concentrate on classification models which only take binary outcomes (either MCI-to-dementia conversion or not). Traditional logistic regression often has worse predictive performance than machine learning methods.11 The conversion from MCI to dementia often spans several years, sometimes decades, with many individuals not reaching a dementia diagnosis before the end of a study.12 Classification models do not effectively utilize the MCI-to-dementia conversion data and the censoring information.13 Several studies applied traditional time-to-event statistical methods based on survival models for right-censored data.13–15 In fact, the exact MCI-to-dementia conversion time is unknown. We only observe the time interval of the dementia onset event. This type of censored data is known as interval-censored data. The statistical methods for right-censored data are not applicable to interval-censored data since the time to the event cannot be directly observed, and event times are often poorly estimated.16,17 Misuse of regression methods may lead to biased estimates and invalid statistical inferences.
Several regression analysis methods for interval-censored data have been developed in the literature, including the Cox proportional hazards (PH) model.18–26 A general and flexible modeling framework, referred to as transformation models, which includes the Cox model as a special case, was developed to relax the restrictive assumption of the Cox model.27–30 The penalized spline method was used to analyze interval-censored data to reduce the bias of parameter estimation.30
In addition to these semi-parametric regression models, one may also consider machine learning and deep learning methods for interval-censored data. Cho et al. proposed interval-censored recursive forests as a solution to split bias arising from discrepancy among survival probabilities in traditional tree-based techniques.31 This approach refines survival estimates through an iterative process.32 Yao et al. developed the conditional inference forest model, which is a random forest approach based on the weighted Kaplan-Meier estimate.33–36 Compared with traditional random survival forests, which treat all terminal nodes with equal weights, the conditional inference forest assigns large weights to terminal nodes with a substantial number of subjects at risk.33,37–39 Sun and Ding introduced an innovative neural network designed for interval-censored data.17 Their neural network leverages Bernstein polynomials to address the challenge of approximating the baseline cumulative hazard function and covariate effects. These methods (e.g., random forest) flexibly handle complex interactions and non-linear relationships, enabling greater adaptability in modeling, but it remains a challenge to obtain the direct relationship between the outcome of interest (e.g., MCI-to-dementia conversion) and each measurement.
METHODS
Data sets
We utilized two data sets in this project: 1) the National Alzheimer’s Coordinating Center (NACC) data, downloaded on December 21, 2023, and 2) the AD Neuroimaging Initiative (ADNI) data, downloaded on January 15, 2024. From both data sets, we selected patients who were diagnosed with MCI at baseline. There are three versions of NACC data spanning from 2005. We combined version 1 and version 2 of the NACC data (NACC-v1v2) as the training data with patients from 2005 to 2015. The third version (referred to as NACC-v3) was used as the validation data set with patients from March 2015 in the NACC study. The ADNI MCI cohort was used as the testing data set.
We prepared the data sets by using an approach similar to those in the literature.14 For example, any codes representing missing values (e.g., -4) were all transformed into NA, and variables that had more than 50% missing values were removed from the analysis. We utilized the first visit as the baseline date. For MCI-to-dementia conversion patients, that event was recorded in the study. Suppose T_i is the MCI-to-dementia conversion time from baseline and (L_i, R_i] is the time interval for the dementia onset event for the i-th patient, i = 1, 2, …, N, where N is the total number of participants in a study. For interval-censored data, R_i is the time from baseline of the MCI-to-dementia conversion visit, and L_i is the time from baseline of the visit right before the conversion. It is only known that the dementia onset time T_i is between L_i and R_i. Patients who remained MCI stable at the end of a study are considered right-censored, with R_i = ∞ and L_i equal to the time from baseline of the last visit.
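The interval construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's data pipeline: the function name `conversion_interval`, the visit tuples, and the diagnosis labels "MCI" and "dementia" are assumptions made for the example, and times are taken to be months from baseline.

```python
import math

def conversion_interval(visits):
    """Return (L, R] in months from baseline for one patient.

    visits: list of (months_from_baseline, diagnosis) tuples sorted by time,
    with diagnosis either "MCI" or "dementia" (hypothetical labels).
    """
    last_mci_time = 0.0  # baseline visit, where every selected patient has MCI
    for months, diagnosis in visits:
        if diagnosis == "dementia":
            # Onset lies after the last MCI visit and at or before this visit.
            return (last_mci_time, months)
        last_mci_time = months
    # No dementia visit observed: right-censored at the last visit.
    return (last_mci_time, math.inf)

converter = [(0, "MCI"), (12, "MCI"), (24, "dementia")]
stable = [(0, "MCI"), (12, "MCI"), (36, "MCI")]
print(conversion_interval(converter))  # (12, 24)
print(conversion_interval(stable))     # (36, inf)
```

A patient who converts at the 24-month visit after an MCI diagnosis at 12 months contributes the interval (12, 24], while a stable patient last seen at 36 months contributes (36, ∞).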
Statistical models for interval-censored data
We considered two models for interval-censored data: the Cox PH model (Cox-I) and the random forest method based on the conditional inference approach (RF-I) by Yao et al.33,40 Unlike the RF-I model, the Cox-I model relies on the PH assumption. However, the Cox-I model provides hazard ratios for each measure, which improves our ability to interpret the relationship between the survival outcome and each measure. The Cox-I model and the RF-I model can be implemented by using the R packages icenReg and ICcforest, respectively.33,41
We also included the random forest model for right-censored data (RF-R) from the R package randomForestSRC in the model comparison, as it was frequently used in the literature, although it may not be proper here for interval-censored data. When data were treated as right-censored, we only utilized the data of (L_i = 0, R_i] for MCI patients who reported the dementia onset during a visit, where R_i was the time from baseline of the visit with the dementia onset report. For MCI stable patients, the data used in the right-censored model were the same as those in the interval-censored model. The interval-censored models therefore have more detailed information on L_i for patients with MCI-to-dementia conversion. One goal of this research is to quantify the improvement in model prediction gained by analyzing the interval-censored data properly.
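The difference between the two data representations can be made concrete with a short sketch. The helper name `as_right_censored` is hypothetical; it only mirrors the description above, in which the lower bound of a converter's interval is reset to 0 and stable patients are left unchanged.

```python
import math

def as_right_censored(left, right):
    """Map an interval-censored observation (left, right] to the input
    described for the right-censored analysis."""
    if math.isinf(right):
        # MCI stable patient: same data as in the interval-censored model.
        return (left, right)
    # Converter: only the conversion visit time is kept, so L is set to 0.
    return (0.0, right)

print(as_right_censored(12, 24))        # (0.0, 24)
print(as_right_censored(36, math.inf))  # (36, inf)
```

The information lost for a converter is exactly the time of the last MCI visit, which is what the interval-censored models retain.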
Model evaluation performance metrics
We calculated the following three performance metrics to compare different models: 1) the integrated Brier score (IBS), 2) Pwithin: the proportion of patients whose predicted median survival time falls within the observed time interval, and 3) Pbefore: the proportion of patients whose predicted median survival time falls before the MCI-to-dementia conversion. Pbefore is calculated only for MCI-to-dementia conversion patients, as there is no observed dementia conversion time for MCI stable patients.
The value of Pwithin can be calculated for all patients, MCI-to-dementia conversion patients, and MCI stable patients, as Pwithin(all), Pwithin(MCI-to-dementia), and Pwithin(MCI-stable), respectively. The value of Pwithin(MCI-to-dementia) was calculated from the MCI sub-population who had the dementia onset event, as the proportion of predicted median survival times within the observed time interval, while Pwithin(MCI-stable) can be computed similarly from the MCI stable patients. A model with large values of Pwithin and Pbefore is preferable.
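As an illustration of these two proportions, the following Python sketch computes them from predicted median times and observed intervals. The function names are hypothetical, and one detail is an assumption rather than something stated in the text: "before the conversion" is read here as a predicted median at or before R_i, the time of the conversion visit.

```python
import math

def p_within(pred_medians, intervals):
    """Proportion of patients whose predicted median falls in (L, R]."""
    hits = sum(1 for m, (left, right) in zip(pred_medians, intervals)
               if left < m <= right)
    return hits / len(intervals)

def p_before(pred_medians, intervals):
    """Proportion of converters whose predicted median is at or before R.

    Assumption: 'before the conversion' means m <= R; stable patients
    (R = inf) are excluded because no conversion time is observed.
    """
    converters = [(m, right) for m, (_, right) in zip(pred_medians, intervals)
                  if not math.isinf(right)]
    if not converters:
        return math.nan
    return sum(1 for m, right in converters if m <= right) / len(converters)

medians = [18, 30, 40]
observed = [(12, 24), (12, 24), (36, math.inf)]
print(p_within(medians, observed))  # 2 of 3 patients
print(p_before(medians, observed))  # 1 of 2 converters
```

Restricting the inputs to converters or to stable patients gives Pwithin(MCI-to-dementia) and Pwithin(MCI-stable), respectively. For a stable patient, a hit means the predicted time lies beyond the last observed visit.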
The IBS is a weighted squared distance between the estimated survival function and the empirical survival function, and it measures the overall model performance. Suppose S(t|Z_i) is the survival function given measures Z_i from the i-th patient. Let t_max be the largest observed follow-up time of all MCI patients. Then, the IBS is defined as
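A score of this type can be sketched in Python as follows. The sketch is illustrative and is not the study's exact definition: it assumes the empirical survival status is 1 up to L_i and 0 after R_i, skips times inside (L_i, R_i] where the status is unknown, uses equal weights, and integrates numerically on a midpoint grid over (0, t_max]. The function name and arguments are hypothetical.

```python
import math

def integrated_brier_score(surv_fns, intervals, t_max, n_grid=200):
    """Average over patients of the time-averaged squared distance between
    a predicted survival curve and a 0/1 empirical survival status."""
    dt = t_max / n_grid
    grid = [(k + 0.5) * dt for k in range(n_grid)]  # midpoint rule
    total = 0.0
    for surv, (left, right) in zip(surv_fns, intervals):
        area = 0.0
        for t in grid:
            if t <= left:
                target = 1.0      # known to be dementia-free
            elif t > right:
                target = 0.0      # known to have converted
            else:
                continue          # status unknown inside (left, right]
            area += (surv(t) - target) ** 2 * dt
        total += area / t_max
    return total / len(intervals)

# A step curve that drops inside the observed interval scores 0.
step = lambda t: 1.0 if t <= 18 else 0.0
print(integrated_brier_score([step], [(12, 24)], t_max=48))
```

Lower values indicate predicted survival curves that agree more closely with the observed status, which matches how the IBS is used for model comparison in this article.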
The median survival time can be directly obtained from the two fitted interval-censored models (Cox-I and RF-I). For the RF-R model, the survival probabilities at each possible time point were provided. We calculated the median survival time as the earliest time such that the survival probability is below 50%.
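The rule for the RF-R model, the earliest time at which the survival probability drops below 50%, can be written as a short Python helper. The function name and the example curve are hypothetical. Returning infinity when the curve never drops below 50% is an assumption of this sketch, not something the text specifies.

```python
import math

def median_survival_time(times, surv_probs):
    """Earliest time point whose survival probability is below 0.5.

    times must be sorted in increasing order and aligned with surv_probs.
    """
    for t, s in zip(times, surv_probs):
        if s < 0.5:
            return t
    return math.inf  # assumption: curve never falls below 50%

times = [6, 12, 18, 24]
probs = [0.90, 0.70, 0.45, 0.20]
print(median_survival_time(times, probs))  # 18
```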
Variable selection in interval-censored statistical processing
The NACC-v1v2 data set was used to build the predictive model based on the Cox-I model, which provides coefficient estimates for the relationship between the outcome (e.g., MCI-to-dementia conversion) and each measure, such as age and the clinical dementia rating-sum of boxes (CDR-SB). In the NACC-v1v2 data set, there were 211 variables after removing the categorical variables whose highest frequency is above 97% to avoid model fitting issues. We also removed the measures from the clinician judgment of symptoms form, as these measures are often not available in other studies. After that, we had 195 measures as the initial variables. These variables are from demographic data (e.g., age), physical measures (e.g., BMI), genetic data (Apolipoprotein E ɛ4), neuropsychological battery scores including sub-scales (e.g., Trail-B score, MMSE, CDR-SB), and functional activities questionnaire (FAQ) data. The FAQ assesses instrumental activities of daily living (IADLs), which require more cognitive ability than basic daily tasks.42
After data cleaning, there were 3,529 MCI patients, with 1,453 MCI stable patients and 2,076 MCI-to-dementia patients. We excluded the MCI patients who had multiple conversions during the follow-up visits, such as MCI-dementia-MCI. We then performed the following steps to determine the measures in the final predictive model.9
Step 1: Perform the Cox-I model with the MCI-to-dementia conversion time as the interval-censored outcome and each measure as a covariate to calculate the log-likelihood of each model. A model with a higher log-likelihood is better. These measures are ranked by the log-likelihood values from the largest to the smallest. The measures from the clinician judgment of symptoms are often not available in other studies. For that reason, these measures are re-ranked to the bottom of the list.
Step 2: From the fitted Cox-I model, if the direction of the estimated coefficient is not as expected, such measures are moved to the bottom of the list. The first 30 variables with the largest log-likelihood values are selected in the following forward selection method.
Step 3: The first measure with the largest log-likelihood value among the 30 measures from Step 2 is selected as X_1.
Step 4: We add each of the remaining 29 measures (X_i, i = 2, 3, …, 30) to the model already having X_1. Suppose the model with X_1 and X_i as covariates has the largest log-likelihood and their relationships with the outcome are as expected. Then, we select X_i as the second measure in the final model.
Step 5: We repeat the approach in Step 4 to select the third measure from the remaining 28 measures. The next 15 measures can be determined by using a similar approach with a total of 18 measures.
After the forward model selection approach, we fitted the Cox-I model by using the first K measures to calculate the IBS, where K = 1, 2, …, 18. The model with the best performance with regard to the IBS was selected as the final model. In Step 2, we selected 30 variables as the initial set of measures that have the best predictive performance. In the literature, the final model often contains fewer than 10 measures.14 For that reason, we selected the top 18 measures to compare the predictive performance.
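The forward selection in Steps 3 to 5 and the final choice of K can be sketched generically in Python. This is a simplified outline under stated assumptions, not the authors' implementation: `loglik` stands for a user-supplied function that fits a model on a list of measures and returns its log-likelihood, `sign_ok` stands for a user-supplied check that all estimated coefficients have the expected direction, and candidates that fail the check are skipped at that step. All names are hypothetical.

```python
import math

def forward_select(candidates, loglik, sign_ok, max_vars=18):
    """Greedy forward selection by log-likelihood with a direction check."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < max_vars:
        best, best_ll = None, -math.inf
        for var in remaining:
            trial = selected + [var]
            if not sign_ok(trial):
                continue  # coefficient direction not as expected
            ll = loglik(trial)
            if ll > best_ll:
                best, best_ll = var, ll
        if best is None:
            break  # no admissible candidate left
        selected.append(best)
        remaining.remove(best)
    return selected

def best_prefix_size(ibs_by_k):
    """Pick K with the lowest IBS among models using the first K measures."""
    return min(ibs_by_k, key=ibs_by_k.get)

# Toy stand-ins for model fitting, used only to make the sketch runnable.
gains = {"a": 3.0, "b": 2.0, "c": 1.0}
toy_loglik = lambda trial: sum(gains[v] for v in trial)
toy_sign_ok = lambda trial: "b" not in trial
print(forward_select(["c", "a", "b"], toy_loglik, toy_sign_ok, max_vars=2))
print(best_prefix_size({1: 0.130, 2: 0.121, 3: 0.125}))
```

In practice the two callbacks would wrap an interval-censored Cox fit. The toy functions here only demonstrate the control flow.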
RESULTS
Predictive model
We used the Cox-I model to select the top 18 measures by following the forward model selection approach presented above. We then utilized Monte Carlo Cross Validation (MCCV) with 2,000 simulations to compare the three predictive models: the Cox-I model, the RF-I model, and the RF-R model. The MCCV has better performance than the traditional k-fold cross-validation with regard to prediction accuracy when the study sample size is small to medium.43 In each simulation, 90% of the data from NACC-v1v2 were used as the training data, and the remaining 10% were used as the testing data to calculate the model performance evaluation metrics. The NACC-v1v2 data set had a sample size of 3,529. The MCI patient demographic characteristics of the NACC-v1v2 data set are presented in Table 1.
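The resampling scheme can be outlined in Python as repeated random 90/10 splits with the metric averaged across repeats. The function names, the seed, and the `evaluate` callback (which would fit a model on the training indices and return a metric on the testing indices) are assumptions made for this sketch.

```python
import random

def mccv_splits(n_patients, n_repeats=2000, test_frac=0.10, seed=0):
    """Yield (train_idx, test_idx) for repeated random splits."""
    rng = random.Random(seed)
    n_test = max(1, round(n_patients * test_frac))
    for _ in range(n_repeats):
        idx = list(range(n_patients))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]

def mccv_average(n_patients, evaluate, n_repeats=2000, test_frac=0.10, seed=0):
    """Average a metric returned by evaluate(train_idx, test_idx)."""
    scores = [evaluate(train, test)
              for train, test in mccv_splits(n_patients, n_repeats, test_frac, seed)]
    return sum(scores) / len(scores)

# Toy metric: the size of the testing set in each repeat.
print(mccv_average(20, lambda train, test: len(test), n_repeats=5))
```

Unlike k-fold cross-validation, the testing sets of different repeats may overlap, because each split is drawn independently.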
The average of each performance metric as a function of the number of measures (K) in the predictive model is presented in Fig. 1 for IBS, Fig. 2 for Pwithin, and Fig. 3 for Pbefore. In Fig. 1, the computed IBS of the Cox-I model was almost independent of the number of measures. Its IBS was lower than the IBS values of the RF-I model and the RF-R model. The RF-R model had the worst performance with regard to IBS. In Fig. 2, the Cox-I model had the highest Pwithin(MCI-to-dementia), followed by the RF-I model and the RF-R model. For MCI stable patients, the RF-R model had some advantages over the Cox-I model for Pwithin(MCI-stable) when the number of measures was 8 or above. In Fig. 3, which compares Pbefore for patients with MCI-to-dementia conversion, the Cox-I model had better performance than the RF models.
Table 1. Patient baseline characteristics of the NACC-v1v2 MCI cohort

Fig. 1. The average IBS from MCCV simulations for all data as a function of the first K measures in the three models, by using the training data set: NACC-v1v2.

Fig. 2. The average Pwithin(MCI-to-dementia) for MCI-to-dementia conversion patients, Pwithin(MCI-stable) for MCI stable patients, and Pwithin for all patients, from MCCV simulations as a function of the first K measures in the three models, by using the training data set: NACC-v1v2.

Fig. 3. The average Pbefore from MCCV simulations for MCI-to-dementia conversion patients as a function of the first K measures in the three models, by using the training data set: NACC-v1v2.
The Cox-I model had the lowest IBS with the first 7 measures. These 7 measures were the CDR-SB score, the number of Apolipoprotein E (APOE) ɛ4 copies, the logical memory IIA delayed score, the Trail B score, the total number of vegetables named in 1 minute, the remember-dates item from the functional activities questionnaire (FAQ), and the MMSE score. The identified 7 measures are listed in the first column of Table 2. The second column gives the equivalent measurement names in the ADNI study, and the third and fourth columns give detailed information on these measures.
Table 2. The 7 variables selected in the predictive models
Model testing
After determining the 7 measures in the final predictive model, we applied the developed models to two other data sets: NACC-v3 and the ADNI MCI cohort. The MCI patient demographic characteristics of these two data sets are presented in Tables 3 and 4, respectively. Compared with the NACC-v1v2 training data with a sample size of 3,529, the NACC-v3 and the ADNI MCI cohort had 1,199 and 350 patients, respectively, with complete data on the 7 measures and the conversion status. In the NACC-v3, the measure CRAFTDVR was used to replace the measure MEMUNITS after rescaling the range of CRAFTDVR to 0–25 to match the range of MEMUNITS, and the MMSE score was converted from the MoCA score.44,45
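The rescaling of CRAFTDVR onto the 0–25 range of MEMUNITS can be illustrated with a linear min-max mapping. This is an assumption about the form of the rescaling, since the text only states the target range. The source range is a parameter of the sketch, and the value 0–44 used in the example is a placeholder rather than a figure taken from this article.

```python
def rescale(value, src_min, src_max, dst_min=0.0, dst_max=25.0):
    """Linearly map value from [src_min, src_max] to [dst_min, dst_max]."""
    span = src_max - src_min
    return dst_min + (value - src_min) * (dst_max - dst_min) / span

# Placeholder source range of 0 to 44 (assumption for illustration only).
print(rescale(22, src_min=0, src_max=44))  # 12.5
```

The MoCA-to-MMSE conversion is table-based in the cited references and is not sketched here.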
Table 3. Patient baseline characteristics of the NACC-v3 MCI cohort with complete data
Table 4. Patient baseline characteristics of the ADNI MCI cohort with complete data
Patients in the NACC-v3 had shorter follow-up times than those in the NACC-v1v2, and the rate of MCI-to-dementia conversion was lower in the NACC-v3: 41.7% as compared to 58.8% in the NACC training data set. Meanwhile, the ADNI MCI cohort had an MCI-to-dementia conversion rate similar to that of the NACC training data: 56.6% versus 58.8%. The MCI-to-dementia conversion group had an average follow-up time of 32.4 months in the NACC-v1v2, 27.0 months in the NACC-v3, and 30.4 months in the ADNI. The average ages in the MCI-to-dementia conversion and MCI stable groups were similar within each MCI cohort, with the mean baseline age from 72 to 75. In the NACC-v1v2 data set, the MCI-to-dementia conversion group was slightly older at baseline than the MCI stable group, by 1.5 years. The NACC had a good balance on sex with almost equal numbers of females and males, while the ADNI study had more males than females. The cognitive outcomes were worse in the MCI-to-dementia conversion group than in the MCI stable group in all three cohorts, as presented in Supplementary Tables 1-3.
We conducted external validation on the NACC-v3 data set by using the final predictive model with the identified 7 measures. The computed IBS values were 0.118, 0.123, and 0.124 for the Cox-I model, the RF-I model, and the RF-R model, respectively. For the MCI-to-dementia conversion group, the Cox-I model had a much higher value of Pwithin(MCI-to-dementia) and Pbefore as compared to the other two RF models (9% to 14% difference). For MCI stable patients, the RF-R model had the highest Pwithin(MCI-stable) value of 77%, followed by the Cox-I model of 65%, and the RF-I model at 59%. These findings were similar to the results from using the NACC-v1v2 data set.
For the ADNI MCI cohort, the Cox-I model had the lowest IBS of 0.099, followed by the RF-R model at 0.110 and the RF-I model at 0.113. For the MCI-to-dementia conversion patients, the Pwithin(MCI-to-dementia) of the Cox-I model was close to 16%, which was higher than the 12% from the RF-I model and the 12% from the RF-R model. The Pbefore of the Cox-I model was above 55% while the Pbefore of the RF-I model was below 45%. For MCI stable patients, the Cox-I model had a 45% chance, Pwithin(MCI-stable) = 45%, of predicting the dementia onset time beyond the last observed time, while the RF-I model had a lower probability, close to Pwithin(MCI-stable) = 42%.
DISCUSSION
When interval-censored data are analyzed by using right-censored methods, the parameter estimates in the model are often biased, as the right-censored models assume that the event time can be exactly observed if it occurs. In AD research on MCI-to-dementia conversion, the exact time is unknown. For that reason, the right-censored models are not appropriate. The considered interval-censored models have better performance with much lower IBS values than the right-censored models. Another contribution of this article is the feature selection technique that utilizes the forward model selection and the relationship between the outcome and each variable. Without the feature selection process, inconsistencies may arise between the variable correlation to the outcome and the estimated coefficient from survival models.9,46,47
We identified 7 measures to predict MCI-to-dementia conversion. Among these 7 measures, three were found in the predictive model by using right-censored models: the CDR score, the logical memory score, and the difficulty level in remembering appointments.14 For the remaining 4 measures, the Trail B score is commonly used for evaluating cognitive impairment, and it is often associated with cognitive performance.48 The MMSE score at baseline was found to be predictive of conversion from MCI to dementia.49 The APOE ɛ4 is the primary genetic risk factor of dementia.50,51 The last measure, the total number of vegetables named in 60 seconds, was selected in a model to predict disease progression from normal to MCI and MCI to dementia.52
It is noted that the Pwithin(MCI-stable) value had a 2% increase for the model with K = 8 measures from the model with K = 7 measures. The 8th measure was TAXES from the FAQ in the NACC study to measure the level of difficulty in assembling tax records, business affairs or other papers. This finding may indicate that the TAXES measure can improve the prediction of the dementia onset time among MCI stable patients. That trend did not occur in the MCI-to-dementia conversion group.
The proportion of data censoring may impact the model performance in survival models. This has been discussed in several medical research studies in which the focus was primarily on the right-censored data. Rahman et al. observed that censoring levels affected the variability in predictive accuracy measures, particularly in data with a medium to high censoring rate. 53 Persson and Khamis found that the accuracy of hazard ratio estimates from the Cox PH model was affected by censoring types and proportions in different hazard scenarios. 54 Their study showed that increased censoring proportions generally led to higher biases in estimates, especially under the early censoring in which the actual censoring time was shorter than the original simulated censoring time. These findings emphasized the importance of accounting for censoring proportions in survival analysis to ensure the reliability and validity of survival models.
In machine learning methods (e.g., RF), both parameters and hyperparameters are essential factors that have influences on model performance. Parameters include the regression coefficients corresponding to each measure. Hyperparameters such as mtry (number of variables randomly sampled as candidates at each split), and node size are adjusted to achieve a balance between individual tree robustness and minimizing correlation among trees. As the optimal hyperparameter values are contingent upon the dataset, default settings may not ensure the optimal performance. 55 Among these hyperparameters, mtry has a significant influence, with its optimal value related to the number of variables incorporated in the model. Notably, addressing over-fitting is crucial during hyperparameter tuning, as it can cause the model to overly fit the training data, potentially leading to poor performance on new data sets. Employing cross-validation during the tuning process can alleviate this issue to some extent but not completely. 56 We consider this as future work to further improve the performance of RF methods.
AUTHOR CONTRIBUTIONS
Yahui Zhang (Data curation; Methodology; Software; Writing – original draft; Writing – review & editing); Yulin Li (Formal analysis; Writing – original draft); Shangchen Song (Resources; Software; Writing – original draft); Zhigang Li (Conceptualization; Writing – original draft); Minggen Lu (Conceptualization; Methodology; Writing – original draft); Guogen Shan (Conceptualization; Formal analysis; Funding acquisition; Investigation; Methodology; Software; Supervision; Validation; Writing – original draft; Writing – review & editing).
ACKNOWLEDGMENTS
We thank the editor, associate editor, and reviewers for their comments, which helped us improve the manuscript.
FUNDING
Shan’s research is partially supported by grants from the National Institutes of Health: R03AG083207, R03CA248006, and R01AG070849.
DATA AVAILABILITY
The dataset used in this study was obtained from a third-party organization “Alzheimer’s Disease Neuroimaging Initiative” (ADNI) database. The data are available from the ADNI database (adni.loni.usc.edu) upon registration and compliance with the data usage agreement. For up-to-date information, see www.adni-info.org. The proposed algorithm uses this ADNI data repository. The data that support the findings of this study are available on reasonable request from the authors. All ADNI studies are conducted according to the Good Clinical Practice guidelines, the Declaration of Helsinki, and U.S. 21 CFR Part 50 (Protection of Human Subjects), and Part 56 (Institutional Review Boards). Written informed consent was obtained from all participants before protocol-specific procedures were performed. The ADNI protocol was approved by the Institutional Review Boards of all of the participating institutions. This study was approved by the Institutional Review Boards of all of the participating institutions, such as the Office for the Protection of Research Subjects at the University of Southern California. A complete listing of ADNI investigators and affiliations can be found at
. Informed written consent was obtained from all participants at each site. The investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. More details can be found at adni.loni.usc.edu.
The NACC database is funded by NIA/NIH Grant U24 AG072122. NACC data are contributed by the NIA-funded ADRCs: P30 AG062429 (PI James Brewer, MD, PhD), P30 AG066468 (PI Oscar Lopez, MD), P30 AG062421 (PI Bradley Hyman, MD, PhD), P30 AG066509 (PI Thomas Grabowski, MD), P30 AG066514 (PI Mary Sano, PhD), P30 AG066530 (PI Helena Chui, MD), P30 AG066507 (PI Marilyn Albert, PhD), P30 AG066444 (PI John Morris, MD), P30 AG066518 (PI Jeffrey Kaye, MD), P30 AG066512 (PI Thomas Wisniewski, MD), P30 AG066462 (PI Scott Small, MD), P30 AG072979 (PI David Wolk, MD), P30 AG072972 (PI Charles DeCarli, MD), P30 AG072976 (PI Andrew Saykin, PsyD), P30 AG072975 (PI David Bennett, MD), P30 AG072978 (PI Neil Kowall, MD), P30 AG072977 (PI Robert Vassar, PhD), P30 AG066519 (PI Frank LaFerla, PhD), P30 AG062677 (PI Ronald Petersen, MD, PhD), P30 AG079280 (PI Eric Reiman, MD), P30 AG062422 (PI Gil Rabinovici, MD), P30 AG066511 (PI Allan Levey, MD, PhD), P30 AG072946 (PI Linda Van Eldik, PhD), P30 AG062715 (PI Sanjay Asthana, MD, FRCP), P30 AG072973 (PI Russell Swerdlow, MD), P30 AG066506 (PI Todd Golde, MD, PhD), P30 AG066508 (PI Stephen Strittmatter, MD, PhD), P30 AG066515 (PI Victor Henderson, MD, MS), P30 AG072947 (PI Suzanne Craft, PhD), P30 AG072931 (PI Henry Paulson, MD, PhD), P30 AG066546 (PI Sudha Seshadri, MD), P20 AG068024 (PI Erik Roberson, MD, PhD), P20 AG068053 (PI Justin Miller, PhD), P20 AG068077 (PI Gary Rosenberg, MD), P20 AG068082 (PI Angela Jefferson, PhD), P30 AG072958 (PI Heather Whitson, MD), P30 AG072959 (PI James Leverenz, MD).
CONFLICT OF INTEREST
The authors have no conflict of interest to report.
