Abstract
“δ”, a latent variable constructed from cognitive performance and functional status measures, can accurately diagnose dementia. The minimal assessment needed is unknown. We have constructed a δ homolog, “dTEXAS”, from Telephone Executive Assessment Scale (TEXAS) items, and validated it in a convenience sample of Japanese persons (n = 176). dTEXAS scores correlated strongly with both Instrumental Activities of Daily Living (IADL) (r = –0.86, p < 0.001) and Clinical Dementia Rating Scale (CDR) (r = 0.71, p < 0.001). Constructed independently of the subjects’ diagnoses, dTEXAS scores accurately distinguished dementia versus controls [area under the receiver operating characteristic (ROC) curve (AUC) = 0.92], dementia versus mild cognitive impairment (MCI) (AUC = 0.80), and controls versus MCI (AUC = 0.74). These AUCs are higher than those of multiple observed executive measures, as reported recently by Matsuoka et al., 2014. A dTEXAS score of –0.58 best discriminated between dementia versus controls with 90.1% sensitivity and 80.0% specificity.
INTRODUCTION
We have constructed a latent measure of dementia severity, “δ”, and validated it in several datasets [1]. δ represents the “cognitive correlates of functional status”. It is uniquely related to dementia severity and distinguishes cases with Alzheimer’s disease (AD) and mild cognitive impairment (MCI) from each other and from normal controls (NC) [1–3]. A δ homolog has been independently validated in the National Alzheimer’s Coordinating Center (NACC)’s “Uniform Dataset” (UDS) among cases with frontotemporal, Lewy body, and vascular dementias, in addition to AD [4]. It had an area under the receiver operating characteristic curve (AUC; ROC) of 0.96 for the discrimination of all-cause dementia versus MCI cases and controls. The six-year rate of change in δ scores correlated strongly (r = 0.93) with concurrent changes in dementia severity.
The latent variable δ is related to Spearman’s general intelligence factor, “g” [5]. It appears to share g’s robust “indifference” to the cognitive measures used to construct it (i.e., its “indicators”). In theory then, δ might be constructed from almost any ad hoc combination of cognitive and functional status measures, making it feasible to retrospectively estimate expert clinical diagnoses in almost any existing dataset, or to engineer prospective batteries to achieve specific applications (e.g., brevity, low cost, telephone, or internet administration, etc.).
Another potential application would be to construct δ from a minimal cognitive assessment. The bare minimum required is currently unknown. Nevertheless, we have constructed one δ homolog (“dMA”) from as little as the Mini-Mental State Examination (MMSE) [6] and CLOX: An Executive Clock-drawing Task (CLOX) [7] and a second from the CLOX and the Executive Interview (EXIT25) [8, 9]. Both achieved AUCs of 0.96 for the diagnosis of dementia.
Further reductions may be possible. The indifference of g to its indicators suggests that δ and its homologs might be extracted from item-level data. If so, then δ homologs might free valid dementia case finding from the clinic, and allow widespread dementia screening (e.g., in population cohorts, or by telephone and/or the internet).
The dataset of Matsuoka et al. [10] allows us to further explore this issue. They recently validated Japanese language translations of CLOX and the EXIT25 in a small sample of elderly persons with AD and other dementias. These were compared with other bedside screening tests, including the Frontal Assessment Battery (FAB) [11]. Their dataset contains item-level data. This allows the extraction of three EXIT25 short forms, including the EXIT15, the “Quick EXIT” [12] and the Telephone Executive Assessment Scale (TEXAS) [13]. The EXIT15 contains 15 optimal EXIT25 items selected through factor analyses (DRR, unpublished). Thirteen of those appear on the Quick EXIT, selected by an independent Rasch analysis.
The TEXAS is comprised of five EXIT25 items and is intended for telephone administration.
We chose to construct a new δ homolog (i.e., “dTEXAS”) from those five. We hypothesize that dTEXAS will explain more variance in Instrumental Activities of Daily Living (IADL) and the Clinical Dementia Rating Scale (CDR) [14] than the TEXAS, and as much or more than the EXIT25, which has previously been shown to load strongly on δ [2]. We have shown that IADL can be estimated from a subset of a δ homolog’s composite loadings [8]. We hope to replicate that finding with dTEXAS.
METHODS
Subjects
This study is a secondary analysis of data obtained by Matsuoka et al. [10]. Two hundred one subjects (62 men and 139 women; mean age ± SD, 78.4 ± 6.4 years old; mean education ± SD, 11.4 ± 3.0 years) participated in that study. A subset was chosen for analysis. Inclusion criteria consisted of (i) age ≥65; (ii) a CDR global score of less than 3; and (iii) a Mini-Mental State Examination (MMSE) score ≥10. All participants were comprehensively assessed by a geriatric psychiatrist. On the basis of that exam, n = 176 eligible subjects were diagnosed as NC (n = 45), MCI (n = 40), or “Dementia” (n = 91). The Ethics Committee of Kyoto Prefectural University of Medicine approved the study and informed consent was obtained from all subjects.
Clinical assessments
All subjects were evaluated using the CDR, CLOX, EXIT25, FAB, IADL, and MMSE [6]. The CDR was rated blind to the CLOX, EXIT25, and FAB. Thus, there is no potential for a tautology in the prediction of CDR scores or derived categorical diagnoses from EXIT25 items.
Clinical Dementia Rating Scale [14]: The CDR rates dementia severity across six clinical domains. From these, an algorithm derives global ratings of CDR = 0, “normal”; CDR = 0.5, “questionable dementia”; CDR = 1, “mild dementia”; CDR = 2, “moderate dementia”; or CDR = 3, “severe dementia” (excluded from these analyses).
Instrumental Activities of Daily Living [15]: IADLs were assessed using informant ratings. Items rating housekeeping, food preparation, and laundry were obtained only from women. The maximum total IADL score is therefore 5 in men and 8 in women.
CLOX: An Executive Clock-Drawing Task [7] in Japanese translation [10]: The CLOX is a brief measure of executive function based on a clock-drawing task. It is divided into two parts. CLOX1 is an unprompted task that is sensitive to executive control. CLOX2 is a copied version that is less dependent on executive skills. CLOX1 is more ‘executive’ than other comparable clock drawing tests [16]. Each CLOX subtest is scored on a 15-point scale. Lower CLOX scores are impaired. Matsuoka et al. [10] found CLOX1 to have an AUC of 0.73 for the discrimination between dementia cases and non-demented persons (MCI and NC) in their cohort.
The Executive Interview [8] in Japanese translation [10]: The EXIT25 is comprised of 25 items indicating frontal circuit pathology. Each item is rated on a 3-point response: 0 (intact performance), 1 (a specific partial error or equivocal response), and 2 (specific incorrect response or failure to perform). A total score ranges from 0 to 50, with a higher score indicating greater impairment. A score of 15/50 best discriminates normal elderly from all cause dementia (sensitivity = 93%, specificity = 83%, AUC = 0.93, n = 200) [17]. Matsuoka et al. [10] found the EXIT25 to have an AUC of 0.88 for the discrimination between dementia cases and non-demented persons.
Several “short forms” can be derived from EXIT25 items. The Quick EXIT is comprised of 14 items selected through Rasch analysis [12]. Cronbach’s α = 0.88. Royall had independently selected 13 of those, and two additional items (i.e., “Imitation Behavior” and “Complex Command Task” = the EXIT15), by factor analysis (DRR, unpublished). Cronbach’s α = 0.85.
The TEXAS is comprised of five EXIT25 items that can be administered by telephone, i.e., “Number-Letter Task,” “Word Fluency,” “Memory/Distraction Task,” “Serial Order Reversal Task,” and “Anomalous Sentence Repetition” items. Four of those appear on the Quick EXIT and EXIT15. Total TEXAS scores range from 0 to 10, with higher scores corresponding to greater impairment. Cronbach’s α = 0.74. TEXAS scores significantly distinguish MCI cases from NC and contribute to CDR scores independently of the Telephone Interview for Cognitive Status-modified (TICS-m) [18], Memory Impairment Screen-telephone (MIST-t) [19], Alzheimer’s Questionnaire (AQ) [20], and covariates [13].
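The traditional TEXAS total described above is a plain sum of five item ratings. A minimal sketch follows, assuming each item uses the EXIT25 0/1/2 rating; the dictionary keys are hypothetical labels chosen for illustration, not variable names from the original dataset.

```python
# Hypothetical item keys: the paper names the five items but not any variable names.
TEXAS_ITEMS = (
    "number_letter",
    "word_fluency",
    "memory_distraction",
    "serial_order_reversal",
    "anomalous_sentence_repetition",
)


def texas_total(item_scores):
    """Sum the five TEXAS item ratings (each 0, 1, or 2) into a 0-10 total.

    Higher totals correspond to greater impairment.
    """
    total = 0
    for name in TEXAS_ITEMS:
        score = item_scores[name]
        if score not in (0, 1, 2):
            raise ValueError(f"{name}: expected 0, 1, or 2, got {score!r}")
        total += score
    return total


# Illustrative ratings for one imaginary subject (not study data).
example = {
    "number_letter": 1,
    "word_fluency": 2,
    "memory_distraction": 0,
    "serial_order_reversal": 1,
    "anomalous_sentence_repetition": 0,
}
print(texas_total(example))  # 4
```

This unit-weighted sum is the scoring that the latent composite is later compared against.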
Frontal Assessment Battery [11]: The FAB consists of six items that explore different aspects of frontal lobe function. Each item is scored from 0 to 3, for a maximum total score of 18, with a lower total score indicating more frontal lobe dysfunction. The CDR was rated blind to the FAB. Matsuoka et al. [10] found the FAB to have an AUC of 0.79 for the discrimination between dementia cases and non-demented persons.
The MMSE [6, 21] is a well-known and widely used test for screening cognitive impairment. Scores range from 0 to 30. Scores less than 24 reflect cognitive impairment. MMSE scores were known to the CDR raters.
Statistical analyses
Analysis sequence
This analysis was performed using Analysis of Moment Structures (AMOS) software [22]. All analyses were conducted in an SEM framework. We used the Maximum Likelihood estimator.
First, we constructed a bifactor latent δ homolog (i.e., “dTEXAS”), as previously described (e.g., reference 3) (Fig. 1). dTEXAS was indicated by the five TEXAS/EXIT25 items and IADL. By selecting these TEXAS/EXIT25 items for our homolog, dTEXAS could be administered by telephone, if desired. However, our analysis was performed post hoc. dTEXAS’ residual in Spearman’s g (i.e., “g’”) was indicated only by the five TEXAS/EXIT25 items. All the observed indicators were adjusted for age, education, and gender. The latent variables dTEXAS and g’ are implicitly orthogonal. This was confirmed by inter-correlation. Covariate adjusted CDR scores were correlated with dTEXAS.
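The bifactor structure described in this paragraph can be written compactly. The notation below is ours, introduced only for illustration: x_j denotes the j-th covariate-adjusted TEXAS/EXIT25 item, d denotes dTEXAS, λ and γ are loadings, and ε are residuals.

```latex
\begin{aligned}
x_j &= \lambda_j\, d + \gamma_j\, g' + \varepsilon_j, \qquad j = 1, \dots, 5 \\
\mathrm{IADL} &= \lambda_6\, d + \varepsilon_6 \\
\operatorname{Cov}(d, g') &= 0
\end{aligned}
```

Because IADL loads only on d, the variance that the items share with functional status is assigned to dTEXAS, while the remaining shared item variance is assigned to g’.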
Next, the latent variable dTEXAS was output as two composite variables, “dTEXAS” and “dTX2”. dTEXAS was constructed from the factor loadings of six indicator variables in Fig. 1, including both TEXAS items and IADL. However, as a test of this δ homolog’s “indifference to its indicators”, we removed IADL from the dTEXAS composite and used the remaining dTEXAS composite loadings to construct “dTX2”. Thus, the dTX2 composite is informed only by the TEXAS items (nevertheless weighted by their multivariate associations with IADL). Since the dTX2 composite is informed by a briefer battery and does not invoke IADL, it could be estimated in patients without an intact patient/informant dyad, and might also be used to estimate IADL from a simple bedside cognitive assessment. dTEXAS and dTX2 were then validated as predictors of observed diagnostic class by ROC analyses (see below).
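The relationship between the two composites can be illustrated with a short sketch. It assumes that a composite is a weighted sum of standardized indicators and that dTX2 simply drops the IADL term while keeping the item weights unchanged. The weights and data are placeholders, not the estimates in Fig. 1, and the weights exported by SEM software may be factor score weights rather than raw loadings.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of numbers to mean 0 and (population) SD 1."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]


def composite(data, weights):
    """Weighted sum of standardized indicators, one score per subject."""
    names = list(weights)
    z = {name: zscores(data[name]) for name in names}
    n_subjects = len(data[names[0]])
    return [sum(weights[name] * z[name][i] for name in names)
            for i in range(n_subjects)]


# Placeholder weights (hypothetical, NOT the published loadings).
full_weights = {"item1": 0.30, "item2": 0.35, "item3": 0.40,
                "item4": 0.30, "item5": 0.43, "iadl": -0.80}
# dTX2 keeps the same item weights and omits IADL.
item_only_weights = {k: w for k, w in full_weights.items() if k != "iadl"}

# Toy data for four imaginary subjects (not study data).
data = {
    "item1": [0, 1, 2, 2],
    "item2": [0, 0, 1, 2],
    "item3": [1, 0, 2, 2],
    "item4": [0, 1, 1, 2],
    "item5": [0, 0, 2, 1],
    "iadl": [8, 7, 3, 1],
}

dtexas_scores = composite(data, full_weights)
dtx2_scores = composite(data, item_only_weights)
```

Under these assumptions dTX2 can be computed without any informant-rated measure, yet its item weights still reflect the items' joint association with IADL in the fitted model.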
Missing data: These models were all constructed in an SEM framework using complete cases.
Fit indices: Model fit was assessed using four common test statistics: chi-square, the ratio of the chi-square to the degrees of freedom in the model (CMIN/DF), the comparative fit index (CFI), and the root mean square error of approximation (RMSEA). Where two nested models were compared, Akaike’s Information Criterion (AIC) was added. A lower AIC statistic indicates better fit [23]. A non-significant chi-square signifies that the data are consistent with the model [24]. However, in large samples, this metric is limited by its tendency to achieve statistical significance when all other fit indices (which are not sensitive to sample size) show that the model fits the data very well. A CMIN/DF ratio <5.0 suggests an adequate fit to the data [25]. The CFI statistic compares the specified model with a null model [26]. CFI values range from 0 to 1.0. Values below 0.95 suggest model misspecification. Values approaching 1.0 indicate adequate to excellent fit. An RMSEA of 0.05 or less indicates a close fit to the data, with models below 0.05 considered “good” fit, and up to 0.08 as “acceptable” [27]. All fit statistics should be simultaneously considered when assessing the adequacy of the models to the data.
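Two of these indices follow directly from the chi-square statistic. The sketch below uses the common (N − 1) point-estimate formula for RMSEA, which is an assumption about the software's exact formulation, and plugs in the chi-square and degrees of freedom reported in the Results. The sample size of 176 is likewise assumed to be the number of complete cases in the model.

```python
import math


def cmin_df(chi_square, df):
    """Ratio of the model chi-square to its degrees of freedom."""
    return chi_square / df


def rmsea(chi_square, df, n):
    """RMSEA point estimate, sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Values reported for the dTEXAS model; n is an assumed sample size.
chi_square, df, n = 6.52, 8, 176
print(round(cmin_df(chi_square, df), 3))  # 0.815
print(rmsea(chi_square, df, n))           # 0.0
```

Whenever the chi-square is smaller than its degrees of freedom, this formula returns an RMSEA of zero regardless of sample size.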
Factor determinacy: A potential limitation of the common factor model is that an infinite number of unique factor score composites can be derived from any factor [28]. While they all might be consistent with the factor’s loadings, some composites may be orthogonal to others, or even inversely related, potentially resulting in wildly discrepant subject rankings, depending on the composite selected.
However, these can be divided into “determinate” and “indeterminate” fractions [29]. Fortunately, many common factor score estimates are highly intercorrelated and yield an identical reproduced covariance matrix [30]. Several statistical methods are available to test a factor’s determinacy. We used Grice’s “Refined Factor Score Evaluation Program (Equation 5)” [28]. This method maximizes composite validity and is recommended when the factor composite scores are to be used as “observed” variables in subsequent analyses (e.g., as predictors). We report two indices from this program’s output: “Total Item Squared Multiple Correlation” (TISMC) and a “Minimum Correlation” (MC). Acceptable TISMC and MC should be >0.50.
ROC curves: The diagnostic performance or accuracy of a test to discriminate diseased from normal cases can be evaluated using ROC curve analysis [31, 32]. Briefly, the true positive rate (Sensitivity) is plotted as a function of the false positive rate (100 – Specificity) for different cut-off points of a parameter. Each point of the ROC curve represents a sensitivity/specificity pairing corresponding to a particular decision threshold. The area under the ROC curve (AUC) is a measure of how well a parameter can distinguish between two diagnostic groups (diseased/normal). For these analyses, “AD” was coded as CDR ≥1.0, “MCI” as CDR = 0.5, and “NC” as CDR = 0. The analysis was performed in Statistical Package for the Social Sciences (SPSS) [33].
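The AUC has a direct interpretation: it is the probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counted as one half. The sketch below computes it by exhaustive pairwise comparison. It assumes that higher scores indicate greater impairment, and the scores are toy values rather than study data.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the proportion of (positive, negative) pairs in which the
    positive case has the higher score; ties contribute one half.
    """
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Toy composite scores (illustrative only).
dementia = [0.9, 0.4, 0.1, -0.2]
controls = [-1.1, -0.7, -0.3, 0.2]
print(auc(dementia, controls))  # 0.875
```

An AUC of 0.5 corresponds to chance discrimination and 1.0 to perfect separation of the two groups.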
RESULTS
Table 1 shows the demographic and neuropsychological data for each diagnostic group. Figure 1 presents the confirmatory bifactor model of the latent δ homolog “dTEXAS”. The model had excellent fit [RMSEA = 0.000; CFI = 1.00; chi-square = 6.52 (df = 8), p = 0.589]. dTEXAS was significantly associated with each cognitive indicator (range r = –0.28 to –0.43, all p ≤ 0.01). g’ was significantly associated with EXIT2, EXIT4 and EXIT22, but neither EXIT1 nor EXIT6.
dTEXAS exhibited marginally acceptable factor determinacy by Grice’s method (i.e., TISMC = 0.68; MC = 0.37). dTEXAS composite scores were output from the model as described above. The resulting dTEXAS dementia phenotype was a continuously varying variable (Fig. 2). Table 2 presents dTEXAS and dTX2’s univariate correlations with CDR, IADL, and the executive measures. IADL was most strongly associated with the dTEXAS (r = –0.86, p < 0.001). CDR was most strongly associated with the EXIT25, EXIT15, and dTEXAS (all r = 0.71, p < 0.001).
Table 3 presents two multivariate regression models of each cognitive measure as independent predictors of IADL and CDR score, respectively. Both the predictors and the outcomes are adjusted for age, education, and gender. The predictors (including covariates) explained 98% of the variance in IADL. dTEXAS scores (partial r = 0.98, p < 0.001) explained the most variance. EXIT25 (r = –0.06, p < 0.001) and TEXAS (r = 0.21, p < 0.001) also made significant independent contributions. The predictors explained 48% of the variance in CDR scores. dTEXAS scores (partial r = –0.47, p < 0.001) explained the most variance. EXIT25 (r = 0.39, p < 0.001) and TEXAS (r = –0.24, p < 0.001) also made significant independent contributions. The FAB was not independently associated with either outcome.
ROC analysis
Table 4 presents AUCs for dTEXAS and dTX2 scores as predictors of the diagnostic groups. dTEXAS’s AUC was best for the relatively easy discrimination between dementia cases and NC (AUC = 0.92, 95% CI = 0.88–0.97). However, it also performed well at more difficult discriminations, including dementia versus MCI (AUC = 0.80, 95% CI = 0.72–0.88), and MCI versus NC (AUC = 0.74, 95% CI = 0.63–0.84). dTX2 had generally weaker AUCs, but none were statistically distinguishable from those of dTEXAS.
A dTEXAS score of –0.11 best discriminated between dementia versus MCI with 74.7% sensitivity and 75.0% specificity. A dTEXAS score of –87.4 best discriminated between NC versus MCI with 70.0% sensitivity and 66.7% specificity. A dTEXAS score of –0.58 best discriminated between dementia versus all others with 90.1% sensitivity and 80.0% specificity (Fig. 2).
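The criterion used to pick each "best" threshold is not stated here. One common choice, assumed in the sketch below, is to maximize Youden's J (sensitivity + specificity − 1) over all observed scores. The function also assumes that a subject is classified as a case when the score is at or above the cutoff, and the data are toy values.

```python
def best_cutoff(positive_scores, negative_scores):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A subject is called positive when score >= cutoff. If several
    cutoffs tie on J, the lowest one is returned.
    """
    best = None
    for cutoff in sorted(set(positive_scores) | set(negative_scores)):
        sens = sum(s >= cutoff for s in positive_scores) / len(positive_scores)
        spec = sum(s < cutoff for s in negative_scores) / len(negative_scores)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1:]


# Toy composite scores (illustrative only).
dementia = [0.9, 0.4, 0.1, -0.2]
controls = [-1.1, -0.7, -0.3, 0.2]
print(best_cutoff(dementia, controls))  # (-0.2, 1.0, 0.75)
```

Other criteria, such as fixing a minimum sensitivity for screening use, would generally select a different threshold from the same ROC curve.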
DISCUSSION
This analysis confirms the moderate to strong statistical association between executive measures and functional outcomes (i.e., IADL and CDR scores). However, we have recently found disability (and therefore “dementia”) to be almost exclusively related to δ, and therefore to a fraction of variance in Spearman’s general intelligence factor “g” [34]. This has important psychometric implications. First, the apparent association between observed executive function and disability must be mediated through the former’s associations with g and especially δ, as we have demonstrated in Table 3. Second, executive measures may not be necessary to dementia’s diagnosis as g/δ’s construction is indifferent to their indicators. Instead, dementia-relevant variance may reside in every cognitive measure, regardless of its domain-specific properties.
The present analysis takes δ’s construction down to the item level, with minor degradation of its psychometric properties. A δ homolog derived from a small set of EXIT25 items has been shown to be more strongly associated with clinical outcomes than any observed executive measure, and to more accurately distinguish categorical diagnostic classes by ROC. These findings suggest that any single cognitive measure might improve its ability to predict disability/dementia if its items are weighted to reflect their associations with g, and especially δ.
Some of δ’s improved psychometric performance may arise from our use of a latent measurement model. Any latent construct is likely to be somewhat advantaged over its indicators, because it will be resistant to the influence of their individual measurement error(s). However, that advantage can be lost when the latent construct is “reified” as a composite score [28]. Instead, it is δ’s bifactor design that best explains its improved psychometric performance. δ’s construction discards the considerable variance of g’, which is unrelated to the target indicator, by definition. The poor diagnostic performance of g’ forces us to distinguish dementia from cognitive performance per se. This is a key conceptual insight.
Both points have the potential to dramatically alter the landscape of dementia case-finding. If IADL’s variance is uniquely related to g/δ, and if both are indifferent to their indicators, then any cognitive battery can be used to assess disability/dementia, including those derived from item-level data extracted from a single measure’s items.
This finding is tempered by the fact that larger and more comprehensive batteries seem to produce stronger δ homologs. dTEXAS appears to suffer on two fronts: it is derived from a small set of indicators, and they all target a single cognitive domain. dTEXAS has marginal factor determinacy, in contrast to other δ homologs constructed from more comprehensive batteries of formal psychometrics. It also has weaker AUCs. Its AUC for the discrimination of AD versus controls [i.e., 0.92 (0.88–0.98)] is reduced relative to a brief battery of bedside scales [i.e., 0.95 (0.91–0.98)] [9], which in turn appears to be reduced relative to a battery of formal psychometrics [i.e., 0.995 (0.97–1.0)] [34]. Similarly, Koppara et al. [35] demonstrate that the psychometric performance of a δ homolog improves when it is built from a larger psychometric battery. Nevertheless, the overlap in these estimates’ confidence intervals suggests that dTEXAS’s AUC is statistically indiscriminable from those of larger batteries. The users of such item-level δ homologs will have to decide if their brevity and ease of use justify such tradeoffs. dTEXAS, for example, might be administered over the telephone by lay interviewers. Whether a five-item δ homolog administered over the telephone by lay interviewers actually performs comparably to formal psychometrics awaits prospective demonstration.
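The interval comparison made in this paragraph can be reproduced with a simple overlap check on the three confidence intervals quoted above. This is an informal check rather than a formal test of the difference between AUCs, which would require the underlying data.

```python
def intervals_overlap(a, b):
    """True when closed intervals a = (low, high) and b = (low, high) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


# Confidence intervals as quoted in the text.
dtexas = (0.88, 0.98)   # dTEXAS
bedside = (0.91, 0.98)  # brief battery of bedside scales [9]
formal = (0.97, 1.0)    # battery of formal psychometrics [34]

print(intervals_overlap(dtexas, bedside))  # True
print(intervals_overlap(dtexas, formal))   # True
```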
If g/δ can be constructed from any cognitive battery, then this operation might be conducted on the item-set of any single cognitive measure, thereby improving upon the performance of their traditional scoring systems. Koppara et al. [35] have also independently replicated this finding in a European cohort. Given that a more comprehensive set of items might have been more effective, dementia’s special association with g/δ may explain the unexpected utility of dementia screening tests such as the MMSE. They effectively provide a measure of Spearman’s g, within which resides δ’s dementia-salient variance. This insight undermines the importance of any measure’s face validity as a measure of any particular cognitive domain. All measures of any domain can be related to Spearman’s g, and through it to δ. This is consistent with dementia’s inherently “global” effects on cognitive performance and may necessarily constrain its pathophysiology and biomarkers.
Thus, the ideal brief δ-homolog might be constructed from a broader survey of cognitive performance items. The optimal δ homolog for the telephone assessment of dementia has yet to be determined and would require validation within Standards for the Reporting of Diagnostic Accuracy Studies (STARD) guidelines (http://www.stard-statement.org/) to ensure its generalizability to other samples. Our intent has been instead to demonstrate the potential utility of item-level homologs and the process by which they could be constructed, and their optimal diagnostic threshold(s) derived. dTEXAS outperforms the entire EXIT25, from which its indicators are derived. This is impressive because the EXIT25 is among the strongest psychometric predictors of functional status [36]. Both TEXAS and dTEXAS take less time to administer than the EXIT25, and can be administered by telephone. Moreover, dTEXAS outperforms the TEXAS’ traditional scoring.
There remains the problem that δ’s target indicators (IADL or CDR for example) require the independent interrogation of a second observer. This constrains δ’s assessment to intact patient-caregiver dyads, and further to those with competent and knowledgeable informants. The CDR requires the presence of an experienced and adequately trained clinician. Both constraints limit the potential of δ to remove dementia case-finding from the clinic.
However, we have previously shown that δ’s diagnostic ability is not statistically compromised when it is calculated from a subset of its indicators [9]. Such restricted δ composites can be used to predict their target indicator(s). Because dTEXAS is strongly associated with both CDR and especially IADL, it may be possible to accurately predict those outcomes, by telephone if dTX2 is used. This would also allow the assessment of patients who live alone or who otherwise lack a competent informant. dTX2 could be operationalized as a phone app or website calculating the composite from observed data obtained by telephone (etc.).
Another potential limitation to our analysis has been the use of the CDR as a “gold standard” outcome assessment. AD cases can present at CDR = 0.5 and might be misdiagnosed here relative to a clinical assessment. Since this is a secondary analysis of data collected for other purposes (i.e., the validation of a Japanese translation of the EXIT25), we are limited to this a priori constraint. However, we also note that dTEXAS’ AUC for the discrimination of “AD” versus NC is reduced relative to that of multiple published δ homologs tested against clinical diagnoses. That might be the result of these Japanese subjects’ misdiagnosis by CDR and not simply dTEXAS’ brevity.
In short, our approach appears capable of removing significant obstacles to the routine and cost efficient screening of individuals for functionally salient cognitive impairment. One caveat is that the d-score alone is not capable of providing a specific diagnosis, and so cases found to be “demented” on the basis of a d-score would still need evaluation and treatment. In fact, the clinical evaluation of d-scores may lead to detection of potentially disabling cognitive impairments across a broad range of diagnoses. This would allow the empiric recognition of “dementia” outside of neurodegenerative disorders, e.g., in schizophrenia (“dementia praecox”), mood disorders, diabetes mellitus, head injury, etc. [37]. Many patients with impaired d-scores will have reversible conditions.
Any such patient should benefit from their family’s and healthcare providers’ recognition of a disabling cognitive impairment. The mismatch between a patient’s capacity to participate in their healthcare and our expectations of them is a serious obstacle to the efficient delivery of healthcare services. Patients identified as “demented” on the basis of their δ score could be directed toward more appropriate levels of supervision, relieved from responsibility for self-monitoring or self-treatment beyond their capacity, protected from unsafe driving or living conditions, and screened for the capacity to direct their finances or to make healthcare decisions. Raw psychometric performance cannot be relied upon to make these determinations because a significant fraction of its variance is attributable to g’ and therefore irrelevant to functional autonomy.
Finally, while we have constructed δ from item-level data, Spearman’s g and therefore δ may extend even deeper into brain-behavior relationships. Spearman himself constructed g from simple sensory discrimination tasks [5]. g has since been associated with a broad range of psychophysical and biological variables [38]. If δ homologs can also be constructed from such variables, accurate dementia diagnoses might be obtainable from a wide selection of objective and easily acquired data streams. If those homologs retain δ’s psychometric properties, dementia screening would be changed dramatically.
ACKNOWLEDGMENTS
This research was supported by “Redesigning Communities for Aged Society” project (Research Institute of Science and Technology for Society, Japan Science and Technology Agency: RISTEX, JST). Dr. Royall is supported by the Julia and Van Buren Parr endowment for Alzheimer’s-related research. The Sponsors had no role in the design, methods, subject recruitment, data collection, analysis and preparation of this paper.
DRR holds copyrights to the EXIT25 and CLOX. DRR and RFP have disclosed δ’s invention to the University of Texas Health Science Center at San Antonio (UTHSCSA), which has filed patent application 2012.039. US1.HSCS and provisional patents 61/603,226, 61/671,858 and 62/112,703 relating to the latent variable δ’s construction and biomarkers.
