Reliability and Validity of a Novel Internet-Based Battery to Assess Mood and Cognitive Function in the Elderly

Abstract

Dementia is a chronic condition in the elderly and depression is often a concurrent symptom. As populations continue to age, accessible and useful tools to screen for cognitive function and its associated symptoms in elderly populations are needed. The aim of this study was to test the reliability and validity of a new internet-based assessment battery for screening mood and cognitive function in an elderly population. Specifically, the Helping Hand Technology (HHT) assessments for depression (HHT-D) and global cognitive function (HHT-G) were evaluated in a sample of 57 elderly participants (22 male, 35 female) aged 59–85 years. The study sample was categorized into three groups: 1) dementia (n = 8; Mini-Mental State Exam (MMSE) score 10–24), 2) mild cognitive impairment (n = 24; MMSE score 25–28), and 3) control (n = 25; MMSE score 29–30). Test-retest reliability (Pearson correlation coefficient, r) and internal consistency reliability (Cronbach’s alpha, α) of the HHT-D and HHT-G were assessed. Validity of the HHT-D and HHT-G was tested via comparison (Pearson r) to commonly used pencil-and-paper based assessments: HHT-D versus the Geriatric Depression Scale (GDS) and HHT-G versus the MMSE. Good test-retest (r = 0.80; p < 0.0001) and acceptable internal consistency reliability (α = 0.73) of the HHT-D were established. Moderate support for the validity of the HHT-D was obtained (r = 0.60 between the HHT-D and GDS; p < 0.0001). Results indicated good test-retest (r = 0.87; p < 0.0001) and acceptable internal consistency reliability (α = 0.70) of the HHT-G. Validity of the HHT-G was supported (r = 0.71 between the HHT-G and MMSE; p < 0.0001). In summary, the HHT-D and HHT-G were found to be reliable and valid computerized assessments to screen for depression and cognitive status, respectively, in an elderly sample.

Keywords

Cognitive function, dementia, depression, elderly, mood

INTRODUCTION

As the United States (U.S.) population continues to age, the prevalence of dementia is increasing, as are the associated societal costs of treating and caring for people diagnosed with dementia [1, 2]. For the purposes of testing cognitive function and screening for or detecting dementia, there are a number of primarily pen-and-paper assessment tools available; yet the accessibility and utility of some screening and diagnostic tools have been questioned [3, 4]. Related to cognitive function is the issue of mood in older adults, as depression is often an associated symptom of cognitive decline and impairment [5, 6]. Hence, there is a need to further develop and refine screening tests for both cognitive functioning and depression in the aging U.S. population.

A number of pencil-and-paper based tests exist for the assessment of mood and cognition in the elderly. Mood is frequently assessed with the Geriatric Depression Scale (GDS) [7, 8], while one of the most popular screening instruments for cognitive impairment and dementia is the Mini-Mental State Examination (MMSE) [9, 10]. Despite their popularity, commonly used pencil-and-paper tests require individuals to visit clinics for assessments, which results in significant clinical costs and practical limitations when assessing large numbers of individuals. Moreover, these tests can be time-intensive and are a non-automated means of quantifying and recording changes in brain function and assessing mood. These issues are particularly relevant for assessments in the elderly, for whom time-consuming methods can be burdensome and fatiguing [8]. Hence, it is increasingly recognized that there is a need to develop reliable and valid computer-based screening instruments to allow for the remote assessment of mood and cognition in older adults. Development of computerized assessments would significantly increase the scale, scope, and speed with which mood and cognition can be screened in the elderly.

In the current study, we tested the reliability and validity of two newly developed internet-based screening assessments of mood and cognition entitled Helping Hand Technology (HHT). The HHT-D assesses depression, while the HHT-G is a brief screening instrument of cognitive function. The hypothesis of this study was that HHT computerized assessments of depression and cognition are reliable and valid tools that briefly assess mood and screen cognition in the elderly.

MATERIALS AND METHODS

This study was reviewed and approved by the Institutional Review Board of Pennington Biomedical Research Center and conducted according to the guidelines of the Declaration of Helsinki (1975). Written informed consent was provided by each participant prior to initiation of any study procedures. During the informed consent process, decision-making capacity was determined through an ongoing exchange of information between the participant and the examiner, and no participants were incapable of providing written informed consent. Participants received monetary compensation for participation.

Participants

Fifty-seven participants were recruited from the Baton Rouge, Louisiana metro community. To improve external validity and generalizability, minimal inclusion and exclusion criteria were utilized. Inclusion criteria were: 1) men and women between the ages of 60–85, inclusive, and 2) MMSE scores of 10–30, inclusive. The sole exclusion criterion was an inability to complete the computerized and pencil-and-paper based assessments. Using a common cutoff value for the MMSE [3, 10], included participants were categorized into three groups: 1) dementia (MMSE 10–24, inclusive), 2) mild cognitive impairment (MCI; MMSE 25–28, inclusive), or 3) control (MMSE 29–30, inclusive). We relied solely upon the MMSE to categorize participants into these cognitive function groups given that our objective was to test a similar screening instrument (i.e., HHT-G screener for cognitive status). While we acknowledge its limitations and criticisms [3, 10], we note that the MMSE is most often used to screen for potential dementia and is usually the first step in ascertaining a dementia diagnosis [10].

Assessments

This study compared HHT computerized assessments to traditional pencil-and-paper based assessments. The pencil-and-paper assessments of mood and cognition were completed once at the first study visit. Participants took the computerized battery twice, separated by 5–16 days, to assess test-retest reliability. The HHT internet-based assessments were self-administered with minimal supervision as participants were directed to follow auditory instructions presented in the computerized program.

Mood assessments

The GDS is a 15-question pencil-and-paper based form that is commonly utilized for assessing depression in the elderly. The HHT depression test (HHT-D) is a computerized assessment that utilizes a format similar to the GDS questionnaire, capturing true or false answers for each of the questions. The questions for the HHT-D are designed to capture the same information as outlined in the GDS.

Cognitive assessments

The MMSE is a pencil-and-paper based assessment that is one of the most commonly utilized clinical tools to screen for cognitive function, which is why it was selected to be used in this validation study. The HHT computerized screening instrument for global cognitive function (HHT-G) was designed to provide a measure of cognitive function tapping the same cognitive domains as the MMSE, including orientation, language, and memory. Data for the HHT-G were captured by having the participant complete a series of computerized screens following the delivery of verbal commands.

Analyses

First, individual items on the HHT-D and HHT-G were assessed to identify and eliminate items that failed to detect variation among respondents. Second, the reliability of the HHT computerized assessment battery was assessed via Pearson correlation coefficients to quantify test-retest reliability and Cronbach’s alpha coefficients to quantify internal consistency reliability. Third, two tests of validity were carried out: 1) the HHT-D was examined by comparing its scores to those on the GDS; and 2) the HHT-G was evaluated against the MMSE. Last, item-total correlations were used to evaluate the final individual items on each assessment.
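The test-retest and validity comparisons described above both rest on the Pearson correlation coefficient. The following is a minimal sketch of that calculation, not the software used in the study; the function name `pearson_r` and the five-participant score lists are hypothetical and serve only to illustrate the computation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a list has no variation")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical total scores for five participants at the two administrations
# (illustrative values only, not study data).
visit_1 = [10, 12, 8, 15, 11]
visit_2 = [11, 13, 7, 14, 12]
test_retest_r = pearson_r(visit_1, visit_2)
```

The same function applies to the validity comparisons by passing the computerized scores and the corresponding pencil-and-paper scores as the two lists.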

RESULTS

Participants

Fifty-seven (n = 57) participants aged 59–85 years (22 men, 35 women) completed the study; 8 had MMSE scores of 10–24 (dementia), 24 had MMSE scores of 25–28 (MCI), and 25 had MMSE scores of 29–30 (controls). The descriptive characteristics of the sample are outlined by depression status and cognitive status in Table 1. The study failed to recruit the expected number of participants in the dementia category. Consequently, conclusions about the reliability and validity of the HHT computerized assessment battery are qualified among people with an MMSE score of 10–24, inclusive.

Mood assessment: HHT-D

Scoring

There were 15 yes/no questions on the HHT-D with each question being scored as 1 or 0. There were also 5 multiple choice questions that were scored from 0 to 3 points each; thus, the possible scores for the 20 items ranged from 0 to 30, with higher scores reflecting higher levels of depressed mood. There were two questions for which all subjects responded ‘0’, indicating that these questions have no value for separating subjects into depression categories. The HHT-D was administered to participants at two time points in the study, with a mean score of 11 at each time point (first/second administration: S.D. = 3.7/3.6; range = 0–20/6–22; median = 10/10).

Reliability

The test-retest reliability of the HHT-D was supported. The Pearson correlation coefficient was 0.80 (p < 0.0001), indicating good agreement between the HHT-D scores obtained from assessments given at two time points. Further, internal consistency reliability was acceptable, with a Cronbach’s alpha coefficient of 0.73 [11].
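For readers unfamiliar with the internal consistency statistic reported here, Cronbach's alpha for k items is k/(k − 1) × (1 − sum of item variances / variance of total scores). The sketch below implements that standard formula; the function name `cronbach_alpha` and the response matrix are hypothetical, and the study's own software is not described in the text.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a participants-by-items score matrix.

    rows: one list per participant, each holding that participant's item scores.
    """
    if len(rows) < 2 or len(rows[0]) < 2:
        raise ValueError("need at least 2 participants and 2 items")
    k = len(rows[0])
    # Sample variance of each item across participants.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the participants' total scores.
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 4 participants x 3 items scored 0 or 1
# (illustrative values only, not study data).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(responses)
```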

Validity

The Pearson correlation coefficient between the total scores on the HHT-D (range 0 to 30) and scores on the GDS (range 0 to 30) was calculated as 0.60 (p < 0.0001), indicating moderate support for the validity of the HHT-D.

Item-total correlations

Pearson correlation coefficients were calculated between scores on each individual item on the HHT-D and the total score on the HHT-D. The correlations between the scores on the individual depression items and the sum of the HHT-D are shown in Table 2. A cut-score of 0.25 has been proposed for these correlations to identify items that do not sufficiently measure the construct assessed by a questionnaire [13]. Based on this cut-score, four HHT-D items can be considered for elimination.
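The item-total screening step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: each item is correlated with the total score that includes the item (as the text describes), items with no variation are flagged because their correlation is undefined, and the function names and response matrix are hypothetical.

```python
from math import sqrt


def _corr(x, y):
    """Pearson r, or None when either list has no variation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        return None
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sqrt(ss_x * ss_y)


def item_total_correlations(rows):
    """Correlation of each item column with the total score (item included)."""
    totals = [sum(row) for row in rows]
    n_items = len(rows[0])
    return [_corr([row[j] for row in rows], totals) for j in range(n_items)]


def items_below_cut(rows, cut=0.25):
    """Indices of items whose item-total correlation is undefined or below cut."""
    return [
        j
        for j, r in enumerate(item_total_correlations(rows))
        if r is None or r < cut
    ]


# Hypothetical 4 participants x 4 items (illustrative values only).
responses = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
]
flagged = items_below_cut(responses)  # items 2 and 3 are flagged
```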

Relationship between the HHT-D and the GDS

The GDS can be used to categorize people by level of depression, with scores of 0–1 indicative of normal levels of mood, 2–3 of mildly depressed mood, and 4–8 of severely depressed mood [7]. HHT-D scores in the range of 0–11 are consistent with normal levels of mood, 12–16 with mildly depressed mood, and 17–30 with severely depressed mood. Results indicate agreement between categorization of subjects based on the criterion (the GDS) and the HHT-D. As illustrated in Table 3, the Kappa value reflects fair to moderate levels of agreement in classifying participants between the GDS and HHT-D [12].
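The score bands stated above can be expressed as a simple lookup. The sketch below uses only the cutoffs given in the text; the names `GDS_BANDS`, `HHT_D_BANDS`, and `categorize` are hypothetical, and scores outside the stated bands raise an error rather than being assigned a category.

```python
# Inclusive score bands as stated in the text: (low, high, label).
GDS_BANDS = [(0, 1, "normal"), (2, 3, "mild"), (4, 8, "severe")]
HHT_D_BANDS = [(0, 11, "normal"), (12, 16, "mild"), (17, 30, "severe")]


def categorize(score, bands):
    """Return the label of the inclusive band that contains score."""
    for low, high, label in bands:
        if low <= score <= high:
            return label
    raise ValueError(f"score {score} falls outside the defined bands")


# Hypothetical participant with a GDS score of 2 and an HHT-D score of 13.
gds_label = categorize(2, GDS_BANDS)      # "mild"
hht_label = categorize(13, HHT_D_BANDS)   # "mild"
```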

Cognitive assessment: HHT-G

Scoring

A total of 50 questions were on the original HHT-G with possible total scores ranging from 0 (lowest score) to 53 (highest score). For most questions (49), a correct answer was scored as 1 point with incorrect answers scored as 0 points; only one question was scored 0 to 4 with 0 being the lowest and 4 being the highest. Based upon results from the 57 subjects, 23 questions were considered invalid for separating subjects into cognition categories. Responses to these questions did not demonstrate any variation among the subjects and therefore were deleted from the list of 50 questions used to determine an HHT-G score. Hence, 27 items were retained and are used to calculate the HHT-G score; 26 of these questions were scored 0 or 1 and the 27th was scored 0 to 4. Thus, the minimum possible score summed across the 27 questions was 0 and the maximum was 30. The HHT-G was administered to participants at two time points in the study, with a mean HHT-G score of 25 at each time point (first/second administration: S.D. = 3.0/3.5; range = 10–28/5–28; median = 25/26).
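The item-reduction step and the resulting score ranges can be checked with a short sketch. The function name `informative_items` and the small response matrix are hypothetical; the item counts and maximum points per item are taken from the paragraph above.

```python
def informative_items(rows):
    """Indices of items whose responses vary across participants."""
    n_items = len(rows[0])
    return [j for j in range(n_items) if len({row[j] for row in rows}) > 1]


# Hypothetical 3 participants x 3 items; item 0 is answered identically by all.
responses = [
    [1, 0, 1],
    [1, 1, 1],
    [1, 0, 0],
]
kept = informative_items(responses)  # item 0 shows no variation and is dropped

# Maximum possible totals implied by the scoring rules in the text.
original_max = 49 * 1 + 4   # 49 one-point items plus one item scored 0 to 4
retained_max = 26 * 1 + 4   # 26 one-point items plus the same 0 to 4 item
```

Under these scoring rules the original 50 items give a maximum of 53 and the retained 27 items give a maximum of 30, matching the ranges reported above.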

Reliability

The test-retest reliability, or repeatability, of the HHT-G was supported with a Pearson correlation coefficient of 0.87 (p < 0.0001), indicating good agreement between the HHT-G scores obtained from assessments given at two time points. Further, internal consistency reliability was acceptable, with a Cronbach’s alpha coefficient of 0.70 [11].

Validity

The HHT-G score for each of the 57 subjects was evaluated against the pencil-and-paper score on the MMSE. The Pearson correlation coefficient between total scores on the HHT-G (range 0 to 30) and total scores on the MMSE (range 0 to 30) was calculated to be 0.71 (p < 0.0001). This correlation supports the validity of the HHT-G module.

Item-total correlations: All items

Correlations were calculated between scores on each of the reduced set of 27 individual items on the HHT-G and the HHT-G total score (range 0 = lowest cognition to 30 = highest cognition). A cut-score for these correlations of 0.25 has been proposed to identify items that do not sufficiently measure the construct assessed by the questionnaire and thus can be considered for elimination [13]. Based on this cut-score, ten HHT-G items can be considered for elimination in future versions of the HHT-G module. Table 4 provides item-total correlations for the HHT-G.

Relationship between the HHT-G and MMSE

As noted earlier, participants were classified as being in the dementia (MMSE 10–24, inclusive), MCI (MMSE 25–28, inclusive), or control (MMSE 29–30, inclusive) groups based on their MMSE scores. The HHT-G score was also used to classify participants. The HHT-G consists of 27 items scored such that the minimum possible total score is 0 and the maximum possible total score is 30 (similar to the MMSE). Also similar to the MMSE, participants were grouped into the dementia, MCI, and control categories based on HHT-G scores of 10–24, 25–27, and 28–30, respectively. Agreement between classifying participants with the MMSE and HHT-G was then evaluated. As illustrated in Table 5, the Kappa statistic indicates fair agreement [12] between categorizing subjects with the MMSE versus the HHT-G.
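Agreement between two classifications of the same participants can be summarized with Cohen's kappa, which corrects observed agreement for the agreement expected by chance. The sketch below computes the unweighted statistic; whether the study used a weighted or unweighted kappa is not stated, so this choice is an assumption, and the function name and example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two classifications of the same people."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Proportion of participants placed in the same category by both methods.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from the marginal category frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical categories for six participants (illustrative values only).
by_mmse = ["dementia", "MCI", "MCI", "control", "control", "control"]
by_hht_g = ["dementia", "MCI", "control", "control", "control", "MCI"]
kappa = cohen_kappa(by_mmse, by_hht_g)
```

Under the Landis and Koch benchmarks cited in the text [12], values of 0.21–0.40 are conventionally described as fair agreement and 0.41–0.60 as moderate agreement.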

DISCUSSION

Prior to this study, the reliability and validity of the HHT battery for screening depression and cognition in the elderly had not been evaluated. The HHT-D was found to have support for its test-retest reliability, as well as internal consistency reliability. Also, correlations between the HHT-D and the GDS were significant, providing support for its validity. Item-total correlations for the HHT-D identified four items that can be considered for elimination in future versions of the HHT-D. Compared to the GDS, the HHT-D similarly classified all subjects with severe levels of depressed mood, though agreement between the HHT-D and the GDS was limited for subjects with minimal and mild levels of depressed mood, which is not uncommon when brief screening instruments are used to classify people into categories.

The reliability of the HHT-G was supported by test-retest correlations and internal consistency reliability. The HHT-G had large correlations with a commonly utilized cognitive assessment tool, the MMSE, providing support for its validity. Ten items had low item-total correlations and can be considered for elimination in future versions of the HHT-G, as they did not adequately measure the construct of interest and many of these items also had limited variability. Agreement in classifying subjects as demented, MCI, or control between the HHT-G and the MMSE was limited. Similar to the HHT-D and the GDS, this is expected when single brief self-report instruments are utilized to classify people into groups.

There are a few noteworthy limitations of our study. First, test-retest reliability was evaluated by administering the same battery at both time points in the study. Hence, learning effects were possible between the first and second administration of the HHT-D and HHT-G assessments. Second, the ability of the HHT battery to classify subjects into categories based on levels of cognitive performance and depression was mixed. More specifically, regarding the categorization of participants into dementia, MCI, and control groups, we note that separation between the groups did not rely upon clinical definitions of dementia and MCI syndromes, nor did we undertake other means, such as a clinical exam or clinical dementia rating, to verify dementia, MCI, or control subjects. Rather, we relied solely upon the MMSE scores and traditional cutoff values to separate subjects into these three groups. We recognize that the MMSE is not the sole component for classifying dementia, but rather is a screener for further diagnostic testing to ascertain a more definitive diagnosis. However, the MMSE is the most commonly employed screening tool for dementia [3, 10]. We also highlight that the study failed to recruit the expected number of participants in the dementia category. Consequently, conclusions about the reliability and validity of the HHT computerized assessment battery are qualified among people with an MMSE score of 10–24, inclusive, which limits the utility of the battery for determining dementia.

This study supported the reliability and validity of the HHT-D and HHT-G to assess depression and cognition in the elderly. The HHT internet-based assessment battery offers the ability to remotely screen for depression and cognitive status in the elderly, which is the primary advantage of the HHT-D and HHT-G assessments over commonly utilized pencil-and-paper based assessments. Given this remote accessibility, the HHT assessments can be deployed in a variety of settings with the use of an internet-enabled computer. In particular, the HHT assessments could circumvent issues related to patients having to visit clinics or examiners traveling to meet with patients for assessments. Additionally, the HHT-G addresses a number of criticisms leveled against the MMSE, including its length [4, 10] and issues with accessibility and usefulness [3]. Our analysis revealed that the HHT-G could be shortened by eliminating items that did not adequately measure cognitive function, which would reduce the time required to administer it. For use in elderly populations, the HHT-D and HHT-G have the potential to reduce both clinician and participant burden while screening for critical health conditions.

ACKNOWLEDGMENTS

This project received support from Helping Hand Technology, LLC, which created and owns the intellectual property associated with the tests evaluated during the trial. JNK and HRA have financial interests in the tests. WDJ is supported in part by 1 U54 GM104940 from the National Institute of General Medical Sciences of the National Institutes of Health, which funds the Louisiana Clinical and Translational Science Center. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Authors’ disclosures available online (http://j-alz.com/manuscript-disclosures/16-0441r1).

References

1. Hurd, Martorell, Langa (2013) Monetary costs of dementia in the United States. N Engl J Med 369, 489–490.

2. Hurd, Martorell, Langa (2015) Future monetary costs of dementia in the United States under alternative dementia prevalence scenarios. J Popul Ageing 8, 101–112.

3. Tsoi, Chan, Hirai, Wong, Kwok (2015) Cognitive tests to detect dementia: A systematic review and meta-analysis. JAMA Intern Med 175, 1450–1458.

4. Newman (2015) Copyright and bedside cognitive testing: Why we need alternatives to the Mini-Mental State Examination. JAMA Intern Med 175, 1459–1460.

5. Lovheim, Sandman, Karlsson, Gustafson (2008) Behavioral and psychological symptoms of dementia in relation to level of cognitive impairment. Int Psychogeriatr 20, 777–789.

6. Hudon, Belleville, Gauthier (2008) The association between depressive and cognitive symptoms in amnestic mild cognitive impairment. Int Psychogeriatr 20, 710–723.

7. Sheikh, Yesavage (1986) Geriatric Depression Scale (GDS): Recent evidence and development of a shorter version. Clin Gerontol 5, 165–173.

8. Brown, Astell (2012) Assessing mood in older adults: A conceptual review of methods and approaches. Int Psychogeriatr 24, 1197–1206.

9. Folstein, Folstein, McHugh (1975) “Mini-mental state”: A practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res 12, 189–198.

10. Mitchell (2009) A meta-analysis of the accuracy of the mini-mental state examination in the detection of dementia and mild cognitive impairment. J Psychiatr Res 43, 411–431.

11. Tavakol, Dennick (2011) Making sense of Cronbach’s alpha. Int J Med Educ 2, 53–55.

12. Landis, Koch (1977) The measurement of observer agreement for categorical data. Biometrics 33, 159–174.

13. Field (2005) Discovering statistics using SPSS. 2nd Edition, Sage, London.