INTRODUCTION
Recently, there has been a marked increase in the prevalence of dementia and related cognitive disorders. Dementia is already one of the leading causes of death in high-income countries, incurring a heavy social and economic toll on most societies, and its prevalence is expected to rise further [1, 2]. The global effort against dementia and cognitive disorders focuses on early diagnosis, with the aim of detecting cognitive dysfunction at the early stages of Alzheimer’s disease (AD). Mild cognitive impairment (MCI) represents a state between normal aging and the cognitive decline often associated with dementia. While patients with MCI retain a high degree of functionality and the ability to live autonomously, they may suffer from memory impairment, often accompanied by subjective memory complaints. At the same time, they may be unable to perform complex activities such as managing doctor appointments and financial planning [3, 4].
Early intervention at the MCI stage can allow patients to retain and even improve their cognitive functioning [5]. To aid the detection of MCI, a new generation of computerized cognitive tests has been developed. These short tests focus on screening and allow the clinician to quickly identify the possible presence of MCI [6, 7]. At the same time, the potential of virtual reality (VR) applications for MCI screening is being researched. Because VR allows the user to experience and interact with a virtual environment, it offers a high degree of ecological validity [8] and could therefore provide indications of possible cognitive and functional decline. As older adults often exhibit a positive attitude toward VR environments [9] and new technology in general [10], VR-based cognitive assessment could allow for a more pleasant testing experience.
Performance in VR applications has been shown to relate to subjective memory complaints [11], and VR applications have exhibited promising results for assessing cognitive functions [11–13]. Previous studies have already attempted to use VR applications featuring a shopping task as screening tools for MCI [14, 15]. The most recent of these has shown promising results, with a correct classification rate (CCR) of 87.30% (sensitivity: 82.35%; specificity: 95.24%) when differentiating between MCI patients and healthy older adults [15]. That study used the same VR application as the present one, the virtual super market (VSM), relying on data from a single administration of the exercise by an examiner. VR elements are also being introduced to computerized cognitive tests; however, there is as yet no indication of whether they will enhance the diagnostic ability of these tests [6, 16].
Given the potential of VR applications for cognitive screening and the widespread use of cognitive training applications by older adults, it is reasonable to ask whether VR cognitive training applications can serve as remote assessment tools, using longitudinal data such as each user's average performance to detect possible cognitive impairment due to MCI. Longitudinal performance has been studied in computerized cognitive tests; however, these studies focused mainly on practice effects. Practice effects have been examined in computerized cognitive tests [6], and they have been used to aid diagnosis with the CogState computerized battery [17, 18]. Currently, there is no literature on the feasibility of using longitudinal performance in a VR application to detect MCI. Using longitudinal performance on a VR cognitive training exercise could be preferable to using longitudinal performance or practice effects on traditional computerized neuropsychological tests: cognitive training applications are generally more entertaining and better tolerated by older adults, and, unlike standardized testing procedures, they are less likely to cause anxiety. This new method could supplement existing screening protocols in cases where older adults are unable or unwilling to visit appropriate health services for neuropsychological assessment.
Following the above, the aim of this study is to assess whether monitoring longitudinal performance can provide useful diagnostic information for older adults using a self-administered VR cognitive training application at home. Longitudinal performance, and more specifically average performance over time, was chosen as an indicator in order to reduce random variation in performance while eliminating the need for an examiner and strict administration protocols.
MATERIALS AND METHODS
Participants
Participants were recruited from the sample of a previous study assessing the potential of a VR application as a screening test for MCI [15]. Exclusion criteria were: diagnosis of dementia or another major neurological or psychiatric disorder, illiteracy, health issues such as motor and vision difficulties that could interfere with the use of the exercise, treatment with cholinesterase inhibitors or other drugs that could affect cognitive performance, alcoholism or drug abuse, and participation in other studies. Recruitment took place between June and July 2014. MCI patients were paired randomly with healthy older adults with similar age and education characteristics in order to ensure a balanced sample. All participants were informed about the purpose of the study before providing their consent. Diagnosis was confirmed by a neurologist after a full neurological, neuropsychological, and laboratory assessment.
Demographic characteristics of participants are shown in Table 1. Mean age was 63.75 years (range: 56–72 years). Participants had a mean of 11.08 years of formal education (range: 6–16 years). The sample included 3 males and 9 females, comprising 6 healthy older adults and 6 MCI patients. No statistically significant differences were observed between the healthy and MCI groups in age and education, whereas, as expected, a statistically significant difference in Mini-Mental State Examination (MMSE) scores was observed. Relatively young older adults were chosen for this study, as the aim was to test the new screening method in people who have recently entered old age. In our experience, this segment of older adults is more interested in using self-administered cognitive training exercises; the goal was therefore to assess the effectiveness of the new screening method in this segment.
Neuropsychological assessment
Participants were administered a neuropsychological test battery including the following cognitive tests: MMSE, Rey Auditory Verbal Learning Test (RAVLT), a Greek version of the “FAS” verbal fluency test, Rey-Osterrieth Complex Figure Test (ROCFT), Rivermead Behavioral Memory Test (RBMT), Test of Everyday Attention (TEA) items 1, 4, and 6, and Trail Making Test part B. It also included the following functional scales: Functional Rating Scale for Symptoms of Dementia (FRSSD), Functional Cognitive Assessment Scale (FUCAS), and Clinical Dementia Rating (CDR). Furthermore, the battery included the following measures of depression, anxiety, and neuropsychiatric symptoms: Beck Anxiety Inventory (BAI), Beck Depression Inventory (BDI), Geriatric Depression Scale (GDS), and the Perceived Stress Scale (PSS). The battery was used to gather qualitative and quantitative data on all participants and to help the neurologist reach an accurate diagnosis. MMSE, RAVLT, ROCFT, and RBMT were used to assess cognitive functioning and aid diagnosis, FUCAS and FRSSD to assess activities of daily living (ADL), and GDS to determine the presence of depressive symptoms. The remaining tests were included to provide qualitative data and to examine correlations between VSM performance and performance on tests assessing specific cognitive domains.
Virtual reality cognitive training application
The virtual super market (VSM) application was developed by the Information Technologies Institute in collaboration with the Greek Association of Alzheimer’s Disease and Related Disorders (GAADRD). It is a simple VR cognitive training program with a low degree of immersion, based on the state of the art in VR applications for cognitive assessment. It runs on any tablet device with the Android® operating system, and PC and web-based versions also exist. The application has been described in detail in a previous study [15].
The VSM is designed to mimic one of the most common activities of daily living: shopping in a super market. It features a short demographics questionnaire followed by instructions before the user engages in the virtual experience. The user is given a shopping list and is free to navigate the store, buy the listed products, and proceed to pay at the till by entering the correct amount. The application aims to train multiple cognitive processes, namely visual and verbal memory, executive function, attention, and spatial navigation, with an emphasis on executive function. The need for simultaneous activation of different cognitive processes makes the program challenging enough to match the ability of the target population while reducing ceiling effects. Randomizing the shopping list in each trial allows the application to maintain its difficulty level and limit learning effects.
In this study, a modified version of the VSM exercise, the VSM remote assessment routine (VSM-RAR), was used. VSM-RAR is an automated administration protocol comprising 5 administrations of the VSM exercise at each of the available difficulty levels, starting at level 1 and progressing up to and including level 4, for a total of 20 scored administrations. These are preceded by 5 familiarization administrations at difficulty level 1, so the full protocol comprises 25 administrations. Familiarization administrations are not scored, as their main aim is to allow the user to get accustomed to the exercise before his or her performance is assessed. The large number of familiarization administrations, combined with the fact that participants had already taken part in the previous VSM study (and were familiar with the exercise), helped ensure that there would be limited variation in performance due to lack of familiarization or misunderstanding of instructions.
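The administration schedule described above (5 unscored familiarization administrations at level 1, then 5 scored administrations at each of levels 1 to 4) can be sketched as follows. This is a minimal illustration under the protocol as stated in the text; the `Session` type and the `vsm_rar_schedule` function are hypothetical names, not part of the VSM software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """One administration in the protocol (hypothetical representation)."""
    index: int    # 1-based position in the protocol
    level: int    # difficulty level (1-4)
    scored: bool  # False for familiarization administrations


def vsm_rar_schedule(n_familiarization=5, n_per_level=5, levels=(1, 2, 3, 4)):
    """Return the ordered list of administrations in the protocol."""
    sessions = []
    # Familiarization administrations at the first level; their scores are not used.
    for _ in range(n_familiarization):
        sessions.append(Session(len(sessions) + 1, levels[0], scored=False))
    # Scored administrations: n_per_level at each level, in increasing order of difficulty.
    for level in levels:
        for _ in range(n_per_level):
            sessions.append(Session(len(sessions) + 1, level, scored=True))
    return sessions


schedule = vsm_rar_schedule()
scored = [s for s in schedule if s.scored]
print(len(schedule), len(scored))  # 25 20
```

Only the 20 scored administrations feed into the analysis; the 5 familiarization runs exist solely to reduce variation due to unfamiliarity with the exercise.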
Administration of the cognitive training exercise
Participants were given a 10-inch tablet PC, which they could keep at home for one month, running custom software that launched the VSM-RAR application on startup. They were instructed to self-administer the cognitive exercise at least once a day and were told that they could administer it as many times as they wanted; however, they were advised not to administer it more than 5 times a day, as this could lead to fatigue. They were informed that the exercise followed a program of increasing difficulty, raising the difficulty level after 5 administrations at each level. More specifically, they were told that the program featured 5 training administrations at difficulty level 1, followed by 5 administrations at each of the 4 difficulty levels. After completion of the program, the difficulty settings were unlocked and participants could select the difficulty level they wanted. One training administration was conducted at a GAADRD day center under the supervision of a psychologist, while subsequent administrations were carried out in the participants’ homes.
RESULTS
Data acquisition and preparation
At the end of each exercise, the program measures the completion time (duration) and stores four binary (categorical) measures recording whether the correct items were bought (CorrectItems), whether the correct amount of money was paid (CorrectMoney), whether unlisted items were bought (BoughtUnlisted), and whether the correct item types were acquired (CorrectTypes).
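A minimal sketch of one stored result record, assuming each administration yields the completion time plus the four binary measures named above. The `VSMResult` class and its `error_count` helper are hypothetical; the duration unit is not stated in the text, and reading `BoughtUnlisted = True` as a mistake (while the other three flags indicate a mistake when False) is our interpretation of the variable names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VSMResult:
    """Hypothetical record of one VSM administration."""
    duration: float       # completion time (unit assumed, not specified in the text)
    CorrectItems: bool    # were the correct items bought?
    CorrectMoney: bool    # was the correct amount of money paid?
    BoughtUnlisted: bool  # were items not on the list bought? (True = mistake)
    CorrectTypes: bool    # were the correct item types acquired?

    def error_count(self) -> int:
        """Number of binary measures that indicate a mistake in this administration."""
        # bool is a subclass of int in Python, so the flags can be summed directly.
        return (
            (not self.CorrectItems)
            + (not self.CorrectMoney)
            + self.BoughtUnlisted
            + (not self.CorrectTypes)
        )


result = VSMResult(412.0, CorrectItems=True, CorrectMoney=False,
                   BoughtUnlisted=True, CorrectTypes=True)
print(result.error_count())  # 2
```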
The collected dataset comprises two groups, MCI and Healthy, which define the class variable (MCI = 0, Healthy = 1). The duration variable was normally distributed across the 20 data points (5 per level) for each of the 12 subjects, as assessed by the Shapiro-Wilk test (p > 0.05). Outliers were identified and removed using the 2.5 SD rule. A summary of the average duration per level for healthy and MCI subjects is depicted in Fig 1.
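The 2.5 SD outlier rule can be sketched as below. The text does not say whether the population or sample SD was used, or over which group of administrations the rule was applied; this sketch assumes the sample SD over one subject's durations, and `remove_outliers_sd` is a hypothetical helper name. The durations in the example are made up.

```python
import statistics


def remove_outliers_sd(values, k=2.5):
    """Drop values lying more than k standard deviations from the mean."""
    if len(values) < 2:
        return list(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator); an assumption
    if sd == 0:
        return list(values)
    return [v for v in values if abs(v - mean) <= k * sd]


# Hypothetical durations for one subject, with one unusually slow administration.
durations = [100, 102, 98, 101, 99, 103, 97, 100, 102, 98, 400]
kept = remove_outliers_sd(durations)
print(len(kept))  # 10
```

One design consequence worth noting: with the sample SD, no single value can lie more than (n − 1)/√n standard deviations from the mean. That bound is about 1.79 for n = 5 and exceeds 2.5 only when n ≥ 9, so a 2.5 SD rule applied within 5 administrations of a single level could never flag anything. Applying it over a larger set, such as a subject's 20 scored administrations, avoids this.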
The mean duration per subject (MeanDur), i.e., the average time needed to complete the exercise, was selected as a feature replacing the duration variable, in order to smooth out differences across administrations and difficulty levels and to obtain a single value per subject for the analysis. MeanDur was also found to follow a normal distribution, as assessed by the Shapiro-Wilk test (p > 0.05). An independent-samples t-test showed that MeanDur was strongly influenced by class (t = 5.560, p < 0.0001). Figure 2 portrays the average duration per class, illustrating the effect of class on MeanDur assessed by the t-test.
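The two steps above, collapsing each subject's administrations into a single MeanDur value and comparing the classes with an independent-samples t-test, can be sketched with the standard library. This assumes Student's pooled-variance t-test (the text does not say whether equal variances were assumed); the function names and data are hypothetical. Only the t statistic is computed here; in practice `scipy.stats.ttest_ind` returns both the statistic and its p-value.

```python
import math
import statistics
from collections import defaultdict


def mean_duration_per_subject(records):
    """records: iterable of (subject_id, duration) pairs -> {subject_id: MeanDur}."""
    by_subject = defaultdict(list)
    for subject, duration in records:
        by_subject[subject].append(duration)
    return {s: statistics.mean(d) for s, d in by_subject.items()}


def pooled_t(a, b):
    """Student's independent-samples t statistic (equal variances assumed).

    Degrees of freedom are len(a) + len(b) - 2.
    """
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)  # pooled variance
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


# Hypothetical per-administration durations for three subjects.
records = [("s1", 300), ("s1", 320), ("s2", 610), ("s2", 590), ("s3", 305)]
print(mean_duration_per_subject(records))  # {'s1': 310, 's2': 600, 's3': 305}
```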
Data analysis
A chi-square test for association was conducted between gender and the binary VSM-RAR variables; there was no statistically significant association (p > 0.05). An independent-samples t-test showed that gender had no influence on duration (p > 0.05).
Independent-samples t-tests revealed no association between the binary variables and education or age (p > 0.05). Kendall’s tau correlation coefficient showed that age and education were associated with MeanDur: MeanDur correlated positively with age (τ = 0.240, p < 0.001) and negatively with education (τ = –0.233, p < 0.001).
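Kendall's tau measures rank agreement by counting concordant and discordant pairs. The sketch below computes the tau-b variant, which corrects for ties; the text does not say which variant was used, so tau-b is an assumption (it is the default of `scipy.stats.kendalltau`, which also reports a p-value). The function name and example values are hypothetical.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation (O(n^2) pairwise version, ties handled)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the difference in x
            dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the difference in y
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n0 = n * (n - 1) // 2  # total number of pairs
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when x or y is constant")
    return (concordant - discordant) / denom


# Hypothetical ages and mean durations: mostly increasing together.
ages = [56, 60, 63, 66, 72]
mean_durations = [300, 340, 320, 410, 450]
print(round(kendall_tau_b(ages, mean_durations), 3))  # 0.8
```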
Correlations with established neuropsychological tests and past performance on the VSM exercise
Correlations were examined between VSM-RAR variables and participants’ performance in the previous VSM study [15] (in which the exercise was administered only once, by an examiner), as well as between VSM-RAR variables and established neuropsychological tests. Average duration (MeanDur) correlated significantly with performance in the previous VSM study (Kendall’s τ = 0.822, p < 0.001). Correlations between MeanDur and established neuropsychological tests were also assessed with Kendall’s tau. Significant associations were found with FUCAS, TEA4r (TEA visual elevator raw score), and ROCFT1 (ROCFT pattern copying), as shown in Table 2. In addition, there were trends toward correlation with TEA1b, TEA4t, TEA6t, RBMT1, RAVLT3, and RAVLT4.
Association of VSM-RAR variables with healthy/MCI class
A chi-square test for association was conducted between class and the binary VSM-RAR variables, namely CorrectMoney, CorrectTypes, CorrectItems, and BoughtUnlisted. All expected cell frequencies were greater than five. There was no statistically significant association between class and the binary variables (p > 0.05). MeanDur was associated with class, as indicated by an independent-samples t-test (t = 5.560, p < 0.001).
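A chi-square test of association between class (rows) and one binary variable (columns) reduces to a 2×2 contingency table. The sketch below computes Pearson's statistic without continuity correction, reports the smallest expected count (the "greater than five" check mentioned above), and derives the p-value from the fact that a 1-degree-of-freedom chi-square variable is a squared standard normal. Whether the authors applied a continuity correction is not stated; note that `scipy.stats.chi2_contingency` applies Yates' correction to 2×2 tables by default. The function name and counts are hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Returns (statistic, p_value, min_expected).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    stat = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            min_expected = min(min_expected, expected)
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, chi2 = Z**2, so P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value, min_expected


# Hypothetical counts: rows = class (MCI, Healthy), columns = measure (False, True).
stat, p, min_exp = chi_square_2x2([[30, 20], [20, 30]])
print(round(stat, 3), round(p, 4), min_exp)  # 4.0 0.0455 25.0
```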
Discrimination between healthy and MCI
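A Naïve Bayes classifier was chosen to discriminate between the two classes, mainly because of its effectiveness with small sample sizes. Another reason for this choice is that Naïve Bayes does not require a particular parametric distribution to be assumed for the features, since the class-conditional densities can also be estimated nonparametrically (e.g., with kernel density estimation).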
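With a single feature, a Naïve Bayes classifier simply weighs each class's prior by the likelihood of the observed value under that class. The text does not specify the class-conditional density model, so this sketch assumes Gaussian densities for illustration (a kernel density estimate would be the nonparametric alternative). `GaussianNB1D` is a hypothetical class and the training values are made up; posteriors are computed in log space to avoid underflow for values far from both class means.

```python
import math
import statistics


class GaussianNB1D:
    """Naive Bayes with one feature and Gaussian class-conditional densities (assumed)."""

    def fit(self, x, y):
        self.params_ = {}
        for c in sorted(set(y)):
            xs = [xi for xi, yi in zip(x, y) if yi == c]
            # (prior, mean, sample SD) per class; needs at least 2 samples per class.
            self.params_[c] = (len(xs) / len(x), statistics.mean(xs), statistics.stdev(xs))
        return self

    def posterior(self, value):
        """Posterior probability of each class for a single feature value."""
        log_joint = {}
        for c, (prior, mu, sd) in self.params_.items():
            log_density = -0.5 * ((value - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))
            log_joint[c] = math.log(prior) + log_density
        m = max(log_joint.values())  # subtract the max before exponentiating for stability
        unnorm = {c: math.exp(v - m) for c, v in log_joint.items()}
        total = sum(unnorm.values())
        return {c: u / total for c, u in unnorm.items()}

    def predict(self, value):
        post = self.posterior(value)
        return max(post, key=post.get)


# Hypothetical MeanDur values; class coding follows the text (MCI = 0, Healthy = 1).
x = [300, 320, 310, 600, 620, 590]
y = [1, 1, 1, 0, 0, 0]
model = GaussianNB1D().fit(x, y)
print(model.predict(315), model.predict(605))  # 1 0
```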
Feature selection and classification
The data (12 subjects: 6 Healthy, 6 MCI) were split into two subsets comprising approximately 60% and 40% of the dataset, respectively, and each subset was used once for training and once for testing. In essence, this was a 2-fold cross-validation strategy modified to account for the small sample size.
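The modified 2-fold scheme can be sketched as follows: split the subjects into subsets of about 60% and 40%, train on one and test on the other, then swap. How the authors assigned subjects to subsets is not stated; this sketch assumes a seeded shuffle within each class followed by round-robin interleaving, so that both subsets contain both classes. All function names are hypothetical, and `nearest_class_mean` is only a stand-in classifier to keep the example self-contained.

```python
import random
import statistics


def interleaved_split(y, train_frac=0.6, seed=0):
    """Split indices into subsets of about train_frac and 1 - train_frac,
    alternating between classes so that both subsets contain every class."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    queues = []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        queues.append(indices)
    order = []
    while any(queues):  # round-robin over the classes
        for queue in queues:
            if queue:
                order.append(queue.pop())
    cut = round(train_frac * len(y))
    return order[:cut], order[cut:]


def two_fold_cv(x, y, fit_predict, train_frac=0.6, seed=0):
    """Use each subset once for training and once for testing; pool the predictions."""
    first, second = interleaved_split(y, train_frac, seed)
    y_true, y_pred = [], []
    for train, test in ((first, second), (second, first)):
        preds = fit_predict([x[i] for i in train], [y[i] for i in train],
                            [x[i] for i in test])
        y_true.extend(y[i] for i in test)
        y_pred.extend(preds)
    return y_true, y_pred


def nearest_class_mean(train_x, train_y, test_x):
    """Stand-in classifier: assign each value to the class with the closest mean."""
    means = {c: statistics.mean(v for v, t in zip(train_x, train_y) if t == c)
             for c in set(train_y)}
    return [min(means, key=lambda c: abs(v - means[c])) for v in test_x]


# Hypothetical MeanDur values for 6 healthy (1) and 6 MCI (0) subjects.
durations = [290, 310, 305, 300, 295, 315, 580, 610, 600, 590, 620, 605]
labels = [1] * 6 + [0] * 6
y_true, y_pred = two_fold_cv(durations, labels, nearest_class_mean)
```

With 12 subjects the split yields subsets of 7 and 5, which is the closest integer approximation to 60% and 40%.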
Feature selection
To select suitable features for classification among the VSM-RAR variables, the test sample margins of multiple Naïve Bayes models were estimated and compared using boxplots [19]. Test sample margins were estimated for all combinations of the VSM-RAR variables, and their distributions were found to be the same. Figure 3 depicts two cases, one with all the predictors and one with a single predictor. The simplest model, with the lowest variance and comprising a single feature, MeanDur, was chosen.
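The margin comparison can be sketched as below, assuming the common definition of a classification margin (the posterior of the true class minus the largest posterior among the other classes, as in MATLAB's classifier `margin` methods); whether [19] uses exactly this definition is an assumption. The quartile summary reproduces what a boxplot displays. Function names and posterior values are hypothetical.

```python
import statistics


def classification_margins(posteriors, y_true):
    """Margin = posterior of the true class minus the largest posterior of any other class."""
    margins = []
    for post, label in zip(posteriors, y_true):
        other = max(p for c, p in post.items() if c != label)
        margins.append(post[label] - other)
    return margins


def boxplot_summary(values):
    """Quartiles and interquartile range, i.e. the quantities a boxplot displays."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"min": min(values), "q1": q1, "median": median, "q3": q3,
            "max": max(values), "iqr": q3 - q1}


# Hypothetical test-set posteriors from one model (classes: MCI = 0, Healthy = 1).
posteriors = [{0: 0.9, 1: 0.1}, {0: 0.3, 1: 0.7}]
print(classification_margins(posteriors, [0, 0]))  # approximately [0.8, -0.4]
```

Positive margins indicate correct, confident classifications; comparing the spread (for example the IQR) of the margins across candidate feature sets is one way to favor the model with the lower variance.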
Classification
The average misclassification error rate for the model with MeanDur as the sole predictor was 0.0833, while the average CCR for discriminating MCI patients from healthy older adults was 91.8%, with a sensitivity of 94% and a specificity of 89%.
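The reported metrics follow from the confusion matrix once MCI is treated as the positive class: CCR is the share of correct predictions, sensitivity the share of MCI patients classified as MCI, and specificity the share of healthy participants classified as healthy. A minimal sketch with hypothetical predictions (`screening_metrics` is a hypothetical helper); for instance, a single error among 12 pooled predictions gives an error rate of 1/12 ≈ 0.0833.

```python
def screening_metrics(y_true, y_pred, positive=0):
    """CCR, error rate, sensitivity and specificity, with `positive` (MCI = 0) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    total = tp + fn + tn + fp
    return {
        "ccr": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical predictions: one healthy participant (1) misclassified as MCI (0).
y_true = [0] * 6 + [1] * 6
y_pred = [0] * 6 + [1] * 5 + [0]
print(screening_metrics(y_true, y_pred))
```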
DISCUSSION
Motivated by the increasing use of VR technology and computerized cognitive training exercises in the field of cognitive disorders and geriatric preventive medicine, we aimed to combine cognitive training with screening for MCI while eliminating the need for an examiner and the associated costs. Our goal was to assess the potential of using longitudinal VSM performance to distinguish between healthy older adults and MCI patients, and to compare the discriminant ability of this method with that of a single VSM administration, as assessed in a previous study [15]. Additionally, we aimed to validate longitudinal performance in the VSM exercise against performance on traditional neuropsychological measures.
Feasibility of the VSM-RAR exercise
All participants were able to use the exercise on their own without any usability problems. Furthermore, all participants completed the assigned number of administrations and continued using the exercise after completing the research protocol. Although participants were given the contact information of a researcher available for technical support, none reported any problems. A researcher called each participant twice during the study to confirm that no issues had arisen, and all participants reported no issues with the application or the VSM-RAR exercise. A few minor issues concerning the use of the tablet device itself were reported, e.g., leaving the device switched on because of confusion about the shut-down procedure. As in the previous VSM study [15], all participants described VSM-RAR as enjoyable and engaging.
Correlation of VSM-RAR variables with established neuropsychological measures and past performance on the VSM exercise
Contrary to the previous VSM study [15], binary VSM-RAR measures did not correlate significantly with established neuropsychological measures. This may be attributed to participants’ familiarity with the exercise, gained through their participation in the previous study and the 5 familiarization administrations included in VSM-RAR. Having been allowed to familiarize themselves with the exercise, participants made fewer mistakes, so mistakes could not be used to distinguish between healthy and MCI participants.
At the same time, MeanDur correlated with FUCAS, TEA4r, and ROCFT1. Notably, these correlations were also present in the previous VSM study [15]. The correlation of VSM-RAR performance with FUCAS and TEA scores suggests a strong executive component in VSM-RAR, while the correlation with ROCFT suggests a visuo-spatial component. These correlations were expected based on the previous VSM study and the structure of the VSM exercise. Furthermore, the literature contains many examples of VR super market environments being used successfully to train executive function [13, 21]. As in the previous VSM study [15], VSM-RAR performance did not correlate with any anxiety or depression measure, further suggesting that VSM performance is unaffected by depression or anxiety. It is also worth noting that the correlations observed in the previous VSM study appeared in this study as well; however, some of them did not reach statistical significance, probably due to the small sample size.
As expected, VSM-RAR average performance, as well as average performance at each of the 4 difficulty levels, correlated with performance in the previous VSM study, in which only one administration was used [15].
Discrimination between MCI patients and healthy older adults
Average performance differed significantly between healthy older adults and MCI patients: MCI patients needed almost twice as long as healthy older adults to complete the exercise. Using each user's average performance, specifically the mean duration across all VSM administrations for that user (after excluding outliers), VSM-RAR achieved a CCR of 91.8%, with a sensitivity of 94% and a specificity of 89%. The previous VSM study had achieved a CCR of 87.3%, with a sensitivity of 82.35% and a specificity of 95.24%, using a single administration of the exercise by an examiner [15]. Thus, using each user's average performance increased the CCR of the VSM, despite a less structured administration protocol and the absence of a trained examiner in the testing process. Although age and education affected participants' performance in the VSM-RAR study, the difference in performance between the healthy and MCI groups was clear. The only case in which the difference was not very pronounced was the comparison between a 70-year-old healthy older adult with 6 years of education and a 57-year-old MCI patient with 15 years of formal education. This suggests that age and education are likely to act as confounders mainly when people with radically different age and education profiles are compared. The differences in average duration between healthy older adults and MCI patients may be attributable to the executive function deficits exhibited by these patients. Such deficits have been assessed using VR environments in previous studies [13, 21]. The possible effect of executive dysfunction on average duration is also reflected in the correlations of average duration with standardized neuropsychological tests.
It is worth noting that a strong executive component in the VSM was also evident in the previous VSM study, through correlations with established tests; there, too, duration was the variable that exhibited the most numerous and the strongest correlations with established tests [15].
CONCLUSION
There is growing research and commercial interest in e-health applications and in the use of new technologies for the diagnosis and support of patients with dementia and other cognitive disorders [22, 23]. An increasing number of older adults use cognitive training software in order to retain and improve their cognitive functioning [24, 25]. At the same time, there is no consensus on the feasibility and cost-effectiveness of screening the entire older adult population for cognitive disorders [26–28]. We propose a paradigm shift in population screening for cognitive disorders, namely the combination of cognitive training and cognitive screening. Since many older adults already use computerized cognitive training software, specialized algorithms embedded in the software could analyze longitudinal performance for screening purposes, allowing such software to act as an early warning system for signs of cognitive decline. This new method could be targeted at younger older adults who are still healthy and have not exhibited signs of cognitive impairment, allowing cognitive disorders to be detected at their earliest stage. While not necessarily better than traditional neuropsychological tests administered by experts, this method is aimed at people who are unable to visit appropriate health services, or who live in regions where waiting lists for such services are so long that they deter people from using them. Such problems exist in many countries and regions, and people end up visiting appropriate health services only after becoming aware of their cognitive problems themselves, so that impairment is detected at a later stage, further limiting intervention options.
The engaging nature of computerized cognitive training software, together with the ability to automate screening without incurring additional costs or burden to the health system, could allow for large-scale screening and ease the burden on primary health care services. In essence, the proposed method hinges on combining screening with something older adults enjoy and are willing to use, namely self-administered computerized exercises. Admittedly, this study provides only preliminary validation of and support for the proposed paradigm shift; its main aim is to inspire further research in that direction.
The strengths of this study include the thorough examination of all participants (a full neurological, neuropsychological, and laboratory assessment was conducted to confer a diagnosis), the large number of established neuropsychological measures administered, the good usability and engaging nature of VSM-RAR, and the use of repeated measurements. Limitations include the small sample size and the predominance of female participants (9 of 12). Furthermore, this study cannot provide data on VSM-RAR’s test-retest reliability or its ability to detect change over time; however, such data may be irrelevant, as the application aims to provide pre-clinical screening, acting as an early warning system for cognitive decline that prompts the user to visit an appropriate health care service for further, more detailed assessment.
Future studies should focus on further validating the proposed concept and on examining ways to integrate such applications into the fabric of healthcare systems and healthcare information policies and campaigns. At the same time, the attitudes and needs of the users of such applications should be examined in order to create more engaging applications that motivate older adults to self-monitor and train their cognitive functions.
ACKNOWLEDGMENTS
The work of authors from CERTH/ITI is partially supported by the EU H2020 funded project Frailsafe (Grant agreement no: 690140). Stelios Zygouris is a PhD/doctoral candidate at the Aristotle University of Thessaloniki, Greece/University of Heidelberg, Germany (joint program) and receives funding by the Robert Bosch Foundation Stuttgart within the Graduate Program People with Dementia in General Hospitals (GPPDGH), located at the Network Aging Research (NAR), University of Heidelberg, Germany. The authors wish to thank all the participants who volunteered in this study.
