Abstract
Widespread cognitive test screening as part of tele-public health initiatives necessitates a test that is self-administered online and automatically scored, with no clinician effort. The feasibility of unsupervised cognitive screening is unclear. We adapted the Self-Administered Tasks Uncovering Risk of Neurodegeneration (SATURN) to make it suitable for self-administration and automatic scoring. A sample of 364 healthy older adults completed SATURN via a web browser, in a fully independent manner. SATURN’s overall score was not modulated by gender, education, reading speed, the time of day at which the test was taken, or an individual’s familiarity with technology. SATURN proved extremely portable across operating systems. Importantly, comments from participants reported satisfaction with the experience and the clarity of the instructions. SATURN represents a fast and easy screening tool that can be used for a first assessment, during a routine test or clinical evaluation, or during periodic health monitoring, in person or remotely.
• A low-cost and easily accessible cognitive screening tool
• A tool in the public domain that could be adapted to specific needs
• Easy adaptation to different languages with consequent improvement of replicability across populations
• A proof of the feasibility of a self-administered and unsupervised screening tool
• Has the potential to improve screening of cognitive decline when used as periodic health monitoring
What this paper adds
Applications of study findings
Background
Dementia, the deterioration of cognitive function beyond normal biological aging, currently affects more than 55 million people worldwide, with 10 million new cases every year (World Health Organization, 2017). Dementia manifests itself variably in the early stages of the disease, and patients with gradual onset may be overlooked. Screening for cognitive impairment can facilitate early detection of dementia and neurodegenerative conditions at a pre-dementia stage. Early diagnosis is extremely important for both patients and caregivers (Burns, 2012; de Vugt & Verhey, 2013), allowing them to plan for the future, initiate pharmacotherapy (Lee et al., 2004; Rasmussen & Langerman, 2019), and address problematic habits or other treatable conditions that may worsen cognitive decline (Bauer et al., 2014; Löppönen et al., 2004; Poblador-Plou et al., 2014).
Early diagnosis can be achieved using telemedicine (Koo & Vizer, 2019; Sternin et al., 2019). At least in the first phases of testing, computerized and telemedicine approaches can reduce clinician time commitment, bypassing a potential barrier to diagnosis. Telemedicine also makes it possible to screen people with physical mobility or transportation barriers to medical care, and those who do not feel the need to contact a clinician (e.g., because of poor insight, a distorted sense of what is “age-typical,” or simply a very early stage of cognitive decline). To facilitate early diagnosis via telemedicine delivered to an individual, there is a need for a low-cost cognitive screening tool, with high sensitivity at the early stages, that can be self-administered online, in an unsupervised manner, and automatically scored (García-Casal et al., 2017; Tsoy et al., 2021; Zygouris & Tsolaki, 2015). Computer testing with older adults is reliable (Sternin et al., 2019; Vaportzis et al., 2017), and computerized screening achieves comparable, or even better, diagnostic outcomes than its paper-and-pencil counterpart (Brinkman et al., 2014; Chan et al., 2021; Cyr et al., 2021). At the frontier of this screening paradigm, tele-public health initiatives might uncover cases of cognitive impairment through inexpensive and widespread testing of large cohorts of ostensibly healthy older adults, much like in-person health fairs can uncover cases of hypertension in the same population (Lucky et al., 2011). Finally, low-cost computerized screening tools could be incorporated into programs providing care for adults with limited income and resources.
Recently, the Self-Administered Tasks Uncovering Risk of Neurodegeneration (SATURN) (Bissig et al., 2020) was developed and validated for in-clinic use. This freely-available electronic screening tool was comparable to the Montreal Cognitive Assessment (MoCA) (Nasreddine et al., 2005) at detecting mild cognitive impairment and dementia. SATURN uses only visual stimuli (some reading capability is necessary), bypassing potential language barriers between the assessor and the patient or hearing impairment that is prevalent in the target population, and reduces some of the hardware and software requirements for remote use (e.g., no volume calibration for speakers, no multimodal stimuli to time-synchronize). SATURN is self-administered and automatically scored, two aspects that are common barriers to unsupervised use. Importantly, because SATURN is in the public domain, there were no barriers to its free adaptation to our project needs. We modified SATURN to be delivered remotely, in a completely unsupervised manner, to a relatively large cohort of healthy older adults recruited online. As usability is the weakest methodological aspect being explored when developing computerized screening tests (García-Casal et al., 2017), we also assessed SATURN’s usability in this setting (Lewis, 1995, 2002).
Methods
Participants
Table 1. Demographics.
Note. Mean (M), standard deviation (SD), and sample size (n) are reported for age, years of education, gender, reading speed, total time to complete SATURN (from the “Welcome screen” to the “Goodbye screen”), and time-on-task (i.e., time spent on cognitive tasks), overall and for each sample. Statistics are also reported for the total SATURN score and the scores associated with each sub-domain.
a Two participants did not complete the questionnaires.
Cognitive Screening
SATURN combines scores from 20 brief tasks, cumulatively testing several cognitive domains, with a maximum score of 30 points. For tasks and scoring details, we refer the reader to the original description and testing of SATURN (Bissig et al., 2020). To make it suitable for our purposes, the original version of SATURN was replicated using PsychoPy® (v2021.2.3) (Peirce et al., 2019), translated into JavaScript, and uploaded on Pavlovia (https://pavlovia.org/). PsychoPy® is a Python-based free cross-platform package allowing researchers to run a wide range of behavioral experiments. PsychoPy® can run studies online using Pavlovia, a high-performance, hardware-accelerated port of the PsychoPy Python library. Both PsychoPy and Pavlovia are products of Open Science Tools Ltd. (https://opensciencetools.org/). While PsychoPy is free to use, Pavlovia requires a small fee (£0.24 GBP per participant or a site license) for hosting the experiment.
The following minor changes were necessary to make the tasks suitable for online unsupervised administration and scoring. First, in the original version of SATURN, participants were initially prompted to read “close your eyes,” and compliance was determined by the assessor to confirm that they had the necessary literacy, sensory function, and alertness to proceed with the subsequent tasks. In the current version, we first prompted individuals to choose one of two shapes on the screen (“Click on the square to proceed”), a test that can be automatically scored. Whereas the original version probed incidental memory by asking which phrase was read at the start, the current version later asks the participant to remember which shape had been selected. Second, we removed the orientation question asking which state the participant was in, as it could not be scored automatically (i.e., there was no reliable way to assess the correctness of the response). We did not expect the removal of this question to impact SATURN’s sensitivity, as the item showed substantial ceiling effects in scorable cognitively impaired patients in the original study (Bissig et al., 2019). Our version thus had 19 tasks and a maximum total score of 29.
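To illustrate why the replacement items can be scored without an assessor, the following is a minimal sketch rather than the published SATURN code. The names `TARGET_SHAPE`, `ShapeTrial`, and `score_shape_items` are hypothetical, and scoring the recall item against the shape the participant actually clicked is an assumption.

```python
from dataclasses import dataclass

# Hypothetical label for the shape named in the instruction
# ("Click on the square to proceed"); the real stimuli may differ.
TARGET_SHAPE = "square"


@dataclass
class ShapeTrial:
    clicked_shape: str   # shape clicked on the initial screen
    recalled_shape: str  # shape the participant later reports having selected


def score_shape_items(trial: ShapeTrial) -> dict:
    """Score both shape-based items with simple equality checks."""
    # Compliance check: did the participant follow the written instruction?
    followed_instruction = trial.clicked_shape == TARGET_SHAPE
    # Incidental memory: compared with what was actually clicked (an assumption).
    remembered = trial.recalled_shape == trial.clicked_shape
    return {
        "followed_instruction": followed_instruction,
        "incidental_memory": remembered,
    }


if __name__ == "__main__":
    print(score_shape_items(ShapeTrial("square", "circle")))
```

Both checks reduce to comparing recorded responses, which is what makes them suitable for unsupervised administration; how many points each item contributes to the total is not modeled here.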
Usability Questionnaire
To assess the usability of the online screening tool, we administered an ad hoc modified version of the Post-Study System Usability Questionnaire (PSSUQ) (Lewis, 1995, 2002). The PSSUQ evaluates four dimensions (overall satisfaction, system usefulness, information quality, and interface quality), each rated on a 7-point Likert scale (from 1 “strongly agree” to 7 “strongly disagree,” also allowing for a “not applicable” option), with lower scores indicating better quality. We also administered a questionnaire developed in-house to assess the participants’ familiarity with everyday technology (hereafter, the TECH questionnaire). The two questionnaires are reported in the Supplementary Materials. SATURN data were collected in two waves as part of a larger study. In the first wave (Tagliabue et al., 2022), only individuals with a SATURN score higher than 25 went on to complete additional cognitive tasks over multiple sessions. In these subsequent sessions, participants also completed a series of questionnaires that included the TECH questionnaire. As a result, only individuals with a SATURN score higher than 25 completed the TECH questionnaire in the first wave, biasing the usability data from sample A. Therefore, in the second wave of recruitment, we asked participants to complete the TECH and PSSUQ questionnaires at the same time as SATURN, and we analyzed TECH and PSSUQ data from sample B only.
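The rating scheme described for the PSSUQ (1 to 7, lower is better, with a “not applicable” option) could be summarized along the following lines. This is a sketch under stated assumptions: averaging the applicable items is the conventional PSSUQ approach, but the text does not state how the modified version was scored, and the function name `pssuq_mean` and the use of `None` for “not applicable” are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def pssuq_mean(ratings: Sequence[Optional[int]]) -> Optional[float]:
    """Average 7-point ratings, skipping 'not applicable' answers (None).

    Lower values indicate better perceived usability.
    Returns None when every item was marked 'not applicable'.
    """
    valid = [r for r in ratings if r is not None]
    for r in valid:
        if not 1 <= r <= 7:
            raise ValueError(f"rating out of range 1-7: {r}")
    return mean(valid) if valid else None


if __name__ == "__main__":
    # One respondent, with the third item marked 'not applicable'
    print(pssuq_mean([1, 2, None, 3]))
```

Skipping “not applicable” answers instead of coding them as a number keeps them from shifting the mean toward either end of the scale.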
Procedure
After recruitment through the Prolific platform, participants provided their consent, confirmed that they met the eligibility criteria, and were enrolled in the study. Individuals were then redirected to the online version of SATURN. After SATURN, only individuals in sample B filled out the PSSUQ and the technology questionnaire. Throughout the whole session, participants had no contact with the experimenters and completed the screening fully independently and in an unsupervised manner. Anonymized demographic information was collected through Prolific, and data on age, level of education, first language, and gender were further confirmed through the consent form.
Statistical Analysis
Statistical analysis was carried out with JASP for Mac (JASP Team, 2020). SATURN raw scores (maximum score: 29) were grouped into seven cognitive domains and summed to obtain a score for each domain (see Table 1). Participants’ education was first assessed as the highest level of education completed and later recoded as actual years of school attended. Reading speed (words per second) was calculated by dividing the number of words on instruction screens containing only text by the time spent reading those screens. Dependency between variables was assessed with Spearman’s correlation. To investigate the effect of time-of-day on performance, we coded this variable by dividing a 24-hour day into five intervals. Usability and familiarity with technology were analyzed by splitting participants into two age ranges (65–69 years and 70–75 years) and running a 2 (GENDER) × 2 (AGE RANGE) ANOVA on the questionnaires’ scores. Open comments from 166 users completing the PSSUQ were classified post hoc into five categories: users who found SATURN enjoyable, interesting, or fun (FUN); users who found the instructions and tests clearly explained and easy to follow (CLEAR); users reporting on their own perceived performance (PERCEIVED PERFORMANCE); users reporting technical problems or errors (e.g., misspelled words) (TECHNICAL ISSUES); and users with nothing to report (NONE). Answers in each category were counted, keeping in mind that participants could provide more than one answer. SATURN’s replicability was assessed in two ways: (1) by comparing the independent samples A and B with each other via t-test and (2) via correlation analysis with data from the original SATURN validation, collected in person (Bissig et al., 2019). We used correlation rather than a direct comparison of scores to assess replicability because of the differences in demographics between the samples.
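The two derived variables described above, reading speed and time-of-day interval, can be sketched as follows. This is an illustration only, not the authors' analysis code. The function names are hypothetical, and the bin boundaries are an assumption: one 8-hour night bin (00:00 to 08:00) followed by 4-hour daytime bins, which yields five bins over 24 hours.

```python
from datetime import time


def reading_speed_wps(word_count: int, reading_time_s: float) -> float:
    """Reading speed in words per second: words read divided by time spent."""
    if reading_time_s <= 0:
        raise ValueError("reading time must be positive")
    return word_count / reading_time_s


def time_of_day_bin(t: time) -> str:
    """Map a clock time to one of five intervals (assumed boundaries)."""
    if t.hour < 8:
        return "00:00-08:00"  # single 8-hour night interval
    # 4-hour daytime intervals starting at 08:00, 12:00, 16:00 and 20:00
    start = 8 + 4 * ((t.hour - 8) // 4)
    return f"{start:02d}:00-{start + 4:02d}:00"


if __name__ == "__main__":
    # 258 words read in 60 seconds
    print(reading_speed_wps(258, 60.0))
    print(time_of_day_bin(time(14, 30)))
```

Dividing words by seconds gives words per second; multiplying the result by 60 converts it to words per minute.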
Additional univariate comparisons (e.g., with demographics) were run through t-tests or χ² tests. False Discovery Rate correction for multiple comparisons was used when necessary, and two-tailed p < .05 was considered significant. SATURN source code and the anonymized data are available on the Open Science Framework: https://osf.io/xnj5m/.
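As an illustration of a False Discovery Rate correction, the sketch below computes adjusted p-values with the Benjamini-Hochberg step-up procedure. The text does not name the specific procedure used, so the choice of Benjamini-Hochberg is an assumption, and the function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.5]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"p = {p:.3f} -> adjusted p = {q:.3f}")
```

A comparison would then be called significant when its adjusted p-value falls below .05.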
Results
Demographic Characteristics
Descriptive statistics are reported in Table 1, as are statistics for the total SATURN score and its sub-domains. For the entire cohort (samples A and B), we found no gender-related differences in age, reading speed, education, total time-on-task, or overall SATURN score (t-tests, all FDR-corrected p > 0.1). We found a significant correlation between SATURN score and age (
SATURN Single-Item Statistics.
Note. M, Mean; SD, standard deviation, and maximum score, are reported for each item.
The mean and standard deviation of the time spent on each task are also reported, in seconds. TMT = Trail Making Task.
Reading Speed
Reading speed (words per second) did not significantly correlate with the overall SATURN score. We found no difference in reading speed between male and female participants. Reading speed correlated significantly with age (
Time-of-Day Effects
As we had no control over when participants chose to complete SATURN, we investigated the potential effect of time-of-day on SATURN scores. The time of day at which the test was taken was binned into four intervals of 4 hours during the day and one interval of 8 hours during the night (24:00 to 08:00), for a total of five intervals. We found no association between age and the time the test was completed (
Usability
Only participants in sample B were asked to complete the TECH and PSSUQ questionnaires. Of those, two did not complete the questionnaires. We found a significant effect of gender on familiarity with technology (F(1,160) = .350, p = .022,
Count of users’ feedback in each category. Clear = number of users who found the instructions and tests clearly explained and easy to follow; Fun = number of users who found SATURN enjoyable, interesting, or fun; Perceived performance = number of users reporting on their own perceived performance; Technical issues = number of users reporting technical problems or errors (e.g., misspelled words in instructions or poor reactivity of the software); None = number of users with nothing to report. Users could give more than one answer.
Discussion
We adapted a version of SATURN for fully unsupervised remote use through an internet browser and used it to assess a sizable cohort of healthy, English-speaking, older adults. This adds to a previous in-person validation of SATURN, which showed performance comparable to the MoCA in detecting mild cognitive impairment and dementia (Bissig et al., 2019). We assessed the feasibility of remote use through score dependency on age, education, and gender, and on the time of the day the test was taken. We further assessed the robustness of scoring across independent samples, and we compared it with a third sample previously collected in person. Finally, we quantified the overall perceived usability of the system, in three different domains: usefulness, instructions quality, and graphics quality. There is an increasing need for screening tools to be used remotely in large cohorts. This need arises in several settings, ranging from clinical trials to public health applications.
A few other web-based tests or dedicated apps are available (Charalambous et al., 2020) (e.g., https://www.aptwebstudy.org and https://memtrax.com/, amongst others), but they are seldom thoroughly tested, a process requiring considerable monetary and human resources. Moreover, these web- or app-based tests cannot be easily adapted to other uses (e.g., different populations or different environments). In addition, adherence to the recommended basic standards for medical health apps (high usability, clear language, privacy and security, and control over conflicts of interest such as commercialization or advertising within the app; Charalambous et al., 2020; Larson, 2018) is not always guaranteed. SATURN, by being in the public domain, has an advantage in this sense. Besides fulfilling the requisites for basic standard quality, SATURN can be readily moved from the original clinical setting to a remote large-cohort platform, as this study demonstrates. At the time of writing, SATURN is being translated and tested in several languages. Its public domain status facilitates this development and may be essential for a reproducible screening tool that will be comparable across countries.
Despite the narrow age range covered in this study, we found that SATURN scores correlated with age and education, as expected in cognitive screening tests (Ardila et al., 2000); no gender-related differences were evident. We found that time-of-day had no impact on SATURN scores: this is an important source of variability that we cannot control for when the test is performed remotely but could impact performance in both healthy and clinical populations (Blatter & Cajochen, 2007; Singh et al., 2016; Wilks et al., 2021).
With the caveat that our sample had a narrow age range (65–75 years) and comprised self-reported healthy individuals, we showed that SATURN scores are consistent across different independent samples sharing similar demographic characteristics. Furthermore, SATURN scores correlated well with the scores of a smaller sample collected in person in the original validation study (Bissig et al., 2019). The only significant difference we found between data collected online and in person was in reading speed (258 (SD ≈ 78) versus 160 (SD ≈ 44) words per minute, respectively). This difference could be due to familiarity with the interface and testing environment, the use of monetary compensation, or other unmeasured factors. Reassuringly, values were similar between samples and did not correlate with SATURN scores.
Another important factor in remote unsupervised testing is the dropout rate: individuals, especially those with some level of cognitive decline, might find SATURN too demanding to complete on their own. We found that only 1% of those recruited in this healthy sample dropped out before completing the tasks, a small fraction compared with dropout rates reported for Prolific and other online recruiting platforms (Peer et al., 2017).
In the current study, individuals used only desktop or laptop computers running Linux (15/364 = 4%), Mac (42/364 = 12%), and Windows (307/364 = 84%) systems. While our sample is biased toward individuals who are familiar with computers, SATURN proved extremely portable, with only 2% of participants reporting technical issues that prevented them from completing the test.
We found SATURN to be enjoyable, with high reported scores for usability, instruction clarity, and interface quality. These qualities are critical (García-Casal et al., 2017) for clinical testing, longitudinal monitoring, and tele-public health applications. Moreover, SATURN was not sensitive to previous experience in everyday technology (Lee Meeuw Kjoe et al., 2021).
Electronic cognitive testing in general has potential advantages over classical paper-and-pencil versions: stimuli can be easily randomized, and multiple test versions automatically implemented. The measurable variables range from accuracy to completion time, time spent on tasks, and reading time. Additionally, movement-related parameters (e.g., related to mouse use) could be recorded. All these pieces of information can be combined to better define the neuro-cognitive user’s profile and, in turn, provide hints to improve early diagnosis. Tests like SATURN can be rapidly performed without supervision, and immediately and automatically scored, which is appealing both for routine health-maintenance visits and for focused clinical evaluations.
The present adaptation of SATURN highlights a further advantage to electronic cognitive testing: testing can be delivered at home, at any time of day. More than just a convenience, this feature is necessary for tele-health clinical services, and for remote basic science applications, which blossomed during the COVID-19 pandemic. The implementation of SATURN might even be used for a “tele-public health” approach to widespread cognitive screening.
Limitations of our study include the narrow age range and the focus on self-reported healthy older adults. Further validation should include a wider age range and individuals with various levels of cognitive function to measure sensitivity and specificity. Individuals of different races/ethnicities and levels of education need to be considered in future work, as well as the use of mobile devices (smartphones and tablets). Finally, future work should focus on translating and validating SATURN in different languages, while collecting normative data specific for each population.
Following the principles of the original SATURN implementation, we make all SATURN materials related to the current implementation freely available for download (https://osf.io/xnj5m/), and we encourage readers to use, share, and adapt SATURN without restriction.
Supplemental Material
Supplemental Material - Feasibility of Remote Unsupervised Cognitive Screening With SATURN in Older Adults
Supplemental Material for Feasibility of Remote Unsupervised Cognitive Screening With SATURN in Older Adults by Chiara F. Tagliabue, David Bissig, Jeffrey Kaye, Veronica Mazza, and Sara Assecondi in Journal of Applied Gerontology.
Footnotes
Acknowledgments
The authors thank Greta Varesio and Giulia Buzi for their help with data collection and literature review.
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: C.F.T., D.B., J.K., and V.M. declare that they have no conflict of interest. S.A. is a named inventor on a patent application (publication number WO/2022/106850) jointly submitted by the University of Birmingham and Dalhousie University, titled “Improving cognitive function,” currently in the PCT phase (international application No. PCT/GB2021/053019).
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Fondazione Cassa di Risparmio di Trento e Rovereto (CARITRO) [grant number 000040103444CARITRO]. JAK is supported by National Institute on Aging (NIA) [grant numbers P30 AG066518 and P30 AG024978]. The sponsors were not involved in the study design or in the collection, analysis, and interpretation of data.
Ethics approval
The study was approved by the University of Trento Research Ethics Committee (Protocol No. 2021-041).
Supplemental Material
Supplemental material for this article is available online.
