Abstract
Background
Episodic memory tests in Alzheimer's disease (AD) often depend on verbal recall or drawing.
Objective
To develop the Visual Image Simple Recognition Test (VISRET) and evaluate its psychometric and clinical performance.
Methods
We studied 149 individuals (healthy participants [HP] = 62; AD = 53; patients with aphasia [AP] = 34). We assessed reliability (split-half Spearman–Brown [SB]), known-groups validity with age-adjusted models and age-stratified analyses, and a Bayesian logistic model (AD versus HP). A Bayesian linear model produced a composite Memory Score and highest posterior density (HPD)-based cut-offs using HP alone, subsequently evaluated by five-fold cross-validation. Convergent and discriminant validity were assessed by correlating VISRET with established neuropsychological tests in non-aphasic AD.
Results
Internal consistency was good both in AD (SB = 0.87) and when pooled after within-group centering (SB = 0.84). AD–HP discrimination was large, persisting after age adjustment, within age strata, and following aphasic AD exclusion. The Bayesian model showed excellent discrimination (posterior-mean AUC = 0.99, 95% HPD = 0.97–0.99). AP differed from HP but with trivial absolute differences (total 39.6 versus 39.4; false recognitions 0.1 versus 0.3). In non-aphasic AD, VISRET total correlated with an established episodic memory test (ρ = 0.60) but demonstrated weak or near-zero correlations with non-memory domains (e.g., nonverbal reasoning, ρ = 0.02). Cross-validated, HP-derived Memory-Score cut-offs achieved mean AUC = 0.98; at the 95%-HPD threshold, sensitivity = 0.87 and specificity = 0.95; at 99%-HPD, sensitivity = 0.74 and specificity = 0.98.
Conclusions
VISRET is a brief, language-minimized recognition test facilitating AD-related memory impairment detection, with minimal practical impact of aphasia. The HP-derived Memory Score and cut-offs demonstrated stable cross-validation, suggesting potential clinical utility pending replication and external validation.
Keywords
Introduction
Episodic memory is the system that supports conscious recollection of personally experienced events, together with their temporal, spatial, and situational context. It is a component of declarative memory and is widely regarded as depending primarily on the medial temporal lobe system.1–3 Episodic memory impairment is a hallmark feature of Alzheimer's disease (AD) and frequently appears at the earliest stages of its clinical presentation. 4 According to recent diagnostic frameworks, such as the National Institute on Aging – Alzheimer's Association (NIA-AA) criteria, accurate assessment of episodic memory is essential for clinical diagnosis as well as disease severity evaluation, tracking progression, and guiding treatment decisions. 5 Memory functions are broadly categorized by modality—verbal versus visual—and by response format—recall, recognition, or reproduction. However, in current clinical practice, most assessments focus on verbal memory, particularly recall-based tasks that involve listening to or reading words and reproducing them from memory.6–7 These tests require both linguistic comprehension and expressive output.
Widely used tools such as the Mini-Mental State Examination (MMSE), 8 the Alzheimer's Disease Assessment Scale-Cognitive Subscale (ADAS-cog), 7 and the Rey Auditory Verbal Learning Test (RAVLT) 9 rely heavily on these verbal recall tasks. Major longitudinal studies, including the Alzheimer's Disease Neuroimaging Initiative (ADNI), have similarly adopted verbal memory measures as primary cognitive endpoints, further cementing their dominance in both clinical and research contexts.10–11 These measures are sensitive to early-stage cognitive decline and predictive of AD progression.12–14 However, they are inherently language-dependent and are therefore susceptible to confounding by aphasia, which is observed in approximately 35–52% of patients with AD.15–17 In such cases, verbal deficits can be mistakenly interpreted as memory impairments, leading to potential overestimation of cognitive decline.18–19
In contrast, visual memory assessments offer a modality that is less dependent on language processing. These tests aim to evaluate episodic memory encoding and retention through visual stimuli. However, many traditional visual memory tests present their own limitations. The Benton Visual Retention Test and Rey–Osterrieth Complex Figure Test, for example, require participants to reproduce images through drawing, which introduces confounding effects from visuoconstructional ability or motor coordination. 20 Other tests, such as the Visual Association Test (VAT) 21 and Scenery Picture Memory Test, 22 were developed for older adults and use semantically meaningful stimuli, but they still require verbal responses, thereby reintroducing the very language dependence they aim to avoid. The Extended Visual Association Test (VAT-E) adds choice-based recognition items to reduce verbal output, but these items are positioned as adjuncts to the verbal task rather than an independently validated index. 23 The Cambridge Neuropsychological Test Automated Battery (CANTAB) includes a suite of computerized memory tasks delivered by touchscreen—largely language-independent visual-memory paradigms (Pattern Recognition Memory [PRM]) and a visual associative-learning paradigm (Paired Associates Learning [PAL]).24–25 While valuable, the use of abstract/low-semantic stimuli together with hardware/software and licensing requirements may limit accessibility and routine clinical uptake, particularly among older adults.26–27
The Wechsler Memory Scale (WMS) includes both verbal and visual subtests, providing a comprehensive assessment across modalities. 28 Recent versions of the WMS have introduced modifications to reduce the influence of language and motor impairments—for example, by adding recognition components and introducing tasks that minimize drawing requirements. 20 This is a valid and important strategy, particularly for assessing episodic memory in individuals with aphasia or physical disabilities. However, in the latest version, the WMS-IV, the Visual Reproduction subtest still requires examinees to draw figures from memory, thereby potentially confounding memory performance with constructional deficits.20,28 Additionally, the Designs subtest, which evaluates memory for abstract figures and their spatial locations within a grid, may engage visuospatial processing along the dorsal “where” pathway.29–30 This raises concern that individuals with visuospatial deficits—common in AD 5 —may be penalized due to difficulty processing spatial layouts instead of impaired memory. 31 Furthermore, WMS subtests are typically interpreted only as part of composite indices, and cutoff scores for individual subtests are not formally established or validated. 28 A similar constraint applies to the Rivermead Behavioural Memory Test, which blends verbal and visual tasks and mixes recall and recognition; in standard scoring, component items are not intended to yield independent domain-specific or recognition-only scores. 32 As a result, clinicians are generally required to administer multiple subtests to compute indices, increasing testing burden.
To address this unmet need, we developed the Visual Image Simple Recognition Test (VISRET), a delayed visual recognition test designed to directly assess episodic memory while minimizing language, motor, and visuoconstructional demands, with three key design principles:
(1) Visual encoding modality: stimuli are presented visually, without verbal instruction or labeling, minimizing the influence of language during encoding.
(2) Recognition-based response format: participants make simple binary decisions (seen/unseen), reducing the burden associated with recall and minimizing the influence of motor impairments or constructional deficits that may affect drawing or writing responses.
(3) Nonverbal response mode: answers are given verbally by saying “yes/no” by default, with pointing available as a nonverbal alternative when verbal responses are difficult, thereby avoiding the need for complex expressive language or fine motor control.
VISRET was conceived as a simple, culturally appropriate tool for older Japanese adults that leverages visual input and recognition-based output to reduce examiner burden and limit influences of aphasia, motor dysfunction, and constructional deficits. In this preliminary study, the aim was to evaluate the reliability, construct validity (including convergent and discriminant patterns), and clinical applicability of VISRET by comparing performance across healthy participants (HP), patients with AD, and patients with aphasia (AP). We also tested whether VISRET can distinguish AD from HP while remaining relatively robust to aphasia-related deficits, and we illustrate a Bayesian framework for deriving a composite memory score and candidate cut-offs.
Methods
Participants
A total of 149 individuals were included: 62 HP, 53 patients with AD, and 34 AP. Participants were categorized into three groups for analysis: the AD, AP, and HP groups. This study included patients diagnosed with AD who visited Hokkaido University Hospital, Hokkaido Neurosurgical Memorial Hospital, Hokuto Hospital, and Asahikawa Red Cross Hospital between February 2022 and May 2025. The AD diagnosis was based on the NIA-AA 2011 criteria, 5 selecting cases classified as probable AD. In total, the AD cohort comprised 53 patients, of whom six presented with anomic aphasia (without additional aphasic syndromes or speech disorders). All patients with AD underwent standardized memory assessments to confirm the presence of episodic memory impairment. In most cases, the Standard Verbal Paired-Associate Learning Test (SPA) developed by the Japan Society of Higher Brain Function (formerly the Japan Society for Higher Brain Dysfunction) 33 was administered. To avoid language-related confounding in the six aphasic AD cases, the visuospatial memory subtest of the WMS-revised (WMS-R) 34 was used instead. Individuals who met these impairment criteria were categorized as the AD group.
To assess the impact of aphasia on test performance, patients with primary progressive aphasia (PPA) and patients with cerebrovascular aphasia were included. Patients with PPA met the diagnostic criteria of Gorno-Tempini et al., 35 and cerebrovascular aphasia was confirmed by neuroimaging; right-hemisphere lesions were excluded because of their potential impact on visual memory. All cerebrovascular cases were assessed 3 weeks to 6 months after stroke. As an additional inclusion safeguard, all AP participants completed the WMS-R visuospatial memory subtest and those with an index (scale) score ≥ 70 (≈2 SD below the normative mean) were included. 34 Finally, to exclude concurrent dementia at the time of VISRET testing, all AP participants underwent structured clinical screening (history, neurological examination, informant-based daily functioning, and a non-verbal neuropsychological battery tailored for aphasia). No AP participant fulfilled the NIA-AA 2011 criteria 5 for probable AD dementia or the 2017 Dementia with Lewy Bodies (DLB) Consortium criteria, 36 and there was no clinical evidence suggestive of other dementia syndromes (e.g., frontotemporal dementia spectrum, vascular dementia, idiopathic normal-pressure hydrocephalus) at assessment. These participants were categorized as the AP group.
HP were recruited via two avenues: (1) a municipal senior staffing agency for older adults (the Sapporo City Kita-ku Silver Human Resources Center, a government-affiliated program that registers people aged ≥60 and dispatches them for light community work, to which we submitted a formal dispatch request; the Center then randomly selected registrants from its roster and dispatched them to our study under its standard procedures including payment of dispatch fees); and (2) convenience sampling through announcements to acquaintances and community contacts. Eligibility required a score of ≥ 28 on the MMSE-Japanese version (MMSE-J) 6 and a “normal” overall judgment on the SPA, 33 no history of central nervous system disease or head injury, no reported developmental disorder, and no hearing/vision problems that would interfere with testing. Hearing was functionally judged from the ability to follow spoken instructions during practice trials and functional near vision was verified as described below. Eligible individuals were enrolled as the HP group.
Written and verbal informed consent was obtained from all participants or, in cases of cognitive impairment, from their family members. This study was approved by the Ethics Committee of the Faculty of Health Sciences, Hokkaido University Graduate School (Approval No. 23-84).
Neuropsychological evaluation
All participants underwent a comprehensive neuropsychological assessment. Memory function was assessed using the VISRET and SPA; 33 however, the SPA was not performed in the AP group. Central executive function was evaluated using the Trail Making Test-Japanese version (TMT-J), 37 while attentional function was measured using the Digit Span Task. 34 In addition, all participants were screened for higher-order visuospatial deficits: unilateral spatial neglect was assessed with the Line Bisection Test, and dorsal-type simultanagnosia was assessed with a dot counting task. General intellectual function was assessed with the Raven's Coloured Progressive Matrices (RCPM) 38 in all participants (HP, AD, and AP), providing a language-minimized estimate that is robust to aphasia. The MMSE-J 6 was administered only to the HP group to screen for intact global cognition. Language functions, including naming and word comprehension, were evaluated using the Japanese version of the Western Aphasia Battery (WAB-J). 39
Functional vision considerations
A formal ophthalmologic acuity test was not conducted. To verify that reduced vision would not confound VISRET performance, we confirmed (i) accurate reading of both kana and kanji printed at 24-point size and (ii) performance on picture-based subtests of the WAB-J, specifically picture naming and word comprehension (picture-pointing). The WAB-J stimuli used for these tasks are line drawings and smaller than the images used in VISRET, which were presented at a substantially larger size. Participants who routinely used glasses or contact lenses were instructed in advance to bring and wear their habitual corrective lenses during testing. To minimize confounding by higher-order visuospatial impairments, we also screened for unilateral spatial neglect (Line Bisection Test) and for dorsal-type simultanagnosia (Dot Counting Task) before VISRET; no participant showed evidence of neglect or dorsal-type simultanagnosia on these brief screens. Based on these procedures, we evaluated whether there were any visual limitations likely to interfere with VISRET performance.
VISRET: overview and procedure
VISRET is a brief delayed visual recognition test administered in three phases—encoding, interference, and recognition (Figure 1). Total administration time is approximately 9 min (encoding 100 s, interference 5 min, recognition approximately 2 min) and typically remains under 10 min including brief instructions/practice. Participants gave “yes/no” responses verbally by default; when speech output was difficult, “yes/no” was indicated by pointing and recorded by the examiner. The procedures for each phase are detailed below.
Encoding Phase: Participants were shown 20 images sequentially, each displayed for 5 s. They were instructed to observe each image carefully and remember it for subsequent recognition. No verbal labeling or description was required, and participants were explicitly instructed not to name or verbally describe the images. After 5 s, the examiner replaced the current image with the next one, and this continued until all 20 images had been presented.
Interference Phase: A 5-min delay followed, during which participants completed a Cancellation Task. The Cancellation Task used in VISRET was originally developed for this study and follows a format similar to that included in the Clinical Assessment for Attention created by the Japan Society of Higher Brain Function. This task was implemented to prevent memory rehearsal and divert attention from the encoded images. Participants searched for and marked specific target symbols among the distractors on a printed sheet. The task engages attentional function so that participants do not actively rehearse the images.
Recognition Phase: Participants were presented with 40 randomly arranged images, comprising 20 previously seen images and 20 distractor images. They judged whether they recognized each image, responding “yes” for previously seen images and “no” for new images. Responses were provided verbally by default; when speech output was difficult, pointing to indicate “yes” or “no” was permitted and recorded by the examiner. Performance was measured by the total number of correct responses (maximum score: 40) and the false recognition count, defined as the number of times a distractor image was incorrectly identified as previously seen.
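The two scores defined for the Recognition Phase can be illustrated with a minimal sketch. The function name `score_visret` and the boolean-list input format are assumptions made for illustration; they are not taken from the study's code.

```python
def score_visret(was_target, said_yes):
    """Score one recognition phase (hypothetical helper, not the authors' code).

    was_target: 40 booleans, True where the image was shown during encoding.
    said_yes:   40 booleans, True where the participant answered "yes".
    """
    if len(was_target) != len(said_yes):
        raise ValueError("one response is needed per recognition item")
    # A response is correct when "yes" is given to a target or "no" to a distractor.
    total_correct = sum(t == r for t, r in zip(was_target, said_yes))
    # A false recognition is a "yes" given to a distractor.
    false_recognitions = sum((not t) and r for t, r in zip(was_target, said_yes))
    return total_correct, false_recognitions


# Hypothetical participant: misses 2 targets and falsely recognizes 1 distractor.
was_target = [True] * 20 + [False] * 20
said_yes = [True] * 18 + [False] * 2 + [True] * 1 + [False] * 19
total, false_rec = score_visret(was_target, said_yes)
print(total, false_rec)  # 37 1
```

Note that the false recognition count is already reflected in the total score (each false recognition costs one point), so the two measures are not independent.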

Figure 1. Overview of the VISRET procedure. The VISRET procedure consists of three phases: Encoding, Interference, and Recognition. In the Encoding Phase, participants viewed and memorized 20 images, each presented for 5 s. In the Interference Phase, a 5-min cancellation task was performed to prevent memory rehearsal. In the Recognition Phase, participants identified previously seen images among 20 target and 20 distractor images, responding “Yes” for recognized images and “No” for distractors. For clarity, in this figure, distractor images are displayed in grayscale, although they are presented in color during the actual experiment. VISRET: Visual Image Simple Recognition Test. Image credits: dog—iStock by Getty Images/GlobalP; Asset ID 1482207116. wine—iStock by Getty Images/denphumi; Asset ID 178732508. muscat grapes—iStock by Getty Images/masa44; Asset ID 1353368427. cooked white rice—iStock by Getty Images/supamas lhakjit; Asset ID 1215923364. La France pear—iStock by Getty Images/manbo-photo; Asset ID 1295574778. kangaroo—iStock by Getty Images/Smileus; Asset ID 105097179.
All the images used in the VISRET were obtained from iStock (https://www.istockphoto.com/jp) to ensure high-quality visual stimuli. These images are royalty-free and completely authorized for research use. The images for both the presentation and recognition phases were carefully selected to ensure diversity across multiple categories, including animals, plants, fruits and vegetables, vehicles, indoor and household items, tools, processed foods, buildings, symbols, and body parts. To reduce the cognitive load, images familiar to both younger and older Japanese adults were chosen. Each image was presented in color with a single object and no background to enhance clarity and prevent distraction.
Statistical analysis
All statistical analyses were conducted in Python (version 3.10.12) with the following packages: pandas (2.2.3), numpy (1.26.4), scipy (1.15.1), statsmodels (0.14.4), scikit-learn (1.6.1), matplotlib (3.10.0), seaborn (0.13.2), and PyMC (5.7.2). Demographic comparisons were performed using the Mann–Whitney U test for continuous variables and Fisher's exact test for categorical variables. Reliability was assessed with a split-half approach in which even- and odd-numbered trials were treated as two halves. Because VISRET scores show ties and ceiling effects in the HP and AP subgroups, Spearman rank correlations were computed between halves, and the Spearman–Brown (SB) prophecy formula was applied to estimate full-length internal consistency. To make the dependence of reliability on score variance explicit, we computed coefficients separately for each group (HP, AP, AD). In addition, we reported (i) a within-group–centered pooled coefficient (half-scores of each participant centered within group before pooling) to summarize internal consistency across the full performance range while avoiding inflation due to between-group mean differences, and (ii) a naïve pooled coefficient (not centered) reported with a caution that it conflates group differences. As rules of thumb, we interpreted SB ≥ 0.90 as excellent, 0.80–0.89 as good, 0.70–0.79 as acceptable, and <0.70 as limited internal consistency. 40
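The split-half procedure can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the function names, the 0/1 item-matrix input, and the simulated data are all hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr


def spearman_brown(r):
    """Step a half-test correlation up to full-length reliability."""
    return 2.0 * r / (1.0 + r)


def split_half_sb(item_correct, groups=None):
    """Odd/even split-half reliability with Spearman-Brown correction.

    item_correct: (n_participants, n_items) array of 0/1 item scores.
    groups: optional group labels; if given, each half-score is centered
            within its group before pooling.
    """
    item_correct = np.asarray(item_correct, dtype=float)
    odd = item_correct[:, 0::2].sum(axis=1)   # trials 1, 3, 5, ...
    even = item_correct[:, 1::2].sum(axis=1)  # trials 2, 4, 6, ...
    if groups is not None:
        groups = np.asarray(groups)
        for g in np.unique(groups):
            mask = groups == g
            odd[mask] -= odd[mask].mean()
            even[mask] -= even[mask].mean()
    r, _ = spearmanr(odd, even)
    return r, spearman_brown(r)


# Simulated data only; group labels "A", "B", "C" are placeholders.
rng = np.random.default_rng(0)
ability = rng.uniform(0.5, 1.0, size=60)                # per-participant accuracy
items = (rng.random((60, 40)) < ability[:, None]) * 1   # simulated 0/1 item scores
groups = np.repeat(["A", "B", "C"], 20)
print(split_half_sb(items))            # naive pooled estimate
print(split_half_sb(items, groups))    # within-group-centered pooled estimate
print(round(spearman_brown(0.46), 2))  # 0.63
```

Centering within group removes between-group mean differences from both halves, which is why the centered pooled coefficient is expected to be lower than the naïve one when groups differ strongly in mean performance.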
Two co-primary VISRET outcomes were pre-specified: the total score (out of 40) and the false recognition count. We controlled the family-wise error rate at 0.05 across these two outcomes using Holm's step-down procedure (two-sided). Effect sizes with 95% confidence intervals (CIs) are reported as Cliff's δ (and the equivalent area under the curve [AUC]) for the Mann–Whitney comparison of total scores, and as incidence rate ratios (IRR) from negative-binomial models for false recognitions. To characterize design sensitivity, we pre-specified that minimum detectable effects (MDEs) and precision would be summarized; the numerical MDE values are reported in the Results, and power curves are provided in the Supplemental Material.
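For readers less familiar with these quantities, the sketch below shows how Cliff's δ relates to the AUC and how Holm's step-down thresholds work for two outcomes. It is a plain-Python illustration with hypothetical function names and toy scores, not the analysis code used in the study.

```python
def cliffs_delta(x, y):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    greater = sum(xi > yi for xi in x for yi in y)
    less = sum(xi < yi for xi in x for yi in y)
    return (greater - less) / (len(x) * len(y))


def delta_to_auc(delta):
    """AUC for discriminating the groups, ignoring the sign of delta."""
    return (abs(delta) + 1.0) / 2.0


def holm_reject(p_values, alpha=0.05):
    """Holm step-down: compare the k-th smallest p with alpha / (m - k), k = 0, 1, ..."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if p_values[i] > alpha / (m - k):
            break  # once one test fails, all larger p-values also fail
        reject[i] = True
    return reject


# Toy total scores (not study data): every AD score lies below every HP score.
ad = [20, 25, 30]
hp = [38, 39, 40, 40]
print(cliffs_delta(ad, hp))               # -1.0
print(round(delta_to_auc(-0.96), 2))      # 0.98
print(holm_reject([0.03, 0.01]))          # [True, True]
print(holm_reject([0.03, 0.04]))          # [False, False]
```

With two co-primary outcomes, the smaller p-value is tested at 0.025 and the larger at 0.05, which is why the second toy example rejects neither hypothesis even though both p-values are below 0.05.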
Because age and aphasia were pre-specified as potential confounders (the AD group was significantly older than the HP group [see Results], and six AD cases exhibited isolated anomia), we implemented targeted adjustments and sensitivity analyses. Primary between-group comparisons used the Mann–Whitney test for total scores (reporting Cliff's δ and 95% CIs) and negative-binomial regression for false recognitions (reporting IRR and 95% CIs). We then fit age-adjusted models (linear regression for total score; negative-binomial regression for false recognitions) and repeated the primary comparisons within age strata (<75 versus ≥75 years). To address aphasia-related heterogeneity, we conducted a sensitivity analysis excluding the six aphasic AD cases, applying the same primary and age-adjusted methods.
Unless otherwise specified, all frequentist tests were two-sided with α = 0.05. Given the exploratory nature of several analyses in this study, multiplicity control was limited to the two co-primary outcomes, for which we controlled the family-wise error rate at 0.05 using Holm's step-down procedure. The p-values for secondary, sensitivity, and exploratory analyses are reported without further multiplicity adjustment and interpreted alongside effect sizes and 95% CIs.
The primary frequentist analyses were complemented with Bayesian models owing to the exploratory nature of several analyses and the modest sample sizes. In this setting, Bayesian estimation is useful because it (i) yields full posterior distributions that express uncertainty as probabilities and propagate that uncertainty to derived metrics, and (ii) provides gentle, weakly informative regularization that stabilizes coefficient estimates and guards against overfitting without imposing strong prior information. 41 Accordingly, we fit a Bayesian logistic regression distinguishing AD from HP with standardized predictors (age, sex, education, VISRET total score, VISRET false recognitions) and Normal(0, 10) priors on coefficients. We report standardized odds ratios, posterior probabilities, and 95% (and, where relevant, 99%) highest posterior density (HPD) intervals; associations are regarded as strongly supported when the 95% HPD excludes the null and the posterior probability is ≥0.95.41–42 To summarize discrimination with appropriate uncertainty, we computed a receiver operating characteristic (ROC) curve for each posterior draw and reported the posterior-mean AUC with its 95% HPD interval; where helpful, we also report P (AUC exceeding conventional thresholds) and describe discrimination as “good” when the 95% HPD lower bound is ≥0.80 (or “acceptable” when ≥0.70). 43 These Bayesian results are presented as complementary to, and consistent with, the primary frequentist findings.
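The per-draw AUC summary can be sketched in a few lines of NumPy. The sketch assumes that posterior draws of the coefficients and intercept are already available as arrays; the function names, the simulated data, and the stand-in "posterior" draws are hypothetical, and the empirical HPD shown here (the narrowest interval containing the requested mass) is one common definition rather than a confirmed detail of the authors' pipeline.

```python
import numpy as np


def auc_from_scores(scores, labels):
    """Rank-based AUC: P(score_pos > score_neg) + 0.5 * P(tie)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing `mass` of the sorted samples."""
    s = np.sort(np.asarray(samples, dtype=float))
    n = len(s)
    k = min(max(int(np.ceil(mass * n)), 1), n)  # samples inside the interval
    widths = s[k - 1:] - s[: n - k + 1]          # width of every candidate window
    i = int(np.argmin(widths))
    return s[i], s[i + k - 1]


def posterior_auc(X, y, beta_draws, intercept_draws):
    """AUC for each posterior draw of a logistic model, plus mean and 95% HPD."""
    aucs = np.array([
        auc_from_scores(b0 + X @ b, y)   # the logit is monotone in probability
        for b, b0 in zip(beta_draws, intercept_draws)
    ])
    return aucs.mean(), hpd_interval(aucs, 0.95)


# Simulated data and stand-in draws, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))                              # standardized predictors
y = (X[:, 0] + rng.normal(scale=0.5, size=100)) > 0        # simulated group labels
beta_draws = rng.normal([2.0, 0.0], 0.3, size=(500, 2))    # stand-in posterior draws
intercept_draws = rng.normal(0.0, 0.3, size=500)
mean_auc, (lo, hi) = posterior_auc(X, y, beta_draws, intercept_draws)
print(round(mean_auc, 3), round(lo, 3), round(hi, 3))
```

Because the AUC depends only on the ranking of the linear predictor, it is unnecessary to apply the inverse-logit transform before computing it.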
To examine convergent and discriminant validity while minimizing language confounding, we computed Spearman rank correlations within the AD subgroup without aphasia (n = 47). Specifically, we correlated the two VISRET measures (total score and false recognition count) with established neuropsychological tests: the SPA total score, WAB-J Naming and Word Comprehension, Digit Span (forward/backward), TMT-A/B (completion times), and RCPM. Non-normality and ceiling effects motivated the use of Spearman's ρ; two-sided p-values are reported for completeness, but interpretation emphasizes effect sizes (ρ). For time-based indices (TMT-A/B), a negative ρ with the VISRET total score indicates that slower performance is associated with poorer VISRET performance. WMS-R measures were not analyzed here because they were not administered in the non-aphasic AD subgroup.
For cutoff determination, thresholds were derived exclusively from the HP distribution; patient data (AD/AP) were not used to set thresholds and were reserved only for evaluating diagnostic performance (ROC analyses, confusion matrices, and cross-validation).
Step 1: Bayesian Linear Regression (1) was fit with the VISRET total score as the dependent variable and cognitive/demographic variables (e.g., VISRET false recognitions, age, sex, education) as predictors (all variables standardized; weakly informative Normal(0, 10) priors as specified above). Standardized coefficients were considered supported when their 95% HPD excluded 0; posterior probability ≥0.95 was interpreted as strong evidence. This step estimates weights only and does not use patient data to set any threshold.
Step 2: A Memory Score was defined using the median standardized coefficients from Step 1:
Step 3: Using HP-only Memory Scores, we computed 95% and 99% HPD intervals; the lower bounds of these intervals defined the cutoff thresholds. The AD/AP data did not contribute to this derivation.
Step 4: We evaluated validity using memory scores to distinguish HP from AD via ROC analysis and confusion matrices at both HP-derived thresholds, defined as the lower bounds of the HP 95%-HPD and 99%-HPD intervals. To mitigate optimism and assess generalizability, we implemented five-fold cross-validation focused on thresholds: the HP sample was partitioned into five folds; in each iteration, the 95%-HPD and 99%-HPD lower-bound cut-offs were re-estimated from the training HP folds only and then applied to the held-out HP fold, together with the fixed AD group. In the main text we summarized out-of-fold sensitivity, specificity, and AUC by the fold-wise mean and range. For practical deployment, we also reported full-sample HP cut-offs (95%- and 99%-HPD lower bounds) with their sensitivity and specificity, noting that cross-validated estimates better reflect out-of-sample performance.
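Steps 3 and 4 can be sketched as follows. Several points are assumptions for illustration: Memory Scores are taken as precomputed inputs, the HPD is computed empirically as the narrowest interval containing the requested share of HP scores, scores below the cut-off are treated as impaired, and all function names and simulated values are hypothetical.

```python
import numpy as np


def hpd_lower_bound(samples, mass=0.95):
    """Lower end of the narrowest interval holding `mass` of the samples."""
    s = np.sort(np.asarray(samples, dtype=float))
    n = len(s)
    k = min(max(int(np.ceil(mass * n)), 1), n)
    widths = s[k - 1:] - s[: n - k + 1]
    return s[int(np.argmin(widths))]


def cross_validated_cutoff(hp_scores, ad_scores, mass=0.95, n_folds=5, seed=0):
    """Re-estimate the cut-off on training HP folds; test on held-out HP + all AD."""
    hp_scores = np.asarray(hp_scores, dtype=float)
    ad_scores = np.asarray(ad_scores, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(hp_scores)), n_folds)
    results = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(hp_scores)), held_out)
        cutoff = hpd_lower_bound(hp_scores[train], mass)
        # Assumption: scores below the HP-derived cut-off are flagged as impaired.
        sensitivity = np.mean(ad_scores < cutoff)
        specificity = np.mean(hp_scores[held_out] >= cutoff)
        results.append((cutoff, sensitivity, specificity))
    return np.array(results)


# Simulated Memory Scores, for illustration only.
rng = np.random.default_rng(2)
hp_scores = rng.normal(0.0, 1.0, size=62)
ad_scores = rng.normal(-4.0, 1.5, size=53)
res = cross_validated_cutoff(hp_scores, ad_scores, mass=0.95)
print(res.mean(axis=0))   # fold-wise mean cut-off, sensitivity, specificity
```

Because the AD group never contributes to the threshold, only specificity is estimated on genuinely held-out participants; sensitivity varies across folds solely through the re-estimated cut-off.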
Bayesian linear regression was conducted to assess the impact of aphasia on test performance. Bayesian Linear Regression (2) was performed using the VISRET total score as the dependent variable and group classification (HP versus AP), age, education level, and VISRET false recognition count as explanatory variables. Bayesian Linear Regression (3) was conducted using the VISRET false recognition count as the dependent variable, and group classification, age, years of education, and VISRET total score as explanatory variables. Sex was excluded from these analyses, as preliminary demographic comparisons revealed no significant sex differences among the groups (Table 1), ensuring model parsimony and stability given the limited sample size. In both models, the significance of the group classification variable was evaluated to determine the presence of a significant difference between the groups.
Table 1. Demographic data of healthy participants, patients with Alzheimer's disease, and patients with aphasia.
HP: healthy participants; AD: Alzheimer's disease; AP: aphasic patients.
Model convergence was evaluated by computing the R-hat and effective sample size (ESS) for all regression coefficients in the Bayesian models. R-hat values ≤ 1.01 and ESS ≥ 400 were considered indicative of successful convergence, following established recommendations for Bayesian modeling.41,44 All Python code used for the statistical analyses is publicly available in the authors' GitHub repository (https://github.com/dreamycat925/visret-study).
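To make the R-hat criterion concrete, the sketch below computes the classic split R-hat for a single parameter with NumPy. It is an illustration only: in practice a library diagnostic would normally be used, current libraries typically report a rank-normalized variant that can differ slightly from this formula, and ESS is not computed here. The function name and the simulated chains are hypothetical.

```python
import numpy as np


def split_rhat(chains):
    """Split R-hat for one parameter; `chains` has shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    half = chains.shape[1] // 2
    # Split each chain in two so within-chain drift also inflates R-hat.
    parts = np.concatenate([chains[:, :half], chains[:, half:2 * half]], axis=0)
    n = parts.shape[1]
    within = parts.var(axis=1, ddof=1).mean()        # mean within-chain variance
    between = n * parts.mean(axis=1).var(ddof=1)     # variance of chain means
    pooled = (n - 1) / n * within + between / n
    return float(np.sqrt(pooled / within))


# Simulated chains, for illustration only.
rng = np.random.default_rng(3)
good = rng.normal(size=(4, 1000))                       # four well-mixed chains
bad = good + np.array([[0.0], [0.0], [0.0], [5.0]])     # one chain stuck elsewhere
print(split_rhat(good) <= 1.01, split_rhat(bad) <= 1.01)
```

Values close to 1 indicate that between-chain and within-chain variability agree, which is the behavior the ≤ 1.01 threshold is meant to verify.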
Results
Demographic data are summarized in Table 1: 62 HP, 53 patients with AD, and 34 AP (19 PPA: nonfluent/agrammatic variant of PPA = 18, logopenic variant PPA = 1; 15 cerebrovascular aphasia: 10 infarction, five hemorrhage). Patients with AD were older and less educated than HP (both p < 0.01), whereas HP and AP did not differ significantly in age (p = 0.16) or education (p = 0.07). Sex distributions were similar across groups (HP versus AP p = 1.00; HP versus AD p = 0.45, Fisher's exact test).
We summarized outcomes of brief functional-vision checks conducted alongside the neuropsychological battery to preclude visual confounds. Near vision was adequate before VISRET: HP showed no difficulties on 24-point kana/kanji reading or WAB-J word-comprehension (picture-pointing); all participants with AD performed picture-pointing without errors, with six participants showing picture-naming errors; AP participants had 24 picture-naming errors and four picture-pointing errors, yet all four completed the 24-point reading task. The WAB-J checks used line-drawing stimuli smaller than VISRET images, which were presented at a larger size. We also screened for higher-order visuospatial deficits, including unilateral neglect (Line Bisection) and dorsal-type simultanagnosia (Dot Counting Task), with no positive cases observed. Collectively, these findings indicate that no visual limitations were expected to interfere with VISRET under our testing conditions.
Split-half reliability was estimated per group using Spearman correlations with SB correction. HP: r = 0.46, SB = 0.63 (n = 62); AP: r = −0.07, SB = −0.14 (n = 34); AD: r = 0.76, SB = 0.87 (n = 53). We also summarized internal consistency across groups: the within-group–centered pooled estimate was r = 0.73, SB = 0.84 (n = 149), whereas the naïve pooled estimate was r = 0.89, SB = 0.94 (n = 149), the latter being inflated by between-group differences. The relatively lower coefficients in HP/AP are consistent with ceiling effects and restricted variance, while AD shows good internal consistency (SB ≈ 0.87) in the clinically relevant performance range.
Known-groups discrimination between HP (n = 62) and AD (n = 53) was very large. As shown in Table 2, patients with AD exhibited markedly lower VISRET total scores and higher false recognition counts compared with HP. For the VISRET total score, the Mann–Whitney comparison yielded Cliff's δ = −0.96 (95% CI −0.99 to −0.92), and equivalent AUC = 0.98 (0.96–1.00; p < 10⁻¹⁹). For false recognitions, negative-binomial regression estimated IRR = 38.60 (95% CI 13.99–106.56). Both co-primary outcomes remained significant under Holm step-down control (FWER = 0.05). Design-based sensitivity (80% power) indicated the study could detect δ ≈ 0.30/AUC ≈ 0.65 and IRR ≈ 5.74 at α = 0.05, and δ ≈ 0.33/AUC ≈ 0.67 and IRR ≈ 6.73 at α₁ = 0.025, with the observed effects far exceeding these thresholds. Age-adjusted models were concordant (total-score group coefficient −6.95, 95% CI −9.71 to −4.19; false-recognition IRR = 38.39, 95% CI 14.13–104.30; p < 10⁻²⁰). Power curves for the realized sample sizes are shown in Supplemental Figure 1.
Table 2. Cognitive measures of healthy participants, patients with Alzheimer's disease, and patients with aphasia.
HP: healthy participants; AD: Alzheimer's disease; AP: aphasic patients; VISRET: Visual Image Simple Recognition Test; TMT: Trail Making Test; RCPM: Raven's Coloured Progressive Matrices; MMSE-J: Japanese version of the Mini-Mental State Examination; WAB-J: Japanese version of the Western Aphasia Battery.
To address the age imbalance between groups, we repeated the primary comparisons within age strata (< 75 versus ≥ 75 years). Effects were consistent: in < 75 (HP = 41, AD = 12), VISRET total showed a large HP > AD difference (Cliff's δ = −0.89, 95% CI −0.98 to −0.77; AUC = 0.95; p = 3.06 × 10⁻⁷) and false recognitions were markedly higher in AD (IRR = 78.58, 95% CI 16.60–372.10); in ≥ 75 (HP = 21, AD = 41), effects remained very large (δ = −0.99, 95% CI −1.00 to −0.96; AUC = 0.99; p = 1.41 × 10⁻¹⁰) with elevated false recognitions (IRR = 20.32, 95% CI 5.76–71.68). Age-adjusted models were concordant (total-score β = −6.95, 95% CI −9.71 to −4.19; false-recognitions IRR = 38.39, 95% CI 14.13–104.30). To address potential heterogeneity introduced by aphasia within the AD group, we repeated the co-primary comparisons excluding the six AD cases with isolated anomia (HP = 62, AD = 47). Results were essentially unchanged (δ = −0.96, 95% CI −0.99 to −0.91; AUC = 0.98; p = 5.09 × 10⁻¹⁹; false recognitions IRR = 38.26, 95% CI 14.52–100.79), and age-adjusted estimates remained similar (total-score β = −6.94, 95% CI −8.43 to −5.45; false-recognitions IRR = 37.92, 95% CI 13.91–103.34). All effects remained significant under Holm control, indicating that the HP–AD differences are not explained by age and are robust to aphasia-related heterogeneity.
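The Holm step-down rule applied to the two co-primary outcomes can be sketched as below. This is a generic implementation under the usual definition (the smallest p-value is tested at α/m, the next at α/(m − 1), and testing stops at the first non-rejection); the function name and the example p-values are ours and purely illustrative.

```python
def holm_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where Holm's step-down rule rejects."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Threshold loosens as hypotheses are rejected: alpha/m, alpha/(m-1), ...
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


# Two hypothetical co-primary p-values (illustration only).
print(holm_reject([0.004, 0.030]))
```

With two outcomes, the smaller p-value is compared against 0.025 and the larger against 0.05, so family-wise error is controlled at 0.05 without assuming independence between the outcomes.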
The validity of VISRET was examined using Bayesian logistic regression. As shown in Figure 2, patients with AD exhibited lower VISRET scores and more false recognitions than the HP and AP groups. Bayesian logistic regression confirmed that the VISRET scores, false recognition counts, and age were significant predictors of group classification, as indicated by the standardized odds ratios and posterior probabilities in Table 3. The ROC curve (Figure 3) demonstrated excellent classification performance, with a posterior mean AUC of 0.99 (95% HPD: 0.97–0.99), indicating that VISRET discriminates well between the AD and HP groups.
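For readers unfamiliar with HPD intervals, the following sketch shows one common way to obtain a 95% HPD interval from posterior draws: take the narrowest window that contains the required share of sorted samples. It assumes a unimodal posterior; the function name is ours, and the draws are simulated from an arbitrary skewed distribution rather than taken from the fitted model.

```python
import math
import random


def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing `mass` of the draws (unimodal case)."""
    xs = sorted(samples)
    n = len(xs)
    # Number of draws the interval must cover; the small epsilon guards
    # against floating-point overshoot in mass * n.
    k = math.ceil(mass * n - 1e-9)
    if k < 1 or k > n:
        raise ValueError("mass must lie in (0, 1] and samples must be non-empty")
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


# Simulated, left-skewed draws bounded above by 1 (illustration only).
random.seed(0)
draws = [random.betavariate(50, 1.5) for _ in range(4000)]
low, high = hpd_interval(draws, mass=0.95)
print(f"posterior mean = {sum(draws) / len(draws):.3f}, 95% HPD = {low:.3f} to {high:.3f}")
```

For a quantity bounded at 1, such as an AUC close to ceiling, the posterior is skewed and the HPD interval need not be symmetric about the posterior mean.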

VISRET performance across groups. Box plots with overlaid strip plots showing VISRET performance in participants classified as the HP, AD, and AP groups. The box represents the IQR, with the central line indicating the median score, and whiskers extending to 1.5 times the IQR. Individual data points are shown as strip plots. (a) Total VISRET scores: Patients with AD showed lower recognition scores than the HP and AP groups. (b) VISRET false recognition errors: Patients with AD exhibited a higher number of false recognitions than the HP and AP groups. VISRET: Visual Image Simple Recognition Test; HP: healthy participants; AD: Alzheimer's disease; AP: patients with aphasia; IQR: interquartile range.

Receiver operating characteristic curve for Bayesian logistic regression distinguishing healthy participants and Alzheimer's disease patients. ROC curve showing the classification performance of the Bayesian logistic regression that included VISRET total score, VISRET false recognitions, age, sex, and education. The posterior mean AUC was 0.99 with a 95% HPD interval of 0.97–0.99, indicating excellent discrimination. The dashed diagonal denotes chance performance. VISRET: Visual Image Simple Recognition Test; HP: healthy participants; AD: Alzheimer's disease; ROC: receiver operating characteristic; AUC: area under the curve; HPD: highest posterior density.
Standardized odds ratios (logistic regression) and coefficients (linear regression) and posterior probabilities based on the Bayesian logistic and linear regression analyses.
Bayesian logistic and linear regression analyses were used to estimate the influence of various factors on the VISRET performance.
Bayesian Logistic Regression Analysis: Standardized odds ratios (95% HPD) and posterior probabilities for age, sex, years of education, VISRET score, and VISRET false recognition.
Bayesian Linear Regression (1): Baseline estimation model examining the relationship between the VISRET score and explanatory variables.
Bayesian Linear Regression (2) and (3): Comparisons between the HP and AP groups with VISRET score (2) or VISRET false recognition (3) as dependent variables.
Standardized odds ratios (logistic regression) or standardized coefficients (linear regression) are reported with 95% HPD intervals and posterior probabilities. Significant variables (bold) were defined as those with both (1) 95% HPD intervals excluding 1.0 (odds ratio) or 0 (coefficients), and (2) posterior probability ≥ 95%.
VISRET: Visual Image Simple Recognition Test; HPD: highest posterior density.
In the non-aphasic AD subgroup, VISRET showed the expected convergent pattern with established memory testing (Table 4). The VISRET total score correlated moderately to strongly with SPA (ρ = 0.60, p = 0.01), whereas associations with non-memory domains were smaller (e.g., WAB-J Naming ρ = 0.33, p = 0.03; Digit Span Backward ρ = 0.36, p = 0.01; TMT-A time ρ = −0.33, p = 0.02), and the relation with nonverbal reasoning was near-zero (RCPM ρ = 0.02). The false-recognition count showed generally weak relations to comparator tests, with the exception of a moderate negative association with TMT-B time (ρ = −0.61, p = 0.01). Taken together, these correlations support convergent validity with standard memory measures and relative specificity with respect to language, attention/working memory, executive, and reasoning tasks. These findings complement the primary between-group results, reinforcing the validity of VISRET as a language-minimized measure of episodic memory in AD.
Spearman correlations between VISRET measures and comparator neuropsychological tests in the Alzheimer's disease group without aphasia.
Cells report Spearman's rank correlation coefficients (ρ) for VISRET total score and VISRET false recognition count versus each comparator test in the non-aphasic AD subgroup. Asterisks indicate two-sided p < 0.05 (uncorrected). For time-based measures (TMT-A/B), larger values reflect slower performance.
VISRET: Visual Image Simple Recognition Test; TMT: Trail Making Test; RCPM: Raven's Coloured Progressive Matrices; MMSE-J: Japanese version of the Mini-Mental State Examination; WAB-J: Japanese version of the Western Aphasia Battery; SPA: Standardized Verbal Paired Associates Learning Test.
Bayesian Linear Regression (1) identified false recognitions as the only significant predictor (median standardized β = −0.52). We therefore defined a Memory Score as

Memory Score distribution across groups with cut-off thresholds. Strip plot of Memory Scores in participants classified as the HP, AD, and AP groups. The red line represents the lower bound of the 95% HPD interval and the green line represents the lower bound of the 99% HPD interval, both derived from the HP group. VISRET: Visual Image Simple Recognition Test; HP: healthy participants; AD: Alzheimer's disease; AP: aphasic patients; HPD: highest posterior density.
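How a control-derived lower-bound cut-off translates into sensitivity and specificity can be sketched as follows. This is an illustration only: the scores and the cut-off value are invented, the function name is ours, and the convention that scores strictly below the cut-off are flagged as impaired is an assumption rather than the study's stated decision rule.

```python
def sensitivity_specificity(patient_scores, control_scores, cutoff):
    """Flag scores strictly below `cutoff` as impaired (assumed convention)."""
    true_pos = sum(1 for s in patient_scores if s < cutoff)
    true_neg = sum(1 for s in control_scores if s >= cutoff)
    return true_pos / len(patient_scores), true_neg / len(control_scores)


# Hypothetical composite scores and cut-off (not study data).
hypothetical_ad = [-2.5, -1.8, -0.3, -3.0]
hypothetical_hp = [0.4, 0.1, -0.2, 0.3, 0.0, -1.1]
hypothetical_cutoff = -1.0

sens, spec = sensitivity_specificity(hypothetical_ad, hypothetical_hp, hypothetical_cutoff)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Lowering the cut-off, as when moving from a 95% to a 99% lower bound, can only keep or reduce the number of flagged cases in both groups, which is why specificity rises while sensitivity falls.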
To gauge the impact of aphasia, we compared AP and HP (Figure 2). Bayesian Linear Regressions (2) and (3) showed significant group effects for both total score and false recognitions (Table 3), but absolute differences were minimal (total 39.6 versus 39.4; false recognitions 0.1 versus 0.3), indicating limited clinical impact.
The Bayesian logistic and linear regression models exhibited satisfactory convergence, as all R-hat values were ≤1.01 and ESS exceeded 400, ensuring stable parameter estimation (Supplemental Figure 3, Supplemental Table 2).
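The R-hat diagnostic referred to here compares between-chain and within-chain variability. The sketch below implements the classic Gelman–Rubin form for a single parameter; it is not necessarily the variant computed by the software used in the study, since current Bayesian packages commonly report a rank-normalized split version. The function name and chains are hypothetical.

```python
from statistics import mean, variance


def gelman_rubin_rhat(chains):
    """Classic potential scale reduction factor for one parameter."""
    n = len(chains[0])
    if len(chains) < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    within = mean(variance(c) for c in chains)        # W: mean within-chain variance
    between = n * variance([mean(c) for c in chains])  # B: between-chain variance
    pooled = (n - 1) / n * within + between / n        # pooled variance estimate
    return (pooled / within) ** 0.5


# Hypothetical chains: two that agree, and two stuck in different regions.
mixed = [[0.1, 0.3, 0.2, 0.4, 0.2, 0.3], [0.2, 0.3, 0.1, 0.4, 0.3, 0.2]]
stuck = [[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]]

print(f"well-mixed chains: R-hat = {gelman_rubin_rhat(mixed):.3f}")
print(f"separated chains:  R-hat = {gelman_rubin_rhat(stuck):.3f}")
```

Values close to 1 indicate that the chains are sampling the same distribution, which is the rationale for a threshold such as 1.01.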
Discussion
In this preliminary evaluation, VISRET showed encouraging psychometric performance as a language-minimized measure of episodic recognition. Internal consistency was adequate where it matters clinically (good in the AD group and acceptable when pooled), while lower coefficients in HP/AP were consistent with ceiling-level performance, indicating reliability in the relevant range without over-interpreting near-perfect healthy scores. Known-groups validity was strong: simple between-group tests on the two co-primary outcomes (total score, false recognitions) differentiated AD from HP with very large effects, and a complementary Bayesian logistic model yielded concordant discrimination without relying on strong priors. HP-derived Memory Score cut-offs showed stable cross-validated performance, with the 95%-HPD threshold providing balanced screening sensitivity/specificity and the 99%-HPD threshold prioritizing specificity when false positives carry higher clinical costs. Notably, the stricter 99%-HPD threshold appears to better separate the AP and AD groups (Figure 4), suggesting its potential utility for excluding confounding effects of aphasia or executive dysfunction when such factors are suspected clinically. Convergent/discriminant patterns supported interpretation as a memory measure: in AD without aphasia, the VISRET total score aligned with SPA, whereas relations with language, attention/working-memory, frontal executive, and reasoning indices were weaker. Potential confounds were addressed and did not explain the findings: group differences persisted after age adjustment and within both the < 75 and ≥ 75 strata, and results were essentially unchanged when the small subset of AD cases with isolated anomia was excluded. Finally, although HP–AP comparisons were statistically significant, absolute differences were trivial and scores clustered near ceiling, suggesting that VISRET performance is relatively robust to aphasia in practical use.
As summarized in Supplemental Table 3, recognition-only memory tests are uncommon; most instruments introduce either verbal recall or drawing, re-exposing performance to language or constructional/motor demands. Among “visual” tests, the VAT is widely used but, in its standard protocol, requires verbal retrieval at response and is therefore susceptible to aphasia and speech disorders;18–19,21 the VAT-E adds choice-based recognition items, yet these remain adjunctive rather than an independently validated index and are administered together with verbal recall.23 By contrast, within the CANTAB visual-memory suite, PRM comprises selection-based recognition tasks that avoid drawing and spoken output;25 PAL taps visual associative learning and is not pure recognition, but it likewise uses manual choice and is relatively insulated from non-mnestic confounds.24 Practical constraints (hardware/software and licensing), however, limit routine bedside use.26 Against this backdrop, VISRET employs brief visual encoding and binary yes/no recognition with an optional pointing response and, in our data, showed minimal language dependence in practice (AP–HP differences were small and near ceiling) while preserving strong AD–HP discrimination. These observations are preliminary, but they support positioning VISRET as a language-minimized, low-motor-load visual-memory measure with empirically demonstrated low susceptibility to aphasia, to be confirmed in larger external cohorts.
False recognitions have been identified as an important component in memory assessment.45–47 They are integral to recognition-memory performance and, by design, tend to vary inversely with total score. In our data, Bayesian linear regression on standardized variables indicated that this relationship was moderate rather than deterministic, suggesting that false alarms and hits provide complementary rather than redundant information. We used this relation pragmatically to construct a unified Memory Score that weights false recognitions alongside total score. Beyond their quantitative contribution to overall performance, false recognitions may also provide qualitative insights into underlying cognitive mechanisms. Within the non-aphasic AD subgroup, false recognitions were selectively associated with TMT-B (not TMT-A). Although TMT-B is typically interpreted as a dorsolateral prefrontal set-shifting measure, it also taxes response inhibition;48 coupled with reports that false recognition relates to inhibitory/monitoring deficits,49–52 this pattern may reflect a shared inhibitory-control contribution. Given the subgroup size, this remains hypothesis-generating. Clinically, considering false recognitions together with total score may help discriminate amnestic memory loss from patterns in which executive vulnerability elevates false alarms and may support using stricter decision thresholds when executive dysfunction is suspected.
We deliberately assembled a heterogeneous aphasia cohort spanning both PPA and post-stroke aphasia to evaluate whether aphasia materially affects VISRET performance when the task itself remains performable. The AP participants in our study exhibited varying degrees of speech-production difficulties (apraxia of speech and/or dysarthria), anomia, and single-word comprehension deficits. However, the recognition-based design of VISRET, with binary yes/no responses and minimal instruction-level comprehension demands, should limit such effects when basic comprehension is intact. In our sample, aphasia-related impact appeared minimal, though uneven representation across etiologies limits generalization. Disease etiology may differentially influence VISRET through distinct mechanisms. Although both entities present with aphasia, the accompanying syndromic profiles often diverge by etiology: semantic variant PPA typically yields progressive semantic degradation (semantic anomia), which is uncommon in cerebrovascular aphasia, whereas the post-stroke cohort more often features unilateral spatial neglect.53–54 This progressive degradation of object knowledge may impair visual recognition independent of episodic memory.55 Notably, our sample did not include semantic variant PPA or advanced nonfluent/agrammatic cases with widespread frontal dysfunction that could compromise inhibitory control, monitoring, or task set.55 For cerebrovascular aphasia, lesion location matters: ventral occipitotemporal damage typical of posterior cerebral artery infarcts, which are more common with ischemic events,56 has been linked to category-specific visual agnosias (faces, houses/places, words) that can degrade visual recognition despite preserved episodic memory. Our pre-testing screens (near-vision reading, WAB-J picture-pointing, and neglect screening) detected no gross visuoperceptual deficits, reducing the likelihood that such factors influenced performance.
Future studies should stratify by etiology and aphasia variant and incorporate lesion–symptom mapping to isolate non-mnestic influences on VISRET performance.
The study has some limitations. First, the overall sample—especially the aphasia cohort—was modest and unevenly composed across aphasia etiologies and variants, limiting generalizability and precluding adequately powered subtype-specific analyses; moreover, latent neurodegeneration cannot be fully excluded despite clinical screening. Second, potential non-mnestic influences were only partially controlled: the cerebrovascular subgroup combined ischemic and hemorrhagic cases with different lesion predilection sites that can yield ventral-stream visual agnosias; vision was assessed with brief functional checks rather than formal ophthalmologic testing; and other cognitive domains (e.g., neglect, executive control, semantic access) were screened but not comprehensively profiled. Third, ceiling effects were evident in the HP group, supporting specificity but potentially reducing sensitivity in younger adults; parameter adjustments and age-/education-stratified norms will be needed for broader applicability. Fourth, the HP-derived cut-off thresholds were internally cross-validated but remain provisional pending external validation and, if necessary, recalibration. Fifth, the study was conducted in a single cultural/clinical context, which may further limit generalizability until replicated in independent cohorts.
In conclusion, the VISRET is a novel tool designed to evaluate episodic memory impairment while minimizing the influence of aphasia. This preliminary study suggests that the test may enable improved assessment of memory deficits in individuals with aphasia, including those with AD and PPA. By reducing the reliance on language processing, VISRET shows promise for providing diagnostic information in conditions where language impairment coexists with memory dysfunction, though larger validation studies are needed to establish its clinical utility.
Supplemental Material
sj-docx-1-alz-10.1177_13872877251405433 - Supplemental material for The Visual Image Simple Recognition Test, a language-minimized recognition test: Psychometric and clinical evaluation in Alzheimer's disease
Supplemental material, sj-docx-1-alz-10.1177_13872877251405433 for The Visual Image Simple Recognition Test, a language-minimized recognition test: Psychometric and clinical evaluation in Alzheimer's disease by Shun Akaike, Akihiko Ogata, Yoshitsugu Nakagawa, Shigehisa Ura, Kimito Kondo, Ryota Imashiro, Shigeki Hashimoto, Ichiro Yabe and Mika Otsuki in Journal of Alzheimer's Disease
Footnotes
Ethical considerations
This study was approved by the Ethics Committee of the Faculty of Health Sciences, Hokkaido University (approval number: 23-84) and conducted in accordance with the Declaration of Helsinki.
Consent to participate
Informed consent was obtained from all participants or their legal guardians prior to participation in the study.
Consent for publication
Consent for publication was obtained from all participants or their legal guardians.
Author contribution(s)
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
Data are available on reasonable request. Because VISRET stimuli were sourced from iStock under standard royalty-free licenses, the image files themselves cannot be redistributed or released under a Creative Commons license. Moreover, in keeping with test-security practices for clinical neuropsychological assessments, the stimulus materials will not be publicly posted; access will be limited to qualified clinicians and researchers to prevent patient exposure and protect the validity of the test. We will not provide the image set; however, upon reasonable request from qualified clinicians or researchers, we will supply the metadata needed to recreate the test (iStock contributor/source and asset identification numbers for each stimulus, stimulus category list), along with presentation parameters (timing, display size, order/randomization) and the task script so that the test can be implemented using appropriately licensed images. Requests should be directed to the corresponding author.
Supplemental material
Supplemental material for this article is available online.
References