Abstract
Background:
A standardized tool for evaluating semantic knowledge of the Korean population is needed.
Objective:
The purpose of this study was to develop a neuropsychological test for the evaluation of semantic knowledge in the Korean elderly population.
Methods:
The Korean version of the Size/Weight Attribute Test (SWAT-K) was developed based on the original version. The diagnostic validity of SWAT-K was evaluated in 95 elderly outpatients [67 normal controls; 18 with Alzheimer’s disease (AD); 10 with the semantic variant of primary progressive aphasia (SV-PPA)]. Voxel-based morphometry (VBM) was employed to examine associations between SWAT-K scores and morphological changes of the brain.
Results:
SWAT-K discriminated among the three subject groups (normal > AD, p < 0.001; AD > SV-PPA, p = 0.040), whereas the short form of the Boston Naming Test could not distinguish SV-PPA from AD. ROC curve analysis showed high sensitivity (0.90) and specificity (0.93) of SWAT-K for SV-PPA. The test’s inter-rater reliability (ICC = 0.827) and test-retest reliability (ICC = 0.666) were assessed as well. VBM found a significant positive correlation (uncorrected p < 0.005, k > 100) between SWAT-K scores and gray matter volume in the right inferior frontal cortex (T = 4.08, k = 191) and bilateral temporal cortices (left, T = 4.42, k = 135; right, T = 3.55, k = 253), the areas most affected in SV-PPA.
Conclusions:
SWAT-K is a sensitive and reliable test for evaluating semantic knowledge in the Korean elderly population. Positive correlations between SWAT-K scores and gray matter volume in brain areas responsible for semantic processing further support the validity of SWAT-K.
INTRODUCTION
Semantic dementia, a neurodegenerative disease with the selective deterioration of semantic memory [1], was first described by Arnold Pick at the start of the 20th century [2, 3] and characterized in modern times by several teams of researchers [4–7]. Now classified as the semantic variant of primary progressive aphasia (SV-PPA), it is considered one of the clinical manifestations of frontotemporal lobar degeneration, a clinical syndrome associated with circumscribed degeneration of the prefrontal and anterior temporal lobes [1, 8–11].
At present, two batteries are mainly used for the comprehensive neuropsychological evaluation of dementia in South Korea: the Korean version of the Alzheimer’s Disease Assessment Scale [12] and the Korean version of the Consortium to Establish a Registry for Alzheimer’s Disease Assessment Packet (CERAD-K) [13]. However, no standardized tool exists for evaluating SV-PPA in the Korean population. A test that covers a wider range of neuropathological symptoms across different types of dementia, and evaluates them more sensitively, is therefore needed.
Semantic knowledge may be divided into knowledge of superordinate categories and subordinate knowledge, the latter of which is further divided into attribute and associative knowledge [4]. In SV-PPA, attribute and associative knowledge have been reported to be impaired in the early stages of the disease, while superordinate knowledge is relatively preserved [4, 15]. Patients with Alzheimer’s disease (AD) also lose semantic knowledge, but their impairment is generally less severe than that of patients with SV-PPA [16, 17].
The Size/Weight Attribute Test (SWAT) [17] is a neuropsychological test for evaluating semantic knowledge and the extent of its impairment. The test probes attribute knowledge, which deteriorates relatively early in SV-PPA, by asking the patient to select either the largest (for animals) or the heaviest (for objects) of three options. Unlike classical confrontational naming tasks such as the Boston Naming Test (BNT) [18], the test does not require cross-modal matching, and it probes the same attribute dimension for every item within a category, thereby reducing the influence of confounding factors such as impaired abstract reasoning, executive control, and flexibility.
The aim of this study was to develop a Korean version of the SWAT (SWAT-K). The sensitivity and specificity of SWAT-K were determined, and an ROC curve analysis was conducted. The scores of the two subtests (Animal-Size and Object-Weight) were also compared within each subject group to look for a possible category effect. Finally, the ability of the test to discriminate between AD and SV-PPA was investigated.
Several recent studies have used voxel-based morphometry (VBM) to find correlations between the gray matter volumes of specific brain regions and neuropsychological test scores. These studies have reported the left, and to a lesser extent the right, anterior temporal lobe as the chief site of correlation with tests of semantic knowledge in AD and SV-PPA patients [19–22]. The right anterior temporal lobe has also been associated with other aspects of semantic processing, such as emotion processing, social cognition, and musical knowledge [23–25]. A positive correlation between the gray matter volumes of these areas and SWAT-K scores would further support the test’s potential for evaluating semantic knowledge and the extent of its impairment. The brain imaging data of the participants were therefore examined for such correlations.
METHODS
The research was conducted in compliance with the ethical standards set forth in the Declaration of Helsinki; the ethics review board of our institution, Boramae Medical Center of South Korea, approved the study protocol.
Participants
A total of 95 participants, all older than 60, were recruited from SMG-SNU Boramae Medical Center of South Korea from December 2013 to March 2014. All participants spoke Korean fluently; however, some had weaker reading abilities than others owing to a comparatively lower level of education. Participants were not recruited through advertisements or selected randomly. All participants signed the consent form before proceeding to the questionnaire. On the basis of the NINCDS-ADRDA criteria and the diagnostic criteria for PPA variants by Gorno-Tempini et al. [11], the participants were classified into three groups: 67 were cognitively normal, 18 were diagnosed with AD, and 10 with SV-PPA. Diagnoses of AD and SV-PPA were made by a psychiatrist, and a brain imaging scan (MRI or PET) was performed to support each diagnosis. The CERAD-K Neuropsychological Assessment Battery [13, 26] was also administered by neuropsychologists to assess participants’ cognitive functions and facilitate the diagnosis of AD and SV-PPA. The CERAD-K consists of 9 neuropsychological tests: Verbal Fluency Test, 15-item (short-form) BNT, Mini-Mental State Examination in the Korean version of the CERAD Assessment Packet (MMSE-KC), Word List Memory Test, Constructional Praxis Test, Word List Recall Test, Word List Recognition Test, Constructional Recall Test, and Trail Making Test. All instruments were validated in the Korean population.
Exclusion criteria of participants
The exclusion criteria were: refusal to sign the consent form; known history of alcohol or drug addiction; known history of neurological disease, head trauma, stroke, or any other physical illness that could affect cognitive function; visual or hearing difficulties that could interfere with the test procedure; motor impairment that could affect test scores; and brain imaging findings indicating a neurological disorder other than AD or SV-PPA. Dementia was diagnosed by a trained psychiatrist and a neurologist based on the criteria of the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV).
All dementia subjects received neurological examinations and brain MRI to exclude other brain diseases. The MMSE, the Clinical Dementia Rating (CDR), and the short-form BNT (sBNT) were administered by a psychologist. In the normal control (NC) group, 44 (65.7%) had a CDR-Global score of 0 (normal) and 23 (34.3%) a score of 0.5 (questionable dementia). In the AD group, 3 (16.7%) scored 0.5, 13 (72.2%) scored 1 (mild dementia), and 2 (11.1%) scored 2 (moderate dementia). In the SV-PPA group, 2 (20.0%) scored 0.5, 6 (60.0%) scored 1, and 1 (10.0%) scored 2. Mean disease duration from symptom onset was 5.28 years (SD 1.74) in the AD group and 3.80 years (SD 1.32) in the SV-PPA group.
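The CDR-Global percentages above are simple proportions of each group’s size. The short Python sketch below recomputes them from the counts given in the text; the function name `cdr_percentages` is illustrative and not part of any published analysis script. Percentages are taken over the full group size n, as in the text, so they need not add up to 100% if a rating is unreported.

```python
def cdr_percentages(counts, n):
    """Return {CDR-Global level: percentage of the group size n}, one decimal place.

    counts maps a CDR-Global level (0, 0.5, 1, 2, 3) to a number of participants;
    the denominator is the full group size n, as in the text.
    """
    if n <= 0:
        raise ValueError("group size must be positive")
    return {level: round(100 * c / n, 1) for level, c in counts.items()}


# Counts and group sizes as reported in the paragraph above.
groups = {
    "NC": ({0: 44, 0.5: 23}, 67),
    "AD": ({0.5: 3, 1: 13, 2: 2}, 18),
    "SV-PPA": ({0.5: 2, 1: 6, 2: 1}, 10),
}

for name, (counts, n) in groups.items():
    print(name, cdr_percentages(counts, n))
```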
Scale development
A Korean version of the SWAT (SWAT-K) was developed based on the original [17] with approval from its authors. As in the original test, the frame of reference was size for all animate items and weight for all man-made objects. Fifteen sets of animal words and 15 sets of object words (three words per set) were adapted from the original version published by Warrington and Crutch [17] and translated into Korean. Of these words, 4 animal words and 8 object words were less familiar, naturally or culturally, to the Korean elderly population and were replaced with better-known ones; for example, ‘kangaroo’, which is not native to the Korean peninsula, was changed to ‘donkey’. One animal word and 5 object words were changed because, although culturally relevant, they are used relatively infrequently in Korean; for example, ‘step ladder’ was changed to ‘ladder’. Word frequencies were confirmed with the 7th Yonsei Corpus, developed by the Yonsei Institute of Language and Information Studies (https://ilis.yonsei.ac.kr/).
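To make the structure of the test concrete: each SWAT-K item presents three options with one correct choice (the largest animal or the heaviest object), and each subtest has 15 items, so subtest scores range from 0 to 15 and the total from 0 to 30. The Python sketch below scores a set of responses under that structure. The item contents are hypothetical placeholders (the actual items are in Supplementary Fig. 1), as are the names `Item` and `score_swat_k`.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Item:
    subtest: str                   # "Animal-Size" or "Object-Weight"
    options: Tuple[str, str, str]  # the three words/pictures shown together
    answer: str                    # the largest animal or the heaviest object

    def __post_init__(self):
        if self.answer not in self.options:
            raise ValueError("the correct answer must be one of the options")


def score_swat_k(items: Sequence[Item], responses: Sequence[str]) -> dict:
    """Count correct choices per subtest and in total (one point per item)."""
    if len(items) != len(responses):
        raise ValueError("one response is needed per item")
    totals = {"Animal-Size": 0, "Object-Weight": 0}
    for item, chosen in zip(items, responses):
        if chosen not in item.options:
            raise ValueError(f"{chosen!r} is not one of {item.options}")
        totals[item.subtest] += int(chosen == item.answer)
    totals["Total"] = totals["Animal-Size"] + totals["Object-Weight"]
    return totals


# Hypothetical items, for illustration only (not the published SWAT-K stimuli).
items = [
    Item("Animal-Size", ("cat", "horse", "donkey"), "horse"),
    Item("Object-Weight", ("spoon", "ladder", "pencil"), "ladder"),
]
print(score_swat_k(items, ["horse", "spoon"]))
# {'Animal-Size': 1, 'Object-Weight': 0, 'Total': 1}
```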
A pilot study was conducted with 45 Korean older adults (15 NC, 15 AD, and 15 with mild cognitive impairment), with the test stimuli presented as written words. Words in test sets that all three subject groups failed to answer correctly were either rearranged into different sets (4 animal words) or replaced with another high-frequency word (4 animal words and 13 object words). Based on the pilot results, verbal and visual stimuli were presented simultaneously in the final version to give test subjects a more uniform semantic representation of each stimulus. The final version of the SWAT-K can be found in Supplementary Fig. 1.
Procedures
The study was initiated at the Dementia and Alzheimer’s Disease clinic in Dongjak-gu, Seoul, and at Boramae Seoul National Hospital, and ran from April 2013 to February 2014. To assess inter-rater reliability, 19 subjects were evaluated by two independent raters at the same time; to assess test-retest reliability, 17 were retested 2 months after the first testing.
Assessment
Mini-Mental State Examination
The MMSE is a neurocognitive test for screening cognitive impairment [27]. Its scores range from 0 to 30; higher scores indicate better cognition, and scores below 25 indicate cognitive impairment. The test can be administered in 5 to 10 minutes. As part of the CERAD-K packet, the MMSE-KC was developed and validated for use in the Korean elderly population [13]. It consists of orientation (10 points), short-term memory registration and recall (6 points), attention (5 points), naming (2 points), following verbal commands (4 points), judgment (2 points), and copying a double pentagon (1 point).
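The component maxima listed above account for the full 30-point range. The Python sketch below checks that and totals a set of component scores; the component keys and function names are illustrative, and the cut-off of 25 is the one stated above, exposed as a parameter because a screening threshold may be set differently in practice.

```python
# Maximum points per MMSE-KC component, as listed in the paragraph above.
MMSE_KC_MAX = {
    "orientation": 10,
    "registration_and_recall": 6,
    "attention": 5,
    "naming": 2,
    "verbal_commands": 4,
    "judgment": 2,
    "pentagon_copy": 1,
}
assert sum(MMSE_KC_MAX.values()) == 30  # components add up to the 0-30 range


def mmse_kc_total(subscores: dict) -> int:
    """Sum component scores after checking each lies within its allowed range."""
    missing = set(MMSE_KC_MAX) - set(subscores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    for name, value in subscores.items():
        if name not in MMSE_KC_MAX:
            raise ValueError(f"unknown component: {name}")
        if not 0 <= value <= MMSE_KC_MAX[name]:
            raise ValueError(f"{name} must be between 0 and {MMSE_KC_MAX[name]}")
    return sum(subscores.values())


def suggests_impairment(total: int, cutoff: int = 25) -> bool:
    """Apply the 'below 25' screening rule stated in the text."""
    return total < cutoff


print(mmse_kc_total(dict(MMSE_KC_MAX)))  # 30
```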
Clinical Dementia Rating
The CDR, developed by Hughes and colleagues [28] and partially revised by Morris [29], is a widely used measure of dementia severity. It yields a global composite score that combines a clinician’s ratings of impairment in six domains: memory, orientation, judgment and problem solving, community affairs, home and hobbies, and personal care. The composite rating has five levels: 0 (none), 0.5 (questionable), 1 (mild), 2 (moderate), and 3 (severe). A Korean version of the test was validated by Choi and colleagues [30].
Boston Naming Test – short form
The BNT, developed by Kaplan, Goodglass, and Weintraub [18], is one of the most commonly used confrontational naming tests. The subject is shown a series of visual stimuli (60 line drawings in the full BNT) and asked to name each one. A Korean version of the test was developed by Kim and Na [31], from which a shorter, 15-item version (sBNT) was derived as part of the CERAD-K packet [13].
Assessment of regional correlates of test scores: Voxel-based morphometry
Thirty-two participants underwent T1-weighted MRI on a 3.0 T scanner (Achieva, Philips Healthcare, Netherlands), and VBM was employed to examine associations between cognitive performance measures and morphological changes. Of these, 25 were normal controls and 7 were AD patients. SV-PPA patients were excluded from the analysis because the atrophy pattern of this diagnostic group could confound the VBM results (a VBM analysis with NC and SV-PPA subjects, shown in Supplementary Figure 2, gave results similar to those obtained with the NC and AD images).
The image preprocessing steps and statistical analysis for VBM were performed using Statistical Parametric Mapping 8 (SPM8, http://www.fil.ion.ac.uk/spm) implemented in MATLAB 2009b (http://www.mathworks.com). Gray matter, white matter, and cerebrospinal fluid in the T1-weighted images were identified and segmented using the tissue probability maps included in SPM8. Intersubject registration of the gray matter images was performed by nonlinear deformation using the DARTEL (Diffeomorphic Anatomical Registration Through Exponentiated Lie Algebra) toolbox [32] in SPM8, and the images were modulated to preserve tissue volumes after warping. The modulated gray matter images were then spatially normalized to MNI (Montreal Neurological Institute) space and smoothed with a Gaussian kernel of 10 × 10 × 10 mm.
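SPM specifies smoothing kernels by their full width at half maximum (FWHM). Assuming the 10 mm kernel above is an FWHM, as is conventional in SPM, the corresponding Gaussian standard deviation follows from σ = FWHM / (2√(2 ln 2)). The Python sketch below does this conversion; the 1.5 mm isotropic voxel size used to express σ in voxels is a hypothetical value for illustration, not one reported in this study.

```python
import math


def fwhm_to_sigma(fwhm_mm: float) -> float:
    """Convert a Gaussian kernel's FWHM to its standard deviation (same units)."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


FWHM_MM = 10.0   # kernel width from the text, assumed to be an FWHM
VOXEL_MM = 1.5   # hypothetical isotropic voxel size after normalization

sigma_mm = fwhm_to_sigma(FWHM_MM)
sigma_vox = sigma_mm / VOXEL_MM
print(f"sigma = {sigma_mm:.2f} mm = {sigma_vox:.2f} voxels")
# sigma = 4.25 mm = 2.83 voxels
```

A per-axis σ in voxel units is the form expected by general-purpose filters such as `scipy.ndimage.gaussian_filter`, whereas SPM itself takes the FWHM in millimetres directly.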
For statistical analysis, the smoothed gray matter images were globally normalized by total intracranial volume, and a masking threshold of 0.1 was applied. The resulting images were entered into a multiple regression model to examine voxel-wise regional correlates of cognitive performance, with age, gender, and years of education as covariates of no interest. Statistical thresholds were set at p < 0.005 uncorrected for multiple comparisons, with an extent threshold of 100 voxels.
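The two-stage threshold above (a voxel-wise p < 0.005, uncorrected, followed by an extent threshold of 100 voxels) can be illustrated with a short Python sketch. It assumes a T map has already been computed from the regression and that the voxel-wise p threshold has been converted to a critical T value for the model’s degrees of freedom; it then groups suprathreshold voxels into face-connected (6-neighbour) clusters and keeps those with at least `min_voxels` voxels. The function name and the 6-neighbour rule are assumptions for illustration; SPM’s own cluster definition and connectivity may differ.

```python
from collections import deque

# Face-connected neighbours (6-connectivity); an illustrative choice.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def suprathreshold_clusters(tmap, t_crit, min_voxels):
    """Return clusters of voxels with T > t_crit that contain >= min_voxels voxels.

    tmap is a 3-D nested list indexed tmap[x][y][z]. Each cluster is reported
    with its size k and the coordinates and T value of its peak voxel,
    largest cluster first.
    """
    above = {
        (x, y, z)
        for x, plane in enumerate(tmap)
        for y, row in enumerate(plane)
        for z, t in enumerate(row)
        if t > t_crit  # voxel-wise height threshold
    }
    seen, clusters = set(), []
    for start in above:
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:  # breadth-first flood fill over suprathreshold voxels
            x, y, z = queue.popleft()
            members.append((x, y, z))
            for dx, dy, dz in NEIGHBOURS:
                nb = (x + dx, y + dy, z + dz)
                if nb in above and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if len(members) >= min_voxels:  # extent threshold
            px, py, pz = max(members, key=lambda v: tmap[v[0]][v[1]][v[2]])
            clusters.append({"k": len(members), "peak": (px, py, pz), "T": tmap[px][py][pz]})
    return sorted(clusters, key=lambda c: c["k"], reverse=True)


# Toy 5x5x5 map: one 8-voxel block (peak 5.5) and one isolated voxel (6.0).
toy = [[[0.0] * 5 for _ in range(5)] for _ in range(5)]
for x in range(2):
    for y in range(2):
        for z in range(2):
            toy[x][y][z] = 5.0
toy[1][1][1] = 5.5
toy[4][4][4] = 6.0
print(suprathreshold_clusters(toy, t_crit=3.0, min_voxels=5))
# [{'k': 8, 'peak': (1, 1, 1), 'T': 5.5}]
```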
Data analysis
SPSS for Windows 18.0 (SPSS Inc., Chicago, IL, USA) was used for data analysis. ANOVA and chi-square tests were used to test for between-group differences in demographic variables (age, education, and gender). One-way ANCOVAs were conducted to test for effects of dementia status (Normal, AD, SV-PPA) on test scores, with age and education as covariates. The LSD (least significant difference) method was used for post-hoc analyses of the test scores. Wilcoxon signed-rank tests were performed to compare the two subtest scores within each subject group. ROC curves were plotted to assess the ability of SWAT-K and sBNT to screen for AD and SV-PPA. Intraclass correlation coefficients (ICCs) were calculated to assess the inter-rater and test-retest reliability of SWAT-K. The significance level for all statistical tests was set at a two-tailed p-value of 0.05.
RESULTS
Demographics
Table 1 shows the demographic information of the NC, AD, and SV-PPA groups. Mean age, years of education, and MMSE-KC scores differed significantly among the three groups, whereas the gender ratio did not. The AD group was the oldest, while the NC group had the most years of education and the highest MMSE-KC scores.
SWAT-K and sBNT scores
The means and standard deviations of the SWAT-K and sBNT scores were calculated for each subject group, and an ANCOVA of the test scores across the three groups was conducted with age and level of education as covariates. The NC group had the highest mean score on every test: Animal-Size subtest, 11.31; Object-Weight subtest, 13.63; total SWAT-K, 24.94; sBNT, 12.46. The mean scores of the AD group were: Animal-Size subtest, 7.78; Object-Weight subtest, 9.78; total SWAT-K, 17.56; sBNT, 7.72. The mean scores of the SV-PPA group were: Animal-Size subtest, 5.90; Object-Weight subtest, 6.60; total SWAT-K, 12.50; sBNT, 7.56. All test scores, including both SWAT-K subtests, differed significantly among the groups (F values: Animal-Size subtest, 24.928; Object-Weight subtest, 32.600; total SWAT-K, 33.473; sBNT, 25.060; p < 0.001 for all). SWAT-K showed concurrent validity with the established sBNT: SWAT-K scores were positively correlated with sBNT scores (Pearson’s r = 0.751, p < 0.001).
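Concurrent validity here rests on Pearson’s correlation between total SWAT-K and sBNT scores. For reference, a minimal pure-Python sketch of the coefficient and its test statistic is given below; the function names are illustrative, the toy scores are not data from this study, and converting t to a p-value (using the t distribution with n - 2 degrees of freedom) is left to a statistics package.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), referred to t with n - 2 df."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Toy scores for illustration only.
print(pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 0.8
```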
Category dissociation of SWAT-K subtests
The differences between the two SWAT-K subtests were examined. Wilcoxon signed-rank tests were performed for the NC and AD subject groups; a paired t-test was performed for the SV-PPA group. The mean score of the Object-Weight subtest was higher than that of the Animal-Size subtest in both NC and AD groups; for the NC group, the mean difference was 6.59 (p < 0.001) and for the AD group the mean difference was 2.81 (p = 0.005). Meanwhile, no significant difference was found in the SV-PPA group.
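The within-group comparison of the two subtests can be sketched in Python as follows. This version drops zero differences, gives tied absolute differences their average rank, and uses the large-sample normal approximation with a tie-corrected variance and no continuity correction; it reports W+ (the rank sum of positive Object-Weight minus Animal-Size differences) with a two-sided p-value. These choices are assumptions for illustration; SPSS’s output may differ in detail (for example, in which rank sum it reports), and for groups as small as the SV-PPA group an exact test may be preferable. The function name is hypothetical.

```python
import math


def wilcoxon_signed_rank(first, second):
    """Paired Wilcoxon signed-rank test of second - first (normal approximation).

    Returns (W_plus, z, two_sided_p). Zero differences are discarded, tied
    absolute differences receive average ranks, and the variance is
    corrected for ties.
    """
    if len(first) != len(second):
        raise ValueError("samples must be paired")
    d = [b - a for a, b in zip(first, second) if b != a]
    n = len(d)
    if n == 0:
        raise ValueError("all differences are zero")

    order = sorted(range(n), key=lambda idx: abs(d[idx]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of this tie block
        for m in range(i, j + 1):
            ranks[order[m]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    w_plus = sum(r for r, diff in zip(ranks, d) if diff > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    if var <= 0:
        raise ValueError("variance is zero; the normal approximation is undefined")
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the standard normal
    return w_plus, z, p
```

In use, `first` would hold the Animal-Size scores and `second` the Object-Weight scores of one subject group, in the same participant order.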
For the subject groups that showed a dissociation between the two semantic categories, the results were further examined for a gender-related effect. In the NC group, the mean Object-Weight score was significantly higher than the mean Animal-Size score in both male (mean difference = 3.56, p < 0.001) and female (mean difference = 5.62, p < 0.001) subjects. In the AD group, male subjects scored significantly higher on the Object-Weight subtest than on the Animal-Size subtest (mean difference = 2.75, p = 0.004), whereas no significant difference between the two subtests was found in female subjects.
Post-hoc analysis of SWAT-K scores
Since all test scores differed significantly among the subject groups, a post-hoc analysis was conducted to confirm that SWAT-K can discriminate among the different diagnostic groups. The estimated marginal means of each test score were compared among the groups and the between-group differences calculated, as illustrated in Fig. 1. An LSD post-hoc analysis confirmed significant between-group differences in the Animal-Size and Object-Weight subtest scores as well as in the total SWAT-K scores. In particular, both the total SWAT-K score and the Object-Weight subtest score differentiated all three groups from one another, whereas the sBNT score failed to distinguish the AD group from the SV-PPA group (pairwise comparison between AD and SV-PPA, p = 0.910).
ROC curve analysis
An ROC curve analysis was carried out for SWAT-K and sBNT to confirm their diagnostic validity for AD and SV-PPA. Supplementary Figure 3 shows the ROC curves of the test scores for AD and SV-PPA patients, and Supplementary Table 1 shows the AUC (area under the curve), cut-off points, sensitivity, and specificity of the tests. The AUCs of SWAT-K were larger than 0.9 for both AD and SV-PPA patients (AD, 0.929; SV-PPA, 0.953) (Supplementary Table 1). In particular, the sensitivity and specificity of SWAT-K for SV-PPA patients were higher than those of sBNT (SWAT-K: sensitivity 0.90, specificity 0.93; sBNT: sensitivity 0.78, specificity 0.90).
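The ROC quantities above can be computed directly from two groups of scores. In the Python sketch below, lower SWAT-K scores are treated as indicating disease, the AUC is computed as the Mann-Whitney probability that a randomly chosen patient scores below a randomly chosen control (ties counted as one half), and the cut-off is chosen by maximizing Youden’s index (sensitivity + specificity - 1). Youden’s index is an assumption here; the text does not state how its cut-off points were selected. Function names and toy scores are illustrative, not data from this study.

```python
def roc_auc(patient_scores, control_scores):
    """AUC as P(patient score < control score), counting ties as 1/2."""
    if not patient_scores or not control_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in patient_scores:
        for c in control_scores:
            if p < c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_scores) * len(control_scores))


def youden_cutoff(patient_scores, control_scores):
    """Return (cut-off, sensitivity, specificity) maximizing Youden's J.

    A score <= cut-off is classified as 'patient' (screen positive).
    """
    best = None
    for cut in sorted(set(patient_scores) | set(control_scores)):
        sens = sum(s <= cut for s in patient_scores) / len(patient_scores)
        spec = sum(s > cut for s in control_scores) / len(control_scores)
        j = sens + spec - 1.0
        if best is None or j > best[0]:  # lowest cut-off wins ties
            best = (j, cut, sens, spec)
    return best[1], best[2], best[3]


# Toy scores for illustration only (not data from this study).
patients = [10, 12, 21]
controls = [14, 20, 22, 24]
print(roc_auc(patients, controls))        # 0.8333333333333334
print(youden_cutoff(patients, controls))  # (12, 0.6666666666666666, 1.0)
```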
Inter-rater and test-retest reliability
Inter-rater reliability data for SWAT-K were collected from 19 subjects with 2 raters. The ICCs were: Animal-Size subtest, 0.856 (p < 0.001); Object-Weight subtest, 0.785 (p = 0.001); total score, 0.827 (p < 0.001). These ICCs indicate high inter-rater concordance for SWAT-K and its subtests.
Test-retest reliability data for SWAT-K were collected from 17 subjects at a 2-month interval. The ICCs were: Animal-Size subtest, 0.752 (p = 0.004); Object-Weight subtest, 0.330 (p = 0.216); total score, 0.666 (p = 0.017). These ICCs indicate acceptable test-retest reliability for the total SWAT-K score and the Animal-Size subtest, but not for the Object-Weight subtest.
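The text does not specify which ICC form was used. As an illustration only, the Python sketch below computes two common Shrout-Fleiss forms from an n-subjects × k-ratings table: ICC(2,1) (two-way random effects, absolute agreement, single measure) and ICC(3,1) (two-way mixed effects, consistency, single measure). For inter-rater reliability the columns would be the two raters; for test-retest reliability, the two sessions. The function name and toy table are hypothetical.

```python
def icc_two_way(table):
    """Return (ICC(2,1), ICC(3,1)) for a subjects x ratings table (list of rows)."""
    n, k = len(table), len(table[0])
    if n < 2 or k < 2 or any(len(row) != k for row in table):
        raise ValueError("need a rectangular table with >= 2 rows and >= 2 columns")
    grand = sum(map(sum, table)) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters/sessions
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    icc2 = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    icc3 = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    return icc2, icc3


# Toy table: the second rating is always one point higher, so consistency
# (ICC(3,1)) is perfect but absolute agreement (ICC(2,1)) is not.
print(icc_two_way([[1, 2], [2, 3], [3, 4]]))  # (0.6666666666666666, 1.0)
```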
Regional correlates of SWAT-K scores: Voxel-based morphometry
The voxel-based multiple regression analysis of neuroanatomical correlates of the SWAT-K score is summarized in Table 2. Higher SWAT-K scores were correlated with greater gray matter volume in several brain regions, including the right inferior frontal cortex (T = 4.08, k = 191) and bilateral temporal cortices (left, T = 4.42, k = 135; right, T = 3.55, k = 235), as illustrated in Fig. 2. The corresponding analysis for the sBNT score is summarized in Supplementary Table 2.
DISCUSSION
SWAT-K as a neuropsychological test for SV-PPA
The aim of this study was to validate SWAT-K as a neurocognitive test of attribute semantic knowledge for the Korean elderly population that can discriminate SV-PPA patients not only from cognitively normal individuals but also from patients with other types of dementia. The results indicate that the total SWAT-K score and the Object-Weight subtest can differentiate among the three subject groups (NC, AD, SV-PPA), whereas sBNT, a conventional confrontational naming task commonly used in Korea to evaluate SV-PPA patients, failed to distinguish AD from SV-PPA. SWAT-K had higher sensitivity and specificity for SV-PPA than sBNT; moreover, the total SWAT-K score showed high inter-rater reliability and acceptable test-retest reliability. These results indicate that SWAT-K is a sensitive and reliable neuropsychological test of attribute semantic knowledge that can distinguish SV-PPA from AD.
The superior performance of SWAT-K over sBNT in discriminating AD from SV-PPA may be attributed to the different natures of the two tasks. The sBNT is a confrontational naming task that involves multiple cognitive processes: extraction of semantic information from the visual stimulus; comparison of that information with existing semantic knowledge; retrieval of the verbal memory related to that knowledge; and, lastly, utterance of the word. Thus, a subject may fail the sBNT because of a deficit either in semantic knowledge (as in the SV-PPA group) or in verbal memory (as in the AD group), and the two cannot be distinguished on the basis of sBNT results alone. SWAT-K, by contrast, is a comprehension task in which the subject only has to point to one of three pictures; no verbal memory has to be retrieved. A deficit in semantic knowledge is therefore expected to cause failure on SWAT-K, whereas a deficit in verbal memory is not.
Brain area correlations
The right inferior frontal cortex and bilateral temporal cortices were found to be correlated with SWAT-K scores (Table 2, Fig. 2). Since the bilateral temporal cortices, especially their anterior inferior parts, are almost always affected in SV-PPA and are thought to play an integral role in semantic processing [6, 33–37], it seems fitting that SWAT-K scores should correlate with their volume. The right inferior frontal cortex, the other region of correlation, has typically been associated with response inhibition [38]. It has also been reported to contribute to the retrieval of semantic information [39, 40], where its main role is thought to be the inhibition of competing semantic representations [41]. That such neural correlates were not found for the sBNT further supports the validity of SWAT-K and suggests that its neural correlates did not simply reflect the atrophy pattern of the predominant diagnostic group.
Other brain regions correlated with SWAT-K scores also seem to support the test’s ability to evaluate linguistic functions. The fusiform gyrus is, along with the temporal lobes, among the regions most severely affected in SV-PPA [42–45], and it is thought to play a role in picture processing. The cerebellum was also correlated with SWAT-K scores; given the cerebellum’s purported role in auditory speech perception [46–48], it is unsurprising that scores on SWAT-K, whose stimuli were presented in auditory as well as pictorial form, correlate with this area.
Among the brain regions that have been found to play a role in semantic memory [49], some, such as the posterior inferior parietal lobe, the dorsomedial and ventromedial prefrontal cortices, and the posterior cingulate gyrus, were not correlated with the SWAT-K score. Conversely, some other regions, such as the middle occipital gyrus and superior temporal gyrus, were unexpectedly found to correlate with the score. The characteristics of the task or of the participants may have affected the VBM results; further research on SWAT-K and its correlation with specific brain regions is needed.
Difference between the subtest scores
While no category effect was found in SV-PPA patients, AD patients scored higher on the Object-Weight subtest than on the Animal-Size subtest. This contrasts with the original study of the English version of SWAT, in which no category effect was found in either AD or SV-PPA patients [17].
One possible source of this discrepancy is a difference in difficulty between the two subtests; the Object-Weight subtest may be easier than the Animal-Size subtest. The mean Object-Weight score was also higher than the mean Animal-Size score in the NC group (13.63 vs. 11.31). Moreover, the cut-off point for discriminating NC subjects from SV-PPA patients was higher for the Object-Weight subtest than for the Animal-Size subtest. These observations support the hypothesis that differing levels of difficulty contributed to the apparent category effect.
Another possible explanation for the discrepancy between the two subtest scores is familiarity. Living in a modern, industrialized environment, the test subjects may have been exposed more often to the objects, which they use in everyday life, than to the animals presented in the test. In particular, past studies have shown a gender-related effect on semantic category dissociation: males show a selective advantage for tools over animals, while females show the opposite trend [50, 51]. Indeed, a gender-related effect also emerged in our data. In male AD patients, Object-Weight subtest scores were significantly higher than Animal-Size subtest scores, whereas there was no significant difference between the two subtests in female AD patients. These results suggest that familiarity may also have contributed to the category effect.
Limitations of the study
Owing to the limited number of SV-PPA patients in South Korea, the SV-PPA sample in this study is relatively small (n = 10), which increases the risk of type II error. This limitation may be overcome in the future through a multi-institutional study recruiting SV-PPA patients from across the Korean peninsula.
