Abstract
Verbal fluency tasks are widely applied in a variety of languages, but whether the quality and quantity of responses are comparable across structurally different writing systems is debatable. For example, since there are no letters in a logographic, non-alphabetic language such as Chinese, the mechanisms speakers use to generate a list of words in a letter fluency task might be structurally different than those used by speakers of alphabetic languages. In this study, we investigated lexical retrieval strategies and approaches in letter and category fluency tasks among monolingual Mandarin speakers compared to monolingual English speakers. We found that the responses of Mandarin speakers are both qualitatively and quantitatively different in letter fluency, and qualitatively different in category fluency. These results suggest that differences in task completion among non-English-speaking populations are important to consider when using this extensively utilised cognitive and linguistic measure in research and clinic.
Introduction
Verbal fluency (VF) tasks have been widely used in clinical settings and research to evaluate neuropsychological functioning (Lezak et al., 2004). These tasks assess the processes of word search by examining how an individual accesses information stored in lexical memory (Roberts & Le Dorze, 1997). There are two types of VF tasks: the category fluency test (CFT) and the letter fluency test (LFT). The typical procedure involves asking the subject to generate, in 60 s, as many items as possible that belong to a given semantic category (e.g., animals) in CFT, or as many words as possible that begin with a given letter (e.g., F) in LFT. Reflecting the nature of each task, individuals are expected to use semantic attributes for CFT and phonological/lexical cues for LFT (Troyer, Moscovitch, & Winocur, 1997).
Under time constraints, generating words within circumscribed parameters requires focused attention to access the mental lexicon. Because these tests tap into various cognitive processes such as inhibition, cognitive speed, and attention (e.g., Shao, Janse, Visser, & Meyer, 2014), they are sensitive to cognitive impairment and are therefore widely used as screening tools for different clinical populations. Patients with Alzheimer’s disease as well as individuals with amnestic mild cognitive impairment tend to exhibit greater relative impairment on CFT than LFT (Alladi, Arnold, Mitchell, Nestor, & Hodges, 2006; Auriacombe et al., 2006; Henry, Crawford, & Phillips, 2004; Lonie et al., 2009; Murphy, Rich, & Troyer, 2006). In contrast, individuals with amyotrophic lateral sclerosis often perform more poorly in LFT than CFT (Quinn et al., 2012). The discriminability between the tests among patients with Parkinson’s disease remains ambiguous, with some studies showing that these patients have more difficulty in LFT (Azuma et al., 1997; Flowers, Robertson, & Sheridan, 1995), while others report larger deficits in CFT (Henry & Crawford, 2004; Obeso, Casabona, Bringas, Alvarez, & Jahanshahi, 2012).
The portability of VF tasks provides the opportunity to investigate, cross-linguistically, language factors and cultural differences in lexical organisation and lexical access among the many languages in which these tasks are used, for example, Spanish (Benito-Cuadrado, Esteba-Castillo, Böhm, Cejudo-Bolívar, & Pena-Casanova, 2002; Rosselli et al., 2002), Italian (Capitani, Laiacona, & Barbarotto, 1999), Cantonese (Chan & Poon, 1999; Lee, Yuen, & Chan, 2002), Hebrew (Kavé, 2005), Korean (Kim et al., 2013), Japanese (Sumiyoshia et al., 2003), and Dutch (Van der Elst, van Boxtel, Van Breukelen, & Jolles, 2006). Currently, cross-linguistic investigations are inconsistent with respect to the contributions of culture and linguistics to VF task performance. For instance, Kempler, Teng, Dick, Taussig, and Davis (1998) compared CFT performance on the category animals among English speakers and US immigrants who spoke Spanish, Chinese (dialect not indicated), and Vietnamese. Vietnamese speakers generated the most responses and Spanish speakers the fewest; the authors proposed that word length differences between these two languages might have contributed to this finding. In another study, native Spanish speakers performed better than native English speakers in the category animals, which was attributed to differences in early living environments and exposure to different types of animals across the language groups. In contrast, no significant difference in the number of correct responses between English and Finnish speakers was found in CFT on the category of animals, despite the contrasting characteristics of the two languages (Pekkala et al., 2009). The lack of agreement among studies of VF in the animal category is one reason why continued investigation in this area is critical in determining whether this task is applicable to speakers of different languages.
Comparing languages with different writing systems, for example, logographic Chinese versus phonographic (alphabetic) Latin, can be particularly interesting with regard to LFT and CFT performances to investigate linguistic and cultural influences on these production tasks. Comparisons of CFT between Chinese and Western languages showed quantitatively similar performances between the populations (e.g., Chan & Poon, 1999). However, the research on LFT is specifically limited, perhaps because of the confining role that letters (i.e., graphemes) may play across languages to compare performances. While comparisons in LFT performance have been made among speakers with various phonographic systems, such as Spanish, Russian, Hebrew, Japanese, and Korean (e.g., Kavé, 2005; Kim et al., 2013; Rosselli et al., 2002; Snodgrass & Tsivkin, 1995; Sumiyoshia et al., 2003), modification and scoring of LFT is not very different among these languages as each has an alphabetic system. The question can be raised, however, whether CFT and LFT scoring norms set by Western languages would also apply to Chinese which uses a logographic writing system, as semantic and phonemic network activations in the brain might be influenced by one’s linguistic and cultural background. Yet to date, there has not been a systematic evaluation of VF behaviours in speakers of Chinese.
Chinese is a notably different language from English. For the purposes of this article, the term “Chinese” denotes the language and “Mandarin” refers to the dialect of Chinese spoken by subjects in this study. Although there are more than 200 dialects of Chinese that are mutually unintelligible, it is the written language that unites the dialects and so in this article, subjects write Chinese but they speak Mandarin. The character (字) is the basic perceptual unit in spoken and written Chinese. All characters are monosyllabic and with very few exceptions, most are free morphemes. Superimposed on the syllable is lexical tone, which is the manipulation of pitch contour of the spoken syllable. Much like segmental information (i.e., vowels and consonants), tones contribute to word meaning. Mandarin has four distinctive tones. For example, the syllable /ma/ spoken in Tone 1, means “mother” (媽); in Tone 2, means “to bother” (麻); in Tone 3 means “horse” (馬), and in Tone 4 means “reprimand” (罵). Even with segmental and supra-segmental (i.e., phonological aspects extending beyond one sound, such as tone) information, there remains a large amount of homophony in Mandarin (and other Chinese dialects). Orthographically, a character is defined by a fixed amount of space in print (Feldman & Siok, 1999). That is, regardless of how many strokes make up a character, 機 (“machine”) versus 二 (“two”), all characters occupy the same amount of space. Many Chinese characters have orthographic radicals. An orthographic radical is defined as a component of a character that can carry semantic or phonological information, to varying degrees of reliability. A phonological radical is one that dictates the pronunciation of the character. For example, 請 (please, to ask), 清 (clear), 情 (emotion), and 晴 (clear, fine) all share the same segmental string [qing] dictated by the shared radical 青 (blue/green), albeit different lexical tone. 
There are radical-sharing characters that sound different but share semantic features (e.g., 推, to push, and 拉, to pull); the component they share is a semantic radical. Finally, there are characters that sound the same but do not share phonological radicals or semantic features (e.g., 聽 (to listen) and 停 (to stop), pronounced as [ting1] and [ting2], respectively).
The purpose of this study is to examine VF performance in speakers of Mandarin 1 compared to native English speakers. The research questions are the following: (a) Is CFT performance of Mandarin speakers quantitatively and characteristically different from that of English speakers? (b) Is LFT performance of the Mandarin speakers quantitatively and characteristically different from that of English speakers? Characteristic differences in VF performance can be defined in terms of clustering and switching; Troyer et al. (1997) introduced these two important underlying components of VF, suggesting that the number of correct words, which is the most common VF measure, alone may not fully capture the multifactorial characteristics of VF.
Based on the findings by Chan and Poon (1999), we hypothesise that the performance of the two language groups on CFT will be quantitatively similar in the number of correct responses generated, but characteristically different in terms of the organisation of the mental lexicon as evidenced by the clustering patterns and subcategories that subjects use across the two languages. For LFT, we hypothesise that the performance of the two groups will be different both quantitatively and characteristically for two reasons. First, Mandarin and English have different orthographic systems and the degree to which phonology is orthographically represented is substantially distinct. Second, alphabet letters are used in Mandarin specifically for access to the segmental string (i.e., the pronunciation) of a written character (i.e., morpheme) as letter strings do not offer any access to lexical information; specifically, lexical tone is not conveyed by letters. Therefore, we predict that the shallow function of letters in the language will impede the performance of Mandarin speakers on this task.
Study 1 (CFT)
Method
A total of 60 subjects participated in this study: 30 American-English speakers and 30 Chinese Mandarin speakers (see Table 1). All subjects had at least a high school degree and were in the process of obtaining, or had recently obtained, a university-level degree; none reported any history of learning, cognitive, or emotional challenges. All Mandarin speakers were recruited from a local college campus in Wuhan, China, using online and printed ads. Each subject reported that he or she was a monolingual Mandarin speaker with minimal, if any, exposure to another language. All English speakers were recruited from a local college campus in New York, USA, using announcements to student organisations. Each subject reported that he or she was a monolingual English speaker with minimal, if any, exposure to another language.
Table 1. Demographic characteristics of participants in the CFT condition.
CFT: category fluency test.
Standard deviations are shown in parentheses.
Measures and scoring
Procedure
The examiner instructed subjects to name as many animals as they can say in 60 s. Subjects were also told not to repeat any responses and that proper and/or fictional names are not allowed. For example, “Lassie” is not allowed nor is “Hello Kitty.” English and Mandarin subjects received instructions and subsequently performed the task in English and Mandarin, respectively. All performances were taped using an audio recorder and manually transcribed by one of the investigators.
Five scores were obtained for each participant’s performance: (a) the number of correct responses, (b) the number of errors, (c) the mean cluster size, (d) the ratio of switches, and (e) the number of different subcategories revealed by the organisation of responses. Subcategories were established a posteriori.
Definition of terms
(a) Errors: Errors include a repeated word, an unrelated word (i.e., a word that does not meet the task criteria), and a non-word (i.e., a response that is not accepted as a word by native speakers). (b) Cluster: A cluster is defined as a group of two or more items that belong to the same subcategory. (c) Cluster size: The cluster size refers to the number of items belonging to a cluster. (d) Switch: A switch occurs when two consecutive items do not belong to the same cluster. A switch can occur when there is a transition between different clusters, from one non-clustered item to another non-clustered item, from a non-clustered item to a clustered item, or vice versa. (e) Subcategory: The subcategory is the theme that a subject used to retrieve words in a cluster. There are two broad subcategories: a semantic/lexical category and a phonological category. The semantic/lexical category could include extinct animals, farm animals, pets, birds, and so on. While most subcategories (e.g., animals living in water, insects, predator-prey) can be applied to both languages, some are language-specific, for example, the animals used for human consumption and the animals of the Chinese Zodiac for Mandarin speakers or “lion-tiger-bear” (from The Wizard of Oz) for English speakers. As for the phonological category, different criteria were applied to English and Mandarin, reflecting the linguistic nature of each language. The phonological category for English includes consonant blends, the same initial letter, and/or the same final letter, all of which operate at the phonemic level. In contrast, the phonological category equivalent for Mandarin is homophony at the level of the morpheme (i.e., character). For example, 渴 (thirsty) and 可 (very) share the string /ke/.
General scoring rule
Scoring rules were partly adopted from Troyer et al. (1997) and supplemented with rules specific to the Mandarin responses (see Supplementary Material for detailed scoring rules and examples). All relevant first appearances were considered valid responses; an invalid response would be a non-animal one such as “zoo” (a place where animals are found) or “vegetarian” (a person who does not eat meat). When a subject used a colloquial form for the same referent (e.g., 鼠 and 老鼠 in Mandarin, the first being a colloquial form of the latter, both meaning “rat”; or “kitty” and “kitten” in English), credit was afforded to the first item, while the second was considered a repetition. Errors were excluded from the number of correct responses. However, repetition errors were counted towards cluster size so long as the exemplar met the criteria for subcategory inclusion, since such errors reflect lexical organisation and provide information about the underlying cognitive processes. When an exemplar legitimately appeared in two different subcategories, credit was afforded to both occurrences; for example, “fish” can appear in the subcategory of “pets” and in that of “water animals.” When an exemplar overlapped two categories, that item was assigned to both categories; for example, in “bear-baboon-monkey,” baboon and bear were assigned to the phonological category for the subcategory of words beginning with the same first letter, and baboon, with monkey, was also assigned to the semantic/lexical category for the subcategory of primates. When a smaller cluster was embedded in a larger cluster, the larger cluster was considered the cluster and the responses forming the smaller cluster were included in the total cluster size; for example, when shellfish appeared within a group of responses considered to be water animals, the shellfish exemplars were counted as water animals, rather than being assigned to a different subcategory.
Finally, two or more exemplars are needed to establish a category; thus, exemplars unrelated to either the category before or after it are excluded. Please refer to the Supplementary Material for more information on category fluency rules for Chinese and English.
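The cluster and switch definitions above can be illustrated with a minimal Python sketch. It assumes that each response has already been hand-assigned a single subcategory label (None for an item that fits no subcategory), which simplifies the scoring rules, since the rules allow one exemplar to count towards two subcategories. The function name score_sequence is hypothetical, and defining the ratio of switches as the number of switches divided by the number of transitions between consecutive responses is an assumption; the text does not give a formula for the ratio.

```python
from itertools import groupby


def score_sequence(labels):
    """Score one ordered list of subcategory labels (one per response).

    A cluster is a run of two or more consecutive items with the same label.
    A switch is any boundary between two consecutive items that are not in
    the same cluster. None marks an item with no subcategory.
    """
    runs = []  # length of each run of consecutive same-label items
    for label, group in groupby(labels):
        n = len(list(group))
        if label is None:
            # Unlabelled items never cluster, even when adjacent.
            runs.extend([1] * n)
        else:
            runs.append(n)

    clusters = [n for n in runs if n >= 2]
    switches = max(len(runs) - 1, 0)       # the first item is never a switch
    transitions = max(len(labels) - 1, 0)  # assumed denominator for the ratio

    return {
        "mean_cluster_size": sum(clusters) / len(clusters) if clusters else 0.0,
        "switches": switches,
        "switch_ratio": switches / transitions if transitions else 0.0,
    }


# Illustrative labels only; these are not responses from the study.
example = ["water", "water", "water", "pets", None, "farm", "farm"]
result = score_sequence(example)
```

In this illustrative sequence there are two clusters (sizes 3 and 2) and three switches across six transitions. Cluster size here counts every item in the cluster, following the definition in this section.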
Interrater reliability
Interrater reliability analyses were performed for Study 1 (and 2). For the Chinese Mandarin data, two native Mandarin speakers independently judged each response, for both CFT and LFT, to determine the cluster-membership of a word. Subsequently, the raters discussed discrepancies to achieve mutual agreement on all but nine responses for the CFT. On these nine responses, no agreement could be reached between the two raters, for which a third native English–Mandarin speaker (N.E.) provided the final decision. For the English data, two native English speakers independently judged cluster-membership for the CFT. Subsequently, the ratings were discussed and mutual agreement was achieved on all but three responses. A third rater (N.E.) provided the final decision on these three responses.
Results and discussion
An independent t-test analysis showed no language-specific difference in the number of correct responses generated on CFT between Mandarin and English, t(58) = –1.145, p = .257, yet Mandarin speakers committed significantly more errors than English speakers, t(41.605) = –2.591, p = .013. The mean cluster size was significantly larger in Mandarin than in English, t(45.890) = –3.409, p = .001. Both groups used switches to a certain extent, but we found a trend for slightly more switching among English speakers than Mandarin speakers, t(58) = 1.752, p = .085 (Table 2).
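The fractional degrees of freedom reported for errors and cluster size, t(41.605) and t(45.890), are what one would expect if an unequal-variance (Welch) correction had been applied to those comparisons, whereas t(58) matches the pooled test for two groups of 30. The text does not state which correction was used, so the following Python sketch is an assumption offered only to show how such degrees of freedom arise. The function name welch_t and all input values are hypothetical and are not the study's data.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom,
    computed from group means, standard deviations, and sample sizes."""
    v1 = sd1 ** 2 / n1  # squared standard error, group 1
    v2 = sd2 ** 2 / n2  # squared standard error, group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical summary statistics for two groups of 30.
t_equal, df_equal = welch_t(20.0, 4.0, 30, 21.0, 4.0, 30)      # equal spread
t_unequal, df_unequal = welch_t(1.0, 2.0, 30, 3.0, 6.0, 30)    # unequal spread
```

With equal group sizes and equal standard deviations, the Welch degrees of freedom coincide with the pooled value n1 + n2 - 2 = 58; when the spreads differ, the degrees of freedom fall below 58 and are generally not whole numbers.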
Table 2. CFT performance by English and Mandarin speakers.
CFT: category fluency test.
Standard deviations are shown in parentheses. Statistical analyses were conducted between language groups: *p < .05; **p < .01; .05 < †p < .1.
As expected, there are characteristic differences between Mandarin and English speakers in CFT performance (as measured by cluster size) though CFT performances are not quantitatively different from each other. Therefore, we, with caution, propose that CFT can be used with speakers of Mandarin and that responses can be scored and analysed much like those of English speakers.
The total number of correct responses in CFT was not significantly different between the two language groups. Whether this means that the two groups approached the task in the same manner needs further evaluation before deciding if the task is applicable across languages. Pekkala et al. (2009) reported similar productivity in CFT performance for animals in both English and Finnish speakers, suggesting that the category of animals might not be susceptible to linguistic or cultural influences, as speakers of different languages generate a robust number of exemplars for this category. In contrast, Kempler et al. (1998) suggested that the difference in number of responses between Vietnamese speakers and Spanish speakers might be attributed to word length in the two languages. In particular, almost 80% of the Vietnamese responses were one-syllable words (20% were two-syllable words), whereas 45% of the Spanish responses were two-syllable words (and 45% were at least three-syllable words). As for the Chinese and English responses from the Kempler study, about 50% of the Chinese responses were one-syllable words, 45% were two-syllable words, and 5% were three-syllable words. About 48% of the English responses were one-syllable words, 37% were two-syllable words, and 15% were three-syllable words. The differences reported by Kempler et al. were not borne out in our data; that is, word length differences between Chinese and English did not affect the performance of our subjects. Accounting for the lack of differences between our two groups required more in-depth analyses of the data. As a result, we examined patterns of errors in our samples and the way responses clustered in the two languages.
In contrast to the number of correct responses, we observed a substantive difference in the number of errors committed by the two language groups, with the Mandarin speakers making significantly more errors than the English speakers. In English, the few errors observed were repetition errors, unintelligible responses, or the use of a descriptive phrase such as “brown bear” or a proper name such as “Pooh Bear.” In Mandarin, a small percentage of errors (18%) was due to the use of proper names, 孫悟空 and 豬八戒, two major characters from the 16th-century Chinese literary classic Journey to the West, whereas the majority of the errors (82%) were repetition errors. About 70% of the repetition errors were related to the subcategory of the Chinese Zodiac. The remaining 30% were characterised as follows: use of formal versus informal terms, such as 蜥蜴 and 四條腿蛇 to refer to “lizard,” as well as use of diminutives, such as 熊熊 and 貓咪 for bear cub and kitten, respectively, after having named bear and cat earlier in the sequence.
Interestingly, there is a Chinese verbal routine involving the animals of the Zodiac that is introduced in the preschool years, somewhat akin to the alphabet song for English speakers. Approximately half of the Mandarin speakers named these animals in rote fashion, suggesting that this sequence is stored as a lexical whole, which would explain the speakers’ ability to quickly retrieve these exemplars in a fixed order. Animals of the Chinese Zodiac also belong to other subcategories (for example, “rat” appears in the subcategory of rodents; “cow” in the farm animal subcategory, etc.). The automaticity of generating the string of Zodiac animals likely leads to a weak activation of one’s working memory, given that these animals would appear once again in a different category of animals. That repetition was not avoided would suggest that certain exemplars have features that allow for membership in more than one category and that (some) memberships are salient enough to defy executive suppression during this task.
The cluster size was also different between the two groups. On average, the Mandarin speakers generated larger clusters than the English speakers. The automatic productivity of the animals of the Chinese Zodiac contributed the most to the larger cluster size of the Mandarin speakers. With 13 of 30 subjects producing animals of the Chinese Zodiac (to varying degrees of completeness), the average cluster size of this subcategory was 7.54, the largest of any subcategory across the two languages. In a post hoc analysis, we examined whether the characteristic difference in cluster size between Mandarin and English speakers was exclusively driven by the influence of the Zodiac category. We compared the average cluster size of the Mandarin speakers who did not name any Zodiac animals (n = 17) with the average cluster size of the English speakers. Notably, the average cluster size of these Mandarin speakers remained significantly larger than that of the English speakers, t(45) = 2.278, p = .028. This result indicates a broad cultural influence, beyond the specific category of Zodiac animals, on the kind of responses produced by Mandarin speakers on a CFT, which differ from those produced by English speakers.
We also looked at whether the lexical nature of words in Mandarin influenced clustering tendencies. Approximately 80% of modern Chinese words are compound words (DeFrancis, 1984), where words are formed by putting together two free morphemes; for a detailed discussion of the morpho-syntactic nature of Chinese compound words and their underlying processes, the reader is directed to John Dai’s (1998) work. Many animal names are compound words that use a bound root to anchor another character for lexical information. For example, zebra is 斑馬, a compound of 斑 (striped) and 馬 (horse), and hippopotamus is 河馬, a compound of 河 (river) and 馬 (horse). In our study, 19% of the total Mandarin responses were generated consecutively with one or more items sharing the same character, compared to 2% of the English responses (e.g., lion and sea-lion). Although this lexical feature of Mandarin did not lead to a larger cluster size (the average cluster size of words anchored by a shared character was 2.78), it is noteworthy that this word formation strategy lends itself to lexical organisation for Chinese Mandarin speakers.
The difference in the ratio of switches between the two groups was only marginally significant, as the English speakers produced slightly more non-clustered items in general. With regard to subcategories, both groups predominantly used the semantic subcategory as opposed to the phonological subcategory, a finding that is consistent with Troyer et al. (1997). In our study, both groups of speakers overwhelmingly exhibited a semantic strategy, with 97.8% and 98.7% of responses based on semantic association with the immediately preceding response in English and Mandarin, respectively. Our subjects’ responses do suggest that speakers share some sub-categorical organisation but, at the same time, their responses also reveal language-specific strategies. We note that 76.2% of the Mandarin responses were assigned to subcategories that were used by both groups of speakers, whereas 90.6% of the English responses were assigned to the same (shared) subcategories. This means that almost one-fourth of the responses in Mandarin (23.4%) belong to Mandarin-specific subcategories, such as animals of the Chinese Zodiac and animals used for meat consumed by humans; these are subcategories unique to the Mandarin speakers. In contrast, only 9.4% of the English responses belong to English-specific subcategories, such as animals in North America and “lion-tiger-bear,” the quote from The Wizard of Oz.
In sum, both groups of speakers made use of lexical and semantic features in lexical retrieval. There was no evidence to suggest that either group used phonological strategies, suggesting that phonemes may not provide an efficient basis for lexical organisation in English or Mandarin. The results suggest that cultural and linguistic variables influence CFT performance on a characteristic level even though, at first glance, there may not be any quantitative differences. These results are unlikely to be generalisable to other language pairs; instead, they call for more critical evaluation of data from different language pairs.
Study 2 (LFT)
Method
A total of 66 subjects participated in this study: 33 American-English speakers and 33 Chinese Mandarin speakers (see Table 3), adhering to the same inclusion/exclusion criteria and recruitment process as the participants in Study 1. All subjects had at least a high school degree and were in the process of obtaining, or had recently obtained, a university-level degree; none reported any history of learning, cognitive, or emotional challenges. All Mandarin speakers were recruited from a local college campus in Wuhan, China, using online and printed ads. Each subject reported that he or she was a monolingual Mandarin speaker with minimal, if any, exposure to another language. All English speakers were recruited from a local college campus in New York, USA, using announcements to student organisations. Each subject reported that he or she was a monolingual English speaker with minimal, if any, exposure to another language. Subjects who participated in this study were different from those who participated in the CFT.
Table 3. Demographic characteristics of participants in the LFT condition.
LFT: letter fluency test.
Standard deviations are shown in parentheses.
Measures and scoring
Procedure
Given the large amount of homophony 2 in spoken Mandarin, written responses (instead of spoken ones) were collected in individual sessions from both groups of subjects to confirm intended responses. The examiner instructed the English subjects to write as many words as they can in 60 s that begin with the letters F, T, and M3 in three separate trials. The Mandarin subjects were instructed to write as many characters (字) as they can in 60 s that begin with the letters F, T, and M in three separate trials; recall that letters are used in Chinese for the sole purpose of accessing the pronunciation of characters. Specifically, we instructed the Chinese Mandarin subjects to write characters where the associated pinyin 4 begins with a particular letter. The order of letters was randomised across participants. Five scores were obtained for each participant’s performance: (a) the number of correct responses, (b) the number of errors, (c) the mean cluster size, (d) the ratio of switches, and (e) the type of subcategory.
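The Mandarin version of the task, as described above, amounts to checking that the pinyin of each written character begins with the target letter and that the character has not been written before. The following Python sketch illustrates that check under stated assumptions: the lookup table TOY_PINYIN is a hand-made, hypothetical dictionary containing only characters and syllables that appear in this article, and the function name score_lft_responses is likewise hypothetical. A real scoring procedure would rely on native-speaker raters, as in this study.

```python
# Hypothetical toy lookup: character -> pinyin syllable with tone number.
TOY_PINYIN = {
    "媽": "ma1", "麻": "ma2", "馬": "ma3", "罵": "ma4",
    "聽": "ting1", "停": "ting2",
}


def score_lft_responses(responses, letter, pinyin=TOY_PINYIN):
    """Split written characters into correct responses and errors.

    A response is an error if it repeats an earlier response or if its
    pinyin does not begin with the target letter. Characters missing from
    the toy table are also counted as errors (toy simplification only).
    """
    correct, errors = [], []
    seen = set()
    target = letter.lower()
    for char in responses:
        syllable = pinyin.get(char)
        if char in seen or syllable is None or not syllable.startswith(target):
            errors.append(char)
        else:
            correct.append(char)
        seen.add(char)
    return correct, errors


# Illustrative trial for the letter M; not data from the study.
correct, errors = score_lft_responses(["馬", "媽", "停", "馬"], "M")
```

Because 媽 and 麻 share the segmental string and differ only in tone, a spoken response would not always identify the intended character, which is consistent with the decision to collect written responses.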
Definition of terms
(a) Errors, (b) Cluster, (c) Cluster size, and (d) Switch are defined in the same manner as in Study 1. (e) Subcategory. The subcategory is the theme that a subject used to retrieve words in a cluster (specific examples for each subcategory are provided in the Supplementary Material). There are three broad subcategories: orthographic, phonological, and semantic/lexical. For the English responses, it was extremely difficult to determine whether a subject was motivated by a phonological or an orthographic strategy. As a result, we defined orthographic strategies as those where responses shared at least the first two letters. The phonological category was assigned to homonyms. The semantic/lexical category includes affixation, irregular changes, semantic association, words within the same grammatical category, compound words, and others (e.g., synonyms).
For the Chinese Mandarin responses, the orthographic category was assigned when a character is offered based on a radical strategy. For example, a subject relying on a phonological radical could produce “death,” “web,” and “forgot” (亡, 網, 忘, respectively) where the shared phonological radical (亡) may be motivating her responses. The phonological category was assigned to characters that sound the same but are not signalled by phonological radicals. For example, 松 (pine), 宋 (Soong—a surname), and 鬆 (loose) all share the same phonetic string but do not share a phonological radical. Semantic/lexical category was assigned to characters having similar grammatical functions, for example, 麼 and 嗎, which are particles used at the end of some questions or characters that generate a compound word when they are combined, for example, 咪 as in 貓咪 (kitten) and 媽咪 (mommy) to form the diminutive.
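For the English responses, the orthographic criterion described above (consecutive responses sharing at least the first two letters) is simple enough to sketch in Python. The function name orthographic_clusters and the example words are hypothetical, and the sketch covers only the orthographic category; phonological and semantic/lexical assignments require rater judgement.

```python
from itertools import groupby


def orthographic_clusters(words, prefix_len=2):
    """Return runs of two or more consecutive words that share the same
    first `prefix_len` letters (case-insensitive)."""
    clusters = []
    for _, group in groupby(words, key=lambda w: w[:prefix_len].lower()):
        run = list(group)
        if len(run) >= 2:  # a cluster needs at least two items
            clusters.append(run)
    return clusters


# Illustrative F-trial responses; not data from the study.
responses = ["fly", "flight", "fat", "frog", "free", "from"]
clusters = orthographic_clusters(responses)
```

Here "fly" and "flight" form one orthographic cluster and "frog", "free", "from" form another, while "fat" stands alone. Under the scoring rules, a pair such as "fly" and "flight" would additionally be credited to the semantic/lexical category.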
General scoring rule
Scoring rules were partly adopted from Troyer et al. (1997), with new rules added for a more in-depth analysis (see the Supplementary Material for more detailed scoring rules and examples). All relevant first appearances of responses were considered valid. Invalid responses were repetitions of a previous response, responses that did not start with the given letter, non-words, misspelled words, and proper names. Errors were excluded from the number of correct responses. When a smaller cluster was embedded in a larger cluster, the larger cluster was considered the cluster and the responses forming the smaller cluster were included in the total cluster size. When items met the criteria for more than one subcategory, these items were assigned to both categories (e.g., “fly-flight”: they belong to the orthographic category for the subcategory of the same first two letters, as well as the semantic/lexical category for the subcategory of words within the lexical category). The broad categories were applied to determine whether there was a switch. Whether or not it was clustered, the first item of a task was not counted as a switch. Please refer to the Supplementary Material for more information on letter fluency rules for Chinese Mandarin and English.
Interrater reliability
An interrater reliability analysis was conducted. For the Chinese Mandarin data, two native Mandarin speakers independently judged each response on LFT to determine the cluster-membership of a word. Subsequently, the raters discussed discrepancies to achieve mutual agreement on all responses. For the English data, two native English speakers independently judged cluster-membership for the LFT. Subsequently, the ratings were discussed and mutual agreement was achieved on all responses.
Results and discussion
An independent-samples t-test showed that English speakers generated a significantly larger number of correct responses in LFT than Mandarin speakers, indicating an overall effect of language, t(64) = 3.932, p < .001. There was no significant difference between the languages with regard to the number of errors, t(52.669) = 1.622, p = .111. The mean cluster size showed a marginally significant difference, with a larger size in Chinese Mandarin than in English, t(64) = –1.923, p = .059. The ratio of switches was significantly higher in the responses of the English speakers than in those of the Mandarin speakers, t(64) = 3.994, p < .001 (Table 4).
LFT performance by English and Mandarin speakers.
LFT: letter fluency test.
Standard deviations are shown in parentheses. Statistical analyses were conducted between language groups: ***p < .001; .05 < †p < .1.
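The fractional degrees of freedom reported for the error comparison, t(52.669), are consistent with a correction for unequal variances such as Welch's, although the text does not name the procedure. The sketch below contrasts the pooled-variance t statistic (df = n1 + n2 − 2) with Welch's t (df from the Welch–Satterthwaite approximation). The means, standard deviations, and group sizes are hypothetical; the group sizes of 33 were chosen only so that the pooled df equals 64, and they are not taken from the study.

```python
import math


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Student's t with pooled variance; df = n1 + n2 - 2."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t; df from the Welch-Satterthwaite approximation."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Hypothetical summary statistics (mean, SD, n) for two groups.
t_p, df_p = pooled_t(2.0, 1.0, 33, 1.5, 2.0, 33)
t_w, df_w = welch_t(2.0, 1.0, 33, 1.5, 2.0, 33)
```

With equal group sizes the two t values coincide, and only the degrees of freedom differ: the Welch df falls below 64 when the group variances are unequal.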
The performance on LFT differed significantly between English and Mandarin speakers in both a characteristic and a quantitative manner. However, these differences should be interpreted in their linguistic and cultural context. We conclude that the LFT can also be applied in Mandarin if it is adapted to reflect the linguistic characteristics of Chinese Mandarin.
Unlike in the CFT, the number of correct responses differed in the LFT: English speakers generated significantly more correct responses than Mandarin speakers. To understand this result, familiarity with the task should be taken into account. Because of the role of letters in word formation in English, English speakers pay attention to the initial letters of words and, more generally, have notions about word decoding that involve attention to the sounds of words. In contrast, Chinese Mandarin speakers do not attend to the syllabic features of characters, such as consonants and vowels. That is, the lack of phoneme-level representation for character identification forces Mandarin speakers to approach the task in a contrived manner. Syllables, the basic structural unit of the language, are most prominently represented in Chinese orthography, whereas phonemes are most prominently represented in writing (i.e., as graphemes) in English (McBride-Chang, Cheung, Chow, Chow, & Choi, 2006). Despite the use of the alphabet to access the pronunciation of Chinese characters both within and beyond China, our Mandarin-speaking subjects have had limited experience using letters in the way the LFT requires. Generally, this specific use of letters can be thought of as a way to index words in the mental lexicon; hence, in an alphabetic language, words can be accessed by attending to initial letters. Mandarin speakers, especially young people, have very limited experience with the alphabet. Because the primary function of letters in Mandarin is to provide phonetic information about the pronunciation of Chinese characters (that is, pinyin), letters do not have the same linguistic value as they do in a language such as English. Pinyin, in addition to providing pronunciation cues, is the primary means of accessing written characters in Chinese word processing programmes.
Our millennial subjects, despite having learned pinyin as young children and having to use the system to interface with word processing programmes, nonetheless have less phoneme sensitivity than the English speakers, simply because it is the character, not the sound, that is the building block of words in Chinese. This idea is supported by their notably inferior LFT performance; even though the task is administered using letters, phonology plays an important role, as shown by the phonological strategies observed among the participants. Applying the phonemic component to a task is likely to increase cognitive effort for speakers who otherwise would not organise the mental lexicon based on the sounds of the language. The nature of the language itself likely contributed to the limited number of responses generated on the LFT by Mandarin speakers.
The lower production rate in the Chinese Mandarin responses may also be attributed to the complexity of the lexical processes involved in performing the LFT in Mandarin. When a letter is given, Mandarin subjects would first select a syllable beginning with the letter, and then select a character among many homophones; for example, the syllable “yu” is assigned to many different characters including 雨 (rain), 玉 (jade), 宇 (global), and 魚 (fish). The next step is to retrieve the grapheme of the character that the subject wants to write. Finally, the subject must decide whether to stay in the same syllable or select another syllable for the next item. In Mandarin Pinyin, when lexical tone is ignored, nine syllables begin with F, and 19 syllables each begin with T and M. When a subject selects the syllable fa, she could write one of 47 characters pronounced as fa, where those characters would differ in tone. For example, fa in Tone 2 can represent the characters 乏, 罰, and 伐. The next item can be either another fa character, possibly with a different lexical tone, or a character pronounced as a different syllable starting with F. Naturally, the size of the homophone range varies by syllable. While there are more than 40 characters pronounced as fa, there are about 200 characters pronounced as fu. In our study, despite these differences, almost the same number of fa characters and fu characters were generated in total. However, six unique characters were selected in fa, whereas 21 unique characters were selected in fu, indicating that the size of the character pool qualitatively affects task performance. Most of the responses were high-frequency characters, which participants can retrieve rapidly and easily. Compared to Chinese Mandarin subjects, English subjects accessed much larger pools of words.
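The contrast drawn above between the number of characters produced for a syllable (tokens) and the number of distinct characters (types) can be made concrete with a small sketch. The character-to-syllable map and the pooled responses below are hypothetical and far smaller than any real pinyin lexicon; they only illustrate how two syllables can attract the same number of responses while drawing on character pools of different sizes.

```python
from collections import defaultdict

# Illustrative character-to-syllable map with tone ignored.
# A real analysis would need a full pinyin lexicon.
SYLLABLE_OF = {
    "乏": "fa", "罰": "fa", "伐": "fa",
    "福": "fu", "父": "fu", "夫": "fu", "服": "fu",
}


def tokens_and_types(responses):
    """Return {syllable: (token_count, unique_character_count)}."""
    by_syllable = defaultdict(list)
    for char in responses:
        by_syllable[SYLLABLE_OF[char]].append(char)
    return {
        syllable: (len(chars), len(set(chars)))
        for syllable, chars in by_syllable.items()
    }


# Hypothetical responses pooled across participants.
pooled = ["乏", "罰", "乏", "乏", "福", "父", "夫", "服"]
summary = tokens_and_types(pooled)
```

Here both syllables receive four tokens, but the fa responses cover two distinct characters and the fu responses cover four, which mirrors the direction of the pattern reported for fa and fu.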
Our results are therefore in line with what Snodgrass and Tsivkin (1995) suggested, namely, that speakers of different languages use different cognitive strategies, such as phonological recall and orthographic idiosyncrasies, based on their writing system when they retrieve words in LFT.
Cluster size was only marginally different, with the clusters in Chinese Mandarin being slightly larger than in English. English speakers produced an average of 4.47 clusters per task, more than the average of 3.33 clusters in Chinese Mandarin. Despite the similar cluster size, English subjects produced significantly more switches, and across these switches they produced more clusters and more non-clustered items than the Mandarin subjects. The percentage of non-clustered items was 30% and 21% in English and Chinese Mandarin, respectively.
As we expected, the subcategories that the English and the Mandarin subjects used within clusters were characteristically different, indicating that there is a language effect. While 72.7% of the English responses belonged to clusters based on an orthographic strategy (followed by a phonological strategy at 17% and a semantic/lexical strategy at 10.3%), 77.2% of the Chinese Mandarin responses belonged to clusters based on a phonological strategy (followed by an orthographic strategy at 17.4% and a semantic/lexical strategy at 5.4%). Within the English items retrieved by an orthographic strategy, 76% were words having the initial and second letters in common, and 24% were words having similar syllabic structures. Within the Mandarin items retrieved by a phonological strategy, 60% were characters having the same sound with no shared radical, and 38% were characters having the same sound with the same phonetic radical. This result shows that while the phonetic radical is an available strategy, it is less productive for LFT performance in Mandarin, because a focus on the syllable may be a faster route to lexical access than a focus on the orthographic component.
General conclusion
Our results suggest that there are similarities and differences between the performance of healthy young Mandarin and English speakers in VF task performance. For animals on the CFT, both groups demonstrated quantitative similarity, but characteristic differences as the cluster size was significantly larger in Mandarin than English speakers and English speakers switched marginally significantly more. In contrast, there were marked quantitative differences in LFT, as English speakers produced significantly more words; characteristically, the cluster size was only marginally significantly larger in Mandarin than English speakers, but the ratio of switches was significantly higher in English than Mandarin speakers.
Thus, the LFT results in particular distinctly separated the two groups. That is, with the tasks administered in written format, there were significant differences in the number of responses offered by the two subject groups across tasks, suggesting a strategic difference in how the two groups of speakers approached the tasks. That there are marked differences in how letters are used in the respective languages supports the notion that different skills were recruited to perform LFT compared to CFT, and such differences are borne out in the data. In addition, both CFT and LFT revealed practical linguistic and cultural differences in terms of the subcategories between our two groups of subjects, underlining the importance of culturally and linguistically unbiased VF norms for the Mandarin-speaking subject. On the one hand, performance on a CFT taps lexical/semantic information that reflects cultural and linguistic differences across languages. On the other hand, performance on an LFT makes executive function demands of varying complexity, depending on the language of the subject.
The primary objective of this study was to provide some baseline data regarding the performance of Mandarin speakers on two types of VF tasks that have enjoyed widespread use in the clinical setting. As part of any comprehensive neuropsychological assessment, normative data allow the clinician to evaluate the extent to which a Mandarin-speaking patient might be experiencing weaknesses in these aspects of cognitive functioning. Instead of viewing the paucity of responses on an LFT as pathological, the clinician must consider the linguistic frame in which letters function and evaluate how much that may be contributing to the patient’s performance. Likewise, when a patient names exemplars of a given category in a manner that seems unusual, the clinician must consider the lexical frame in which the responses are offered. To date, normative data on Mandarin speakers in these areas of lexical organisation and access remain limited.
There are several limitations to this study. First, to provide information on lexical retrieval behaviour in Mandarin, we focused on young, healthy speakers; we did not engage older speakers for this study. Since fluent speakers of Mandarin have limited application for letters, which are primarily used for accessing the Internet and, in some cases, for word processing, the inclusion of older speakers may not have captured the way letters are most commonly used by Mandarin speakers today. That is, the range of Internet use among older Mandarin speakers may simply be too wide to collect meaningful data on letter use. Second, we did not include a set of bilingual speakers whose data, given the stark linguistic differences between Mandarin and English, may offer some interesting information about how letters are managed by bilingual subjects who use letters very differently in their two languages. Finally, we did not include a set of patients with language impairments in this study, as this was simply beyond the scope of the project. Nonetheless, the VF performance of clinical populations can provide valuable information about the manifestation of language impairment in different languages.
Directions for future research are to investigate the performance of younger and older bilingual (Mandarin/English) subjects as well as Chinese Mandarin-speaking subjects who present with different clinical profiles. That is, collecting data on the performance of Chinese Mandarin speakers with brain damage as well as those who struggle with language learning could shed additional light on how culture and language can influence performance on this cognitive task. Given the wide acceptance of VF tasks, it is imperative to gather culturally and linguistically valid data to inform clinical decisions.
Supplemental Material
Supplemental material, QJE-STD_16-339.R2-Supplementary_Material for A cross-linguistic comparison of category and letter fluency: Mandarin and English by Nancy Eng, Jet MJ Vonk, Melissa Salzberger and Nakyung Yoo in Quarterly Journal of Experimental Psychology
Footnotes
Acknowledgements
The authors acknowledge the following students for their help in piloting earlier versions of this task and for collecting and coding our data: Jennifer Chen, Yuk Lan Peng, and Natalie Buzzeo.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
