Abstract
Demand for second-language (L2) Chinese education for kindergarteners has grown rapidly, but little is known about these kindergarteners’ L2 skills, with existing studies focusing on school-age populations and alphabetic languages. Accordingly, we developed a six-subtest Chinese character acquisition assessment to measure L2 kindergarteners’ abilities to make associations among the forms, sounds, and meanings of 40 Chinese characters, and administered it to 173 five- and six-year-old L2 kindergarteners in Hong Kong. We found a high model-to-data fit using the two-parameter logistic item response theory model (MSinfit = .87 to 1.07, ps > .41). Of the 40 items, 36 exhibited a range of difficulty and good discrimination. Internal consistency reliability and inter-rater reliability were also high. The children scored highest in the category of meaning-sound associations, followed by mapping from meaning to form. The results demonstrate that the instrument is adequately valid and reliable for assessing L2 kindergarteners’ character acquisition and imply that child L2 learners may develop abilities to map meanings and sounds prior to their development of form-related mapping. With minor modifications, the developed instrument can be used in various child Chinese L2 programs around the world.
As classrooms around the world become increasingly multicultural, a global sense of urgency accompanies the early identification of second-language (L2) learners’ language skills (Lesaux, Koda, Siegel, & Shanahan, 2006), so that teachers and schools can provide suitable interventions in support of their L2 acquisition (García, McKoon, & August, 2006). Identification of Chinese L2 learners’ language skills is particularly urgent in Hong Kong. The ethnic-minority population expanded by more than 70% between 2006 and 2016, from 342,198 to 584,383, making up 8% of the total population (Hong Kong Census and Statistics Department, 2017). In 2015, around two-thirds of the kindergartens in Hong Kong admitted L2 speakers of Chinese, notably including Pakistani, Nepali, Indian, and Filipino students (Hong Kong Education Bureau, 2017). Concurrently, the importance of Chinese language proficiency to individuals’ employment and educational prospects has increased greatly since Hong Kong’s handover to China in 1997. Under its current biliteracy and trilingualism language policy, for instance, K–12 students are expected to develop literacy in both Chinese and English, and oracy in Cantonese, English, and Mandarin. Chinese applies to the written language, whereas Cantonese and Mandarin are the spoken languages. The Hong Kong Education Bureau has repeatedly stressed the need to develop L2 Chinese learners’ language proficiency to aid their early integration into the education system and wider community, and provided funding and curriculum guidelines in pursuit of that goal (e.g., Hong Kong Education Bureau, 2014). This governmental attention, however, has tended to focus on primary and secondary schools, despite the kindergarten stage’s demonstrated importance to L2 learners’ transition to formal education (Hau, 2010).
Adding to these concerns is the notion that L2 learners have found it difficult to learn Chinese under Hong Kong’s current education system. Rectifying this problem has been complicated by a lack of language assessments. Moreover, when such assessment has occurred, it has revealed that even those L2 learners who were born and educated locally tended to have low Chinese language and literacy skills, even when it came to their ability to read and write Chinese characters, a skill that is taught from kindergarten onwards (Ku, Chan, & Sandhu, 2005; Tsung, Zhang, & Cruickshank, 2010). In Hong Kong, kindergarten education is offered as follows for children aged from three to six: nursery class (K1; three to four years old), lower kindergarten class (K2; four to five years old), and upper kindergarten class (K3; five to six years old). Because the kindergarten stage strongly predicts later language development (e.g., Cunningham & Stanovich, 1997), it is essential to gain a clear picture of five- and six-year-olds’ Chinese character acquisition skills in order to establish a base of knowledge pertinent to such learners’ subsequent L2 development.
Chinese character acquisition
Much like decoding in alphabetic language learning, Chinese character acquisition is a fundamental skill for Chinese language development among both first-language (L1) and L2 learners, and especially beginning readers (Shen, 2013; Tse, 2002). For this reason, primary education in Hong Kong has placed considerable emphasis on cultivating children’s character acquisition. By the end of Key Stage 1 (i.e., Primary 1 to 3, approximately six to nine years of age), students are expected to know 2169 characters (Hong Kong Education Bureau, 2008). However, character acquisition is a major challenge for L2 learners, whose first languages display significant differences from Chinese in both their spoken and written characteristics (Lin & Collins, 2012; Shen, 2013).
Chinese writing is considered logographic, insofar as its script represents meanings (Taylor & Olson, 1995). The majority of Chinese characters are morphosyllabic; that is, each character is directly mapped to a morpheme and a syllable (DeFrancis, 1984). Each syllable is composed of an onset, a final, and a tone, but the pronunciations of the syllable may vary across Chinese dialects. The character 永, for example, is pronounced yǒng in Mandarin, but wing5 in Cantonese.
Cantonese, the main Chinese dialect spoken in Hong Kong, has 20 onsets, 53 finals, and six tones (Tse, 2002). Unlike in non-tonal alphabetic languages, which use pitch for stress and intonation, lexical tones in spoken Cantonese differentiate the meanings of their associated syllables (Ki, Marton, & Pang, 2010). Orthographically, Chinese characters are made up of eight types of strokes, including dots, lines, and hooks (DeFrancis, 1984). They may consist of single components (simple characters) or multiple ones (compound characters). Around 90% of the compound characters are semantic-phonetic; that is, each character consists of a semantic component and a phonetic one, which respectively provide cues to the character’s meaning and sound to a certain extent (Perfetti, Cao, & Booth, 2013; Hoosain, 1991). Taking the character 媽 (mother) maa1 as an example, the semantic component 女 neoi5 depicts the meaning “female”, and the phonetic component 馬 maa5 carries the sound cue; in this case, 馬 maa5 and 媽 maa1 share the same onset and same final, but carry different tones. In such ways, Chinese characters are made up of 200 semantic components and 800 phonetic components (Hoosain, 1991).
The definitions of the terms “character” and “word” in the context of the Chinese language are highly debatable, however. From a sociological perspective, characters are often considered the key linguistic unit by native Chinese speakers, who therefore use the two terms interchangeably (Packard, 2002). This study adopts the widespread view that a word is an independent, “syntactically free” form (Packard, 2002, p. 12), whereas a character may be a word on its own (i.e., a free character), or two or more characters can combine to form words (Hoosain, 1991). That is, while most characters are morphemes, they can be free morphemes (i.e., capable of standing alone as words) or bound morphemes (i.e., requiring attachment to other characters to form words). As such, this study adopts Tse and Zhu’s (2001) theoretical definition of Chinese character acquisition as the ability to (1) accurately produce a character’s sound, (2) distinguish its written form, (3) understand its meaning, and (4) establish associations among its sound, written form, and meaning.
According to Ai (1949), if a person has acquired a Chinese character, when presented with one of its constituents, the person will be able to retrieve the other two. Tse and Zhu (2001) further elaborated this idea by proposing three conditions of Chinese character acquisition: (1) knowing a character’s sound and meaning upon seeing its written form; (2) visualizing its written form and understanding its meaning when hearing its sound; and (3) being able to represent it via both written forms and sounds when one has its meaning in mind. If any association fails, one has not fully acquired the character in question (Ai, 1949). The associations among three constituents of character acquisition are summarized in Figure 1.

Figure 1. Associations among the three constituents of Chinese characters.
As noted above, the form-sound-meaning relationship of Chinese and phonetic scripts is different: form maps directly to sound and meaning in Chinese, whereas in phonetic scripts, form maps to sound, which in turn maps to meaning. Variations in scripts have impacts on language learners, whose main task is to master the link between oracy and literacy; and such impacts on beginner readers are greater than on skilled readers, whose word processing has ipso facto become automatic (Leong, 1995).
Assessing Chinese conventional language abilities in young children
Given the unique features of Chinese characters, it would seem unwise simply to adopt or translate measures developed based on phonetic scripts to assess L2 Chinese learners’ character acquisition. Accordingly, in this section we review the existing assessments that measure young learners’ mapping of Chinese character or word constituents.
Assessment of meaning-sound mapping is typically used in receptive and expressive vocabulary tests. The former typically assess learners’ ability to comprehend the meaning of an orally produced word (Dunn & Dunn, 2007), and the latter, their ability to orally produce a word based on its meaning (Brownell, 2000). The Hong Kong Cantonese Receptive Vocabulary Test (Cheung, Lee, & Lee, 1997) was developed especially for Cantonese-speaking children aged two to six years. In it, the test-taker picks one from among a set of four pictures (a target, a phonological distractor, a semantic distractor, and an unrelated distractor) for each orally presented vocabulary item. Cheung et al. (1997) found that older children were more likely to choose phonological or semantic distractors than unrelated ones, indicating that they were acquiring partial knowledge of word sound or meaning over time.
It is common, however, to use translated tests: Tong, Ting, and McBride-Chang (2011), for example, used a translation of the Peabody Picture Vocabulary Test, Third Version (PPVT-III) to measure receptive vocabulary, and others have translated and adapted the vocabulary subtest of the Stanford-Binet Intelligence Scale to measure vocabulary depth (e.g., McBride-Chang & Ho, 2000; McBride-Chang, Shu, Zhou, Wat, & Wagner, 2003). As the translated tests were developed for L1 English learners, words included in them may not be equal, either in difficulty or pedagogical significance, to their Chinese equivalents.
The two most common Chinese language form-sound mapping measures for early learners are character-reading tasks (Rathvon, 2004) and word-reading tasks (e.g., Ho & Bryant, 1997; Ho, Chan, Tsang, & Lee, 2000; So & Siegel, 1997), in which children are asked to read out presented characters or words, respectively. Encoding assessments tap sound-to-form mapping. A typical example is a Chinese dictation task, which has been tested with primary schoolers but not kindergarteners (e.g., Ho et al., 2000; Lo, Yeung, Ho, Chan & Chung, 2016). Dictations are commonly used to measure Chinese character acquisition in school settings, with the child writing down characters read out by teachers.
All the aforementioned Chinese character assessment tests have measured children’s ability to map between a particular pair of constituents in isolation from the other possible pairings. However, some Chinese language tests have been designed to measure multiple abilities for character or word acquisition. For example, the Preschool and Primary Chinese Literacy Scale (PPCLS; Li, 2015) includes four subscales: (a) a character-picture matching task (form to meaning); (b) a listen-and-point task (sound to form); (c) a recognize-and-read task (form to sound); and (d) a read-and-say task (form to sound and meaning).
Based on the foregoing review, three issues motivated us to develop the new character acquisition assessment for K3 L2 learners. First, no assessment has hitherto been developed for, or normed with, child L2 Chinese learners, and an assessment that takes into account children’s developmental, cultural, and linguistic backgrounds is therefore urgently needed. Second, none of the existing assessments include the mapping of all three character constituents, in part because most follow English word-acquisition tests, and thus predominantly measure vocabulary and decoding abilities. As McBride (2016) has noted, character- and word-reading processes overlap strongly, but are not identical as they involve different skills. Thus, researchers do not yet understand fully the relationships among form, sound, and meaning in L2 Chinese learning. And third, few if any studies have focused on five- to six-year-olds, despite this age group’s providing a crucial baseline for children’s Chinese character acquisition before formal schooling. It is also important to ensure the developed assessment’s validity for measuring young L2 children’s character acquisition, with validity being, according to Messick (1989), “the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of interpretations and actions based on test scores” (p. 6). Accordingly, our validation inquiry is informed by both the theoretical design of the test and the psychometric properties of a sample of L2 kindergarteners.
The present study’s research questions are as follows:
To what extent is the Chinese character acquisition assessment (CCAA) valid in assessing the character acquisition of L2 learners in Hong Kong kindergartens?
What are the Chinese character acquisition abilities of L2 learners in Hong Kong kindergartens as reflected by the CCAA?
Methods
Chinese character acquisition assessment (CCAA) development
Task construction
For the CCAA, we developed six subtests based on Tse and Zhu’s (2001) theoretical definition of Chinese character acquisition, reflecting six sets of associations among the three character constituents: forms, sounds, and meanings (Table 1). In line with our participants’ linguistic context, we used traditional Chinese characters. Subtest A was Picture Naming (meaning to sound), and subtest B was Identifying Character Forms from Pictures (meaning to form). Children were presented with pictures with prompts; in the first subtest, they were asked to say characters aloud in Cantonese, which was the medium of instruction in their kindergartens, and in the second, they were asked to pick the corresponding character form from among four options. We piloted the picture stimuli with L2 kindergarteners to evaluate whether they could elicit the target meanings. Linguistically, some of the selected characters were bound morphemes, but our field observations revealed that it was a common practice for kindergarten teachers to ask children the meanings of these characters. For this reason, we adopted the meanings associated with the words in which they typically appeared in the children’s curricula. For example, 果 gwo2 typically means fruit when attached to other morphemes, as in 水果, 生果, and 果子. Likewise, we deemed responses consisting of multiple-character words containing the target character acceptable.
Table 1. Summary of task content for the CCAA’s six subtests.
Subtest C was Character Reading (form to sound), and subtest D was Matching Pictures to Character Forms (form to meaning). In the first, children were asked to read out written characters; and in the second, children chose the picture that best depicted each written character’s meaning. Subtest E was Matching Pictures to Sounds (sound to meaning), and subtest F, Identifying Character Forms from Sounds (sound to form). For each item in both subtests, an audio recording of the character’s sound was played once. Both male and female voices were used, and counterbalanced between children.
Four subtests (i.e., B, D, E, and F) comprised multiple-choice tasks, and in each case, four choices were provided for each item. These included the target character and three distractors related to it, one orthographically (sharing the same radicals, structure, stroke combinations, and/or stroke distribution); one phonologically (sharing the same onset, final, and/or tone); and one semantically (related in meaning). This distractor arrangement was based on prior theories (Ai, 1949; Tse & Zhu, 2001) that emphasized the simultaneity of mastering character form, sound, and meaning. Prior Chinese character assessments (e.g., Cheung et al., 1997) utilized only phonological and semantic distractors. We added orthographic ones to allow the exploration of children’s distractor preferences to illuminate individual learners’ acquisition processes. For instance, if children tended to choose phonological distractors, it might mean that phonological cues were activated before orthographic or semantic ones (see Perfetti & Tan, 1999).
To control for difficulty, all the distractors were selected from the same 200-character list as the 40 target characters (for more details, see “Item Selection,” below). In selecting the distractors, the expert panel also drew on teachers’ feedback about which characters L2 learners commonly confused with one another. For example, 手 (which carries the meaning of hand) is a simple character pronounced as sau2. Its orthographic distractor was 毛 (feather) mou4, also a simple character with a similar stroke distribution; its phonological distractor, 狗 (dog) gau2, has the same tone and final as the target character, but a different onset; and 腳 (leg) goek3 was selected as the semantic distractor, with both hand and leg being body parts.
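The four-option structure described above can be sketched in code, which may help when tallying children’s distractor preferences. This is a minimal illustration only: the class and function names are hypothetical, and the response list is invented rather than taken from the study data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MultipleChoiceItem:
    """One multiple-choice item: a target plus three typed distractors."""
    target: str
    orthographic: str
    phonological: str
    semantic: str

    def classify(self, response: str) -> str:
        """Return the option type that a chosen character corresponds to."""
        options = {
            self.target: "target",
            self.orthographic: "orthographic",
            self.phonological: "phonological",
            self.semantic: "semantic",
        }
        return options.get(response, "invalid")


# The example item from the text: 手 with its three distractors.
HAND = MultipleChoiceItem(target="手", orthographic="毛",
                          phonological="狗", semantic="腳")


def tally_choices(item: MultipleChoiceItem, responses: list[str]) -> Counter:
    """Count how often each option type was chosen for one item."""
    return Counter(item.classify(r) for r in responses)


# Invented responses from five hypothetical children.
tally = tally_choices(HAND, ["手", "毛", "狗", "狗", "腳"])
```

A tally of this kind, aggregated over items, is one way to examine whether a learner tends toward orthographic, phonological, or semantic confusions.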
Item selection
The same 40 characters were used in each subtest, yielding a total of 240 items, and were selected from among the top 200 characters by frequency in the K1, K2, and K3 teaching plans of 21 Hong Kong kindergartens, all of whose student bodies included L2 learners. This selection was made by a panel consisting of four university researchers with expertise in L1 and L2 Chinese teaching and learning; of these four, three were drawn from our research team and had an average of 10 years of language-teaching experience, whereas the other had more than 20 years’ experience as a language teacher and school principal. We directed them to make their selections according to the following criteria: (1) variation in frequency, orthography, phonology, and morphology; (2) alignment with Hong Kong’s primary Chinese language education curriculum for first through third grades (Hong Kong Education Bureau, 2008); and (3) cultural appropriateness. The selected characters included 16 simple and 24 compound ones, with the latter category including 10 upper-lower, nine left-right, two upper-mid-lower, two left-mid-right, and one surround. Phonologically, the 40 characters covered all six tones: nine high even (tone 1), eight high rising (tone 2), four middle even (tone 3), three low falling (tone 4), seven low rising (tone 5), and nine low even (tone 6) tones. Semantically, the characters included 30 nouns, four verbs and six attributes, all of which would be encountered by L2 kindergarteners (e.g., nature, people and animals, body parts, transportation, numbers, and colors).
Format and instructions
The test administrator presented the instructions and stimuli as a series of mini-games within an electronic slideshow displayed on a 13-inch monitor. Instructions were bilingual in Cantonese and English, because our prior field experience had revealed that L2 children at our research sites were more proficient in English than in Cantonese, and we wanted to ensure that the instructions were as clearly understood as possible. There was one trial item for each subtest, and if a particular child did not know how to respond to it, the test administrator would provide him or her with guidance. The children were encouraged to guess the answers to questions if they were uncertain about them.
Child participants
We recruited 173 K3 children (95 girls, Mage = 66.60 months, SDage = 7.14), all registered as non-native speakers of Chinese but having no special educational needs, from 12 Hong Kong kindergartens geographically distributed across the three regions of Hong Kong (i.e., Hong Kong Island, Kowloon, and New Territories). The proportion of L2 learners to the total enrollment in each kindergarten ranged from 4.2% to 100% (M = 37.47%, SD = 35.52). Children were mainly Nepali (37%), Pakistani (28.9%), Indian (14.5%), or Filipino (13.3%), with the remainder (6.3%) comprising Indonesian, Thai, American, African, Korean, and mixed-ethnicity children, as well as one whose ethnicity was not reported. The ethnic distribution approximated that of Hong Kong’s L2 Chinese kindergarteners (Hong Kong Legislative Council, 2017).
Procedures
Before data collection, test administrators attended training that covered task administration, together with the learning needs of L2 learners, and practiced rating against videotaped performances. These administrators, all of whom were native Cantonese speakers, included graduate and undergraduate students in education, psychology, and arts subjects.
Written informed consent was obtained from the children’s parents and principals. The test was individually administered in two 15-minute sessions on two consecutive days and the subtests were administered in a randomized order. Children and test administrators both listened to the standardized instructions via headphones in order to minimize distractions.
Scoring
We scored the multiple-choice subtests B, D, E, and F dichotomously, with a rating of 1 being awarded for correct responses, and 0 for choosing any of the three distractors. We scored the open-ended responses in subtests A and C with a partial-credit rubric that awarded a rating of 1 for a correctly pronounced character (onset, final, and tone correct), 0.5 for a mispronounced character (only one or two of the onset, final, or tone correct), and 0 for wholly incorrect responses (onset, final, and tone incorrect). For example, for the character 書 (book) syu1, the following sounds would be considered a mispronunciation and given a rating of 0.5: syu2 (onset and final correct), zyu1 (final and tone correct), and syut3 (onset correct). This partial scoring system was designed on the basis that early-stage L2 learners of Chinese have difficulties in mastering Cantonese pronunciations.
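The partial-credit rubric can be expressed compactly in code. The sketch below assumes that each response has already been transcribed into an onset, a final, and a tone; the `Syllable` type and the function name are hypothetical, and in the study the ratings were made by trained human raters.

```python
from typing import NamedTuple


class Syllable(NamedTuple):
    """A Cantonese syllable split into its three scored parts."""
    onset: str
    final: str
    tone: int


def score_pronunciation(target: Syllable, response: Syllable) -> float:
    """Apply the rubric: 1 if all three parts match, 0.5 if one or two match, 0 if none."""
    matches = sum(t == r for t, r in zip(target, response))
    if matches == 3:
        return 1.0
    if matches == 0:
        return 0.0
    return 0.5


# The worked example from the text: 書 (book) syu1.
SYU1 = Syllable("s", "yu", 1)
```

For instance, `score_pronunciation(SYU1, Syllable("z", "yu", 1))` corresponds to the response zyu1 and yields 0.5.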
As noted above, the children were encouraged to guess rather than omit items when uncertain. Because partial knowledge may enable a child to eliminate distractors, resulting in higher odds of obtaining correct answers than would occur for random guessing, corrections using formula scoring (Lord, 1975) were made to allow meaningful comparisons among subtests, as recommended by Kaplan and Saccuzzo (2013):
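A common form of the correction for guessing is given below. It is assumed, rather than confirmed by the text, that this standard version was the one applied; the symbols are introduced here for illustration, with R the number of correct responses, W the number of incorrect responses, and k the number of options per item.

```latex
\[
  S_{\text{corrected}} = R - \frac{W}{k - 1},
  \qquad \text{so that with } k = 4:\quad
  S_{\text{corrected}} = R - \frac{W}{3}.
\]
```

Under this form, a child who answers by pure random guessing has an expected corrected score of zero, and corrected scores can fall below zero.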
After such adjustment, we computed the children’s composite scores by summing the subtest scores.
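The adjustment and the composite can be sketched as follows. The sketch assumes the standard correction for guessing (right minus wrong divided by the number of options minus one) applied to the four multiple-choice subtests, with the open-ended subtests A and C entering unadjusted; the function names and the example scores are hypothetical.

```python
def formula_score(n_right: int, n_wrong: int, n_options: int = 4) -> float:
    """Correct a multiple-choice raw score for guessing (assumed standard form)."""
    return n_right - n_wrong / (n_options - 1)


def composite_score(open_ended: dict[str, float],
                    multiple_choice: dict[str, tuple[int, int]]) -> float:
    """Sum unadjusted open-ended scores and guessing-corrected multiple-choice scores.

    multiple_choice maps a subtest label to (number right, number wrong).
    """
    corrected = sum(formula_score(right, wrong)
                    for right, wrong in multiple_choice.values())
    return sum(open_ended.values()) + corrected


# Invented scores for one hypothetical child on the 36-item subtests.
example = composite_score(
    open_ended={"A": 15.5, "C": 7.0},
    multiple_choice={"B": (18, 18), "D": (9, 27), "E": (36, 0), "F": (0, 36)},
)
```

In this invented case, subtest D (9 right, 27 wrong) corrects to zero, the chance-level expectation, while subtest F (all wrong) corrects to a negative value.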
Results
Item analysis
Item difficulty and discrimination are important test characteristics. One-parameter logistic item response theory (IRT) models evaluate item difficulty only, whereas two-parameter IRT models evaluate both characteristics. According to the Akaike and Bayesian information-criterion indices, the two-parameter model fit the data better than the one-parameter model across subtests (χ²s > 82.24, ps < .001), and hence we adopted the two-parameter model. The items in each subtest were calibrated separately, and item analyses were conducted to determine model-data fit, discrimination, and difficulty. Item-fit statistics and item-characteristic curves were used to evaluate the model-data fit. The items’ weighted mean square fit statistics ranged from .87 to 1.07 (ps > .35), indicating a high model–data fit.
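For reference, the two-parameter logistic model expresses the probability of a correct response to a dichotomously scored item as a function of a child’s ability (θ), the item’s discrimination (a), and its difficulty (b). The sketch below shows only this item-characteristic curve; the function name is hypothetical, and it is not the calibration software used in the study.

```python
import math


def two_pl_probability(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the two-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))
```

When ability equals item difficulty the probability is .50 regardless of the slope, and fixing a to a common value across items reduces the expression to the one-parameter form.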
Item discrimination parameter estimates (a or slope) reveal how well an item discriminates among individuals, with steeper slopes indicating higher discrimination, and values above .50 being ideal (DeMars, 2010). This threshold was exceeded by nearly all items in subtests A, C, and E, and by about three-quarters of the items in other subtests (see Table 2).
Table 2. Ranges, means, and distributions of difficulty parameters and discrimination parameter estimates (N = 173).
Item difficulty parameter indices (b) for most items were within the −3.0 to +3.0 logits range for appropriate difficulty levels (Baker, 2001), the exceptions being some items in subtests A, C, and E that were above +3 logits. Considered together, the six subtests showed a spread of item difficulties (see Table 2).
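The two screening criteria described above (discrimination above .50; difficulty within −3.0 to +3.0 logits) can be applied mechanically before expert review. The following is a minimal sketch with hypothetical names and invented parameter values; in the study, the final decisions rested on the panel’s judgment rather than on thresholds alone.

```python
from dataclasses import dataclass


@dataclass
class ItemEstimate:
    """Parameter estimates for one item in one subtest."""
    item: int
    a: float  # discrimination (slope)
    b: float  # difficulty, in logits


def flag_item(est: ItemEstimate,
              min_a: float = 0.50,
              b_range: tuple[float, float] = (-3.0, 3.0)) -> list[str]:
    """Return the screening criteria that an item fails (empty list if none)."""
    flags = []
    if est.a <= min_a:
        flags.append("low discrimination")
    if not b_range[0] <= est.b <= b_range[1]:
        flags.append("difficulty out of range")
    return flags


# Invented estimates for illustration only.
acceptable = flag_item(ItemEstimate(item=1, a=1.10, b=0.40))
problematic = flag_item(ItemEstimate(item=2, a=0.20, b=3.50))
```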
In light of the above statistical results, all items were re-reviewed for their appropriateness by the same expert panel that had selected them. As this study’s framework stresses the simultaneity of six mapping abilities among character constituents, we considered the item statistics of each character as a function of all six subtests simultaneously. Our review also took into consideration the relative significance of each character to Chinese learning, and we evaluated each character’s properties relative to the whole scale. For that reason, we retained some items despite their item statistics not being ideal. For instance, we retained item 33 (冷 [cold]), despite its having been marked as high difficulty in subtest C, because it was one of just a few characters that were represented differently in their written and spoken forms, as well as one of the few adjectives. When deciding on item deletion, we also took into account the need to maintain a range of item difficulties.
In all, the panel recommended removing four items, 11 (海 [sea]), 18 (路 [road]), 36 (穿 [wear]), and 40 (葉 [leaf]), leaving 36 items for use in each subtest, and 216 in the CCAA as a whole. Items 11 and 18 were both unacceptably difficult in subtests A and C (item 11: bs > 3.97; item 18: bs > 3.59), and item 18 had low discrimination in subtests B, D, and F (as < .18). Items 36 and 40 had difficulty parameters well above the accepted upper bound of +3 logits in subtest C (item 36: b = 21.21; item 40: b = 6.15), and their discrimination was low (subtests B, D, and F for item 36: as < .21; subtest F for item 40: a = .27).
Reliability analyses
Empirical reliability estimates for the revised (i.e., 36-item) version of each subtest ranged from .85 to .94, and Cronbach’s alpha estimates of internal consistency ranged from .82 to .96. Split-half reliabilities, as represented by Spearman–Brown coefficients (based on odd-even splits), ranged from .81 to .97. For inter-rater reliability, Pearson product-moment correlations between the scores of two independent raters ranged from .99 to 1.00 (ps < .05) for a randomly selected 15% of the administered tests.
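Two of the indices reported above, Cronbach’s alpha and the Spearman–Brown split-half coefficient based on an odd-even split, follow standard textbook formulas and can be sketched as below. The function names are hypothetical, the score matrix is invented (rows are children, columns are items), and the sketch is not the software used for the reported analyses.

```python
from statistics import mean, variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha from a children-by-items score matrix."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_brown(r_half: float) -> float:
    """Step a half-test correlation up to full-test length."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(scores: list[list[float]]) -> float:
    """Odd-even split-half reliability with the Spearman-Brown correction."""
    odd = [sum(row[0::2]) for row in scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson(odd, even))


# Invented, perfectly consistent data: four children, four items.
demo = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
```

With the perfectly consistent demonstration data, both coefficients equal 1; real data yield lower values.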
Comparisons among subtests
Descriptive statistics are in Table 3. We used a one-way ANOVA to explore whether the children’s performance was affected by subtest administration order, and found no main effect of such order on scores, ps > .441. Similarly, there was no significant effect of the sequence of female and male voices in subtests E and F, ps > .070.
Table 3. Means and standard deviations of CCAA scores of K3 L2 children in Hong Kong kindergartens (N = 173).
Note: Reliabilities are represented by Cronbach’s alpha coefficients; maximum score for subtests A to F is 36 and maximum composite score is 216; Rel = Reliabilities; M = Mean; SD = Standard deviation; Min = Minimum; Max = Maximum.
Negative scores for subtests B, D, E, and F were generated in some cases because those subtests were adjusted for random guessing.
We explored the effects of subtests with repeated-measures ANOVA. Mauchly’s Test of Sphericity indicated that the assumption of sphericity was violated (χ²(14) = 227.99, p < .001), and the degrees of freedom were therefore corrected using the Greenhouse–Geisser method. There was a main effect of subtest, F(5.89, 453.46) = 98.55, p < .001, ηp² = .39, ε = .58. According to Bonferroni post-hoc analysis, subtests C, D, and F did not differ significantly from one another, whereas all 12 other pairwise comparisons were significant, ps < .04. Children scored highest on subtest A (M = 15.51, SD = 10.41), followed by subtests E (M = 14.05, SD = 9.67) and B (M = 9.14, SD = 8.95). The scores for the other subtests were as follows: C (M = 7.16, SD = 7.29), D (M = 7.48, SD = 8.66), and F (M = 7.17, SD = 8.48).
Discussion
The theoretically grounded task design provides evidence that the proposed CCAA’s content is valid for measuring the character acquisition of preschool-age learners of L2 Chinese in Hong Kong. Because acquiring characters requires the ability to make associations among their forms, sounds, and meanings simultaneously (Ai, 1949; Tse & Zhu, 2001), we developed six subtests to tap the children’s abilities to map among these three constituents. The instrument’s task formats, that is, the use of pictures and multiple-choice options, followed the developmentally and linguistically appropriate pattern set by well-established assessments of young learners’ word and character abilities. However, we adopted partial-credit scoring for tasks involving oral responses, which is uncommon for vocabulary/word tests that involve oral production. In our case, this was of key importance for capturing K3 L2 learners’ acquisition of onsets, finals, and tones, while acknowledging their partial knowledge of sounds. As mentioned in the introduction, pitch is used only for stress and intonation in non-tonal languages, and hence tones may be challenging for learners with non-tonal first languages to acquire. Another unique aspect of the CCAA’s development was that an expert panel selected the items from a list of Chinese characters frequently taught to L2 Chinese kindergarteners in Hong Kong. The panel discussed the items’ orthographic, phonological, and semantic properties, as well as their significance to beginning learners, to ensure that the instrument was well aligned with teaching and learning.
In terms of psychometrics, weighted item-fit statistics (MSinfit = .87 to 1.07, ps > .35) indicated that the data fit the model well. Most items for subtests A, B, D, E, and F were within the acceptable range of difficulty for the L2 learners’ abilities, and showed a range of difficulty levels, mostly in the range of average difficulty (−1 to +2 logits), although subtest C was more difficult, with more than half of the items beyond +3 logits. Subtest E had the best difficulty distribution of any of the subtests, with multiple relatively easy and relatively difficult items. The observed variation in item difficulty could have been related to learners’ exposure to certain characters at school and in their daily lives, and/or to certain characters being inherently more complex than others. Despite our exclusion of characters that were not frequently taught in Hong Kong kindergartens, however, few items were found to be easy, probably because of the limited number of Chinese characters that the L2 learners had acquired. With item difficulties generated, a discontinue rule could be applied in future CCAA administrations: with the items re-ordered by ascending difficulty, a subtest would be terminated once a child has scored zero on a set number of consecutive items. Such a rule would reduce children’s cognitive burden and make the assessment more psychologically appropriate for young children with low or pre-emergent skill levels.
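A discontinue rule of this kind can be sketched as follows. The number of consecutive zero scores that triggers termination is not specified here, so `max_consecutive_zeros` is a hypothetical parameter, as are the function name and the example data.

```python
from typing import Callable


def administer_with_discontinue(
    item_difficulties: dict[str, float],
    get_score: Callable[[str], float],
    max_consecutive_zeros: int = 5,  # hypothetical threshold, not from the study
) -> dict[str, float]:
    """Present items in ascending difficulty and stop after a run of zero scores."""
    order = sorted(item_difficulties, key=item_difficulties.get)
    scores: dict[str, float] = {}
    zeros_in_a_row = 0
    for item in order:
        score = get_score(item)
        scores[item] = score
        zeros_in_a_row = zeros_in_a_row + 1 if score == 0 else 0
        if zeros_in_a_row >= max_consecutive_zeros:
            break  # remaining, harder items are not administered
    return scores


# Invented difficulties (logits) and responses for one hypothetical child.
difficulties = {"e": 3.0, "a": -1.0, "c": 1.0, "b": 0.0, "d": 2.0}
responses = {"a": 1, "b": 0, "c": 0, "d": 1, "e": 1}
administered = administer_with_discontinue(
    difficulties, responses.get, max_consecutive_zeros=2)
```

In this invented case, the subtest stops after the third item, so the two hardest items are never presented.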
Most items had good discrimination abilities, although response formats (i.e., the open-ended format of subtests A and C vs. the multiple-choice format of the other subtests) were associated with variation in both item difficulty and discrimination. Specifically, the multiple-choice format was both less difficult and less discriminating, insofar as a child could arrive at the correct answer via guessing, unlike on subtests A and C.
After reviewing item parameters, the panel suggested removing the same four items from all subtests because they had very low discrimination and were too difficult. However, some items showing the same two problems were retained because they were important to understanding L2 learners’ character acquisition. The main consideration was whether the complete set of items adequately represented the range of orthographic, phonological, and semantic features that are fundamental to Chinese learning. As the teaching and learning of young L2 Chinese learners in Hong Kong is a developing field, retaining these items will enable the CCAA to measure learners’ growth across cohorts. Qualitative research is needed to understand whether presenting young children with items that may be too hard for them has any psychological effect (for an example of such qualitative research, see Winke, Lee, Ahn, Choi, Cui, & Yoon, 2018). Findings from interviews with the children or their teachers could inform the development of such tests, and allow developers to know whether computer adaptive tests should be used to prevent low-skilled children from receiving higher-difficulty-level items.
Reliability was evaluated in terms of internal consistency and inter-rater reliability, and was found to be high: empirical reliability, Cronbach’s alpha, and split-half reliabilities ranged from .81 to .97. These values were comparable with those of other Chinese language tests for young Chinese children using similar formats, including the PPCLS (Li, 2015), whose reliabilities ranged from .72 to .84 for a Hong Kong sample of the same age group (Li, 1999). The CCAA also showed high inter-rater reliability for all subtests (rs > .99), reflecting the stability of scores across raters. Moreover, the children’s performance was not affected by the subtests’ sequence or by whether the male or female voice version was used, and thus the subtests can be administered in any sequence and in either voice version in the future.
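For readers less familiar with these indices, the following Python sketch shows how Cronbach’s alpha and a split-half reliability with the Spearman-Brown correction are conventionally computed from a children-by-items score matrix. The odd-even split and the toy data are assumptions made for illustration; the way the halves were formed in the study is not specified here, and the numbers below are not CCAA data.

```python
from statistics import mean, pvariance
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # one row per child, one column per item


def cronbach_alpha(scores: Matrix) -> float:
    """alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_reliability(scores: Matrix) -> float:
    """Odd-even split-half reliability, stepped up with Spearman-Brown."""
    half_a = [sum(row[0::2]) for row in scores]  # items in positions 1, 3, 5, ...
    half_b = [sum(row[1::2]) for row in scores]  # items in positions 2, 4, 6, ...
    r = pearson_r(half_a, half_b)
    return 2.0 * r / (1.0 + r)


# Toy data: five children, four dichotomously scored items (made up).
toy_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(toy_scores)
split_half = split_half_reliability(toy_scores)
print(f"alpha = {alpha:.2f}, split-half = {split_half:.2f}")
```

Because the same variance estimator is used in the numerator and denominator of the alpha formula, using population rather than sample variances does not change the result.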
More importantly, the average scores attained on the six subtests provide new evidence regarding the Chinese characters that L2 learners acquire by late in their time at kindergarten. While subtests A and E (meaning-sound associations) received the highest scores overall, subtest B (meaning to form) yielded significantly higher scores than the other three subtests that involved character forms. This suggests that L2 learners may acquire the ability to map character meanings to forms before they develop the ability to map between forms and sounds, at least in the case of certain characters. Given the morphosyllabic nature of Chinese orthography, it is reasonable to expect that such form-meaning relationships could emerge without a solid form–sound association. This echoes prior suggestions regarding the importance of orthographic-semantic connections when learning to read Chinese scripts (e.g., Ju & Jackson, 1995; Perfetti et al., 2013; Shu & Anderson, 1997); however, this observation needs to be verified.
We found that meaning-sound associations were relatively easy for young L2 learners. However, Shen (2010) reported that of shape, sound, and meaning, novice adult L2 Chinese learners regarded sound as the most difficult to learn at the radical level, particularly when it came to the following: tone discrimination and memorization; the connection of sound with the other two constituents; and accurately producing sounds. Radicals are orthographic units of characters, and many Chinese characters are made up of phonetic and semantic radicals, which provide clues to the sound and meaning of characters respectively. Even though Shen’s findings were based on adult learners’ perceptions and not their actual abilities, this discrepancy hints that L2 learning for beginners may differ sharply between young children and adults. However, it is also possible that this large discrepancy resulted from Shen’s study being conducted in a foreign-language environment, whereas in the current study, children had Cantonese sound exposure outside class, which might have helped them build meaning-sound associations. More nuanced, future explorations within this topic are warranted to better understand the size and shape of such discrepancies and how both age and L2-contact amount may affect them.
Although the present research focused on L2 learners’ average performance, large differences among individuals, as shown by both the ranges and standard deviations (from 7.29 to 10.41), should also be acknowledged. There were several high scorers among the children, reflected by high outliers and extremes, whereas a much larger group had scores near zero, reflecting that they had not acquired any Chinese characters by K3. This high variability is consistent with the findings of a survey of Hong Kong kindergartens conducted by Hong Kong Unison (2012), in which most reported that individual differences in L2 learners’ Chinese language proficiency posed challenges to Chinese teaching. Such marked heterogeneity should be borne in mind when interpreting the present findings, and also raises the question of why some L2 children acquire Chinese characters accurately, whereas others are left with almost no knowledge of them despite receiving the same kindergarten education. A more contextualized look at how Chinese characters are taught in kindergarten may be warranted, especially if it appears that tests such as the CCAA are measuring character learning that may occur outside the classroom context. Such evidence might strongly influence recommendations on how the scores from the CCAA can be used appropriately (e.g., whether the assessment is appropriate to measure a child’s individual character acquisition, or to measure a kindergarten program’s effectiveness in teaching characters).
Several limitations should be noted. First, owing to the highly varied learner characteristics noted above, and because enrollment data on L2 learners that would have enabled random sampling or stratified random sampling was not publicly available, it was difficult to ensure the children’s representativeness, despite efforts to recruit a sample that accurately reflected Hong Kong L2 learners’ gender, ethnicity, school composition, and school location. Second, L2 learners were more difficult to reach than L1 learners because they had higher rates of absenteeism, and because their parents tended to communicate less with kindergartens, which could have resulted in sampling bias. Third, regarding the use of the CCAA as a measure of Chinese character acquisition, it should be remembered that the scores relied on children’s responses, which could have been affected by contextual and personal factors, including their attention level and mood when assessed. Finally, an anonymous reviewer of this manuscript expressed concerns about the validity of testing isolated bound morphemes, as the test tasks do not mirror language in the real world, although this task does represent the way in which characters are commonly taught to L2 learners in classroom settings. We agree with the reviewer, and also with the reviewer’s suggestion that this is a philosophical issue that can and should be debated in the future by Chinese language teachers and researchers.
Conclusion
To conclude, the instrument we developed and investigated provides a means of comprehensively exploring all six associations among the three constituents of Chinese characters, which have seldom, if ever, been measured simultaneously at the character and word level. The Chinese script’s unique orthographic, phonological, and semantic features mandate that early skills in the Chinese language be measured and described in a distinctive manner: a manner, moreover, that differs sharply from the approaches that prevail in the literature.
Owing to the pioneering and exploratory nature of the study, future research should seek to extend applied linguists’ understanding of the dynamics of Chinese L2 learning, especially among very young children. Nevertheless, the CCAA not only facilitates exploration of such learning among kindergarteners, but also provides a wholly novel, but reliable, means of understanding Chinese character acquisition systematically. Our research provided evidence that the content within the test is predominately valid for measuring young children’s Chinese character acquisition. The CCAA could also be adapted to the assessment of beginning L2 children learning other Chinese dialects, including Mandarin. This could be achieved by reviewing the character list in order to evaluate whether the characters are pedagogically significant in that context, and modifying the multiple-choice tasks to ensure that target characters and phonological distractors remain phonologically similar.
Another promising direction for future research involves the errors made and distractors chosen by L2 learners across the six subtests. The unique properties of Chinese orthography allowed for a systematic arrangement of multiple-choice distractors, in terms of their orthographic, phonological, and semantic resemblance to the target characters, which would hardly be possible in assessments involving alphabetic languages. The reasons underlying the observed variations in the associations among character form, sound, and meaning also invite further exploration.
Lastly, future research should establish the predictive and concurrent validity of CCAA scores using other data (especially qualitative data; see Winke et al., 2018), which would also enhance the interpretability of such scores. Owing to time and resource constraints, we were unable to fully follow Kane’s (2013) validity framework, and focused only on our assessment’s content validity and psychometrics. The validity of the test’s score interpretations could be further examined by comparing its scores against those obtained from other early Chinese language measures and assessment methods, including observations and reports from teachers and parents (García et al., 2006).
Our findings about L2 Chinese learners before formal schooling provide a baseline for future research on developmental trends, by allowing comparisons both with other age groups and with native Chinese speakers of the same age. As transnational mobility increases, learning languages other than one’s own (notably, Chinese) is becoming increasingly important. This work is therefore an important first step towards understanding the early acquisition of L2 Chinese before formal schooling, which can inform L2 acquisition theory, pedagogical design, and policy decisions to support the teaching and learning of L2 Chinese, both in Chinese-speaking societies and beyond.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the General Research Fund from the Research Grants Council of Hong Kong under File no.1760615, Hong Kong.
