Abstract
A teacher’s assessment literacy refers to her or his demonstrated understanding of the principles behind selecting and designing tasks, judging student work, and interpreting and using assessment data to support student learning. This study examines the development of the task design aspect of assessment literacy in 12 Chinese language teachers as they participated in a two-year authentic assessment professional development program. By analysing the quality of assessment tasks designed by the teachers over time, we found that, although teachers quickly grasped many aspects of task design, they found it difficult to incorporate certain knowledge manipulation criteria into their assessments. The study provides insights into the contextual and discipline-embedded challenges that face language teachers with regard to assessment.
I Introduction
The notion of ‘assessment literacy’ has been a recurring theme in the assessment literature since the term was coined by Stiggins (1991) for use in general education. Typically, the term literacy refers to the ability to read and write, but it is also used to indicate an individual’s knowledge or competency in a specified subject area. Over the past two decades, the terms assessment literacy, assessment knowledge, and assessment competence have been used interchangeably in general education to describe ‘teachers’ understanding of assessment processes as well as their capacities to design assessment tasks, develop adequate criteria for making valid judgments on the quality of students’ performances, and understand and act upon the information that is collected through assessment’ (Hay & Penney, 2013, pp. 69–70).
In the field of language testing and assessment, there is a growing interest in developing teachers’ assessment literacy due to a substantive increase in the testing and assessment responsibilities placed upon language teachers. Although the terms testing and assessment are often used interchangeably in language and linguistics research, in this article we draw a distinction between the purposes and functions of large-scale testing and ongoing classroom assessment. We acknowledge that it is important for language teachers to know how to use and interpret large-scale tests, but we argue that language teachers also need to be competent in designing and using high-quality classroom assessments to elicit students’ demonstrations of higher-order language competence and to support student learning. However, according to Taylor (2009), many language teachers and language teacher educators receive inadequate training in the theory and practice of assessment. As noted by Webb (2009), ‘facilitating change in teachers’ assessment practice is not so much a resource problem’ (p. 3). Rather, it is about ‘helping teachers develop a “designers’ eye” for selecting, adapting, and designing [high-quality assessment] tasks’ (p. 3). Without this, teachers may blindly adopt commercialized assessment packages or online assessment resources that do not adequately match the needs and intentions of their own classroom. Hence, our study was focused on an important aspect of assessment literacy, that is, how the teachers developed their understandings and applications of the principles of assessment task design.
In a context of global educational policies focused on the identification of core outcomes and standards (Luke, 2011), the success of curriculum reforms and assessment-related policy initiatives in language teaching hinges upon language teachers’ assessment literacy. Language teachers’ sound knowledge and understanding of the principles and practices of assessment will enable them to select, adapt, and design assessment tasks that are well aligned with the intended learning outcomes in a new curriculum, and to adopt pedagogical approaches that support authentic and substantive student learning of those outcomes. Despite the availability of a considerable literature on the development of teachers’ assessment literacy in English-language classrooms (e.g. Fulcher, 2012; Taylor, 2009) and other core subjects including preservice teacher education (e.g. Koh, 2011a; Mertler, 2009; Newmann & Associates, 1996; Volante & Fazio, 2007), little attention has been given to the assessment needs of Chinese language teachers, who play a pivotal role in engaging Chinese language learners in the learning of Chinese (Mandarin) in multilingual and multicultural contexts.
Given the rapid rise of China as a major economic power in recent decades, there has been an unprecedented growth in the learning of Chinese in North America, the United Kingdom, and other parts of the world. This is evidenced by the establishment of more than 300 Confucius Institutes worldwide (Qi & Zhang, 2014) as part of an official Chinese governmental strategy to promote the teaching, learning, and assessment of Chinese as a foreign language – a not-uncontroversial move in the context of China’s geopolitical expansion. In addition, many Chinese language learners study Mandarin Chinese as a ‘mother tongue’, or heritage language – ranging from localized diasporic Chinese in places such as Singapore to large cohorts of first- and second-generation Chinese immigrant children in countries such as Canada, Australia, New Zealand, the United Kingdom, and the USA. According to Duff (2008), ‘Mandarin Chinese is being acquired and used by many tens of thousands of Chinese children in diaspora contexts worldwide as well as by foreign language learners’ (p. 6). Standard Chinese (Mandarin) is the third most widely spoken language in homes in Canada and the USA (Statistics Canada, 2011; US Census Bureau, 2011). This article reports on the development of Chinese language teachers’ assessment literacy in the multicultural and multilingual society of Singapore.
To date, only a few empirical studies on teacher professional development in assessment literacy have centered on the quality of teachers’ task design, and these have focused on core subject areas such as English, science, and mathematics (Koh, 2011a; Lingard et al., 2001; Newmann & Associates, 1996). At the International Research Symposium on Chinese Language Education and Teacher Development, Duff (2008) made the case for the development of Chinese language teachers’ capacity in using language assessment tasks that are better aligned with the intended curriculum that emphasizes contextualized linguistic knowledge and communication skills. Her comments highlighted the potential role of authentic assessments in Chinese language teaching and learning.
II The study
This study reports on the development of task-design assessment literacy in a group of Chinese language teachers during a program of professional development. Research on teacher professional development has shown that ongoing, sustained professional development is more effective than ad-hoc, one-shot workshops designed to improve teachers’ classroom practices (Clark, 2001). This is because ongoing, sustained professional development engages teachers in active and collective learning over a period of time in what should be a non-threatening and collegial learning environment (Wiliam & Thompson, 2008). Often, professional learning communities are incorporated into teachers’ sustained professional development. In this study, a collaborative learning approach was adopted to enable the participating teachers and the researchers to work side by side. As Tierney (2006) suggests, ‘Rather than one-shot training sessions delivered by assessment experts, more collaborative professional development projects that significantly involve educators at the local level will be needed’ (p. 259). This study utilized a professional learning community context to probe the development of Chinese language teachers’ assessment literacy through the following research questions:
How did Chinese language teachers develop in their understanding of quality authentic assessment task design?
How did the quality of students’ work in Chinese demonstrate the changes in the quality of teachers’ assessment tasks?
Although in-service teachers are expected to have acquired some subject-specific knowledge and skills in assessment during their pre-service education, their level of assessment literacy may be inadequate. They may be challenged in their ability to design and implement authentic assessments that are instructionally sensitive, and that can provide an accurate picture of what their students know and can do in the real world (Popham, 2009). Often teachers are given pre-designed and/or prescribed assessment resources that are then utilized with little critical reflection. Assessment tasks and rubrics that are not well aligned with the classroom teaching and learning context could result in misleading assessment information, which might adversely affect the quality of feedback given to students as part of the learning process. Quality feedback is one of the essential formative assessment strategies that enables a productive learning environment (Hattie & Timperley, 2007; Shepard, 2000). Hence, quality assessment task design is a prerequisite for effective formative assessment practice.
Since the beginning of the 21st century, the concept of assessment literacy has become increasingly complex and dynamic due to a paradigm shift from cognitive views of learning and psychometric testing to social-constructivist views of learning and authentic assessment (Shepard, 2000; Stiggins, 2002). In the context of this study, we have adopted Stiggins’ (1991) definition of assessment literacy, with a specific focus on building teachers’ capacity to design authentic assessments that are intellectually challenging for students in the learning of Chinese.
1 Teacher professional development in assessment literacy
Previous research has shown that when assessment tasks are authentic and intellectually demanding, the affiliated student artefacts and achievements indicate demonstrably greater intellectual depth and substance, and connectedness to the world (Koh & Luke, 2009; Lingard et al., 2001; Newmann & Associates, 1996). Authentic assessment tasks replicate the real-world challenges and standards of performance that experts or professionals typically face in the field (Wiggins, 1989). In this study, in order to guide teachers’ design of authentic assessment tasks in the professional development program and to assess the relationship between the quality of teachers’ assessment tasks and the quality of students’ work, the framework of authentic intellectual quality (AIQ) was used (Koh, 2011b). The AIQ framework consists of a set of criteria that has served as guideposts for teachers to develop high-quality authentic assessment tasks in a variety of disciplines. Core definitions of the AIQ framework are described in a later section.
2 Authenticity in language assessment
Authentic assessments incorporate open-ended performance tasks that enable a valid demonstration of students’ capabilities such as knowledge production, extended communication, higher-order thinking, and creative problem solving. In the context of language assessment, student performance on authentic language tasks represents students’ knowledge, proficiency and competence in the language, all of which can be readily transferred to future use of the target language. This notion of authenticity in designing language assessment tasks has long been advocated in the language assessment work of Bachman (1991). Following Bachman, there are two types of authenticity in language teaching and learning probed in this study: situational authenticity and interactional authenticity.
‘Situational authenticity’ refers to the matching of the features of language tasks to the target language use. For Bachman (1991), the key is the ‘perceived relevance’ (p. 691) of the assessment or task to the substantive situation in which the language is being used by the task taker. We have referred to this as the ‘fit’ between task and target. For example, if the curricular aim is to promote in-the-moment, face-to-face communicative competence of students, the task must be designed to incorporate those same face-to-face opportunities. This would be in contrast to focusing on, for example, rote reproduction of specific phrases or grammatical constructions in the target language that depict conversation in a written format. Comparable principles are well established in the field of TESOL (Teaching English to Speakers of Other Languages), where the development of large-scale assessment instruments has grappled with the complexities of assessing communicative competence and generic mastery, which rely upon the variables of cultural context and social situation (e.g. Byram, 1997). The balance of this dynamic has been a recurrent problem in large-scale assessments such as the International English Language Testing System (IELTS) (Brown, 2003), and in face-to-face language teaching and assessment practices (Guariento & Morley, 2001).
‘Interactional authenticity’ refers to the degree to which the task taker can access and connect with the task (Lewkowicz, 2000). Bachman (1991) has defined this mode of authenticity as ‘a function of the extent and type of involvement of task takers’ language ability in accomplishing a test task’ (p. 691). He emphasized the challenge of designing tasks that allow students to demonstrate the full depth of their language competencies. Applied to a classroom context, this means that the teacher should not make the assumption that the skills they intend to assess in a task are readily discerned by the student. Following this definition in the context of existing approaches to authentic assessment, we would make the case that when students perceive the relevance of authentic language tasks to their real-life needs, their motivation in learning and skill demonstration in the language will increase. Broadly speaking, when situational authenticity is established, interactional authenticity is more likely to occur. For example, when the task is perceived as an authentic or genuine reflection of substantive and real language use, then the motivation and capacity to evoke the appropriate skills to complete the assessment are likely to be higher. Across other curriculum areas, authentic assessment tasks have been demonstrated to motivate student learning (Stipek, 2002). In their study of authentic assessment in Australian secondary schooling, Cumming and Maxwell (1999) have found that ‘motivational benefits are expected to accrue’ (p. 177) when students are able to perceive that their mastery of the learning goals, as assessed by authentic assessment tasks, is highly likely to transfer readily to life beyond school.
3 Criteria for authentic intellectual quality
Koh has described use of the authentic intellectual quality (AIQ) assessment criteria in multiple contexts (Koh, 2011a, 2011b). In the Singaporean study presented here, we have focused our attention on five of the AIQ criteria that are applicable to the professional development context. The indicators for each of the AIQ criteria were specifically tailored to Chinese as developed by a panel of Chinese language researchers and educators. These criteria have been applied in a previous study with Chinese language teachers (Koh & Gong, 2008) to explore the quality of assessment tasks. In this study the criteria were used to guide the participating teachers in designing authentic assessment tasks for the teaching of Chinese. The five criteria selected were: depth of knowledge, knowledge criticism, knowledge manipulation, sustained writing, and making connections to the real world beyond the classroom.
Under depth of knowledge, there are three types of knowledge sub-categorized: factual knowledge, procedural knowledge, and advanced concepts. Based on the revised edition of Bloom’s knowledge taxonomy (Anderson & Krathwohl, 2001), ability to identify syntactical rules, make meaningful inferences from texts, and read textual clues are indicators of deep knowledge as categorized under the advanced concepts criterion. Higher-order thinking in language learning is measured by two criteria: knowledge criticism and knowledge manipulation. Knowledge criticism is exemplified by tasks that require students to compare and contrast different sources of information and to critique knowledge. Knowledge manipulation tasks require students to organize, analyse, interpret, synthesize and/or evaluate information, apply knowledge and skills, and construct new meaning or knowledge. Following the ‘authentic intellectual work’ framework presented by Newmann and Associates (1996), sustained writing and making connections to the real world beyond the classroom are two of the key criteria for authentic intellectual quality that we find applicable to the teacher development framework of this study. The indicators of each of the criteria for AIQ are presented in Appendix 1. The same AIQ criteria were used to examine the quality of student work in order to ascertain how effectively changes in teachers’ task design elicit shifts in the quality of work that students are able to demonstrate.
III The study context
Singapore is a small city state in Southeast Asia with three major ethnic groups: Chinese (74.7%), Malay (13.6%), and Indian (8.9%). Since 1965, the Singapore government has pursued a bilingual policy, with English as the first language and a ‘mother tongue’ language as a second language for all students in the education system (e.g. Chinese for ethnic Chinese children, Malay for ethnic Malay children, and Tamil for most ethnic Indian children). As a country with a rich colonial history, Singapore has a complex relationship with western philosophies and systems of education. Whilst seeking to prepare her citizens for sustained economic success in a competitive global marketplace, the country fosters the preservation of cultural values and identities through its bilingual language policy.
With the emergence of China as one of the major global economies over the past three decades, the range of Mandarin Chinese language learners has widened significantly. At one end of the spectrum are learners of Chinese as a native language (e.g. learners from Mainland China and Taiwan) and at the other end are learners of Chinese as a foreign language (e.g. students from non-Chinese ethnic backgrounds). Somewhere in between are ethnic Chinese in Singapore who are learning Mandarin Chinese as a heritage language. One defining feature of this group is that they live in an environment where the dominant language for daily communication is typically English, yet the learning of Chinese is deemed highly desirable. With the firm establishment of English as the dominant language in Singapore and its ever-gathering momentum as the sole or dominant home language, a majority of ethnic Chinese children enter primary schools with limited knowledge of Chinese.
In order to improve the teaching of Chinese in Singapore, since 2004 the Ministry of Education has undertaken a variety of initiatives, from publishing a new Chinese language curriculum and revising teaching materials, to encouraging the use of innovative teaching methods in the classroom (Ministry of Education Singapore, 2004). Schools also have been encouraged to explore the use of new assessment methods that might contribute to boosting students’ interest in learning Chinese. Authentic assessment is one of the recommended options because of its emphasis on authentic learning experience and the fact that it is seen as more progressive than the conventional assessment practices extant in Singapore classrooms. Despite the vision of a reformed curriculum, pedagogy, and assessment in the Chinese language, a large-scale empirical study of Chinese teachers’ assessment practices in both Singapore primary/elementary and secondary classrooms showed that the majority of the Chinese language assessment tasks were close-ended and for the purpose of assessing discrete linguistic knowledge; that is, rote knowledge and written use of Chinese characters, phrases, and sentence construction (Koh & Gong, 2008). It is against this backdrop that the study reported in this article was conducted.
IV Method
1 Study design
A two-year professional development program was conducted that focused on developing Chinese language teachers’ capacity in the design and use of authentic intellectual tasks. Each school year consisted of four terms of approximately equal length (10 weeks per term). During the first year of the study, baseline data were collected before starting the professional development sessions; that is, during the first two terms (Terms 1 and 2) of the school year while the participating teachers taught Primary 4 (average student age 9 to 10 years). The baseline data included teachers’ assessment tasks and related student work samples prior to the professional development. Professional development activities took place in Terms 3 and 4, with two more phases of professional development occurring during the second school year while the teachers taught the same cohort of students in Primary 5 (average student age 10 to 11 years). After each phase of professional development, teacher-designed assessment tasks and student work samples were collected. These were used to examine the development of teachers’ understanding of quality authentic assessment task design and to see the extent to which the teacher-designed tasks elicited the students’ demonstration of Chinese language competency. Table 1 summarizes the professional development activities and data collection timeline.
Table 1. Sequencing of professional development and data collection.
2 The professional development program
As a result of the Ministry of Education’s Chinese language curriculum reforms in Singapore, our research team was approached by senior teachers from two different schools on two separate occasions. These teachers were seeking support for themselves and their department members in the implementation of the new curriculum requirements, particularly where higher-order skill development was concerned. One specific aspect of this professional development – concerning authentic assessment task design – is reported in this article. The professional development program that we designed involved four key approaches to teacher learning. In the first approach, the participating teachers were taught the importance of identifying and stating student learning goals or outcomes, the principles and features of authentic assessment and rubric design, and the criteria for AIQ including interpretation of AIQ indicators in Chinese language learning. These theoretical understandings were developed during an intensive two-day professional development session after the baseline data were collected. Second, the teachers were involved in co-designing authentic assessment tasks and rubrics, with the help of assessment specialists and Chinese language content experts. Specifically, this hands-on practice took place over three full days of professional development sessions during terms 3 and 4 of the first year of the study and again in the second half of the second year of the study. During the professional development sessions, the teachers worked collaboratively to identify learning outcomes and determine assessment criteria. They then co-designed authentic assessment tasks to elicit students’ demonstrations of the identified learning outcomes. Rubrics were also developed based on the agreed-upon assessment criteria. The AIQ criteria served as guideposts for designing assessment tasks that placed greater emphasis on higher-order language competence. 
The participating teachers introduced their newly designed authentic tasks and rubrics into all classrooms in the grade. The third professional learning approach involved monthly professional learning communities (PLC). PLC meetings were held within each school, during which the participating teachers and the researchers discussed any problems arising from the design and implementation of the new authentic assessments. These PLC meetings served as follow-up sessions with the teachers, taking place one afternoon every month. Finally, in the fourth approach to professional learning, the participating teachers were taught how to select student work samples for the moderation sessions that took place during the final phase of the professional development. Prior to the moderation sessions, the teachers were trained on how to analyse the quality of assessment tasks and related student work samples using the criteria for authentic intellectual quality. Throughout the moderation sessions, they scored assessment tasks and student work samples using the criteria for authentic intellectual quality, shared their scores with the group, and then, supported by the researchers, they discussed discrepancies in scoring in order to reach consensus. Such a social moderation practice supports the dependability of teachers’ judgment (Klenowski & Wyatt-Smith, 2010) and allows teachers to identify how well the quality of their task design is reflected in the students’ work.
3 Participants
Twelve teachers of Chinese from two primary schools (five from school A and seven from school B) in Singapore were involved in ongoing, sustained professional development over the course of two school years. The two schools were selected based on specific requests for support made by senior teachers in each school. Both schools were medium ranking neighborhood schools in Singapore; this ranking is based on the Primary School Leaving Examination (PSLE) scores of students. The teachers ranged in age and teaching experience; the least experienced teacher had taught for only two years and the most experienced teacher had well over 10 years of experience. One favorable condition for the professional development was that the teachers had the full support of their principals and heads of departments to embark on the program related to the design and implementation of new forms of assessment. At the start of the program, all of the teachers had minimal experience in designing assessments, although each had attended one or two briefing sessions provided by the Ministry of Education.
4 Measures
To examine the development of Chinese language teachers’ assessment literacy, the artefacts (i.e. assessment tasks and related student work samples) collected throughout the professional development were rated by the teachers at the end of the program. For each assessment task, the participating teachers were asked to submit samples of student work that had gained high, moderate, and low achievement scores to provide a representative range of work from which demonstration of AIQ could be analysed. AIQ scoring of student work was independent of the grading already carried out by the teacher. Hence, the units of analysis comprised teachers’ assessment tasks and student work samples. Both teachers’ assessment tasks and student work samples were blindly rated by members of the research team as well as the teachers, using the criteria for AIQ with 4-point rating scales (ranging from 1 = no requirement/no demonstration to 4 = high requirement/high level demonstrated). To avoid bias in the rating, the participating teachers from school A were asked to score only the assessment tasks and student work samples from their teacher counterparts in school B, and vice versa. Discrepancies in scoring were discussed and jointly resolved by the teachers and researchers.
For both teachers’ assessment tasks and student work samples, the inter-rater reliability of teacher and researcher judgments on each of the AIQ criteria was determined using the percentage of exact agreement and the kappa coefficient. The percentages of exact agreement between raters ranged from 70% to 100%, and the kappa coefficients were all above 0.60. Both indices showed satisfactory to good inter-rater reliability.
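The two reliability indices named above can be illustrated with a minimal sketch. It assumes two raters and Cohen's kappa (the article does not state which kappa variant was computed), and the scores are invented for illustration on the 4-point AIQ scale; the function and variable names are hypothetical.

```python
from collections import Counter


def exact_agreement(rater_a, rater_b):
    """Proportion of items on which the two raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = exact_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: for each score category, the product of the two
    # raters' marginal proportions, summed over categories.
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (p_observed - p_chance) / (1 - p_chance)


# Invented scores for ten artefacts on one AIQ criterion (1 = none, 4 = high).
teacher_scores = [1, 2, 2, 3, 4, 4, 1, 3, 2, 4]
researcher_scores = [1, 2, 3, 3, 4, 4, 1, 3, 2, 3]

agreement = exact_agreement(teacher_scores, researcher_scores)
kappa = cohen_kappa(teacher_scores, researcher_scores)
print(f"exact agreement = {agreement:.2f}")  # 0.80
print(f"kappa = {kappa:.2f}")                # 0.74
```

In this invented example the raters agree on 8 of 10 artefacts, and kappa is lower than the raw agreement because part of that agreement would be expected by chance alone.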
In short, the baseline data in our study included teachers’ assessment tasks and related student work samples prior to the professional development. In the subsequent phases of the study, we used similar units of analysis (i.e. teachers’ assessment tasks and related student work samples) as a comparison. All of the assessment tasks and student work samples were scored or quantified using the AIQ criteria so that statistical analysis could be applied for a comparison of their quality before, during, and after professional development.
V Results
For both the assessment tasks and student work samples, the mean score differences on the criteria for AIQ between baseline and the different time points of measurement (i.e. different phases of professional development) were analysed using the independent-samples t-test with the significance level set at 0.05. As might be expected, the two years of professional development brought about measurable improvements in the quality of teachers’ assessment tasks and students’ work, as determined by the AIQ criteria. These improvements, however, were not observed for every indicator of quality probed. What follows is an account of general trends and particular exceptions to those trends that were identified in this study.
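To make the comparison of mean scores concrete, the following is a minimal sketch of an independent two-sample t statistic. It assumes the pooled-variance (Student's) form, which the article does not specify, and uses invented scores on one AIQ criterion; the function and variable names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(sample_a, sample_b):
    """Return (t, degrees of freedom) for a pooled-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two scores")
    df = n_a + n_b - 2
    # Pooled estimate of the common variance, weighted by each sample's df.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    if std_error == 0:
        raise ValueError("t is undefined when both samples have zero variance")
    t_stat = (mean(sample_a) - mean(sample_b)) / std_error
    return t_stat, df


# Invented 'factual knowledge' scores (1-4) for six tasks at two time points.
baseline_scores = [3, 4, 3, 4, 4, 3]
phase_three_scores = [2, 1, 2, 2, 1, 2]

t_stat, df = independent_t(baseline_scores, phase_three_scores)
print(f"t = {t_stat:.2f}, df = {df}")
```

A positive t here means the baseline mean is higher than the later mean, which is the direction of change reported for criteria such as factual knowledge. The statistic would then be compared with the t distribution at the chosen significance level; a library routine such as `scipy.stats.ttest_ind` returns the p-value directly.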
1 Changes in the quality of teachers’ assessment tasks
Through the process of task design and implementation, teachers gained new experiences that differed considerably from their prior classroom assessment practices. Teachers explained that, before they were involved in the professional development program, they had neither considered the reasons why they assigned certain types of assessment for students nor identified the learning goals associated with the assessment tasks. After participating in the professional development, teachers relayed many accounts of how they now recognized the importance of identifying learning goals for individual assessment tasks. In doing so, they were able to re-align classroom assessment with higher-order instructional outcomes; thus, assessment had become an integral part of lesson planning during the professional development period. A teacher’s capacity to identify and recognize higher-order learning goals is expected to yield a significant improvement in the quality of the assessment tasks designed. Changes in the quality of teachers’ assessment tasks were revealed through a detailed analysis of the changes in the mean scores on the criteria for AIQ. Table 2 shows the mean score differences between baseline and the different phases of professional development.
Table 2. Changes in the quality of teacher assessment tasks.
Notes. SD = standard deviation; * Signifies that the mean score differences between pre-professional development and each time point of post-professional development are significant (p < .05); values presented in bold are those that are discussed in greater detail in the results and discussion sections of this article.
Over the course of the professional development, and even after its conclusion at the end of the second year, the assessment tasks from the participating teachers tended to place less emphasis on factual knowledge and required less knowledge reproduction by students than prior to the professional development sessions. By the end of the program of professional development there was a statistically significant decrease (p < .05) in the mean scores for the AIQ criteria of factual knowledge, knowledge reproduction, and presentation of knowledge as a given. Two examples of assessment tasks designed and implemented by the participating teachers before and after professional development are used to explain the improvements in the quality of teachers’ assessment tasks.
2 Assessment tasks prior to professional development
Taking one example for the purpose of illustration, Appendix 2 presents a worksheet which is representative of assessment tasks collected at the research baseline (i.e. before professional development) from the participating teachers. The worksheet consists of three sections: fill in the blanks using single characters; complete the sentences; and fill in the blanks using hanyu pinyin (i.e. the recognized romanized spelling system of Chinese). Students were required to choose the correct answer from among a limited set of options. Such a worksheet was almost the default form of assessment used by the teachers to teach and assess students’ mastery of grammatical knowledge (the dominant form of knowledge assessed by the teachers prior to professional development). The scores on the criteria of AIQ are presented in Table 3.
Table 3. A comparison of the quality of assessment tasks before and after professional development.
Note. All criteria were scored using 4-point rating scales (ranging from 1 = no requirement to 4 = high requirement).
The scores for factual knowledge, presentation of knowledge as a given, and knowledge reproduction were high. Before the professional development program commenced, the researchers discussed with teachers the notion that, without much opportunity to engage in real-world communication tasks, students are treated as passive recipients of knowledge. Teachers agreed that their baseline approach was not supporting their desire to promote higher-order learning.
3 Assessment tasks following professional development
Appendix 3 shows an authentic assessment task that was designed and implemented by the participating teachers after they had been working with the principles of assessment and criteria for AIQ for almost a year. This assessment task was designed with the following learning goals in mind: deepening students’ understanding of the text (i.e. ‘Monkeys Brush their Teeth’), raising students’ awareness of language register in actual situations, and training students’ production skills in oral and written language. Considerably different from the more traditional assessment tasks that tended to focus on the repetition of discrete linguistic knowledge fragments, the new assessment task required students to demonstrate their language skills, thinking skills, creativity, real-world problem solving skills, and communication skills.
The AIQ scores of this assessment task were almost the opposite of those for the baseline data. Scores were high on all the AIQ criteria and sub-criteria except factual knowledge, presentation of knowledge as a given, and reproduction of knowledge, which were low (see Table 3).
4 The effect of professional development on task design
When a selection of task samples, produced collaboratively by the teachers at the various phases of the professional development program, was examined more closely, the data revealed that the improvement in task design did not apply to all elements of assessment task quality, as determined by the AIQ evaluative tool, even when averaged over a number of tasks. For most of the AIQ criteria, the quality of teacher assessment tasks improved relative to the baseline data, but for some sub-criteria improvement was not well demonstrated. Teachers did not demonstrate a decrease in the requirements for factual knowledge or knowledge reproduction in tasks until Phase II of the professional development. Similarly, an increase in the incorporation of tasks that probe the ability to compare and contrast knowledge was not demonstrated with any statistical significance until Phase III of the professional development. These challenges in reducing the prioritization of recitation of vocabulary and unified rote repetition of syntactical rules are as might be anticipated and reflect the difficulties of moving away from the forms of assessment that we have observed to be prevalent in the Singaporean context (Koh & Gong, 2008).
Although the program aimed to reduce teachers’ reliance on assessment of procedural knowledge, we found that teachers were challenged in their attempts to reduce factual and procedural knowledge in the same task. One further point of note is the lack of improvement in the organization, interpretation, analysis, synthesis and/or evaluation sub-criterion of task design. This is a broad classification in which teachers can incorporate any of these knowledge manipulation requirements into the assessment task, but improvement in this area of task quality was not demonstrated within the duration of the two-year professional development program.
5 Changes in the AIQ of student work
We found an encouraging change in the AIQ (in terms of mean scores) demonstrated in student work from baseline to the different phases of professional development. As shown in Table 4, overall the students’ work demonstrated higher levels of AIQ over time. The mean scores on the following criteria and sub-criteria increased significantly: advanced concepts, critique of knowledge, generation of new knowledge, sustained writing, and making connections to the real world beyond the classroom. In contrast, the mean scores for factual knowledge, presentation of knowledge as a given, and knowledge reproduction decreased over time. The mean score differences between baseline and the different phases of professional development were substantial and statistically significant (p < 0.05).
Table 4. Changes in the quality of student work.
Notes. SD = standard deviation; * Signifies that the mean score differences between pre-professional development and each time point of post-professional development are significant (p < .05); values presented in bold are those that are discussed in greater detail in the results and discussion sections of this article.
There was a similar pattern in the AIQ of teachers’ assessment tasks and the AIQ of student work. This reinforces a key axiom of curriculum and assessment: what we assess is what we get. Taking the sample authentic assessment task in Appendix 3 as an illustration, from the selected student work samples (see Figures 1−3; English translations in the student work samples have been provided by the authors of this article), we observed some interesting features that are seldom found in more traditional assessment tasks in the teaching of Chinese. Students were able to display not only their abilities to connect their language knowledge to a real-world context (i.e. using Chinese words or characters to introduce and market a dental care product), but also their imagination and creativity, their ability to target communication to an audience, and their higher-order thinking (i.e. understanding and applying a marketing concept and art design to a dental care product). These characteristics appeared in the student work samples largely because the assessment task lent itself to the demonstration of such features. Despite the increased breadth of skills demonstrated by the students, the fall in recitation practices was accompanied by a rise in procedural skills such as repetitious use of the same characters (reflecting a limited vocabulary used with accuracy) and repeated use of the same grammatical rules.
Figure 1. Snoopy toothpaste.
Figure 2. Strawberry and honey toothpaste.
Figure 3. Double chili toothpaste.
6 The effect of professional development on student work
Despite the general trend of improved intellectual quality reflected in student work samples, there were certain AIQ criteria that did not follow this pattern. As seen with the teacher data, improvement in the organization, interpretation, analysis, synthesis and/or evaluation sub-criterion was not illustrated until the final phase of professional development. This trend reflects the challenges that teachers experienced in incorporating these assessment criteria into their task designs. Similarly, students were inconsistent in demonstrating their abilities to compare and contrast knowledge, reflecting the difficulties that teachers experienced when designing tasks. Indeed, at the end of Phase I, students demonstrated a significant decrease in their abilities to compare and contrast knowledge compared with baseline data.
In contrast to the teacher data, students were quick to decrease the quality markers associated with regurgitation of information: the factual knowledge and knowledge reproduction sub-criteria. In addition, students demonstrated an increase in their use of procedural knowledge, despite the inconsistencies in teachers’ incorporation of these skills into task designs and teachers’ attempts to decrease their emphasis on this AIQ criterion. However, students struggled to demonstrate their abilities to apply knowledge to novel situations (as assessed by the AIQ knowledge application sub-criterion) even though teachers successfully incorporated these skills during task design. These contrasts in teacher task design and student task completion are explored further below.
VI Discussion
If we are to take as our starting point the notion that the criteria assessed in the AIQ construct are, indeed, desirable skills to develop in Chinese language learners then there is much encouragement to be gained from the results of this study. In general, teachers were able to demonstrate sustained improvement in most aspects of assessment task design by the end of the first phase of professional development, i.e. within a few months of exposure to the AIQ criteria. Anecdotal evidence from informal classroom observations suggests that the authentic assessment tasks, with high intellectual demands, made the learning of Chinese more meaningful and enjoyable for students; the students were able to recognize the connections between what they did or performed in class and real-world applications (Cumming & Maxwell, 1999). In this regard, the improvement of the situational and interactional authenticity of the assessment tasks appeared to set the grounds for a significant improvement in the AIQ of student work.
Despite the reassurance of these general trends, the results also highlight areas where teachers were challenged in their abilities to incorporate quality criteria into the assessment tasks and, accordingly, this was manifested in the students’ inability to demonstrate acquisition of these skills. The results illustrate the challenges that teachers faced in incorporating compare and contrast elements into the design of their assessment tasks. When we reflect on the professional development program, we are satisfied that teachers gained a sufficient number of concrete examples with which to work during our sessions, but we wonder whether teachers were clear about how to find more examples when they changed the task. A limitation of resources that are pitched at the right level for any given task might restrict a teacher’s readiness to challenge the students with compare and contrast tasks. We also suggest that, as many of the examples shared during the professional development sessions were derived from electronic media sources, a teacher’s degree of comfort with various digital media might influence how sources are selected. This, in turn, might have a knock-on effect on probing the organization, interpretation, analysis, synthesis, and/or evaluation sub-criterion, which relies on the students having access to a range of different source materials. In a future iteration of the professional development, we would incorporate source-gathering as well as source-use skills into the program for teachers.
In addition, we found that, in spite of the professional development program’s focus on designing tasks that reduced the emphasis on procedural knowledge, teachers retained a reliance on procedural knowledge within tasks. This may have been done in order to develop a foundation upon which to scaffold advanced concepts, supporting the interactional authenticity of the task for the student. Teachers were able to reduce the emphasis on reciting words and repeatedly reproducing characters (identified as factual knowledge in the AIQ criteria) but seemed to view the clarity of characters and the repetition of grammatical rules (procedural knowledge) as fundamental to good communication in the Chinese language. These foundational skills provide students with entry points into the task and allow teachers to assess certain basic skills that could facilitate a student’s demonstration of higher-order learning. For second language acquisition, students need to practice newly acquired procedural knowledge until they are able to use it readily without being aware of the explicit rule. Fully automated procedural knowledge enables students to lighten their cognitive loads so that they can engage in higher-order thinking (Anderson & Krathwohl, 2001; DeKeyser, 1998; Marzano, 1992). We saw that students were already well versed in using procedural knowledge prior to commencement of the professional development program so we sought to encourage a decrease in the emphasis on procedural knowledge in teacher task design. According to Liaw (2007), higher-order thinking skills promote students’ higher-order learning, which in turn contributes to their higher levels of language proficiency. From the perspective of critical literacy, Luke (2012) contends that technical mastery of written language is a means to acquisition of complex reasoning and cognitive processes. 
In our study, teachers’ use of procedural knowledge components during assessment tasks varied across the three phases of the professional development, indicating a variation in the foundational requirements for access to each task, as might be expected.
Similarly, students demonstrated a marked use of procedural knowledge when completing assignments. Such written expressions allowed students to demonstrate mastery of certain aspects of language acquisition that might be valued by teachers but which had not always been explicitly requested; their efforts were rewarded in teachers’ positive evaluation of these skills in their work when providing achievement scores.
Finally, we observe that, even though students were given opportunities to demonstrate their abilities to apply their language skills in novel situations, they were unable to do so. This indicates that application skills may be difficult to acquire and may need more explicit classroom attention, outside of the more formal assessment opportunities.
VII Implications
With the emergence of China as a major geopolitical and economic power in recent years, the spectrum of Chinese language learners has widened significantly across the globe. Although this study has focused on the Singaporean context, the findings can provide insights for many of the multi-lingual contexts that are developing across North America, Australia, New Zealand, and the European countries.
This study has shown that, even though teachers were, at first, reluctant to move away from assessments that probed knowledge reproduction and factual knowledge – two skill areas that are intimately linked and demonstrate a more passive approach to knowledge acquisition – students showed no difficulty in letting go of these familiar modes of knowledge demonstration. It is assumed that development of higher-order linguistic skills would in no way hinder foundational skill development but our study suggests that some of the foundational skills are still important to facilitate access to the advanced concepts. We suggest that this is particularly important with young children where both mother-tongue and additional language are in the developmental stages. The extent to which a cumulative prioritization of language skill acquisition might exist is not clearly illustrated in this study but would warrant further investigation.
One concern that arises from students’ generally improved demonstration of advanced concepts is that the formats of the national high-stakes examinations for Chinese language have not been adapted to reflect curriculum changes. Thus, students require a certain level of proficiency in skills that can be developed through lower-order repetition-based practice and are not heavily challenged to demonstrate their abilities with the use of advanced concepts. As the national examination year loomed for the teachers and students in our study (in the students’ Primary 6 year), we wondered about the extent to which the authentic assessment task design practices would be maintained by our teachers, knowing that the public examinations present very little scope for assessing language development in situationally and interactionally authentic ways.
In our study we have seen how explicit exposure to the components of authentic task design supported teachers in their abilities to design and evaluate assessment tasks. This is particularly important if teachers are to develop a critical stance towards selection of ‘off the shelf’ assessment materials that might require less teacher time in preparation but which may not adequately serve the objectives of a particular teacher for a particular group of students. We saw how developing teachers’ assessment literacy set them on the track to improving classroom practice and incorporating opportunities for students to rehearse higher-order skills through classroom activities, not just formal assessments. This approach is characteristic of the principles of Wiggins and McTighe’s (1998) process of understanding by design and promotes intentional design of everyday classroom activities (encouraging assessment for learning) and not just the more formal assessments. Even in our North American context, we have seen that many teachers understand the principles of understanding by design but do not necessarily have an explicit understanding of how to design authentic assessment tasks (Volante & Fazio, 2007). Where assessment is concerned, educators rarely focus on the development of teachers’ learning, tending instead to focus on how assessment can support student learning, but we have seen that if the tasks are not well designed, students are not able to use them to develop and demonstrate their learning.
The findings of this study shed light on the major potential for developing Chinese language teachers’ assessment literacy, with an eye toward building their capacity in the design and use of authentic language assessments that are well aligned with curricular reforms. Although teachers in our study were eager to develop their assessment literacy, we have seen that certain aspects of authentic assessment are difficult to design and some well-established assessment practices are difficult to relinquish. Given the growing interest in learning Chinese as a second, heritage or foreign language in schools around the globe, developing Chinese language teachers’ abilities in designing and using authentic assessments that have high intellectual demands and real-world values is fundamentally important for enhancing the quality of student learning experiences.
Appendix 1
Indicators of criteria for authentic intellectual quality in Chinese (translated from Chinese).
| Criteria | Indicators |
|---|---|
| Depth of knowledge: | |
| Factual knowledge | Reciting words/characters, vocabularies; Finding information from the texts |
| Procedural knowledge | Repeating the writing of characters and grammatical rules |
| Advanced concepts | Understanding the use of lexicon, syntactical rules, textual clues, or genres in different contexts; Making meaningful inferences from the texts; Linking language use with themes of economic, social, or cultural issues |
| Knowledge criticism: | |
| Presentation of knowledge as a given | Selecting correct answers from the options given; Spelling |
| Comparing and contrasting knowledge | Finding out new information from different sources, such as mass media, internet, books |
| Critique of knowledge | Reading a text with comprehension from a new perspective other than the teacher’s or the textbook author’s |
| Knowledge manipulation: | |
| Reproduction of knowledge | Reciting lexicon and syntactical rules; Reciting a poem; Chorus reading |
| Organization, interpretation, analysis, synthesis, and/or evaluation | Analysing different syntactic and genre structures by using appropriate language, such as noun, adjective, narrative, descriptive, or expositive |
| Knowledge application/problem-solving | Learning how to speak and write for real-world contexts; Using newly learned lexicon, sentence patterns, genre structures to discuss or write topics or problems in real-life situations |
| Generation of new knowledge | Writing a composition following a specific genre structure; Creating graphical representations, poems, narratives to present new ideas |
| Other: | |
| Sustained writing | Writing an extended piece of work |
| Making connections to the real world beyond the classroom | Discussing issues or problems that are relevant to students’ real-life circumstances |
Appendix 2
Appendix 3
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Ministry of Education Singapore, grant number CRP25 05KKH.
