Test Takers’ Beliefs and Experiences of a High-stakes Computer-based English Listening and Speaking Test

Abstract

Test takers’ beliefs or experiences have been overlooked in most validation studies in language education. Meanwhile, a mutual exclusion has been observed in the literature, with little or no dialogue between validation studies and studies concerning the uses and consequences of testing. To help fill these research gaps, a group of Senior III students in Guangdong Province, mainland China, were interviewed concerning their views of the high-stakes Computer-based English Listening and Speaking Test (CELST) and their experiences of preparing for and taking the test. The data analysis indicated that the students had a distinct understanding of the CELST validity and also tentatively suggested a relationship between the students’ views of the CELST design, their test preparation practice and their test taking process. These findings provided information useful for sharpening a computer-based English listening and speaking test and for generating positive washback on English learning.

Keywords

Test taker perspectives, washback, test validity, high-stakes test, computer-based English listening and speaking test

Introduction

In recent years, language test designers have applied advances in computer technology to develop computer-based tests. These types of tests can measure English communicative skills with great accuracy and high efficiency (Choi et al., 2003). The rise of computer-based language tests has caught the attention of researchers and has raised familiar questions about validity. In general, the advantages of computer-based language tests for ensuring test validity do not seem evident when compared with other modes of testing (Qian, 2009). As Huff and Sireci (2001) have claimed, more research needs to be conducted to demonstrate the great potential of computer-based tests to enhance validity, because the technological innovations involved might not guarantee test validity.

The validity of tests is always a public concern, no matter whether a test is paper-and-pencil-based or computer-based. As researchers seek ways to verify test validity, the perceptions of test takers could provide researchers with important evidence (Michaelides, 2014). However, test takers’ opinions or experiences have usually been ignored in validation inquiries (Cheng, 2008; Smyth and Banks, 2012). Meanwhile, a mutual exclusion has been observed in the literature, with little or no dialogue between validation studies and studies concerning the uses and consequences of testing (Bachman, 2005).

In order to address the aforementioned research gaps, this study attempted to qualitatively explore the views and experiences of a group of Senior III students who took the high-stakes Computer-based English Listening and Speaking Test (CELST) in Guangdong Province, mainland China. Their beliefs and experiences provided information useful for sharpening a computer-based English listening and speaking test and for generating positive washback on English learning.

Test Validation and Test Takers’ Perspectives

Messick (1989: 14) defined test validity as ‘an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment’. This view of validity integrates consideration of both the validity of interpretations and the consequences of test use. Such a view has been widely accepted by language testing researchers and practitioners (Bachman, 2000).

Psychometric approaches are often used to obtain evidence for test validation (Cheng et al., 2007), and such validation evidence appears to come mainly from the test development community (Bachman, 2007). However, what test designers think they are measuring may be different from what test takers think the test is testing (Cheng et al., 2007; Wijgh, 1996). Test takers may correctly answer a question by making a correct hypothesis, by making a wrong one, or by wild guessing. Thus, understanding the test takers’ test taking process and strategies via verbal reports can have implications for refining a test instrument and making it more valid (Storey, 1997). Cheng, Fox and Zheng (2007) elicited the accounts of 16 native and non-native English learners concerning their experiences of taking the Ontario Secondary School Literacy Test. These participants provided valuable perceptions of the test content, task properties and the process of taking the test. Their perceptions helped to justify the interpretation of test scores. Qian (2009) evaluated the face validity of direct (face-to-face) and semi-direct (person-to-machine) testing modes by conducting a survey of Hong Kong university students immediately after they completed direct and semi-direct speaking tests. He found that compared with direct testing, semi-direct testing appeared to lack sufficient face validity due to its inability to allow real life communication between examinees and examiners during the test. However, the above-mentioned validation studies have not seriously investigated the test consequences.

Many assessment experts have argued that the individual and social consequences of test use should be considered as crucial components in the evaluation of test validity (e.g. Linn, 1998; Messick, 1989, 1996). Washback (i.e. the influences of a test on teaching and learning) is one form of consequences that should be considered in evaluating test validity (Messick, 1996). The previous studies of washback have focused on the effects of examinations on teaching rather than their effects on learning (Spratt, 2005).

The scarcity of washback research on learning has prompted several relevant empirical studies, among which test preparation activities have become the major concern of researchers. For example, Mickan and Motteram (2009) reported that most of their informants preferred to use published tests as their examination preparation materials, and they admitted to a lack of strategies in preparing for the IELTS. Zhan and Andrews (2014) found that in the examination preparation period, the participants tended to change what they learned instead of how they learned. They adopted their usual ways of doing past test papers to prepare for the coming examination. Xie and Andrews (2013) explored how test design and uses of exams influenced test preparation. They found that the test takers’ beliefs about the test design influenced their choices of preparation strategies. However, most of the previous washback studies on test preparation have not attempted to specifically address the question of test validity.

The existing validation studies in language assessment research have tended to neglect questions of test uses and consequences. In contrast, research on the uses and consequences of tests usually neglects to examine test validity (Bachman, 2005). With an awareness of this research gap, Cheng and Deluca (2011) explored the perspectives of 59 test takers who had once taken various large-scale English language tests in China. This study involved analysing the students’ journals, which detailed their test-taking experiences. The study showed the test takers’ understandings of the constructs, processes or uses of the tests, and suggested the interconnectedness between these elements. Michaelides (2014) evaluated the validity of the secondary school graduation and university-entrance national examinations in Cyprus. He interviewed first- and second-year college students about their perspectives on the fairness and appropriateness of the examination system and its effects on high school education. Michaelides was able to document the negative effects of the examination system on the schooling experience, including an increased focus on private tutorial classes and a narrowing of the curriculum and of pedagogy. These studies have contributed to our understanding of the role of test takers in the ongoing process of test validation. However, as Cheng and Deluca (2011) have claimed, further studies must involve participants who are taking the same test, and the researchers must collect data through a more appropriate and adequate method applied immediately after or during the test. These suggestions have formed the basis for the design of this study.

Background of the Study

In 2004, the National Matriculation English Test (NMET) was allowed to be locally developed in seven provinces and four municipalities of China, within the broad guidelines of the NMET testing syllabus (Liu, 2010). Guangdong Province is one of these provinces, and has developed its own version of the NMET. Beginning in 2004, the Computer Oral English Test (COET) was introduced into the NMET for Guangdong students who aimed to specialize in foreign languages or related majors such as international trade and business in higher education (Zeng, 2010). In 2011, the CELST developed from the COET was established for all Guangdong secondary school graduates who intended to pursue higher education in China (Education Examinations Authority of Guangdong Province, 2011).

Senior III students take the CELST in language laboratories during March every year. The students are required to provide responses to the test stimuli, and their responses are recorded for computer-based double marking by trained raters. The CELST accounts for 10% of the NMET score. For the first time, speaking skills, which are more difficult and complicated to assess than any other language skills, have been tested on a large scale through a computer-based test.

The CELST has three parts that focus on various domains of listening and speaking abilities (Education Examinations Authority of Guangdong Province, 2014). Part 1 requires students to watch a one-minute video clip with subtitles on the screen. Subsequently, the students are instructed to practise reading the lines aloud for one minute. Then, they read aloud the subtitles displayed on the screen without hearing the speaker in the video clip. Part 2 is a role playing exercise, in which students are instructed to watch a two-minute video clip and then play an assigned role to communicate with the computer, which assumes another role. First, the students are required to watch part of the video clip to gain certain preliminary information, and they ask three questions with Chinese prompts. Twenty seconds of preparation time are allowed for composing each question. After the students propose their questions, they watch the rest of the video clip and take notes. Finally, they are asked to answer five questions to show their listening comprehension. Part 3 is a story retelling subtest, in which the students listen twice to a two-minute narration, and are then given one minute to prepare before retelling the story in their own English words. Table 1 presents the CELST in terms of test purposes, test content, testing time, and the weight of each subtest.

Table 1.

The Structure of CELST.

Subtest	Test Purposes	Test Format	No. of Test Items	Full Marks	Testing Time
Part I	Pronunciation and intonation	Reading aloud	1	20 points	
Part II	Abilities of receiving and comprehending information; and abilities of asking and answering interactive questions	Role playing	1	16 points	
Part III	Comprehensive abilities of listening and speaking	Story retelling	1	24 points	
Total			3	60 points	About 30 minutes
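The relative weight of each subtest can be worked out from the full marks in Table 1 and the 10% share of the NMET score reported above. The following is a minimal sketch only: it assumes that CELST marks are scaled linearly into the NMET score, which the source does not state, and the names `FULL_MARKS`, `CELST_SHARE_OF_NMET` and `subtest_shares` are hypothetical.

```python
from fractions import Fraction

# Full marks per subtest, taken from Table 1.
FULL_MARKS = {"Part I": 20, "Part II": 16, "Part III": 24}

# The CELST is reported to account for 10% of the NMET score.
CELST_SHARE_OF_NMET = Fraction(10, 100)


def subtest_shares(full_marks, overall_share):
    """Return, per subtest, (share of CELST marks, share of NMET score).

    Assumption (not stated in the source): marks scale linearly,
    so a subtest's share of the NMET is its share of the CELST
    multiplied by the CELST's share of the NMET.
    """
    total = sum(full_marks.values())
    shares = {}
    for part, marks in full_marks.items():
        within_celst = Fraction(marks, total)
        shares[part] = (within_celst, within_celst * overall_share)
    return shares


shares = subtest_shares(FULL_MARKS, CELST_SHARE_OF_NMET)
for part, (within_celst, within_nmet) in shares.items():
    print(f"{part}: {float(within_celst):.1%} of CELST, {float(within_nmet):.1%} of NMET")
```

Under this assumption, the story retelling subtest carries the largest share of the CELST marks, and the three subtests together add up to the reported 10% of the NMET score.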

Incorporating the CELST into the NMET was expected to enable a more comprehensive evaluation of the English communication abilities of secondary graduates, and to support the improvement of secondary-level English education (Liu, 2011). However, the public has been concerned about the abrupt introduction of the CELST, due to insufficient evidence of the test’s validity and the lack of relevant studies to support this innovation (Qi, 2010). In this situation, it has become critical to ‘assure that what we count counts’ (Bachman, 2000: 1).

The chief designer of the CELST, Zeng (2010), reported on the construct validation of the COET (the predecessor of the CELST) by soliciting expert judgments. Zeng found that although the COET involved unidirectional interaction, the communicative abilities of students could still be assessed to a high degree. However, Zeng’s study only represented the perspectives of experts in language testing and teaching concerning the validity of the COET. At this point, there is a need for further validation research on the CELST, which also involves assessment of listening ability, and has become a requirement for each secondary school graduate who takes the NMET in Guangdong Province.

Research Questions

Three linked research questions were asked to explore the validity of the CELST through the perspectives of student test takers.

What do students think of the design of the CELST?

How do students prepare for the CELST?

How do students deal with the CELST?

Methodology

Informants

Due to the high pressure to pass the National College Entrance Examination in Senior III (Kirkpatrick and Zang, 2011), it was difficult to invite a large group of students to participate in this study. A high school in Guangzhou (the capital of Guangdong Province) was selected as the research site of the study due to its accessibility and the established partnership of the first author (Zhan) with the school staff. The head teachers of Senior III classes were approached and invited to recommend their students to participate in this study. The potential participants had various levels of English speaking and listening abilities evaluated by their teachers. On a voluntary basis, a total of 20 students gave their informed consent. Table 2 summarizes the demographic characteristics of these 20 participants.

Table 2.

Biographic Details of Informants.

Gender	Male	12
	Female	8
Age	17 years old	2
	18 years old	15
	19 years old	3
Source	Arts	11
	Science	9
Speaking and Listening Ability (Evaluated by Their Teachers)	High	6
	Medium	7
	Low	7

Data Collection

Individual semi-structured interviews were adopted to collect data on the participants’ beliefs, test preparation and test taking experiences immediately after they took the CELST. An interview protocol was pre-developed to provide a common direction for the interviews. The interview protocol addressed three major themes that were derived from the three research questions. Table 3 illustrates the examples of interview questions regarding Part 1 of the CELST, as classified by theme.

Table 3.

Examples of Interview Questions Regarding Part 1 of CELST Classified by Theme.

Views of the CELST	a) What linguistic skills and knowledge do you need to do Part 1?
	b) Is this part difficult for you? Why?
	c) Do you think the test format of Part 1 is appropriate or not?
Test preparation	a) What materials do you prefer in your preparation for Part 1?
	b) What methods do you use to prepare for Part 1?
Test taking process	a) What strategies do you employ in handling Part 1?
	b) What difficulties do you meet in doing Part 1?

To encourage the participants to freely express what they thought, the questions were adjusted depending on the actual progress of the interview and were followed by elicitation of the reasons underlying students’ beliefs and experiences when necessary. The interviews were conducted and recorded in Chinese. The duration of the interviews varied from individual to individual. The longest interview lasted for 32 minutes, and the shortest lasted for 26 minutes.

Data Analysis

The content analysis approach was adopted to analyse the transcribed interview data through a standardized thematic coding process (Patton, 2002). A number of initial codes were generated by reading the interview transcripts line by line (e.g. the codes of timing, item construction, performing mock tests and reading aloud). The category system was then developed through dimensionalizing or axial coding (Strauss and Corbin, 1990). All of the initial codes were integrated into broader categories. For example, codes such as doing past or mock test papers, doing translation exercises, writing summaries of short stories, reading aloud, imitating others’ pronunciation and intonation, and reciting sentences and story templates were subsumed into the broader category of test preparation methods. The categories were further condensed to a higher level of conceptualization through many rounds of moving back and forth between the raw data, the temporary categories, the guiding research questions and the literature on validity. As a result, three main themes emerged, namely the students’ views of the CELST design, test preparation practice and test taking process.
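The step of subsuming initial codes into broader categories can be illustrated with a short sketch. This is not the procedure or software used in the study, which was carried out by the researchers through repeated reading; it only shows the bookkeeping involved. The codebook entries are taken from the example above, while the participant identifiers, the sample segments and the names `CODEBOOK` and `group_codes` are hypothetical.

```python
from collections import defaultdict

# Illustrative codebook: initial code -> broader category.
# The codes and the category come from the example in the text.
CODEBOOK = {
    "doing past or mock test papers": "test preparation methods",
    "doing translation exercises": "test preparation methods",
    "writing summaries of short stories": "test preparation methods",
    "reading aloud": "test preparation methods",
    "imitating pronunciation and intonation": "test preparation methods",
    "reciting sentences and story templates": "test preparation methods",
}


def group_codes(coded_segments, codebook=CODEBOOK):
    """Group (participant_id, code) pairs by broader category.

    Codes that have no entry in the codebook are kept under
    'uncategorized' so that they can be reviewed in a later round.
    """
    grouped = defaultdict(set)
    for participant_id, code in coded_segments:
        category = codebook.get(code, "uncategorized")
        grouped[category].add(participant_id)
    return {category: sorted(ids) for category, ids in grouped.items()}


# Made-up coded segments, for illustration only.
segments = [
    ("S01", "reading aloud"),
    ("S02", "doing translation exercises"),
    ("S01", "timing"),
]
result = group_codes(segments)
print(result)
```

Keeping unmatched codes visible, rather than discarding them, mirrors the iterative character of the analysis, in which temporary categories are revised over several rounds.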

Findings

Students’ Views of the CELST Design

The data analysis revealed that the students’ views of the CELST could be categorized into four aspects, namely the test weight, test difficulty, test content and test format.

Test Weight and Test Difficulty

Almost all of the participants welcomed the CELST, as they believed that this test would measure their English listening and speaking abilities, and that demonstrating these abilities would be crucial for their future lives and careers. However, 90% of the participants believed that the weight given to the CELST in the NMET was not heavy enough to motivate them or their teachers to invest a long period of time in preparation for the test in class. Meanwhile, 65% of the participants believed that the difficulty of the CELST was comparatively low. The perceived low test difficulty appeared to reduce the perceived importance of the test, thus decreasing the intensity of their test preparation both in and out of class. The following extract illustrates this popular view on the weight and difficulty of the CELST.

The weight of speaking and listening test in the NMET is light. I hope its weight would be increased, because the test is important and practical. Moreover, I believe that the test is not difficult, and our teacher told us that the differentiation in the test results was too little. Thus, we were not strongly motivated to prepare for the test, and our teacher did not spend much time and energy in helping us to prepare for it either.

Test Content

Most of the participants acknowledged that the CELST measured their listening and speaking abilities to some extent. In particular, the participants admitted that Part 1 evaluated their pronunciation and intonation, and that Part 2 assessed their listening ability for understanding the main ideas, factual information and the speakers’ intentions. Moreover, the participants believed that Part 3 examined their ability to receive information and to express ideas. The participants’ perceptions of the test content partly reflected the requirements of the CELST syllabus.

Nevertheless, a majority of the participants mentioned the skills required for the test that the test designers did not intentionally include. Seventy per cent of the participants believed that translation, summary-writing and shorthand skills were necessary skills that the test measured. For example, a student claimed that:

I believe that the ability to translate from Chinese into English is quite crucial in answering the questions of Part 2. In Part 2, Chinese sentences as prompts were provided. We just need to translate these Chinese sentences into English to propose our questions without attempting to understand what the speaker said.

One conclusion from the above extract is that, as far as the student was concerned, listening ability was unnecessary for dealing with the test questions, whereas translation skills were crucial.

In addition, some aspects of communicative competence, such as pragmatic and strategic competencies, which Zeng’s study (2010) claimed had been measured by the COET, were not acknowledged as test content by 80% of the participants. The perceived inauthenticity of the communicative test tasks, especially in the ‘role playing’ part, might explain why the participants held such views, as stated below.

When I played the role in Part 2, I did not think that the task was as authentic as daily conversations. The questions I had to ask were already provided in Chinese. Hence, I could already ask the questions even without listening to the speaker. When I answered the questions raised by the speaker, I just provided some factual information obtained from my listening, and did not need to add my own ideas. This is not a real conversation.

Test Format

Over a third of the participants pointed out the inappropriateness of the test format in the CELST. Interestingly, most of these students had been evaluated by their teachers as less able learners. Some of these students thought that the test format in the ‘reading aloud’ part did not present the subtitles in a consistent manner, which affected their performance, as explained in the following comment.

When we prepared for the “reading aloud” part, we were just allowed to read the subtitles without watching the video. However, when we officially did this part, there were video pictures. I was distracted by these pictures as I read the subtitles.

Other students said that they were disturbed by the scenes and characters in the video clip when doing the ‘role playing’ part. They found that it was really difficult for them to concentrate on listening. For example, a student claimed that: ‘The video clip in Part 2 often distracted me because funny scenes and beautiful women appeared’.

Student Test Preparation Practices

The participants reported on how they prepared for the CELST, by focusing on their preparation materials and methods. On the whole, the participants’ preparations for the CELST were examination oriented, with the primary purpose of gaining high marks in the test.

Test Preparation Materials

The preparation materials most frequently used by the participants were past and mock test papers. The participants also favoured teacher-designed translation exercise books and commercial reference books on test preparation. In addition, the participants used their textbooks and self-contained CDs to practise their pronunciation and listening skills.

Although a few participants expressed their intentions to use authentic English materials to practise their speaking and listening skills, they failed to use these materials in their test preparation. The following extract illustrates this experience.

I also wanted to listen to English radio programmes like BBC and VOA to practise my listening skills and pronunciation. You know, some talk show programmes are interesting too. In fact, I did not do that when I prepared for the test. My time was occupied by all kinds of assignments issued by my teacher. My teacher told us that doing past papers was the shortcut to gain high marks in the examination. I had to follow my teacher’s instruction to prepare for the examination.

The extract quoted above shows that the student appeared not to have autonomy in deciding what materials he would use for the test preparation, and his teacher was in charge of his exam preparation.

Test Preparation Methods

The participants used several methods to prepare for the test. Some of these methods, such as doing past test papers and examination-like exercises or rote memorization, could be regarded as general washback (Watanabe, 2004). Similar washback appeared to apply to any other English tests that the participants took, such as the NMET. For example, when a student mentioned her way of preparing for the CELST, she used the phrase ‘nothing special’ to describe her method of doing past exam papers.

Other methods such as reading aloud, imitation and writing summaries might be regarded as specific washback (Watanabe, 2004) from the CELST. The following extract shows the specificity of the methods of reading aloud and imitation used in a student’s test preparation.

My pronunciation is not good. When I prepared for the test, I realised that pronunciation was one of the tested aspects. Therefore, I read the textbook and vocabulary list aloud in the morning and imitated my teacher’s pronunciation and intonation when she led us to read in class. I have never had such an experience before.

From the above extract, it can be concluded that the student’s adoption of the methods of reading aloud and imitation was closely related to the CELST. This specific washback appeared to be linked with his understanding of what would be tested in the test.

Student Test Taking Process

The participants described how they tackled each part of the CELST in real testing circumstances. Their reports centred on three aspects, namely test taking strategies, time constraints and physical testing conditions.

Test Taking Strategies

The most interesting finding regarding the participants’ test taking strategies was that the majority of the participants tended to count on compensatory coping strategies, such as translating, using the storytelling template, taking notes and writing as they processed the CELST tasks. For instance, a participant explained that he approached the story retelling test task in the following way.

First, I listened to the story very carefully. While I was listening, I took brief notes regarding the five Ws [i.e., When, Who, Where, What, Why, author’s addition] in Chinese. Then I quickly put the information into the story template that I had recited before. I translated the Chinese into English in my mind. I also wrote down two difficult sentences completely, in case I would say those sentences wrongly in retelling.

When asked why he handled the story-retelling task in such a manner, the student told the researcher that, ‘The reason is very simple. When I did past papers and mock examinations before the real test, I used such strategies, which seemed to work for me’.

Furthermore, the participants’ beliefs about what the CELST measured appeared to influence their choices of compensatory test-taking strategies. In the previous section on students’ views of the CELST, a student’s view on what was tested in the ‘role playing’ part was quoted. His belief that translation skills were part of the test content appeared to shape the way he did the first section of the ‘role playing’ part. He further explained as follows.

I actually did not pay much attention to the video clip. While the video was playing, I quickly looked through the Chinese prompts. Then I translated these sentences from Chinese into English, and wrote down my translation. When the preparation time was up, I read out the translated sentences to the computer.

Time Constraints

Eighty-five per cent of the participants complained about the time constraints enforced in the ‘story retelling’ part, which they believed had an effect on their performance. The following extract illustrates this concern.

Sixty seconds are quite short. When I did Part 3, I was very nervous because I thought I could not organise all the information into a story within the short time limit. Sometimes, I was stuck in some parts, which worsened my preparation. If I had more time, I would perform better.

Physical Testing Conditions

Three quarters of the participants indicated their concerns about the physical testing conditions. Some participants even believed that the physical conditions of the test setting affected their test performance. Some of them complained that the sound insulation was not effective as they performed the test. The following extract explains this point.

The examination room where I took the test was noisy. Despite wearing my earphones, I could still hear the voice of my classmate sitting beside me. I am sensitive to my surroundings. I was irritated by the surrounding noise as I took the test. This resulted in my failure in the test.

Some of the participants felt that the noisy testing environment could also enable cheating on the examination, which might undermine the validity of the test scores. For example, a student claimed that, ‘Actually, cheating was easy to do during the test. If a good student sat in front of you, you could find yourself lucky and hear clearly what he/she said’. In addition, the technical breakdowns of computers were believed to increase students’ anxiety levels. A student reported that his computer crashed while he was taking the test. The incident affected him emotionally, and he said, ‘I found it extremely difficult to focus on the test, even after I was reassigned to another computer’.

Discussion

Student test takers are the crucial stakeholders in large scale, high-stakes examinations. This study collected validation evidence on the CELST from the test takers’ perspectives. The data analysis revealed that they had a distinct understanding of the test validity. The students’ responses also tentatively suggested a close relationship between their perceptions of the test design and their practice of test preparation and test taking.

The participants reported their perceptions of the constructs tested in the CELST, which differed from those of test designers to some extent. Zeng (2010) claimed that the COET (the predecessor of the CELST) was largely focused on measuring the students’ communicative competence. However, the participants believed that their pragmatic and strategic competencies in communication were not measured by the CELST. This view might have resulted partly from the students’ perception that the communicative tasks in the CELST were inauthentic. This finding is consistent with Qian’s (2009) claim about students’ acknowledgment of the inadequacy of a person-to-computer testing mode for evaluating real-life communication skills. The participants also claimed that the CELST successfully measured some abilities, such as translation, writing summaries and shorthand. However, these skills were unintended by the test designers. Such findings demonstrate the differences between test takers and test designers in their views of what a test is testing (Cheng et al., 2007; Wijgh, 1996).

Although the participants welcomed the CELST as a test innovation, they believed that the weight and difficulty of CELST needed to be enhanced to increase the importance and fairness of the test. The participants’ inferences concerning the test’s weight indicated a low status of the CELST within the NMET, which apparently caused a focus on short-term examination-oriented preparation. In addition, the majority of the participants believed that the CELST was not difficult, which in turn caused them to think that the test was not worth the expenditure of additional time and energy.

In this study, the students reported that they crammed for the examination by doing past test papers, test-like exercises and by performing rote-recitation. This practice represents a narrowing of the curriculum to a large extent. Similar test-like preparations and narrowing of the curriculum have also been reported in previous studies on washback on learning (e.g. Michaelides, 2014; Mickan and Motteram, 2009; Xie, 2013; Zhan and Andrews, 2014).

The negative washback effects of the CELST on the participants’ learning of English were probably caused by the students’ perceptions of the design of the CELST. A tentative relationship between what the students perceived to be measured by the CELST and the way they prepared for the test was indicated through comparisons between their perceptions of the CELST design and their test-preparation practices. For instance, the students believed that the CELST measured their translation skills in Parts 2 and 3. Accordingly, they reported that they did translation exercises and recited translated sentences in their test preparation. The observed relationship between the students’ perceptions of the test design and their test preparations was echoed in the washback study of Xie and Andrews (2013).

In addition, the participants clearly indicated their teachers’ tight control over their test preparation, which seemingly mediated the washback effects of the CELST on learning. The teachers’ examination-specific coaching and instruction were reflected in the students’ preparations for the examination. This finding supports the claim that ‘teacher variable is one of the most important factors as far as the presence of washback is concerned’ (Gosa, 2004: 227).

The participants reported their compensatory test taking strategies, such as translating, using the storytelling template, taking notes and writing summaries as they took the test. The study by Cheng et al. (2007) also revealed the L2 learners’ preferences for using compensatory test taking strategies in language testing. However, these compensatory test taking strategies might undermine the interpretation of test scores.

This study also tentatively suggested a relationship between test preparation and the test taking process, and a relationship between the test taking process and the students’ perceptions of the test structure and content. For example, the way in which the participants dealt with past or mock test papers in their preparation could influence their test taking strategies in the real test. Another example is the reliance of some participants on the translation skills which they believed would be measured in the test’s ‘role playing’ part. This might have influenced their practice of translating Chinese prompts quickly without listening carefully to the speaker in the video clip during the real test. Similar tentative relationships have also been identified in prior studies (e.g. Cheng and DeLuca, 2011; Cheng et al., 2007).

Conclusions and Implications

This study contributes test takers’ insights to the complex task of validating computer-based language tests, which represent a global trend in assessing students’ speaking and listening proficiency. These findings provided information useful for sharpening a computer-based English listening and speaking test and for generating positive washback on English learning.

In this study, the participants perceived that the communicative tasks in the CELST were inauthentic, which might cause them to rely on their native language and overuse compensatory test taking strategies as they took the test. The test designers should include more communicative tasks such as interviews and group discussions, which require the use of communicative strategies for a smooth and appropriate execution. Such an interactive focus in testing would exert positive washback effects on English communicative teaching and learning. Moreover, this study found that the light weight and low level of difficulty of the test tended to demotivate the students from learning for the test, which decreased the washback intensity. Thus, the weight and difficulty level of the CELST should be increased to stimulate a more authentic and interactive style of learning.

This study further disclosed that teachers played crucial roles in shaping the types and degrees of washback effects that the CELST had on the participants. Examination-oriented preparation practices, such as using previous test papers, can help to increase the students’ test scores, but such practices do not help the students in the long run (Xie, 2013). Such preparations are not good for the development of real language skills. Thus, this study suggests that teachers should move away from their examination-oriented training mode. In class, the teachers should design more authentic communicative tasks, teach language communication skills, and provide authentic opportunities for the students to experience and understand the English language.

The students in this study indicated positive attitudes toward the CELST and showed their desire to improve their speaking and listening proficiency. Hence, teachers must promote a convergence between the short-term goals (achieving high test scores) and the long-term goals (improving language proficiency) in the process of helping students prepare for the test.

This was a small-scale study, involving only 20 senior students in a high school in China. Thus, the findings should be generalized with caution. To validate the CELST, a large-scale study on the perspectives of student test takers should be conducted, based on the findings of this study. Moreover, this study’s findings reveal only the test takers’ accounts of their test-relevant beliefs and experiences. A variety of variables, such as individual, social and peer factors, could have influenced their interpretation of the test’s validity (Michaelides, 2014). Thus, the subjectivity of the findings was inevitable. Further validation studies must involve various stakeholders to avoid any bias in evaluating test inferences and consequences (Moss et al., 2006). In addition, the relationships between the students’ views of the CELST design, their test preparation practices and their test taking process have been only tentatively suggested in this study. More empirical evidence collected through a rigorous research design is needed. Nevertheless, this study has featured the perspectives of student test takers in a validation inquiry. The voices of student test takers should be heard and not dismissed in the process of continuous test validation.

Footnotes

Funding

This work was supported by Guangdong University of Foreign Studies, China under Grant Ref. No. 12S3.

References

Bachman (2000) Modern language testing at the turn of the century: assuring that what we count counts. Language Testing 17(1): 1–42.

Bachman (2005) Building and supporting a case for test use. Language Assessment Quarterly 2: 1–34.

Bachman (2007) What is the construct? The dialectic of abilities and contexts in defining constructs in language assessment. In: Fox, Wesche, Bayliss, Cheng, Turner and Doe (eds) Language Testing Reconsidered. Ottawa: University of Ottawa, 41–72.

Cheng (2008) Washback, impact and consequences. In: Shohamy and Hornberger (eds) Encyclopedia of Language and Education, Vol. 7: Language Testing and Assessment. New York: Springer, 349–64.

Cheng and DeLuca (2011) Voices from test-takers: further evidence for language assessment validation and use. Educational Assessment 16(2): 104–22.

Cheng, Fox and Zheng (2007) Student accounts of the Ontario Secondary School Literacy Test: a case for validation. Canadian Modern Language Review 64(1): 69–98.

Choi, Kim and Boo (2003) Comparability of a paper-based language test and a computer-based language test. Language Testing 20(3): 295–320.

Education Examinations Authority of Guangdong Province (EEAGP) (2011) Reform on national matriculation examination in Guangdong Province. Available at: http://gaokao.chsi.com.cn/gkxx/ss/200811/20081106/9745433-1.html

Education Examinations Authority of Guangdong Province (EEAGP) (2014) 2014 syllabus of computer-based listening and speaking English test in NMET in Guangdong Province. Available at: http://www.5184.com/gk/news/201402/201402261393404560500.html

Gosa CMC (2004) Investigating washback: a case study using student diaries. PhD dissertation, University of Lancaster, Lancaster.

Huff and Sireci (2001) Validity issues in computer-based testing. Educational Measurement: Issues and Practice 20(3): 16–25.

Kirkpatrick and Zang (2011) The negative influences of exam-oriented education on Chinese high school students: backwash from classroom to child. Language Testing in Asia 1(3): 36–45.

Linn (1998) Partitioning responsibility for the evaluation of the consequences of assessment programs. Educational Measurement: Issues and Practice 17(2): 28–30.

Liu (2010) The national education examinations authority and its English language tests. In: Cheng and Curtis (eds) English Language Assessment and the Chinese Learner. New York: Routledge, 29–43.

Liu (2011) Washback effects from Guangdong Computerized Listening and Speaking Test in NMET on students’ attitudes, strategies and motivation. Educational Measurement and Evaluation 6: 48–64.

Messick (1989) Validity. In: Linn (ed.) Educational Measurement. New York: Macmillan, 13–103.

Messick (1996) Validity and washback in language testing. ETS Research Report Series 1: i–18.

Michaelides (2014) Validity considerations ensuing from examinees’ perceptions about high-stakes national examination in Cyprus. Assessment in Education: Principles, Policy & Practice 21(4): 427–41.

Mickan and Motteram (2009) The preparation practices of IELTS candidates: case studies. IELTS Research Report 10: 223–62.

Moss, Girard and Haniford (2006) Validity in educational assessment. Review of Research in Education 30: 109–62.

Patton (2002) Qualitative Research and Evaluation Methods. Thousand Oaks, CA: Sage Publications.

Qian (2009) Comparing direct and semi-direct modes for speaking assessment: affective effects on test takers. Language Assessment Quarterly 6(2): 113–25.

(2010) Should proofreading go? Examining the selection function and washback of the proofreading sub-test in the National Matriculation English Test. In: Cheng and Curtis (eds) English Language Assessment and the Chinese Learner. New York: Routledge, 219–33.

Smyth and Banks (2012) High stakes testing and student perspectives on teaching and learning in the Republic of Ireland. Educational Assessment, Evaluation and Accountability 24(4): 283–306.

Spratt (2005) Washback and the classroom: the implications for teaching and learning of studies of washback from exams. Language Teaching Research 9(1): 5–29.

Strauss and Corbin (1990) Basics of Qualitative Research: Grounded Theory Procedures and Techniques. Newbury Park, CA: Sage Publications.

Storey (1997) Examining the test-taking process: a cognitive perspective on the discourse cloze test. Language Testing 14(2): 214–31.

Watanabe (2004) Methodology in washback studies. In: Cheng, Watanabe and Curtis (eds) Washback in Language Testing: Research Contexts and Methods. Mahwah, NJ: Lawrence Erlbaum Associates, 19–36.

Wijgh (1996) A communicative test in analysis: strategies in reading authentic texts. In: Cumming and Berwick (eds) Validation in Language Testing. Clevedon: Multilingual Matters, 154–65.

Xie (2013) Does test preparation work? Implications for score validity. Language Assessment Quarterly 10(2): 196–218.

Xie and Andrews (2013) Do test design and uses influence test preparation? Testing a model of washback with structural equation modeling. Language Testing 30(1): 49–70.

Zeng (2010) The computerized oral English test of the national matriculation English test. In: Cheng and Curtis (eds) English Language Assessment and the Chinese Learner. New York: Routledge, 234–47.

Zhan and Andrews (2014) Washback effects from a high-stakes examination on out-of-class English learning: insights from possible self theories. Assessment in Education: Principles, Policy & Practice 21(1): 71–89.