Topical knowledge and ESL writing

Abstract

This study investigates the effects of topical knowledge on ESL (English as a Second Language) writing performance in the English Language Proficiency Index (LPI), a standardized English proficiency test used by many post-secondary institutions in western Canada. The participants were 50 students with different levels of English proficiency (basic, intermediate, and advanced) attending a Canadian college. Each student wrote two timed-impromptu essays: one responding to a prompt requiring general knowledge about university studies and the other pertaining to specific knowledge about federal politics. Results showed that students across three proficiency levels performed significantly better on the general topic than they did on the specific topic. The specific topic produced lower scores on content due to poor quality and development of ideas, implicit position taking, and a weak conclusion. Students also scored lower on organization and language on the knowledge-specific task because of weaker coherence and cohesion, shorter essays, more language errors, and less frequent use of academic words. Post-test interviews confirmed that participating students were challenged by the prompt that required specific topical knowledge. The study draws attention to the importance of developing appropriate prompts for ESL writing tests.

Keywords

English as a Second Language, language proficiency test, second language writing, topical knowledge, writing prompts

Introduction

Among variables that potentially influence second language (L2) writing, topics assigned in essay tests demand particular attention because they initiate and direct the act of writing that produces samples for evaluation. Particularly with ESL students, as Carlson and Bridgeman (1986) point out, ‘[t]he content implied by the topic must be as fair as possible, not favoring a specific set of personal or cultural experiences’ (p. 139). To explore the relationship between topic bias and candidate performance for timed-impromptu essays, the present study investigates the writing performances of a group of ESL students on the LPI, an English proficiency test that determines entry by international students to compulsory first-year English courses in some universities in western Canada.

Topical knowledge and ESL writing tests

As a focus of the study, topical knowledge is defined, in accordance with Alexander, Schallert, and Hare (1991), as ‘the interaction between one’s prior knowledge and the content of a specific passage’ (p. 334). Only when such an interaction, as Bachman and Palmer (1996) point out, requires the use of language knowledge can we make inferences about the language ability of test takers. In other words, it is difficult to evaluate students’ language ability on a writing task that requires certain cultural or topical content that they do not possess. According to Bachman and Palmer, ‘all instances of language use involve topical knowledge’ (italics in the original, p. 112), so the fundamental questions are: ‘To what extent does the task presuppose the appropriate area or level of topical knowledge, and to what extent can we expect the test takers to have this area or level of topical knowledge?’ (p. 152).

Before addressing the above questions, it is useful to distinguish two types of topic-based essay testing in college and university writing assessments: timed-impromptu and integrated. Central to timed-impromptu writing is that test takers must have something to write about on a given topic under time pressure. In timed-impromptu essay tests, as White (1993) states, ‘students need to demonstrate the ability to generate prose and to develop thought in prose, and so the measurement can respond to these parts of the writing processes’ (p. 114). If the conditions of impromptu essay tests ‘strip [the] natural context from the writing’ by disallowing preparation for test takers to develop some topical knowledge (White, 1995, p. 36), integrated writing tests, such as those in the Canadian Academic English Language (CAEL) and the Internet-based Test of English as a Foreign Language (TOEFL iBT), are arguably closer to the natural context by providing test takers with input from listening and reading sources. Such integrated writing assessments, as Artemeva and Fox (2010, p. 486) illustrate, ‘elicit situated and socioculturally informed actions’ by involving students in the target domain of performance to identify possible written sources and issues needed to be addressed. In contrast to integrated essay writing, an impromptu essay test does not provide input, so test takers have to rely entirely on their prior knowledge in relation to the topical content. Accordingly, test writers have long striven to devise prompts that avoid cultural or subject-specific bias that might disadvantage certain groups of test takers. Nevertheless, there have been very few studies investigating whether the impromptu essay writing topics used in current standardized tests or placement tests are indeed general enough not to require particular cultural or subject-specific knowledge. This is clearly important to ensure fairness among diverse test takers.

Relevant studies on topical knowledge and ESL students’ written production

Among the few studies which have explored this issue, Winfield and Barnes-Felfeli (1982) compared the effects of culturally familiar and unfamiliar materials on the writing of students from intermediate-level ESL classes. Half of the participants were Spanish-speaking, and the other half were from other countries with various first language backgrounds. Prior to writing, all participants read two thematic paragraphs, one concerning a Spanish book of which the Spanish students had greater previous knowledge, and the other concerning a Japanese book of which all students had little or no knowledge. The reading materials were then taken away as students wrote down what they had read. Compared with the non-Spanish speakers, the Spanish speakers were found to write about the familiar Spanish book with more fluency and fewer grammar errors than was the case when writing about the unfamiliar Japanese book. The findings suggested that topic familiarity enhanced performance.

Since topics for academic writing tests are often derived from certain disciplinary areas, research on the effects of topic knowledge has focused on subject matter knowledge. For example, Tedick (1990) compared students’ impromptu essay writing on a general and a field-related topic. The study involved graduate students enrolled in composition courses at different levels (beginning, intermediate, and advanced) in a university ESL program. The study showed that students of all proficiency levels produced significantly better writing on the field-specific topic. Compared to the general topic, the field-specific topic was also better at discriminating among groups with different writing proficiencies. Tedick’s findings, like those of Winfield and Barnes-Felfeli (1982), suggested a positive effect of topical knowledge on the writing performances of ESL university students.

Like Tedick (1990), Lee and Anderson (2007) also assumed that ESL graduate students possessed topical knowledge based on their majors. They explored the topic generality of a writing placement test by rotating three subject-specific topics integrated with listening and reading sources. Comparisons of the performance of students taking the test over a period of six years revealed that, in contrast to Tedick’s results, subject-specific topics did not favor test takers with the matching departmental affiliations. Based on the observation that the probability of getting the lowest score for the placement test decreased across all the topics as writers’ proficiency levels (via an independent measure) increased, the researchers suggested that general language competency might have played a more significant role than topical knowledge in the integrated writing test involving reading and listening tasks.

Also investigating the effect of field-specific topics, Lee (2004) compared students’ writing performance in response to a field-specific and a general topic in an ESL placement test. The field-specific writing test was integrated with listening and reading prompts drawn from four disciplinary areas (Business, Humanities, Technology, and Life Science). Results suggested that the prompt effect was not parallel across the four subgroups. Only students in the business and life science subgroups performed significantly better on the field-specific test. The other two groups showed no significant difference on the two different types of topics. The study suggested that apart from topical knowledge, the level of difficulty of writing sources (the reading and listening tasks) was an important factor influencing students’ performance.

While the above studies used writing tasks designed by researchers themselves, four other studies examined the effects of topical knowledge required by prompts used in standardized writing tests. One was conducted by Spaan (1993) who explored prompt type effects in the impromptu essay examination of the Michigan English Language Battery (MELAB), a standardized English proficiency test for non-native English speakers. Participating ESL students across three language proficiency levels (beginning, intermediate, and advanced) each wrote two essays on prompts requiring different types of content knowledge and rhetorical modes (Narrative/Personal vs. Argumentative/Impersonal). Although most students scored the same on the two types of prompts, the beginning- and intermediate-level writers who scored inconsistently across prompts seemed to have encountered content difficulties as all but one of them wrote much shorter essays on the Argumentative/Impersonal prompt that required special knowledge than on the Narrative/Personal prompt. The study suggested that prompt developers should take particular care to make the subject content accessible for the sake of test validity.

Two studies have explored topic effect in the CAEL Assessment (Fox, 2003; Jennings, Fox, Graves, & Shohamy, 1999). As an integrated test, CAEL requires that test takers use the information from readings and a lecture to write a response to indicate whether they agree or disagree with the propositional statement presented at the beginning of the test. Investigating whether there was a bias caused by how raters perceived the test taker’s position in the new science version of the test, Fox found that raters, believing that there was a potential for a right answer in science, ‘were consistently more likely in the middle of the scale to give a test taker “the benefit of the doubt” if that test taker appeared to be closer to what they perceived was a “right” answer’ (p. 41). The focus on the ‘right’ answer rather than an effective argument demonstrated scoring bias based on the raters’ and the test takers’ understandings of the topical content. An earlier study by Jennings et al. (1999) investigated whether test takers of CAEL, given a choice of five essay topics, performed differently than those not given a choice. It was assumed that the topic choice made by the test takers would reflect their prior knowledge and interest in the topic. Results showed that students who had a choice did indeed score higher than those who had no choice but the differences were not significant. Textual analyses of the writing samples suggested that students might not consider it appropriate to include prior knowledge or extra information beyond what was presented in the test. It was the context provided by the test materials, the researchers claimed, that had ‘reduce[d] the impact of prior knowledge to the point of insignificance’ (p. 448).

Like Jennings et al. (1999) and Fox (2003) who focused on test takers’ point of view, He and Shi (2008) compared ESL students’ perceptions and experiences of two standardized English writing tests: the Test of Written English (TWE) in TOEFL and the essay task in the LPI. In western Canada, the TWE is used as a university entrance test for international students who speak English as a second or foreign language, whereas the LPI is required, in many post-secondary institutions, for these students to register for the compulsory first-year English courses. As international students, all participants in the study had passed the TWE but many had taken the LPI repeatedly before passing it. All participants complained about the cultural bias of essay prompts or topics in the LPI such as ‘Road rage in Vancouver,’ ‘Pride of being a Canadian citizen’ and ‘Divorce rate in North America.’ These complaints raised questions about the validity of the test. The researchers called for further investigations to address issues of fairness and equity in L2 writing assessment.

The present study

While previous research confirms the importance of topical or prior knowledge in ESL writing, relevant studies on this issue are scant and findings are inconclusive. Some have suggested a significant topic effect (Tedick, 1990; Winfield & Barnes-Felfeli, 1982), whereas others have found that the effect was either negligible (Jennings et al., 1999; Spaan, 1993), mixed (Lee, 2004), or non-existent (Lee & Anderson, 2007). In addition, further research is needed to verify whether students’ proficiency levels have an effect (Lee & Anderson, 2007; Spaan, 1993; Tedick, 1990) on their written responses to different prompts. Furthermore, since only a handful of researchers have examined writing tasks or prompts used in standardized English proficiency tests, what needs to be experimentally ascertained is what kind of knowledge underpins an ESL student’s performance on a writing task in a standardized test, especially in impromptu essay writing, when reading or listening sources are not provided.

The present study aims to add to the limited amount of research that explores how topical knowledge affects the writing scores and shapes the texts of ESL writers in impromptu essay writing. Conducted in a Canadian college, the study involved 50 ESL students across three English proficiency levels. The students responded to two sample prompts of the LPI, a standardized English proficiency test with an impromptu essay task. One prompt required general knowledge about university studies, and the other required specific knowledge about federal politics. Since students’ perceptions of task difficulty may help explain their task performances (Robinson, 2001), we conducted follow-up interviews to explore students’ perceptions of their writing experiences with the two prompts. The following are the three research questions:

Do ESL students across different proficiency levels perform differently in terms of overall and component scores when responding to a prompt requiring general knowledge and specific knowledge respectively?

Do the two prompts have different effects on specific textual features in ESL students’ writing in terms of content (quality of ideas, position taking, idea development, and idea wrap-up), organization (coherence and cohesion), and language (length, accuracy, and academic words)?

How do participants perceive their writing performances for the prompts requiring general and specific knowledge respectively?

Methods

Writing prompts

The two writing prompts used in the present study were extracted from The LPI Workbook (The University of British Columbia, 2008) designed to help students prepare for the LPI test. As an English proficiency test, the LPI is used by some post-secondary institutions in western Canada. ESL students and some native English-speaking students whose final English marks from high schools are below 75% must take the test in order to register for the compulsory 100-level English courses. The LPI is a two-and-a-half-hour test with three components: grammar, reading comprehension, and writing. The writing section is a 300- to 400-word impromptu argumentative essay with an overall weighting of 50% and hence a strong influence on the pass score. Since it is also a test for local students, some of the LPI writing prompts are related to Canadian culture and may therefore be difficult for international students (He & Shi, 2008).

A pilot study was conducted to identify two writing prompts that require different types of topical knowledge. Prior to the main study, a group of 20 participants from the participating college voluntarily took part in a general survey involving 24 sample prompts listed in The LPI Workbook (The University of British Columbia, 2008). The participants were asked to rate each of the prompts as either ‘difficult’ or ‘easy’ in relation to the topical knowledge required. Based on the results, two prompts were selected for the present study: Prompt A (about what to study) was rated by 97% of the participants as an easy prompt, and Prompt B (about federal politics) was rated by 100% of the participants as a difficult prompt. Notably, Prompt A was a general topic while Prompt B required some specific prior knowledge:

Prompt A: If you plan to attend a college or a university, what factors will influence your choice of what to study? Provide reasons.

Prompt B: Explain why you do OR do not take an interest in federal politics. Be specific.

The two prompts were then used in the main study. The order of the writing tasks was balanced across the participants: that is, about half did Prompt A first and the other half did Prompt B first. The participants gathered in their own classrooms and completed each task within 60 minutes. The two tasks were administered on two occasions with a week in between. To encourage students to write as much as possible, the researchers did not set a word limit for each essay though the LPI writing test requires 300 words (see Appendix for two sample essays).

Participants

A total of 50 participants enrolled in the ESL classes at City College voluntarily took part in the study. As a private educational institute in western Canada, the college provides both academic and vocational training programs and offers a two-year Associate of Arts Degree with the written consent of the local Ministry of Advanced Education. Every year the college enrolls a large number of international students who speak English as a second language. The courses offered at the college include academic reading, writing, speaking, and vocabulary. There is also an LPI preparation course for students who hope to enter local universities. Students are assigned to classes at different levels (beginning, intermediate and advanced) based on a placement test administered by the college. The test is developed using the Canadian Language Benchmarks (Pawlikowska-Smith, 2000), which measure the English proficiency of adult immigrants in three components: speaking and listening, reading, and writing. The writing component assesses students’ writing proficiency based on whether they can create simple texts (beginning), moderately complex texts (intermediate), or complex texts (advanced) to present information and ideas.

The 50 participants scattered across classes of different levels turned out to be evenly distributed across three levels (basic = 17, intermediate = 16, advanced = 17). Along with the variable of prompt type, language proficiency forms another independent variable in this study. Aged from 17 to 35 years, the participants were from Mainland China (n = 35), Taiwan (n = 9), and South Korea (n = 6). Twenty-nine participants were females and 21 were males. They had been in Canada from 0.2 to 4 years. Most of the participants volunteered to participate in the study because they were planning to take the LPI.

Analytic scoring

Students’ essays were scored using a six-point analytic rating scale with three components: content, organization, and language (Table 1). The six-point rating scale was adapted from the six-point holistic rating rubric of the LPI, where Level 6 is the highest proficiency level, and Level 0 is the lowest. Since the present study focuses on topical knowledge and language proficiency, the component scores of content and language were further divided into seven indicators based on previous research on evaluation of ESL student writing (e.g. Hamp-Lyons, 1990; Lumley, 2002; Sakyi, 2000; Shi, 2001; Vaughan, 1991). The content component is comprised of four indicators: idea quality, exposition, idea development, and idea wrap-up. The language component of the essay is composed of three further indicators: length, accuracy, and frequency of academic words.

Table 1.

Six-point analytic rating scale

Components and scoring	Indicators	Definitions/focuses	Rating^a
			0	1	2	3	4	5	6
Content (Average of the four indicator scores)	Idea quality	Relevance, originality and depth of ideas
	Exposition	Thesis statement and position taken
	Idea development	Topic sentences and supporting details
	Idea wrap-up	Summary of main ideas
Organization		Logical thinking (coherence) and transitions within and between sentences/paragraphs (cohesion)
Language (Average of the three indicator scores; prior to the calculation of the component score, each raw indicator score was converted to a six-point scale)	Length	Total number of words	Calculated by the first author
	Accuracy	Percentage of error-free T-units of the total number of T-units in each essay	Errors underlined and T-units identified by the raters. Percentage of error-free T-units calculated by the first author
	Academic words	Percentage of academic words of the total number of words in each essay	Frequency of academic words calculated by using an online software program. Percentage of academic words calculated by the first author

Note: ^a0 = Cannot be evaluated; 1 = No proficiency; 2 = Minimal; 3 = Developing; 4 = Adequate; 5 = Effective; 6 = Advanced.

Two experienced ESL writing instructors were invited to score the content and organization components of the essays. One was a native English speaker and the other a non-native English speaker. To ensure the internal reliability of the established analytic rubric, the raters first evaluated two student essays (one on each topic) together according to the criteria of the established analytic rating scale. They discussed their thoughts and decisions in order to reach a consensus. They then each rated all the essays with a six-point rating scale for the component of organization and all the indicators in the content component (Table 1). The Pearson correlation coefficients between the two raters’ scores for content and organization were all above .85 (r > .85), ranging from .86 to .89. The average scores of the two raters were used as the final scores. The score for the content component was calculated based on the average of the four indicator scores.
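The inter-rater check and the averaging step described above can be sketched as follows. This is only an illustrative sketch: the function names and the five pairs of ratings are hypothetical and are not taken from the study's data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def final_scores(rater_1, rater_2):
    """Average the two raters' scores essay by essay."""
    return [(a + b) / 2 for a, b in zip(rater_1, rater_2)]


# Hypothetical ratings for five essays (NOT data from the study).
rater_1 = [1, 3, 4, 5, 2]
rater_2 = [2, 3, 4, 6, 2]
r = pearson_r(rater_1, rater_2)
finals = final_scores(rater_1, rater_2)
```

With these made-up ratings the two raters agree closely, and each essay's final score is simply the midpoint of its two ratings.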

The two independent raters also identified the T-units and underlined all the errors (syntactic, lexical, spelling, and punctuation) in each essay. Following Hunt (1965), a T-unit contains a main clause with its subordinate clauses. After the two raters scored the essays, the first author then calculated the percentage of error-free T-units in each essay (accuracy). She also calculated the total number of words in each essay (length) and the percentage of academic words (words not included in the General Service List (West, 1953)) using The AWL Highlighter, an online software program (http://jbauman.com/gsl.html). Finally, the three raw indicator scores of length, accuracy, and academic words were converted to the six-point scale to be consistent with the ratings of the other two component scores. The conversion was done by first rescaling the scores to a 0–100 scale and then reconverting them to a six-point scale. During these processes, a weighting check was conducted using statistical Principal Component Analysis for the three observed indicator scores (length, accuracy, and academic words), revealing an approximate weighting ranging from 0.35 to 0.918 with a mean of 0.713. The language component score was then calculated based on the average of the three indicator scores.
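The indicator calculations described above might look like the sketch below. The text does not give the rescaling formula, so the min-max rescaling to 0–100 and the linear mapping onto the 0–6 range are assumptions made purely for illustration, as are all function names and the sample essay lengths.

```python
def percent_error_free(error_free_t_units, total_t_units):
    """Accuracy indicator: percentage of T-units that contain no errors."""
    if total_t_units <= 0:
        raise ValueError("an essay needs at least one T-unit")
    return 100.0 * error_free_t_units / total_t_units


def rescale_0_100(raw_scores):
    """ASSUMPTION: min-max rescaling over the sample; the paper gives no formula."""
    low, high = min(raw_scores), max(raw_scores)
    if high == low:
        return [0.0 for _ in raw_scores]
    return [100.0 * (s - low) / (high - low) for s in raw_scores]


def to_six_point(score_0_100):
    """ASSUMPTION: linear mapping of the 0-100 scale onto the 0-6 rating range."""
    return 6.0 * score_0_100 / 100.0


def language_component(length_6, accuracy_6, academic_6):
    """Language component score: mean of the three converted indicator scores."""
    return (length_6 + accuracy_6 + academic_6) / 3.0


# Hypothetical essay lengths in words (NOT data from the study).
lengths = [100, 150, 200, 300]
length_scores = [to_six_point(s) for s in rescale_0_100(lengths)]
accuracy_pct = percent_error_free(3, 12)

# Indicator means reported in Table 6 for the intermediate group on Prompt B.
intermediate_b = language_component(1.80, 0.13, 0.98)
```

Averaging the intermediate group's three Prompt B indicator means from Table 6 in this way gives about 0.97, which agrees with the language score reported for that group in Table 4.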

Statistical analysis of writing scores

The overall score of each essay is the average of the three component scores. All the scores, the overall score and the three component scores, are the dependent variables for the study. Normality was checked for all dependent variables using boxplots and descriptive analysis. The skewness values for both the overall and component scores ranged from −0.003 to −0.314, which indicated a normal sampling distribution of the dependent variables. A paired-samples t test was used to compare the two group mean differences of the overall writing scores on the two prompts across proficiency levels. Given that each dependent variable (the overall and three component scores) was treated separately, a series of 3 × 2 ANOVAs were run to identify the main effects and interactions of three proficiency levels and the two prompts on the writing performances. The Type I error rate was corrected via Bonferroni adjustment for the three component scores (.05/3 = .0167 for each ANOVA). The Tukey b test of significant simple main effects was used when an interaction was found. Finally, analyses of student writing samples based on the descriptive data of the indicator scores were conducted to identify textual and linguistic features related to the prompt type effects.
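A minimal sketch of three of the calculations named above (the overall score, the Bonferroni-adjusted alpha, and the paired-samples t statistic) is given here. The function names and the four pairs of scores are hypothetical; the study's own analysis was presumably run in a statistics package, which this sketch does not attempt to reproduce.

```python
from math import sqrt
from statistics import mean, stdev


def overall_score(content, organization, language):
    """Overall essay score: mean of the three component scores."""
    return (content + organization + language) / 3.0


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni adjustment."""
    return alpha / n_tests


def paired_t(scores_a, scores_b):
    """Paired-samples t statistic and degrees of freedom for matched scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # Mean difference divided by its standard error (sample SD / sqrt(n)).
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical overall scores for four writers (NOT data from the study).
prompt_a = [3.0, 4.0, 2.5, 5.0]
prompt_b = [2.0, 2.5, 2.0, 3.0]
t_value, df = paired_t(prompt_a, prompt_b)

alpha_per_anova = bonferroni_alpha(0.05, 3)

# Basic group's Prompt A component means as reported in Table 4.
basic_a = overall_score(1.17, 1.11, 2.65)
```

Applied to the basic group's Prompt A component means in Table 4, the averaging step gives about 1.64, the overall mean reported for that group in Table 2, and the adjusted alpha rounds to the .0167 quoted in the text.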

Interviews

To help explain and verify the quantitative findings based on the writing scores, post-test semi-structured individual interviews were conducted to explore participants’ perceptions about their experiences in responding to the two prompts. Five participants across the three language proficiency levels, basic (Ben and Bill), intermediate (Ida), and advanced (Allen and Alex), volunteered to participate in the interviews. Ben, Bill, Allen, and Alex spoke Chinese as their first language, whereas Ida was from South Korea. Each interview lasted from about 30 minutes to one hour. The first author, fluent in both Chinese and English, conducted the interviews. The four Chinese students switched between Chinese and English, whereas Ida spoke in English. Ida was in the intermediate class and had no problem communicating in English. All interviews were tape-recorded and later transcribed/translated by the first author.

Findings

Prompt type effects and overall scores

As a group, participants across all proficiency levels had higher overall writing scores for the general knowledge task (Prompt A) than the scores they had for the specific knowledge task (Prompt B) (M of 3.23 vs. 1.72, Table 2). The paired-samples t test confirmed that the group mean differences were significant (t (49) = 10.56, p < .05), suggesting that these L2 writers underperformed on Prompt B.

Table 2.

Summary of overall writing scores

Proficiency levels	No. of participants	Prompt A		Prompt B
		M	SD	M	SD
Basic	17	1.64	0.83	0.62	0.47
Intermediate	16	3.43	0.59	2.05	0.92
Advanced	17	4.63	0.41	2.51	1.23
All 3 levels	50	3.23	1.40	1.72	1.23
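As a quick arithmetic check, the ‘All 3 levels’ means in Table 2 can be recovered as size-weighted averages of the three group means. The helper name below is hypothetical; the group sizes and means are those reported in the table.

```python
def pooled_mean(group_means, group_sizes):
    """Size-weighted mean of several group means."""
    total = sum(group_sizes)
    return sum(m * n for m, n in zip(group_means, group_sizes)) / total


# Group sizes for the basic, intermediate, and advanced levels (Table 2).
sizes = [17, 16, 17]
all_levels_a = pooled_mean([1.64, 3.43, 4.63], sizes)  # Prompt A group means
all_levels_b = pooled_mean([0.62, 2.05, 2.51], sizes)  # Prompt B group means
```

Rounded to two decimals, the results are 3.23 for Prompt A and 1.72 for Prompt B, matching the last row of the table.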

Meanwhile, the results of the 3 × 2 univariate ANOVA showed statistically significant main effects for proficiency level (F(2, 47) = 59.82, p < .05) and prompt type (F(1, 47) = 135.47, p < .05), as well as a significant interaction across different proficiency levels (F(2, 47) = 6.42, p < .05) (Table 3, Figure 1). A post-hoc Tukey b analysis detected main effects of prompts on the overall writing scores for all pairwise comparisons across proficiency levels (p < .05). These findings indicate an association between topical knowledge and L2 writing across all proficiency levels.

Table 3.

3 × 2 analysis of variance for overall writing scores

Source	df	F	η²	p
		Between subjects
Proficiency levels	2	59.82	.72	.001*
Error	47	.88
		Within subjects
Prompts	1	135.47	.74	.001*
Proficiency × Prompt	2	6.42	.22	.023*
Error (Prompts)	47	(.42)^a

Notes: ^aValues enclosed in parentheses represent mean square errors. *Indicate a significant level at .05, two tailed.

Figure 1.

Effects of prompt types on overall writing scores

Prompt type effects and component scores

Table 4 illustrates that students across all three levels obtained higher component scores on the general topic (Prompt A) than on the specific topic (Prompt B). Consistent with the main effect of the prompt on the overall writing scores, the results of the 3 × 2 ANOVAs revealed that both prompts and proficiency levels had statistically significant main effects on the three component scores (using adjusted p-level at .0167, Table 5). The consistent drop in the three component scores for Prompt B, as illustrated in Figures 2, 3, and 4, indicates that students across all proficiency levels underperformed on a topic that required specific knowledge. Figure 4 also shows that the scores on language for all proficiency levels, compared with the scores on content (Figure 2) and organization (Figure 3), were much lower on Prompt B than those on Prompt A. This suggests that the knowledge-specific prompt had a more serious effect on the linguistic performance of the students whose writing on Prompt B did not represent their true language proficiency levels.

Table 4.

Summary of component scores

Type of scores	Proficiency levels	No. of participants	Prompt A		Prompt B
			M	SD	M	SD
Content	Basic	17	1.17	0.63	0.70	0.48
	Intermediate	16	3.25	0.56	2.53	1.19
	Advanced	17	4.75	0.38	3.23	1.59
	All 3 levels	50	3.05	1.59	2.15	1.59
Organization	Basic	17	1.11	0.82	0.53	0.60
	Intermediate	16	3.46	0.65	2.67	1.26
	Advanced	17	4.82	0.57	3.18	1.62
	All 3 levels	50	3.12	1.70	2.11	1.68
Language	Basic	17	2.65	1.55	0.64	0.45
	Intermediate	16	3.58	0.81	0.97	0.55
	Advanced	17	4.34	0.66	1.13	0.67
	All 3 levels	50	3.52	1.27	0.91	0.59

Table 5.

3 × 2 analysis of variance for the three component scores

Source	df	F	η²	p
Between subjects
Proficiency level
Content	2	80.49	.77	.001**
Organization	2	70.76	.75	.001**
Language	2	10.77	.31	.001**
Error
Content	47	(1.01)^a
Organization	47	(1.28)
Language	47	(.95)
Within subjects
Prompt type
Content	1	30.90	.40	.001**
Organization	1	35.11	.43	.001**
Language	1	308.53	.87	.001**
Proficiency × Prompt
Content	2	3.88	.14	.028
Organization	2	3.72	.14	.032
Language	2	5.45	.19	.007**
Error (Prompts)
Content	47	(.66)
Organization	47	(.18)
Language	47	(.01)

Notes: ^aValues enclosed in parentheses represent mean square errors.

**Indicate a significant level at .0167, two tailed.

Figure 2.

Effects of prompt types on content scores

Figure 3.

Effects of prompt types on organization scores

Figure 4.

Effects of prompt types on language scores

There were no detected effects of interaction between prompt types and proficiency levels on the component scores of content and organization (p > .0167) (Table 5). This means that prompt effects on content and organization scores were not related to students’ proficiency levels. Put otherwise, performance was higher for Prompt A than Prompt B for all proficiency levels. However, an interaction was found between prompt types and proficiency levels on language component scores (p < .0167). A post-hoc analysis using a Tukey b test revealed a significant mean difference in proficiency effects on the language component scores between the basic group and the advanced group (p < .0167). However, no such differences were detected between the basic and the intermediate group and between the intermediate and the advanced group (p > .0167). These findings suggest that for language component scores, one proficiency level (either the basic or the advanced) has a different scoring pattern on the two prompts. The two nonparallel lines in Figure 4 illustrate that, while all proficiency groups scored much higher on Prompt A than they did on Prompt B, the difference between the two prompts is more pronounced for the advanced level. These observations confirm that the topical knowledge called for in Prompt B affected students’ writing across all proficiency levels.

Indicator scores and textual analyses of student writing

The descriptive statistics (Table 6) showed that the general knowledge task (Prompt A) had higher group mean indicator scores than the knowledge-specific task (Prompt B) across the board in idea quality (M of 3.37 vs. 2.52), position taking (M of 3.11 vs. 2.32), idea development (M of 3.28 vs. 2.16), idea wrap-up (M of 2.46 vs. 1.59), length (M of 2.37 vs. 1.39), accuracy (M of 4.10 vs. 0.16), and academic words (M of 4.10 vs. 1.18). Table 7 shows that students’ responses to Prompt B had fewer words (M of 133 vs. 226 words), error-free T-units (M% of 3.34 vs. 7.08), and academic words (M% of 4.24 vs. 6.54).

Table 6.

Summary of seven indicator scores in content and language

Components	Indicators	Proficiency	No. of participants	Prompt A		Prompt B
				M	SD	M	SD
Content	Idea quality	Basic	17	1.82	0.66	1.11	0.73
		Intermediate	16	3.39	0.47	2.92	1.16
		Advanced	17	4.91	0.47	3.54	1.73
		All 3 levels	50	3.37	1.39	2.52	1.63
	Position taking	Basic	17	1.04	0.88	0.77	0.62
		Intermediate	16	3.43	0.59	2.81	1.25
		Advanced	17	4.88	0.32	3.39	1.69
		All 3 levels	50	3.11	1.73	2.32	1.68
	Idea development	Basic	17	1.46	0.84	0.68	0.53
		Intermediate	16	3.58	0.74	2.68	1.32
		Advanced	17	4.80	0.43	3.16	1.57
		All 3 levels	50	3.28	1.56	2.16	1.62
	Idea wrap-up	Basic	17	0.34	0.57	0.24	0.44
		Intermediate	16	2.61	1.13	1.71	1.48
		Advanced	17	4.43	0.68	2.84	1.47
		All 3 levels	50	2.46	1.89	1.59	1.62
Language	Length	Basic	17	0.96	0.59	0.61	0.46
		Intermediate	16	2.92	1.26	1.80	1.02
		Advanced	17	3.26	0.94	1.80	1.02
		All 3 levels	50	2.37	1.40	1.39	1.03
	Accuracy	Basic	17	3.40	2.27	1.71	1.74
		Intermediate	16	4.01	0.88	0.13	0.10
		Advanced	17	4.87	0.76	0.19	0.19
		All 3 levels	50	4.10	1.58	0.16	0.16
	Academic words	Basic	17	3.59	2.18	1.14	0.82
		Intermediate	16	3.81	0.88	0.98	0.80
		Advanced	17	4.87	0.77	1.42	1.31
		All 3 levels	50	4.10	1.52	1.18	1.01

Table 7.

Mean of length, error-free T-units (%), and academic words (%)

Language indicators	Proficiency levels	No. of participants	Prompt A		Prompt B
			M	SD	M	SD
Length	Basic	17	92	56	58	44
	Intermediate	16	279	121	172	97
	Advanced	17	311	89	171	98
	All 3 levels	50	226	133	133	98
Accuracy (error-free T-units)	Basic	17	1.88	1.22	0.88	0.78
	Intermediate	16	8.13	3.44	4.06	3.70
	Advanced	17	11.29	4.48	5.12	3.74
	All 3 levels	50	7.08	5.15	3.34	3.51
Academic words	Basic	17	3.59	2.98	2.35	3.06
	Intermediate	16	7.19	4.65	4.38	3.95
	Advanced	17	8.88	5.27	6.00	5.00
	All 3 levels	50	6.54	4.86	4.24	4.28
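The ‘All 3 levels’ rows in Table 7 can be approximated by weighting each proficiency group’s mean by its number of participants. The following is a minimal sketch of that arithmetic; the helper name is hypothetical, and because the tabled group means are already rounded, the results match the reported overall means only approximately.

```python
def pooled_mean(group_means, group_sizes):
    """Weighted mean of group means, using group sizes as weights."""
    total_n = sum(group_sizes)
    weighted_sum = sum(mean * n for mean, n in zip(group_means, group_sizes))
    return weighted_sum / total_n


SIZES = [17, 16, 17]  # basic, intermediate, advanced

# Group means copied from Table 7 (Prompt A, then Prompt B).
length_a = pooled_mean([92, 279, 311], SIZES)
length_b = pooled_mean([58, 172, 171], SIZES)
accuracy_a = pooled_mean([1.88, 8.13, 11.29], SIZES)
academic_a = pooled_mean([3.59, 7.19, 8.88], SIZES)

print(f"Length, Prompt A: {length_a:.0f} words")
print(f"Length, Prompt B: {length_b:.0f} words")
print(f"Error-free T-units, Prompt A: {accuracy_a:.2f}%")
print(f"Academic words, Prompt A: {academic_a:.2f}%")
```

The same helper can be applied to the indicator scores in Table 6, since both tables use the same group sizes.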

The differences in the indicator scores led to textual analysis of the students’ writing to trace differences in specific textual features related to each prompt. Apart from the differences in length, error-free T-units and use of academic words, a major difference between students’ writing for the two prompts concerned idea quality and position taking. Many students were observed to have problems taking a position pertaining to federal politics when writing for Prompt B. For example, Bob, a basic-level student, failed to present a fully developed position in response to Prompt B by writing about how he came from China and knew little about federal politics. The following is Bob’s writing for Prompt B (errors are not corrected):

Most people like to find something they think interesting to know and study. Because federal politics are too many backgrounds and relationship between people and other people, federal politics is boring. I come from China, China is not federal system. So, I never know about federal politics. If I want to know federal system, I will go to study their backgrounds. In China, I only know a little of backgrounds about the government, and I feel it is not interesting. Therefore, I do not like to know about federal politics.

In contrast, like many other participants, Bob responded to Prompt A (the choice of what to study) in greater depth and with a clearly stated author position at the beginning. The following are the first two paragraphs he wrote for Prompt A:

In a university students have to choose some classes for study. Someone says a good beginning is a half of success. So, if students want to be succeed after university, they have to know what to study is good for themselves. However, two important factors, such as interest and preferable majors, will influence student choice of what to study.

When students choose classes, they will think about what classes they think interesting. Firstly, if students think the class is interesting for them, they will learn fast and well. Because they are willing to learn, they will enjoy the class. For example, a student likes to learn math, so he want to do lots of homework about math. After that, his math marks are the higher other students. Next, when students love to go to the class, they do not feel any uncomfortable with it. Students have a good mood when they take the class which is interesting for them. For example, in the math class, teacher usually gives lots of homework after the class. And then, many students feel a little bit unhappy, but some of them will ask teacher to give more questions to them. Therefore, interesting is an important factor.

Although the essay had visible grammar problems, Bob did incorporate some features valued in the English-speaking context. With a clear thesis statement in the introductory paragraph (Two important factors, such as interest and preferable majors, will influence student choice of what to study), Bob delivered a clear objective for the essay. In the following body paragraph, he included a topic sentence at the beginning and followed up with an explanation or a discussion using supporting evidence (e.g. math class). In addition, Bob used a few transitions to connect sentences (e.g. however; firstly; next; for example). Such evidence shows that the general topical knowledge required by Prompt A allowed Bob to function more effectively in composing than he did on Prompt B. The textual analysis of Bob’s writing illustrates how indicator scores demonstrate various prompt-type effects in the writing of the participating students.

Students’ perceptions

In the post-test interviews, all five students (Ben, Bill, Ida, Allen and Alex) said they felt comfortable writing about their choice of study (Prompt A) but found writing about federal politics (Prompt B) rather difficult. One difficulty was a lack of knowledge about federal politics and the Canadian government. The participants believed that a precondition for good writing was a topic that is ‘understandable’ (Bill), ‘related to [one’s own] experiences’ (Ida), or ‘you are familiar with’ (Allen). Because of the unfamiliar topic, Bill said, ‘I sat there thinking for a long time but got no ideas where to start writing about it [Prompt B]. You just don’t know it. How can you write about it?’ Similarly, Bing complained,

Most of the students here [in this class] came here [to Canada] not long ago and haven’t settled down yet. It seems they only know the names of democracy party and conservative [party]. For other parties, they have no ideas. They are not familiar with the politics here yet. They don’t know the government system either. So, it is more difficult for us to write on this topic.

Because their home country did not have a government labeled ‘a federal government’, all interviewees associated federal politics with Canadian culture and said that they needed more exposure to Canadian society in order to write well for Prompt B. In this regard, Bill articulated his difficulty: ‘I don’t know much about federal politics. I just came here last term. I may learn more about it in the future.’ Meanwhile, Ben stated his belief in the importance of practice: ‘I need more practice and write more … I need to do more reading and also watch more English TV to influence my thinking way.’ Ben used ‘practice’ in a broad sense to refer to L2 knowledge accumulation. According to Ben, learning English writing is a progressive process involving real-life exposure to English in the host culture.

Another difficulty mentioned by the participants was a lack of vocabulary when writing for Prompt B. Without the necessary subject words for federal politics, the participants said they could not express their thoughts and ideas. For instance, Ida recounted that she was struggling while writing for Prompt B: ‘I thought a long time thinking about Topic 2 [Prompt B] but finally I still couldn’t write more and I gave up … I need to know more words to express my views.’ Alex gave the following example when commenting on how he had no words about Canadian federal politics to communicate his thoughts,

I needed to express 竞选 (‘run for’ in Chinese), but I did not know which word I can use … There were many situations like this in writing for Topic 2 [Prompt B]. It was a test and we were not allowed to use a dictionary. So I was stuck there most of the time … To write on Topic 2 [Prompt B] well, you need to know a lot of words about the topic. Otherwise, you can only write or talk about something superficial.

Another student, Bill, mentioned how he had memorized a certain number of English words based on dictionary meaning. However, he found that words he learned as a foreign language in his local cultural context might not carry identical connotations in the target culture, and thus might fail to convey ideas accurately for writing on a culturally bound topic like Canadian federal politics. In his words,

It [English] is different from what I learned before. Sometimes I am not sure what those words really mean and how to use them correctly although I know them. For example, I know ‘political’ is something related to government. But I heard people use it to say about a person, like ‘You are political.’ I feel English words are more complex. We need to know lots of words about government and politics for Prompt B. I am just poor at them.

A third difficulty the students mentioned was a lack of confidence in commenting on authorities. Coming from China, Ben, Bill, Allen, and Alex said that they felt nervous about making comments on Canadian politics and government. Among them, Alex illustrated his point with the case of Lai Changxing, a Chinese smuggling ringleader given ‘most-wanted’ status by the Chinese government. At the time of the interview, the Canadian government had issued Lai a work authorization in Canada. Alex expressed puzzlement at this decision while commenting on the case:

I really don’t understand Canadian politics so I dare not to make more comments on it. This made writing for topics 2 [Prompt B] more difficult. It’s not just writing, actually what’s right or wrong also made this topic hard. I feel somewhat nervous about saying something wrong there.

Like Alex, other participants from China said they were cautious when writing about federal politics in general or Canadian federal politics in particular. The lack of confidence of these Chinese students in commenting on authority might be attributed to the influence of Confucian culture, in which a ‘good’ piece of writing is judged by its reverence for authority, among other criteria.

Discussion

The present data consistently show significant effects of prompt types on students’ writing performances. The overall writing scores showed that students across proficiency levels performed significantly better on the general topic than they did on the specific topic. The knowledge-specific task produced lower scores on the content component due to poor idea quality, insufficient idea development, implicit position taking, and weak conclusions. Students also scored lower on organization and language on the topic-specific task because of weaker coherence and cohesion, shorter essay length, more syntactic and lexical errors, and less frequent use of academic words. Post-test interviews confirmed that participating students were challenged by the knowledge-specific writing prompt because of their lack of knowledge about federal politics, lack of specific vocabulary for an unfamiliar topic, and lack of confidence in commenting on government authority.

The overall difference in students’ performances for Prompts A and B in this study raises construct validity concerns. As Messick (1989) noted, construct-irrelevance is a tangible threat to the validity of a test. Prompt B about federal politics reflects construct-irrelevant variability by challenging L2 writers with topical knowledge that they did not possess, thereby causing notably lower scores across all proficiency levels than on the general task, which gave them greater opportunity to display their ability. The study suggests that test users need to distinguish deficiencies in L2 writing ability from L2 writers’ interpretations of writing topics resulting from their prior knowledge and L1 cultural backgrounds. This is particularly significant in the case of high-stakes, norm-referenced tests such as the LPI, which is administered in a specific local context and relies upon local knowledge. The present findings suggest that test developers need to be sensitive to the writers’ cultural and political backgrounds, which might contribute to their level of topical knowledge. Following Bachman and Palmer (1996), we believe that test evaluators should conduct construct validation studies to identify and explain the sources of variability in L2 writing assessment contexts that may unfairly depress the scores of test takers from a particular cultural background and hence threaten the validity of the test.

Based on our finding that writing is influenced by one’s prior knowledge of the topic, the present study rejects the assumption that language proficiency is the main factor determining performance in impromptu essay writing. Grammatical knowledge, as Bachman and Palmer (1996) put it, ‘is only one aspect of the ability to use language to perform academic writing tasks’ (p. 23). If grammatical knowledge is equally important to students at all levels, topical knowledge seems more crucial to advanced students than to lower-level students. As the present findings indicate, although students across all proficiency levels had much lower scores on Prompt B than on Prompt A, the difference between the scores of the advanced students on the two prompts was more pronounced than that of students at the basic level. In line with Tedick’s (1990) observation that topic familiarity has little influence on the writing of low-level students, the results of the present study suggest that only when students have reached a certain level of writing proficiency can they either make effective use of their prior knowledge or conversely be impeded in their writing by a lack of topical knowledge. Since topical or prior knowledge is a part of the construct of language proficiency (Bachman & Palmer, 1996; Hammadou, 1991), language proficiency assessments are relevant only under the condition that irrelevant content contamination is controlled.

The authors, however, are aware of a number of methodological limitations of the study, which in turn limit the strength of claims that can be made about the findings. First, it should be acknowledged that the effort to balance test tasks by reversing the order of topics for half of the groups might have enabled students to compare notes over the intervening week between the two tests, although the low-stakes nature of the research experiment makes this unlikely. Second, the choice of the two topics rated by students in a pilot study as the easiest and most difficult may have predetermined the outcome of the study. Using a Likert scale rather than binary-choice (easy and difficult) response options to categorize the topics might have yielded more suitable, less obviously oppositional, topic choices. Also, the fact that 35 of the 50 participants came from Mainland China may have rendered the inappropriateness of Prompt B more stark and hence contributed to a more pronounced prompt effect than might have been the case with test takers from other countries. In addition, the study would have benefitted from a comparison between the test performances of the target group of international students and those of native-English-speaking students. A finding of differential performance according to the extent of specific content knowledge within such a group would lend more weight to the study’s findings.

Conclusion

The present findings suggest that a lack of topical knowledge in relation to particular task prompts may place ESL students at risk of failing a timed impromptu writing test. As Ruth and Murphy (1988) have pointed out, ‘No topic can absolutely guarantee equal access to knowledge of the subject matter for all participants in a test. But some topics provide more opportunities than others’ (p. 253). Although the present study does not address how general a topic needs to be to ensure fairness, it finds evidence to support the contention that topic specificity is an issue that needs to be carefully monitored, especially in high-stakes testing contexts.

The lack of such research on the LPI, which ‘is administered to over 18,000 candidates per year’ (http://www.companylisting.ca/Applied_Research_Evaluation_Services/default.aspx), to both international and immigrant students, is a matter for some concern. Relevant authorities should subject essay topics used in the LPI to critical scrutiny and rigorous trialling in order to ensure a topic-fair writing test. Also worth exploring would be the value of using integrated writing tasks, such as those utilized in CAEL and TOEFL iBT, that require students to use information provided in reading texts and audio materials and thereby potentially reduce the effect of topic unfamiliarity on performance. Future studies could compare the impact of topic, content management, and prior knowledge on the integrated task (with readings/lectures as a background to writing on a topic-specific prompt) with responses to a general timed-impromptu essay task addressing the same topic.

Acknowledgements

This study is based on the first author’s PhD research, which was supervised by the second author. We thank Bruno Zumbo and Monique Bournot-Trites for their valuable input throughout the process. We also thank the students for their participation, Paul Johnson and Yong Fan for rating and coding the writing samples, the LT editor, and the three anonymous reviewers for their comments on earlier drafts of the paper.

References

Alexander, P. A., Schallert, D. L., & Hare, V. C. (1991). Coming to terms: How researchers in learning and literacy talk about knowledge. Review of Educational Research, 61, 315–343.

Applied Research & Evaluation Services. (n.d.). Retrieved July 5, 2011, from www.companylisting.ca/Applied_Research_Evaluation_Services/default.aspx.

Artemeva & Fox (2010). Awareness vs. production: Probing students’ antecedent genre knowledge. Journal of Business and Technical Communication, 24, 476–515.

Bachman & Palmer (1996). Language testing in practice: Designing and developing useful language tests. Oxford: Oxford University Press.

Carlson & Bridgeman (1986). Testing ESL student writers. In K. L. Greenberg, H. S. Wiener, & R. A. Donovan (Eds.), Writing assessment (pp. 126–152). New York: Longman.

Fox (2003). From products to process: An ecological approach to bias detection. International Journal of Testing, 3, 21–48.

Hammadou (1991). Interrelationships among prior knowledge, inference and language proficiency in foreign language reading. Modern Language Journal, 75, 27–38.

Hamp-Lyons (1990). Second language writing: Assessment issues. In Kroll (Ed.), Second language writing: Research insights for the classroom (pp. 67–87). Cambridge: Cambridge University Press.

Shi (2008). ESL students’ perceptions and experiences of standardized English writing tests. Assessing Writing, 13, 130–149.

Hunt (1965). Grammatical structures written at three grade levels. Champaign, IL: National Council of Teachers of English. (ERIC Document Reproduction Service No. ED 113 735)

Jennings, Fox, Graves, & Shohamy (1999). The test-takers’ choice: An investigation of the effect of topic on language-test performance. Language Testing, 16, 426–456.

Lee (2004). Constructing a field-specific writing test for an ESL placement procedure. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign.

Lee & Anderson (2007). Validity and topic generality of a writing performance test. Language Testing, 24, 307–330.

Lumley (2002). Assessment criteria in a large-scale writing test: What do they really mean to raters? Language Testing, 19, 246–276.

Messick (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: American Council on Education and Macmillan.

Paragon Testing Enterprises. (2011). LPI Brochure. Retrieved July 5, 2011, from www.paragontesting.ca.

Pawlikowska-Smith (2000). Canadian Language Benchmarks 2000: English as a second language for adults. Ottawa: Centre for Canadian Language Benchmarks.

Robinson (2001). Task complexity, task difficulty, and task production: Exploring interactions in a componential framework. Applied Linguistics, 22, 27–57.

Ruth & Murphy (1988). Designing writing tasks for the assessment of writing. Norwood, NJ: Ablex Publishing.

Sakyi, A. A. (2000). Validation of holistic scoring for ESL writing assessment: How raters evaluate composition. In A. J. Kunnan (Ed.), Fairness and validation in language assessment (pp. 129–145). Cambridge: Cambridge University Press.

Shi (2001). Native and nonnative speaking EFL teachers’ evaluation of Chinese students’ English writing. Language Testing, 18, 303–325.

Spaan (1993). The effect of prompt in essay examinations. In Douglas & Chapelle (Eds.), A new decade of language testing research: Selected papers from the 1990 Language Testing Research Colloquium (pp. 98–122). Alexandria, VA: Teachers of English to Speakers of Other Languages.

Tedick, D. J. (1990). ESL writing assessment: Subject-matter knowledge and its impact on performance. English for Specific Purposes, 9, 123–143.

The AWL Highlighter. Retrieved May 28, 2009, from http://jbauman.com.

University of British Columbia. (2008). The LPI workbook (2nd ed.). Vancouver, Canada: University of British Columbia.

Vaughan (1991). Holistic assessment: What goes on in the raters’ minds? In Hamp-Lyons (Ed.), Assessing second language writing in academic contexts (pp. 111–125). Norwood, NJ: Ablex.

West (1953). A general service list of English words. London: Longman.

White, E. M. (1993). Assessing higher-order thinking and communication skills in college graduates through writing. Journal of General Education, 42, 105–122.

White, E. M. (1995). Apologia for the timed impromptu essay test. College Composition and Communication, 46, 30–45.

Winfield, F. E., & Barnes-Felfeli (1982). The effects of familiar and unfamiliar context on foreign language composition. Modern Language Journal, 66, 373–378.