Abstract
The impetus for the present study came from Ferris’ (2010) article discussing the gap between theory, research, and practice in written corrective feedback (WCF). To address this gap, the present study aimed at comparing the impact of focused vs. comprehensive WCF and revision on the improvement of written accuracy of learners of English as a second language (ESL), with a focus on their global linguistic errors (sentence and word); the study also examined how this improvement contributed to the students’ writing quality, defined in terms of clarity of expression and text comprehensibility. Data were collected from 78 intermediate French ESL learners randomly assigned to four different treatment groups: two groups received focused WCF and two groups received comprehensive WCF; one of the focused and one of the comprehensive groups were required to revise their writing, and the other two groups did no revision after WCF. A comparison was made between the error means of the four groups on three out of seven essays they wrote during a 15-week writing course: week one (T1), week eight (T2) and week 14 (T3). The results revealed that the focused groups were more successful than the comprehensive ones in reducing their word errors at T2; no significant effect was observed for revision. Also, the focused-revision group outperformed the other groups at both T2 and T3 in reducing their sentence errors. The comprehensive-revision group, however, was more successful than the other groups in improving their overall written accuracy. The results also showed that the focused-revision group made more improvement than the other three groups in their writing quality at T3.
I Introduction
In spite of the large volume of studies on written corrective feedback (WCF), it remains one of the most controversial issues in second language writing research. This ongoing debate over the efficacy of WCF has been attributed to methodological issues and inconsistencies in WCF research (Bitchener & Ferris, 2012; Bruton, 2009; Ellis, 2010; Ferris, 2004, 2010; Liu & Brown, 2015; van Beuningen, de Jong, & Kuiken, 2012). Ferris (2010) contends that these studies have followed different aims and, hence, can be roughly classified into two groups. The first group are second language acquisition (SLA) oriented studies that have used a pre-test–treatment–post-test–delayed-post-test design and focus on the acquisition of certain linguistic forms; these studies basically deal with the ‘writing-to-learn language’ (WLL) dimension of second language (L2) writing (see Manchón, 2011, p. 3). The second group are L2 writing oriented studies that have mostly used a pre-test–treatment–post-test design and examined the effect of WCF on improving L2 learners’ written accuracy, which, in turn, contributes to overall writing quality; these studies belong to the ‘learning-to-write’ (LW) dimension of L2 writing (see Manchón, 2011, p. 3). Ferris (2010) suggests that the findings of these studies, particularly SLA ones, due to using highly controlled designs, may not be confidently applied to a natural L2 writing classroom. She argues:
L2 writing researchers and practitioners might wonder if, in the interest of empirical rigor, some of the SLA research efforts on written WCF have been so narrowly focused that it would be difficult to transfer their approach and findings to a real writing classroom or to a diverse group of students. These criticisms raise legitimate questions of whether the two bodies of work on written WCF can be compared, should be included in the same reviews, or can contribute to one another, let alone provide practical pedagogical answers to L2 writing practitioners. (p. 186)
She, hence, suggests that future WCF research may set up a mixed design, combining the components of SLA-oriented and L2 writing studies, and address the issues that each has tackled separately. These issues include the inclusion of revision in the WCF study, a focus on more global and complex writing errors, and the study of the long-term effect of WCF on written accuracy as well as on writing quality (see also Bitchener & Storch, 2016; Liu & Brown, 2015). These issues are discussed in more detail below.
1 Types and focus of WCF
WCF has been classified with respect to its types and its focus. As far as WCF types are concerned, a distinction has been made between direct and indirect (or explicit vs. implicit) teacher feedback (Bates, Lane, & Lange, 1993; Ellis, 2009; Ferris, 1995, 2003; Karim & Nassaji, 2018, 2019; Suzuki, Nassaji, & Sato, 2019). Direct feedback means identifying student errors and providing the correct form (Bitchener, Young, & Cameron, 2005). Indirect feedback, on the other hand, indicates in some way that an error exists without explicitly providing the correct form (Ferris, 2003). This may be done in several ways: by underlining or circling the error, thereby indicating and locating the error, or by recording in the margin the number of errors in a given line, thereby showing that an error has occurred somewhere in that line (Bitchener & Storch, 2016). A third category, proposed by Ellis (2009), is metalinguistic feedback, which involves providing learners with some form of comment about what has caused the error. This comment can be provided in the form of an error code (e.g. art = article; prep = preposition), which can be placed over the location of the error in the text or in the margin. Another form of metalinguistic feedback involves providing learners with a metalinguistic explanation of their error in the form of grammar rules and an example or examples of the correct usage. In this form, the teacher assigns a number to each error in the learner’s text and, at the end of the full text, provides the metalinguistic explanation and example(s) beside the relevant number.
With respect to its focus, WCF can be either comprehensive or focused. Comprehensive WCF involves the teacher’s correction of all the errors that the students make in their writing. Focused WCF, on the other hand, refers to targeting specific pre-selected errors. Focused WCF, according to Ellis, Sheen, Murakami, and Takashima (2008), can be highly focused, that is, addressing a single error type, or less focused, which ‘restricts correction to a limited number of pre-selected types’ (p. 356). Lee (2018) calls this type of focused WCF mid-focused feedback, which ranges from two to six different error types selected in advance.
Highly focused studies, mainly SLA-oriented, have aimed at investigating the impact of WCF on the acquisition of a specific linguistic form to see if and to what extent it would lead to interlanguage development (e.g. Han, 2002; Iwashita, 2003). They have mostly focused on the acquisition of English definite and indefinite articles (e.g. Ellis et al., 2008; Sheen, 2007). Ferris (2010) and Hartshorn et al. (2010) argue that focusing on the acquisition of a specific linguistic form does not live up to the expectations of students and teachers in a natural classroom setting, where they expect feedback on all their linguistic errors to improve their written accuracy and, hence, the findings of these studies lack reliability and ecological validity. Lee (2018, p. 3), too, argues that focusing on such a limited number of errors has ‘little pedagogical value’. She adds that ‘such research has taken place in laboratory-like conditions that bear little resemblance to real classroom context.’ Moreover, the forms treated in these studies have very simple linguistic functions and do not contribute substantially to the overall writing quality. Two exceptions, however, are the studies conducted by Shintani, Ellis, and Suzuki (2014) and Suzuki et al. (2019). Shintani et al. (2014) compared the impact of two WCF types (direct WCF and metalinguistic explanation) and revision on Japanese pre-intermediate EFL learners’ accuracy in using the indefinite article and the hypothetical conditional over the course of four weeks. The results of their study showed that the students made significant improvement in their use of the conditional but not in their use of the indefinite article. The results also revealed that direct WCF followed by revision was the most effective type of WCF. They argued that learners attend to the linguistic form that contributes more to the global meaning of the text (here the hypothetical conditional).
Similarly, Suzuki et al. (2019) investigated the interactional effect of WCF explicitness and the type of structure targeted by WCF. They compared the effect of direct CF with and without metalinguistic explanation to indirect CF with and without metalinguistic explanation on the accuracy with which a group of L2 learners used two target structures (i.e. English indefinite article and the past perfect tense) in their revisions as well as in a new writing task. The results of the study showed that all the groups improved their revision accuracy in both target structures while they improved their accuracy in the past perfect tense only in the new writing task. The results also showed that the indirect only group outperformed the other groups in reducing their errors in the past perfect tense only. Similar to Shintani et al.’s (2014) study, the authors argued that since the past perfect tense is more complex or salient than the indefinite article, ‘the learners were more likely to pay attention to the more salient structure or complex structures’ (p. 25).
On the other hand, comprehensive WCF studies have focused on all or a large number of errors made by students in an L2 writing classroom (see, for example, Chandler, 2003; Ferris & Roberts, 2001; Rahimi, 2009, to name a few). These studies have been criticized for being too comprehensive, overloading students’ attentional capacity, and diluting the impact of WCF on certain errors, especially the ones that contribute more to the communicative effect of students’ texts (see Bitchener & Ferris, 2012; Hartshorn et al., 2010; Lee, 2018). That is why Lee (2018) advises following a mid-focused WCF approach, which addresses a more limited number of errors depending on the teaching context and students’ needs. She argues that this approach has more pedagogical value in a natural L2 writing classroom.
To the best of my knowledge, there are only two L2 writing studies that have investigated the impact of focused WCF, adopting a mid-focused approach, on improving L2 learners’ written accuracy. Ferris et al. (2013) investigated the impact of indirect WCF on the three to four most prominent errors made by 10 L2 students in a natural classroom context and interviewed the students on the strategies they used for applying the WCF. Results of retrospective interviews revealed that the students found focused WCF and revision helpful in improving their writing errors. In the second study, Hartshorn et al. (2010) focused on the most frequent errors made by a group of L2 learners. Using a dynamic WCF approach, they tried to make feedback ‘manageable’ by requiring the students to write shorter texts (10-minute paragraphs) to limit the number of errors. The results revealed that this approach to WCF helped students improve the target errors. The study, however, did not provide any evidence as to whether the same results would have been obtained had the students been required to go beyond short paragraphs and write longer essays.
Moreover, although both the aforementioned studies adopted a mid-focused approach and targeted a relatively limited number of errors, their main criterion for choosing the target errors was the learners’ most prominent or frequent errors. Neither study pre-selected the errors based on specific criteria such as their seriousness and complexity and how they would contribute to the text quality, at least as far as the linguistic dimensions were concerned. According to Hartshorn et al. (2010, p. 89), the ‘ultimate quality [of a piece of writing] must be evaluated by its overall communicative effect.’ Bitchener and Ferris (2012) lament the scarcity of WCF studies that target such complex structures. Ferris (2010), too, argues that WCF should target ‘more complex, more problematic errors student writers make: those that obscure meaning and interfere with communication’ (pp. 192–193).
2 Revision
Another unresolved issue in WCF studies is the impact of revision on the improvement of L2 learners’ written accuracy. Revision has been shown to significantly contribute to the uptake and retention of the linguistic forms treated by WCF (Chandler, 2003; Ferris, 2004; Liu & Brown, 2015). Surprisingly, however, very few studies have investigated the extent to which revision practices promote the effectiveness of feedback. Liu and Brown (2015), in their synthesis of research on WCF, report that only 55% of the L2 writing studies they have reviewed have required the learners to revise their writing and 75% of the studies have not mentioned if they had trained the students to revise their writing. Ferris (2014) argues that in spite of the importance of revision, there is not enough evidence confirming whether or not requiring students to revise after feedback can help students write more accurate essays in the long run.
van Beuningen et al. (2012) is one of the very rare studies that have systematically investigated the impact of revision on the reduction of L2 learners’ writing errors in a new text. They investigated, among other things, the effect of revision after feedback on the reduction of L2 students’ writing errors in subsequent essays. They compared the new texts written by four groups of participants after writing and revising their first essays: the groups consisted of teacher’s indirect WCF with revision, teacher’s direct WCF with revision, students’ self-correction with revision, and no feedback without revision. The results showed that the revision groups made more improvement than the non-revision ones and those who had received teacher’s WCF made better revisions and wrote more accurate subsequent essays than those who did not revise their writing or revised after self-correction. This study, however, has mixed a number of variables, including WCF type, students’ proficiency level, and the source of WCF (teacher or student). Hence, one cannot definitively attribute the more accurate subsequent essays to revision alone.
3 Long-term effect of WCF
Another important issue insufficiently addressed in WCF research is the long-term effect of WCF. In fact, the majority of feedback studies, particularly those that have aimed at improving overall written accuracy, rather than acquisition of a single form, have followed the feedback effect in the revision of an essay rather than in the subsequent essays written over time. The findings of these studies, however, do not necessarily indicate if the learners have become conscious of their errors and would not repeat them in the subsequent essays (see Truscott, 2007). Most of the studies that have focused on WCF effect on subsequent essays over time have used a pre-test–treatment–post-test design; the time span between the pre-test and the post-test has normally ranged from 10 weeks to 15 weeks during which the participants have written a few essays the last of which has been considered as post-test (e.g. Fazio, 2001; Ferris, 2006; Hartshorn et al., 2010; Karim & Nassaji, 2018; Kepner, 1991; Polio, Fleck, & Leder, 1998; Rahimi, 2009; Robb, Ross, & Shortreed, 1986; Semke, 1984; Sheppard, 1992). This design, nonetheless, does not precisely show the long-term effect of WCF. Bitchener (2008) contends that, in order to study the long-term effect of WCF and its retention over time, we need a study design that includes a delayed post-test. Such a design requires a delay period between the post-test and the delayed post-test, during which no feedback is provided on the students’ writings. Very few studies, mainly SLA oriented ones, have adopted such a design (see, for example, Ellis et al., 2008). Ferris (2004, p. 51) suggests that ‘it is extremely rare for researchers to compare “correction” versus “no correction” in L2 student writing’, because such a design may put teachers in an ‘ethical dilemma’ of whether or not ‘to withhold it [WCF] from their students’ and also cause negative reactions from the students, who expect teachers to provide WCF on their errors.
An alternative approach would be to investigate multiple instances of the feedback effect over an extended period of time. There are only a few studies that have adopted such a design. For instance, Bitchener et al. (2005) studied the effect of different types of WCF over a 12-week period. They studied the improvement of the students’ accuracy of the past tense, the definite article, and prepositions four times during the experiment. The results revealed that the overall improvement of the participants’ accuracy varied across the four times of writing. In other words, ‘a linear and upward pattern of improvement’ was not observed from one essay to the next. Fazio (2001), too, investigated the effect of different types of WCF on the improvement of written accuracy of high school learners of French as a second language. The study was conducted over five months and students’ written accuracy was compared at three different times during the treatment. The results showed no improvement in the accuracy of the target forms during the treatment. Due to the scarcity of research on the differential effect of WCF over a short and a long period of time and the conflicting results of the existing research, there seems to be an urgent need for further studies investigating the impact of WCF over time.
4 Effect of WCF on writing quality
The contribution of WCF to writing quality is one of the most under-researched areas in WCF studies. High linguistic ability has been considered a significant component of L2 writing ability, so more linguistically accurate writing may be rated as being of a higher quality (Cumming, 1989; Manchón, 2011; Weigle, 2002), as long as quality is tied to language control. Hence, in an L2 writing classroom, WCF should contribute to promoting the communicative effectiveness of writing through improving the learners’ linguistic accuracy (Hartshorn et al., 2010). To date, only a few studies have addressed this issue (Hartshorn et al., 2010; Kepner, 1991; Robb et al., 1986; Semke, 1984). For instance, Semke (1984) found that WCF had no significant effect on the students’ German writing skill, evaluated in terms of written accuracy, fluency, and general language proficiency. Robb et al. (1986) showed that sentence-level WCF could help students with editing their texts but did not contribute to the overall quality of writing.
The studies reviewed above, however, have failed to show whether reported quality improvements are due to the changes made by students in response to teacher WCF. They have basically focused on the improvement of overall quality of writing, which may just as well be due to improvement in content, coherence, and organization. It is not clear if or how much of this improvement can be attributed to the improvement in linguistic accuracy in response to WCF. There is, hence, a need for a study that explores whether overall writing quality, at least as far as clarity of expression and communication of meaning are concerned, can be tied to the improvement of language control that results from teacher feedback.
II The study
1 Objective of the study
To address the above-mentioned issues and controversies in WCF research, the present study aims at comparing the impact of focused vs. comprehensive WCF and revision on reducing L2 writers’ sentence and word errors at two time intervals over the course of one academic semester. The study also aims to explore whether the changes made in the subsequent essays in response to these different WCF methods would improve the students’ writing quality in terms of clarity of meaning and comprehensibility of the content. As a secondary objective, the study investigates whether comprehensive WCF (with and without revision) contributes to improving L2 learners’ overall written accuracy. To this end, the study compares the impact of comprehensive and focused WCF on reducing the learners’ errors not targeted by focused WCF, too.
2 Design
The study follows a time-series design, investigating the impact of WCF at three different time intervals. As mentioned above, one of the shortcomings of L2 writing-oriented WCF studies is that they have largely failed to investigate the WCF effect over time and at different time intervals. Hence, the present study focused on the students’ subsequent essays written almost in the middle and at the end of the study as the first post-test and the second post-test, respectively, to study multiple instances of the feedback effect over time. According to Hatch and Lazaraton (1991, pp. 93–94):
Even though we offer students instruction and they are able to perform well during the instruction period, they may not truly internalize the material unless it is recycled over a fairly long period of time. Since this is the case, longitudinal time-series studies are frequently used to discover how long it takes students to reach the goal and the amount of variability of performance along the way.
3 Research questions
This study was designed to answer the following research questions:
Is focused WCF more effective than comprehensive WCF in reducing the participants’ word and sentence errors?
Does requiring the participants to revise their writing after WCF help them reduce their word and sentence errors?
Is comprehensive WCF more effective than focused WCF in improving the participants’ overall written accuracy?
What WCF approach (focused with and without revision or comprehensive with and without revision) makes a stronger contribution to improving the participants’ writing quality?
III Method
1 Setting and participants
The study was conducted in a natural writing classroom setting. The participants were 78 intermediate French Canadian learners of English as a second language (ESL) of both sexes (28 males and 50 females) with an average age of 22, who had enrolled in four sections of an academic writing course in a Canadian university. The classes were randomly assigned to four groups. The first group received WCF on all their errors and were required to revise their writing after receiving WCF (the comprehensive-revision group) (n = 19); the second group, too, received comprehensive WCF but were not required to revise their writing (the comprehensive-non-revision group) (n = 17); the third group received WCF on their word and sentence errors only and were required to revise their writing after receiving WCF (the focused-revision group) (n = 21); and the last group received focused WCF, but were not required to revise their writing after receiving WCF (the focused-non-revision group) (n = 21). They had all passed two English courses (Reading and Writing 1 & 2), which focused on grammar, reading, and paragraph writing. In order to ensure that the students of the four groups were at the same level of proficiency, an Oxford Quick Placement Test was administered. The results indicated mean scores of 42, 41.64, 42.26, and 42.13 for the comprehensive-revision, comprehensive-non-revision, focused-revision, and focused-non-revision groups, respectively. These scores are roughly equivalent to a B2 level on CEFR or a 72 level on TOEFL iBT. The results of a one-way ANOVA test showed no significant difference between the four groups (F = 1.13, p > 0.05). The classes were taught by four different teachers: the researcher, an experienced writing instructor, and two TAs with long experience in teaching writing. The researcher was the coordinator of the writing courses.
Another group of participants were four writing raters who had a long experience (at least 10 years) in using different rubrics for scoring writing. I employed these raters to score the essays in terms of their quality to eliminate the impact of the instructors’ impression on their evaluation of the writings.
2 Target linguistic errors
As mentioned above, it has been recommended that further research on WCF should focus on the linguistic forms that are ‘more salient and semantically functional’ (Shintani et al., 2014) and that contribute more to the communicative effectiveness of students’ texts (Ferris, 2010). Hence, the present study, following a mid-focused approach, focuses on two of the five error categories introduced by Ferris and Roberts (2001), namely word and sentence structure errors (The other error categories include noun ending, article, and verb). According to Ferris et al. (2013), word and sentence errors create serious and global issues that may lead to the obscurity of expressions and, thereby, incomprehensibility of students’ texts. These two error categories can occur in the form of treatable (rule-governed errors) and untreatable errors (idiosyncratic errors) (see Ferris, 1999); untreatable sentence errors are mainly the errors in sentence structure (typically, word order) and untreatable word errors mainly arise from wrong and inappropriate word choice. Other errors in these two categories include subject-verb agreement error, run-ons, and wrong preposition, which are considered treatable.
Regarding the word order errors, it must be noted that although they have been labeled as untreatable, some might seem rule-governed; for instance, adjectives before noun, or the structure of relative clauses. However, student writing errors of these structures tend to be much more idiosyncratic and cannot be explained or fixed by simply saying ‘subjects come before verbs’ or ‘adjectives come before nouns’ (Ferris, personal communication). For instance, the ordinary rules cannot explain the word order in phrases like ‘an event important enough to report’ or ‘Below come some examples of . . .’. Moreover, as Shintani et al. (2014) argue, ‘what constitutes “treatability” is not just a question of whether or not a feature is rule-based but also the complexity of the rule-based structure’ (p. 123).
3 Writing quality in the present study
Writing quality in the present study mainly refers to the extent to which linguistic accuracy contributes to the clarity of meaning and expressions; this is, in fact, only a part of overall writing quality as measured by standard writing scoring rubrics such as those of TOEFL and IELTS.
4 Materials and instruments
The materials of the study were three out of seven argumentative essays written by the participants of the study during a 15-week academic semester. These were written in week one (time one essay, T1), week eight (time two essay, T2), and week 14 (time three essay, T3). In order to make the essays comparable in terms of difficulty, care was taken to choose the texts and the prompts all from the same area, that is, education. Also, the writing prompts required the students to do similar writing tasks. The prompts were evaluated by three judges, who were experienced writing instructors; they all believed that they were comparable in terms of linguistic and cognitive complexity.
To score the writing quality, I used an adapted version of the TOEFL iBT writing scoring rubric. I only used the linguistic dimension of the rubric. The descriptors of this dimension describe how linguistic properties of the essay contribute to text meaning and comprehensibility (see Appendix 1). The reason for choosing this rubric was that it is a holistic scoring rubric with relatively simple and easy-to-use descriptions, which made it more practical for the raters to score a large number of papers. More importantly, it is a rubric used to score a standardized high-stakes writing test, so it can be considered a valid and reliable scoring rubric. The writing evaluated by this rubric receives a score from zero to five, which is then converted to a 30-point scale.
5 Treatment and procedure
The course focused on critical reading and text-based writing. The students were required to critically read short texts on controversial topics and then write argumentative essays in response to the texts. The classes were held once a week (15 weeks on aggregate with the last class meeting utilized as the final exam session), each class lasting about 165 minutes. In the first class session, the students in all the groups were asked to write an argumentative essay of 500–550 words in response to a short text that discussed whether or not the students should be allowed to use their cellphones at school. They were given a short text on pros and cons of using cellphones at school and were asked to use and react to the ideas in the text. This writing served as the diagnostic essay (T1), providing information about the participants’ level of writing ability as well as their error means in the two target forms.
In weeks two and three, the classes focused on how to read a text critically and write an argument in response to the author’s ideas. From week four through week 14, in each class meeting, the students of all four groups read a short text on a controversial topic, annotated it, and then wrote an argumentative essay in response to the author (seven essays in total). The students wrote on paper and were allowed to use dictionaries. They spent almost all the class time on writing their essays, but those who finished earlier were allowed to leave the class. The average length of the essays was 500 words.
After writing each essay, they handed it in to the course instructor, who gave coded WCF on them. Coding was done based on Ferris and Roberts’ (2001) error category scheme including verb errors (VE), noun ending errors (NE), article errors (AE), wrong word (WW), and sentence structure errors (SSE). WCF for the two comprehensive groups (comprehensive-revision and comprehensive-non-revision) targeted all the learners’ errors, while for the focused-revision and the focused-non-revision groups, it focused only on word and sentence structure errors. All the students received comments on the content and organization of their writing, too. Then, in the following session, the instructors returned the papers to the students; in that session, the instructors and the students in the two revision groups (focused and comprehensive) briefly discussed the topic and reviewed the text based on which they had written the essay and then the students revised their writing in response to the instructor’s WCF. They then turned in the revisions to the instructor for a second round of review. Depending on the number of errors and how serious they were, some of the students were required to write a second revision, this time at home.
The two non-revision groups, after submitting their first draft, were given some time (around 15 minutes) to have a look at their teacher’s WCF and ask questions (if any) about the comments; then, similar to the revision groups, they briefly discussed the topic and reviewed the text on which they had written their essays. Since these students were not required to revise their writing, the rest of the class time was devoted to reading another text on the same topic. They then had some pair and group discussions and did some reading comprehension exercises. Table 1 provides a detailed account of the timeline and essay prompts.
Table 1. Essay prompts and timeline.
IV Data analysis
1 Coding and scoring reliability
Before running any statistical analysis, in order to ensure that all the instructors were familiar with the error categories, I randomly selected five T1 essays and we all provided them with coded WCF. I then had a meeting with the three instructors to review the error categories and the coding procedure. During this session, we compared the comments and the codes and discussed the existing differences to ensure error-coding consistency. In order to ensure that the same procedure was followed in all four classes, I had regular bi-weekly meetings with the three other teachers and reviewed the course outline and method of instruction with them. I also observed their classes from week two to week seven and pointed out any deviation from the planned instruction. Then, in order to ensure inter-coder reliability, we all provided WCF on 40% of the essays randomly selected from T1. I then calculated Cohen’s kappa for inter-rater reliability; the obtained index was 0.86, which is an acceptable agreement index (Landis & Koch, 1977). I followed the same procedure for T2 and T3 essays. The obtained indices were 0.91 and 0.90, respectively.
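The agreement index reported above can be illustrated with a minimal sketch of Cohen’s kappa for two coders. The function name and the sample codes below are hypothetical and are not taken from the study’s data; the source does not state how agreement was aggregated across more than two coders, so only the standard two-coder case is shown.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labelled the same items.

    Hypothetical helper for illustration; not the study's own script.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both coders must label the same non-empty set of items.")
    n = len(labels_a)
    # Observed agreement: proportion of items given the same code by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over codes of the product of the two marginal proportions.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one and the same code throughout
    return (observed - expected) / (1 - expected)


# Invented example using the error codes from the coding scheme (WW, SSE, VE).
coder_1 = ["WW", "SSE", "WW", "VE", "SSE", "WW"]
coder_2 = ["WW", "SSE", "SSE", "VE", "SSE", "WW"]
kappa = cohens_kappa(coder_1, coder_2)
```

In this invented example the coders agree on five of six errors, and kappa discounts the share of that agreement that would be expected by chance given how often each coder used each code.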
As for evaluating the quality of the writings, after the collection of all the essays, I had a meeting with the four raters and introduced the rubric to them. Although they were already familiar with the rubric, similar to what I had done for coding the errors, I randomly selected five papers and asked them to score these essays based on the rubric. Then, we had a meeting during which we reviewed the scores and discussed any conspicuous discrepancy observed in scoring.
Subsequently, in order to ensure inter-rater consistency in scoring quality, after collection of all the essays, I randomly selected 40% of the pre-test (T1) papers and asked all the raters to score them. I then calculated Cohen’s kappa as the index of inter-rater reliability; the obtained index was 0.83, which is a high agreement index (Landis & Koch, 1977). I followed the same procedure for T2 and T3 essays. The obtained indices were 0.84 and 0.82, respectively. Then the remaining essays (T1, T2, and T3) were randomly distributed among the raters for scoring. Neither during the reliability check nor later, while scoring the other essays, did the raters know the order in which the essays had been written or who had written them; the essays were identified by codes (for excerpts of the essays with different levels of quality, see Appendix 2).
2 Statistical analyses
I calculated the errors in the T1, T2, and T3 essays for all the groups based on the procedure suggested by Biber, Conrad, and Reppen (1998); that is, I divided the error count by the number of words in each student’s text and then multiplied the result by a standard number (500) representing the average number of words in the students’ essays.
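The normalization described above can be illustrated with a short sketch. The function name and the example counts are hypothetical; only the basis of 500 words comes from the text.

```python
def normalized_error_rate(error_count, word_count, basis=500):
    """Errors per `basis` words (hypothetical helper; 500 is the basis named in the text)."""
    if word_count <= 0:
        raise ValueError("word_count must be positive.")
    # Multiply before dividing so that whole-number cases stay exact.
    return error_count * basis / word_count


# Invented example: 12 errors in a 400-word essay.
rate = normalized_error_rate(12, 400)
print(rate)
```

Normalizing in this way makes error counts comparable across essays of different lengths: 12 errors in 400 words correspond to 15 errors per 500 words.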
In order to compare the impact of focused vs. comprehensive WCF and revision on the participants’ improvement of written accuracy in the two targeted error categories, I ran two linear mixed-effects model analyses (one for word errors and one for sentence errors). In these analyses, revision and focus were the independent variables, each with two levels (revision vs. no revision and focused vs. comprehensive WCF), and the students’ sentence and word error means on the T1, T2, and T3 essays were the dependent variables. Also, in order to see if comprehensive WCF would help the learners improve their overall written accuracy, I used repeated measures ANOVAs to compare the participants’ error means in the three non-target categories (verb, article, and noun ending) across the three times for all four groups. In addition, three one-way ANOVAs were run on the quality scores of the four groups on the three tests to see which feedback group made more progress in writing quality.
V Results
1 Impact of focused vs. comprehensive WCF and revision on improving word errors
Descriptive statistics for the word error means of the four groups on T1, T2, and T3 essays are presented in Table 2. Table 2 shows that the word error means of the four groups at T1 are very close; results of a one-way ANOVA indicated no significant difference between these error means (F = 0.43, p > 0.05). The means at T2 and T3 show that all the groups decreased their word error means. In order to see which of the target fixed factors (focus and revision) contributed to the reduction observed in the means, a linear mixed-effects model analysis was run.
Table 2. Word error means of the four groups.
Notes. * Normalized frequencies per 500 words. CR = comprehensive-revision; CNR = comprehensive-non-revision; FR = focused-revision; FNR = focused-non-revision.
Results are presented in Table 3. As Table 3 shows, focus (F = 4.03, p < 0.05) turned out to significantly contribute to the reduction of word errors during the experiment; no significant effect was observed for revision, though (F = 2.33, p > 0.05).
Table 3. Results of mixed-effects models.
Since only the effect of focus was significant, three follow-up independent-samples t-tests were run between the error means of the focused (revision and non-revision together) and comprehensive (revision and non-revision together) groups on T1, T2, and T3 essays to see where the difference lay. Results are presented in Table 4. The results indicate that only the difference between the T3 means of the two groups is significant (t = 2.36, p < 0.05). That is, the focused groups made significantly fewer errors on the T3 essay than the comprehensive groups. The effect size for the difference, calculated with Cohen’s d formula, is small (0.61) (see Plonsky & Oswald, 2014).
Table 4. T-test for the difference between focused and comprehensive groups for word errors.
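As an illustration of the effect size reported for this comparison, the sketch below computes Cohen’s d for two independent groups using the pooled standard deviation. This is one common formulation and is assumed here, since the text does not spell out the exact variant used; the function name and the scores are invented for the example.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled SD (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Invented normalized word-error rates for two small groups.
comprehensive = [6.0, 8.0, 7.0, 9.0, 5.0]
focused = [5.0, 7.0, 6.0, 8.0, 4.0]
d = cohens_d(comprehensive, focused)
print(round(d, 2))
```

A positive value here means that the first group (the comprehensive one in this invented example) has the higher error mean.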
2 Impact of focused vs. comprehensive WCF and revision on improving sentence structure errors
The same analyses were carried out for the sentence errors. Table 5 shows the descriptive statistics for the four groups on the three tests. A one-way ANOVA was run between the T1 sentence error means of the four groups to see if they were similar at the beginning of the experiment; the results showed no significant difference between the four groups (F = 0.23, p > 0.05). The means at T2 and T3 show that all four groups decreased their sentence error means. As with the word errors, a mixed-effects model analysis was run to examine the effect of focus and revision.
Table 5. Sentence error means of the four groups.
Notes. * Normalized frequencies per 500 words. CR = comprehensive-revision; CNR = comprehensive-non-revision; FR = focused-revision; FNR = focused-non-revision.
Results are presented in Table 6. These results reveal a significant effect for both focus (F = 16.85, p < 0.001) and revision (F = 37.70, p < 0.01). Hence, I ran two one-way ANOVAs to see if there was a significant difference between the four groups at T2 and T3. The results indicated a significant difference at both T2 (F = 19.82, p < 0.001, η2 = 0.41) and T3 (F = 38.24, p < 0.001, η2 = 0.61). The results of Tukey’s post hoc analysis for T2 essays revealed that the mean for the focused-revision group was significantly lower than those of the focused-non-revision (t = 3.83, p < .05, d = 1.02) and the comprehensive-non-revision (t = 5.71, p < .001, d = 1.53) groups, both with large effect sizes; the other differences were not significant. At T3, the sentence error mean of the focused-revision group was significantly lower than those of the focused-non-revision group (t = 9.55, p < .001, d = 2.55), the comprehensive-revision group (t = 7.07, p < .001, d = 1.89), and the comprehensive-non-revision group (t = 9.80, p < .001, d = 2.62); the effect sizes were larger than those at T2. Also, there was a significant difference between the comprehensive-revision and the comprehensive-non-revision groups (t = 3.19, p < .05, d = .85), with a medium effect size (see Plonsky & Oswald, 2014).
Table 6. Results of mixed-effects models.
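To show how the F and η2 values reported for the one-way ANOVAs relate to the underlying sums of squares, here is a minimal sketch for four independent groups. The function name and the scores are invented, and the sketch does not compute p-values, which require the F distribution.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, eta squared) for a one-way ANOVA over independent groups."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of the scores around their own group mean.
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # Eta squared: proportion of total variance attributable to group membership.
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, eta_squared


# Invented sentence-error rates for four small groups.
groups = [
    [4.0, 5.0, 6.0],
    [6.0, 7.0, 8.0],
    [8.0, 9.0, 10.0],
    [5.0, 6.0, 7.0],
]
f_stat, eta_squared = one_way_anova(groups)
print(round(f_stat, 2), round(eta_squared, 2))
```

In this invented data set the between-group sum of squares is 26.25 and the within-group sum of squares is 8, so F = 8.75 and η2 is about 0.77.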
3 Impact of comprehensive WCF on improving the participants’ overall written accuracy
As mentioned above, the two comprehensive groups, in addition to receiving WCF on the two target forms, received WCF on other error categories (i.e. verb, noun ending, and article errors) as well. In order to see if comprehensive WCF helped the learners reduce these errors, I calculated the error means in these categories for both the comprehensive and the focused groups; then, I ran four repeated measures ANOVAs to see which group was more successful in reducing their errors in these categories.
Table 7 presents descriptive statistics. The data show that all four groups generally reduced their errors from T1 to T3. However, the results of the repeated measures ANOVAs revealed that only some of the observed reductions are significant. The comprehensive-revision group significantly reduced their verb (F = 4.01, p < 0.05, η2 = .22) and noun ending errors (F = 2.49, p > 0.05, η2 = .26) while the focused-revision and the comprehensive-non-revision groups significantly reduced their noun ending errors only (F = 3.19, p < 0.05, η2 = .17 and F = 3.41, p < 0.05, η2 = .20, respectively); none of the other differences turned out to be significant. The results of one-way ANOVAs revealed a significant difference between the noun ending error means of the three groups at T3 only (F = 3.54, p < 0.05, η2 = .15), and the post hoc Tukey’s test revealed that the comprehensive-revision group had a significantly lower error mean than the other two groups.
Table 7. Error means of verb, noun ending, and article categories (M* with SD in parentheses).
Notes. * Normalized frequencies per 500 words. CR = comprehensive-revision; CNR = comprehensive-non-revision; FR = focused-revision; FNR = focused-non-revision.
The findings pertinent to the two target forms, reported in the previous sections, indicated that the focused groups in general outperformed the comprehensive groups in the reduction of their word and sentence errors. However, in order to provide a clear picture of the effect of comprehensive WCF on the participants’ overall written accuracy, I had to see if they made any improvement in the accuracy with which they used the two target forms, too. To this end, I ran repeated measures ANOVAs on the sentence and word error means of the two comprehensive groups (already reported in Tables 2 and 5). The results showed that only the comprehensive-revision group significantly reduced their errors in these two categories from T1 to T3 (F = 12.39, p < .05, η2 = .49, for sentence errors, and F = 4.76, p < .05, η2 = .25, for word errors).
4 Impact of focused vs. comprehensive WCF and revision on improving writing quality
Descriptive statistics for the writing quality of the four groups on the three tests are presented in Table 8. As the results show, the mean writing quality score of all four groups at T1 is around 20; a one-way ANOVA showed no significant difference between the means at T1 (F = 0.66, p > 0.05). Two more one-way ANOVAs were run for the differences between the quality scores of the four groups at T2 and at T3. The results showed that the differences at both T2 (F = 3.15, p < 0.05, η2 = 0.14) and T3 (F = 24.42, p < 0.001, η2 = 0.57) were significant. The results of post hoc Tukey’s tests showed that at T2, only the difference between the focused-revision and the comprehensive-non-revision groups was significant (t = 2.94, p < .05, d = .78), with a medium effect size. At T3, on the other hand, the mean writing quality score of the focused-revision group was significantly higher than those of the comprehensive-revision group (t = 3.36, p < .05, d = .87), with a medium effect size, the focused-non-revision group (t = 10.09, p < .001, d = 2.60), and the comprehensive-non-revision group (t = 7.53, p < .001, d = 1.94), both with a large effect size. The mean for the comprehensive-revision group, in turn, turned out to be higher than that of the comprehensive-non-revision group (t = 3.54, p < .05, d = .91), with a medium effect size.
Table 8. Descriptive statistics for writing quality of the four groups.
Notes. CR = comprehensive-revision; CNR = comprehensive-non-revision; FR = focused-revision; FNR = focused-non-revision.
VI Conclusions
1 Discussion and implications
The present study is one of the first attempts at comparing the effectiveness of focused WCF with comprehensive WCF in helping L2 learners improve their written accuracy over time when WCF targets learners’ global and complex errors (word and sentence errors); the study also investigated the extent to which revision contributes to these two different feedback approaches. Another aim of the present study was to compare the contribution of these feedback approaches to the improvement of the participants’ writing quality, defined in terms of comprehensibility and communicative effectiveness. The study compared the students’ essays at three points during the experiment: week one (T1 essay), week eight (T2 essay), and week 14 (T3 essay).
The results of the study showed that focused WCF was significantly more effective than comprehensive WCF in helping the students reduce their word errors at T3 only; no significant effect was observed for revision, though. Regarding the sentence structure errors, on the other hand, the results showed that the students who received focused WCF and revised their writing were more successful than those who received comprehensive WCF (with or without revision) both at T2 and T3. The focused-revision group was also more successful than the focused-non-revision group in reducing their sentence errors at T3. No difference was observed between the error means of the focused-non-revision and the comprehensive-revision groups and they both made significantly fewer errors than the comprehensive-non-revision group.
As far as focused WCF is concerned, the results corroborate those of Ferris et al. (2013) and Shintani et al. (2014); both studies showed that focused WCF was more effective than comprehensive WCF in helping the students reduce the targeted errors. Shintani et al. (2014) also showed that focused WCF helped the students retain the WCF effect in the long run. The results are also in line with those of Sheen (2007) and Ellis et al. (2008).
The higher efficacy of focused WCF over comprehensive WCF may be explained by Schmidt’s (1990) noticing hypothesis. Focused WCF can enhance the learner’s attention to and noticing of the error, which is the first stage in the cognitive processing of WCF (see Bitchener & Storch, 2016). This amount and quality of noticing does not happen when learners receive WCF on all or a large number of their writing errors. Ellis et al. (2008, p. 368) argue that ‘A mass of corrections directed at a diverse set of linguistic phenomena (and perhaps also at content and organizational issues) is hardly likely to foster the noticing and cognizing that may be needed for WCF to work for acquisition.’ The results support the idea of Bitchener and Ferris (2012), who suggest that focused WCF is especially helpful when the linguistic forms involved are more complex and cognitively more difficult to process, because processing WCF on these errors requires more attention and noticing.
Concerning the impact of revision, the results of the study confirm the findings of van Beuningen et al. (2012) and Shintani et al. (2014) in that revision helped learners write more accurate subsequent essays. Ferris (2004) suggests that ‘the cognitive investment of editing one’s text after receiving error feedback’ is indispensable in the long-term improvement of accuracy (p. 54). Moreover, based on Schmidt’s (1990) noticing hypothesis, uptake happens when noticing the gap is accompanied by revision. Also, according to Qi and Lapkin (2001), requiring the learners to revise after feedback increases the intensity of noticing and makes it more substantive (noticing with a reason or awareness at the level of understanding) than the noticing that happens right after receiving WCF without any further follow-up. The fact that in the present study the comprehensive-revision group was more successful than the focused-non-revision group in reducing their sentence errors confirms the strong contribution of revision to the efficacy of WCF.
The results of the study also showed that the students who received comprehensive WCF and revised their writing were more successful than the other groups in improving their overall written accuracy. That is, the comprehensive-revision group significantly reduced their errors in four out of the five general categories on which they received WCF; they managed to reduce their sentence, word, verb, and noun ending errors. By comparison, the focused-non-revision group reduced their sentence and word errors and the comprehensive-non-revision group reduced their noun ending errors only; the group next to the comprehensive-revision group was the focused-revision group, who reduced their errors in three categories (i.e. sentence, word, and noun ending). These findings imply that although a decision to correct all errors rather than just a certain (important) subset of them entails sacrifices in its effectiveness for that subset, as was the case with comprehensive WCF here, correcting all the students’ errors and requiring them to revise their writing can help them improve their overall written accuracy, which is an aim of many writing classrooms, particularly when the focus is on writing to learn language (Manchón, 2011). This finding has important implications for academic programs in which a very limited number of writing courses are offered and it is not possible to ‘take an incremental and systematic approach to WCF so that all major error types can be covered in the writing course/academic year’ (Lee, 2018, p. 13). In such programs, teachers have to address a larger number of errors in their feedback, and revision helps to increase the students’ degree of noticing their errors.
As for the contribution of WCF to students’ writing quality, the results of the present study showed that, at T2, the focused-revision group wrote essays of higher quality in terms of clarity of meaning and comprehensibility than the comprehensive-non-revision group only; no difference was observed between the other groups. At T3, on the other hand, the focused-revision group wrote essays of higher quality than all the other three groups. At this point, the comprehensive-revision class performed better than the focused-non-revision and the comprehensive-non-revision ones. This last finding, once more, confirms the important role of revision because the comprehensive group who revised their writing performed better than both the focused group and the comprehensive group who did not revise their texts.
The results are different from those of Robb et al. (1986) and Semke (1984) in that they found that the reduction in students’ writing errors after receiving WCF did not contribute to their writing quality. Results are to some extent in line with those of Hartshorn et al. (2010) in that, in their study, improvement in the participants’ written accuracy after feedback contributed to their writing ability scores. However, as mentioned earlier, these studies have focused on overall quality of the essays after the application of WCF, which cannot exactly associate the improvement observed in writing quality with the decline in linguistic errors.
The fact that the focused-revision students wrote essays of higher quality than the other three groups at T3 only might indicate that, although the focused-revision group already wrote significantly more accurate essays than the other groups at T2, after a short period of receiving feedback (at least as far as sentence structure errors are concerned), this improvement in their written accuracy was not yet large enough to contribute to the communicative effectiveness of their writing. In fact, this group, as a result of receiving focused feedback on their global errors only and having more opportunities than the other groups to practice the target forms through revising, became more aware of their weak spots and paid more attention to them; hence, they were able to reduce those errors over time, which promoted the clarity of the ideas they expressed as well as the comprehensibility of their writing, which, in turn, improved their writing quality. An important implication of this finding is that in a natural L2 writing classroom, where time is relatively limited and classes are large, and where the main objective of the course is learning to write (see Manchón, 2011), teachers’ feedback must give priority to the errors that make a more significant contribution to text clarity and comprehensibility. Elimination of these errors would promote the communicative effectiveness of writing, which is an important, or perhaps the most important, aspect of writing quality. This approach to feedback would be more compatible with the demands of L2 writing teachers and students who wish for WCF that is more manageable and at the same time effective. This will promote the ecological validity of WCF, which is missing in many feedback methods and strategies.
2 Limitations and suggestions for further research
The present study adds to previous research by shedding more light on the comparison between the efficacy of focused and comprehensive WCF. However, a few notes of caution are due here.
One limitation of the present study is that, although the results showed that WCF focusing on global and complex linguistic issues is more helpful in reducing the students’ errors in these categories than WCF addressing all the students’ errors, they did not show whether such feedback would contribute to the students’ writing complexity and overall quality. Truscott (2007) and Skehan (1998) argue that there is a trade-off between accuracy and complexity. Polio and Shea (2014, p. 24), on the other hand, argue that ‘Accuracy is certainly part of writing quality but errors may be more salient than complexity, so it may not be obvious to teachers or their students that language [complexity] is improving.’ Housen, Kuiken, and Vedder (2012) also argue that although complexity, accuracy, and fluency are considered distinct aspects of L2 proficiency, it ‘does not exclude the fact that they can be interrelated and that they may interact in the processes of L2 production and L2 development’ (p. 7). They refer to the evidence indicating that these features of L2 production interact with each other and that they are ‘mutually supportive’ and, at the same time, may be competitive. Hence, more research is needed to assess the impact of focused WCF on learners’ writing complexity.
Another issue that must be addressed here is that, in the present study, quality was assessed in terms of clarity of expression of ideas and comprehensibility of writing; however, McNamara, Crossley, and McCarthy (2010, p. 63) contend that ‘more complex syntax, greater lexical diversity, and less frequent words may be reflective of more sophisticated, skilled language production,’ although such a text is more difficult to comprehend. This is, to some extent, different from the criteria against which quality was evaluated in the present study; that is, the descriptions in the rubric focused on how clearly the ideas have been presented and whether or not meaning is obscured. Hence, further research is needed to find out if this kind of feedback would equally promote writing quality seen from this perspective (i.e. using more sophisticated structures and words). Furthermore, as discussed above, the focused groups, especially the focused-revision group, reduced their errors in other categories not addressed by WCF and, as a result, improved their overall written accuracy. Hence, we cannot be sure whether the raters scored the writings of this group only on the extent to which they effectively communicated their ideas or whether they were, at least partially, influenced by the linguistic accuracy of the texts. Rezaei and Lovorn (2010) have shown that even when using a rubric that mainly focuses on the content of writing, raters might be influenced by grammatical features of student writing as well. Hence, further research must tackle this issue by conducting stimulated recall interviews with the raters to find out what features of the texts they take into consideration while rating writing quality.
A final note of caution is that a tentative analysis of the effect sizes for the differences between T1 and T3 errors and writing quality for each group showed high gains, especially for the focused-revision group (over 2 for sentence and word errors and over 4 for writing quality). These effect size values are much higher than what has already been reported in the literature. Although such high gains might be partially explained by the high degree of noticing due to focused WCF, already discussed above, a more detailed analysis of the nature of each group’s errors in the first essay (whether they were treatable or untreatable) could also explain such high gains. WCF focusing on treatable errors might lead to higher gains than WCF that focuses on untreatable errors or targets all student errors. Of course, claims about the efficacy of one type of WCF over another can be made only when the gains are compared with those of a control group who receives no WCF. Hence, further research, replicating the present study but including a non-feedback group in the design and analysing in more detail the nature of the errors targeted by WCF, will need to be undertaken.
Appendix 1. Quality scoring rubric
| 5: Occasional language errors that may be present do not result in inaccurate or imprecise presentation of content or connections. |
| 4: The essay is scored at this level if it has more frequent or noticeable minor language errors, as long as such usage and grammatical structures do not result in anything more than an occasional lapse of clarity or in the connection of ideas. |
| 3: The errors of usage and/or grammar may be more frequent or may result in noticeably vague expressions or obscured meanings in conveying ideas and connections. |
| 2: The response contains language errors or expressions that largely obscure connections or meaning at key junctures, or that would likely obscure understanding of key ideas for a reader not already familiar with the reading and the lecture. |
| 1: The language level of the response is so low that it is difficult to derive meaning. |
| 0: The essay is written in a foreign language, consists of keystroke characters, or is blank. |
Appendix 2. Excerpts from a below-average and an above-average quality essay
Acknowledgements
I would like to thank Professor Dana Ferris for her help, support, and encouragement. I would also like to thank Dr. Mahboobeh Saadat and Mr. Ali Kushki for their insightful comments on the earlier version of the article. Last but not least, I wish to express my gratitude to the two anonymous reviewers for their insightful and encouraging comments. Any remaining errors are, of course, mine.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
