Abstract
Collaborative writing (CW) has proven advantageous for enhancing the second and foreign language skills of university students. However, little research to date has explored whether CW practices are fruitful for secondary school learners in foreign language (FL) contexts, a population characterized by low language proficiency levels and few opportunities to engage with the FL. The present classroom-based study examines CW in this setting and aims to determine whether CW fosters language opportunities, operationalized as language-related episodes (LREs), which will allow learners in low-input scenarios to compose better texts. Two parallel intact classes were studied: a control group (n = 16) which produced an argumentative essay individually, and an experimental group (n = 16) which did so in pairs while recording their interactions. The findings revealed that the pairs produced shorter but more accurate and slightly more lexically and grammatically complex texts and obtained higher scores in content, structure and organization. Collaboration afforded students the opportunity to pool ideas, deliberate over language use, and provide each other with feedback (collective scaffolding). Most importantly, collaborating seemed to be beneficial for all intermediate secondary learners and, thus, a useful strategy for improving FL writing skills in the secondary school context.
I Introduction
Negotiating for meaning is considered essential for second language (L2) learning (e.g. Long, 1996; Pica, 1994; Swain, 1985) and offering learners opportunities for authentic discussions therefore seems paramount for effective language learning. Foreign language (FL) settings have been characterized as fundamentally different from L2 contexts in terms of both the quantity and the quality of input (García Mayo & García Lecumberri, 2003; Muñoz, 2006), as well as in the opportunities to engage with the FL in and outside the school context, because contact with the FL is mostly confined to the actual language class (Lasagabaster, 2008). In practice, exposure to the FL seems to be further reduced, since not all classroom talk is undertaken in the target language (Eurydice, 2012) and teacher talk still dominates in the classroom (Ball, Kelly, & Clegg, 2015). However, a mere increase in the quantity and quality of input has been suggested to be insufficient to develop language skills (García Mayo, 2008; García Mayo & Villarreal, 2011), while greater FL use has been associated with higher FL proficiency scores (European Commission, 2012). These findings emphasize the need to foster learner-centred practices where students are actively engaged in the creation of knowledge and take responsibility for their own learning (Manzano Vázquez, 2015). Consequently, it seems that opportunities for meaningful exchanges in the FL need to be pursued and boosted to raise the current low FL proficiency levels of secondary school learners (European Commission, 2012).
Collaborative tasks place the learner at the centre of the learning process and have been claimed to promote learners’ autonomy and genuine involvement, to foster interaction and knowledge co-construction and to increase speaking time (McDonough, 2004; Storch & Wigglesworth, 2007). They also offer ample opportunities for genuine negotiation of meaning and have been claimed to enhance students’ oral skills in L2 contexts (for a review, see Storch, 2013). Likewise, a growing number of studies have suggested that collaborative written tasks such as dictoglosses, text reconstruction tasks, or jointly written essays afford language learning opportunities for university L2 (see Storch, 2005, 2011) and FL students alike (e.g. Azkarai & García Mayo, 2015; Fernández Dobao, 2012; Shehadeh, 2011). However, research supporting those benefits in secondary school contexts is scarce (Basterrechea & García Mayo, 2013; Kim & McDonough, 2011; Kuiken & Vedder, 2002) and mixed results have been reported. Consequently, in order to determine whether collaborative written tasks boost the language skills of secondary learners of English as a foreign language (EFL) and their validity as effective pedagogical tools, the current study examines the process and product of a collaborative writing experience. In particular, it analyses the collaborative dialogues generated and contrasts pair and individual drafts in terms of complexity, accuracy and fluency as well as in holistic ratings of content, organization, structure and register.
II Literature review
1 Collaboration in the L2 classroom
Collaborative writing (CW) refers to the co-construction or co-authorship of a text (Storch, 2011) and has recently attracted the attention of researchers and instructors, as pair and group work have become central to the language classroom (Shehadeh, 2011; Storch, 2005). However, many teachers have been reluctant to implement partner activities in their classrooms because they are challenged by assessment issues (McDonough, de Vleeschauwer, & Crawford, 2018), unsure about how to best pair students (Storch & Aldosari, 2013), monitor equal participation (McDonough & García Fuentes, 2015) or because learners may use their first language (L1) (Brooks & Donato, 1994). In fact, when collaboration has been introduced, it has been generally for brainstorming ideas prior to the writing activity itself, or for obtaining feedback from the teacher or peers on the drafted or completed text (Storch & Wigglesworth, 2007).
Collaborative activities increase learners’ speaking time and promote autonomy and involvement while they reduce anxiety, and thus boost learners’ confidence (McDonough, 2004). This increase in speaking time is particularly important in secondary classes because teacher talk still dominates in the classroom (Ball et al., 2015), and therefore, tasks which encourage students’ output become crucial for negotiation of meaning.
The social constructivist perspective of learning (Vygotsky, 1978) establishes that cognitive and linguistic development occur through scaffolded interaction. Additionally, L2 research has demonstrated that peers can provide carefully attuned support and push learning (Donato, 1994; Guerrero & Villamil, 2000; Ohta, 2000). Swain (2000, 2006) conceives interaction as a ‘collaborative dialogue’ where learners engage in a joint problem-solving and knowledge-building activity which is mediated by language (‘languaging’ in Swain’s terms (Swain, 2006)). ‘Languaging’, operationalized as language-related episodes (LREs), refers to episodes where there are interactional modifications as a result of students’ attention to form (Schmidt, 1990). Through ‘languaging’, students ‘pool their knowledge sources and co-construct new language or consolidate knowledge’ (Storch, 2013, p. 17). Thus, LREs represent L2 learning in progress (Swain & Lapkin, 1998, 2001; Swain & Watanabe, 2012).
2 Language-related episodes (LREs) in collaborative writing
Analysis of pair talk in CW has revealed that collaboration provides students with the opportunity to give and receive immediate feedback on language (Storch & Wigglesworth, 2007), mostly on grammar and vocabulary choices (Fernández Dobao, 2012; Mozaffari, 2016; Storch & Aldosari, 2013). Negotiations over language forms (LREs) while co-constructing meaning are considered a source of language learning and development (Bueno-Alastuey, 2013; Storch, 2013; Swain & Watanabe, 2012) as they mediate different processes such as the gain and co-construction of new knowledge or the consolidation of existing knowledge (Storch, 2013; Swain & Watanabe, 2012). Various factors have been recognized as influencing the nature and quantity of such dialogues including L2 proficiency (Kim & McDonough, 2008; Leeser, 2004), engagement level (Storch & Aldosari, 2013), task type (Alegría de la Colina & García Mayo, 2007; Azkarai & García Mayo, 2015; Swain & Lapkin, 2001), pairing method (Mozaffari, 2016) and number of participants (Fernández Dobao, 2012).
Studies have also examined the outcome of the LREs generated according to their resolution (Leeser, 2004; Swain, 1998). Overall, it has been shown that most LREs are resolved correctly (Alegría de la Colina & García Mayo, 2007; Fernández Dobao, 2012; Storch & Aldosari, 2013). Timely and developmentally appropriate support is associated with enhanced performance on the collaborative task and/or subsequent (tailor-made) post-tests (Kim, 2008; Swain, 1998; Swain & Lapkin, 2001).
A closer scrutiny of the LREs produced has also revealed that collaborating students sometimes resort to their L1 as a mediator (Swain & Watanabe, 2012). The L1 has been identified to serve various functions (Antón & DiCamilla, 1998) and help learners to pursue task goals, stimulate reflection and restructuring of the L2 (Azkarai & García Mayo, 2015; Guerrero & Villamil, 2000).
The cumulative findings of this research have shown that CW practice affords language learning opportunities and even enhances performance by stimulating discussion and maximizing the opportunities for meaningful interaction in the target language in response to the linguistic difficulties experienced by the learners. However, few studies have investigated the merits of collaborative tasks for secondary EFL learners, a population that has traditionally been given limited opportunities for engaging in discussions in and about the FL, most probably as a result of the language/content dichotomy pervasive in mainstream EFL classes (Basterrechea & García Mayo, 2013).
3 Effects of collaborative writing
a Quantitative measures
Studies conducted predominantly among university students have contrasted texts completed individually, in pairs, or in groups in terms of complexity, accuracy and fluency (CAF). Converging evidence was reported for fluency and accuracy. Fluency does not increase through collaboration. Pairs and groups, overall, write shorter texts, probably due to the lack of sufficient writing time for the pairs (Fernández Dobao, 2012). Conversely, higher accuracy rates have been consistently obtained for advanced L2 learners writing reports or argumentative texts (e.g. Storch, 2005; Storch & Wigglesworth, 2007; Wigglesworth & Storch, 2009) and for intermediate FL students writing picture stories, problem/solution or cause/effect one-paragraph drafts (e.g. Fernández Dobao, 2012; McDonough & García Fuentes, 2015). The success of pairs has been attributed to deliberations over language and the opportunities to pool their resources. The advantage is not so evident for secondary school students, however. Basterrechea and García Mayo (2013) compared secondary EFL and Content and Language Integrated Learning (CLIL) learners’ performances and reported that collaboration was only beneficial for CLIL learners. The lack of differences after CW among EFL learners led them to conclude that collaborative practices might be more suitable for learners who were already following a task-based methodology programme.
Finally, complexity analyses have yielded mixed results. Studies targeting syntactic complexity have shown no advances for collaborators (but see Storch, 2005 for positive results) and the lack of improvement has been attributed to the aspects analysed (Wigglesworth & Storch, 2009), insufficient writing time (Fernández Dobao, 2012) or the limited attentional resources of the students (McDonough et al., 2018). An alternative explanation could be related to task design and task requirements (Pallotti, 2009). Pallotti (2009) reported that lexical and syntactic complexity only increase if the task and its goals require it. Consequently, increases in complexity should only be expected when required by the communicative goals of the task. Lexical complexity in CW has rarely been explored. The variety of measures employed (type/token ratios, the Guiraud index, and mean segmental type–token ratios) has hindered comparability and yielded contradictory results: some studies favour collaboration (Ahmadian, Amerian & Tajabadi, 2014; Fujiwara & Sato, 2015; Kim, 2008), while others have not identified lexical advantages for collaborating students (Fernández Dobao, 2012; Nassaji & Tian, 2010). To the best of our knowledge, no other study has explored the effect of collaboration on lexical diversity in secondary settings. Since the exclusive use of grammatical measures to analyse the effects of collaboration on complexity is believed to be a limitation of research to date (Tavakoli & Rezazadeh, 2014; Wigglesworth & Storch, 2009), our study attempted to address this gap by analysing the lexical complexity of individuals and pairs after CW practice based on two lexical complexity indexes: type–token ratio (TTR) and VOCD-D.
b Qualitative measures
The effectiveness of CW practices has been established largely in terms of discourse analytic measures, and few studies have employed holistic qualitative scales such as rubrics (e.g. McDonough & García Fuentes, 2015; Storch, 2005; Shehadeh, 2011). Rubrics have been said to influence learning positively when combined with other instructional interventions including peer-assessment, meta-cognitive or scaffolding of writing (Panadero & Jonsson, 2013). Likewise, they are aligned with current views on education in which formative assessment is prioritized (Ball et al., 2015), fostered in secondary education (Decreto Foral 25/2015) and are ecologically valid ways of accounting for possible improvements experienced through systematic engagement (McDonough & García Fuentes, 2015). In addition, the combination of the results that spring from the analyses of holistic and analytic measures should generate a more reliable representation of the effects of CW on the quality of the texts produced. To date, conflicting results have been reported for holistic ratings, probably influenced by the distinct task types and components measured. Some authors have reported no differences in writing ratings (McDonough & García Fuentes, 2015; McDonough et al., 2018), while others (Khatib & Meihami, 2015; Shehadeh, 2011) have found benefits for content, organization, vocabulary, mechanics and grammar after sustained and prolonged collaboration.
Considering that rubrics have rarely been used to assess the effect of CW, that most research has focused on university students, and that research on secondary contexts has yielded mixed results, our research seeks to analyse the effect of CW in these contexts as they have been identified as low-input FL contexts (Lasagabaster, 2008), where fostering and maximizing interaction opportunities is paramount. The following research questions guided this study:
Do students engage in collaborative dialogues operationalized as LREs? What type of LREs do they generate? What is the outcome of these LREs?
Does collaboration among intermediate EFL secondary learners lead to better argumentative essays in terms of quantitative measures of complexity, accuracy and fluency (CAF)?
Does collaboration among intermediate EFL secondary learners lead to better argumentative essays in terms of qualitative measures of content, structure, organization of ideas and register?
III Methodology
1 Participants and instructional context
The participants were 32 Basque–Spanish bilinguals aged 16–17 who were learning English as a FL in their first year of non-compulsory secondary education in a city in Northern Spain. In a background questionnaire, students reported not having any meaningful contact with the FL outside school. The participants came from two classes that followed the same course book, View Points 1 (Grant & Payne, 2013), and were taught by the same teacher. Each book unit focused on a different writing genre, such as descriptions, reports, letters or argumentative essays, considered common text types for the CEFR B1 reference level (Council of Europe, 2001). This B1 or intermediate level can be mapped to a 4.0–5.0 IELTS level or a 42–72 TOEFL score. The book was the main teaching resource used during the classes and the teacher relied on it to plan her classes, to teach, and to evaluate students.
Instruction was largely teacher-directed, and the teacher took on the roles of class manager, authority source for the L2, and provider of feedback. Furthermore, the teacher was reluctant to employ group or pair work due to issues of noise and class management, and concerns that students would stray off-task and revert to using their L1, both of which are concerns that teachers have expressed about group work (Baines, Blatchford, & Webster, 2015; Brooks & Donato, 1994; Ghorbani, 2011). Students, thus, had few opportunities for interacting in and about the target language, which is considered paramount for language learning.
At the outset of the study, students were given a placement test (Allan, 1992) to gauge their language level and to see whether initial differences arose between the groups. A chi-square test determined that there were no statistically significant differences between the two groups (χ² = 4.19, df = 3, p = 0.242). Thus, the division into experimental (EG, n = 16) and control groups (CG, n = 16) was determined at random. Based on the placement results, students in the EG were classified as lower-intermediate, intermediate or upper-intermediate, and these levels were also considered for the pairings. Since previous studies indicate that collaboration tends to occur mainly among similar language ability pairs (Storch & Aldosari, 2013), parallel level pairs were formed. Out of the 8 pairs, 3 were upper-intermediate, 4 were intermediate and 1 was lower-intermediate.
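The reported p-value can be cross-checked from the test statistic and degrees of freedom alone. The following is a minimal sketch, not the authors' analysis script: the function name `chi2_sf_df3` is hypothetical, and the closed-form expression it uses holds only for a chi-square distribution with 3 degrees of freedom.

```python
import math


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with df = 3.

    Closed form valid for df = 3 only:
        P(X > x) = erfc(sqrt(x / 2)) + sqrt(2x / pi) * exp(-x / 2)
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Statistic reported in the text: chi-square = 4.19 with df = 3.
p_value = chi2_sf_df3(4.19)
print(f"p = {p_value:.3f}")
```

Because the value lies well above the conventional 0.05 threshold, the check agrees with the reading that the two classes did not differ at the outset.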
2 Materials
Three different instruments were used for data collection: (1) a pre-test: an individually written argumentative text; (2) an experimental task: a second argumentative text written individually by the CG and in pairs by the EG, and (3) the recordings from four randomly selected collaborating pairs.
The task, the composition of an argumentative text, was chosen in collaboration with the teacher of the English class. Three reasons guided the choice of task: first, it was considered a suitable task as it is authentic and part of their English syllabus and textbook; second, students at this educational stage are familiar with the genre as it is a common genre in the University Entrance Exam most students take at the end of this stage. Students practise it not only in the English classes but also in their Basque and Spanish literacy classes. As they had been writing such meaning-focused texts in their other two languages and considering the learning of languages is an interdependent process (Cummins, 1976, 1979), it was assumed that students could resort to their integrated source of thought to transfer the abilities and skills needed into the FL (Lasagabaster, 1998; Sagasta & Etxeberria, 2008); and finally, in meaning-focused tasks such as argumentative texts, attention to language occurs incidentally, triggered by production or comprehension difficulties (Storch, 2013). Learners’ attention focuses on ‘those forms that are driven by the learners’ own needs and which are therefore more likely to be within their linguistic knowledge range’ (Storch, 2013, p. 54), especially among learners in the lower proficiency bands (Alegría de la Colina & García Mayo, 2007; Leeser, 2004; Nassaji & Tian, 2010). However, to the best of our knowledge, no studies have used argumentative texts with mid-level EFL students in FL secondary settings.
The recordings of the students’ interactions formed the third source of information. The pairs recorded their own interactions in class by using their personal cell phones. Students were asked to email their files to the investigator after class.
3 Procedure
The study was carried out as part of the regular coursework over three weeks. The procedure involved five different sessions. In the first session, the placement test (Allan, 1992) was administered (40 minutes). Based on the scores obtained, parallel level pairs were formed in the EG.
In the second session, each group received one instruction session (55 minutes), designed to revise and reinforce students’ prior knowledge about argumentative texts and the English specific elements that should be included in them. After a short brainstorming task (5 minutes), the class was divided into three groups. Each group had to arrange a jumbled ‘for and against’ essay (10 minutes) into the correct order. The structure of argumentative texts was then explained and a chart with connectors was also completed (15 minutes). The last 15 minutes were devoted to eliciting the plan students should follow when writing argumentative texts together with some points to consider.
In the third session, all students composed the first argumentative text, the pre-test. To have initial measures, the essays were written individually. Students were given 25 minutes to write, using this prompt: Exams are an important part of education in many countries. Are they necessary? Discuss the advantages and disadvantages of exams and the role they should play in education. (Maximum 150 words)
In the fourth session, the students wrote the second argumentative text, which was the experimental task. In this case, the CG wrote it individually while the EG composed their text in pairs. The EG voice-recorded their writing process. They were given the following prompt: New technologies are becoming more and more common. Many children already use mobile phones and social networks. Should they be allowed to use them? Discuss the advantages and disadvantages of using new technologies and the role they should play among children. (Maximum 150 words)
The students completed the tasks within the time limit. As pairs take longer to complete tasks than individuals (Fernández Dobao, 2012; Storch, 1999, 2005; Storch & Wigglesworth, 2007; Wigglesworth & Storch, 2009), pairs and individuals were allocated a different amount of time: 40 minutes for pairs, and 25 minutes for individuals.
4 Data analysis
The analysis was based on two different sets of data: fifty-six (56) argumentative essays (32 individual texts from the pre-test and 24 from the experimental task, of which 16 were individual and 8 were pair compositions) and the recordings produced by four pairs which had been randomly selected. The proficiency level of the pairs was upper-intermediate (EG1–2 and EG3–4) and intermediate (EG7–8 and EG9–10). The recordings were 111 minutes 55 seconds in total. The quickest pair took 13 minutes 54 seconds while the slowest one took up the entire 40 minutes available; the remaining two pairs took 22 minutes 28 seconds and 35 minutes 33 seconds, respectively. The recordings selected were transcribed verbatim.
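The corpus sizes stated above can be verified with simple arithmetic. The sketch below uses only the figures given in the text; the helper name `to_seconds` is an illustrative assumption.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a (minutes, seconds) duration to seconds."""
    return minutes * 60 + seconds


# Recording lengths reported for the four selected pairs (minutes, seconds).
durations = [(13, 54), (40, 0), (22, 28), (35, 33)]
total_seconds = sum(to_seconds(m, s) for m, s in durations)
total_min, total_sec = divmod(total_seconds, 60)

# Essays: 32 pre-test texts, then 16 individual and 8 pair texts in the experimental task.
total_essays = 32 + 16 + 8

print(f"Recordings: {total_min} min {total_sec} s; essays: {total_essays}")
```

Both totals match the figures reported in the paragraph (111 minutes 55 seconds and 56 essays).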
a Criteria for the analysis of the pair dialogues
First, all LREs were identified and categorized for focus: (1) lexis-focused (L-LRE) episodes, in which learners searched for words, considered alternative expressions, or explained the meaning of words or phrases; (2) form-focused (F-LRE) episodes in which learners deliberated over morphology (e.g. word forms) or syntax (e.g. length and order of sentence); and (3) mechanics-focused (M-LRE) in which learners focused on the spelling of words or punctuation. Finally, the outcomes of the LREs that had been generated were considered and tallied as correctly resolved, incorrectly resolved or unresolved (Leeser, 2004; Swain, 1998).
b Criteria for the analysis of compositions
The analysis of the essays followed Storch (2005) and examined quantitative CAF measures and qualitative measures of content, structure, organization and register, comparing tasks written individually and in pairs.
First, quantitative measures of complexity, accuracy and fluency were analysed. Complexity included grammatical and lexical features. Grammatical complexity measures examined the degree of embedding in a text (Wolfe-Quintero, Inagaki, & Kim, 1998) through the analysis of the proportion of clauses to T-units (C/T) (Foster & Skehan, 1999) and the percentage of dependent clauses to clauses (DC/C). Following Hunt (1966, p. 735), a T-unit is defined as ‘one main clause plus whatever subordinate clauses happen to be attached to or embedded within it’. Clauses were classified as independent or dependent: an independent clause is one which can stand on its own (Richards, Platt & Platt, 1992), while a dependent clause must be used with another clause to form a grammatical sentence in English. Following Foster, Tonkyn and Wigglesworth (2000), a dependent clause is one containing a finite or a non-finite verb and at least one additional clause element of the following: subject, object, complement or adverbial. Lexical complexity was investigated using type–token ratio (TTR) and VOCD counts (D index) (Malvern & Richards, 2000; Malvern, Richards, Chipere, & Durán, 2004). These two measures of lexical complexity were obtained with the help of the VOCD command of the CHILDES project’s CLAN software (MacWhinney, 2000). To ensure comparability with other L2 acquisition studies, the TTR index is also reported, although D has been claimed to be a much more reliable measure (MacWhinney, 2000; Malvern et al., 2004) and less sensitive to text length (MacWhinney, 2000).
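To make the complexity ratios concrete, the sketch below computes C/T, DC/C and TTR for a single invented sentence. It is an illustration under stated assumptions, not the procedure used in the study: the sentence, the hand-coded clause counts, the function names and the simple regular-expression tokenizer are all hypothetical, and CLAN applies its own tokenization rules. The D index is not sketched here because it requires a curve-fitting procedure over repeated random samples.

```python
import re


def type_token_ratio(text: str) -> float:
    """TTR = distinct word forms / total word forms (naive lowercase tokenizer)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    return len(set(tokens)) / len(tokens)


def clauses_per_t_unit(clauses: int, t_units: int) -> float:
    """C/T: average number of clauses per T-unit."""
    return clauses / t_units


def dependent_clause_share(dependent_clauses: int, clauses: int) -> float:
    """DC/C: dependent clauses as a percentage of all clauses."""
    return 100 * dependent_clauses / clauses


# Invented sentence: one T-unit with one main clause and two dependent clauses
# ("because exams show ..." and "what students know"), coded by hand.
sample = "Exams are useful because exams show what students know"

ttr = type_token_ratio(sample)                 # 8 types over 9 tokens
c_per_t = clauses_per_t_unit(clauses=3, t_units=1)
dc_share = dependent_clause_share(dependent_clauses=2, clauses=3)

print(f"TTR = {ttr:.2f}, C/T = {c_per_t:.1f}, DC/C = {dc_share:.1f}%")
```

Because TTR divides by the number of tokens, it tends to fall as texts get longer, which is one reason a length-robust index such as D is preferred when texts differ in length.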
To measure accuracy, the proportion of error-free clauses to total clauses (EFC/C), error-free T-units to total T-units (EFT/T), and number of errors to words were calculated (Fernández Dobao, 2012; Storch, 2005; Storch & Wigglesworth, 2007; Wigglesworth & Storch, 2009). Global measures of accuracy (EFC/C and EFT/T) represent a realistic measure of accuracy (Foster & Skehan, 1999). However, according to Storch (2005), it is also important to use local units (errors per word) because they account for the exact distribution of errors in relation to words. Errors comprised three categories: (1) grammatical errors, which include syntactical (e.g. errors in word order, missing elements) and morphological errors (e.g. verb tense, subject–verb agreement, errors in use of articles and prepositions, errors in word forms); (2) lexical errors, which include confusion of word choice; and (3) mechanical errors in spelling and punctuation (Fernández Dobao, 2012; Wigglesworth & Storch, 2009).
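The three accuracy measures are simple ratios over hand-coded counts. The following sketch shows one way to compute them; the class name, field names and the counts for the example text are invented for illustration and do not come from the study's data.

```python
from dataclasses import dataclass


@dataclass
class AccuracyCounts:
    """Hand-coded counts for one text (all values here are hypothetical)."""
    words: int
    clauses: int
    error_free_clauses: int
    t_units: int
    error_free_t_units: int
    errors: int


def accuracy_measures(c: AccuracyCounts) -> dict[str, float]:
    """Return the two global measures and the local errors-per-word measure."""
    return {
        "EFC/C": c.error_free_clauses / c.clauses,
        "EFT/T": c.error_free_t_units / c.t_units,
        "errors_per_word": c.errors / c.words,
    }


# Invented 150-word text with 20 clauses, 10 T-units and 9 errors.
example = AccuracyCounts(
    words=150, clauses=20, error_free_clauses=15,
    t_units=10, error_free_t_units=6, errors=9,
)
measures = accuracy_measures(example)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

In this invented case EFT/T is lower than EFC/C because a single error disqualifies a whole T-unit, which is why the local errors-per-word measure is a useful complement.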
For the quantitative fluency measures, in addition to the number of T-units and clauses, three length measures were calculated: number of words, words per T-unit and words per clause. These measures are considered the best way of describing fluency development (Wolfe-Quintero, Inagaki, & Kim, 1998).
Second, texts were analysed qualitatively using holistic measures of content, structure, organization and register with four score bands based on Shehadeh (2011): 4 = very good; 3 = good/average; 2 = fair/poor; and 1 = very poor. In short, content targeted relevance to the topic and the quantity and quality of arguments. Structure assessed the presence or absence of the expected parts of the text and its length. Organization evaluated fluency of expression, cohesive devices and clarity of ideas. Finally, register assessed the adequacy of the language for the purpose.
c Statistical analysis of results
To ensure coding reliability, 10% of the data were independently coded by the two researchers. The transcripts were compared and differences discussed until a total intercoder agreement was reached. Finally, the remaining data were distributed and coded. When uncertainties arose, they were discussed until agreement was reached.
Various t-tests for independent and dependent samples were run to determine whether between- and within-group differences arose and whether a distinct performance between the EG and CG could be statistically established. However, the small sample size of students writing collaboratively could affect the statistical results obtained, and not finding an effect might be due to low statistical power rather than to a lack of treatment effects (Larson-Hall, 2010). An increase in the means will therefore be interpreted as a sign of improvement, which should be confirmed by further research with a larger sample. This limitation notwithstanding, the fact that the study was undertaken in a real school, controlling for as many variables as possible, adds to its ecological validity (García Mayo & Villarreal, 2011).
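For readers unfamiliar with the two kinds of t-test mentioned above, the sketch below computes the test statistic for an independent-samples comparison (e.g. EG vs. CG) and for a dependent-samples comparison (e.g. pre-test vs. experimental task within one group). It is an illustration only: the text does not say which variant or software was used, so the pooled-variance (Student) form is an assumption, the function names are hypothetical, and the scores are invented.

```python
import math
from statistics import mean, variance


def independent_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Student's t for two independent samples (pooled variance); returns (t, df)."""
    n_a, n_b = len(a), len(b)
    pooled = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    se = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / se, n_a + n_b - 2


def paired_t(before: list[float], after: list[float]) -> tuple[float, int]:
    """t for two dependent samples, computed on the paired differences; returns (t, df)."""
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    se = math.sqrt(variance(diffs) / n)
    return mean(diffs) / se, n - 1


# Invented scores, for illustration only.
t_between, df_between = independent_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
t_within, df_within = paired_t([1, 2, 3, 4], [2, 4, 3, 6])
print(f"between groups: t({df_between}) = {t_between:.2f}")
print(f"within group:   t({df_within}) = {t_within:.2f}")
```

With samples as small as those in the present design, the standard error in the denominator is large, which illustrates the power concern raised in the paragraph above.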
IV Results
1 The process of collaborative writing: language-related episodes (LREs)
The LREs were categorized as grammatical (F-LREs), lexical (L-LREs) or mechanical (M-LREs). Interaction related to language generated a total of 69 LREs (see Table 1). The results show that there were similar numbers of grammatical and lexical LREs (44.93% F-LREs and 42.02% L-LREs). When the results from the individual pairs were analysed, grammar discussions were found to be favoured by two of the pairs, EG1–2 (47.06%) and EG7–8 (63.63%), while lexical issues were the main source of discussion for EG3–4 (50%) and EG9–10 (54.84%). Yet, even the pairs who favoured lexical discussion targeted grammatical points about 40% of the time. Meanwhile, the more grammar-oriented pairs discussed lexical features on about 25% of occasions. Overall, mechanical episodes attracted the least attention from the pairs, except for EG1–2, for whom M-LREs accounted for 29.41% of episodes.
Table 1. Language-related episodes (LREs) and correctly resolved LREs.
Notes. EG = experimental group. F-LRE = form-focused language-related episode. L-LRE = lexis-focused language-related episode. M-LRE = mechanics-focused language-related episode.
As for the outcome of the LREs generated, the results revealed a very high proportion of correctly resolved episodes (above 80%). Pairs resolved almost all (93.55%) grammatical uncertainties, and most of the lexical (86.21%) and mechanical (88.89%) issues that arose during their collaboration. This high proportion of correctly resolved episodes indicates that intermediate learners can not only negotiate but also negotiate successfully. No relation was observed between the various sub-proficiencies of the pairs and the created or correctly resolved episodes as shown by the non-significant results from the Chi-square tests. The examples included below also evidence a strong commitment to solve the linguistic conflicts experienced in the tasks.
Examples 1–3 demonstrate that learners collaborated over a range of grammatical points, including prepositions, plurals, relative clauses, modal verbs and syntax. They deliberated and sought confirmation for the choices they made, corrected each other and, at times, provided explanations for the use of particular forms. Example 1 illustrates a correctly resolved discussion on prepositions, recurrent among pairs.
Example 1:
EG1: in the one hand
EG2: on the one hand
EG1: on?
EG2: yeah
EG1: or in?
EG2: on … on the one hand …
Students provided each other with corrective feedback. EG1’s yes–no questions show lack of certainty over choice of the preposition. EG2 is sure about the correct choice but after some turns, uncertainty arises again. Perhaps the act of verbalizing their concern helps them reach the correct answer (Storch & Wigglesworth, 2007) as in Example 2.
Example 2:
EG3: and that will be so helpful
EG4: so? … or very?
EG3: very … very helpful
In Example 3, EG4 corrects EG3 for an incorrect verb agreement. The explanation is brief, but EG3 immediately realizes the mistake.
Example 3:
EG4: today … today children
EG3: uses
EG4: use, it’s plural
EG3: yes, yes
Discussion of L-LREs was also frequent. Students frequently discussed word and verb choice to describe and confer about new technologies: addiction, contact with friends, effects, etc. The students also deliberated over linking words or connectors introduced in the pre-teaching session.
In Example 4, students were trying to find the correct adjective. One of the students has a word in mind and the other offers suggestions until they find the exact word, an example of what has been labelled as collective scaffolding (Donato, 1994).
Example 4:
EG1: more ambitious
EG2: yes, but also we could say that they are more with themselves
EG1: more selfish?
EG2: no selfish is …
EG1: individualist?
EG2: aislado [isolated] … but I don’t know how to say aislado [isolated] in English
EG1: isolated?
EG2: OK isolated
Three out of four pairs correctly resolved all the L-LREs, while EG9–10 resolved 13 of 17 (76.47%). These four errors were the only errors they made in the essay. The following lengthy extract (Example 5) shows how these learners deliberated over some language choices. They solved two F-LREs but made a lexical error at the end. Still, their engagement with the task is obvious.
Example 5: EG9: people use them correctly, risks can’t be as dangerous as otherwise EG10: eso está mal [This is wrong] EG9: ya, bai [Yes, I see it] EG10: these risks will be less dangerous EG9: OK EG10: or would be less dangerous, serían [They would be] EG9: so we think that … EG10: we think that children tendrían que tener prohibido [They should have it forbidden] EG9: we think that children … EG10: mustn’t, ¿no deberían? [They shouldn’t?] EG9: haven’t EG10: ¿bai? [Yes?] EG9: mustn’t EG10: no pueden no, no deberían … [They mustn’t no, they shouldn’t] EG9: shouldn’t! Use them, because they are too young to afford these risks EG10: afford da afrontar? [Afford is to face?] EG9: afford da aurre egin [Afford is to face] EG10: nola idazten da? [How do you write it?] EG9: afford [spelling it]
In contrast to the frequent discussion on grammatical and lexical items, negotiations on mechanical points in the texts were perfunctory (5/17 EG1–2; 1/10 EG3–4; 1/11 EG7–8; 2/31 EG9–10). Nevertheless, Examples 6 and 7 illustrate deliberation on spelling and register decisions, respectively.
Example 6: EG2: wait, between children EG1: between? EG2: tw double e e … EG1: between, EG2: two ee, I am not sure … EG1: Ok, so between children. EG2: full stop.
Example 7: EG2: it’s obvious, do you know how to write it? EG1: yes EG2: no, oh sorry ‘it is’ obvious, it is formal so we can’t put it’s EG1: yeah
A final comment about the dialogues that were created refers to the L1 use of the learners. Resorting to the L1 in the decision-making process has been considered an essential cognitive tool for mediating in collaboration (Guerrero & Villamil, 2000; Swain & Watanabe, 2012). Our data suggest that there might be a relationship between L1 use and the learners’ proficiency level, a relationship that should be further explored with a larger sample of learners with a similar profile. The pairs with the highest level (EG1–2 and EG3–4) constructed the text by using English exclusively, whereas the lower level pairs, and especially EG9–10, made substantial use of the L1 (see Examples 5 and 10). The L1 was used mainly to deliberate over ideas (Example 8), make language choices (Example 9), meet the task goals (Example 10) or move the task forward (Example 11).
Example 8: EG9: aurrena alde onak eta alde txarrak, en plan eskema [first arguments for and against, as in an outline] EG10: ona zer da, jendea ezautzen dozo? [what’s a for argument that you meet people?] EG9: bai [yes]
This pair has used the L1 to decide on the main ideas they will include in the text, meeting people as a positive aspect of technologies.
Example 9: EG10: bai, bale ordun oain hau idazten du? [ok, alright, so now do we write this?] we know which dangers EG9: peligros da [is it dangers?] dangers? EG10: bai, ez?, [yes, isn’t it?] or risks
In this example, the L1 is used to make lexical choices as they try to retrieve the English equivalent for the word peligros. EG9 says the exact word in Spanish and her best guess in English, but EG10 provides her with a more suitable word for the context, risks.
Example 10: EG8: for and against, ordun [this is it], advantages? [the first sentence in their recording]
By using the adverb ordun, EG8 makes sure that both members of the dyad are aware of the task goals, and at the same time highlights the fact that it is a shared goal, a task that needs to be undertaken together.
Example 11: EG8: yes because they can be playing games or they … Also is true that in class people is using the mobile phone, en vez de [instead]? EG7: eh … eztakit [I don’t know] EG8: osea, da [I mean] when children must be doing exercises they can be with the mobile phone and that can make their marks worse, EG7: bai, ongi [yes, correct]
In this example students use the L1 to move forward in the task. Through the L1 they express their uncertainties (eztakit ‘I don’t know’), attempt to clarify uncertainties that may potentially obstruct the task (osea, da ‘I mean’) and confirm understanding which allows them to move on (bai, ongi ‘yes, correct’).
To summarize, all dyads actively engaged in dialogic interactions over language choices which targeted grammatical and lexical issues as well as mechanical issues, though to a lesser extent. Furthermore, these interactions contained a high proportion of LREs (above 80%) that were correctly resolved collaboratively. CW, therefore, provided learners with the opportunity to pool their linguistic resources (assisted sometimes by the L1) and to reflect on their language use. This confirms the suitability of argumentative essays to stimulate collaborative interaction.
2 Comparing individual and jointly written texts
Table 2 features the results for grammatical complexity. As can be observed, both groups reduced the proportion of clauses per T-unit in the second essays (EG from 1.83 to 1.68 and CG from 1.93 to 1.78), which indicates that both groups used less complex T-units in their second texts. Similarly, the rate of dependent clauses per clause also decreased. The decline was more prominent for the CG (DC/C from 0.47 to 0.40) than for the EG (from 0.43 to 0.41), but it was not statistically significant.
Measures of grammatical complexity: Proportion of clauses to T-units (C/T) and dependent clauses to clauses (DC/C).
Notes. CG = control group. EG = experimental group.
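The two ratios reported for grammatical complexity can be illustrated with a minimal Python sketch. It assumes that T-units, clauses and dependent clauses have already been coded by hand for a given text; the function name and the counts in the usage lines are hypothetical and are not taken from the study's data.

```python
from typing import Tuple


def grammatical_complexity(t_units: int, clauses: int,
                           dependent_clauses: int) -> Tuple[float, float]:
    """Return (C/T, DC/C) for one text from hand-coded counts."""
    if t_units <= 0 or clauses <= 0:
        raise ValueError("t_units and clauses must be positive")
    c_per_t = clauses / t_units              # clauses per T-unit
    dc_per_c = dependent_clauses / clauses   # dependent clauses per clause
    return c_per_t, dc_per_c


# Hypothetical counts for a single essay (illustration only).
c_per_t, dc_per_c = grammatical_complexity(t_units=12, clauses=21,
                                           dependent_clauses=9)
print(round(c_per_t, 2), round(dc_per_c, 2))
```

A group mean would then presumably be obtained by averaging these per-text ratios, which need not equal the ratio of the group totals.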
Lexical complexity, measured through TTR and VOCD-D indexes (see Table 3), showed a non-significant increase for both groups from the pre-test to the experimental task: types (CG: 83.63 to 90.94 and EG: 85.53 to 95), TTR (CG: 0.51 to 0.54 and EG: 0.55 to 0.59) and VOCD-D rates (CG: 53.23 to 67.88 and EG: 62.93 to 77.5). Experimental texts were lexically richer for both groups, as shown by the higher number of different words (types) used and the larger TTR and VOCD-D rates. No significant changes in lexical complexity were observed across tasks and groups. It seems that pair work did not have a significant effect, as the differences observed between the groups were not statistically significant. The EG started out with higher means and maintained them across tasks and for all the measures, including TTR (0.54 CG vs. 0.59 EG) and the D-index (67.88 CG and 77.5 EG).
Measures of lexical complexity: TTR and VOCD-D.
Notes. CG = control group. EG = experimental group. TTR = type–token ratio. VOCD-D = VOCD counts (D index) (Malvern & Richards, 2000).
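As an illustration of the simpler of the two lexical measures, the following Python sketch counts types and tokens and derives the TTR. The regular-expression tokenizer and the sample sentence are assumptions made for the example only and do not reproduce the procedure followed in the study. VOCD-D is not computed here, since it relies on a curve-fitting procedure over random samples of the text that goes beyond a short sketch.

```python
import re
from typing import Tuple


def type_token_ratio(text: str) -> Tuple[int, int, float]:
    """Return (types, tokens, TTR) using a naive lowercase word tokenizer."""
    # Assumption: a word is a run of letters, optionally with one apostrophe part.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    types = set(tokens)  # distinct word forms
    return len(types), len(tokens), len(types) / len(tokens)


# Hypothetical sample sentence (illustration only).
sample = "Children use mobile phones and children use computers"
n_types, n_tokens, ttr = type_token_ratio(sample)
print(n_types, n_tokens, ttr)
```

Because the TTR tends to fall as texts get longer, it is usually read alongside a length-independent index such as D, which is consistent with both measures being reported above.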
Collaboration had an effect on global accuracy measures (see Table 4). Pairs obtained better accuracy scores than individuals on both measures (EFT/T CG 0.21 vs. 0.47 EG and EFC/C 0.37 CG vs. 0.56 EG). The EG obtained slightly higher results in the pre-test for the rate of error-free T-units per T-units (EFT/T 0.16 vs. 0.24) and the rate of error-free clauses per total clauses (EFC/C 0.38 vs. 0.35). In the experimental task, both groups improved their results, but the rates for the EG, now writing collaboratively, rose sharply. A t-test for independent samples confirmed that the texts by the EG were significantly more accurate than texts by the CG, as observed in the rates of EFT/T (t = −3.905; p = 0.001) and EFC/C (t = −2.661; p = 0.019).
Measures of accuracy (global units) for the control group (CG) and the experimental group (EG) in the pre-test and at the collaborative writing (CW) time.
Notes. EFT = error-free T-units. EFC = error-free clauses. EFT/T = error-free T-units to total T-units. EFC/C = error-free clauses to total clauses.
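The global accuracy ratios EFT/T and EFC/C can be sketched in the same way. The snippet below is a minimal illustration that assumes error-free units have already been identified by the raters; the function name and all counts are hypothetical and do not come from the study.

```python
from typing import Tuple


def global_accuracy(t_units: int, error_free_t_units: int,
                    clauses: int, error_free_clauses: int) -> Tuple[float, float]:
    """Return (EFT/T, EFC/C) for one text from hand-coded counts."""
    if t_units <= 0 or clauses <= 0:
        raise ValueError("t_units and clauses must be positive")
    if error_free_t_units > t_units or error_free_clauses > clauses:
        raise ValueError("error-free counts cannot exceed totals")
    eft_t = error_free_t_units / t_units   # share of error-free T-units
    efc_c = error_free_clauses / clauses   # share of error-free clauses
    return eft_t, efc_c


# Hypothetical counts for a single essay (illustration only).
eft_t, efc_c = global_accuracy(t_units=12, error_free_t_units=5,
                               clauses=20, error_free_clauses=11)
print(round(eft_t, 2), round(efc_c, 2))
```

Both ratios range from 0 (no error-free units) to 1 (every unit error-free), so higher values indicate more accurate texts.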
As for local units (see Table 5), at the pre-test, the EG made fewer errors but differences were marginal. In collaboration, however, the EG reduced their errors markedly (from 22 to 14.62). In fact, the CG made notably more errors (t = −3.91; p = 0.022) and also more grammatical errors (t = 2.82; p = 0.010), while both groups made similarly low numbers of lexical and mechanical errors (0.01 lexical errors and 0.03 mechanical errors). Thus, it seems that collaboration allowed students to produce more grammatically accurate texts.
Measures of accuracy (local units) for the control group (CG) and the experimental group (EG) in the pre-test and in the experimental task.
Fluency (see Table 6) was similar in both groups. The CG obtained slightly higher results and means in the two tasks except for the number of dependent clauses in the second task. In the experimental task, as the EG wrote collaboratively, the total counts of T-units, clauses, and dependent clauses should be doubled for comparison with the CG.
Measures of fluency for the control group (CG) and the experimental group (EG) in the pre-test and in the experimental task.
Notes. C = clauses. DC = dependent clauses. W = words.
In the pre-test, although of similar length, texts produced by individuals were longer on average than those produced by pairs: 169.94 vs. 161 words. The groups also produced similar numbers of T-units in the pre-test (208 CG and 207 EG) and in the experimental task (213 CG and 106 EG). The biggest difference between the two groups lies in the number of clauses (CG 397; EG 368) and dependent clauses (CG 188; EG 158) produced in the pre-test, where the CG produced a higher number of them. However, after collaboration the difference disappeared and the EG produced a comparable number of clauses and dependent clauses. For the W/T and W/C means, however, individuals obtained higher means in both texts.
3 Qualitative measures of content, structure, organization of ideas and register
As can be observed in Table 7, whereas in the pre-test the CG scored higher than the EG in all qualitative measures, the EG outperformed the CG when they wrote in pairs, except for errors in register. CW resulted in texts with better quality of content (3.53 CG and 3.63 EG), structure (3.69 CG and 4 EG) and organization of ideas (3.22 CG and 3.63 EG), while the CG used a slightly more appropriate register (3.38 CG vs. 3.13 EG), even though only the EG improved in the second text (3.06 to 3.13). Yet, only the difference in the measure of structure (t = −4.038; p = 0.001) was significant.
Qualitative scores for texts written by the experimental group (EG) and the control group (CG).
V Discussion and conclusions
Pair writing promoted FL production and generated multiple deliberations which provoked increased L2 use, a compelling need in low-input contexts with few opportunities to use the FL outside school (Lasagabaster, 2008). Collaboration resulted in texts which were more accurate and of better quality on holistic measures of content, structure, and organization of ideas, although limited or no gains were observed for complexity and fluency.
Taking into consideration the findings regarding student talk, the analyses of the dialogues showed that collaborative activities provided learners with the opportunities to discuss their language problems and test their hypotheses while making meaning (Swain, 2006). As students deliberated over linguistic concerns, they collaborated and generated many LREs, which became language learning opportunities (Swain & Lapkin, 1998, 2001; Swain & Watanabe, 2012).
In line with previous research (Fernández Dobao, 2012; Storch, 2005; Storch & Wigglesworth, 2007; Wigglesworth & Storch, 2009), students focused more on lexis and grammar than on mechanics, probably because the learner ultimately writing the text made most of the spelling and punctuation decisions (Fernández Dobao, 2012). Unlike in previous studies (see, for example, Storch & Wigglesworth, 2007), however, form and lexis received similar attention from the pairs. This might be linked to the fact that CW students were familiar with the topic and the genre, which allowed their attention to focus on both lexical and grammatical issues that were within their linguistic repertoires (Alegría de la Colina & García Mayo, 2007) and were motivated by their own immediate needs (Storch, 2013). Dictogloss or text reconstruction tasks, however, have been said to cause comprehension problems (Leeser, 2004) or to target features beyond the abilities of learners with limited proficiency (Alegría de la Colina & García Mayo, 2007).
Additionally, students resolved most of their episodes correctly (more than 80%), which shows that collaboration is beneficial for all intermediate learners even though learners with lesser proficiency have been reported to use the L1 as a mediating tool (Azkarai & García Mayo, 2015; Donato, 1994; Swain, 2006). Struggling intermediate learners resorted to the L1 to move their discussion forward and correctly resolve LREs, as well as to make non-language related decisions and fulfil the task successfully. Notwithstanding the importance of language choice, the verbalization of metalinguistic knowledge is notable, as these exchanges (see, for instance, Examples 3 and 5) are evidence of students’ attempts to assist each other in bridging their zone of proximal development (ZPD; Donato, 1994; Kitade, 2008; Ohta, 1995; Swain, 2000, 2006), which is of paramount importance to successful L2 learning. These dialogues may explain the greater accuracy and quality in the compositions written in pairs (Wigglesworth & Storch, 2009, 2012). However, a direct correlation cannot be established, since our tasks did not include any tailor-made post-tests targeting the specific language examples students discussed in their exchanges (for an example, see Kim, 2008; for a summary, see Storch, 2013), a limitation which should be acknowledged here.
With regard to CAF and holistic analytic ratings, our results are in line with previous studies that have found a lack of complexity and fluency increase among pairs (e.g. Fernández Dobao, 2012; McDonough & García Fuentes, 2015; Wigglesworth & Storch, 2009). Our pairs had more time to complete the activities than the individuals, and familiar task types and topics were selected, but differences were marginal. The results obtained support Pallotti’s (2009) claim that complexity only increases if required by the task. The two tasks undertaken shared the same communicative goal, which might have resulted in similar complexity rates. Furthermore, length restrictions (maximum 150 words) might have become a homogenizing factor. In fact, the global quality scales do show an important improvement on all the qualitative criteria measured in the CW group. Conversely, our study revealed a statistical advantage on accuracy measures in favour of the CW group. This result has also been reported in earlier studies undertaken with ESL and EFL learners (e.g. Fernández Dobao, 2012; McDonough & García Fuentes, 2015; Wigglesworth & Storch, 2009). Although participants also improved their writing as individuals, dyads produced more error-free texts. Languaging was advantageous and resulted in higher accuracy rates as students drew their attention to form when they negotiated meaning (e.g. Swain & Watanabe, 2012).
The current study has also shown that the advantage is not only observed on the analytic measures of accuracy, but also on the quality of the texts produced as measured by the global scales. In fact, pairs wrote texts that were significantly better structured, probably due to the time they had invested in reaching a consensus with the peer about the structural framework in which the content to be conveyed was to be accommodated (more than 20% of the time for three out of the four pairs). Collaborating students thus produced texts with a clearer structure and organization of ideas, as well as better quantity and quality of arguments. The findings suggest that argumentative texts are also suitable for intermediate learners (for studies with advanced university students, see Storch & Wigglesworth, 2007; Wigglesworth & Storch, 2009) and therefore position collaboration as a promising candidate for secondary mixed-ability classes.
Some pedagogical implications can be drawn from the findings of this study for EFL writing instruction. If language learning is a socially situated activity which is mediated by interaction (Vygotsky, 1978), opportunities to interact among peers should be fostered. Teachers need to design tasks which promote negotiation of meaning and form (Long, 2000), as peers have been shown to carefully scaffold each other and to help to bridge their ZPD even without any identifiable expert (Donato, 1994; Guerrero & Villamil, 2000; Ohta, 2000) in the process of making meaning (e.g. Watanabe & Swain, 2007). CW has been shown to offer these kinds of possibilities by encouraging peer collaboration, which affords students the opportunity to pool their knowledge together and to create new knowledge (for a summary, see Storch, 2013). In addition, listening to the peer dialogues permits practitioners to gain insights into the students’ language learning process (Swain, 1998; Swain & Watanabe, 2012) for the design of tailor-made activities that cater to students’ needs and enhance their progress. Furthermore, CW tasks have an advantage over other types of spoken tasks since their development forces students to produce written as well as spoken language (Storch, 2013), an additional opportunity that is particularly important in low-input contexts with few chances to use the FL outside school (Lasagabaster, 2008). All in all, the findings suggest that there is a place for CW tasks in the EFL secondary classroom.
This experience revealed improvements over a relatively short period of time and with relatively little intervention from the teacher. However, further research into whether CW results in gains in the quality of subsequent individual writing assignments (the main means by which students’ ability is assessed for academic credit) is necessary if CW interventions are to be widely exploited in secondary schools. Additionally, investigations into the effects of task design have been recommended regarding the impact of task goals on CAF measures (Pallotti, 2009), but also into how directing students’ attention to any of the written components in the task might affect the results (McDonough et al., 2018).
Most studies to date have tried to observe individual language learning gains over a relatively short time and have obtained mixed results (for a review, see Storch, 2013). Future longitudinal studies would be helpful to investigate whether prolonged engagement in CW could lead to more successful language learning in secondary education along the lines of the longitudinal studies of Shehadeh (2011) and Yasin Yazdi-Amirkhiz, Ajideh, and Leitner (2016) in EFL university contexts. Further investigations measuring whether the gains obtained are retained and transformed into FL learning need to also be undertaken through the design of tailor-made individual post-tests which target linguistic issues raised in the pair dialogues (Swain & Watanabe, 2012). Evidence of learning would increase the validity of CW practice for the secondary school context, a context in which academic success tends to be measured by official examinations which are reported to negatively impact the teaching methods and materials (Adnan & Mahmood, 2014).
Moreover, although the present investigation has tried to incorporate a wider range of features into the analysis than previous studies, the extent to which collaboration affects the meaning-based features needs to be explored further, incorporating more features and analysing how they affect students’ writing skills. Finally, future studies need to use larger samples as the small sample size might have been insufficient to detect subtle differences among the groups and might have compromised the generalizability of the results. All in all, CW seems to be a promising tool to maximize the opportunities for meaningful interaction and EFL development in secondary classrooms.
Acknowledgements
We would like to thank IES Amazabal BHI of Leitza for giving us the opportunity to carry out the project. Special thanks go to the participating students for their enthusiasm and their teachers for their help and support.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
