Abstract
A major concern with computer-based (CB) tests of second-language (L2) writing is that performance on such tests may be influenced by test-taker keyboarding skills. Poor keyboarding skills may force test-takers to focus their attention and cognitive resources on motor activities (i.e., keyboarding) and, consequently, other processes and aspects of writing (e.g., planning, revising) might be left unattended to, which can lead to poor text quality and lower test scores. Such effects might be more pronounced for L2 test-takers. This study investigated the impact of keyboarding skills on test-takers’ scores in the context of the TOEFL-iBT Writing Section. Each of 97 test-takers, with different levels of English language proficiency (low vs. high) and keyboarding skills (low vs. high), responded to two TOEFL-iBT writing tasks (independent and integrated) on the computer. Test scores were statistically compared across tasks and test-taker groups. The findings indicated that overall English language proficiency and writing ability in English contributed substantially to variance in task scores, while keyboarding skill had a significant, but weak, effect on task scores. Additionally, keyboarding skills effects depended on task type. While these findings support the claim that performance on TOEFL-iBT writing tasks depends mainly on test-taker English language proficiency, they also raise important questions about the relationships between keyboarding skills, L2 writing ability, and performance on CB L2 writing tests, as well as factors affecting these relationships.
With the growing use of computer-based (CB) tests of second-language (L2) writing, new concerns have emerged regarding the validity of inferences based on scores from such tests (Chapelle & Douglas, 2006; Wolfe & Manalo, 2005). As Wolfe and Manalo (2005) explained, performance on CB tests is likely to be influenced by both the presentation mode and the additional computer skills required to perform the task. The first issue, raised when CB tests were first introduced, is whether tests administered in different modes (computer vs. paper) produce scores that yield equivalent interpretations about test-taker ability (Bennett, 2003; Burke & Cizek, 2006). The second issue is whether performance on CB tests is influenced by test-taker computer skills. L2 test scores are expected to vary solely as a function of test-takers’ L2 abilities. However, with CB L2 tests, observed test scores might reflect both L2 ability and computer skills (Taylor et al., 1998). As Taylor et al. (1998) explained, “if this occurs, then score interpretations would be confounded with ability to use a computer”, which introduces construct-irrelevant variance into the measure (p. 1). This validity threat is greater with writing tasks since these tasks make heavier demands on test-takers’ keyboarding skills than do selected-response items (Bennett, 2003; Burke & Cizek, 2006; Green & Maycock, 2004; Wolfe & Manalo, 2005).
This study examined the effects of test-taker keyboarding skills on scores on the TOEFL-iBT Writing Section. The explanation inference in the TOEFL-iBT validity argument rests on the warrant that “[test] scores are attributed to a construct of academic language proficiency” (Chapelle, 2008, p. 333). Since TOEFL-iBT is delivered on the computer, test performance might depend on test-taker keyboarding skills as well. Keyboarding skills are not part of English language proficiency as currently defined and operationalized by TOEFL-iBT. Consequently, differences in test scores that can be attributed to a test-taker’s lack of keyboarding skills, rather than lack of English language proficiency, constitute a source of construct-irrelevant variance (Wolfe & Manalo, 2005). The study addressed this question by comparing the scores of test-takers with different levels of keyboarding skills and English language proficiency when responding to the TOEFL-iBT writing tasks.
Effects of delivery mode on writing test performance
Numerous studies have examined the effects of delivery mode (paper vs. computer) on writing test scores. The main question that these studies aimed to answer is whether test-takers’ performance on CB tests reflects the same ability as that which is measured by paper-based (PB) tests that are assumed to be equivalent (Chapelle & Douglas, 2006). Findings from this line of research are mixed. In a study comparing scores assigned to PB and CB TOEFL essays, Breland, Lee, and Muraki (2004) found that although there was little observed difference in mean scores, when test-takers were matched on English language ability, small differences were observed in effect sizes consistently favoring the PB test. Wolfe and Manalo (2005), also examining TOEFL essays, found that various test-taker demographic characteristics (e.g., gender, L1 script, L2 proficiency) influenced the likelihood that a test-taker chooses one writing mode over the other. Additionally, there was no difference between essay scores of test-takers who chose to handwrite or word-process their essays. However, test-takers with lower multiple-choice scores tended to have higher scores on the PB writing test than on the CB writing test, while test-takers with higher multiple-choice scores tended to have similar scores in either mode. Three studies have compared test-takers’ scores on CB and PB versions of the IELTS writing section. Green and Maycock (2004) found that test-takers performed marginally better on the PB version than they did on the CB version, while Blackhurst (2005) and Weir, O’Sullivan, and Jin (2007) found no significant differences across writing modes. Blackhurst, consequently, argued that the two versions of IELTS can be used interchangeably and that test-takers, given adequate computer skills, will perform equally well on either version of the test.
The role of keyboarding skills
A factor that seems to moderate the effects of writing mode on test performance is test-taker computer familiarity and skills. Theoretically, computer skills may influence test-takers’ writing processes, texts, and/or test scores on CB writing tests. Cognitive models of writing (e.g., Fayol, 1999; Hayes, 2006; Kellogg, 1996; Torrance & Galbraith, 2006) provide an explanation of how and why such effects may occur. According to these models, writing is a complex activity that requires the coordination of a variety of different cognitive processes that can compete for cognitive resources that are limited (Fayol, 1999; Torrance & Galbraith, 2006). With increasing demand by some processes, performance based on other processes, which rely on the same cognitive resources, may suffer (Alves et al., 2007; Connelly et al., 2007; McCutchen, 1996; Olive & Kellogg, 2002).
From a cognitive perspective, if low-level skills such as keyboarding and spelling are automated, that is, “occur without voluntary control or interfere minimally with other processes” (Torrance & Galbraith, 2006, p. 74), they will not require any attentional resources and, consequently, will not constrain or influence the writing process and its outcomes (Fayol, 1999; Torrance & Galbraith, 2006). However, poor keyboarding skills may force writers to focus their attention and cognitive resources on motor activities (i.e., typing) and, consequently, other higher-order processes (e.g., planning, revising) might be left unattended to, which can lead to poorer text quality and lower scores (Alves et al., 2007; Connelly, Gee, & Walsh, 2007; Fayol, 1999; Horkay et al., 2006; Wolfe & Manalo, 2005). Additionally, there is evidence that when instructed to write using an unfamiliar method (e.g., typing, writing in capital letters), writers tend to pause more frequently and to write more slowly, indicating a trade-off between the formulation and execution systems (Bourdin & Fayol, 1994; Olive & Kellogg, 2002).
These effects might be magnified for L2 writers with low computer ability when writing on the computer under test conditions (Wolfe & Manalo, 2005). As Wolfe and Manalo explained, when responding to CB writing tasks, L2 test-takers with limited keyboarding skills “may waste valuable test time, divert their attention away from the quality of their writing, and/or lose self-confidence which, in turn, affects their task performance regardless of their writing ability” (p. 49; cf. Chapelle & Douglas, 2006). In this case, the inference made from test scores would be that the test-taker has poor L2 writing ability, whereas the correct inference would be that the test-taker does not have competence in completing writing tasks on the computer (Bennett, 2003; Burke & Cizek, 2006; Chapelle & Douglas, 2006).
While numerous studies have compared test-takers’ performance on PB and CB writing tasks, few studies have examined the effects of computer skills on CB writing test performance, particularly in L2 tests (Douglas & Hegelheimer, 2007). These studies examined the interaction effects of delivery mode (PB vs. CB) and computer skills on test performance. This line of research suggests that test-takers with higher computer familiarity tend to receive higher scores on CB writing tasks, while test-takers with lower computer familiarity tend to receive higher scores on handwritten essays (Wolfe & Manalo, 2005). Russell and Haney (1997), for example, found that students accustomed to writing on computer obtained higher scores when writing on computer than when composing on paper. Horkay et al. (2006) investigated the comparability of scores on CB and PB versions of a writing test administered to eighth-grade students. They found no significant mean score differences across modes. However, students with higher keyboarding skills scored higher than did those with less skill on the CB test, after controlling for PB test scores. Burke and Cizek (2006) found that the effects of writing mode and computer skills on the essay scores of sixth graders with different computer skill levels varied across writing tasks. Finally, Maycock and Green (2005) found that test-takers’ ability and experience in using computers did not have a significant impact on their scores on a CB version of the IELTS writing section.
One limitation of previous studies on the effects of computer skills on writing performance is that they define and measure computer use and skills in different ways. A second limitation is the tendency to rely on self-report data, in the form of questionnaires and/or interview responses, as measures of computer use, familiarity, and/or skills. However, perceived computer ability may be very different from actual ability. For example, in a study that compared self-report and objective measures of computer literacy among entry-level undergraduate accounting students, McCourt Larres et al. (2003) found significant differences in the students’ perceived and actual computer literacy, with the vast majority overestimating their computer knowledge. Consequently, McCourt Larres et al. argued that computer skills should be measured more directly such as via individual tests of typing speed and accuracy and editing skills (cf. Connelly et al., 2007; Horkay et al., 2006). As Horkay et al. (2006) have argued, typing speed and accuracy are needed to ensure that a complete and accurate response can be entered before the testing time elapses, while editing skills can help the writer to revise his or her text more effectively and quickly (p. 32).
Purpose of the study
No previous studies have examined the relationship between test-taker keyboarding skills and performance on the TOEFL-iBT writing tasks. Taylor et al. (1998) examined the impact of computer skills on performance on the listening and reading sections of TOEFL and found no practical effect on most test tasks after adjusting for language ability. Taylor et al., however, did not examine the effects of computer skills on performance on TOEFL writing tasks. As a result, Wang, Eignor, and Enright (2008) argued that “the impact of computer familiarity on test performance, particularly for writing, remains a potential threat to test interpretation” (p. 276). Wang et al. reported on a field study that found a positive correlation between frequency of use of English language computers and scores on a field version of TOEFL-iBT. The study also found that the impact of computer familiarity did not appear to be any stronger for writing than for the other skills (p. 276). As Wang et al. cautioned, the findings of this field study do not necessarily mean that weak computer skills are the reason that some participants did poorly on the test. As documented by Taylor et al. (1998), other variables, such as English language proficiency, must be taken into account before any conclusions about the impact of computer skills on test performance can be drawn. Wang et al. called for further research to explore in a more systematic way the impact of computer skills on scores on TOEFL-iBT writing tasks.
This study addresses this research gap by comparing the scores of test-takers with different levels of keyboarding skills (low vs. high) and English language proficiency (ELP, low vs. high) when responding to independent and integrated writing tasks in the TOEFL-iBT Writing Section. Specifically, the study addressed the following research question: What are the effects of keyboarding skills, ELP, and task type on test-takers’ holistic scores on the TOEFL-iBT writing tasks?
Participant recruitment
Emails were sent to all international graduate and undergraduate students at a university in Southern Ontario, Canada, to invite them to participate in the study. Emails were also sent to ESL teachers in the English language program at the same university asking them to tell their students about the study. In addition, flyers about the study were distributed to students in the ESL program. More than 300 students responded to the call for participation. Each volunteer was asked to complete two online typing tests. Based on typing test scores, students who could participate in the study were identified.
The original plan was to recruit 100 participants: 25 participants by 2 keyboarding skill levels (high and low) by 2 ELP levels (high and low). However, it was very difficult to identify and recruit (a) students with low ELP and high keyboarding skills and (b) students with high ELP and low keyboarding skills. After extending data collection for seven months, only 97 students participated in the study (see Table 1). The high ELP group included post-admissions students in their first or second year of university (graduate or undergraduate) study. The low ELP group included pre-admission students who were enrolled in low- to high-intermediate (pre-academic) ESL classes. Keyboarding skill level was determined based on typing test scores (see Table 1).
Table 1. Descriptive statistics for typing skills test results by ELP and keyboarding skill group.
* 10 graduates and 16 undergraduates; ** 15 graduates and 11 undergraduates.
About half of the participants (52%) were male. Their ages ranged between 18 and 46 years (M = 24, SD = 5), with the majority (n = 71) being between 18 and 25 years old. They spoke 25 different first languages, with the majority being L1 speakers of Chinese (n = 24), Spanish (n = 10), Farsi (n = 9), and Korean (n = 7). The great majority (82%) of participants had been in Canada for less than one year at the time of data collection; the remaining participants, all in the high ELP group, had been in Canada for between 1 and 2 years. The high ELP group included 25 graduate and 27 undergraduate students. The graduate students were studying a variety of disciplines including business and economics, engineering and hard sciences, and humanities and social sciences. The undergraduate students reported a range of intended majors, including business, finance, or economics, hard sciences or engineering, health sciences, and humanities and social sciences. Less than half (42%) reported that they had taken the TOEFL before. Only a quarter of the participants (26%) reported that they had taken TOEFL preparation classes before. Less than half the participants (42%) reported that they had taken a writing test on the computer before.
Research tools
Typing skills tests
Students who responded to the call for participation were asked to perform two online typing tests to measure their typing speed and accuracy in English. Each typing test consisted of typing a 200-word passage, presented in the upper half of the computer screen, into a blank text box located in the lower half of the screen (www.assesstyping.com). Participants were instructed to type each text as quickly and as accurately as possible within two minutes. Participants did not have access to any editing functions when typing the texts, nor were they allowed to edit what they typed. The online typing tests provided three measures of keyboarding skills (cf. Russell, 1999; Horkay et al., 2006):
Gross typing speed: Number of typed words per minute (WPM) not adjusted for typing errors.
Accuracy percentage: The percentage of words typed correctly out of all typed words. For instance, an accuracy of 75% means that three quarters of the words (that the student typed in 2 minutes) were typed correctly.
Net typing speed: Gross typing speed (WPM) adjusted for typing accuracy (i.e., accuracy percentage).
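The three measures above can be sketched in a few lines of Python. This is an illustration only, not the scoring code used by the typing-test site: the function name and the word-level counts are hypothetical, and the rule that net speed equals gross speed multiplied by the accuracy proportion is an assumption, since the text states only that net speed is gross speed adjusted for accuracy.

```python
def typing_measures(words_typed, words_correct, minutes=2.0):
    """Return (gross WPM, accuracy %, net WPM) for one typing test.

    Assumption: net speed = gross speed x accuracy proportion. The source
    says only that net speed is gross speed "adjusted for typing accuracy".
    """
    if words_typed <= 0 or minutes <= 0:
        raise ValueError("words_typed and minutes must be positive")
    if not 0 <= words_correct <= words_typed:
        raise ValueError("words_correct must lie between 0 and words_typed")

    gross_wpm = words_typed / minutes                   # not adjusted for errors
    accuracy_pct = 100.0 * words_correct / words_typed  # share of words typed correctly
    net_wpm = gross_wpm * accuracy_pct / 100.0          # assumed adjustment rule
    return gross_wpm, accuracy_pct, net_wpm


# Hypothetical test-taker: 80 words typed in 2 minutes, 60 of them correct.
gross, accuracy, net = typing_measures(80, 60)
print(gross, accuracy, net)
```

With these hypothetical counts the accuracy is 75%, the same value used in the example in the accuracy definition above.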
The correlations (Pearson r) between the results for the two typing tests were high, with the correlation for net typing speed being .94. The correlation (Pearson r) between gross typing speed and accuracy percentage, an indicator of the trade-off between typing accuracy and typing speed, was .39 for average scores across the two typing tests.
Based on the results for a sample of 15,000 test-takers (M = 35 WPM, SD = 10), assesstyping.com (personal communication, November 29, 2010) recommended using a net typing speed of 40 WPM and an accuracy percentage of 95% as cut-off scores, with test-takers typing above these cut-scores being considered to have high typing speed and those below them being considered to have low typing speed. Following this recommendation, two cut-scores were set for this study. First, to be classified into the high keyboarding skills group, a test-taker needed to achieve a net typing speed of 40 WPM or more. Second, to keep the two keyboarding skill groups clearly distinct, the low keyboarding skills group included only those volunteers whose net typing speed was at least one SD below the cut-score for the high keyboarding skills group, that is, 30 WPM or less (40 WPM minus one SD of 10 WPM). Table 1 reports descriptive statistics for the typing skill test results (average of two typing tests) by ELP and keyboarding skill group.
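The grouping rule described in this paragraph can be written as a small function. This is a minimal sketch: the constant and function names are hypothetical, and only net typing speed is used, as stated for the two study cut-scores; volunteers falling between the cut-scores are treated as not eligible for either group.

```python
HIGH_CUTOFF_WPM = 40.0  # net speed at or above this value: high keyboarding skill
LOW_CUTOFF_WPM = 30.0   # high cut-score minus one SD (40 - 10 WPM)


def keyboarding_group(net_wpm):
    """Classify a volunteer by average net typing speed (WPM).

    Returns "high", "low", or None for volunteers who fall between the
    two cut-scores and so belong to neither group.
    """
    if net_wpm >= HIGH_CUTOFF_WPM:
        return "high"
    if net_wpm <= LOW_CUTOFF_WPM:
        return "low"
    return None


# Hypothetical net speeds for four volunteers.
for speed in (12.0, 30.0, 35.5, 40.0):
    print(speed, keyboarding_group(speed))
```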
Table 1 shows that the SD for net typing speed is larger for the high keyboarding skill groups than for the low keyboarding skill groups. This was the case because the typing speed of participants in the latter groups varied between 12 and 30 WPM, while that for the former groups varied between 40 and 68 WPM. In contrast, the low keyboarding skill groups had larger SD for accuracy percentage than did the high keyboarding skill groups, because the accuracy percentage for the latter varied between 95% and 100%, while that for the former varied between 60% and 95%.
Writing tasks
Three writing tasks were used in this study: two independent tasks and one integrated task. Each independent writing task consisted of writing an essay about a general topic in 30 minutes, while the integrated task consisted of listening to a lecture and reading a text about a topic for 5 minutes and then writing a summary of both the lecture and the reading in 20 minutes. The three tasks, obtained from the TOEFL-iBT Form Creator Software, are representative of TOEFL-iBT writing tasks. A paper-based version of one independent writing task was administered to the participants at the beginning of the study. The other independent task and the integrated task were administered to the participants on the computer.
Data collection procedures
Recruitment emails and flyers were sent to international post-admission (undergraduate and graduate) and pre-admission (ESL) students. Students who responded to the recruitment emails and flyers were instructed to complete the online typing tests. Based on the results of the typing tests, students were selected to participate in the study.
Each student then completed an Informed Consent Form and responded to the paper-based independent writing task (30 minutes) in a small group (of four to six students) in a classroom. One week later, each student responded to the CB integrated task (25 minutes) and then the CB independent task (30 minutes) on a local PC. With both CB tasks the participants had access only to three editing functions: cut, paste, and undo. The CB writing tasks were completed in a computer lab in small groups (of four to six test-takers), with a short break between the two tasks. Students were seated at work stations that were far apart from each other. Finally, each student completed an online questionnaire about their backgrounds (e.g., age, gender, L1, discipline). Participants were paid for participating in the study.
Each writing sample was rated by two independent, trained TOEFL-iBT raters at ETS on the five-point holistic rating scale for the TOEFL-iBT writing section. Inter-rater reliability (Cronbach’s Alpha) was .88 for the PB independent essays; .94 for the CB integrated essays; and .87 for the CB independent essays. When there was a difference greater than one point between the two raters’ scores, a third rater scored the essay in question. Out of 291 essays (97 students by three tasks), only five (2%) were rated by a third rater. The final score is, in most cases, the average of the scores from the first two raters. When the essay was rated by a third rater, the closest two scores were averaged to obtain a final score for the essay.
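The score-resolution rule in this paragraph can be expressed as a short function. This is a sketch under stated assumptions rather than the procedure used at ETS: the function name is hypothetical, and the handling of ties (when the third rating is equally close to both original ratings) is an assumption, because the text does not say how such cases were resolved.

```python
def final_score(rater1, rater2, rater3=None):
    """Combine holistic ratings into a final essay score.

    Ratings that differ by one point or less are averaged. Otherwise a
    third rating is required and the two closest ratings are averaged.
    """
    if abs(rater1 - rater2) <= 1:
        return (rater1 + rater2) / 2
    if rater3 is None:
        raise ValueError("ratings differ by more than one point; third rating needed")

    pairs = [(rater1, rater2), (rater1, rater3), (rater2, rater3)]
    # Assumption: on a tie, min() keeps the first closest pair in this order.
    a, b = min(pairs, key=lambda pair: abs(pair[0] - pair[1]))
    return (a + b) / 2


print(final_score(3, 4))     # adjacent ratings are averaged
print(final_score(2, 4, 4))  # third rating agrees with the second rater
```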
Data analysis
Various statistical analyses were conducted to address the research question of the study. First, descriptive statistics and correlations among variables were examined. Second, repeated-measures univariate analysis of variance (ANOVA) was conducted to compare scores across the three writing tasks. Third, repeated-measures univariate analysis of covariance (ANCOVA) was conducted to examine the effects of keyboarding skill, ELP, and task type on CB task scores while adjusting for one covariate: scores on the PB independent writing task, a measure of test-taker L2 writing ability. Task was the repeated measures (within-subject) independent variable (integrated vs. independent). Between-subject independent variables consisted of ELP group (low vs. high) and keyboarding skill level (low vs. high). Fourth, hierarchical multiple regression analyses were conducted to examine the relationships between ELP, scores on the PB writing task, and keyboarding skills, on the one hand, and scores on each of the two CB writing tasks, on the other. The purpose of these analyses was to examine whether the strength of association between keyboarding skill and CB task scores varied depending on task type. Examination of the score data (following guidelines in Field, 2009) indicated that they met the assumptions of all statistical tests. For analysis of variance, partial eta-squared (partial η²) is used as a measure of effect size. Partial η² ≥ .01 indicates a small effect size; partial η² ≥ .09 indicates a medium effect; and partial η² ≥ .25 indicates a large effect (Field, 2009).
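For readers who want to relate a reported F ratio to its effect size, the sketch below uses the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error) together with the thresholds cited above. The function names and the "negligible" label for values below .01 are illustrative assumptions, and values recovered this way may differ slightly from software output because of rounding or a different error term.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta-squared from an F ratio and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def effect_size_label(eta_p2):
    """Label an effect using the thresholds cited from Field (2009)."""
    if eta_p2 >= 0.25:
        return "large"
    if eta_p2 >= 0.09:
        return "medium"
    if eta_p2 >= 0.01:
        return "small"
    return "negligible"  # assumed label; the source names no category below .01


# Example: the F(1, 94) = 5.64 reported in the Findings for the
# keyboarding skill effect on PB task scores.
eta = partial_eta_squared(5.64, 1, 94)
print(round(eta, 2), effect_size_label(eta))
```

Applied to that F ratio, the identity gives roughly .06, which falls in the small range.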
Findings
Table 2 displays descriptive statistics for scores for the three writing tasks by ELP and keyboarding skill group. It shows that, overall, the score means are similar across the three writing tasks; that the high ELP group obtained higher scores on average than did the low ELP group on each of the three tasks; and that the group with high keyboarding skills obtained higher scores on average than did the group with low keyboarding skills on each of the three tasks, including the PB writing task. A repeated-measures ANOVA for all the participants detected no significant differences in scores across the three writing tasks: F(1.64, 157.51) = 2.07, p > .05.
Table 2. Descriptive statistics for task scores by task and ELP and keyboarding skill group.
Table 2 shows that test-takers with low keyboarding skills obtained slightly higher scores on the CB independent task than they did on the PB independent task. For the high-ELP group, participants with high keyboarding skills obtained higher scores than participants with low keyboarding skills on each of the three writing tasks. A two-way ANOVA was conducted to examine the effects of ELP and keyboarding skill level on PB independent task scores. Theoretically, keyboarding skills should not affect PB task scores. The ANOVA detected no significant interaction effect (F(1, 94) = .78, p > .05), but there were significant main effects for ELP group, F(1, 94) = 32.11, p < .05, partial η² = .26, and keyboarding skills group, F(1, 94) = 5.64, p < .05, partial η² = .06. As Table 2 shows, the high-ELP group obtained significantly higher scores than did the low-ELP group. Though the effect size was small, the high keyboarding skill group obtained significantly higher scores than did the low keyboarding skills group on the PB task.
Table 3 reports the correlations among scores on the three writing tasks for all participants and by ELP and keyboarding skill group. It shows that, overall, these correlations were around .60, but that they varied within and across groups. For example, the correlations between task scores are generally lower for the low-ELP group than those for the high-ELP group. These correlations are also lower for the low keyboarding skill group than those for the high keyboarding skill group. Both ELP and keyboarding skill seem to have influenced the strength of the relationships between task scores, particularly those between PB task scores, on the one hand, and scores on the CB tasks, on the other.
Table 3. Pearson r correlations among writing task scores by test-taker group.
* p < .05; ** p < .01.
Effects of ELP, keyboarding skills, and task type on CB task scores
A repeated-measures ANCOVA was conducted to examine the effects of task type (within-subject variable), ELP, keyboarding skill, and interactions among them on CB task scores while adjusting for scores on the PB independent writing task (covariate). ELP is a measure of overall L2 ability, while PB task scores are a measure of L2 writing ability. The repeated-measures ANCOVA detected no significant task effects on scores: F(1, 92) = 1.37, p > .05. There were no significant interaction effects between task type, ELP, PB task score, or keyboarding skill either. The covariate, PB task score, was significantly related to the dependent variable, CB task score: F(1, 92) = 18.16, p < .01, partial η² = .17. Test-takers who obtained higher scores on the PB writing task obtained higher scores on both CB tasks (see Table 3). After adjustment for PB task scores, the main effects of ELP (F(1, 92) = 22.99, p < .01, partial η² = .20) and keyboarding skill (F(1, 92) = 5.12, p < .05, partial η² = .05) on CB task scores were significant. These main effects were qualified by a significant interaction between ELP and keyboarding skills: F(1, 92) = 4.56, p < .05, partial η² = .05.
The significant interaction between ELP and keyboarding skills indicates that CB task scores for the two ELP groups were affected differently by keyboarding skills. Because the significant interaction effect makes it difficult to interpret the main effects of ELP and keyboarding skills, two strategies were used to examine the cell means in Table 2, as recommended by Keppel and Wickens (2004). First, follow-up analyses of simple main effects were conducted to compare the CB task scores of the two keyboarding skill groups within each ELP group separately. These analyses indicated that the difference between the two keyboarding skill groups within the low-ELP group was non-significant (F(1, 42) = .02, p > .05), while that between the two keyboarding skill groups within the high-ELP group was significant (F(1, 49) = 9.38, p < .05, partial η² = .09). As Table 4 shows, the difference between the two keyboarding skill groups within the low-ELP group (.03) was smaller than that between the two keyboarding skill groups within the high-ELP group (.56).
Table 4. Adjusted marginal means for CB task scores by ELP and keyboarding skill group.
Covariate (PB task score) is evaluated at the following value: 3.32.
Second, the cell means of the four groups (2 ELP levels by 2 keyboarding skill levels) were compared statistically. A repeated-measures ANCOVA, with task type (within-subject), group (between-subject), and PB task score (covariate), detected a significant main effect for group (F(1, 92) = 9.51, p < .01, partial η² = .24). Follow-up pairwise comparisons with Bonferroni adjustment indicated that the group with high ELP and high keyboarding skills obtained significantly higher scores on CB tasks than did each of the other three groups (p < .01). The differences between the other three groups were not significant (p > .05). These patterns suggest that keyboarding skills moderate the relationship between ELP and task scores (see Figure 1). On average, participants with high ELP seemed to benefit more from higher levels of keyboarding skills than did students with low ELP.

Figure 1. Marginal means for CB task scores by ELP and keyboarding skill group.
Task effects on the relationships between CB task scores and ELP and keyboarding skills
To examine whether the relationship between keyboarding skills and CB task scores varied across tasks, two regression models, one for each CB task, were examined. Both models included the same predictors and used a hierarchical (or forced entry) regression method, with ELP (overall L2 ability) entered first, followed by PB task score (L2 writing ability), and then keyboarding skill level. Keyboarding skill was entered last in order to identify the impact of this construct-irrelevant factor after controlling for the effects of the two construct-relevant factors in the study: ELP and PB task score.
The outcome variable for the first regression model is CB independent task score. Table 5 displays statistics for the three regression models. It shows that R was significantly different from zero at the end of the first, second, and third steps. Model 1 (ELP) had a significant adjusted R² = .31. The addition of PB task score (Model 2) improved the model significantly: R² change = .14. Adding keyboarding skill level (Model 3) also improved the model significantly: R² change = .03. Model 3 accounted for 47% of the variance in CB independent task scores, with ELP accounting for 31% of the variance, PB task scores for 13%, and keyboarding skill for only 3%.
Table 5. Model summary for regression models for CB task scores.
a Predictors: ELP; b Predictors: ELP, PB task score; c Predictors: ELP, PB task score, keyboarding skill.
Table 6 reports the unstandardized and standardized coefficients for Model 3. It shows that the coefficients for the three predictors are significantly different from zero. The standardized Beta value for PB task score indicates that as PB task scores increase by one standard deviation (SD = .79), CB independent task scores increase by .38 SD. The standard deviation for CB independent task scores is .77, and so this constitutes a change of (.38 × .77 =) 0.29 points. This interpretation is true only if the effects of ELP and keyboarding skills are held constant. A change in ELP group (from low to high) results in an increase in CB independent task scores by (.33 × .77 =) 0.25 points, while a change in keyboarding skill group (from low to high) results in an increase in CB independent task scores by (.20 × .77 =) 0.15 points.
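The conversions in this paragraph all follow the same arithmetic: a standardized coefficient multiplied by the standard deviation of the outcome gives the change in raw score points. The short sketch below repeats that arithmetic with the values reported above; the function and variable names are illustrative, not taken from the study.

```python
def beta_to_points(beta, sd_outcome):
    """Convert a standardized coefficient into raw outcome points."""
    return beta * sd_outcome


SD_CB_INDEPENDENT = 0.77  # SD of CB independent task scores, as reported

# Standardized coefficients for Model 3, as reported in the text.
betas = {
    "PB task score": 0.38,
    "ELP group": 0.33,
    "keyboarding skill group": 0.20,
}

for predictor, beta in betas.items():
    points = beta_to_points(beta, SD_CB_INDEPENDENT)
    print(f"{predictor}: {points:.2f} points")
```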
Table 6. Regression coefficients.
Table 5 also displays statistics for the three models in the second regression analysis, which had CB integrated task score as the outcome variable. It shows that R was significantly different from zero at the end of the first and second steps. Model 1 (ELP) had a significant adjusted R2 = .28. The addition of PB task score (Model 2) improved the model significantly: R2 change = .10. Adding keyboarding skill level (Model 3), however, did not improve the model significantly: R2 change = .01 (p > .05). Model 2 accounted for 38% of the variance in CB integrated task scores, with ELP accounting for 28% of the variance and PB task score for 10%.
Table 6 shows that the coefficients for ELP group and PB task score are significantly different from zero. Overall, the standardized Beta value for PB task score indicates that as PB task scores increase by one standard deviation (SD = .79), CB integrated task scores increase by .34 SD. The standard deviation for CB integrated task scores is 1.24, and so this constitutes a change of (.34 × 1.24 =) 0.42 points. A change in ELP group (from low to high) results in an increase in CB integrated task scores of (.33 × 1.24 =) 0.41 points.
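The conversions from standardized coefficients to score points reported above follow a single rule: multiply the standardized Beta by the standard deviation of the outcome. The short sketch below reproduces that arithmetic for both tasks, using only the Beta and SD values stated in the text. The function name `raw_change` and the dictionary labels are hypothetical, introduced for illustration.

```python
def raw_change(beta, sd_outcome):
    """Convert a standardized coefficient into outcome points (beta x SD of outcome)."""
    return beta * sd_outcome


# Standard deviations of the two outcome variables, as reported in the text.
SD_CB_INDEPENDENT = 0.77
SD_CB_INTEGRATED = 1.24

# Standardized Beta values reported in the text for each outcome.
independent_betas = {
    "PB task score": 0.38,
    "ELP group": 0.33,
    "keyboarding skill group": 0.20,
}
integrated_betas = {
    "PB task score": 0.34,
    "ELP group": 0.33,
}

for name, beta in independent_betas.items():
    points = raw_change(beta, SD_CB_INDEPENDENT)
    print(f"CB independent, {name}: {points:.2f} points")

for name, beta in integrated_betas.items():
    points = raw_change(beta, SD_CB_INTEGRATED)
    print(f"CB integrated, {name}: {points:.2f} points")
```

Each value holds only with the other predictors in the model held constant, as noted for the independent task.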
Summary and discussion
This study examined the effects of task type, keyboarding skills, and English language proficiency (ELP) on scores on TOEFL-iBT writing tasks. Overall, the findings of the study indicate that the effects of keyboarding skills on TOEFL-iBT writing task scores, though significant, are weak, after controlling for the effects of overall English language proficiency and writing ability in English. Additionally, task type did not have a significant impact on task scores, while ELP and PB task scores both had significant, strong, and positive associations with CB task scores.
The findings of the study suggest that the effects of keyboarding skills on performance on CB writing tasks may be task dependent (cf. Burke & Cizek, 2006). The independent task seems to have required more writing (and typing) and to be more cognitively demanding, as it requires the generation, planning, organization, and typing of more content than the integrated task, which involves summarizing ideas from the reading and listening. Keyboarding skills, thus, seem more likely to affect performance on the CB independent task, but this effect was small.
There was also a significant ELP by keyboarding skill interaction effect on CB task scores. While weak, this interaction effect suggests that keyboarding skills moderate the relationship between English language proficiency and CB task scores. Overall, test-takers with high ELP seem to benefit more from higher levels of keyboarding skills than test-takers with low ELP when responding to CB writing tasks.
As expected, overall English language proficiency (i.e., ELP group) and English language writing ability (i.e., PB task scores) had a higher association with CB task scores than did keyboarding skills or task type. This lends support to the claim that performance on TOEFL-iBT writing tasks depends primarily on test-taker English language proficiency and writing ability. However, correlations between task scores varied across ELP and keyboarding skill groups (see Table 3), suggesting that ELP and keyboarding skills affect the relationships between performance on different task types (integrated vs. independent) as well as the relationships between performance on different writing modes (PB vs. CB).
In response to the finding that test-takers with low keyboarding skills tend to obtain lower scores on CB writing tests, several authors have recommended allowing test-takers to choose to handwrite or type their responses (e.g., Burke & Cizek, 2006; Horkay et al., 2006; Russell, 1999; Wolfe & Manalo, 2005). For example, Bennett (2003) cautioned that, because writing tasks make heavier demands on computer skills, there is a greater chance “for an interaction with computer proficiency, such that students who routinely do academic work on the computer are more accurately assessed in that mode while others may be better tested on paper” (p. 5). Similarly, Horkay et al. (2006) argued that writing on the computer may not be the same as writing on paper and that conducting writing assessment in a single mode will underestimate performance for those test-takers not given the opportunity to write in their preferred mode.
However, there are practical, empirical, and theoretical reasons for not offering test-takers options in terms of writing mode. First, the findings of this study suggest that the impact of keyboarding skills on task scores was weak and varied across tasks and ELP levels. For some writing tasks, keyboarding skills may not affect test scores significantly. Additionally, keyboarding skills seem to make a difference only for test-takers with high ELP, as the significant ELP by keyboarding skill interaction indicates.
Second, TOEFL-iBT allows test-takers to take and make notes on paper during the test (i.e., a mixed mode). For example, it was observed that many participants with low keyboarding skills in this study planned and drafted their responses on paper first and then typed them on the computer (Barkaoui, K). However, allowing test-takers to plan and draft their responses on paper addresses the concerns noted by Bennett (2003) and Horkay et al. (2006) above only partially. Test-takers who draft on paper may still need more time to type their responses, particularly if they have low keyboarding skills.
Third, and more important, most writing for academic purposes and contexts is computer-mediated. Keyboarding proficiency is, thus, often essential for efficient and effective writing and success in academic, as well as professional, settings (Burke & Cizek, 2006; Chapelle & Douglas, 2006). As Chapelle and Douglas (2006) have argued, because students now spend much of their time reading and writing on the computer, the constructs of reading and writing “might best be reflected in computer-assisted test tasks” and authentic language assessment tasks, particularly writing tasks, need to engage test-takers in language use through the computer (p. 94). “So integral is the computer to the writing process,” Chapelle and Douglas emphasized, “that the idea of assessing writing ability with a paper-and-pencil writing test would be recognized by most academics as introducing bias into the measurement” (p. 94). From this perspective, the contribution of test-takers’ keyboarding skills to CB test scores is not really irrelevant to language ability “as it should be defined in the twenty-first century” (p. 45). This means that the construct assessed by TOEFL-iBT writing tasks, writing in English as a second language for academic purposes and contexts in North America, could be re-defined to include keyboarding skills.
Fourth, the recommendation to allow test-takers to choose to handwrite or type their responses seems to imply that test-takers with low keyboarding skills would get a higher score on a handwritten essay than on a CB essay. The data for this study, however, suggest that this may not be true. For example, test-takers with low keyboarding skills obtained slightly higher scores on the CB independent task than they did on the PB independent task (see Table 2). Additionally, differences in handwriting skills may now be a bigger barrier to fair and valid assessments than differences in word-processing skills, and word-processing skills are probably more construct-relevant as most university writing will use a word processor. It is also likely that computer skills will improve as populations become increasingly exposed to technology in all that they do.
Finally, there is evidence in this study that students with high L2 writing ability tend to have higher keyboarding skills than those with low L2 writing ability. As noted above, it was difficult to identify university students who have high ELP and low keyboarding skills or low ELP and high keyboarding skills for this study. Additionally, the correlations between keyboarding skills and CB and PB task scores were generally positive and significant. For example, net typing speed (WPM) had high and significant correlations with PB writing task scores (r = .38, p < .05) and CB independent task scores (r = .46, p < .01). Moreover, keyboarding skills had a significant association with scores on the PB task that should not be affected by keyboarding skills (see Table 3). This finding suggests that keyboarding skill, as measured in this study, is likely a proxy for more than just keyboarding skill. This is not surprising because test-takers with higher L2 writing proficiency very likely write more frequently, most probably on the computer, than do less proficient test-takers. Put differently, L2 learners who write frequently, most likely on the computer, are likely to have both higher keyboarding skills and higher L2 writing abilities. Previous research also indicates that test-takers with low ELP are more likely to choose to handwrite their responses and to obtain lower scores on CB writing tests (e.g., Breland et al., 2004; Wolfe & Manalo, 2005). It is thus possible that the two skills, writing and keyboarding, are correlated in the population from which the sample for this study was drawn: University L2 students in North America.
This hypothesis, however, has limits. First, it may apply to university students only since this population tends to use computers more frequently than students at secondary schools. Second, it may apply to L2 students from particular regions of the world, but not others. For example, many participants with high keyboarding skills in this study come from particular countries (e.g., South Korea, Mexico) and first languages (e.g., Spanish), while many participants with low keyboarding skills come from other countries (e.g., Iran, Saudi Arabia, Bangladesh) and first languages (e.g., Arabic, Russian, French). Breland et al. (2004) and Wolfe and Manalo (2005) also found that test-takers from particular regions of the world and particular L1 backgrounds are more likely to choose to handwrite, rather than type, their responses to the TOEFL-CBT writing task. This may indicate that the relationship between keyboarding skills and L2 writing ability depends on test-taker country of origin and first language. Country of origin may be an indicator of the level of access to and/or use of computers in some regions of the world (Wolfe & Manalo, 2005). The hypothesized relationship between writing and keyboarding proficiency may not apply to test-takers from countries with limited access to computers and/or where writing on the computer is not a common practice. First language is relevant in relation to the language of the computer and keyboard layout. Test-takers who are used to writing on the computer in their first language may find it challenging to write using an English keyboard. The layout of the keyboard varies significantly across languages, including across Indo-European languages (e.g., English, Spanish, French). Even if there is a high correlation between keyboarding skills and writing ability, the use of a keyboard with a different layout can weaken this relationship significantly.
It is also possible that the larger the difference between the L1 and English keyboard, the more demanding typing in English is, which can affect writing performance negatively.
Recommendations for practice and research
The study has its limitations. In particular, it included small samples of participants and tasks; the order of the writing tasks was not counterbalanced across participants and writing modes; the study was done in an experimental, rather than a test, context; and the test-takers were assessed in North America. The participants may not be representative of the population of test-takers in their home countries. Additionally, ELP was not clearly measured or defined in the study, while keyboarding skill is likely a proxy for more than just keyboarding skill as noted above. Both keyboarding skill and English language proficiency were treated as dichotomous, rather than as continuous, variables in this study. This did not allow the examination of questions such as whether there is a threshold level of keyboarding skills below which test performance is affected severely.
Another factor not considered in this study is raters’ tendency to respond differently to PB and CB essays. Such a tendency might explain some of the differences across tasks and test-taker groups in this study. For example, some studies (e.g., Powers et al., 1994; Russell & Tao, 2004) found that raters tend to assign higher scores to PB essays. If raters in this study followed the same pattern, the benefit of the CB task for low keyboarding skill students might actually be even larger than it appears. Additionally, the keyboarding skill effect is significant even after controlling for overall writing ability. This suggests that there are some unmeasured differences between the high- and low-ELP groups not captured by the covariates used in the study. Finally, the analyses in this study focused on group differences. An important question is whether any individual was dramatically affected by the condition of typing. In general, keyboarding skill had a small effect, but the performance of some participants could have been affected significantly by keyboarding skills, which would raise fairness issues.
Despite these limitations, the current study suggests some important implications and points to several areas for further research. First, given the significant, though small, effects of keyboarding skills on task scores, it is important to make test-takers aware of these potential effects and to advise them to practice writing on the computer frequently before taking the test in order to improve both their keyboarding skills and their ability to write in English (cf. Breland et al., 2004). Such advice can be included in test preparation materials, in information about the test, and in instructions given to test-takers when they register to take the test. However, this can raise concerns about test fairness given variability in access to computers among the test population (Wolfe & Manalo, 2005).
The study points to several areas for further research. First, this study was a part of a larger project that examined the effects of test-taker ELP and keyboarding skills on test scores and writing processes using keystroke logging programs and stimulated recalls (Barkaoui, 2013a, 2013b, 2013c). Examination of test-takers’ writing processes is expected to shed light on the findings of score analyses reported above. Second, examination of the linguistic and discourse characteristics of the essays collected in this study could also shed light on the effects of ELP and keyboarding skills on test-takers’ performance. Third, the hypothesis that keyboarding skills and writing ability are correlated in university L2 students in North America should be investigated empirically with large and representative samples of L2 students. Such research needs to examine the relationship between scores on keyboarding tests and scores on PB and CB writing tasks. The same hypothesis should also be examined in other contexts, particularly in regions where L2 learners have fewer opportunities to write on the computer. Fourth, it is important to examine whether and to what extent differences in keyboard layout across languages affect performance on CB writing tasks. Variability in computer access across regions and variability in keyboard layout across languages present potential threats to the validity of inferences based on scores from internationally administered CB tests such as TOEFL-iBT. It is, thus, important to examine whether and to what extent they affect performance on CB writing tasks.
Finally, the findings of this study point to a possible additive contribution of keyboarding skills to performance on CB writing tests for test-takers with high ELP. It is also possible that there is a level of keyboarding skill (a threshold) below which writing performance on the computer can be affected drastically, particularly for test-takers with low ELP. Future studies could include a large sample of test-takers with a range of levels in terms of ELP and keyboarding skills to examine these hypotheses and to find out the level of keyboarding skill required before test-takers can begin to demonstrate similar qualities of writing when writing on the computer as they do when writing on paper. Such a program of research could provide important evidence concerning the validity arguments of CB tests of L2 writing ability and contribute to theory on the relationships between writing conditions (e.g., writing mode, task type), writer characteristics (e.g., keyboarding skills, L2 proficiency), and L2 writing performance.
Acknowledgements
I would like to thank the participants and Angelpreet Singh, Anna Martinez, Gerrenne Gunthrope, Mara Reich, and Xiaohui Zhu who helped with data collection.
Funding
This study was funded by the TOEFL Research Program at Educational Testing Service. The opinions expressed in the article are those of the author.
