Abstract
This study addresses the current debate about the beneficial effects of text processing software on students with different working memory (WM) during the process of academic writing, especially with regard to the ability to display higher-level conceptual thinking. A total of 54 graduate students (15 male, 39 female) wrote one essay by hand and one by keyboard. Our results show a beneficial effect of text processing software, in terms of both the qualitative and quantitative writing output. A hierarchical cluster analysis was used to detect distinct performance groups in the sample. These performance groups mapped onto three differing working memory profiles. The groups with higher mean WM scores manifested superior writing complexity using a keyboard, in contrast to the cluster with the lowest mean WM. The results also point out that more revision during the writing process itself does not inevitably reduce the quality of the final output.
Information technology is constantly transforming how knowledge is accessed and shared. The acquisition of knowledge is particularly critical within the field of education, where educational technology is increasingly used as a medium both for learning new information and for demonstrating knowledge, mostly (although not exclusively) in the form of written assignments using text processing software (Goldberg, Russell, & Cook, 2003). Especially for students struggling with a limited working memory (WM) capacity, technologies such as personal computers are generally viewed as advantageous. Because such technologies reduce the mechanical demands of manual writing and facilitate revisions, capacity for higher-level processes is assumed to become available (Li & Hamel, 2003). However, Van Waes and Schellens (2003) point out that this same facility to revise at lower linguistic levels (e.g., correcting spelling or typing errors) might prevent writers from attending to revisions at higher levels, such as the structure of an argument (see also Haas, 1989). This means that keyboard-based word processing could pose an additional obstacle for students with a lower WM capacity. Although the number of typed (as opposed to written) assignments has increased in the last decades (Mogey, Cowan, Paterson, & Purcell, 2012), the relationship between WM and extended writing with or without technological assistance has never been explicitly investigated. The aim of this paper is therefore to fill this void and to investigate empirically how keyboard-based word processing affects students with different WM capacities during the process of academic writing.
Academic Writing and the Effect of Word Processing
Academic writing is commonly used as a form of assessment in education, especially in postsecondary institutions, to let students demonstrate their critical thinking and “deep understanding” of a certain topic (Mogey et al., 2012). Writing is, however, one of the most complex skills demanded of students (Lindstrom, 2007). A possible reason for this is that writing involves multiple processes, including generating, organizing and synthesizing ideas, goal setting, the production of text, as well as the process of revision (Flower & Hayes, 1981; Hayes & Chenoweth, 2006). To manage these processes, students need a variety of skills, such as planning abilities, retrieving information, generating text (knowledge of syntax, semantics), monitoring and revising text, and so on (Flower & Hayes, 1980; Torrance & Galbraith, 2006). Planning is particularly important; many thinking processes involved in writing are not automatized, and given the limited capacity of our cognitive system, this requires coordination (Torrance & Galbraith, 2006). Although the production of a single sentence, which involves word retrieval, developing a syntactic structure, retrieving phonology, and motor planning, can occur in an automatized fashion (Chanquoy, Foulin, & Fayol, 1990; Torrance & Galbraith, 2006), producing a full text requires more cognitive resources, as each sentence needs to be connected to both the previous and next sentence, with the overall structure of the argument kept in mind (Torrance & Galbraith, 2006).
In the past two decades, students’ academic writing has increasingly been performed using electronic devices, such as a (laptop) computer and text processing software (Goldberg et al., 2003). Studies have found that, in general, word processing has medium-sized positive effects on writing quantity (meaning that students writing with a keyboard produce longer texts) and small- to medium-sized positive effects on writing quality (see Bangert-Drowns, 1993, and Goldberg et al., 2003, for a meta-analysis). In Graham and Perin’s (2007) meta-analysis on effective instructional practices to teach writing, word processing had a medium positive effect (Cohen’s d = 0.55) on students’ writing quality, measured by an overall score that included factors such as the organization of the text, sentence structure, vocabulary, and tone. In addition, most college-age students can type faster than they can handwrite (MacKenzie & Soukoreff, 2002; Mogey et al., 2012), and essay length is positively correlated with the quality of word-processed texts (Lovett, Lewandowski, Berger, & Gathje, 2010). Finally, using text processing software stimulates revision (Goldberg et al., 2003; Mogey et al., 2012), and revisions made via keyboard potentially result in higher quality writing compared to revisions made using pen and paper (Goldberg et al., 2003). These positive findings suggest the potential effectiveness of word processing software for academic writing.
While research has shown that technology assists with lower-level writing processes such as proofreading, grammar, spelling and outlining (Li & Hamel, 2003), no empirical evidence explicitly suggests that these technologies facilitate higher-level conceptual writing skills needed to construct an argument. An early study by Haas (1989) showed that typing, as opposed to manual writing, obstructed the higher-level conceptual planning process of the participants (thoughts related to the structure and meaning of the text), while it facilitated sequential planning (focusing on lexical or syntactic characteristics). A possible reason for this might be that writers have more opportunities to change their text when using text processing software (cf. Case, 1985). Students seem to be aware of this as well. When Mogey et al. (2012) asked students if they would prefer using word processing software for their written (in class) examinations, students responded that they worried that they would “waste time making lots of small but essentially trivial changes” (p. 120). In a recent study, Mueller and Oppenheimer (2014) found that participants using a laptop while taking lecture notes performed significantly worse than participants who wrote manually on conceptual questions, but not on factual questions, suggesting that manual writing might enable people to process information at higher cognitive levels.
Only a handful of peer-reviewed studies to date have directly compared the real-time process of writing using a keyboard versus pen and paper (Berninger, Abbott, Augsburger, & Garcia, 2009; Gould, 1981; Haas, 1989; Van Waes & Schellens, 2003). Van Waes and Schellens (2003) asked 40 experienced writers to write a technical report. Keystroke software collected process information from the keyboard condition while videotape capture was used to log the equivalent information from the pen and paper condition. Process data including pausing and revision behavior were analyzed. The use of a word processor, even by skilled postgraduate students, produced more fragmentary pause time patterns. The participants also tended to revise more extensively at the beginning of the writing process, and attended more to lower linguistic levels (letters, words). This suggests that the facility of lower level revisions provided by computers may in fact actively “distract the writer’s attention from the possibility of revision at higher levels” (Van Waes & Schellens, 2003, p. 833; see also Haas, 1989).
Working Memory Capacity and Academic Writing
Regardless of the writing mode, the process of writing is assumed to be heavily mediated by the working memory (WM) system (Olive, 2014), which has led to an increased research interest in the past decades (Berninger & Winn, 2006; Bourke, Davies, Sumner, & Green, 2014; Kellogg, 1996; McCutchen, 2000; Swanson & Berninger, 1996; Torrance & Galbraith, 2006; Vanderberg & Swanson, 2007). During writing, several cognitive processes access the limited resources of our WM system (Berninger & Winn, 2006; McCutchen, 2006; Olive, 2004, 2014; Swanson & Berninger, 1996). WM is thought to be central to the full writing process, including processes of planning, idea and text generation, and revision (Hayes, 2000; McCutchen, 2000; Olive, 2004, 2014). This makes writing a demanding process, using much of the limited capacity of the WM, which sometimes causes an overload (Swanson & Berninger, 1996). It has been found, for example, that increases in cognitive load lead to more subject-verb agreement errors (Fayol, Largy, & Lemaire, 1994).
Throughout the literature on writing and cognitive resources, the conceptualization of WM largely depends on Baddeley’s (1986) definition, stating that WM consists of two specialized memory systems that store information, the visuospatial sketchpad (for spatial content) and the phonological loop (for verbal content) (Baddeley & Hitch, 1974). These are controlled by the central executive, which directs the flow of information in and out of these two specialized systems (Baddeley & Hitch, 1974; Cowan, 2016). Both phonological and visuospatial memory seem important for the writing process. While the role of phonological memory has been established in earlier studies (e.g., Berninger, 2009; Bourke & Adams, 2003), a recent study also found evidence for the role of the visuospatial working memory system in children’s writing development (Bourke et al., 2014). In the current study we use the work of Daneman and Carpenter (1980) and conceptualize WM not only as the temporary storage of phonological and visuospatial information, but also as the processing of this information (see Cowan, 2016, for an interesting overview of WM conceptualizations). People with a high WM capacity are well able both to retain information and to perform an additional operation on that information (Conway et al., 2005; Swanson, 1993; cf. Daneman & Carpenter, 1980), such as recalling the final words of each sentence after having heard a list of sentences (Daneman & Carpenter, 1983).
In order to become skilled, however, a writer also needs encoding methods and retrieval structures to quickly retrieve and use information stored in long-term memory. Ericsson and Kintsch (1995), based on their studies on expert performers, proposed that the working memory system includes another element, namely “long-term working memory.” This would allow for skilled use of accessible information storage in the long-term memory, and would hence extend WM capacity. More recently, Baddeley (2000) added an interim storage buffer to his WM model, the episodic buffer, which also integrates information from the WM systems with the long-term memory system (Berninger, Garcia, & Abbott, 2009). These revised models seem to be in line with research on the role of WM in writing (McCutchen, 2000). That is, while more advanced writers can move beyond the limited capacity of the WM system—possibly by making use of fluent encoding processes to retrieve knowledge from their long-term working memory—novice or struggling writers are hindered by the limited capacity of their WM system. For this latter category of students, reduction of cognitive task demands competing for limited WM capacity is thus an important goal.
Through this lens, text processing software is generally viewed as assistive. According to McCutchen (2006), manual writing may never be fully automatized, and hence, text processing is believed to reduce the mechanical demands, freeing up capacity for the higher level processes required for text generation (Berninger & Winn, 2006; Li & Hamel, 2003). However, as pointed out by Van Waes and Schellens (2003), the fact that word processors facilitate revisions on lower linguistic levels might actually prevent the writer from attending to revisions on higher levels, such as the structure of the argument, its content, or its complexity. Strikingly, using text processing might even obstruct the planning process before the first words are put on paper, even for experienced writers. Yet, this initial planning is considered important, as the writer starts to organize the text and explores possible content and thesis (Haas, 1989). In a study on the writing process of children with learning disabilities, Berninger, Abbott, et al. (2009) report that for children both with and without learning disabilities, text processing software facilitated faster generation of separate letters, and sometimes even sentences, but not text generation, which puts higher demands on the WM system. Simply providing a computer, Berninger and colleagues conclude, is therefore not a solution for children struggling with writing assignments.
Thus, while keyboard-based word processing is generally viewed as assistive and increasing the writer’s cognitive capacity for higher-level thinking, other researchers have suggested an opposite effect. This means that text processing software could provide an additional obstacle for students with low WM capacity, by cluttering their limited WM with more accessible revisions at the word- or sentence-level. The relationship between WM and writing with or without keyboard-based word processing has, however, never been empirically investigated in terms of both quantitative and qualitative writing output.
Measuring Writing Output
Quantitative writing measures often cover the time spent writing, the number of words, a combination of these two (i.e., words per minute), or the number of grammatically correct sentences (Berninger, Abbott et al., 2009). Special categories of quantitative writing measures are pause times and revision patterns (Van Waes & Schellens, 2003). Pauses and revisions tell us something about the real-time thinking process of the writer. More pausing, especially within a sentence, may signal a more fragmented and inefficient writing process (cf. Schilperoord, 1996).
The quality of writing is usually assessed with a holistic measure (Bangert-Drowns, 1993). This often consists of (independent) raters’ assessment of the overall text, including its organization, sentence structure, the ideas conveyed, and the tone (Graham & Perin, 2007). In the study of Troia and Graham (2002), for instance, writing quality was measured on a scale from 1 to 8 for both organization, and for the clarity and definition of the arguments. While there is no doubt that such measures of quality reflect the complexity of the written text to some extent, it remains questionable if they can fully assess higher cognitive complex thinking. Ideally, a measure of quality should also be equally applicable to a variety of texts and topics, making it possible to assess different texts on the same scale.
Skill theory (Fischer, 1980; Fischer & Bidell, 2006) and its derivative the Lectical Assessment System, constructed to analyze test responses and essays (Dawson-Tunik, 2004), meet both criteria. Originally a theory of human cognitive development, skill theory proposes a hierarchical complexity scale along which people develop their understanding and other cognitive skills. The theory can be used to analyze people’s performances in a wide variety of domains, by informing researchers on the construction of cognitive skill structures, which enables them to localize these performances along the complexity scale (Stein, Dawson, & Fischer, 2010). This complexity scale is hierarchically organized in three consecutive tiers that each contain three hierarchically connected levels. The three levels on the sensorimotor scale depict a concrete, basic understanding, which form the basis for the next, representational tier that specifies more complex relations between basic elements, which, in turn, serves as the foundation for the abstract tier, encompassing valid abstract rules (Fischer & Bidell, 2006; Van Der Steen, Steenbeek, Van Dijk, & Van Geert, 2014). Texts that show higher complexity levels generally connect more elements, specify relations between elements, and go beyond concrete instances by forming more abstract rules (Stein et al., 2010).
The Current Study
Several researchers have called for more studies on the difference in cognitive demands between writing using a keyboard versus writing manually, especially when it comes to individual differences (Burke & Cizek, 2006; Chen, White, McCloskey, Soroui, & Chun, 2011; Lindstrom, 2007). At the same time, and as outlined above, there exists a debate about the beneficial effects of text processing software on the WM demands during writing, especially with regard to the ability to display higher-level conceptual thinking. The goal of this study is therefore to empirically investigate how text processing software affects students with different WM capacities during the process of academic writing. This goal is addressed within the specific context of expository text, a complex and highly utilized genre within higher education (Mogey et al., 2012). We do so by focusing on quantitative patterns of writing behavior (measures of pauses and writing time) as well as qualitative measures reflecting the hierarchical complexity of the text (measures derived from the Lectical Assessment System–skill theory).
Three research questions specifically guided our study: (1) What is the general effect of modality (manual writing versus word processing software) on the qualitative and quantitative writing output? (2) Which distinct groups can be distinguished based on their quantitative and qualitative writing output, and what are the within-group effects of modality on the writing output? (3) How do these writing output groups differ in terms of their WM capacity?
Method
Participants
The sample was drawn from the general population of a U.S. graduate education program. All students in the graduate program received a recruitment email, which resulted in a response rate of 19%. The recruited participants were subjected to three screening procedures. The first method of screening was an online survey that determined that all participants were native English speakers and used both writing modalities on a regular basis. Next, an individual interview was conducted to identify and exclude individuals diagnosed with health conditions that might affect the interpretability of the results, such as anxiety disorders triggered in a testing environment, epilepsy, or dysgraphia. Two individuals were excluded at this stage. Finally, participants undertook a behavioral assessment to screen for any significant graphomotor or literacy difficulties that would contraindicate participation in the study. The Test of Word Reading Efficiency (TOWRE-2; Torgesen et al., 2011), the Woodcock–Johnson III Broad Written Language–Spelling subtest (Woodcock, McGrew, & Mather, 2001), and the Purdue Pegboard (Tiffin, 1968, 2002) for graphomotor dexterity were administered. Following the exclusion criteria of two standard deviations above or below the mean, no participants were excluded. The remaining 54 participants (15 male, 39 female; Mage = 26, SD = 2.79; age range = 21-34 years; 18-25 years of education) signed an informed consent. The study was approved by the local university ethics committee.
Memory Assessment Measures
WM tasks are characterized by not only recalling presented information, but also performing an additional operation on that information (Conway et al., 2005; Swanson, 1993; cf. Daneman & Carpenter, 1980). According to Swanson and Berninger (1996), who correlated several WM measures with writing skill, WM measures should include “language-related processes” (p. 379) when assessing individual differences in writing performance. Other authors have mentioned the lack of a single “pure” working memory task (Conway et al., 2005). We therefore decided to administer two renowned tests of verbal WM and look at converging evidence from across these measures.
WAIS-III digit span backward
The digit span backward subtest of the Wechsler Adult Intelligence Scale–III (Wechsler, 2008) measured verbal working memory, specifically the phonological loop component (Baddeley & Hitch, 1974). A series of random number sequences was read aloud to the participant. At the end of each sequence, the participant had to recall the numbers in reverse order.
Oral sentence span
This test (Daneman & Carpenter, 1980, 1983) measured the central executive function of verbal working memory span. The examiner read aloud one sentence from the set every 5 seconds. The participant repeated each sentence aloud and, at the end of the set, was asked to recall the final word of each sentence, in any order.
Materials
The participants were asked to write two essays responding to moral-dilemma writing prompts, one using a keyboard and one manually. The prompts were designed to be morally complex, and covered themes that require equal background knowledge. Because participants were recruited from a graduate school of education, one prompt focused on an inappropriate teacher-student relationship, while the other focused on a case of students changing their grades as a result of computer hacking. In both cases, the participant was asked to indicate how strongly they agreed or disagreed with the deliberately ambiguous outcomes described, and asked to provide a written rationale for their response. Every effort was made to equate the conceptual and written demands of each prompt, and so both were of equivalent length, with similarly structured response requirements. Alongside this structural equivalence, both prompts were analyzed by two researchers with Lectical Assessment expertise who were blind to the purposes of the study; these researchers confirmed that responses to each prompt could be considered from a comparable number of perspectives, that is, could contain similar levels of complexity.
Procedure
Each participant attended two experimental sessions. In each session, the participant was given a prompt to read. The participants were randomly assigned to both a writing modality and prompt for the first test session, and administered the remaining prompt in the other modality for the second session (prompts were randomized across the two modalities). The participants were advised to use 45 minutes to answer, but if additional time were required, it would be given. No one exceeded 1 hour. To prevent fatigue, all participants had several days at a minimum between tests.
For the experimental condition of writing manually, participants used the Intuos 4 Wacom tablet, with a piece of lined self-adhesive paper placed on top of the tablet workspace and an inking pen stylus. By using Eye and Pen software (Alamargot, Chesnet, Dansac, & Ros, 2006; Chesnet & Alamargot, 2008), every pen stroke was recorded and time stamped to within a millisecond. Eye and Pen also measured the pressure level of the pen, permitting the capture of up/down states and thereby providing pause durations.
In the keyboard condition, a Dell laptop computer loaded with the software program ScriptLog (Stromqvist & Karlsson, 2002) was used. On the screen appeared a text box similar in appearance to most word processing software. The available features included deleting text, moving through the document with the mouse or cursor, inserting line returns, and capitalization. Spell and grammar checks were not available. ScriptLog recorded every keystroke and keyboard event to the millisecond. In both conditions, participants performed small tasks to get used to the software (type/write their name and the alphabet), before beginning the experimental session.
Data Analysis
Writing output
Both ScriptLog and Eye and Pen enabled the researchers to replay the writing process and provided the following quantitative data: total writing time (measured from the first pen- or keystroke), total number of words, words generated per minute, and the ratio of pause time (pauses longer than 2 seconds) to active writing time. In addition, the essays were subject to an analysis of the complexity of their argumentation using the Lectical Assessment System (Dawson & Wilson, 2004), which is a psychometrically refined metric based on skill theory (Fischer, 1980; Fischer & Bidell, 2006). Two expert LAS analysts independently scored each essay as a whole, in order to yield a total complexity score (Dawson & Stein, 2010). The analysts received typed copies of the responses from both conditions, in order to keep them blind as to the modality of the response they were reading.
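To make the quantitative measures concrete, the following is a minimal sketch of how they could be derived from a time-stamped event log. It is not the ScriptLog or Eye and Pen implementation; the function name `writing_measures`, the input format (sorted timestamps in seconds), and the choice to compute words per minute over total rather than active time are assumptions made for illustration.

```python
def writing_measures(event_times_s, word_count, pause_threshold_s=2.0):
    """Summarise one session from sorted event timestamps (in seconds).

    Hypothetical helper: a gap between consecutive events longer than
    pause_threshold_s counts as pause time, everything else as active time.
    """
    if len(event_times_s) < 2:
        raise ValueError("need at least two logged events")
    # Total writing time runs from the first to the last pen- or keystroke.
    total_time = event_times_s[-1] - event_times_s[0]
    gaps = [b - a for a, b in zip(event_times_s, event_times_s[1:])]
    pause_time = sum(g for g in gaps if g > pause_threshold_s)
    active_time = total_time - pause_time
    if active_time <= 0:
        raise ValueError("no active writing time in this log")
    return {
        "total_minutes": total_time / 60,
        # Assumption: rate is computed over total time, pauses included.
        "words_per_minute": word_count / (total_time / 60),
        "pause_ratio": pause_time / active_time,
    }
```

Under these assumptions, a pause ratio of 1.0 would mean that a writer spent as much time pausing as actively producing text.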
The Lectical Assessment System (LAS) assesses students’ performance by examining the level of complexity of the concepts present in the argumentation, and the logical structure of the argument (Dawson-Tunik, 2004, 2006; Dawson & Wilson, 2004). The LAS metric is particularly valuable in the cases of open-ended questions and dilemmas (Stein & Heikkinen, 2009), and is considered domain-general. Hence, the scoring is not bound to a particular task or domain (Dawson & Wilson, 2004). The LAS metric has exhibited high interrater reliability (80-97% agreement). With regard to the validity, the LAS corresponds well with scores assigned by other developmental metrics, such as Kohlberg’s Standard Issue Scoring System for morality, Armon’s Good Life Scoring System for evaluative reasoning, Perry’s Scoring System for epistemological understanding, and Kitchener and King’s Reflective Judgment Scoring System (agreement rates of 85% or greater; Dawson, Xie, & Wilson, 2003; Stein & Heikkinen, 2009). For the current project, the initial rate of agreement between two raters within one fourth of a level was 85.7%. After discussion of differences larger than one third of a level, it was 95.5%. All essays were then scored by these two raters. We took their average score for further analysis.
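The agreement and averaging steps described above can be sketched as follows. This is an illustrative reading rather than the scoring procedure itself: the function names and the example scores are hypothetical, and agreement is taken to mean that two raters' scores differ by no more than a stated fraction of a level.

```python
def agreement_rate(scores_a, scores_b, tolerance):
    """Share of essays on which two raters differ by at most `tolerance` levels."""
    pairs = list(zip(scores_a, scores_b))
    within = sum(1 for a, b in pairs if abs(a - b) <= tolerance)
    return within / len(pairs)


def consensus_scores(scores_a, scores_b):
    """Average the two raters' scores per essay for further analysis."""
    return [(a + b) / 2 for a, b in zip(scores_a, scores_b)]


# Hypothetical scores for four essays (not data from the study).
rater_1 = [11.0, 11.5, 10.75, 12.0]
rater_2 = [11.25, 11.5, 11.5, 12.1]
print(agreement_rate(rater_1, rater_2, tolerance=0.25))  # 0.75
```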
General effect of modality on writing output (Research Question 1)
We performed a repeated measures ANOVA to examine the within-subject effect of modality on the writing output measures. A power analysis in G*Power 3 (Faul, Erdfelder, Lang, & Buchner, 2007) indicated that 39 participants should be included for this analysis to have 95% power for detecting a small-sized effect, adhering to the .05 criterion of statistical significance. The observed p value was corrected using the Bonferroni correction (i.e., multiplied by the number of comparisons). For each writing output measure, an effect size in the form of the partial η² was calculated.
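As a brief illustration of the two computations named above, the sketch below applies the Bonferroni multiplication and recovers partial η² from an F ratio through the standard identity η² = (F × df_effect) / (F × df_effect + df_error). The function names are hypothetical, and capping the corrected p value at 1 is a conventional addition that the text does not state.

```python
def bonferroni(p_value, n_comparisons):
    """Multiply the raw p value by the number of comparisons, capped at 1."""
    return min(1.0, p_value * n_comparisons)


def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Using an F ratio reported in the Results, F(1, 53) = 8.39:
print(round(partial_eta_squared(8.39, 1, 53), 2))  # 0.14
```

The same identity reproduces the other effect sizes reported in the Results; for example, F(1, 53) = 31.59 gives a value that rounds to .37.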
Which groups can be distinguished based on writing output? (Research Question 2)
To see if this multivariate data set comprised distinct groups with different performance patterns, we used Hierarchical Agglomerative Clustering (HAC) in the statistical program Tanagra 1.4.41 (Rakotomalala, 2005). We used participants’ standardized LAS complexity scores, total writing time, ratio of pause/writing, total number of words, and words per minute for each modality as the input variables of the clustering. The HAC method successively merges individuals who show a similar pattern of scores into groups (i.e., clusters), keeping the within-cluster variance at a minimum. The clustering starts with all individual subjects, joining the two most similar subjects together, after which the next two most similar subjects or clusters are merged, and so on. From the resulting hierarchy, the appropriate number of clusters needs to be determined. For this, the gap statistic (Tibshirani et al., 2002) is widely used. The gap statistic compares the change in within-cluster dispersion with what would be expected under a null distribution, i.e., no clusters (Martinez, Martinez, & Solka, 2004; Tibshirani et al., 2002). We proceeded by validating the solution of the HAC using a K-means clustering technique (Hartigan & Wong, 1979; Steinley, 2006), to see if it provided the same number of clusters, and by comparing the two solutions with the chi-square statistic.
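The merging logic described above can be sketched in a few lines. This is not Tanagra's implementation: it assumes Ward's criterion (merge the pair of clusters whose union increases the within-cluster sum of squares the least), which is one way to keep within-cluster variance at a minimum, and it omits the gap statistic and the K-means validation. All function names are hypothetical.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so that all variables share a common scale.

    Assumes no column is constant (a zero SD would divide by zero).
    """
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def centroid(points):
    """Mean position of a set of points, one value per variable."""
    return [mean(dim) for dim in zip(*points)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * dist2


def agglomerate(points, n_clusters):
    """Start with singletons and merge the cheapest pair until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([points[k] for k in clusters[i]],
                                 [points[k] for k in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters  # each cluster is a list of row indices
```

In practice the full merge hierarchy would be retained so that the number of clusters can be chosen afterwards; the sketch stops at a preset number only to stay short.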
We then used the group characterization function in Tanagra 1.4.41 (Rakotomalala, 2005), to see if the clustering resulted in groups with lower and higher writing output scores. This group characterization was formalized in test values that show how much weight each variable has in a specific cluster, based on a comparison of the mean of each variable within a specific cluster (e.g., the mean writing time within Cluster 1) with the mean of this variable across the whole data set (e.g., the mean writing time within the whole sample). The test value asymptotically follows a Gaussian distribution, and absolute values greater than 2 signal that the value of the variable in a specific cluster is significantly different from its value in the rest of the sample (Rakotomalala, 2005).
Meaningful differences between the clusters, signaled by their test values, were further explored using Monte Carlo permutation tests to determine the p values. This method is particularly useful in the case of small sample sizes or unbalanced data sets (Anderson, 2013; Todman & Dugard, 2001). Taking the sample distribution into account, a Monte Carlo test estimates how likely it is that a difference at least as large as the observed one would arise by chance. This is done by drawing (in this case) 5,000 random samples from the original data, and determining how often the observed or a bigger difference occurs in these random samples. This number is then divided by the number of drawn samples (5,000), which produces a p value.
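The procedure can be sketched as follows for a difference in means between two groups. The function name, the two-sided comparison on absolute differences, and the fixed random seed are assumptions made for this illustration; the resampling step reshuffles the pooled scores across the two groups.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_samples=5000, seed=0):
    """Estimate how often a random regrouping yields a difference in means
    at least as large as the observed one (two-sided, an assumption)."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_samples):
        rng.shuffle(pooled)  # random reassignment of scores to the two groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Count of equal-or-larger differences divided by the number of samples.
    return hits / n_samples
```

With two clearly separated groups the returned proportion is close to zero, whereas two groups with identical means return 1.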
Characterization of groups in terms of working memory (Research Question 3)
Last, we used Monte Carlo permutation tests to examine the association between the clustering (based on the writing output measures), and the three working memory measures from the screening. Using this method, it was possible to see if observed trends in the clustering based on the writing output (i.e., clusters with higher and lower scores) could be linked to specific trends in the WM measures.
For both Research Question 2 and Research Question 3, where separate groups were compared, we calculated the effect size (Cohen’s d) by dividing the observed difference between groups by the pooled SD. A value of d between 0.2 and 0.3 is generally considered as a small effect, a value around 0.5 is considered as medium, and a value of 0.8 and higher is considered as large (Cohen, 1988).
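A minimal sketch of this effect size calculation is given below. The pooled SD is computed in the usual way, weighting each group's sample variance by its degrees of freedom; this weighting and the function name are assumptions, since the text states only that the difference was divided by the pooled SD.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Difference in group means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
```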
Results
General Effect of Modality on Writing Output (Research Question 1)
Table 1 shows the summary statistics for written output variables of the repeated measures ANOVA. On average, students received a higher complexity score in the keyboard condition, F(1, 53) = 8.39, p = .025, η² = .14; they also wrote significantly more words, F(1, 53) = 31.59, p < .001, η² = .37, and generated more words per minute, F(1, 53) = 38.38, p < .001, η² = .42. The ratio of pause/writing was, however, significantly higher in the keyboard condition, meaning that students spent more time pausing instead of actively generating text, F(1, 53) = 24.29, p < .001, η² = .31. Although the total writing time also seemed to be longer in the keyboard condition, no significant difference between the modalities was found. In sum, the results seem to point to a slight advantage for the keyboard condition (more words, faster writing, and a higher complexity score), apart from the higher pause/writing ratio.
Table 1. Repeated Measures Statistics for Written Output Variables.
Note: Significance level adjusted using the Bonferroni correction (p value multiplied by the number of comparisons, i.e., 5).
Which Groups Can Be Distinguished Based on Writing Output? (Research Question 2)
Using HAC, three groups with a distinct writing output could be identified (n1 = 25, n2 = 14, and n3 = 15; R² = .45), with a gap statistic of 1.03. To check the validity of the clustering, a K-means clustering was performed, which yielded a nearly identical three-cluster solution (n1 = 24, n2 = 14, and n3 = 16); the cross-tabulation of the K-means and HAC solutions gave χ²(2, N = 54) = 102.6, p < .001.
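The agreement between the two cluster solutions can be illustrated with a Pearson chi-square on their cross-tabulation. The sketch below is an assumption-laden reconstruction, not the study's data: it supposes that the two solutions differ by a single participant who moves from Cluster 1 to Cluster 3, which is the simplest reassignment consistent with the reported cluster sizes. The function name is hypothetical.

```python
from collections import Counter

def crosstab_chi_square(labels_a, labels_b):
    """Pearson chi-square for the cross-tabulation of two cluster labelings."""
    n = len(labels_a)
    observed = Counter(zip(labels_a, labels_b))
    row_totals = Counter(labels_a)
    col_totals = Counter(labels_b)
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected
    return chi2

# Assumed labelings built from the reported cluster sizes only.
hac = [1] * 25 + [2] * 14 + [3] * 15             # n1 = 25, n2 = 14, n3 = 15
kmeans = [1] * 24 + [3] + [2] * 14 + [3] * 15    # one participant moves from 1 to 3

agreement = crosstab_chi_square(hac, kmeans)
identical = crosstab_chi_square(hac, hac)
```

Under this assumed reassignment the statistic comes out at 102.6, the value reported above, against a ceiling of 108 (N times the number of clusters minus one) for two identical three-cluster partitions.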
Cluster 1 scores significantly higher on most variables for both modalities, apart from the ratio of pause/writing in the manual writing condition (M = 0.31), on which this cluster scores below average (test value = −1.89, d = 0.35), meaning that they spend less time pausing in this modality. Table 2, which contains the test values per variable, shows that this cluster often scores higher than the other two clusters, as indicated by the high positive test values (for the means and standard deviations, see Table 3). The participants in Cluster 2 have a significantly higher ratio of pause/writing time (i.e., they spend more time pausing) in both modalities (Mpen = 0.49, test value = 5.73, d = 3.35; Mkeyboard = 0.5, test value = 2.4, d = 0.88). They spend significantly more time writing in the keyboard condition than the other clusters (M = 28.91, test value = 1.2, d = 0.43), but not in the manual writing condition (M = 26.09, test value = .58), and generate less text per minute in both modalities (Mpen = 13.52, test value = −4.76, d = 2.27; Mkeyboard = 16.61, test value = −3.89, d = 1.63). Their complexity scores are not significantly different from the other two clusters (Mpen = 11.08, test value = −1.64; Mkeyboard = 11.26, test value = −.04). This cluster is thus characterized by both positive and negative test values, less extreme than the other two clusters. Cluster 3 generates significantly more words per minute than the other clusters for both modalities (Mpen = 21.25, test value = 2.97, d = 0.99; Mkeyboard = 27.68, test value = 2.4, d = 0.77), and spends less time pausing when writing manually (M = 0.25, test value = −3.49, d = 1.22). 
The complexity scores of participants in Cluster 3 are, however, significantly lower than the other two clusters (Mpen = 11.0, test value = −3.38, d = 1.16; Mkeyboard = 11.04, test value = −3.87, d = 1.39), as well as their total number of words (Mpen = 273.6, test value = −4.22, d = 1.59; Mkeyboard = 415.5, test value = −3.14, d = 1.06), and the writing time in both modalities (Mpen = 13.25, test value = −5.64, d = 2.76; Mkeyboard = 16.16, test value = −4.51, d = 1.76). In fact, Table 2 shows that the test values of this cluster are often negative, indicating that they score lower on the writing output measures than the rest of the sample.
Table 2. Test Values of Writing Indicators for Each Cluster and Accompanying Effect Sizes.
Note: Positive test values indicate that the cluster scores higher than the other two clusters combined, whereas negative test values indicate that this cluster scores lower than the other two clusters combined. Means and standard deviations of the writing output measures can be found in Table 3.
*p < .05. **p < .01 (p values derived from Monte Carlo permutation tests).
Table 3. Within-Cluster Modality Differences and Significance.
Note: p values derived from Monte Carlo permutation tests.
Table 3 takes a closer look at the within-cluster modality differences and shows that the slight advantage for the keyboard condition found across the sample is also visible within each cluster. In the keyboard condition, Cluster 1 has a higher complexity score (Mpen = 11.3, Mkeyboard = 11.4, p = .02, d = 0.59), a higher number of words (Mpen = 595.64, Mkeyboard = 745.24, p < .01, d = 0.85), and more words generated per minute (Mpen = 19.35, Mkeyboard = 25.01, p < .01, d = 1.14). The total writing time is, however, roughly equal in both conditions (Mpen = 31.09, Mkeyboard = 30.56, p = .63), and the ratio pause/writing is higher in the keyboard condition (Mpen = 0.31, Mkeyboard = 0.42, p < .01, d = 1.16), indicating that participants spent more time pausing while writing with word processing software. Cluster 2 shows roughly the same pattern, apart from the ratio of pause/writing time, which is not significantly different for the two modalities (Mpen = 0.49, Mkeyboard = 0.5, p = .38). Cluster 3 does not show a significant difference in LAS complexity scores between the two modalities (Mpen = 11.0, Mkeyboard = 11.04, p = .31). These participants seem to write longer in the keyboard condition (Mpen = 13.25, Mkeyboard = 16.16, p = .09), although this difference is not statistically significant. Just like Cluster 1, this cluster spent more time pausing in the keyboard condition (Mpen = 0.25, Mkeyboard = 0.4, p < .01, d = 1.57).
Generally speaking, Cluster 1 seems to perform well in both modalities compared to the other clusters, Cluster 3 performs worse, and Cluster 2 falls in between. An advantage for the keyboard condition is found for all clusters, although none shows a difference in total writing time between the two modalities, and the time spent pausing is generally higher in the keyboard condition. With regard to writing complexity, only Clusters 1 and 2 manifest a significant effect of modality, that is, superior writing complexity using a keyboard.
Characterization of Groups in Terms of Working Memory (Research Question 3)
In Table 4 the clusters are linked to the WM measures obtained during the screening. We tested whether the pattern found in the clustering of the writing output measures (one high-performing, one average, and one low-performing cluster) was reflected in the pattern of their WM measures. Cluster 1 scored higher than Cluster 2 on the digit backward span (Mdiff = 0.63), which, in turn, scored higher than Cluster 3 (Mdiff = 0.42). A Monte Carlo analysis revealed that these differences between the groups were significant (p = .02 for the ordering Cluster 1 > 2 > 3, d = 0.65). In addition, Cluster 1 had higher values on the oral sentence span measure than Cluster 2 (Mdiff = 1.91), and Cluster 2 scored slightly higher than Cluster 3 on this measure (Mdiff = 0.1). A Monte Carlo analysis revealed that these differences between the groups were also significant (p < .01 for the ordering Cluster 1 > 2 > 3, d = 1.94).
Table 4. Statistics of Working Memory Measures (M and SD) for Each Cluster.
Note: p values derived from Monte Carlo permutation tests.
In sum, the patterns found in the clustering of the writing output measures were reflected in the patterns of digit backward span and oral sentence span. Participants who perform well on the writing tasks also score high on the WM tasks (Cluster 1), and the group that scores particularly low on the writing indicators also scores low on the WM measures (Cluster 3).
Discussion
This study set out to investigate how text-processing software affects students with different WM capacities during the process of academic writing. The goal was addressed within the specific context of complex, expository text. Across the participants as a whole, significant within-subject differences were found both in quantitative patterns of writing behavior and in qualitative measures reflecting the complexity of the text: Students wrote more words, and at a faster rate, in the keyboard condition than in the pen and paper condition, and in terms of final output, the essays written on the keyboard were of significantly greater conceptual complexity. These overall findings are in accord with previous studies: The two key meta-analyses of Bangert-Drowns (1993) and Goldberg et al. (2003) both report greater output and higher quality of output for keyboard writing as opposed to that generated by pen and paper. Equally, a faster rate of typing than handwriting has previously been documented for college-age students (Mogey et al., 2012).
The current study went one step further by examining whether this global profile is influenced by individual differences in working memory, a factor known to play a significant role in effective writing (McCutchen, 2000). Using HAC, three distinct performance groupings were identified that mapped onto three differing working memory profiles. Replicating the findings of previous studies (see McCutchen, 1996, for a review), working memory performance was related to writing quality—the cluster (Cluster 1) with higher means on a range of working memory measures also exhibited higher means in their LAS complexity scores, while the cluster with the lowest means on working memory tasks (Cluster 3) achieved lower means in their complexity scores.
More uniquely, this study allowed us to see whether the higher writing quality scores in the keyboard condition occurred uniformly across the different clusters. This quality (LAS complexity) advantage for essays written by keyboard was observable for Clusters 1 and 2, but not for Cluster 3, which was the group with the lowest working memory means. This cluster also performed below the overall mean on most writing output measures for both typing and writing. Given that neither of the modalities resulted in an advantage for these students, we can deduce that the writing process itself might place a burden on these students’ WM capacity (cf. McCutchen, 2000, 2006). For these students it would be helpful to investigate which aspects of the writing process are most affected by their WM capacity, which could help in constructing interventions aimed at reducing the cognitive task demands that compete for limited WM capacity while writing. As Swanson and Berninger (1996) put it, expert writers are less hindered by the limited capacity of their WM, as they have automatized many elementary writing processes, such as the structure of discourses and lexical access, leaving more capacity for generating and organizing ideas. It has been suggested that more advanced writers can move beyond the limited capacity of the WM system, possibly by making use of fluent encoding processes to retrieve knowledge from their long-term working memory (Ericsson & Kintsch, 1995; McCutchen, 2000, 2006). However, given that the students in Cluster 3 had an educational background similar to that of the other two clusters, it is questionable whether their performance is a matter of writing experience and whether their limited WM capacity could be overcome simply by building expertise.
Regarding the total amount of text written, and writing rate, the three clusters all mirrored the main effects of more text and faster writing in the keyboard condition. The keyboard condition also elicited a higher pause/writing time ratio (i.e., more time spent pausing) in the overall group and in Clusters 1 and 3, while Cluster 2 spent more time pausing in both modalities in general. According to the literature, writing on a keyboard would appear to place different demands on the WM system as compared to writing using a pen (Berninger, Abbott et al., 2009; Berninger & Winn, 2006). As opposed to preplanning and then writing, the use of computers appears to encourage less initial planning and more thinking and revision once writing has begun (Haas, 1989; Van Waes & Schellens, 2003), which would be reflected in longer pause times. Indeed, the use of text processing software, with its user-friendly options to delete and retype text, seems to afford (cf. Gibson, 1977) revisions during the act of writing at the word- or sentence-level (Case, 1985; Mogey et al., 2012). In the current study, an increase in the time spent pausing for the keyboard condition was also seen, but equally, an overall quality advantage for essays written by keyboard was observable across the group. Thus it appears that more pausing during the writing process itself does not inevitably reduce the quality of the final output.
Strengths, Limitations, and Future Directions
The HAC uses a bottom-up procedure to cluster participants, with patterns in the multivariate data as the only input. We consider this an asset, as groups did not have to be determined a priori and could be deduced from patterns in no fewer than 10 writing output variables. While the HAC procedure yielded three groups with differing WM profiles, including a group with lower WM scores compared to the rest of the sample, note that our sample came from an institution for higher education, and was thus a high-performing group with considerable experience in writing expository texts. It would therefore be interesting to extend this study to other population groups, including less experienced groups, as well as students diagnosed with WM deficits, to see if the advantages of the keyboard condition over the pen and paper condition would still hold.
The program ScriptLog (Strömqvist & Karlsson, 2002) was used to record writing behavior in the keyboard condition. Although it is similar in appearance to most word processing software, ScriptLog has limited text editing functions, and even though we gave participants time to familiarize themselves with the program (see the method section), they might have needed more time to adapt. Keystroke logging programs with more editing options are now available to overcome this limitation (Leijten & van Waes, 2013).
Earlier research suggests that writing using a keyboard would elicit revision, mainly at lower linguistic levels (e.g., correcting spelling or typing errors). This would lead to an increase in pauses, but might also prevent writers from attending to revisions at higher levels (Van Waes & Schellens, 2003). In the current study, an increase in the time spent pausing was indeed seen for the keyboard condition, but so was an overall quality advantage for essays written by keyboard. An important next step in this work will be studying the relationship between the specific location of pauses in writing (within a word, within a sentence, at sentence or at paragraph boundaries) and the characteristics and complexity of the final written output. More pauses within a word or sentence may signal a more fragmented and inefficient writing process (Schilperoord, 1996; Van Waes & Schellens, 2003). In doing so, one must take into account that pausing while using pen and paper may be different from pausing during keyboard text processing (cf. Van Waes, Leijten, Wengelin, & Lindgren, 2012). Moreover, pauses may not only reflect anticipatory planning of the next word(s), but may also represent the delayed effects of the previous word written, at least while writing manually (Maggio, Lété, Chenu, Jisa, & Fayol, 2012).
In future research, it will also be important to consider the relationship between WM and writing processes across different types of writing activity. In contemporary models of WM and writing relationships, WM is conceptualized as a mediator of writing process coordination (Kellogg, 1996; Olive, 2014). When the accumulated concurrent demands of writing processes exceed the limited WM capacity, then performance is compromised. However, this leaves many unanswered questions in terms of the mechanism of this performance compromise. Are specific writing processes more vulnerable to WM overload, exerting a cascading effect on subsequent processes, or do WM capacity issues have direct parallel effects across multiple writing processes concurrently? Given the ubiquity of writing within digital environments and the shift in writing processes this is precipitating, as seen in this study, we strongly recommend that future theory-building regarding relationships between working memory and writing is alert to writing modality and inclusive of digital modes.
Footnotes
Acknowledgements
The authors would like to acknowledge Vidya Gopinath and Chandana Jasti for their invaluable help with data collection and processing.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported in part by a grant from the Harvard University Milton Fund.
