Abstract
We used an unobtrusive approach, keystroke logging, to examine students’ cognitive states during essay writing. Based on data contained in the logs, we classified writing process data into three states: text production, long pause, and editing. We used semi-Markov processes to model the sequences of writing states and compared the state transition time and probability for demographic subgroups that were matched on writing proficiency. Results suggested that the subgroups employed different processes in essay writing.
1. Introduction
A major attraction of keystroke logging is that the moment-by-moment process of text creation is recorded unobtrusively, allowing the text production process to be analyzed more naturally. From these logs, we can extract observables indicating various states of the writing process, where long pauses may suggest a planning state, text deletion and out-of-order insertion may suggest an editing state, and rapid uninterrupted text bursts may denote a text production state. These logs may help researchers understand cognitive writing processes in general as well as how individuals and demographic groups differ in their approaches to composition. The information provided by such logs goes well beyond what can be discerned from the final essay. Ultimately, researchers hope that the keystroke logs may provide informative advice to individual writers to improve writing quality.
Previous research on keystroke logs has found a variety of interesting results. For example, experienced and unskilled writers differ in their planning and revision patterns. More skilled writers mainly do conceptual revision (e.g., revise words related to the meaning and content of the text), while unskilled writers show more local corrections to punctuation, syntax, and spelling (Breetvelt, van den Bergh, & Rijlaarsdam, 1994; McCutchen, 1996; Hayes, 2012). In addition, compared to novice writers, more experienced writers tend to pause longer at natural text junctures (Matsuhashi, 1981; Schilperoord, 2002), reflecting engagement in higher level processes such as idea generation (Schilperoord, 2002). Zhang, Bennett, Deane, and van Rijn (2019) found gender differences in essay writing as well.
Besides describing planning and revision patterns of skilled versus unskilled writers, keystroke logging research has examined the meaning and stability of different log features. Deane and Zhang (2015) found that the text burst length, an indicator of fluency, had a strong association with essay quality. Allen et al. (2016) found that keystroke indices accounted for 38% of the variance in the linguistic characters. Finally, Guo, Deane, van Rijn, Zhang, and Bennett (2018) discovered that the interkey interval in conjunction with the total writing time explained as much as 47% of the variance in writing scores; in addition, some keystroke features were quite consistent across essays written by the same students in response to parallel prompts.
A potentially valuable way to view writing processes is in terms of states. Kellogg (2001) classified these states into planning, translation, and revision, which compete for a limited common, general-purpose cognitive resource. Planning content involves idea generation and text organization. Translation includes the linguistic and motoric operations needed to generate sentences from the planned content. Revision involves reviewing and amending the text while detecting and correcting errors or problems either in the text or in the plan for the text. Subsequent to Kellogg, Hayes (2012) proposed a somewhat different conceptualization in which he distinguished planning, translating, and reviewing, but as subcomponents of a “transcription” state or process.
Other researchers have used keystroke logs to infer these states and to attempt to decompose them. For example, Baaijen, Galbraith, and Glopper (2012) focused exclusively on the procedures and measures to analyze pauses, bursts, and revisions, the basic units of analysis from keystroke logs in terms of cognitive models of writing. A principal components analysis identified three underlying dimensions in these data: planned text production, within-sentence revision, and revision of global text structure. A tagging process using think-aloud protocols (Kaufer, Hayes, & Flower, 1986, pp. 125–126) was also employed. More recently, Zhang and Deane (2015) used principal factor analysis and found a four-factor structure of 29 writing-process features extracted from keystroke logs. That structure included general fluency (an aspect of translation), major editing, local editing, and planning and deliberation.
The goals of the current study are to use keystroke logs to classify students’ writing processes into sequences of writing states and then use semi-Markov processes to model these sequences. In addition, we investigate subgroup differences in those writing processes, independent of writing proficiency. To address these research questions, we conducted three analytical steps. Step 1 is classification. Based on the previous studies, sequences of keystroke actions are classified into three states: P (long pause), E (editing, a subcomponent of revision), and T (text production). Long pauses are most saliently connected to planning or reviewing the text produced so far; they might also reflect other situations such as disengagement or students being frustrated or stymied because they are unable to generate content easily. Editing mostly includes out-of-order insertion and deletion. Finally, text production entails mostly uninterrupted text-generating bursts. This classification is similar to an automated annotation process in that it does not require manual tagging by writers, investigators, or other instruments as is the case for think-aloud or eye-tracking protocols (Leijten & Van Waes, 2013).
Step 2 is score matching. Using a statistical matching method, the studied demographic subgroups are matched on essay scores to eliminate possible confounding of group differences in writing proficiency with group differences in writing processes. By controlling for essay scores, the differences in writing processes are more likely to be explained by the group characteristics instead of by writing proficiency.
Step 3 is process modeling. Because we are interested in the writing states (a categorical variable) and the time duration at each state (a continuous variable), we use the semi-Markov model (Krol & Saint-Pierre, 2015) that generalizes the multistate and continuous-time Markov chain model (CTMC; Jackson, 2011). Comparisons are made between subgroups in terms of how likely a typical writer is to move from one state to another and how long on average that writer stays at the current state (the sojourn time) before transiting to the next one. In addition, we compute summaries of keystroke features at the essay level to assist with interpretation of the semi-Markov results.
2. Method
2.1. Data
The data set came from a larger experimental study (Zhang, van Rijn, Deane, & Bennett, 2019) in which a base summative writing assessment form was administered along with three alternative forms varying in task order and/or topical focus. The sample from the base test form was used for analyses in this study. In the assessment, students were asked to complete a sequence of items related to the topic of whether advertisements directed at children should be banned and then write an argument essay on that topic. In the final data set, there were 257 eighth graders who had valid keystroke logs on the essay task, along with human scores for essay quality. Students were asked to finish their essays within 30 minutes.
The essays were scored against two rubrics, each on an integer scale ranging from 0 to 5. For the purpose of our analyses, we excluded responses that received a human score of 0 (a very small subset of the total), since that score point was used to denote essays with unusual response characteristics including empty, nonsensical, and off-topic responses; plagiarized responses; and responses consisting of random keystrokes. One of the scoring rubrics (denoted as RS1) evaluated basic writing skills (e.g., word usage, writing mechanics, syntactic variety, grammar, text organization), and the other rubric (denoted as RS2) evaluated student performance on such higher level skills as the quality of the argument. Human scores were computed for each rubric as a mean score taken across two raters, except when raters disagreed by more than one point, in which case a third rater was employed to resolve the discrepancy. We used the human scores as criterion variables in this study.
2.2. Writing States
To use a semi-Markov model, it is necessary to classify the keystroke logs into different states. For that, we use the following features of the raw keystroke data (Deane & Zhang, 2015). The raw keystroke logs are first grouped into chunks. Most of the time, a chunk is a word or a delimiter, but a chunk may also consist of text segments such as sequences of delimiters, deleted word sequences, or inserted word sequences, among others. A burst is defined as a sequence of chunks not interrupted by long pauses. The threshold for a burst break (a long pause) is computed adaptively for each writer as writing proceeds, as 4 times the median between-chunk pause length of all chunks produced thus far. A burst typically comprises more than one word and is then also called a phrasal burst. A between burst is a chunk consisting of an isolated event (most likely a space, a line or paragraph break, or a comma or period at a phrasal or sentence boundary) with long pauses before and after it. A between burst is considered indicative of a long pause in this study. We further treat the initial pause at the beginning of a burst (including a one-word burst) as a meaningful pause indicating a long pause state. A special case arises when a chunk occurs between bursts and consists of only an alphanumeric word; such a chunk is labeled a one-word burst. Finally, if a burst contains deleted sequences, replaced sequences, out-of-order inserted sequences, or other major editing (e.g., typo corrections of more than two characters), it is classified as an editing state; otherwise, it is denoted as a text production state.
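As an illustration, the adaptive long-pause rule can be sketched in a few lines of Python. This is a minimal sketch and not the procedure actually used to process the logs: the function name `flag_long_pauses` is hypothetical, and two details are assumptions because the text does not specify them, namely that the running median is taken over the pauses observed before the current one and that the very first pause is never flagged.

```python
from statistics import median


def flag_long_pauses(pauses, multiplier=4.0):
    """Flag each between-chunk pause as long (True) or not (False).

    A pause is flagged when it exceeds `multiplier` times the median of
    the pauses seen before it. Excluding the current pause from the
    running median and leaving the first pause unflagged are assumptions
    of this sketch.
    """
    flags = []
    seen = []  # pauses observed so far for this writer
    for p in pauses:
        if seen:
            flags.append(p > multiplier * median(seen))
        else:
            flags.append(False)  # no history yet
        seen.append(p)
    return flags


# Hypothetical between-chunk pauses in seconds for one writer.
example_pauses = [0.2, 0.3, 0.25, 2.0, 0.3, 0.2, 1.5]
example_flags = flag_long_pauses(example_pauses)
```

Because the threshold depends on each writer's own history, a 1.5-second pause can count as long for a fast typist and as ordinary for a slow one.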
Duration/holding time for each state is captured using the interkey intervals in the keystroke logs in the time unit of 1 second. Generally speaking, the three states (long pause, text production, and editing) correspond to proposer/planner, translator and transcriber, and evaluator in Hayes’s (2012) cognitive writing process model.
Because of the nature of chunk data, the same states may appear consecutively. In the analysis, we combine the adjacent, same states into one. As a result, the transition probability of a state to itself is always zero. This decision leads to the triangular structure of the state relationships in Figure 1. Note that State E cannot immediately move to State T because there is always an initial pause P before T.
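The merging of adjacent identical states can be sketched as follows. This is an illustrative Python fragment under assumed data shapes: each chunk is represented as a hypothetical `(state, seconds)` pair, and the duration of a merged state is taken to be the sum of the durations of its parts.

```python
from itertools import groupby


def collapse_states(events):
    """Merge runs of identical adjacent states, summing their durations.

    `events` is a time-ordered list of (state, seconds) pairs, where
    state is "P", "E", or "T". The pair layout is an assumption of this
    sketch, not the format of the original logs.
    """
    merged = []
    for state, run in groupby(events, key=lambda e: e[0]):
        merged.append((state, sum(sec for _, sec in run)))
    return merged


# Hypothetical chunk-level sequence for one student.
raw = [("P", 3), ("T", 2), ("T", 4), ("E", 1), ("P", 2), ("P", 5), ("T", 6)]
collapsed = collapse_states(raw)
```

After collapsing, no state is followed by itself, which is why the self-transition probabilities are zero by construction.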

Figure 1. Transition relations between writing states. P = long pause; E = editing; T = text production.
We applied the above classification approach to each student’s keystroke log sequence. In our data set, there are 374 chunks on average across all 257 valid keystroke log sequences produced by these students (N = 257), with the largest chunk number being 918 and the smallest having 1 chunk only. We removed logs that were too short (
Figure 2 shows the state sequences of three students. The x-axis is time in seconds, and the y-axis is writing state. In the upper panel, the student’s total number of states is 113, and his or her RS1 score is 1; in the middle panel, the student’s total number of states is 161, and his or her RS1 score is 3; and in the lower panel, the student’s total number of states is 136, and his or her RS1 score is 4.

Figure 2. State sequences of three students. The x-axis stands for time in seconds, and the y-axis for state of E (editing), P (long pause), and T (text production).
2.3. Subgroups
Students’ background variables (gender, race, and socioeconomic status [SES], defined by whether students received free or reduced-price lunch) and their writing scores assigned by human raters on both rubrics were obtained. Table 1 shows the mean writing scores for each subgroup. As expected, low SES students had lower essay scores than their relatively high SES peers; males had lower essay scores than females; and Black students had lower scores than White students. Analysis of variance (not shown) revealed that gender, race, and SES were associated with writing scores, a result also found in the National Assessment of Educational Progress (National Center for Education Statistics, 2011).
Table 1. Summary Statistics for Human Scores for Each Subgroup
Note. SD = standard deviation. RS1 = scoring rubric 1; RS2 = scoring rubric 2.
Source. Copyright by Educational Testing Service, 2019. All rights reserved.
2.4. Semi-Markov Model
Markov chains or Markov models are often used in educational data analyses and other fields. Nevertheless, they may not always fit the data well, and more generalized models may be preferable. In the current study, we applied the semi-Markov models to our data.
The states in a Markov chain are characterized either by discrete time points or by a continuous time variable. The discrete-time Markov chain and CTMC (Jackson, 2011) both require the memoryless property that the probability of a future state depends only on the current state, and, consequently, the interarrival time (also called sojourn time, duration, or holding time) is either fixed or follows an exponential distribution. In cases when this assumption is too restrictive, semi-Markov models can be considered (Krol & Saint-Pierre, 2015).
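The memoryless property can be checked numerically. The sketch below is illustrative only: it assumes a Weibull sojourn time with survival function exp(-(t/σ)^ν), of which the exponential distribution is the special case ν = 1. The function names and parameter values are hypothetical, and the parameterization is an assumption rather than one taken from the fitted models.

```python
import math


def weibull_survival(t, scale, shape=1.0):
    """P(T > t) for a Weibull sojourn time; shape = 1 gives the exponential."""
    return math.exp(-((t / scale) ** shape))


def conditional_survival(s, t, scale, shape=1.0):
    """P(T > s + t | T > s): chance of staying t more seconds after s."""
    return weibull_survival(s + t, scale, shape) / weibull_survival(s, scale, shape)


# Exponential case: having already waited 5 seconds changes nothing.
exp_cond = conditional_survival(5.0, 3.0, scale=4.0)
exp_fresh = weibull_survival(3.0, scale=4.0)

# Weibull with shape below 1: the longer the stay, the likelier it continues.
wb_cond = conditional_survival(5.0, 3.0, scale=4.0, shape=0.7)
wb_fresh = weibull_survival(3.0, scale=4.0, shape=0.7)
```

With shape equal to 1 the two probabilities coincide, whereas with shape 0.7 the conditional probability is larger, so the time already spent in a state carries information that a Markov chain cannot represent.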
To characterize a semi-Markov model, two so-called hazard rates are necessary. One is the hazard rate of duration time (or sojourn time, denoted as
When the distribution of the sojourn time follows the exponential distribution that has one parameter
2.4.1. Cox regression model
Our study focuses on subgroup comparison. Therefore, a covariate Z associated with the studied groups was used in the Cox proportional regression model (Cox, 1972) to compare group processes. The influence of Z is placed on the hazard rate
where
In this study, we use the R packages SemiMarkov (Krol & Saint-Pierre, 2015) for semi-Markov models to estimate related parameters, hazard rates
2.5. Matching Method
Because we are interested in evaluating subgroup differences in writing processes that may be attributed to group writing styles, instead of to differences in writing proficiency, we match the focal and reference subgroups on their writing scores. Using the original writing scores as a covariate in Equation 1 is possible only if we dichotomize the scores, which would lead to coarser matching. Exact matching would eliminate more process data given our limited data pool. Hence, we decided to employ the commonly used matching procedure, the propensity score method (Rosenbaum & Rubin, 1983), to preserve as much data as possible. Using the R package MatchIt (Ho, Imai, King, & Stuart, 2007), we estimate the propensity score of each student in both studied subgroups, with RS1 as a covariate, and then students in the reference group are selected and matched to those in the focal group, based on the method of choice. We used the “nearest” method in our implementation. This matching procedure is applied to prepare the data before fitting the semi-Markov model for subgroup comparison.
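The idea behind nearest-neighbor matching can be sketched as below. This is a simplified Python illustration, not a reimplementation of MatchIt: propensity scores are supplied directly rather than estimated, the function name `nearest_match` and the student identifiers are hypothetical, and the greedy processing order and tie-breaking are assumptions that may differ from the package's behavior.

```python
def nearest_match(focal, reference):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    `focal` and `reference` map a student id to a propensity score.
    Each focal student, in dictionary order, takes the closest
    still-unmatched reference student. Returns focal id -> reference id.
    """
    available = dict(reference)  # copy so the input is left untouched
    pairs = {}
    for fid, fscore in focal.items():
        if not available:
            break  # more focal than reference students
        rid = min(available, key=lambda r: abs(available[r] - fscore))
        pairs[fid] = rid
        del available[rid]  # without replacement
    return pairs


# Hypothetical propensity scores.
focal_scores = {"f1": 0.30, "f2": 0.55}
reference_scores = {"r1": 0.10, "r2": 0.32, "r3": 0.50, "r4": 0.90}
matched = nearest_match(focal_scores, reference_scores)
```

Reference students left unmatched (here `r1` and `r4`) are dropped from the comparison, which is the loss of data the matching step trades for comparability on RS1.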
3. Results
In this study, we used the R language (R Core Team, 2018) to conduct statistical analyses and to produce graphs.
3.1. Total Group
In this section, we present results of model fit analysis and hazard rate estimation for the total group. Table 2 displays the empirical transition frequencies of the writing states across all students, where E, P, and T stand for editing, long pause, and text production. The numbers on the diagonal are the total numbers of E, P, and T in our data set, which are 4,665, 9,267, and 5,381. The off-diagonal numbers are the counts of transitions. For example, 3,809 (cell in the second row and the first column) is the count of transitions from P to E across all students. As noted before, in our sample, there were 231 students in total.
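A table of this kind can be tallied from the collapsed state sequences as sketched below. The Python fragment is illustrative: the function names and the toy sequences are hypothetical, and the row-normalized probabilities are a simple descriptive summary rather than the model-based estimates.

```python
from collections import Counter


def transition_counts(sequences):
    """Count state visits and state-to-state transitions over all students.

    Each sequence is a list of state labels with adjacent duplicates
    already merged, so no state follows itself.
    """
    visits = Counter()  # totals per state (the diagonal of the table)
    moves = Counter()   # (from, to) counts (the off-diagonal cells)
    for seq in sequences:
        visits.update(seq)
        moves.update(zip(seq, seq[1:]))
    return visits, moves


def transition_probs(moves):
    """Row-normalize transition counts into empirical probabilities."""
    totals = Counter()
    for (src, _), n in moves.items():
        totals[src] += n
    return {(src, dst): n / totals[src] for (src, dst), n in moves.items()}


# Two hypothetical students.
toy = [["P", "T", "P", "E", "P", "T"], ["P", "E", "P", "T", "E"]]
visits, moves = transition_counts(toy)
probs = transition_probs(moves)
```

In the toy data, as in the classification scheme, an E-to-T transition never appears because a long pause always precedes text production.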
Table 2. Empirical State Transition Table
Table 3 shows the estimated parameters of the semi-Markov model using all data, where SD is the estimation error, LCI and UCI stand for the lower and upper points of the 95% confidence interval, and the last three columns are related to the Wald test: the null hypothesis of H0, test statistic, and p value.
Table 3. Estimated Parameters of Weibull Distributions
Note. σ = scale parameter; ν = shape parameter. LCI = lower confidence interval; UCL = upper confidence interval; SD = standard deviation.
In Table 3, all σ estimations are significantly larger than 1, indicating that the sojourn time has larger spread than 1. Of these estimations, the sojourn time from State E to P is the most dispersed. The estimated νs are not significantly different from 1 for the E-to-P transition, indicating that the sojourn time may be approximated by an exponential distribution; that is, the hazard rate of the sojourn time may be approximated by a constant for this transition; in this case, the duration time at E before P is memoryless. All the other estimated
Table 4 shows the estimated parameters for the semi-Markov model with
Table 4. Estimated Parameters of Weibull Distributions
Note. σ = scale parameter; ν = shape parameter in the adjusted semi-Markov model (i.e.,

Figure 3. Estimated density distributions of sojourn time in the semi-Markov model with
Figure 4 shows the estimated hazard rates of sojourn time for these transitions with the constraint

Figure 4. Hazard rates of the sojourn times in the semi-Markov model with constraint
Figure 5 shows the estimated transition intensities (the hazard rate of the semi-Markov process) in the model with the constraint

Figure 5. Hazard rates of the transition probability in the semi-Markov model with
Overall, from the above plots, we observed that the hazard functions were functions of time within a duration of 50 seconds or so. Beyond that, there were too few transitions to make meaningful claims.
For the following subgroup process comparisons, we used the full semi-Markov model without model selection (Berk, Brown, & Zhao, 2010). Similar general observations for the hazard rates were found for the subgroup analyses, which are not presented to avoid duplication. Instead, we focus on the subgroup differences from now on; that is, we focus on the beta parameters in the Cox regression model in Equation 1.
3.2. Subgroups
In each of the following analyses, a matching procedure (as described earlier) was conducted so that the studied groups were equivalent in terms of their RS1 scores, and then the semi-Markov model was fit to the matched data with the group variable (SES, race, or gender) as the covariate. In this way, we sought to obtain informative differences in writing processes between subgroups while conditioning on the same writing proficiency.
We present tables for β estimates to compare hazard rates of the sojourn time (A1). In those tables, when β was not significantly different from zero, the two studied groups were regarded equal in their sojourn time for the transition. Note that these hazard ratios of sojourn time are constants by the Cox model in Equation 1. However, the hazard ratios of the process, ratios of
In addition to the above results from semi-Markov models, for each subgroup, we provide summaries of the following variables as an aid to interpreting the results from the semi-Markov models. These variables are computed for each student:
Number of words: the total word count in the essay.
Word length: the median of the word lengths in the essay.
Standardized frequency index (SFI): the median SFI of the unique words in the essay. SFI is a measure of English vocabulary complexity (Zeno, Ivens, Koslin, & Zeno, 1995): the higher the value, the more common the word, so a lower value indicates more complex vocabulary.
Writing efficiency: the ratio of the number of characters in the final essay to the total number of keystrokes produced in the process.
Seconds per word and seconds per character: the mean duration of words and characters (i.e., the total time divided by the total number of words and characters), respectively.
Total time: the total writing time in seconds, from the first keystroke to the last.
Seconds per word and seconds per character are measures of keyboarding skills. Word length and SFI are measures of vocabulary complexity. The first three features are word features that can be observed in the final essay, and the rest are keystroke features that cannot be derived from the final essay product.
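Most of these essay-level summaries follow directly from their definitions, as the Python sketch below illustrates. Several choices are assumptions of the sketch rather than details given in the text: words are split on whitespace, punctuation attached to a word counts toward its length, and the function and argument names are hypothetical. SFI is omitted because it requires the word frequency norms of Zeno et al. (1995).

```python
from statistics import median


def essay_features(final_text, n_keystrokes, first_key_s, last_key_s):
    """Essay-level summaries for one student (non-empty essay assumed).

    final_text   : the submitted essay
    n_keystrokes : total keystrokes logged while writing
    first_key_s, last_key_s : timestamps of first and last keystroke, in seconds
    """
    words = final_text.split()          # whitespace tokenization (assumption)
    n_chars = len(final_text)           # includes spaces and punctuation
    total_time = last_key_s - first_key_s
    return {
        "n_words": len(words),
        "median_word_length": median(len(w) for w in words),
        "writing_efficiency": n_chars / n_keystrokes,
        "seconds_per_word": total_time / len(words),
        "seconds_per_char": total_time / n_chars,
        "total_time": total_time,
    }


# Hypothetical student: a 38-character text produced with 76 keystrokes.
feats = essay_features("Ads aimed at children should be banned", 76, 10.0, 48.0)
```

A writing efficiency of 0.5, as in this toy case, means that half of the keystrokes produced did not survive into the final text.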
Note that in the following tables, a statistical significance test was conducted separately for each target quantity without adjusting the significance level of the critical value (i.e.,
3.2.1. SES
Table 5 shows the feature means and SDs for the SES groups. As expected, the two groups were quite close in RS1 writing scores as a result of the matching procedure (the second to the last column). The same holds true for the three word features: word count, word length, and SFI. However, although the two groups produced essays of equal quality in similar lengths of time, the low SES group showed statistically significantly lower writing efficiency and slower character-level typing than the high SES group.
Table 5. Summary Statistics of Essay Production by Socioeconomic Status Groups
Note. The t tests of mean differences for writing efficiency and seconds per character are statistically significant at p < .05.
Table 6 shows the estimated βs for the SES variable in Equation 1 from the semi-Markov model. The statistic follows a χ2 distribution of degree of freedom 1. From Table 6, the estimated coefficients
Estimated
Note. 1, 2, and 3 in the label column stand for E, P, and T, respectively. LCI = lower confidence interval; UCL = upper confidence interval; SD = standard deviation.
Figure 6 shows the logarithm of the hazard ratios of transition intensity

Figure 6. Logarithm of the hazard ratios of transition intensities for the low socioeconomic status (SES) group and for the high SES group. The dashed horizontal line stands for a log ratio of zero (when the transition intensities for the two groups are equal).
Overall, the above comparison results for the SES groups show that, to produce essays of the same quality as the high SES group, the low SES group spent significantly longer time at States T and E, which might be caused by their lower writing efficiency and slower character-level typing speed. In addition, they seemed to spend limited time in pauses/planning before producing text. Finally, they made less frequent transitions from state to state except for T-to-P, compared to the high SES group.
3.2.2. Race/ethnicity
Table 7 gives the summary statistics for the two racial/ethnic groups examined, Black students and White students. Again, the two groups were quite close in writing scores, as expected from the matching procedure; they had similar word-related features as well (number of words, word length, and SFI). Given equal essay quality, however, the Black students had statistically significantly lower text efficiency than the White students but similar keyboarding skills. Although, on average, the difference in total writing time was not statistically significant, the Black students in our sample spent sizably longer time on essay writing (Ziliak & McCloskey, 2008).
Table 7. Summary Statistics of Essay Production by Racial/Ethnic Groups
Note. The t tests of mean score differences for writing efficiency are statistically significant at p < .05.
Table 8 shows the estimated β parameters for the race covariate from the full semi-Markov model. From Table 8, the estimated coefficients
Estimated
Note. LCI = lower confidence interval; UCL = upper confidence interval; SD = standard deviation.
Figure 7 shows the logarithm of the hazard ratios of transition intensities for the racial/ethnic groups in the full semi-Markov model. These results indicate that the Black students had a lower transition intensity for E-to-P than the White students; they also made less frequent T-to-P transitions within 5 seconds or so but had a much higher transition intensity than the White students after that time. For the remaining transitions, the Black students were more likely or equally likely to change states within the 5-second duration but less likely to do so after 5 seconds, compared to the White students.

Figure 7. Logarithm of the hazard ratios of transition intensities for the Black students and White students. The dashed horizontal line stands for a log(ratio) of zero (when the transition intensities for the two groups are equal).
Again, from the above results, to produce essays of the same quality as the White students, the Black students spent statistically significantly longer time in the text production and editing states before long pauses and tended to make quicker and shorter transitions in getting to those states.
3.2.3. Gender
Table 9 shows the feature summary statistics for the gender groups. Again, the two groups were quite close in writing scores due to the matching procedure. Given that they produced essays of equal quality, the female group used statistically significantly more complex vocabulary (lower SFI) and typed significantly faster at the character level. In addition, female students in our sample appeared to have produced essays about 10 words longer and to have used less total writing time than the male students; these differences were not statistically significant but were noticeable (Ziliak & McCloskey, 2008).
Table 9. Summary Statistics of Essay Production by Gender Group
Note. The t tests of mean differences for SFI and seconds per character are statistically significant at p < .05. SFI = standardized frequency index.
Table 10 shows the estimated β parameters for the gender groups from the full semi-Markov model. The estimated
Estimated
Note. 1, 2, and 3 in the label column stand for E, P, and T, respectively. LCI = lower confidence interval; UCL = upper confidence interval; SD = standard deviation.
The logarithms of the hazard ratios in Figure 8 revealed that, compared to the male group, the female group made more frequent E-to-P transitions across the time span. For P-to-E and T-to-E, they made similar or more frequent transitions; for P-to-T, they made similar or less frequent transitions. The female group made less frequent T-to-P transitions within the 5-second duration, but they were more likely to make T-to-P transitions after that.

Figure 8. Logarithm of the hazard ratios of transition intensities for the gender groups. The dashed horizontal line stands for a log(ratio) of zero (when the transition intensities for the two groups are equal).
Overall, from the above gender comparison, we may infer that the female group was more fluent in writing than the male group even after score matching: On average, female students were faster in character-level typing, they used more complex words, they spent longer time in continuous text production before making long pauses, and they were more likely to make short and quick editing and pause transitions. These characteristics may explain why the female students in our sample produced relatively long essays (10 words longer) in shorter total writing time (40 seconds shorter).
4. Discussion
In this study, we used keystroke logs to model the writing processes and investigated differences in writing processes between subgroups given the same essay quality. We focused on the duration time that students spent in each state (long pause, editing, or text production) and the probabilities of transitions from one state to another. We found that differences in writing processes existed between subgroups given the same quality of essays, as well as across the three demographic groups studied. For example, both the low SES students and the Black students showed significantly lower efficiency in text production than did comparison groups; that is, their final texts were smaller portions of the total keystroke events compared to the higher SES students and White students, respectively. In contrast, compared to male students, the female students were more fluent, typing faster, using more complex words, spending longer time in text production, and engaging in quick and frequent editing and pauses.
While there appear to be notable writing process differences across the three demographic comparisons, a common observation is that the studied focal groups (low SES students, Black students, and female students) spent longer time at the text production states before entering the long pause states, and then they made less frequent transitions within 5 seconds or so for T-to-P. However, the meaning of this finding might not necessarily be the same for each group. For example, the low SES students and Black students might have spent more time in text production because they appeared to need to expend greater effort to produce essays of the same quality as their counterparts (indicated by their lower text efficiency). For the female students, the additional time in text production was also associated with greater fluency.
These results are consistent with what the literature suggests for the potential causes of lower writing performance. For instance, difficulties with transcription skills (keyboarding and spelling) can reduce the amount of working memory available for other writing processes such as planning and evaluation. This reduction may lead to a more serial writing strategy. In such a strategy, students with less fluent transcription skills may need to spend a greater amount of time producing and editing the text (e.g., as in Table 6, the low SES group had significantly negative β parameters
This study had several limitations, including the small sample sizes in our data set. A larger sample size is preferable for modeling processes and for reducing random effects associated with individual students’ responding. In addition, we used the propensity matching method to preprocess the data before comparing the subgroups, which may reduce information. Other statistical approaches that do not sacrifice data, such as supplying both a subgroup indicator and the matching variable as covariates in the Cox regression step, or supplying weights to standardize groups, are worth investigation in future analysis. These weighting methods include weighting by the minimum discriminant information and weighting adjusted for error-prone covariates (refer to Haberman, 2015, and Lockwood & McCaffrey, 2016).
Because of the specific writing task and limited sample sizes, the results obtained here may not be generalizable to a broader population or to different writing conditions, and the causal explanations we have suggested may or may not ultimately be confirmed. Further research might attempt to cross-validate our keystroke log-based writing state classification results in order to disambiguate what students are doing in each state, particularly during long pauses.
One additional limitation of this study is the time homogeneity assumption in the semi-Markov models. Common writing strategies, such as editing more at a later stage of the writing session, may call for heterogeneous models, which are the natural next step in semi-Markov modeling. However, writing strategies common in the general population may not be associated with a particular subgroup, leading to conclusions similar to those presented here. A final note is that statistical significance is not the same as practical significance; β estimates do not offer meaningful interpretation in terms of effect size (Ziliak & McCloskey, 2008) of subgroup differences.
Despite these limitations, this study offers important results. Perhaps the most important of those results is the potential value for analyzing writing process data of semi-Markov models over the more commonly used Markov models. Because they consider the continuous nature of writing state duration and by relaxing the memoryless property of the duration time distribution, semi-Markov models are well suited to the evaluation of such processes and group differences in them. Besides the studied continuous time stochastic processes, other alternative modeling approaches are worth exploring, including the dynamic Bayesian networks, if writing time is sliced into equal intervals (Mislevy, Almond, Yan, & Steinberg, 1999; Murphy, 2002). A second important result is that this study provides a supplement to existing sources of evidence about group differences in writing, which have (for the most part) focused on differences in scores and their correlations with socioeconomic variables. With inclusion of process differences, we add critical evidence that might help us to validate a more detailed causal account.
Authors’ Note
Any opinions expressed in this publication are those of the authors and not necessarily of Educational Testing Service.
Acknowledgments
The authors would like to thank Daniel McCaffrey, Shelby Haberman, and especially two anonymous reviewers for their thoughtful comments on a previous draft of this article.
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: The authors prepared the work as employees of Educational Testing Service.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
