Abstract
National test data indicate that some students do not perform well in writing, suggesting a need to identify students at risk for poor performance. Research supports Written Expression Curriculum-Based Measurement (WE-CBM) as an indicator of writing proficiency, but it is less commonly used in practice. This study examined the usability of WE-CBM compared with Reading Curriculum-Based Measurement (R-CBM). Participants included 162 teachers who were given examples of WE-CBM and R-CBM and then completed a usability measure for both curriculum-based measurement (CBM) types. Teachers rated WE-CBM as usable but rated R-CBM significantly higher in usability; acceptability ratings did not differ significantly between the two. Practical implications that may inform modifications to WE-CBM are discussed.
Many groups of students lack the ability to write at the most basic level of proficiency (Graham, 2013). For example, over two thirds of fourth-grade students in a computer-based writing pilot assessment obtained a writing score in the bottom half of the scoring scale, and 76% of students in both eighth and 12th grades performed below the proficient level on the 2011 National Assessment of Educational Progress (NAEP; 2003) writing assessment (National Center for Education Statistics [NCES], 2012; White, Kim, Chen, & Liu, 2015). In addition, the graduating class of 2015 obtained the lowest score on the writing section of the SAT, with the average score declining 13 points since the test was initiated in 2006 (American Academy of Arts and Sciences, 2016).
Given the need for advanced literacy and communication skills in the workforce (Business Roundtable, 2009; The Conference Board, 2006; Trilling & Fadel, 2009), it is essential to identify students at risk for poor writing performance. Although the validity and reliability of Written Expression Curriculum-Based Measurement (WE-CBM) have been studied, there is considerably less research on the diagnostic accuracy of WE-CBM for screening, and it is used less often in practice. The purpose of this study was to investigate the factors affecting teacher use of WE-CBM as a universal screening tool.
WE-CBM
The literature on WE-CBM includes studies of early writing in kindergarten through second grade as well as studies of writing by older elementary students. Studies with early elementary students include various measures of writing and copying letters, words, and sentences (e.g., Coker & Ritchey, 2010, 2014; Hampton & Lembke, 2016). Writing tasks for students beyond second grade generally involve writing a passage of text. Initial studies of WE-CBM with elementary students indicated that allotting 1 min to think and 3 min to respond to an age-appropriate story starter yielded a reliable and valid indicator of general writing proficiency (Deno, Marston, & Mirkin, 1982; Marston, 1981; Tindal & Parker, 1989), and more recent research indicates that older students may need to write for longer periods to obtain technically adequate writing samples (Keller-Margulis, Mercer, & Thomas, 2016; McMaster & Espin, 2007). Most studies of WE-CBM use narrative story starters; however, other prompt genres, namely expository prompts, have also been explored (e.g., McMaster & Campbell, 2008; Mercer, Martínez, Faust, & Mitchell, 2012), mostly with secondary students. WE-CBM differs from other types of curriculum-based measurement (CBM) because multiple scoring indices are used to determine writing performance and because a single story starter can elicit any number of possible responses. WE-CBM metrics fall into three categories: production-dependent, production-independent, and accurate-production indices (Jewell & Malecki, 2005). Production-dependent indices focus on writing fluency and depend on the total amount written (Jewell & Malecki, 2005). The metrics used most often are Total Words Written (TWW), Correct Writing Sequences (CWS), and Words Spelled Correctly (WSC).
TWW is the total number of words written, where a word is any letter or group of letters separated by a space, even if it is misspelled or is a nonsense word (Deno et al., 1982). WSC is the total number of words that are spelled correctly when viewed in isolation as words in the English language (Deno et al., 1982). CWS is the number of pairs of adjacent, correctly spelled words that a native speaker of written English would judge acceptable within the context of the phrase (Videen, Deno, & Marston, 1982).
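To make the three production-dependent definitions concrete, the following Python sketch counts them for a short response. It is an illustration under simplifying assumptions, not a validated scoring tool: the `lexicon` set is a hypothetical stand-in for a dictionary, punctuation and capitalization are ignored (operational CWS rules also consider them), and because contextual acceptability requires a trained scorer's judgment, the scorer supplies the positions of any pairs judged unacceptable through the hypothetical `unacceptable_pairs` argument.

```python
from __future__ import annotations

import re


def _clean(token: str) -> str:
    """Lowercase a token and strip punctuation before the spelling check."""
    return re.sub(r"[^a-z']", "", token.lower())


def count_tww(sample: str) -> int:
    """Total Words Written: every whitespace-separated letter group counts,
    even if it is misspelled or a nonsense word."""
    return len(sample.split())


def count_wsc(sample: str, lexicon: set[str]) -> int:
    """Words Spelled Correctly: words that are correctly spelled English words
    in isolation (approximated here by membership in `lexicon`)."""
    return sum(1 for token in sample.split() if _clean(token) in lexicon)


def count_cws(sample: str, lexicon: set[str],
              unacceptable_pairs: frozenset[int] = frozenset()) -> int:
    """Correct Writing Sequences (simplified): adjacent word pairs in which both
    words are spelled correctly and the scorer has not flagged the pair
    (identified by the index of its first word) as unacceptable in context."""
    words = [_clean(token) for token in sample.split()]
    return sum(
        1
        for i in range(len(words) - 1)
        if words[i] in lexicon
        and words[i + 1] in lexicon
        and i not in unacceptable_pairs
    )


# Hypothetical response to the story starter "It was a sunny day and..."
lexicon = {"it", "was", "a", "sunny", "day", "and", "we", "went", "to", "the", "park"}
response = "It was a sunny day and we went to the parck"
print(count_tww(response), count_wsc(response, lexicon), count_cws(response, lexicon))
# → 11 10 9
```

The misspelled final word lowers WSC by one and breaks the single sequence it participates in, which is why CWS is one lower than TWW − 1.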
In contrast to production-dependent indices, production-independent indices are independent of writing length and focus on overall writing accuracy (Jewell & Malecki, 2005). They are derived from production-dependent metrics by expressing accuracy as a percentage. For example, Percentage of Words Spelled Correctly (%WSC), which reflects the percentage of correctly spelled words, is calculated as (WSC / TWW) × 100. Percentage of Correct Writing Sequences (%CWS), which reflects the percentage of correct sequences of words, is calculated as (CWS / TWS) × 100, where Total Writing Sequences (TWS) is the total number of possible adjacent writing sequences.
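The two percentage formulas can be written as small Python functions. This is a sketch: returning 0.0 for an empty sample is an arbitrary convention chosen here, and approximating TWS as one fewer than the number of words (`approx_tws`, a hypothetical helper) matches only a simplified, words-only definition of a writing sequence. The example shows why these indices are called production-independent: a short and a long sample with the same proportion of correct words receive the same score.

```python
def percent_wsc(wsc: int, tww: int) -> float:
    """%WSC = (WSC / TWW) x 100; 0.0 for an empty sample (a convention chosen here)."""
    return 100.0 * wsc / tww if tww else 0.0


def percent_cws(cws: int, tws: int) -> float:
    """%CWS = (CWS / TWS) x 100, where TWS is the number of possible adjacent sequences."""
    return 100.0 * cws / tws if tws else 0.0


def approx_tws(tww: int) -> int:
    """Hypothetical helper: under a words-only pairing rule, n words form n - 1 pairs."""
    return max(tww - 1, 0)


# Same accuracy, different length: both samples score 50% regardless of how much was written.
print(percent_wsc(10, 20), percent_wsc(40, 80))  # → 50.0 50.0
print(percent_cws(9, approx_tws(11)))            # → 90.0
```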
Finally, the accurate-production index, introduced by Espin et al. (2000), combines elements of both fluency and accuracy. For example, Correct Minus Incorrect Word Sequences (CIWS) is calculated by subtracting Incorrect Writing Sequences (ICWS) from CWS (Espin et al., 2000). Espin et al. (2000) compared CIWS with teacher ratings of writing proficiency and found CIWS to be a valid indicator of proficiency at the middle school level. Their findings suggested that metrics accounting for both correct and incorrect word sequences (i.e., CIWS) are the best indicators of writing at the secondary level and better predictors of writing proficiency than CWS alone (Espin et al., 2000).
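A short Python sketch of the accurate-production index follows, assuming each writing sequence has already been judged correct (`True`) or incorrect (`False`) by a scorer; the judgments below are invented for illustration. It shows the property that distinguishes CIWS from CWS: two samples with the same CWS can receive very different CIWS scores because incorrect sequences are subtracted, and CIWS can fall below zero.

```python
from __future__ import annotations


def ciws_from_judgments(judgments: list[bool]) -> tuple[int, int, int]:
    """Return (CWS, ICWS, CIWS) from per-sequence scorer judgments."""
    cws = sum(1 for ok in judgments if ok)
    icws = len(judgments) - cws
    return cws, icws, cws - icws


# Hypothetical samples: identical CWS, very different numbers of errors.
fluent_accurate = [True] * 30 + [False] * 2
fluent_error_prone = [True] * 30 + [False] * 20
print(ciws_from_judgments(fluent_accurate))           # → (30, 2, 28)
print(ciws_from_judgments(fluent_error_prone))        # → (30, 20, 10)
print(ciws_from_judgments([True] * 5 + [False] * 9))  # → (5, 9, -4)
```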
Most WE-CBM research has focused on technical features of a static score with an emphasis on predictive and criterion validity (e.g., Espin et al., 2000; Gansle, VanDerHeyden, Noell, Naquin, & Slider, 2002; Keller-Margulis, Payan, Jaspers, & Brewton, 2016). To date, there is only limited consensus regarding appropriate scoring metrics for screening, and there is evidence of the need for a developmental perspective (e.g., grade level) when determining which scoring metric to use (Jewell & Malecki, 2005; Malecki & Jewell, 2003) as well as a need to consider language background (e.g., Keller-Margulis, Payan, et al., 2016).
Very few studies have examined the diagnostic accuracy of WE-CBM for identifying students at risk for poor performance on high-stakes criterion measures, such as statewide achievement tests. Existing literature indicates that some WE-CBM metrics offer sufficient diagnostic accuracy. CWS, for example, tends to offer adequate accuracy for identifying students at risk for poor performance on a criterion (e.g., Ritchey & Coker, 2013). When the criterion is a statewide achievement test, results suggest that the diagnostic accuracy of WE-CBM differs across outcome metrics and across students from diverse language backgrounds (Keller-Margulis, Payan, et al., 2016). The accurate-production index CIWS and the production-independent metric %CWS offered adequate diagnostic accuracy for fourth-grade students who were native English speakers, but their accuracy varied for students from diverse language backgrounds (Keller-Margulis, Payan, et al., 2016).
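Diagnostic accuracy in this sense is usually summarized by cross-tabulating screening decisions against the criterion outcome. The Python sketch below does this for a single cut score; the scores, criterion outcomes, and the cut score of 20 are all invented, and flagging students who score strictly below the cut is an assumed decision rule. Studies typically examine many cut scores (e.g., through ROC analysis) rather than fixing one in advance.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ScreeningAccuracy:
    sensitivity: float  # students who failed the criterion and were flagged
    specificity: float  # students who passed the criterion and were not flagged
    ppv: float          # flagged students who actually failed the criterion
    npv: float          # unflagged students who actually passed


def screening_accuracy(scores: list[float], failed: list[bool],
                       cut_score: float) -> ScreeningAccuracy:
    """Flag students scoring below `cut_score` and compare with criterion failure."""
    tp = fp = tn = fn = 0
    for score, did_fail in zip(scores, failed):
        flagged = score < cut_score
        if flagged and did_fail:
            tp += 1
        elif flagged:
            fp += 1
        elif did_fail:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        # Undefined proportions (empty denominators) are reported as NaN.
        return num / den if den else float("nan")

    return ScreeningAccuracy(ratio(tp, tp + fn), ratio(tn, tn + fp),
                             ratio(tp, tp + fp), ratio(tn, tn + fn))


# Hypothetical CWS screening scores and state-test outcomes for eight students.
cws_scores = [12, 18, 25, 30, 9, 22, 35, 15]
failed_state_test = [True, True, False, False, True, False, False, False]
print(screening_accuracy(cws_scores, failed_state_test, cut_score=20))
# → ScreeningAccuracy(sensitivity=1.0, specificity=0.8, ppv=0.75, npv=1.0)
```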
Despite the availability of various scoring metrics and research suggesting that WE-CBM is technically adequate, WE-CBM remains underused in practice relative to other versions of CBM, both for screening writing proficiency and for progress monitoring (i.e., monitoring the progress of any child considered at risk for academic difficulties; e.g., Fletcher & Vaughn, 2009; McMaster & Espin, 2007). Although universal screening and progress monitoring are distinct, ideally WE-CBM functions as a screener and is then used to monitor progress as needed, allowing one measure to serve both purposes seamlessly. It is difficult to determine exactly how often WE-CBM is used in practice because, unlike other types of CBM, story starters or prompts are easy to construct and can therefore be developed and used without the support of commercial products such as AIMSweb. Nevertheless, possible explanations for lower implementation of WE-CBM compared with other types of CBM include a smaller research base than for R-CBM, the complexity of scoring and the resulting interpretation, and issues of acceptability.
Acceptability of Curriculum-Based Measurement
Acceptability is typically defined as the extent to which an assessment (or intervention, or other practice) is judged to be fair and appropriate (Kazdin, 1980). Research on the acceptability of CBM is limited, and existing studies have examined the construct in various ways (e.g., Allinder & Oats, 1997; Foegen, Espin, Allinder, & Markell, 2001; Gansle, Gilbertson, & VanDerHeyden, 2006). Foegen et al. (2001) examined whether the manner in which information about CBM is communicated affects acceptability ratings. Presentation style (statistical vs. anecdotal information about use) did not affect ratings, but teachers expressed stronger belief in the utility of CBM for evaluating and modifying instruction than in its ability to represent performance on a standardized test (Foegen et al., 2001).
Allinder and Oats (1997) examined the acceptability of CBM as a progress monitoring tool, focusing on its instructional utility. They conducted the first study to monitor actual teacher implementation of CBM, examining student growth in mathematics CBM (M-CBM) over a 4-month period. Teacher acceptability of M-CBM was measured with the CBM Acceptability Scale (CBM-AS), which assessed the major components of CBM use, including judgments of effectiveness, time required, and the skill and training needed to implement the measure. Teachers in the high-acceptability group differed significantly from those in the low-acceptability group on two implementation measures: mean number of probes given and mean level of goal ambitiousness. Students of teachers with high acceptability also showed significantly greater growth during CBM use than students of teachers with low acceptability. These results suggest that higher teacher acceptability of M-CBM was associated with greater instructional utility. Although informative, examining acceptability alone does not indicate what changes would make a practice more appealing to use (Chafouleas, Briesch, Riley-Tillman, & McCoach, 2009). Consequently, it may be necessary to examine factors beyond acceptability that affect overall use.
Findings from Gansle et al. (2006), the only known study to date examining factors contributing to the acceptability of WE-CBM specifically, indicated that teachers judged their own holistic ratings as more representative of student writing skills than any other measure, even though the technical adequacy of holistic ratings is not well established. The more CBM training a teacher reported, the lower he or she rated holistic scores as descriptive of student writing skill. Although this study provided useful information about which writing scoring metrics teachers believe best represent writing proficiency, little is known about the factors affecting implementation of WE-CBM.
Usability
Usability is a multidimensional construct that includes the well-known construct of acceptability as well as other variables related to the adoption of assessments or interventions in schools (Briesch, Chafouleas, Neugebauer, & Riley-Tillman, 2013; Chafouleas et al., 2009). It has been studied in relation to factors including Acceptability, Feasibility, Understanding, System Support, System Climate, and Home-School Collaboration. Acceptability is the extent to which an assessment is judged to be fair, reasonable, and appropriate, whereas Feasibility is the amount of time and resources required to conduct the assessment. Understanding is knowledge of what the assessment is, how to implement it, and why it should be used, and System Support is the extent of support from others in the environment for implementing the assessment (e.g., professional development, ongoing consultation, additional resources). System Climate is support at the philosophical level, including the extent to which the assessment fits within the school climate. Finally, Home-School Collaboration is the extent to which collaboration and communication between the family and school are needed to implement the assessment effectively (Miller, Chafouleas, Riley-Tillman, & Fabiano, 2014). These domains can be measured with the Usage Rating Profile–Assessment (URP-A; Chafouleas, Miller, Briesch, Neugebauer, & Riley-Tillman, 2012). Although usability comprises several factors that affect use of an assessment, its most studied component is acceptability (Rosenfield, 2000), discussed in the previous section, which reportedly accounts for the largest share of variance in predicting usability (Chafouleas et al., 2009).
While prior research has studied usability of a behavioral screening tool (Miller et al., 2013), there is no research to date that has examined the usability of any type of CBM, including WE-CBM.
The Present Study
Despite students' poor performance in writing and the availability of WE-CBM as a screening tool for identifying students at risk (e.g., Diercks-Gransee, Weissenburger, Johnson, & Christensen, 2000; Keller-Margulis, Payan, et al., 2016; Saddler, 2013), WE-CBM is not regularly used in practice. Ratings of overall usability and its components, such as acceptability, may provide insight into this lack of use, yet these constructs have been minimally studied with regard to WE-CBM. Given Reimers, Wacker, and Koeppl’s (1987) suggestion that knowledge is necessary for rating factors such as acceptability, research on WE-CBM should give teachers opportunities to actively engage with WE-CBM before rating acceptability and other factors contributing to usability. Previous research on the acceptability of curriculum-based assessment (CBA; e.g., Eckert, Shapiro, & Lutz, 1995) used a case vignette approach to present CBA as an assessment method. To date, no studies have used case vignettes to present WE-CBM as a potential assessment method. Because some teachers may have little training in WE-CBM (Gansle et al., 2006), a case vignette that includes opportunities to practice scoring WE-CBM is one way to address teachers’ prior knowledge of CBM as a potential confounding factor in predicting usability.
Because Reading Curriculum-Based Measurement (R-CBM) is likely used more frequently in practice than WE-CBM, the purpose of the present study was to compare the usability of WE-CBM with that of R-CBM. The study examined teacher ratings of Acceptability, Understanding, Feasibility, System Climate, System Support, and Home-School Collaboration, as measured by the URP-A (Chafouleas et al., 2012). The ultimate goal was to determine how usability ratings for WE-CBM differ from those for R-CBM, to inform revisions of the measurement approach that improve usability. Specific research questions included the following:
Method
Participants
Participants included a total of 162 teachers who completed the survey (93.2% female), recruited from six different school districts in a metropolitan area in the southern United States. Participant demographics are included in Table 1.
Table 1. Teacher Participant Variables, by Percentage.
Note. N = 162 for participants who completed the survey. CBM = curriculum-based measurement; R-CBM = Reading Curriculum-Based Measurement; WE-CBM = Written Expression Curriculum-Based Measurement.
Measures
Demographic Questionnaire
A demographic questionnaire assessed teacher characteristics such as gender, teacher type (i.e., general or special education), years of experience, and level of training in WE-CBM. Some of its content was informed by Gansle et al. (2006). Teachers self-reported their level of training in CBM by responding to the statement, “I have received prior training in the use of curriculum-based measurement (CBM) as a screening tool,” on a 1 to 7 Likert-type scale (1 = strongly disagree, 2 = disagree, 3 = slightly disagree, 4 = neutral, 5 = slightly agree, 6 = agree, 7 = strongly agree). For the item “I teach primarily in (check one),” teachers chose either English/Language Arts or non-English/Language Arts (e.g., math, science). In response to the item “I use CBM at least. . .,” teachers reported weekly (14%), once per month (19%), once per year (5.5%), less than once per year (2.5%), never (53.3%), or “Other” (6.13%).
URP-A
The URP-A is a self-report measure of factors influencing use of an assessment methodology. It was derived from the Usage Rating Profile–Intervention (URP-I; Chafouleas et al., 2009; Chafouleas et al., 2012), which was designed to capture dimensions of the usability of interventions in the school setting. Initial research on the URP-I yielded four factors (Acceptability, Feasibility, System Support, and Understanding), assessed through 34 items. Acceptability was the strongest factor, explaining 30% of the variance (Chafouleas et al., 2009). However, one of these factors, System Support, emerged unexpectedly (Briesch et al., 2013), which led to the development of the Usage Rating Profile–Intervention Revised (URP-IR). Both confirmatory factor analysis (CFA) and exploratory factor analysis (EFA) supported a six-factor model of usage, with subscales of Acceptability, Understanding, Family-School Collaboration, Feasibility, System Climate, and System Support.
The purpose of the URP-A (Chafouleas et al., 2012) is to determine which factors influence teacher use of school-based assessments. Initially designed for school-based behavioral assessment, the URP-A was validated with a sample of elementary and middle school teachers (Grades 1–8) who completed it after implementing the Direct Behavior Rating Single Item Scale (DBR-SIS) in the classroom (Miller, Neugebauer, Chafouleas, Briesch, & Riley-Tillman, 2013). In its current form, the URP-A is a self-report measure of teacher assessment usage as a multidimensional construct and consists of 28 items rated on a 1 to 6 Likert-type scale (strongly disagree to strongly agree). A CFA of the six hypothesized constructs (Acceptability, Understanding, Feasibility, Home-School Collaboration, System Climate, and System Support) supported this structure, indicating consistency between the factor structures of the URP-IR and the URP-A, with the proportion of variance explained in individual items ranging from .25 to .87. Internal consistency (Cronbach’s alpha) was .90 for Acceptability, .80 for Understanding, .83 for Home-School Collaboration, .83 for Feasibility, .71 for System Climate, and .63 for System Support (Miller et al., 2013). The final version of the URP-A yields an overall Usability score and six subscale scores (Acceptability, Feasibility, Understanding, System Support, System Climate, and Home-School Collaboration), as defined in the “Usability” section above. For the present study, two items (Items 7 and 12) were modified by substituting the word “academic” for the word “behavior” (e.g., “This is a good way to assess the child’s academic problem”).
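The internal-consistency values above are Cronbach’s alpha coefficients. As a reminder of what that statistic computes, here is a minimal Python sketch of the standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), for k items. The ratings are invented and do not come from Miller et al. (2013) or the present study.

```python
from __future__ import annotations

from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items matrix of ratings.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)).
    Sample variances are used throughout; population variances give the same ratio.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    item_variances = sum(variance(item) for item in zip(*responses))
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variances / total_variance)


# Hypothetical ratings on a 1-6 scale: three respondents, two items of one subscale.
ratings = [[2, 3], [4, 4], [6, 5]]
print(round(cronbach_alpha(ratings), 3))  # → 0.889
```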
Procedures
This study was approved by the Committee for the Protection of Human Subjects at the authors’ institution, and agreement from school districts was obtained before elementary teachers were invited to participate. A link to an anonymous electronic survey was sent by email to potential participants in a large metropolitan area in the southern United States, either directly from the primary investigator or from a school administrator. Surveys were sent to six school districts in total. The survey was widely distributed to teachers, but the exact number of teachers who received it is unknown. Teachers were told that the survey would take an estimated 30 min to complete. Teachers first completed the demographic questionnaire, which included the self-rated CBM training item described above (1 = strongly disagree to 7 = strongly agree). Two brief case vignettes were then presented, one for R-CBM and one for WE-CBM. Each vignette described an elementary campus where a large number of students were failing the state standardized reading or writing assessment and suggested CBM as a universal screening measure to identify students at risk of failing. Each vignette introduced the procedures for that area of CBM (R-CBM or WE-CBM) with a brief example of a student response. The vignettes were constructed through expert consensus among the authors. Vignette order was counterbalanced: two versions of the study were created in Qualtrics, and half of the participants received each study link.
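The article does not describe how teachers were divided between the two Qualtrics links. Purely as an illustration of counterbalancing, the Python sketch below assigns a hypothetical list of participant IDs to the two vignette orders; shuffling before alternating keeps assignment independent of recruitment order while keeping group sizes within one of each other. The function name, order labels, ID format, and seed are assumptions, and 162 (the completed-sample size) is used only as an example.

```python
from __future__ import annotations

import random

# The two counterbalanced vignette orders.
ORDERS = (("R-CBM", "WE-CBM"), ("WE-CBM", "R-CBM"))


def assign_orders(participant_ids: list[str],
                  seed: int | None = None) -> dict[str, tuple[str, str]]:
    """Hypothetical helper: shuffle participants, then alternate between the two orders."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    return {pid: ORDERS[i % 2] for i, pid in enumerate(shuffled)}


assignments = assign_orders([f"T{n:03d}" for n in range(162)], seed=1)
first_rcbm = sum(1 for order in assignments.values() if order[0] == "R-CBM")
print(first_rcbm, len(assignments) - first_rcbm)  # → 81 81
```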
Because simply reading about scoring in an analog format may limit generalizability (Eckert & Hintze, 2000), the present study included active scoring of the two CBM types being investigated. Immediately after reading each vignette, teachers completed guided scoring practice. For R-CBM, teachers were introduced to and practiced scoring Words Read Correctly (WRC) on a scored passage. WRC is the R-CBM score representing the number of words read aloud correctly in 1 min. For WE-CBM, teachers were introduced to and practiced scoring CWS, a metric used to capture writing complexity, including sentence structure, punctuation, grammar, and spelling, again by reviewing an already scored WE-CBM sample. CWS was selected because it is a complex yet commonly used metric with moderate validity coefficients (e.g., Keller-Margulis, Payan, et al., 2016; McMaster & Espin, 2007). Its use was also based on findings from the WE-CBM acceptability study by Gansle et al. (2006), in which teachers rated CWS higher than other common scoring metrics. Other commonly used scoring metrics include TWW and WSC.
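By contrast with CWS, WRC involves a single tally. The Python sketch below assumes a hypothetical timed record in which each attempted word is logged with the seconds elapsed and whether it was read correctly; only words read correctly within the 1-min limit count. Treating a word reached at exactly 60 s as within the limit is an assumption.

```python
from __future__ import annotations


def words_read_correctly(events: list[tuple[float, bool]], limit_s: float = 60.0) -> int:
    """Count words read correctly within `limit_s` seconds.

    `events` holds (seconds_since_start, read_correctly) for each attempted word.
    """
    return sum(1 for elapsed, correct in events if elapsed <= limit_s and correct)


# Hypothetical record: an error at 2 s and a correct word after time was called.
record = [(0.8, True), (2.0, False), (3.1, True), (59.5, True), (61.0, True)]
print(words_read_correctly(record))  # → 3
```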
In the vignette and guided scoring activity, teachers were informed that WE-CBM consists of a teacher or other educator administering a brief story starter and then scoring the response. The vignette included the story starter “It was a sunny day and. . .” and prompted the teacher to view (a) an example student response to the story starter with the correct CWS listed and (b) a guided practice section consisting of a sample student response, with instructions to practice counting CWS (see Appendices A and B, available online as Supplemental Material, for the full vignettes). After completing the guided practice for each CBM type (i.e., WE-CBM or R-CBM), teachers rated each URP-A item (e.g., “I understand how to use this assessment”) for that CBM type on a whole-number scale from 1 (strongly disagree) to 6 (strongly agree). Teachers were asked to rate their perceptions of the usability of CBM in the context of universal screening, as described in the vignette. Participants received a US$5 electronic Starbucks gift card within 24 hr of completing the survey.
Results
Participant demographics are listed in Table 1. In addition to the percentage of participants with prior CBM training (reported in Table 1), results were disaggregated by special education status. For reading, 31.1% of general education teachers reported previous training in R-CBM specifically, compared with 51.8% of special education teachers. For writing, 11.1% of general education teachers reported prior training in WE-CBM, compared with 25.9% of special education teachers.
Descriptive statistics for the overall Usability score and each URP-A factor are included in Table 2. For interpreting mean Likert-type ratings, the URP-A scoring guidelines anchor the scale as strongly disagree (1.0), disagree (2.0), slightly disagree (3.0), slightly agree (4.0), agree (5.0), and strongly agree (6.0). On average, participants slightly agreed that both R-CBM and WE-CBM are usable. Mean scores across the URP-A domains indicate that both R-CBM and WE-CBM were rated as acceptable, with WE-CBM rated slightly lower than R-CBM in all areas except Home-School Collaboration. Home-School Collaboration ratings for both CBM types fell in the disagree to slightly disagree range. Ratings for WE-CBM in the System Support domain were also closer to slightly disagree.
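Because the URP-A scale has no neutral midpoint, a mean that falls between two whole-number anchors is naturally described by the two anchors that bracket it, as in the “disagree to slightly disagree range” described above. A minimal Python sketch of that interpretation rule follows, assuming the anchor labels listed in the scoring guidelines; the means are invented, since the Table 2 values are not reproduced here.

```python
from __future__ import annotations

import math

# URP-A anchors for the 1-6 rating scale.
ANCHORS = {1: "strongly disagree", 2: "disagree", 3: "slightly disagree",
           4: "slightly agree", 5: "agree", 6: "strongly agree"}


def anchor_band(mean_rating: float) -> tuple[str, ...]:
    """Return the anchor a mean sits on, or the two anchors that bracket it."""
    if not 1.0 <= mean_rating <= 6.0:
        raise ValueError("URP-A means must lie between 1.0 and 6.0")
    low, high = math.floor(mean_rating), math.ceil(mean_rating)
    return (ANCHORS[low],) if low == high else (ANCHORS[low], ANCHORS[high])


# Invented means for illustration only.
for mean in (2.4, 4.0, 4.3):
    print(mean, anchor_band(mean))
```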
Table 2. URP-A Ratings by CBM Type.
Note. Likert-type scale meaning = strongly disagree (1.0), disagree (2.0), slightly disagree (3.0), slightly agree (4.0), agree (5.0), and strongly agree (6.0). URP-A = Usage Rating Profile–Assessment; CBM = curriculum-based measurement; R-CBM = Reading Curriculum-Based Measurement; WE-CBM = Written Expression Curriculum-Based Measurement.
Differences in Usability by CBM Type
The present study also examined whether teacher ratings, as measured by the overall URP-A Usability score and the six factor scores, differed significantly by CBM type while controlling for teachers’ self-reported level of previous CBM training and special education status. Seven separate repeated-measures ANCOVAs were conducted, one for the overall Usability score and one for each factor score. Normality was examined using plots. Covariates were teachers’ self-reported level of CBM training and whether they were general or special education teachers. Multivariate results are reported for each ANCOVA. Significant main and interaction effects are reported using
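With only two levels of CBM type, the multivariate test in each repeated-measures ANCOVA is equivalent to a regression on each teacher’s difference score (R-CBM rating minus WE-CBM rating): the intercept tests the CBM-type main effect, and each covariate slope tests the corresponding CBM type × covariate interaction, each with df (1, n − 1 − c) for c covariates, consistent with the (1, 159) reported for 162 teachers and two covariates. The NumPy sketch below illustrates this equivalence; it is not the authors’ analysis code, and the function name is hypothetical. Covariates are mean-centered here so that the intercept refers to a teacher at average training and status; the article does not report whether its covariates were centered, and without centering the main-effect test refers to covariate values of zero. All data in the example are invented.

```python
from __future__ import annotations

import numpy as np


def two_level_rm_ancova(y_level1, y_level2, covariates=()):
    """F tests for a two-level within-subject factor with between-subject covariates.

    Regresses the difference score (level 2 minus level 1) on an intercept and the
    mean-centered covariates. Returns (F values, df_error, Wilks's lambda values):
    F[0] is the within-factor main effect and F[j] (j >= 1) is its interaction with
    covariate j; each F has df (1, df_error).
    """
    d = np.asarray(y_level2, dtype=float) - np.asarray(y_level1, dtype=float)
    n = d.shape[0]
    covariates = list(covariates)
    if covariates:
        z = np.column_stack([np.asarray(c, dtype=float) for c in covariates])
        z = z - z.mean(axis=0)  # center so the intercept is the adjusted mean difference
    else:
        z = np.empty((n, 0))
    X = np.column_stack([np.ones(n), z])
    beta, *_ = np.linalg.lstsq(X, d, rcond=None)
    resid = d - X @ beta
    df_error = n - X.shape[1]
    sigma2 = resid @ resid / df_error
    se2 = sigma2 * np.diag(np.linalg.inv(X.T @ X))  # squared standard errors
    f_values = beta**2 / se2                         # F(1, df_error) = t squared
    wilks = df_error / (df_error + f_values)         # single-df case
    return f_values, df_error, wilks


# Invented ratings for 14 teachers: WE-CBM and R-CBM scores plus two covariates.
training = [1, 2, 3, 4, 5, 6, 7] * 2   # self-rated CBM training (1-7)
special_ed = [0] * 7 + [1] * 7         # 0 = general, 1 = special education
we_cbm = [3.0] * 14
r_cbm = [3.0 + (7 * i) % 5 * 0.25 for i in range(14)]
f, df_err, lam = two_level_rm_ancova(we_cbm, r_cbm, [training, special_ed])
print(np.round(f, 2), df_err, np.round(lam, 3))
```

With no covariates the main-effect F reduces to the square of a paired t statistic, which is a quick way to check the sketch.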
Overall usability
The repeated-measures ANCOVA for teacher-rated URP-A overall Usability scores revealed a significant main effect for CBM type, Wilks’s Lambda F(1, 161) = 4.53, p = .035,
Acceptability
The repeated-measures ANCOVA on teacher-rated URP-A Acceptability scores did not reveal a significant main effect for CBM type, Wilks’s Lambda F(1, 159) = 2.82, p = .095, or significant interaction effects with training or special education status.
Understanding
Results for the teacher-rated URP-A Understanding domain revealed a significant main effect for CBM type, Wilks’s Lambda F(1, 159) = 12.36, p = .001,

The chart depicts the significant interaction between CBM type and training level on mean URP-A Understanding scores.
There was also a significant interaction effect between CBM type and teacher special education status, Wilks’s Lambda F(1, 159) = 5.95, p = .016,

The chart depicts the significant interaction between CBM type and teacher general/special education status on mean URP-A Understanding scores.
Feasibility
The repeated-measures ANCOVA on teacher-rated URP-A Feasibility scores revealed a significant main effect for CBM type, Wilks’s Lambda F(1, 159) = 4.42, p = .037,
Home-School Collaboration
The repeated-measures ANCOVA using Home-School Collaboration scores did not reveal a significant main effect for CBM type, Wilks’s Lambda F(1, 159) = 0.183, p = .669, or any interaction effects with training or special education status.
System Climate
Results for the System Climate scores revealed a significant main effect for CBM type, Wilks’s Lambda F(1, 159) = 4.98, p = .027,

The chart depicts the significant interaction between CBM type and training level on mean URP-A System Climate scores.
There was also a significant interaction effect between CBM type and teacher special education status, Wilks’s Lambda F(1, 159) = 5.95, p = .016,
System Support
The results for System Support scores revealed a significant main effect for CBM type, Wilks’s Lambda F(1, 159) = 5.63, p = .019,
Discussion
National test data indicate that some students do not demonstrate proficient writing skills, suggesting a need for measures to identify students at risk in writing. Although a growing research base supports WE-CBM as a reliable and valid indicator of general writing proficiency (e.g., Jewell & Malecki, 2005; Ritchey & Coker, 2013), it tends to be used less often as a universal screener in school-based practice. No research to date has examined teacher perceptions of the usability of WE-CBM relative to other types of CBM. The purpose of the present study was to better understand teacher perceptions of the usability of WE-CBM compared with R-CBM as a universal screener. Results indicate that R-CBM was rated significantly higher than WE-CBM on the URP-A overall Usability scale and on the Feasibility, Understanding, System Climate, and System Support factors. Prior CBM training had an observable impact on teacher ratings of Understanding and System Climate, and teacher special education status had an observable impact on Understanding ratings. This pattern is consistent with differences in prior exposure to CBM: a higher percentage of special education teachers reported prior training with both R-CBM and WE-CBM.
URP-A Overall Usability and Factor Scores
Results of the present study indicate that teachers slightly agree that WE-CBM is usable, which is encouraging for its increased use in the future. Gansle et al. (2006) found that teachers rated CWS as more representative of student writing skill than other CBM metrics for scoring a brief writing sample. The present study lends support to this finding, suggesting that teachers consider WE-CBM (scored with CWS) a usable screening measure in practice. Teachers rated R-CBM significantly higher in usability than WE-CBM, potentially reflecting a lower general emphasis on writing than on reading in schools. Given the pressure to bring students to grade level in reading but not in writing, it makes sense that teachers would be more committed to, and potentially more familiar with, reading screening measures.
However, teachers slightly disagreed that they have adequate system support to implement WE-CBM, indicating a need for greater support from administrators and colleagues and suggesting that writing may not be a major focus of universal screening systems. Allinder (1996) suggested participative decision making and open communication among teachers as strategies to increase CBM use. In practice, this could include focus groups of stakeholders to discuss adopting WE-CBM for screening, engaging teachers with previous CBM implementation experience as leaders, and sharing responsibilities for WE-CBM implementation and interpretation. These same strategies could help strengthen system support, address system climate issues, and ultimately increase implementation.
Interestingly, there was no significant difference between reading and writing URP-A scores in the Acceptability domain, even though R-CBM is generally used more in practice and is likely more often the focus of screening, instruction, and intervention. This is promising given that Allinder and Oats (1997) found that, in the context of progress monitoring, teachers with higher acceptability ratings administered more CBM probes and set higher goals for students, leading to greater student growth. However, as previously noted in the literature, general acceptability does not directly translate into implementation (e.g., Shapiro & Eckert, 1993; Sterling-Turner & Watson, 2002). Although acceptability is highly correlated with use, it is not sufficient to predict usability (Chafouleas et al., 2009). The difference between WE-CBM and R-CBM ratings had the largest effect size on the Understanding factor, suggesting that this domain deserves particular attention when planning for implementation. When teachers understand WE-CBM, they may be more likely to support its use, potentially facilitating administrative adoption of the screening tool.
There was also a main effect of CBM type on Feasibility ratings, with R-CBM rated higher than WE-CBM. This finding is not surprising given the additional time required to score CWS in writing samples compared with scoring words read correctly for R-CBM. Although the CWS metric requires somewhat more time to score, it demonstrates better technical adequacy than other metrics such as total words written (TWW; McMaster & Espin, 2007). This scoring burden could account for the lower Feasibility ratings for WE-CBM, particularly in universal screening, where large numbers of writing samples must be scored, and it may be an issue to focus on in improving the measurement approach.
The absence of a main effect for the Home-School Collaboration domain, along with lower ratings for both R-CBM and WE-CBM in this domain, indicates that teachers do not view collaboration with families as necessary for the effective use of either measure as a screening tool. That said, communicating screening results to families so they are aware of instructional decisions being made about their children is thought to be extremely important. There are several possible explanations for this finding. Study participants may not have considered communication between home and school necessary for the measures to be useful, or they may have been unaware of the value of this practice.
The examination of interaction effects indicated a significant interaction between CBM type and teacher previous CBM training in URP-A Understanding, such that training had a greater effect on Understanding scores for R-CBM than for WE-CBM. In other words, the amount of training received was associated with Understanding ratings for R-CBM, whereas there was no difference between the high and low training groups for WE-CBM. In the demographic survey, 30.5% of participants reported on-the-job CBM training prior to participating in the study. As expected, reading was the most common area of prior CBM training of any type (i.e., on-the-job training, in-service, interaction with the school psychologist or another professional, journal articles, or course work): nearly 35% of participants reported some type of CBM training in reading, 28% in math, 13.4% in writing, and 10.4% in spelling.
For the interaction between CBM type and teacher special education status in URP-A Understanding ratings, special education teachers rated WE-CBM significantly higher on the Understanding factor than general education teachers did. Although the descriptive results indicate that only a quarter of special education teachers had prior training with WE-CBM, over half had prior training in reading, which may have generalized to their ratings of CBM overall. This prior exposure may increase their familiarity with WE-CBM and therefore make them more open to using it as a screener. Consequently, special education teachers may be a valuable resource for supporting administrative decisions to implement WE-CBM screening in general education classrooms, helping to detect writing difficulties as early as possible. The last significant interaction was between CBM type and self-rated previous CBM training in URP-A System Climate ratings; again, teachers reporting high levels of training rated R-CBM higher in System Climate than those reporting less training. This finding makes intuitive sense, given that teachers who have received more training are likely to work, or to have worked, in settings where CBM implementation is valued.
Suggested Modifications for WE-CBM in Practice
Results indicated that the overall URP-A Usability score was higher for R-CBM than for WE-CBM, suggesting that teacher self-report parallels practice. These findings may indicate what types of modifications are needed to increase the use of WE-CBM in practice. For example, mean teacher ratings on the Acceptability subscale suggest that teachers have adequate buy-in for WE-CBM as a fair and effective measure. However, teachers rated WE-CBM lower than R-CBM on the individual item “I would be committed to carrying out this assessment,” with a mean difference of 0.6 points on the Likert-type scale. Although teachers may find WE-CBM acceptable, it may differ too much from the holistic approaches to assessing writing that teachers rated most favorably in Gansle et al. (2006). From a systems perspective, campus leaders may be able to address these reservations by asking related motivational interviewing questions (e.g., “What would it take for you to be able to implement WE-CBM?”) and by allowing enough time during teacher in-service days to learn and practice scoring the measures. In the Understanding domain, the largest item discrepancy was for “I am knowledgeable about the assessment procedures” (mean difference of 1.0 points on the Likert-type scale), in favor of R-CBM. This suggests a need to clarify the procedures for implementing WE-CBM, including both administration and scoring. Educators should therefore work to increase teacher knowledge of WE-CBM by providing professional development and ongoing consultation to interested teachers, with the goal of increasing understanding.
The Feasibility subscale indicates, among other things, whether participants consider the time needed to implement WE-CBM reasonable. Interestingly, the three lowest rated items in this domain concerned time (i.e., “The total time required to implement the assessment procedures would be manageable,” “I would be able to allocate my time to implement this assessment,” and “The amount of time required for record keeping would be reasonable,” where M = 3.7). This result is consistent with both Witt and Martens’s (1983) finding that Feasibility is maximized when a procedure requires minimal time and effort and Chafouleas et al.’s (2009) finding that measures are seen as more feasible when time away from instruction and routines is minimized. Time is a critical constraint on assessment in schools, so assessment should be kept to the minimum needed to guide decision making and intervention. Given the time required to score some WE-CBM outcome metrics, such as CWS, one option is to use WE-CBM only with students who have already been identified as at risk in some other way. This would direct resources and time toward evaluating the students most likely to require additional support.
For screening all students, one scoring metric that has been studied as a potentially time-efficient option is Correct Punctuation (Amato & Watkins, 2011; Gansle et al., 2002). Although this metric might be quick to score and efficient for universal screening, the research on it is still limited. Ultimately, additional research is needed to ensure that this measure can be implemented in school settings in a manner considered feasible and valuable.
Similarly, results on the System Support subscale suggest that teachers need more support to implement WE-CBM. Professional development (e.g., in-services, on-the-job training), consultative support (e.g., from school psychologists, instructional specialists), and additional resources (e.g., guided scoring practice with examples, integrity checklists) could address this limitation. In addition, because the effect sizes for differences between R-CBM and WE-CBM were small, support and training for CBM in general are also warranted.
Finally, results suggest that teachers believe R-CBM has more approval from others in their system (System Climate); for example, items addressing whether the CBM type was consistent with the way things are done in the participant’s system were rated higher for R-CBM than for WE-CBM. CBM trainers (e.g., school psychologists, instructional specialists, lead teachers) could work with administrators to increase buy-in for WE-CBM (e.g., by presenting school data on student performance on standardized writing tests or by sharing the cost-effectiveness of the measure), particularly if local or national test data suggest a significant need in this area.
Limitations
There are several limitations to the present study. The survey was distributed to teachers across six school districts, and recruitment procedures differed across schools depending on each district's research approval process. As a result, it is unknown exactly how many teachers received an invitation to participate. In addition, the study examined only one way to implement WE-CBM as a universal screening tool, in terms of the writing prompt selected, the amount of time students had to respond, and the scoring metric selected (i.e., CWS). WE-CBM can be implemented in a variety of ways, including different writing durations and prompt types, and it can also be used to monitor progress in response to instruction. Had teachers been exposed to and practiced a different type of writing measure, their perceptions of usability might have differed. However, the method presented to teachers was chosen because it is considered the most standard in the literature and uses a scoring metric that has been widely studied and is considered reliable and valid.
Another important limitation is the analog study design. Because addressing student writing and reading problems may be more complex in practice, these findings, based on a vignette scenario, may not generalize to a naturalistic school environment. Generalizability is also limited because only a small sample of teachers from school districts in a specific region participated; moreover, the sample lacked diversity (i.e., 74.7% White and 92% female) and is likely not representative of the teaching workforce.
In addition, teachers may have been exposed to these measures in their practice and therefore entered the present study with varying levels of prior CBM training and experience, particularly in reading. We attempted to address this issue by giving all teachers a standard opportunity to practice scoring before they rated the usability of the measures. Although teachers likely had not experienced full implementation of WE-CBM in the classroom, the guided practice is thought to correspond to the “can work” stage (a laboratory or analogous condition) of Bowen et al.’s (2009) “can work,” “do work,” and “will work” stages of Feasibility. Nevertheless, the sample's previous experience with CBM may limit generalizability.
Teacher interpretations of specific demographic questionnaire items may also be a limitation. For example, there were no anchors (e.g., attended one training vs. regular use) to guide teachers when rating their previous CBM training. Similarly, the demographic questionnaire item “I teach primarily. . .” prompted teachers to select either English/Language Arts or non-English/Language Arts with no further descriptors, leaving teachers to choose whichever option they felt fit best. This may have confused some teachers, such as those who mainly deliver reading intervention. The present study partially addressed the training-item issue by dichotomizing the 1 to 7 Likert-type self-rating of previous CBM training into “low” and “high” training groups whenever that rating was found to be significant. Some URP-A items also do not use CBM-specific terminology (e.g., the Feasibility domain uses the term “record keeping” where “scoring” may have been more appropriate). Although item revision carries risks, two URP-A items were revised in the present study by replacing the word “behavior” with “academic” (e.g., “This is a good way to assess the child’s academic problem”). It is unknown how this change in wording affected the validity of the URP-A.
Finally, the results are presented without a Bonferroni correction for multiple comparisons because of the conservative nature of this procedure. This decision was made to present all possible differences between R-CBM and WE-CBM; as a result, however, some of the reported differences have little practical utility. When the correction for multiple comparisons is applied to the results of the current study, the only significant finding is in the domain of Understanding. Future studies should explore whether this or any of the other differences identified in the present study persist in a sample of participants who are more familiar with both types of CBM or use them regularly in practice.
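For readers unfamiliar with the procedure, the Bonferroni correction simply divides the familywise significance level by the number of comparisons. The worked example below is an illustration only, not the authors' reported computation: it assumes α = .05 and, hypothetically, m = 7 comparisons (the overall Usability scale plus the six URP-A factor scales); the actual number of tests in the study may differ.

```latex
% Bonferroni-adjusted threshold for m comparisons at familywise level \alpha
\alpha_{\text{adj}} = \frac{\alpha}{m}, \qquad
\text{reject } H_{0,i} \text{ only if } p_i \le \alpha_{\text{adj}}, \quad i = 1, \dots, m

% Illustrative values (assumed): \alpha = .05, m = 7
\alpha_{\text{adj}} = \frac{.05}{7} \approx .0071
```

Because every comparison must clear this stricter threshold, the procedure controls the familywise Type I error rate at the cost of statistical power, which is why it is often described as conservative.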
Practical Implications
The present study contributes to practice by considering how usability information can suggest ways to effectively promote WE-CBM in schools. Because there were no significant differences between teacher acceptability of R-CBM and WE-CBM, teachers may be open to using WE-CBM if they receive more professional development. Suggestions for enhancing awareness, and ultimately acceptability and use, of WE-CBM among stakeholders in the school setting include highlighting the need to screen for students at risk in writing and providing ongoing professional development about the utility of WE-CBM for this purpose.
Additional practical implications come from the findings on the Understanding factor, given that Understanding was the only factor with a medium effect size for the difference between WE-CBM and R-CBM URP-A ratings. School administrators may want to provide teacher training opportunities combined with repeated exposure to WE-CBM to address Understanding. Although all participants were exposed to both types of CBM through the guided practice section, results suggest that prior training is associated with Understanding ratings for reading but not for writing, which may be related to the emphasis on reading in schools. Some districts may already have an environment that supports R-CBM, including training and regular use of the assessment, so teachers' previous experience may have influenced their ratings despite equal exposure to both approaches in the study. Creating a system climate that supports writing equally may improve perceptions of WE-CBM usability.
Future Research Directions
This study suggests many directions for future research. Although factor analysis supports a six-factor model of the URP-A when it is used with behavioral screening assessments (Miller et al., 2013), future research should validate the URP-A for assessments of academic skills. For example, although Home-School Collaboration may be important to consider in behavioral assessment (e.g., Miller et al., 2013), it may be an irrelevant category for understanding the usability of CBM as a universal screener.
Results from the present study indicate a need for greater system support from administrators for implementing WE-CBM. Given the importance of interventions and assessments being acceptable to a variety of stakeholders (e.g., Nastasi & Truscott, 2000), future studies of the URP-A might also include administrators, as the use of WE-CBM as a screener would likely be a school-wide decision. As a building-wide procedure, it would require administrator leadership and follow-up support. One key function of the URP-A is to inform decision making (Briesch et al., 2013), so future research could also examine its utility for measuring changes in overall Usability in response to modifications of WE-CBM. For example, because teachers rated WE-CBM significantly lower than R-CBM on URP-A Feasibility, future research could manipulate the scoring metric, such as by using a less time-consuming metric, and examine the resulting changes in Feasibility ratings.
The sample in the present study was drawn largely from suburban school districts with high percentages of White students. Although the vignettes did not reference a particular student population, the student population of the districts from which teachers were drawn may have served as their reference group when participating in the study. Future research should also examine teacher perceptions of the usability of WE-CBM with diverse populations, such as English Language Learners (ELLs) and students with behavioral difficulties. Because URP-A Acceptability was highly correlated with overall Usability scores, acceptability is likely to continue to play an important role in the use of WE-CBM.
Conclusion
Results of the present study inform how teachers perceive WE-CBM at the individual level (Acceptability, Understanding), the intervention level (Feasibility), and the environmental level (whether they need more support, whether the measure is consistent with the beliefs of their school system, and whether family involvement is needed). They provide preliminary evidence on whether teachers believe they have the skills, resources, and supports necessary to implement WE-CBM. The study offers insight into barriers to WE-CBM use and ways to facilitate it, and these findings suggest that additional research on WE-CBM is greatly needed.
Supplemental Material
AEI781007_Supplementary_Material_REV2 – Supplemental material for Assessing Teacher Usability of Written Expression Curriculum-Based Measurement
Supplemental material, AEI781007_Supplementary_Material_REV2 for Assessing Teacher Usability of Written Expression Curriculum-Based Measurement by Anita M. Payan, Milena Keller-Margulis, Andrea B. Burridge, Samuel D. McQuillin and Kristen S. Hassett in Assessment for Effective Intervention
Appendix – Supplemental material for Assessing Teacher Usability of Written Expression Curriculum-Based Measurement
Supplemental material, Appendix for Assessing Teacher Usability of Written Expression Curriculum-Based Measurement by Anita M. Payan, Milena Keller-Margulis, Andrea B. Burridge, Samuel D. McQuillin and Kristen S. Hassett in Assessment for Effective Intervention
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
References
