Abstract
Through time-series graphs, teachers often evaluate progress monitoring data to make both low- and high-stakes decisions for students. The construction of these graphs—specifically, the presence of an aimline and the data points per x- to y-axis ratio (DPPXYR)—may impact decisions teachers make. The purpose of this study was to evaluate the impact of graph construction manipulations on pre-service teachers’ accuracy with instructional decision-making. Participants included 94 pre-service teachers enrolled in an introductory course focused on students with disabilities at two universities. Following instruction on progress monitoring, students evaluated 48 graphs representing eight data sets with six manipulations (i.e., with and without aimline; DPPXYR set at 0.05, 0.10, 0.15). Results suggest that the presence of an aimline increased accuracy, whereas the manipulation of the DPPXYR led to mixed findings. Implications for future research and practice are discussed.
Special educators and general educators use progress monitoring data to make both low- and high-stakes decisions that impact their students. Several factors increase the likelihood that educators make valid decisions when interpreting progress monitoring data. Graph construction is one element that has been under-researched in the area of progress monitoring, and studying it could identify salient graphical elements that increase the validity of decisions made by educators. We have structured our introduction to first focus on special education teachers’ use of progress monitoring data to make decisions for students receiving special education services, followed by a description of how general education teachers use progress monitoring data for students at risk for or identified with a disability. Finally, we synthesize prior research on teacher evaluation of progress monitoring data and the role of graph construction in teacher decision-making.
Special Education Teachers’ Use of Progress Monitoring Data
The Individuals With Disabilities Education Act (IDEA, 2004) emphasized the need for special education teachers to use data to inform whether the Individualized Education Program (IEP) is constructed to protect a student’s access to a Free Appropriate Public Education. Special education teachers lead multidisciplinary teams in developing an IEP for each student receiving special education services (i.e., the IEP team). These IEPs consist of three core components aligned with progress monitoring: (a) presence of at least one measurable, annual goal; (b) a statement of how a student’s progress toward the annual goal will be measured; and (c) appropriate, objective procedures for monitoring student progress (Yell, 2019).
Special education teachers collect data on annual goals frequently. These data are then evaluated, typically by presenting data via a time-series graph, to determine whether the student is making adequate progress toward the annual goal or, if not, how to intensify instruction to increase student response. For academic skills, special education teachers will often use curriculum-based measures. For annual goals focused on adaptive behavior, social skills, or transition/vocational skills, special education teachers will likely engage in direct observation of target behaviors (see Direct Behavior Rating; Chafouleas et al., 2009). The ability to evaluate progress, or lack thereof, through the visual analysis of these data presented via time-series graphs enables special education teachers to validly ascertain whether the student’s IEP is adequately constructed to protect their right to a free appropriate public education (Yell, 2019).
General Education Teachers’ Use of Progress Monitoring Data for Early Intervention
To best serve students in public school systems, the reauthorization of the IDEA (2004) placed a greater emphasis on early intervening services for students struggling academically or behaviorally prior to the identification of a disability. These early intervening services aim to reduce Type I errors (false positives), that is, the identification of a disability when it is not warranted. Today, educational systems typically accomplish this by implementing a framework of tiered interventions, which address school-wide improvement in instructional efficacy and the use of data to inform decision-making. These frameworks are commonly referred to as multitiered systems of support (MTSS; Kuchle et al., 2015), which is the language we will use throughout the article.
Within this system, MTSS teams (e.g., general education teachers, special education teachers, school psychologists) support students at risk for not meeting socially important end-of-year goals through a system of tiered interventions, which increase in intensity based on student responsiveness to instruction (Coolong-Chaffin & Wagner, 2015; Sailor et al., 2020; Sugai & Horner, 2009). A critical element of an MTSS framework is the reliance on student outcome data to inform this dynamic decision-making process. As in special education, progress monitoring data for academic outcomes are typically collected using curriculum-based measures administered on a weekly or biweekly basis (i.e., once every 2 weeks) to gauge student responsiveness to instruction (National Center on Intensive Intervention [NCII], 2013; Schumacher et al., 2017). For behavioral outcomes, data often consist of direct observation of target behavior(s) or teacher report (e.g., daily behavior report card) and are collected in a similarly frequent manner to assess progress. Whether academic or behavioral in nature, these data are plotted on a time-series graph, which educators analyze to determine student responsiveness and individualize interventions to accelerate student growth (Kubina et al., 2017).
Further support for the importance of teachers’ ability to evaluate assessment data comes from a change made in the reauthorization of IDEA (2004). This amendment declared that state educational agencies cannot require the consideration of “a severe discrepancy between achievement and intellectual functioning” as the sole method for identification of a specific learning disability. This change allowed states to incorporate an MTSS framework (i.e., Response to Intervention [RTI]) to identify students with a specific learning disability. This approach is currently recommended as best practice; Miciak and Fletcher (2020) describe it as a three-pronged approach, with frequent progress monitoring and evaluation of time-series data to determine student responsiveness as a core component. Special education teachers, general education teachers, and school psychologists are all critical stakeholders employed by the local education agency and are tasked with making this recommendation and explaining this information to the family and other key stakeholders.
Preparing Pre-Service Teachers to Evaluate Progress Monitoring Data
Given the importance of in-service teachers’ use of progress monitoring data to make decisions, it becomes imperative to train pre-service teachers to begin developing skills in this area. The Standards for Initial Special Education Preparation published by the Council for Exceptional Children (CEC, 2015) recommend that pre-service special education teachers “use knowledge of measurement principles and practices to interpret assessment results and guide educational decisions for individuals with exceptionalities.” In addition, the Council of Chief State School Officers (CCSSO, 2013) specifies that teachers work “independently and collaboratively to examine test and other performance data to understand each learner’s progress and to guide planning” as part of its Interstate Teacher Assessment and Support Consortium (InTASC) core teaching standards. Collecting and interpreting progress monitoring data correctly is integral to fulfilling both objectives.
Research on Teachers’ Evaluation of Progress Monitoring Data
To situate the current project within related literature, we provide a brief synthesis of prior experiments. Fuchs (2004) suggests a continuum for categorizing research on the use of curriculum-based measurement in data-based decision-making. The continuum consists of three stages of investigation: (a) the technical adequacy at static time points (i.e., screening), (b) the technical characteristics of slopes across time (i.e., progress monitoring), and (c) the instructional utility as teachers engage in data-based decision-making. The focus of the current project is to provide information that informs the third stage of this continuum, which involves the instructional decisions made by pre-service teachers based on graph characteristics.
As evidence accumulated to support the technical adequacy of curriculum-based measurement at static time points and for evaluating slopes through progress monitoring, other researchers also began addressing the third stage—instructional utility of data-based decision-making. Foundational studies demonstrated that teachers often had a difficult time interpreting and making “accurate” decisions based on student data (e.g., Stecker et al., 2005; van den Bosch et al., 2019).
Three recent experiments all used the same think-aloud protocol to examine the quality of teachers’ data interpretation and developed a rubric to score teachers’ evaluation of progress monitoring data for its coherence, accuracy, specificity, and reflectivity (Espin et al., 2017; van den Bosch et al., 2017; Wagner et al., 2017). Espin and colleagues (2017) identified large variability in the ability of in-service special education teachers to coherently speak about progress monitoring data. Of the 14 special education teachers included, Espin and colleagues found that 5 demonstrated a high level of skill, 4 demonstrated a mid level of skill, and 5 demonstrated a low level of skill. An unexpected finding was that almost 40% of the statements made by the five teachers with the lowest rated performance were inaccurate.
Van den Bosch and colleagues (2017) identified slightly improved quality in interpreting progress monitoring data for 23 in-service general and special education teachers when compared with Espin and colleagues (2017); these teachers’ quality of evaluation was similar to that of the “gold standard” raters used in the study (i.e., expert evaluators of curriculum-based measurement graphs). Despite high accuracy in the think-aloud protocol, the authors reported that teachers still had lower accuracy in making instructional decisions based on the data patterns.
Wagner and colleagues (2017) evaluated quality in interpreting progress monitoring data of 36 pre-service special education teachers. The authors found that the “gold standard” raters had higher quality interpretations than the pre-service teachers prior to internship. An interesting finding from the experiment was that the pre-service teachers had similar pre- and post-internship performance on their evaluation of graphs, suggesting structured intervention and training is needed to see gains.
In addition, two recent experiments had pre-service professionals evaluate AB design graphs (Lane et al., 2021; Wilbert et al., 2021). Wilbert and colleagues (2021) had 186 first-year, pre-service teacher education majors evaluate time-series graphs displaying AB single-case designs. These AB designs were labeled as baseline and intervention, which are comparable to any comparison in an MTSS framework of a lesser intensity intervention versus increased intensity. Type I error rates were low when simulated data series did not include trend (5%); however, as trend was added to the data series, Type I errors increased to 25%. The authors found low inter-rater reliability among pre-service teachers in evaluating the magnitude of intervention efficacy.
Lane and colleagues (2021) trained 60 pre-service professionals enrolled in a course focused on principles of behavior management within a college of education to evaluate 15 AB design graphs displaying various data patterns. Upon visual analysis, students indicated whether or not the intervention was effective. Post instruction, the authors found high accuracy rates (~97%) in determining whether an intervention effect was present. Interestingly, through the face validity process for validating simulated data series, the team found that the graphs experts identified as least likely for professionals to encounter were the graphs with the lowest accuracy.
Graph Construction
To evaluate student responsiveness to instruction, either through an MTSS framework or IEP implementation, a key component is the construction of time-series graphs. One concern is the lack of standardization in graph construction by practitioners, which may in turn impact visual analysis (Lewis et al., 2022). A majority of work related to time-series graphs has focused specifically on single-case research designs. Typically, the x axis includes time (e.g., dates or sessions) and the y axis includes the primary outcome of interest (e.g., academic, behavior). Dart and Radley (2018) proposed a schema for thinking about graph construction by categorizing graphical elements as either aesthetic-altering or analysis-altering. Aesthetic-altering elements change the way a graph “looks,” but there is no evidence that manipulating them impacts a person’s interpretations. Analysis-altering elements are those for which there is evidence that manipulation alters the decisions made by a reader of the graph. For example, when evaluating progress monitoring data to determine whether a student is making adequate growth, the manipulation of an analysis-altering element may lead to different conclusions.
To date, there are two potential analysis-altering elements: (a) ordinate scaling (Dart & Radley, 2017) and (b) data points per x- to y-axis ratio (DPPXYR; Radley et al., 2018). Dart and Radley (2017) manipulated the ordinate maximum value (i.e., y axis) by presenting a full scale of possible values (i.e., 100%) and truncations of 80%, 60%, and 40%. Findings suggest that truncating the ordinate of graphs presenting ABAB designs led experienced single-case design researchers to make more Type I errors (i.e., false positives): (a) 80% ordinate max had 4.7% error rate, (b) 60% ordinate max had 6.3% error rate, and (c) 40% ordinate max had 21.9% error rate. By truncating the ordinate, mean level changes across phases (i.e., baseline to intervention) appear greater in magnitude and trend estimations appear steeper, which will impact teachers’ evaluation of data and intervention effectiveness.
In a follow-up study, Radley and colleagues (2018) identified the concern that the x-axis to y-axis ratio may also distort time-series data patterns and impact visual analysis (see Kubina et al., 2017). They proposed the DPPXYR as a potential metric to quantify this relation; it considers the ratio of the x-axis length to y-axis length while also considering the density of data points plotted along the x axis which others have raised as a potential element (see Ledford et al., 2019). As the x axis decreases in comparison with the y-axis length, trend estimates appear to increase, which may impact teacher evaluation of time-series graphs (see Figure 1). To compute the DPPXYR, the following formula is used:
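The expression below is a sketch reconstructed from the verbal description above and from our reading of Radley and colleagues (2018); the symbols are labels introduced here for illustration and are not notation taken from the original sources.

```latex
\[
\mathrm{DPPXYR} \;=\; \frac{L_x / L_y}{N_x}
\]
% L_x : physical length of the x axis
% L_y : physical length of the y axis
% N_x : number of data points that could be plotted along the x axis
```

Under this reading, holding the y-axis length and the number of plottable data points constant, shortening the x axis lowers the DPPXYR and makes trends appear steeper.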
Radley and colleagues (2018) presented multiple-baseline design graphs with DPPXYR set at 0.14, ±0.5 SD, and ±1.0 SD to experienced single-case design researchers. Findings suggest that the likelihood of a Type II error (i.e., false negative) increased for +0.5 SD and +1.0 SD and the likelihood of a Type I error (i.e., false positive) increased for −0.5 SD and −1.0 SD. Given the findings, Dart and Radley (2018) suggested graphs with DPPXYR set between 0.14 and 0.16 would be best to minimize Type I and II errors. The authors raised concerns that progress monitoring software, which often automatically generates graphs, may construct graphs not adhering to recommendations. They cite AIMSweb (Shinn & Shinn, 2003) generated graphs with DPPXYR = 0.05. This would likely inflate Type I errors, meaning teachers would overestimate the progress students were making, which could lead to inaccurate decisions in an MTSS framework or in evaluating IEP appropriateness.

Figure 1. Example of the six graphs for each data set.
The most closely related study, published by Dart and colleagues (2021), investigated 159 graduate students’ evaluations of progress monitoring graphs that were generated by four curriculum-based measurement vendors: (a) AIMSweb (Shinn & Shinn, 2003), (b) DIBELS Next (Good et al., 2011), (c) easyCBM (Anderson et al., 2014), and (d) FASTBridge Learning (Fastbridge Learning, 2019). The authors found the x-axis to y-axis ratio varied by vendor: 1.48 (AIMSweb), 1.81 (DIBELS Next), 1.50 (easyCBM), and 1.93 (FASTBridge); this means DIBELS Next and FASTBridge presented a longer x axis in comparison with y-axis length than the other vendors. Participants were asked if a treatment effect was present (yes/no) and also asked to rate the magnitude of treatment effect (scaled 0–100). Results suggest that the vendor graph profile did not impact participant rating of treatment effect (yes/no); however, the graphs did produce different ratings of treatment effect magnitude, and the x:y ratio was not a likely reason. The authors did identify several differences across computer-based graphing systems: (a) presence of a trend line, (b) the width of tick marks along the x axis, and (c) colored overlays showing different percentile performance.
One constant across computer-based graphing systems was the presence of an aimline. An aimline, as typically constructed, consists of a straight line connecting an initial level of performance (e.g., baseline datum, Week 1 datum) to a set goal or criterion for the end of the intervention period (e.g., end-of-year goal, annual goal on IEP; Ardoin et al., 2013). Although learning is not linear, this line represents the linear growth a student would need to make on a week-to-week basis to meet the end-of-year goal. A commonly applied decision rule involves identifying a set number of consecutive data points (e.g., 3 or 5) below the aimline to indicate nonresponsiveness to the current intervention. Given the number of variables that were present in the Dart and colleagues (2021) investigation, our study isolated the effects of two graphical elements: (a) DPPXYR and (b) aimline.
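The aimline and the consecutive-points decision rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a procedure taken from any vendor or from this study: the function names, the sample scores, and the goal value of 14 are all hypothetical.

```python
def aimline(start_week, start_value, goal_week, goal_value):
    """Return a function giving the aimline value at any week."""
    slope = (goal_value - start_value) / (goal_week - start_week)
    return lambda week: start_value + slope * (week - start_week)


def max_consecutive_below(weeks, scores, aim):
    """Length of the longest run of data points strictly below the aimline."""
    longest = run = 0
    for week, score in zip(weeks, scores):
        if score < aim(week):
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return longest


def flag_nonresponse(weeks, scores, aim, k=3):
    """Apply the rule: k or more consecutive points below the aimline."""
    return max_consecutive_below(weeks, scores, aim) >= k


# Hypothetical data: Week 1 score of 4, end-of-year goal of 14 at Week 26.
aim = aimline(start_week=1, start_value=4, goal_week=26, goal_value=14)
weeks = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [4, 5, 4, 4, 5, 5, 6, 5]

print(max_consecutive_below(weeks, scores, aim))
print(flag_nonresponse(weeks, scores, aim, k=3))
```

The threshold `k` is left as a parameter because the text notes that both 3 and 5 consecutive points are used in practice.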
Purpose for Current Study
One way to improve decision-making in the interpretation of progress monitoring data is to focus on elements related to graph construction of time-series data. Given the prior research on potentially analysis-altering elements, we aimed to extend these findings in the context of progress monitoring data series. To date, there is no clear guidance on how to scale the ordinate for outcome data that do not have an upper-bound max. For example, many academic skills are measured via rate (e.g., correct digits per minute). With rate, there is no possible upper-bound max. Thus, for the current experiment, we did not investigate ordinate manipulation. The DPPXYR was raised as an element worth investigating for progress monitoring data given findings from single-case research designs and the concern raised by Dart and Radley (2018) that commonly used computer-based programs generate graphs with DPPXYR values less than 0.14, which increased Type I error rates based on the single-case design graphs.
Findings from Dart and colleagues (2021) were inconclusive regarding x:y ratio scaling because of the additional graphical element variables that were different across vendor graphs. Thus, we aimed to isolate the DPPXYR to investigate if it impacted the evaluation of progress monitoring graphs. Another element we aimed to investigate was the presence of an aimline. It is recommended to include an aimline on progress monitoring data, and most computer-based progress monitoring programs include an aimline (see Dart et al., 2021). Therefore, we aimed to investigate whether this graphical element fits the definition of an analysis-altering element. The following research questions guided this investigation:
Method
Participants and Courses
Participants
Participants were undergraduate students enrolled in an Introduction to Special Education course in the fall semester of 2020. Participants were recruited from three sections of the course across two universities in the southern United States. Total enrollment across these three classes was 113, and 111 (98.2%) students provided their consent to participate. The data of 17 participants were removed due to incomplete responses, and the complete responses from 94 participants were retained for analysis.
In our sample, a majority of participants were female and identified as White. Participants’ ages ranged from 19 to 55. Students were dispersed across the following education majors: elementary education, secondary education, special education, early childhood education, music education, world languages education, and child development. Ninety-three participants enrolled in the course as a major-area requirement and were pursuing initial teacher certification. One participant enrolled in the course as an elective for their major. See Table 1 for full demographic information separated by pre-service special educators (i.e., special education majors) and other pre-service educators (i.e., all other education majors).
Table 1. Participant Demographics by Education Major.
Courses
The courses provided foundational knowledge in the legislation, policies, and procedures for educating children with exceptionalities. The focus of the courses centered primarily on recipients of special education services, the procedures for identifying children with disabilities, and educator responsibilities for implementing research-based instruction. Topics included evaluation procedures (e.g., RTI) under the IDEA, the role of MTSS in providing an appropriate education to all students, and progress monitoring procedures for effective, instructional decision-making. In work conducted by Schumacher and colleagues (2017), in-service teachers who lacked knowledge on RTI procedures (e.g., progress monitoring and data-based decision-making) required considerable instruction on the procedures prior to providing support implementing the procedures compared with in-service teachers who already had knowledge of RTI. This suggests instruction on RTI procedures for pre-service teachers could facilitate the implementation of RTI in practice.
To achieve this aim, class instructors supported the instruction of progress monitoring and data-based decision-making by using case studies published by the IRIS Center as class activities (Brown et al., 2009a, 2009b). The case studies provided guidance on progress monitoring and data-based decision-making followed by sample graphs and data on which to practice. After instruction and guided practice in class, students were assigned the study’s survey to demonstrate independence in making accurate instructional decisions.
Procedures
Students in each course section took a 96-item, online survey as a course assignment pertaining to progress monitoring. The survey was created on the online survey platform Qualtrics. After obtaining Institutional Review Board (IRB) approval, one of the researchers, who was not an instructor of any participating course section, included a recruitment message and consent form on the first page of the survey. Students who consented for their data to be used in this study were routed to a page to collect demographic information before completing the survey assignment. Students who did not consent were routed directly to the survey assignment without collecting any demographic information. Course instructors did not have access to the Qualtrics survey or responses. The noninstructor researcher collected all responses, de-identified consenting participants’ data, and reported assignment completion to the students’ corresponding instructor for grading.
Survey Instrument
The study’s survey consisted of 48 graphs created from eight different data sets. After viewing each graph, participants were prompted to answer two questions. The first question presented, and the primary data source for this experiment, was Given the student’s current performance, what instructional decision do you feel is needed? Response options included keep intervention intensity, increase intervention intensity, and decrease intervention intensity. Prior to viewing graphs, students were provided definitions for the response options. For decrease intervention intensity, we clarified that this would be removing the Tier 2 intervention and only providing Tier 1 instruction. For keep intervention intensity, we clarified that this would mean continuing the current intervention. For increase intervention intensity, we clarified that this would include introducing a Tier 2 intervention in addition to the Tier 1 instruction.
The second question presented was Given the student’s current performance and assuming no changes in instruction, what is your confidence the student would reach the end-of-year goal that is set? Response options for this item included a Likert-type scale with 0 indicating no confidence and 10 indicating high confidence. Data from this question were not used in our data analytic plan as they did not address our research questions. The full survey is available on Open Science Framework (see Kuntz et al., 2021).
Data sets
We created each of the eight unique data sets from which we generated the progress monitoring graphs. Because of the variety of major-area programs our sample of pre-service teachers were entering, we designed data sets that aligned with the type of outcome data teachers would collect if they were using curriculum-based measurement for vocabulary skills (see Hosp et al., 2016). Vocabulary curriculum-based measures would be a tool all majors (e.g., early childhood, elementary education, secondary education, music education, special education) could use to track student progress, unlike other commonly used measures (e.g., oral reading fluency, math computation). We labeled the y axis as correct academic responses and scaled this from 0 to 20. Most vocabulary curriculum-based measurement approaches have 20 as the ceiling and 0 as the floor; students are tasked with correctly matching as many terms to definitions as they can in a specified time frame. Thus, the data sets we created would mimic the data teachers would be analyzing.
Four data sets depicted student data for both Tier 1 instruction and Tier 2 instruction. For these data sets, we included a solid phase change line to indicate which data corresponded with Tier 1 instruction and which data corresponded with Tier 2 instruction. We included eight data points per condition (i.e., 16 total). The other four data sets depicted data from one tier of instruction—either Tier 1 or Tier 2. Each data set included eight data points.
We created four data sets to show a student was not making adequate progress toward the end-of-year goal. The rate of improvement was not steep enough to meet the end-of-year goal, and most of the data points were below the aimline. The correct response for students to select would be increase intervention intensity. We created three data sets to show a student was making adequate progress toward the end-of-year goal. The rate of improvement was steep enough to meet the end-of-year goal; however, the student would not meet the end-of-year goal until the end of the school year. A majority of the data points were closely clustered around the aimline. For example, given the slope of student performance, it would be expected for the student to reach the annual goal at Week 25 or 26 (i.e., end of year on graphs we used). The correct response for students to select would be keep intervention intensity. Finally, we created one data set to show a student was making accelerated progress toward the end-of-year goal. The rate of improvement was steep enough to meet the end-of-year goal well before the end of year (e.g., Week 11 or 12). All of the data points were well above the aimline. The correct response for students to select would be decrease intervention intensity.
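One way to make the phrase "given the slope of student performance" concrete is to fit a trend line and project the week at which it crosses the goal. The sketch below uses ordinary least squares for the trend; this is an assumption for illustration, not a description of how the data sets were built, and the scores and goal of 14 are hypothetical.

```python
def ols_trend(weeks, scores):
    """Ordinary least-squares slope and intercept of scores on weeks."""
    n = len(weeks)
    mean_w = sum(weeks) / n
    mean_s = sum(scores) / n
    sxx = sum((w - mean_w) ** 2 for w in weeks)
    sxy = sum((w - mean_w) * (s - mean_s) for w, s in zip(weeks, scores))
    slope = sxy / sxx
    return slope, mean_s - slope * mean_w


def projected_goal_week(weeks, scores, goal):
    """Week at which the fitted trend reaches the goal, or None if it never does."""
    slope, intercept = ols_trend(weeks, scores)
    if slope <= 0:
        return None  # a flat or falling trend never reaches the goal
    return (goal - intercept) / slope


# Hypothetical eight-week data set with an end-of-year goal of 14.
weeks = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [4, 4, 5, 5, 6, 6, 7, 7]
print(projected_goal_week(weeks, scores, goal=14))
```

A projected week near 26 would correspond to the adequate-progress pattern, a much earlier week to the accelerated pattern, and a week beyond 26 (or no crossing at all) to the inadequate-progress pattern.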
Graphs
We created time-series graphs for all variations of each data set. We held the scale of the x and y axes constant. The x axes were scaled from 1 to 26, labeled as weeks, and numbered as 1, 3, 5, and so on. The y axes were scaled from 0 to 20, labeled as correct academic responses, and numbered as 0, 2, 4, and so on. Tick marks on both the x and y axes were presented outside the graph space for each major value. Data points were depicted as black, solid squares in a 6-point font. In graphs with phase change lines (i.e., Tier 1 to Tier 2), a solid, vertical line was included between the data for each phase. The x axes and y axes were black with a 1-point line thickness. Lines connecting data points were black with a 1.5-point thickness. When the aimline was present, we displayed it in red with a 0.5-point line thickness connecting the Week 1 data point to the end-of-year goal at Week 26.
We manipulated the graphs across the two hypothesized, analysis-altering elements (i.e., the DPPXYR and the presence of an aimline). For each data set, we included three manipulations of the DPPXYR (i.e., 0.05, 0.10, and 0.15), and each manipulation was graphed with and without an aimline (i.e., present, not present). In sum, we created six graphs for each of the eight data sets (N = 48).
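The factorial structure of the stimuli can be enumerated directly. The sketch below also converts each DPPXYR level into an implied x:y axis-length ratio, under two assumptions that are ours rather than the source's: that the DPPXYR equals the x:y length ratio divided by the number of plottable data points, and that the 26-week x axis allows 26 plottable points.

```python
from itertools import product

DPPXYR_LEVELS = (0.05, 0.10, 0.15)
AIMLINE_LEVELS = (False, True)
N_DATA_SETS = 8
PLOTTABLE_POINTS = 26  # assumption: one possible data point per week, Weeks 1-26

# Every combination of data set, DPPXYR level, and aimline presence.
conditions = list(
    product(range(1, N_DATA_SETS + 1), DPPXYR_LEVELS, AIMLINE_LEVELS)
)
print(len(conditions))


def implied_axis_ratio(dppxyr, n_points=PLOTTABLE_POINTS):
    """x:y length ratio implied by a DPPXYR value under the assumed formula."""
    return dppxyr * n_points


for level in DPPXYR_LEVELS:
    print(f"DPPXYR {level:.2f}: x axis about {implied_axis_ratio(level):.2f} "
          f"times the y-axis length")
```

If the assumed formula holds, the three levels correspond to graphs whose x axis is roughly 1.3, 2.6, and 3.9 times as long as the y axis.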
Data Analysis
As our independent variables, we analyzed the presence of an aimline and each DPPXYR to determine the accuracy of participants’ instructional decisions. To answer our first research question, we calculated descriptive data on the percentage of participants making an accurate intervention decision for each graph. To answer our second research question, we also separated these descriptive data into two groups, (a) pre-service special educators (n = 16) and (b) all other pre-service educators (n = 78), to observe any differences. To answer our remaining research questions, we ran statistical tests to determine whether the presence of the aimline or the DPPXYR manipulations had an effect on correct responses and if an interaction was present. Due to the small number of pre-service special educators, we ran statistical analyses on the entire sample.
Overview of statistical analyses
The focus of our analysis was on testing whether presence (or absence) of an aimline and the DPPXYR individually or interactively predicted the probability of a student making a correct decision. The aimline variable was coded according to whether an aimline was presented in a particular graph: 0 = aimline not present or 1 = aimline present. The DPPXYR variable was coded according to the three manipulations: 1 = 0.05, 2 = 0.10, and 3 = 0.15. For the regression analyses described below, this DPPXYR variable was re-coded into two dummy variables in which Category 3 (i.e., 0.15) was treated as the reference category. Therefore, the regression slopes for the dummy variables are interpreted as a difference in predicted logits between the Category 1 (i.e., 0.05) condition or the Category 2 (i.e., 0.10) condition and the Category 3 condition. The 48 graphs were presented to participants in random order to control for order effects. Participants’ decisions were coded as either 0 = incorrect decision or 1 = correct decision.
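The coding scheme described above can be illustrated as follows. This is a sketch only; the function and variable names are hypothetical and do not come from the study's analysis files.

```python
# Category codes for the three DPPXYR manipulations, as described in the text.
DPPXYR_CATEGORY = {0.05: 1, 0.10: 2, 0.15: 3}


def code_graph(aimline_present, dppxyr):
    """Return the predictor codes for one graph.

    Category 3 (DPPXYR = 0.15) is the reference category, so it is
    represented by zeros on both dummy variables.
    """
    category = DPPXYR_CATEGORY[dppxyr]
    return {
        "aimline": 1 if aimline_present else 0,
        "dppxyr_005": 1 if category == 1 else 0,
        "dppxyr_010": 1 if category == 2 else 0,
    }


def code_decision(response, correct_response):
    """1 = correct decision, 0 = incorrect decision."""
    return 1 if response == correct_response else 0


print(code_graph(aimline_present=True, dppxyr=0.05))
print(code_graph(aimline_present=False, dppxyr=0.15))
```

With this coding, the slope on each dummy variable is the difference in predicted logits between that DPPXYR condition and the 0.15 condition.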
Because responses on the outcome variable (i.e., decision) are nested within person, which could create nonindependence among responses, we chose to analyze our data using a multilevel mixed-effects approach. The binary nature of the outcome variable precluded a standard linear modeling approach. Therefore, we analyzed our data using mixed-effects logistic regression. We performed two separate analyses, with aimline and the DPPXYR dummy variables included as Level 1 predictors (there were no Level 2 predictors in our model). Our first analysis involved modeling the aimline and the DPPXYR variables as independent predictors of the probability of a student making a correct judgment. Next, we re-specified the model to include the interaction between aimline and DPPXYR to determine whether there was any evidence of moderation effects (e.g., DPPXYR as a moderator of the effect of aimline or vice versa). In both models, the between-person intercepts were allowed to randomly vary. All analyses were performed using “melogit” in Stata 17.
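The two models can be written out as follows. The notation is ours and is offered as a sketch of a standard random-intercept logistic specification consistent with the description above; it is not reproduced from the authors' output. Here i indexes graphs (responses) and j indexes participants, A is the aimline indicator, and D with superscripts 0.05 and 0.10 denotes the two DPPXYR dummy variables with 0.15 as the reference category.

```latex
% Model 1: main effects with a random intercept for each participant
\[
\operatorname{logit}\,\Pr(Y_{ij} = 1)
  = \beta_0 + \beta_1 A_{ij}
  + \beta_2 D^{(0.05)}_{ij} + \beta_3 D^{(0.10)}_{ij}
  + u_{0j},
\qquad u_{0j} \sim \mathcal{N}(0, \sigma_u^2)
\]

% Model 2: adds the aimline-by-DPPXYR interaction terms
\[
\operatorname{logit}\,\Pr(Y_{ij} = 1)
  = \beta_0 + \beta_1 A_{ij}
  + \beta_2 D^{(0.05)}_{ij} + \beta_3 D^{(0.10)}_{ij}
  + \beta_4 A_{ij} D^{(0.05)}_{ij} + \beta_5 A_{ij} D^{(0.10)}_{ij}
  + u_{0j}
\]
```

In this form, the interaction coefficients in the second model carry the test of moderation, and the variance of the random intercept captures between-person differences in overall accuracy.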
Results
Descriptive Results
For each graph in the survey, participants responded to the question, “Given the student’s current performance, what instructional decision do you feel is needed?” We calculated the percentage of participants who responded to each graph as keep intervention intensity, increase intervention intensity, and decrease intervention intensity. Table 2 displays the percentage of responses for each graph across pre-service special educators and other pre-service educators.
Descriptive Results of Participant Responses for Each Graph by Education Major.
Note. DPPXYR = data points per x- to y-axis ratio. Aimline codes are 0 = no aimline and 1 = aimline. Correct responses are in bold.
From the entire sample, we recorded the number of participants who identified the correct intervention decision (i.e., increase intensity, maintain intensity, decrease intensity). Overall, participants responded correctly for 65.1% (SD = 8.2%, range = 14.6%–85.4%) of responses across the 48 graphs. Participants’ correct responses on each graph ranged from 7.4% accuracy to 91.5% accuracy. For the 24 graphs displaying data to increase intervention intensity, they responded correctly in 69.4% (SD = 20.3%) of opportunities. For the 18 graphs displaying data to maintain intervention intensity, participants responded correctly in 71.2% (SD = 12.1%) of opportunities. For the six graphs displaying data to decrease intervention intensity, they responded correctly in 29.8% (SD = 17.8%) of opportunities.
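As a quick arithmetic check, the overall accuracy can be recovered from the three decision-type means reported above. The sketch below assumes that each subgroup mean is an average over graphs, so the overall figure is a mean weighted by the number of graphs. It also assumes that guessing among three response options gives a chance rate of one in three. The variable names are illustrative.

```python
# (number of graphs, mean percent correct) per correct-decision type,
# taken from the full-sample figures reported in the text.
groups = {
    "increase": (24, 69.4),
    "maintain": (18, 71.2),
    "decrease": (6, 29.8),
}


def weighted_mean(groups):
    """Average the group means, weighting each by its number of graphs."""
    total_graphs = sum(n for n, _ in groups.values())
    return sum(n * pct for n, pct in groups.values()) / total_graphs


overall = weighted_mean(groups)
chance = 100 / 3  # assumed chance rate with three response options

print(round(overall, 1))  # about 65.1, matching the reported overall accuracy
for name, (_, pct) in groups.items():
    print(name, "above chance" if pct > chance else "at or below chance")
```

Under these assumptions, only the graphs calling for a decrease in intensity fall below the chance rate.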
Pre-service special educators
Pre-service special educators responded correctly for 68.1% (SD = 25.8%, range = 6.3%–100.0%) of responses across the 48 graphs. For the 24 graphs displaying data to increase intervention intensity, they responded correctly in 73.5% (SD = 27.9%) of opportunities. For the 18 graphs displaying data to maintain intervention intensity, participants responded correctly in 69.8% (SD = 18.0%) of opportunities. For the six graphs displaying data to decrease intervention intensity, they responded correctly in 29.2% (SD = 18.8%) of opportunities.
Other pre-service educators
In comparison, the other pre-service educators responded correctly for 64.5% (SD = 22.2%, range = 6.4%–91.0%) of responses across the 48 graphs. For the 24 graphs displaying data to increase intervention intensity, they responded correctly in 67.9% (SD = 21.4%) of opportunities. For the 18 graphs displaying data to maintain intervention intensity, participants responded correctly in 71.5% (SD = 12.1%) of opportunities. For the six graphs displaying data to decrease intervention intensity, they responded correctly in 29.9% (SD = 19.5%) of opportunities.
Graph-altering variables
We also reviewed correct responses across our graph-altering variables: presence of an aimline, absence of an aimline, DPPXYR of 0.05, DPPXYR of 0.10, and DPPXYR of 0.15 (see Table 2). For graphs with the presence of an aimline, participants responded correctly in 67.2% (SD = 16.4%) of opportunities. For graphs with the absence of an aimline, participants responded correctly in 63.1% (SD = 27.4%) of opportunities. Participants selected the correct response in 66.2% (SD = 18.3%) of opportunities for the graphs with a DPPXYR of 0.05, 64.2% (SD = 23.5%) of opportunities for the graphs with a DPPXYR of 0.10, and 65.0% (SD = 22.8%) of opportunities for the graphs with a DPPXYR of 0.15.
Multilevel Model Results
As described above, we began by regressing the judgment dependent variable onto the aimline and DPPXYR variables. See Table 3 for the results of our statistical analyses. Overall, the model fit the data reasonably well, Wald χ²(3) = 11.01, p = .0117, indicating that the model represented a significant improvement in fit relative to an intercept-only model. The aimline variable emerged as a positive and significant predictor (b = 0.205, SE = 0.066, p = .002) of the probability of a participant making the correct judgment from a graph. The odds ratio (OR) for aimline was 1.228, indicating that when a graph was shown with an aimline, the odds of a correct judgment were 100% × (1.2278 − 1) = 22.78% greater than the odds when the graph did not include an aimline.
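The relation between the coefficient and the odds ratio can be reproduced with a minimal sketch. The baseline probability of 0.631 is an assumption made purely for illustration: it borrows the descriptive no-aimline accuracy reported earlier, whereas the model coefficient is conditional on the participant-level random intercept. The function names are hypothetical.

```python
import math


def odds_ratio(b):
    """Convert a logit-scale regression coefficient to an odds ratio."""
    return math.exp(b)


def shift_probability(baseline_p, ratio):
    """Apply an odds ratio to a baseline probability and return the new probability."""
    baseline_odds = baseline_p / (1.0 - baseline_p)
    new_odds = baseline_odds * ratio
    return new_odds / (1.0 + new_odds)


b_aimline = 0.205  # rounded coefficient reported in the text
or_aimline = odds_ratio(b_aimline)
pct_change_in_odds = (or_aimline - 1.0) * 100.0

baseline = 0.631  # illustrative assumption, not a model estimate
with_aimline = shift_probability(baseline, or_aimline)

print(round(or_aimline, 3))          # close to the reported OR of 1.228
print(round(pct_change_in_odds, 1))  # percent increase in the odds
print(round(with_aimline, 3))        # implied probability under the assumed baseline
```

Using the rounded coefficient gives an odds ratio slightly below the reported 1.2278, presumably because the reported value was computed from the unrounded estimate. Under the assumed baseline, the increase of roughly 23% in the odds corresponds to a gain of about 4 to 5 percentage points in the probability of a correct response. The odds ratio therefore should not be read as a 23% gain in accuracy.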
Mixed-Model Results.
Note. b = regression coefficient; SE = standard error; OR = odds ratio; AIC = Akaike information criterion; BIC = Bayesian information criterion; dum = dummy variable; var = variance; CI = confidence interval.
Statistical tests were significant at p < .01.
Statistical tests were significant at p < .001.
Neither regression slope associated with the DPPXYR dummy variables was statistically significant (p = .465 and .657, respectively). The likelihood ratio (LR) test of our mixed logistic regression model versus a standard logistic regression was statistically significant, LR χ²(1) = 307.27, p < .001, indicating significant between-participant variation in the probability of correct responses to the graphs. This assessment was further supported by the fairly large intraclass correlation coefficient (ICC) for the model (ICC = 0.134; see Sommet & Morselli, 2017).
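One common way to compute an ICC for a random-intercept logistic model is the latent-variable formulation, in which the Level 1 residual variance is fixed at π²/3 (about 3.29). The sketch below assumes that formulation. The text does not state which formula Stata was asked to report, and the random-intercept variance shown here is back-calculated from the reported ICC rather than taken from the article.

```python
import math

# Variance of the standard logistic distribution, used as the Level 1
# residual variance in the latent-variable ICC formulation (assumed here).
LEVEL1_VARIANCE = math.pi ** 2 / 3


def icc_from_variance(intercept_variance):
    """Share of latent-scale variance attributable to between-person differences."""
    return intercept_variance / (intercept_variance + LEVEL1_VARIANCE)


def variance_from_icc(icc):
    """Invert the ICC formula to recover the implied random-intercept variance."""
    return icc * LEVEL1_VARIANCE / (1.0 - icc)


implied_variance = variance_from_icc(0.134)  # back-calculated, not reported in the text
print(round(implied_variance, 2))
print(round(icc_from_variance(implied_variance), 3))
```

Under this assumption, an ICC of 0.134 means that roughly 13% of the latent-scale variance in correct responding lies between participants.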
Next, we re-specified our model to include the previous independent variables as well as the interaction between them. Although the model fit the data reasonably well, Wald χ²(5) = 11.69, p = .039, relative to an intercept-only model, it did not fit significantly better than the previous model, LR χ²(2) = 0.75, p = .689. In other words, the addition of the interaction terms in Model 2 did not result in a significant improvement in fit from Model 1. Moreover, none of the predictors in Model 2 were statistically significant (all ps ≥ .123). These results indicated that Model 1 is the preferred model to explain the relationships between aimline and DPPXYR and the probability of correct judgment.
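The p values attached to the chi-square statistics above can be approximately reproduced with the standard library alone. The sketch below implements the upper-tail chi-square probability for integer degrees of freedom using a standard recurrence. It is an illustrative check, not the routine Stata uses, and the function name `chi2_sf` is hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x < 0 or df < 1 or int(df) != df:
        raise ValueError("x must be non-negative and df a positive integer")
    half = x / 2.0
    if df % 2 == 0:
        tail, k = 0.0, 0  # even df: build up from the trivial df = 0 case
    else:
        tail, k = math.erfc(math.sqrt(half)), 1  # odd df: start from df = 1
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        tail += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail


p_lr_interaction = chi2_sf(0.75, 2)  # Model 2 versus Model 1
p_wald_model1 = chi2_sf(11.01, 3)    # Model 1 versus intercept-only
p_wald_model2 = chi2_sf(11.69, 5)    # Model 2 versus intercept-only

print(round(p_lr_interaction, 3))
print(round(p_wald_model1, 4))
print(round(p_wald_model2, 3))
```

The values agree with the reported p values to within rounding. The likelihood ratio test comes out near .687 rather than .689, which is consistent with the test statistic having been rounded to two decimals before it was reported.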
Discussion
Emerging literature has questioned and examined the influence graph construction has on visual analysis of single-case design data. In the realm of progress monitoring, investigations have documented the quality of pre-service and in-service, special education, and general education teachers’ interpretations of progress monitoring data. Effective teachers use graphs to make decisions that can have a profound impact on students’ future instruction and academic placement. Yet, little research has examined how well teachers are able to make accurate data-based decisions based on different variations of graphical elements. We examined the effects of two graphing elements (i.e., aimline and DPPXYR) on pre-service teachers’ accuracy in making correct intervention decisions. Our results highlight important findings that can guide future research and practice.
First, the presence of an aimline on the graphs we examined was the only graphing element that had a statistically significant impact on correct responses. This provides evidence that an aimline should be considered an analysis-altering element when constructing progress monitoring graphs for use in practice. The OR indicated that the odds of a correct response were more than 20% greater when the aimline was present than when it was absent. It is promising to see many commonly used computer-based systems include an aimline (Dart et al., 2021), which likely will increase the accuracy in decision-making. Neither the DPPXYR nor the interaction of the DPPXYR with the aimline had a statistically significant effect on correct responses. This may indicate that pre-service teachers who do not have explicit training on visual analysis rely on visual guides as opposed to data patterns when making instructional decisions. Another possibility is that our measurement system (i.e., a trichotomous variable) was not sensitive enough to detect whether the DPPXYR impacts graph interpretations; Radley and colleagues (2018) used a Likert-type scale from 0 to 10 on magnitude of treatment effect.
Second, the accuracy of participant responses (M = 65.1%) indicates that pre-service teachers made correct instructional decisions more often than chance (i.e., one in three). Most notably, however, the majority of errors occurred in cases where the data indicated that the student was meeting expectations and the intervention intensity could be reduced (M = 29.8% accuracy). This may indicate that pre-service teachers are able to analyze data patterns but are less likely to reduce intervention intensity once it has been established for a student. This may be due in part to inexperience with the contextual information and possible resource strain involved in implementing these interventions for students who no longer need them.
Interestingly, this accuracy rate was much lower than that found by Lane and colleagues (2021; i.e., ~97%). This could be due to a multitude of factors. First, the participants in Lane et al. may have differed substantially from the current sample on unknown factors (e.g., additional exposure to visual analysis). Second, the data series used in the two experiments differed substantially, and differences in difficulty may have affected accuracy. For example, in this study, we investigated growth in a learned academic skill, which possibly demonstrates more gradual growth over time, whereas Lane et al. created graphs depicting reversible behaviors, which possibly demonstrate greater apparent level changes in the data. Third, the response options differed. Lane et al. asked a dichotomous question with a “yes or no” response, whereas we asked students to make an instructional decision, providing a trichotomous response (i.e., “increase intervention intensity, keep intervention intensity, reduce intervention intensity”).
Different data patterns impact the accuracy rates of participants. This was a phenomenon found in the current experiment as well as those conducted by Lane and colleagues (2021) and Wilbert and colleagues (2021). Wilbert and colleagues identified trend as an element impacting accuracy, and Lane and colleagues identified the “face validity” of a data pattern (i.e., the likelihood of a practitioner encountering it) as impacting accuracy rates. Further research is needed to establish typical data patterns observed to help train pre-service professionals in analyzing these specific types of data patterns. Furthermore, in practice, continued research is needed regarding the effects of ongoing intervention on student academic and nonacademic factors when they have met or exceeded the academic goal.
Third, despite small differences among pre-service educators, our data suggest the importance of interrater reliability among school-based teams in making instructional decisions. Participants made correct responses on 65.1% of graphs, which would indicate an inaccurate decision on roughly one third of graphs. Participants’ accuracy varied widely across graphs, however, ranging from 7.4% (no aimline, a DPPXYR of 0.15, and a correct decision to reduce intensity) to 91.5% (aimline, a DPPXYR of 0.15, and a correct decision to increase intensity). This highlights an important point when collaborating on student support or IEP teams to make instructional decisions: a team may have members reaching different conclusions based on the data series presented. Building our knowledge of not only an individual’s evaluation of time-series graphs but also how to enhance a data team’s communication about a data series to increase accurate decision-making is warranted. The think aloud protocol used by Espin et al. (2017), van den Bosch et al. (2017), and Wagner et al. (2017) may be the tool to support this process, although none of these experiments evaluated a collaborative group process in data evaluation. Group dynamics may affect this process and ultimately the data-based decision that is recommended.
Limitations and Future Research
The limitations of this study indicate directions for future research. First, our measurement of accuracy (i.e., a dichotomous correct or incorrect score) may not have been sensitive enough to detect differences in pre-service teachers’ perception of student responsiveness when manipulating aimline and the DPPXYR. Espin et al. (2017) used a “think aloud” protocol to capture more of the nuance of teacher perceptions when visually analyzing progress monitoring data presented on a time-series graph; however, this project answered the more practical question pertaining to what teachers are going to do with the data. In future research, it would be helpful to accompany an item on the accuracy of the final decision with more sensitive measures, such as open-ended “think aloud” responses for each item and rating scales to assess participants’ confidence in their decisions.
Second, the participants were undergraduate education students with limited classroom instruction or practical experience with progress monitoring data. These participants also analyzed data from a fictional student. Participants with more RTI training and/or experience who analyze fictional and real student data may yield different results. As seen by Wagner et al. (2017), however, undergraduate student performance did not significantly improve across their programs. Future research could focus on in-service teachers in a variety of school-based roles, training, and years of experience to identify real-world implications of graph manipulation on visual analysis. Furthermore, instruction on RTI and the evaluation of progress monitoring data in teacher preparation programs could be evaluated. Binks-Cantrell and colleagues (2012) worked to validate the Peter Effect in reading (i.e., “one cannot be expected to give what one does not possess”). This can be extended to RTI models, and specifically evaluating progress monitoring data, in that the abilities of pre-service teachers are limited by the abilities of their instructors.
Third, the simulated data, resulting graphs, and graph-altering elements used in this study were not vetted by other experts. We generated data using slopes to identify correct intervention decisions but did not confirm these decisions through the visual analysis of experts in the field. Future research could assess validity of the data with an expert panel and evaluate graphing variables through a component analysis of the features of common software programs (e.g., AIMSweb).
Fourth, our sample was too small to identify significant differences between pre-service special educators and other pre-service educators. Descriptively, more pre-service special educators did respond accurately on many of the graphs compared with other pre-service educators; however, there are not enough data to draw a meaningful conclusion. Future research could assess the role specialized training may have on evaluating progress monitoring data.
Fifth, simulated data do not capture real-life student factors, which may allow teachers to make more accurate decisions. These are factors that an automated software system would not be able to use in providing its recommendations, such as student attendance, medication changes, presence of a disability, and delivery and pacing of instruction. Future studies could compare software recommendations with teacher decisions using their own students’ data.
Implications for Practice
Findings from this study have several implications for practice. First, similar to the first limitation, practitioners need to take a close look at data that are centered around the aimline, or “close call,” compared with data noticeably below or above the aimline. Second, we provided support for classifying an aimline as an analysis-altering graphical element, which supports current practices used in RTI software programs. Practitioners can ensure these programs are set up to use factors leading to more accurate decisions. Third, with increased accountability for general educators to identify students at risk, this work suggests that visual analysis training in teacher preparation programs may assist with that function. Fourth, the graphing elements need to be simple enough for teachers to understand, set up in the software, and interpret accurately. Specifically, the aimline is one simple graph manipulation that could be used to improve teacher interpretations.
Conclusion
In this study, we sought to identify graphing elements that reliably predict accurate intervention decisions based on simulated progress monitoring data. It is important that teachers are able to make correct decisions regarding student progress to ensure equitable educational opportunities are provided to all students. We assessed 94 pre-service teachers on their ability to make accurate intervention decisions. We presented our participants with 48 graphs that varied by the presence of an aimline and the manipulation of the DPPXYR, and we analyzed the correctness of their instructional decisions. We found that the presence of an aimline was the only element statistically significant in improving pre-service teachers’ accuracy. This work can inform how data are presented to practitioners to increase the likelihood of correct decision-making and how instructors in teacher preparation programs may impact this work.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
