Abstract
Nonoverlap is widely used as a statistical summary of data; however, these analyses rarely correct unwanted positive baseline trend. This article presents and validates the graph rotation for overlap and trend (GROT) technique, a hand calculation method for controlling positive baseline trend within an analysis of data nonoverlap. GROT is validated for controlling positive baseline trend and validated socially by visual analysis agreement. The flexibility and generality of GROT are demonstrated by using it with two alternative slope calculations: White and Haring’s bi-split and Tukey’s tri-split. In addition, GROT is presented as a technique that can be adapted for any nonoverlap effect size method; examples here include the original percent of nonoverlapping data and newer nonoverlap of all pairs. Caution is urged to control baseline trend only when it is pronounced and reliable. GROT moves the field forward as a robust technique suitable for both visual and statistical analysis.
Judging data nonoverlap between phases is a popular technique with single-case researchers for its simplicity and close integration with the visual analysis of data graphs (Parsonson & Baer, 1978). Nonoverlap has been used for decades, and for more than 20 years, percent of nonoverlapping data (PND; Scruggs, Mastropieri, & Casto, 1987) served as a handy numeric summary of nonoverlap. A form of data nonoverlap was first used even earlier, when Owen White and Norris Haring (1980) introduced Phase B data overlap from an extended phase “celeration” or “split middle” line. The extended celeration line (ECL) for measuring growth in 6-cycle logarithmic graph paper was a key tool in precision teaching, as developed by Ogden Lindsley and his students, initially at Universities of Kansas and Washington (Calkin, 2005; Lindsley, 1991; White, 1974, 1986).
Data nonoverlap has increased legitimacy within the broader research community as it has become recognized that a complete test of nonoverlap (weighing all data points equally) has much in common with respected nonparametric “dominance statistics,” Kendall’s Tau, Mann–Whitney U test, Somers’ d, and area under the curve (AUC; Acion, Peterson, Temple, & Arndt, 2006; Cliff, 1993; Delaney & Vargha, 2002; Grissom & Kim, 2005). With minor calculations, output from these dominance statistics can be converted to PND (Huberty & Lowman, 2000). Thus, complete nonoverlap is now acknowledged to be a robust, distribution-free technique with good statistical power (D’Agostino, Campbell, & Greenhouse, 2006). There are currently more than nine nonoverlap techniques from which to choose (for comparisons, see Parker, Vannest, & Davis, 2011).
Although nonoverlap is easily accessible and practitioner friendly, its major shortcoming is the inability to consider baseline trend. Visual analysts warn that positive baseline trend weakens the inference that change was due to the treatment, which is termed conclusion validity (Kane, 2001; Kazdin, 2003; Orme, 1991). Positive baseline trend raises a competing hypothesis that progress could be due in part to preexisting improvement momentum. Because baseline trend is a serious challenge to conclusion validity, several parametric statistical methods have been designed to control for it: Crosbie’s ITSACORR model (Crosbie, 1993, 1995); Last Treatment Day prediction technique of White, Rusch, Kazdin, and Hartmann (1989); mean-shift and mean-plus-trend family of models (Center, Skiba, & Casey, 1985–1986); and mean-shift and mean-plus-trend models (Allison & Gorman, 1993; Faith, Allison, & Gorman, 1996).
With one notable exception, nonoverlap techniques have not attempted to control trend. The exception is ECL introduced by White and Haring (1980). This pencil-and-ruler technique has proved itself useful since its inception nearly 40 years ago. ECL begins with hand-fitting a bi-split median line (Koenig, 1972) to Phase A data, and then extending it through Phase B. The nonoverlap calculation is the percentage of Phase B data points above that extended line, compared with an expected 50%. There are three limitations for ECL. First is the low statistical power provided by its binomial test. Second, the ECL is historically tied to Koenig’s bi-split median trend line, which is presently superseded by the Tukey tri-split line (Tukey, 1977). ECL is not inherently restricted to using the bi-split line, though, and could adapt to other trend lines (though apparently adaptations have never been published). Third, ECL may not be viewed as a true nonoverlap method, as it does not directly contrast Phases A and B data points, but rather Phase B data points overlapping an extended Phase A trend line.
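The ECL calculation described above can be sketched in a few lines. This is only an illustrative sketch: the function names, the session numbering (1, 2, ... across both phases), the trend line, and the data are all assumptions made for the example, and the one-sided binomial tail is one plausible reading of the comparison against an expected 50%.

```python
from math import comb

def ecl_count_above(phase_a_len, phase_b, slope, intercept):
    """Count Phase B points above a Phase A trend line extended through Phase B.

    Assumption: sessions are numbered 1, 2, ... across both phases.
    """
    above = 0
    for offset, score in enumerate(phase_b, start=1):
        session = phase_a_len + offset
        if score > slope * session + intercept:
            above += 1
    return above

def binomial_upper_tail(successes, n, p=0.5):
    """One-sided probability of at least `successes` out of n under chance level p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(successes, n + 1))

# Hypothetical example: 5 baseline sessions and a hand-fit line y = 0.5 * session + 2
phase_b = [6.0, 5.0, 7.5, 8.0, 6.4, 9.0]
count_above = ecl_count_above(5, phase_b, slope=0.5, intercept=2.0)
share_above = count_above / len(phase_b)
p_value = binomial_upper_tail(count_above, len(phase_b))
```

With so few Phase B points, even a clear majority above the line gives a large p value, which illustrates the low power noted for the binomial test.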
Despite those limitations, the ECL method of merging data nonoverlap and trend has not been equaled in nearly four decades. In fact, the recently published percentage of data exceeding a median trend (PEM-T; Wolery, Busick, Reichow, & Barton, 2010) appears identical to ECL. Ma’s percent of data exceeding the median (PEM; Ma, 2006) appears to be a simpler version of ECL for baseline data that have no trend (Parker & Hagan-Burke, 2007). However, ECL remains superior to PEM by offering a viable (albeit low power) analysis summary.
The field of single-case research (SCR) nonoverlap methods presently has nine contenders. They have been compared in detail elsewhere (Parker et al., 2011), so here a brief overview will suffice. The nine are as follows: (a) ECL (White & Haring, 1980), (b) PND (Scruggs et al., 1987), (c) PEM (Ma, 2006), (d) percent of all nonoverlapping data (PAND; Parker, Hagan-Burke, & Vannest, 2007), (e) Pearson’s phi (φ; Parker et al., 2007), (f) improvement rate difference (IRD; Parker, Vannest, & Brown, 2009), (g) nonoverlap of all pairs (NAP; Parker & Vannest, 2009), (h) Kendall’s Tau for nonoverlap between groups (TAUnovlap; Parker, Vannest, Davis, & Sauber, 2011), and (i) Combined Mann–Whitney U test nonoverlap with Tau baseline trend control (Tau-U; Parker, Vannest, et al., 2011). This list does not include percentage reduction data (PRD; O’Brien & Repp, 1990), which is a parametric and mean-based method, rather than nonoverlap. It also does not include the percentage of zero data (PZD; Scotti, Evans, Meyer, & Walker, 1991), which is applicable to only some clients and goals. The limitation of failing to consider Phase A trend applies to all but two of the nine listed, exceptions being the venerable ECL and the new Tau-U. The limitation of offering no p values applies only to PND. The weakness of very low statistical power (especially undesirable with small samples) applies to ECL, PND (for which statistical power is unknown), and PEM. The other six nonoverlap techniques represent improvement due to greater power, less likelihood of chance-level results, and better precision for data-based decisions (Parker et al., 2011).
The present article introduces a new visual-graphic method with the same aims as ECL, with some distinct advantages over its predecessor. The new method, graph rotation for overlap and trend (GROT), allows users to control positive baseline trend and calculate a nonoverlap-based effect size on the adjusted data set. GROT is a flexible technique that can be applied using any trend line slope estimate or nonoverlap effect size. Unlike ECL, GROT is a true nonoverlap technique, with more direct interpretability. GROT yields a “PND” summary score, more meaningful than ECL’s interpretation as the “ratio of data points around an extended baseline.” GROT’s second advantage as a true nonoverlap method is that it offers considerably more statistical power than ECL. ECL relies on the low power binomial test or median-based “Sign Test” for proportion of data split by the extended median-based trend line. In contrast, the flexibility available in GROT allows users to apply effect size metrics with higher statistical power. For example, NAP or any other “dominance” nonoverlap test possesses 91% to 94% of the power of a regression test as measured by the “Pitman Efficiency” rating (Hollander & Wolfe, 1999). Pitman Efficiency index for the Sign Test used in the ECL method is closer to 60%, depending on sample size and distribution shape (Hodges & Lehmann, 1956). The third of GROT’s advantages is its broad applicability with any trend and with any nonoverlap index. Thus, it can be adapted to user preferences, and be used with new or future trend estimates and nonoverlap indices as they are developed. Despite the differences between ECL and GROT, they have much in common. Both are graph based, relying on visual analysis and pencil-and-ruler operations on paper. Both are “distribution free” and robust to outliers. Both can be applied to ordinal as well as interval-level scales, and to data which fail to meet parametric distribution assumptions (Wilcox, 2010). 
Therefore, GROT provides three improvements as answers to limitations of ECL, while retaining the strengths which have promoted its longevity.
GROT is first demonstrated here on two data sets, with results technically validated by the well-reputed regression method by Allison and colleagues (Allison & Gorman, 1993; Faith et al., 1996). Next, GROT graphs are subjected to visual judgments to answer two questions. First, “Will results produced from GROT agree with visual judgment of expert but blind raters?” and second, “Will effects be reduced (between Phases A and B) due to baseline trend control?”
The GROT procedure is first summarized briefly to give the reader a general idea of the steps. More detailed procedures follow this summary; they cover some of the options for this flexible method, including two trend calculation options and two nonoverlap options.
First, a trend line is fit to Phase A (any calculation may be used as demonstrated later). Second, it is “dropped down” to the X- and Y-axis intersect (keeping parallel with the original line), and also extended through the end point of Phase B. Third, the graph paper is rotated so that the new dropped trend line becomes parallel with a new horizontal base of the graph. In addition, the line between the two phases (intervention onset line) is redrawn so that it is vertical and now perpendicular to the new horizontal axis. Finally, one can compare Phases A and B data visually and/or statistically. We later present two statistical methods as illustrations and visual judgments as validation. The physically rotated graph has the effect of statistically controlling for Phase A trend. The effect is identical to using semipartial correlation to statistically control for baseline trend, as in the well-reputed technique by Allison, Faith, and colleagues (Allison & Gorman, 1993), but without the calculation.
Two Trend Line Slopes
Because GROT is a general method which works with any trend line slope estimate, this article demonstrates two rank-order slopes: the bi-split or “quarter intersect” slope (Koenig, 1972; White, 1974) and Tukey’s tri-split median-based slope (Tukey, 1977). Other options for trend calculation include the Theil–Sen or “Kendall’s slope” (Sen, 1968; Theil, 1950) and the linear regression slope. We do not provide examples for these but they also work with GROT.
Koenig bi-split, median-based slope
The most popular hand-fit trend line in special education was first introduced to educators by Koenig (1972), modified by White (1974), and popularized by White and Haring (1980) and by Kazdin (1982). Koenig’s bi-split “quarter intersect” method was first widely used in schools within the precision teaching (Pennypacker, Koenig, & Lindsley, 1972). The slope was calculated on a “celeration line” plotted on a “standard celeration chart” (6-cycle logarithmic ruled graph paper) and is distinguished from performance rate (Calkin, 2005). Readers may be interested to note that this bi-split method of calculating celeration or slope was known even earlier outside of education, where it was called the “Brown–Mood slope,” as it was popularized by Brown and Mood (1951) from an even earlier publication by Wald (1940).
The quarter intersect method entails first splitting the data vertically into earlier and later halves, and then marking the intersection of the median X and median Y values for each half. A line is then drawn to connect the two median intersects of the two halves. Optionally, the trend line may be raised or lowered (keeping parallel with the original) so it splits all data points 50% above and 50% below it.
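For readers who prefer to check a hand-fit line numerically, the quarter intersect slope might be sketched as follows. The function name, the session numbering (1 to n), the handling of an odd number of points (the middle point is left out of both halves), and the data are assumptions for illustration, not part of the published procedure.

```python
from statistics import median

def bi_split_slope(scores):
    """Slope of the line through the median intersects of the early and late halves.

    Assumptions: sessions are numbered 1..n; with an odd n the middle
    point is omitted from both halves.
    """
    n = len(scores)
    half = n // 2
    sessions = list(range(1, n + 1))
    rise = median(scores[n - half:]) - median(scores[:half])
    run = median(sessions[n - half:]) - median(sessions[:half])
    return rise / run

# Hypothetical Phase A scores (not taken from Table 1)
baseline = [2.0, 3.0, 2.5, 4.0, 5.0, 4.5]
slope = bi_split_slope(baseline)   # (4.5 - 2.5) / (5 - 2)
```

The optional step of sliding the line up or down changes only its intercept, so the slope computed here is unaffected by it.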
Tukey tri-split median-based slope
The Tukey tri-split line was popularized by Tukey and colleagues from the Exploratory Data Analysis group at Princeton (Hoaglin, Mosteller, & Tukey, 1983; Tukey, 1977). The tri-split line was well known from the 1940s (Bartlett, 1949; Nair & Srivastava, 1942; Wald, 1940) and outside of education is often referred to as “Wald’s trend line” or “Wald’s slope” in deference to its earliest source. Tukey and colleagues, however, did the most to popularize the technique.
The Tukey (1977) tri-split slope begins with dividing the data into three equal parts on the X-axis, for example, data at Sessions 1 to 3, 4 to 6, and 7 to 9 on a 9-point data series: We will refer to this as early, middle, and late, respectively. The trend line is based only on the early and late thirds of the data, and the middle data portion has a limited role, adjusting the trend line up or down (keeping parallel to the original). The intersect of median X and Y values are marked for the early segment and the late data segments, and a trend line is drawn to connect the two intersects. Optionally, the line can be adjusted up or down (keeping parallel with original) so it splits all data, 50% above and below it. Another option is to raise or lower the line so it passes through the median of the middle data segment. The Tukey method is currently used to train teachers in progress monitoring, as a substitute for the bi-split method (Hintze & Stecker, 2006).
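A minimal numeric sketch of the tri-split slope is given below. The function name, the session numbering, and the data are hypothetical. When the series does not divide evenly by three, the sketch assumes one common convention (an extra point goes to the middle segment when one is left over, and to each outer segment when two are left over); other conventions exist.

```python
from statistics import median

def tri_split_slope(scores):
    """Slope of the line through the median intersects of the early and late thirds.

    Assumptions: sessions are numbered 1..n; the outer thirds each hold
    (n + 1) // 3 points, and the middle segment is ignored for the slope.
    """
    n = len(scores)
    outer = (n + 1) // 3
    sessions = list(range(1, n + 1))
    rise = median(scores[n - outer:]) - median(scores[:outer])
    run = median(sessions[n - outer:]) - median(sessions[:outer])
    return rise / run

# Hypothetical 9-point series: Sessions 1 to 3, 4 to 6, and 7 to 9
series = [2.0, 3.0, 2.5, 3.5, 4.0, 3.0, 5.0, 4.5, 5.5]
slope = tri_split_slope(series)   # (5.0 - 2.5) / (8 - 2)
```

As with the bi-split line, the optional up or down adjustments move the intercept only, so they are left out of the sketch.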
The most extensive evaluation of median-based trend techniques was a Monte Carlo study by Johnstone and Velleman (1985). The Tukey tri-split method consistently outperformed the bi-split method on power and efficiency (Pitman coefficient). In schools research, Parker, Stein, and Tindal (1992) predicted student oral reading fluency scores with bi-split, tri-split, and linear regression lines. The Tukey tri-split line was closer to the regression line and surpassed even the linear regression line in predicting actual future performance.
Another promising trend line, not included in this article, is the “Theil–Sen slope” (Sen, 1968; Theil, 1950), also known as “Kendall’s robust line-fit method” (Sokal & Rohlf, 1995). Theil–Sen is the median slope of many “mini-slopes” created from all pairwise data comparisons made in time order (early to late) in a time series. It is available in an increasing number of free applications: the free student MYSTAT software (SYSTAT, 2008), the freely downloadable WinPEPI software for health care and medical researchers (Abramson, 2010), and the free software KTRLine Version 1.0 (Granato, 2006) from the U.S. Geological Survey Office. Although not yet used in schools research, it is mentioned here because of its future promise. In the Johnstone and Velleman (1985) Monte Carlo study, Theil–Sen consistently outperformed both bi-split and tri-split hand-fit lines in power and efficiency. Although many trend estimations are available, we use two best known and most accessible methods as exemplars for how to calculate trend line as the first step in GROT.
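Although Theil–Sen is not used in this article, its definition as the median of all pairwise “mini-slopes” is short enough to sketch. The function name and data are hypothetical, and sessions are assumed to be numbered 1 to n.

```python
from itertools import combinations
from statistics import median

def theil_sen_slope(scores):
    """Median of the slopes between every pair of points, taken in time order.

    Assumption: sessions are numbered 1..n, so no two points share an X value.
    """
    points = list(enumerate(scores, start=1))
    mini_slopes = [
        (later_y - earlier_y) / (later_x - earlier_x)
        for (earlier_x, earlier_y), (later_x, later_y) in combinations(points, 2)
    ]
    return median(mini_slopes)

# Hypothetical Phase A scores: 6 points give 15 pairwise mini-slopes
baseline = [2.0, 3.0, 2.5, 4.0, 5.0, 4.5]
slope = theil_sen_slope(baseline)
```

Because the number of pairs grows quickly with series length, this slope is less practical by hand than the bi-split or tri-split lines, which is one reason software is usually suggested for it.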
Two Nonoverlap Indices
Just as two commonly used and likely familiar trend methods were selected as examples, we provide illustrations for the two most commonly known nonoverlap methods: PND (Scruggs et al., 1987) and NAP (Parker & Vannest, 2009). Just as GROT can include any method for calculating Phase A trend, it is applicable to any type of AB nonoverlap index. Simple AB contrasts are chosen for the current demonstration because this contrast is the most basic SCR design to analyze. Part of the appeal of SCR for intervention researchers is the flexibility in constructing the design. Recent guidelines for evaluating SCR design quality (Kratochwill et al., 2010) call for a minimum of three demonstrations of experimental control. For example, this may take the form of an ABAB “reversal” design or three staggered AB contrasts within a multiple baseline design. It should be noted that the AB contrast alone does not meet minimum criteria for SCR design; however, multiple AB contrasts can be aggregated to provide an omnibus effect size for more complicated SCR designs.
The generality of GROT is demonstrated here by applying it with PND and NAP. These two nonoverlap indices are demonstrated by applying them first outside of GROT, to raw scores from the first example data set (see Figure 1).

Figure 1. Example data set illustrating the calculations of (a) PND and (b) NAP.
In Figure 1, PND is calculated as the percentage of Phase B data points above the highest data point in Phase A. First, the number of data points in Phase B is noted (nB = 11). Next, the highest point in Phase A is located (third data point is 6.8), and a horizontal line is drawn to its right. Above that line, we count 10 of the 11 Phase B data points. PND is calculated as 10 / 11 = 91%. In Table 1, the third column shows the two scores critical to PND calculation: (a) the highest score in Phase A (6.8) and (b) the only smaller value in Phase B (6.1).
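The PND count can be expressed as a short sketch. The function name is hypothetical, and the scores below are invented for illustration: only the Phase A maximum (6.8) and the single lower Phase B value (6.1) come from the text, and the remaining values are not those of Table 1.

```python
def pnd(phase_a, phase_b):
    """Share of Phase B points above the highest Phase A point (increase desired)."""
    ceiling = max(phase_a)
    return sum(1 for score in phase_b if score > ceiling) / len(phase_b)

# Hypothetical scores: 6 Phase A points (third is 6.8) and 11 Phase B points
phase_a = [3.9, 5.2, 6.8, 5.8, 6.1, 5.5]
phase_b = [6.1, 7.4, 7.9, 8.3, 8.8, 9.0, 9.6, 10.1, 10.4, 11.0, 11.5]
pnd_value = pnd(phase_a, phase_b)   # 10 of 11 points exceed 6.8
```

Because only the single highest Phase A point matters, one unusually high baseline score can change PND substantially, which is the sense in which PND is not a “complete” nonoverlap method.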
Table 1. Control of Phase A Tri-Split and Bi-Split Trends via Semipartialling in First Sample Data Set.
Note. PND = percent of nonoverlapping data; NAP = nonoverlap of all pairs. Values in NAP’s “overlap zone” are presented in bold and with asterisks.
NAP is a “complete” nonoverlap method, as it equally considers all data points in both phases. As a complete method, it is supported by “dominance” statistics, mentioned earlier and described in more detail later in this article. NAP computation is not as simple as PND, so some users will prefer software, but hand calculation is easy enough to be accessible. NAP is output directly from a receiver operator characteristic (ROC) curve module as empirical AUC and may also be obtained from the two U values from a Mann–Whitney U test.
Although NAP can be instantly calculated by AUC or Mann–Whitney U test, hand calculation is described first to enhance understanding. The NAP formula is the number of positives plus .5 times the number of ties, divided by the number of pairs: (positives + .5 × ties) / pairs.
First, the number of data points in Phases A and B are multiplied together to obtain the total number of paired comparisons (6 × 11 = 66 pairs). Next, the “overlap zone” is visually identified (see Figure 1b). The overlap zone extends from just under the lowest Phase B data point up to just above the highest Phase A data point; this zone will contain the data to be labeled “negative” or a “tie.” Rather than counting all positives, it is generally faster to count the negatives and ties and subtract them from the total number of pairs to get the number of positives. Ties are data equal to each other on the Y-axis in Phases A and B. Figure 1b overlap zone contains one negative pair (Data Point 3 compared with Data Point 7; negative = 1) and one tie (Data Point 5 compared with Data Point 7; ties = 1, weighted .5 in the formula). Note that this example has only one Phase B data point in the overlap zone for illustration, and any additional data points in the overlap zone would require additional comparisons. The same “overlap zone” is represented in Table 1, column 5, by data presented in bold and with asterisks. Out of 66 paired comparisons, negative = 1 and ties = 1, so the remaining pairs must be positive (positive = 64). Therefore, NAP = (positive + .5 × ties) / pairs equals (64 + .5) / 66 = 64.5 / 66 = 98%. NAP calculations are more involved than for PND, so some users will prefer to obtain NAP directly as “empirical AUC” from a ROC test, also termed a Diagnostic Precision test or Sensitivity/Specificity test (Swets, 1995). A complete description of NAP is available in Parker and Vannest (2009) for interested readers.
Calculation with a stats package is straightforward. Input “Phase” as the actual or true value, and “Scores” as the criterion or test variable. The empirical AUC is output (.98), along with its statistical significance (p < .00), and 90% confidence intervals (CI) [.84, .99]. NAP also is available from a Mann–Whitney U test, with one simple calculation required. A full-featured Mann–Whitney module will output larger and smaller U values, which for these data are large or UL = 64.5, and small or US = 1.5, so NAP = UL / (UL + US) = 64.5 / 66 = .98. The significance test yields Z = 3.12, so two-tailed p = .001. ROC-AUC and Mann–Whitney U tests yield identical NAP values, but they rely on different sampling distributions, so their p values and CIs will differ. For NAP, chance-level results are .50. NAP can be transformed so that chance-level results equal 0: NAP0–100 = (NAP50–100 / .5) − 1. This transformation would change the NAP of .98 to .95.
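The hand count, the link to the two U values, and the rescaling can be checked with a short sketch. The function names are hypothetical and the scores are invented so that the counts match those described above (66 pairs, one negative, one tie); they are not the values in Table 1. The rescaling is written as (NAP − .5) / .5, which maps chance (.50) to 0 and complete nonoverlap to 1.

```python
def pair_counts(phase_a, phase_b):
    """Compare every Phase A point with every Phase B point (increase desired)."""
    positives = negatives = ties = 0
    for a in phase_a:
        for b in phase_b:
            if b > a:
                positives += 1
            elif b < a:
                negatives += 1
            else:
                ties += 1
    return positives, negatives, ties

def rescale_nap(nap_value):
    """Shift NAP from the .50 to 1.0 range onto a 0 to 1.0 range."""
    return (nap_value - 0.5) / 0.5

# Hypothetical scores: Point 3 (6.8) exceeds, and Point 5 (6.1) ties, the first Phase B point
phase_a = [3.9, 5.2, 6.8, 5.8, 6.1, 5.5]
phase_b = [6.1, 7.4, 7.9, 8.3, 8.8, 9.0, 9.6, 10.1, 10.4, 11.0, 11.5]

positives, negatives, ties = pair_counts(phase_a, phase_b)
pairs = len(phase_a) * len(phase_b)
u_large = positives + 0.5 * ties      # larger Mann-Whitney U
u_small = negatives + 0.5 * ties      # smaller Mann-Whitney U
nap_value = (positives + 0.5 * ties) / pairs
nap_rescaled = rescale_nap(nap_value)
```

The two U values always sum to the number of pairs, which is why NAP can be read off as UL / (UL + US).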
The two nonoverlap results from original data (PND = 91% and NAP = 97.7%) will be compared with PND and NAP values obtained from the GROT baseline trend control procedure. Because PND is the simplest nonoverlap to calculate, and NAP is the most powerful, these two options will be used to demonstrate GROT.
GROT on First Example Data
GROT’s four steps are demonstrated using the same example data set: (a) set a trend line to Phase A (this example will use a bi-split line); (b) keeping parallel to the original slope, move the trend line down to the intersect of X- and Y-axes, and also redraw the phase separation line perpendicular to the slope; (c) rotate the graph so the trend line now becomes a new horizontal axis; and (d) calculate nonoverlap. These steps are shown with the bi-split slope in Figure 2 and in Table 1. Two nonoverlap summaries, PND and NAP, are both calculated for sake of comparison.

Figure 2. Example data set demonstrating (a) Koenig’s bi-split line plotted for Phase A and extended through Phase B; (b) the bi-split line dropped to the X- and Y-axis intersect, the graph rotated, and PND recalculated; (c) recalculation of NAP on adjusted data.
Figure 2 shows the Koenig bi-split slope (.40) plotted through the median intersects of the two halves of Phase A data. Figure 2b shows dropping the bi-split trend line down to run through the axis intersect, and then redrawing the phase division line perpendicular to the reset bi-split slope. Figure 2b also shows the final two steps: rotating the graph to use the bi-split trend line as the horizontal axis, and re-calculating PND. No redrawing of the graph is needed; it can be simply rotated (as was done here). PND calculated on the “detrended” data is 82%, which is less than the 91% calculated on the original data (see Figure 1), demonstrating that the nonoverlap effect was reduced by controlling the Phase A positive trend. This example shows that the original trended data and nonoverlap calculation overestimated the treatment effects. This visual-graphic method works equally well with any nonoverlap statistic. For example, Figure 2c shows NAP recalculated on the GROT detrended data. In this example, the original effect size estimate of 98% is reduced to 95%.
GROT validation by Allison and Gorman regression control on first data set
GROT accuracy (as a hand-calculated technique) can be validated by comparing its results with those from the best available regression control method by Allison and colleagues (Faith et al., 1996). Their procedure is as follows: (a) Calculate Phase A slope (they use a regression slope, but we will use the tri-split slope = .53); (b) multiply the slope by a simple linear series (see Table 1, column 4); and (c) subtract the series of those products from the original data series. This will result in transformed scores, with Phase A trend removed (see Table 1, column 5). In column 5, key scores for calculating PND are labeled PND, and key scores for calculating NAP are asterisked. For PND, the highest Phase A score (5.2) is identified, and all but two Phase B scores (2.4, 4.4) are higher, so PND = 9 / 11 = 82%. For NAP, the “overlap zone” contains three Phase A scores (5.2, 2.5, 3.5) and two Phase B scores (2.4, 4.4), and of their pairings, four are in the negative direction (negative = 4), and ties = 0, so of the 66 total pairs, 62 must be positive (positive = 62). Therefore, NAP = 62 / 66 = 94%. These results are identical to those obtained from the GROT-rotated graphs.
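The three-step detrending procedure can be sketched as follows. This is an illustration under stated assumptions rather than the authors’ implementation: the function names are hypothetical, the “simple linear series” is taken to be the session numbers 1 to n across both phases, and the scores and slope are invented (they are not the Table 1 values).

```python
def detrend(scores, slope):
    """Subtract slope x session number from every score (sessions 1..n assumed)."""
    return [score - slope * session for session, score in enumerate(scores, start=1)]

def pnd(phase_a, phase_b):
    """Share of Phase B points above the highest Phase A point."""
    ceiling = max(phase_a)
    return sum(1 for score in phase_b if score > ceiling) / len(phase_b)

def nap(phase_a, phase_b):
    """(positives + .5 x ties) / pairs over all Phase A by Phase B comparisons."""
    positives = sum(1 for a in phase_a for b in phase_b if b > a)
    ties = sum(1 for a in phase_a for b in phase_b if b == a)
    return (positives + 0.5 * ties) / (len(phase_a) * len(phase_b))

# Hypothetical AB series with a rising baseline and a hand-fit Phase A slope of 0.5
phase_a = [2.0, 3.0, 3.5, 4.5]
phase_b = [5.0, 6.5, 7.0, 8.5, 8.0]
adjusted = detrend(phase_a + phase_b, slope=0.5)
adjusted_a, adjusted_b = adjusted[:len(phase_a)], adjusted[len(phase_a):]

raw_pnd, raw_nap = pnd(phase_a, phase_b), nap(phase_a, phase_b)
adjusted_pnd, adjusted_nap = pnd(adjusted_a, adjusted_b), nap(adjusted_a, adjusted_b)
```

In this invented series both indices fall once the baseline trend is removed (PND from 100% to 80%, NAP from 100% to 97.5%), the same direction of change reported for the example data. Starting the linear series at 0 instead of 1 would shift every adjusted score by the same amount and leave both indices unchanged.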
A graphic comparison also validates GROT against the Allison and Gorman control method. In Figure 3, the rotated and the Allison and Gorman detrended (semipartialled) scores are plotted together.

Figure 3. Comparison of GROT-rotated scores with regression (Allison & Gorman, 1993) detrended scores.
Their respective locations on the Y-axis are identical. Their respective locations on the X-axis are not identical, because rotating the graph skews the X-axis. However, a nonoverlap test is ordinal, so it does not depend on correct intervals on the X-axis—only correct order, or relative positions. Thus, this visual-graphic test also validates GROT.
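A brief derivation may help show why physical rotation and subtraction of the trend agree. It is a sketch under simplifying assumptions: the dropped trend line passes through the axis intersect with slope b, both axes are drawn with the same unit length, and the symbols θ, x′, and y′ are introduced here only for illustration.

```latex
% Rotate the graph by the angle of the trend line, \theta = \arctan b,
% so that \sin\theta = b\cos\theta and \cos\theta = 1/\sqrt{1+b^{2}}.
\begin{align*}
  y' &= -x\sin\theta + y\cos\theta
      = \cos\theta\,(y - b\,x)
      = \frac{y - b\,x}{\sqrt{1+b^{2}}}, \\
  x' &= x\cos\theta + y\sin\theta
      = \cos\theta\,(x + b\,y).
\end{align*}
```

The rotated height y′ is the detrended score y − bx multiplied by a fixed positive constant, so the rank order of all data points on the new vertical axis matches that of the detrended scores, and any nonoverlap index is unchanged. The rotated x′ depends on y as well as x, which is the skew of the X-axis noted above. If the two axes use different unit lengths, the constant changes but remains positive.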
GROT was also used with a bi-split median trend from Phase A, demonstrated in Table 1, columns 6 and 7. In column 6, the bi-split trend (.40) is multiplied by the time series, and the result is subtracted from original scores to yield detrended scores in column 7. Again, critical values for calculating PND are labeled, and values in NAP’s “overlap zone” are asterisked. PND again equals 9 / 11 = 82%. NAP is slightly higher (than for the tri-split slope): 63 / 66 = 95%. These results are identical to those obtained from visual analysis of the GROT-rotated graph in Figures 2b and 2c.
GROT on a second example data set
GROT is applied to control positive baseline trend in a second demonstration data set. Figures 4a and 4b show PND and NAP calculation on the original data. Figure 4c shows the Tukey tri-split slope calculated for Phase A and extended through Phase B. Figures 4d and 4e show the recalculation of PND and NAP on the GROT-rotated data. PND is reduced from 90% to 80% due to the GROT rotation, and the more comprehensive analysis NAP is reduced from 99% to 92%.

Figure 4. Second example data set (a) uncorrected PND calculation; (b) uncorrected NAP calculation; (c) Phase A trend plotted as tri-split slope; (d) GROT rotated, and PND calculated on rotated scores; (e) GROT rotated, and NAP calculated on rotated scores; and (f) validation of GROT-rotated data by Allison et al. regression detrending.
GROT validation by Allison and Gorman on the second example data set
Figure 4f repeats, for the second example data set, the validation of GROT by the Allison and Gorman semipartialling regression procedure. Original and detrended scores for the second example are presented in Table 2 to validate the graph rotation procedure. Considering first the tri-split slope (columns 4 and 5), for PND, the highest detrended Phase A score is the eighth in order (7.2). In Phase B (column 5), 8 of 10 scores are higher than 7.2, so PND = 80%. For NAP, the scores in the “overlap zone” are all bold. Of these 6 × 2 = 12 combinations, 7 are “negatives,” that is, they drop from Phase A to Phase B. There are a total of 9 × 10 = 90 pairwise combinations between phases. With 7 paired comparisons negative, and no ties, the remaining 83 must be positive. Therefore, NAP = (positive + .5 × ties) / pairs, which is 83 / 90 = 92%. The sixth and seventh columns in Table 2 show calculation of PND and NAP on GROT-rotated data using a bi-split trend line. There is no figure associated with these columns; they are included to permit replication.
Table 2. Control of Phase A Koenig Bi-Split Trend via Semipartialling in Second Sample Data Set.
Note. PND = percent of nonoverlapping data; NAP = nonoverlap of all pairs. Values in NAP’s “overlap zone” are presented in bold and with asterisks.
GROT Validation by Visual Analysis
GROT is designed to be a technique compatible with visual analysis, yet it was unknown whether the rotation would challenge the visual analyst, as this novel rotated graph may not have been previously encountered. Therefore, two questions were posed related to the interpretability of rotated graphs. The first question was whether visual analysts could make reliable judgments about behavior change from rotated graphs. The second question was how well visual analysts could identify the decreased behavior change from Phase A to B. The point of GROT is to display two phases with smaller differences in performance due to Phase A trend control. It was hypothesized that visual analysts would be able to detect those differences, at least in data sets with pronounced initial Phase A trends. Our hypotheses were that (a) rater agreement would be at least as high with GROT graphs as with original data graphs because of the effect of having a transformed flat baseline and (b) that raters would correctly detect less change from Phase A to B in some GROT graphs and not in others, but would not identify more change in GROT graphs.
Method
From a corpus of 372 published single-case distinct data series, AB data series were identified which met the dual criteria of (a) visually apparent Phase A trend in the same direction as desired behavior change and (b) this positive Phase A trend confirmed by Kendall’s Tau rank correlation test. A total of 49 AB data sets met these criteria. For each of the 49 data sets, two graphs were presented on separate 4 × 8 note cards, first with the original data graph, second with a GROT-rotated graph. Three graduate students in school psychology and special education independently rated each graph for magnitude of change from Phase A to B on a 3-point scale: large, medium, and small. Graphs were presented in random order.
To answer the first question about reliable judgments, interrater reliabilities were calculated and compared among the three raters on the original data graphs and the GROT graphs. To answer the second question about visual judges’ ability to judge GROT graphs as showing smaller change, we calculated cross-tabulations for each rater on original graph ratings versus GROT graph ratings. The resulting matrices were then examined for shifts in amount and direction of perceived effects from original to GROT graphs. Note that the two graph types were presented randomly, not in pairs, to reduce bias.
Results
To answer the first question about reliable judgments, linear weighted Cohen’s kappa (κ-LW) was calculated by Richard Lowry’s (2011) open source kappa calculation webpage (http://faculty.vassar.edu/lowry/kappa.html). Kappa-LW, unlike simple kappa, is sensitive to the amount or degree of disagreement on an ordinal scale (Parker, Vannest, & Davis, in press). For the original graph judgments, kappa-LW among the three raters was .84, .70, and .63, and the corresponding simple percent agreements were .88, .76, and .71. Interrater agreement on the GROT graphs by kappa-LW was .82, .78, and .76, with simple percent agreements .88, .86, and .85. Therefore, the GROT graphs permitted slightly higher agreement among raters, likely due to their flat baselines.
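For readers without access to a kappa calculator, linear weighted kappa for one pair of raters can be sketched as below. The function name and the ratings are hypothetical, and the categories are assumed to be coded 0 (small), 1 (medium), and 2 (large). The sketch uses the standard disagreement-weight form, in which a disagreement of one category counts half as much as a disagreement of two categories on a 3-point scale.

```python
def linear_weighted_kappa(ratings_1, ratings_2, n_categories):
    """Linear weighted kappa for two raters using integer codes 0..n_categories-1.

    Computed as 1 - (observed weighted disagreement / expected weighted disagreement).
    Undefined (division by zero) if both raters use a single identical category throughout.
    """
    n = len(ratings_1)
    max_gap = n_categories - 1
    observed = sum(abs(a - b) for a, b in zip(ratings_1, ratings_2)) / (n * max_gap)
    share_1 = [ratings_1.count(c) / n for c in range(n_categories)]
    share_2 = [ratings_2.count(c) / n for c in range(n_categories)]
    expected = sum(
        share_1[i] * share_2[j] * abs(i - j)
        for i in range(n_categories)
        for j in range(n_categories)
    ) / max_gap
    return 1 - observed / expected

# Hypothetical ratings of six graphs by two raters (0 = small, 1 = medium, 2 = large)
rater_1 = [0, 0, 1, 1, 2, 2]
rater_2 = [0, 1, 1, 1, 2, 1]
kappa_lw = linear_weighted_kappa(rater_1, rater_2, n_categories=3)
```

With three raters, the function would be applied once to each of the three rater pairs, which corresponds to the three kappa values reported for each graph type.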
The second question, related to visual analysts’ ability to detect smaller effects in the GROT graphs, was answered by three cross-tabulations. Response shifts for each rater are shown in Table 3. Rater A judged 38 graphs as unchanged and 21 GROT graphs as showing reduced effects. Rater B judged 41 graphs as unchanged and 18 GROT graphs with smaller effects. Rater C judged 44 graphs as unchanged and 15 GROT graphs with smaller effects. There were no response shifts from original to GROT graphs indicating greater perceived effects.
Table 3. Response Shifts From Three Judges Rating AB Graphs Before and After GROT Transformations.
Note. GROT = graph rotation for overlap and trend.
Discussion
Nine nonoverlap techniques are currently available to calculate standardized scores for determining the “size” of change between two or more phases (Parker et al., 2011). However, only two are capable of handling trend: Tau-U and ECL. This is unfortunate and important because trend in baseline data weakens conclusion validity (Kane, 2001; Kazdin, 2003; Orme, 1991) and thus our ability to promulgate practices with empirical evidence. Although many statistical models address this issue, they are largely inappropriate for nonparametric data or inaccessible to visual analysts. ECL, which was known as a technique in other fields in the 1940s, has a long history of use but also limitations. Three in particular are its interpretation as a ratio of data points, its low power, and its inability to be applied universally. Overcoming these challenges would move the field forward and extend our knowledge base of valid, reliable techniques for addressing trend in nonoverlap analysis, techniques that are compatible with visual analysis and do not require sophisticated statistical packages.
This article presented the GROT method for controlling positive baseline trend within a nonoverlapping data analysis and demonstrated its validity by performance equal to that of the best current regression method, developed by Allison and colleagues (Allison & Gorman, 1993; Faith et al., 1996). This equivalence was demonstrated both numerically and graphically. GROT also demonstrated reliability with visual analyst ratings despite producing a novel rotated graph, which may have presented challenges. Finally, rater response shifts from original to GROT graphs indicate compatibility with visual judgments, and the GROT graphs yielded somewhat more reliable ratings of the amount of change in simple AB graphs than did the original graphs.
GROT advances our knowledge base by providing an additional technique for visual analysts, a technique which appears to improve accuracy in determining effects. Over the past few decades, there has been ongoing research on the training and supports that would enhance reliability of visual judgments from graphs (Ferron & Jones, 2006; Fisher, Kelley, & Lomas, 2003; Ximenes, Manolov, Solanas, & Quera, 2009). The GROT graph may be of service toward that goal. GROT may have use as a visual analyst tool alone, aside from nonoverlap calculations.
Ratings of original and GROT graphs consistently placed GROT effects equal to or lower than those for the original graphs. No GROT graphs were identified as showing larger effects, and this strong finding held over three independent raters and 59 graphs. However, about 70% of graphs were judged as showing no change in magnitude of effect. There are several possible explanations. In some instances, we suspect that visual judges cannot detect small changes. Our 3-point scale of “large, medium, small” may have been less sensitive than necessary to detect changes. Or, in some instances, adjusting positive baseline trend may not have eliminated effects of a very large magnitude. The minimum amount of change detectable from original to GROT graphs remains a question with practical implications for visual analysts, as does an empirical comparison between effect size changes and visual analysis estimations.
Another way GROT advances the field is its convenience. GROT is a method that can be carried out entirely with pencil and ruler on a paper graph, so it is fully accessible to visual analysts. It advances the field because nonoverlap is widely used by SCR practitioners but is commonly criticized for failing to consider positive “preexisting” Phase A trend. The addition of Phase A trend control permits nonoverlap methods to compete with leading parametric methods.
An asset of GROT is that it is a general graphic approach which works equally well for any trend line, including linear regression, Tukey tri-split, Koenig bi-split, or Theil–Sen slope. The Tukey tri-split and Koenig bi-split alternatives were demonstrated. GROT is applicable with any nonoverlap method. Here, only PND and NAP were applied, but other nonoverlap indices could be used as well.
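To make the two nonoverlap indices and one of the alternative slopes concrete, the sketch below computes PND, NAP, and a Theil–Sen slope for a single AB series in which increase is the desired direction. The function names and the data are hypothetical. PND is taken as the percentage of Phase B points exceeding the highest Phase A point, and NAP as the proportion of all A-B pairs in which the Phase B point is higher, with ties counted as one half; the sketch does not perform the GROT rotation itself.

```python
from itertools import combinations
from statistics import median

def pnd(phase_a, phase_b):
    """Percent of Phase B points above the highest Phase A point."""
    ceiling = max(phase_a)
    return 100 * sum(b > ceiling for b in phase_b) / len(phase_b)

def nap(phase_a, phase_b):
    """Nonoverlap of all pairs: share of A-B pairs with B > A, ties = 0.5."""
    wins = sum((b > a) + 0.5 * (b == a) for a in phase_a for b in phase_b)
    return wins / (len(phase_a) * len(phase_b))

def theil_sen_slope(scores):
    """Median of all pairwise slopes, with session index as the time axis."""
    slopes = [
        (scores[j] - scores[i]) / (j - i)
        for i, j in combinations(range(len(scores)), 2)
    ]
    return median(slopes)

# Hypothetical AB data series.
phase_a = [2, 3, 5, 4]
phase_b = [5, 6, 7, 8, 6]

print(pnd(phase_a, phase_b))
print(nap(phase_a, phase_b))
print(round(theil_sen_slope(phase_a), 3))
```

Either index could be recomputed on trend-adjusted scores, which is the sense in which a trend control method is independent of the nonoverlap index applied afterward.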
Cautions on Controlling Baseline Trend
Although this article has the primary goal of promoting a new analytic method, it also needs to raise concerns about the overuse of baseline trend control, that is, its use with unreliable, highly variable trend lines. All trend control methods noted in this article, including ECL, Allison and Gorman’s (1993) regression method, and GROT, adjust the full data series according to the slope of the Phase A trend line. However, there are at least three concerns with the use or misuse of such control. These concerns are not new (Scruggs & Mastropieri, 1994, 1998), but to date have not been addressed, so practitioners should be aware.
The first concern is the reliability of the Phase A trend line. When Phase A trend is stable (i.e., lacks variability), a linear trend line is a reliable summary of the data points contained within the phase. Controlling the Phase A trend in cases with stable (i.e., nonvariable) data is likely to render a more appropriate estimate of effect across phases. In contrast, the application of baseline trend control with highly variable Phase A data raises some concerns. Because the control is based only on the slope of the trend line, it is blind to the potential unreliability of that line, so control from a highly unreliable trend will have the same impact on Phase B scores as control from a reliable trend. This is counterintuitive; some linear trend lines reflect data so poorly that they should not be fit to the data, let alone permitted to modify Phase B scores. Phase A data may simply lack linearity, in which case a straight line would be inappropriate.
The second concern is that control of Phase A trend becomes more extreme with a longer Phase B. Phase A trend line slope is most reliable within Phase A, and even more so at the center of Phase A. If extended through a long Phase B, the reliability or credibility of this slope quickly approaches zero. Given a long enough Phase B, the baseline trend control will transform Phase B data far outside the bounds of the score scale. Regression texts warn us about the very low reliability of projections into the future, and the problem is even greater for N = 1 single-participant data.
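The arithmetic behind this second concern can be shown with a small sketch. It assumes, purely for illustration, that trend control subtracts a straight line with the Phase A slope from every score, pivoting at the center of Phase A; this is a numeric stand-in and not the paper-and-ruler GROT procedure. The function name, the 0 to 100 score scale, and the data are hypothetical.

```python
def detrend(scores, slope, pivot):
    """Subtract a linear trend of the given slope, pivoting at session `pivot`."""
    return [y - slope * (t - pivot) for t, y in enumerate(scores)]

# Hypothetical series on a 0-100 scale: a short, steadily rising baseline
# (slope of 3 points per session) followed by a long, flat Phase B.
phase_a = [10, 13, 16, 19, 22]
phase_b = [60] * 40
series = phase_a + phase_b

adjusted = detrend(series, slope=3, pivot=2)  # pivot at the center of Phase A

print(adjusted[:5])   # baseline is flattened
print(adjusted[5])    # first Phase B point after adjustment
print(adjusted[-1])   # last Phase B point after adjustment
print(sum(v < 0 for v in adjusted))  # adjusted points below the scale floor
```

In this example the first adjusted Phase B score is 51, but the last is −66, well below the scale minimum of 0, even though the raw Phase B scores never changed. The size of the correction depends only on distance from Phase A, not on anything observed in Phase B.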
The third concern is the open question of whether Phase A trend would continue unabated through Phase B had there been no intervention. This is a difficult question to answer statistically, but we do know that a strong trend in the first 5 or 8 data points of a baseline is not a good predictor of trend in the next 5 or 8 data points (Parker, Cryer, & Byrns, 2006). The evidence indicates that strong trend in the first third or half of a baseline tends to moderate considerably in the final two thirds or half of that same baseline. This evidence is not conclusive, but is suggestive that positive baseline trend may not continue at strength into Phase B.
We believe that these three concerns have sufficient weight that control of baseline trend should be exercised cautiously, which is to say not with quite short baselines or with highly variable baseline data. Baseline control has logical appeal and is carried out with precision. However, data transformations from short and highly variable Phase A data may be an exercise in false precision.
In summary, this article has presented a visual-graphic method of controlling for baseline trend within data nonoverlap and included preliminary reliability and validity comparisons. The method offers greater flexibility and power than White and Haring’s (1980) respected ECL method and is more accessible and more directly interpretable than Allison and Gorman’s (1993) regression method. GROT leads to reliable judgments of behavior change and correctly reflects reduced effects from original data. Initial indications of performance are positive; however, as with any new analytic method, it requires testing over time and by a variety of researchers. If present results are borne out by others, then we hope to have contributed to the merging of statistical and visual analysis of SCR data.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
