Abstract
Single case experimental design (SCED) is an indispensable methodology when evaluating intervention efficacy. Despite long-standing success with using visual analyses to evaluate SCED data, this method has limited utility for conducting meta-analyses. This is critical because meta-analyses should drive practice and policy in behavioral disorders more than evidence derived from individual SCEDs. Even when analyzing data from individual studies, there is merit to using multiple analytic methods since statistical analyses in SCED can be challenging given small sample sizes and autocorrelated data. These complexities are exacerbated when using count data, which are common in SCEDs. Bayesian methods can be used to develop new statistical procedures that may address these challenges. The purpose of the present study was to formulate a within-subject Bayesian rate ratio effect size (BRR) for autocorrelated count data that would obviate the need for small sample corrections. This effect size is the first step toward building a between-subject rate ratio that can be used for meta-analyses. We illustrate this within-subject effect size using real data for an ABAB design and provide codes for practitioners who may want to compute BRR.
The generation of evidence-based practices (EBPs) for students for whom typical instruction may not be effective must rely on research that meets strong methodological standards (e.g., Odom et al., 2005). One class of methods that can meet strong standards when evaluating intervention efficacy is the single-case experimental design (SCED). SCEDs can yield solid causal inference about treatment impacts and are of interest to federal agencies such as the Institute of Education Sciences (IES; e.g., Kratochwill et al., 2010, 2013) and the National Institutes of Health (NIH) for n-of-1 designs which are a special case of SCEDs (Gabler et al., 2011). The What Works Clearinghouse (WWC), for example, now offers reports on special education interventions that are largely informed by SCED work (e.g., WWC, 2016) and IES funds SCED research to advance development of EBPs. The development of EBPs can be further advanced by systematically synthesizing SCED evidence, which represents a critical facet of behavioral disorders literature (cf. Briesch & Briesch, 2016; Chaffee et al., 2017; Dart et al., 2014; Kilgus et al., 2016; Maggin, Chafouleas, et al., 2011; Maggin, O’Keefe, & Johnson, 2011; Soares et al., 2016). In principle, syntheses could be expanded by combining SCED effect sizes with impact estimates generated from other design types, such as randomized controlled trials and quasi-experiments (e.g., Hitchcock et al., 2014). Hence, EBP generation and evaluation is inextricably linked to SCED work and, as demonstrated later in this article, there is an ongoing need for methodological refinement to related statistical analyses. We argue that expansion of analytic options could in turn support behavioral analysis praxis, which entails combining research and practice (see, for example, Nastasi & Hitchcock, 2016) and, more distally, practice via corresponding improvements in our understanding of evidence.
So what is the basis for arguing that there is a need to refine statistical analyses of SCED data? In this article, we focus on effect size estimation to address this question. Of course, several SCED effect size procedures exist, such as those based on the percentage of nonoverlapping data points between the baseline and intervention phases (Parker et al., 2011) and the standardized between-subject mean difference corrected for small sample sizes (Hedges et al., 2012, 2013). However, the former set of indices is problematic because they do not account well for outliers (Harrington & Velicer, 2015), cannot account for trend, and measure only nonoverlap rather than the magnitude of an effect. The latter set of indices is a significant innovation in SCED analyses because they are between-subject effect sizes that can correct for small sample sizes, assuming that data are intervally scaled. However, it is more common in SCEDs to use count (e.g., the number of times some discrete behavior occurred) or proportion data (e.g., the percentage of time a student has appeared to be attentive in a classroom; Rindskopf, 2014). Moreover, SCED data are often autocorrelated, which means the error at a given time-point (say t) is systematically correlated with the error at a different time-point (say t + l). This is referred to as an l-lag autocorrelation (e.g., 1-lag, 2-lag). Autocorrelation violates the independence-of-observations assumption that underlies all general linear models such as analysis of variance (ANOVA) and regression. Unfortunately, other commonly used effect sizes in behavioral research such as R² (Cohen, 1988) do not account for autocorrelations. Therefore, there is a need for new effect size procedures that do not necessarily replace existing approaches but can at least be used in a supplementary fashion with existing procedures such as visual analyses so that SCED researchers can draw yet more information from their studies.
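To make the idea of an l-lag autocorrelation concrete, the following minimal Python sketch computes the usual sample autocorrelation at a chosen lag. The function name and the session counts are hypothetical and invented for illustration; in practice the calculation would be applied to model errors (residuals) rather than to raw observations.

```python
def lag_autocorrelation(series, lag=1):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    deviations = [y - mean for y in series]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Cross-products of deviations that are `lag` time-points apart.
    numerator = sum(deviations[t] * deviations[t + lag] for t in range(n - lag))
    return numerator / denominator


# Hypothetical counts of a behavior across eight sessions (not real data).
counts = [7, 6, 8, 7, 5, 6, 7, 8]
r1 = lag_autocorrelation(counts, lag=1)
r2 = lag_autocorrelation(counts, lag=2)
print(f"lag-1: {r1:.3f}, lag-2: {r2:.3f}")
```

Values near zero are consistent with independent errors, whereas values far from zero signal that adjacent observations carry overlapping information.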
To be considered as a contribution to research and later practice, any new effect size estimation procedure should (a) account for both autocorrelations and the scale of the data commonly used in SCEDs; (b) deal with small sample sizes; and (c) produce reliable interval estimates of uncertainty. To our knowledge the effect size we propose here, the Bayesian rate ratio (BRR) effect size, is the first to meet these needs. To demonstrate the BRR, in this article we use data from a published study of an ABAB design that was used to reduce disruptive behaviors of students in an urban fourth grade math classroom (Lambert et al., 2006). We apply the BRR to show how Bayesian statistical significance testing can be conducted using SCED count data, and we assess the degree to which visual analyses, the nonoverlap of all pairs (NAP) effect size, and the BRR produce both complementary and contradictory information about the intervention effect. Before these demonstrations are presented, we offer an overview of how Bayesian estimation can contribute to the analyses of SCED data. This is because understanding the potential contribution of the BRR to SCED work first requires a review of the challenges that come with analyzing SCED data.
Challenges in Analyzing SCED Data
One of the main reasons for use of SCEDs is the need to document a functional relation between specified independent and dependent variables. Essentially each person (or case) serves as the unit of analysis and his or her own control to generate strong causal evidence about intervention effects. Evidence of intervention efficacy is documented primarily through visual analyses that focus on changes from baseline to intervention in the level, trend, variability, immediacy of effect, data overlap, and consistency in the behavioral pattern for similar phases (Gast & Ledford, 2014; Horner & Kratochwill, 2012). There is some, but not complete consensus among expert researchers on the decision-rules for making judgments regarding intervention effectiveness (e.g., Kratochwill et al., 2013), but the rules for visual analyses are not applied uniformly by behavioral and educational researchers (Horner et al., 2012). Furthermore, not all treatments exhibit immediacy effect and some treatment effects may not be visually striking, even though the overall data may show clinical and statistical effectiveness (Meadan et al., 2016). Therefore, although some researchers believe that visual analysis can be based on objective criteria (Horner et al., 2005; Roane et al., 2011), others see a need for quantitative methods to document intervention effects (e.g., Maggin et al., 2011; Parker et al., 2011).
We argue that, in principle, there is a need for both statistical and visual analysis to evaluate the causal validity of SCED findings via transparent, objective, and replicable procedures. Visual analysis primarily addresses the question of evidence of a functional relation between independent and dependent variables and statistical analysis quantifies the magnitude of the effect. We agree with authors who see visual analysis as an effective analytic approach but more information can be drawn from using multiple analytic methods and visual analyses do come with drawbacks.
Autocorrelation
A primary drawback of using visual analysis alone stems from the problem of autocorrelated errors (Harrington & Velicer, 2015), which is typical of SCED data given the need for repeated measures. Autocorrelation can contribute to decreased interrater reliability during visual analyses (Brossart et al., 2006) and to an increase in Type I errors (Horner & Kratochwill, 2012; Lanovaz & Rapp, 2016; Maggin & Chafouleas, 2013).
If visual analyses are imperfect with respect to distinguishing autocorrelation from true performance change, one would hope to use statistical analyses as complementary procedures so that researchers are better able to understand treatment effects. However, regularly used statistical methods such as ANOVA and regression are poorly suited for most SCED studies. To begin, ANOVA/regression-based (i.e., ordinary least squares [OLS]) methods (a) assume that observations are independent (i.e., the antithesis of autocorrelation) and (b) should be expected to contend with higher rates of Type II errors because SCEDs typically entail use of small sample sizes (Gresham et al., 2001).
On the other hand, OLS procedures can be used to detect the presence of autocorrelation. However, confidence intervals (CIs) of autocorrelation estimates, which are needed to help us understand whether we can rule out autocorrelation, tend to be inaccurate because they tend to have undercoverage. Undercoverage means that CIs are narrower than they should be, so that fewer CIs than expected contain the true value (Shadish et al., 2013), which makes it difficult to assess whether autocorrelation is a concern. A subtler issue is that the challenges of inadequate autocorrelation diagnostics and small sample sizes interact. Autocorrelation estimates are often negatively biased and are accompanied by larger sampling errors because SCEDs typically have a small number of observations per participant in a study. Huitema and McKean (1994) and McKnight et al. (2000) state that 50 observations per participant are about the minimum threshold needed to address these sampling error concerns. In contrast, in a review of 809 SCEDs published in 113 studies in the year 2008 in 21 journals, studies typically had only four to six observations per phase (Shadish & Sullivan, 2011). The undercoverage of OLS CIs is exacerbated when there are a minimal number of data points per phase. In short, OLS procedures for assessing the presence of autocorrelation in SCED data may lead analysts to proceed with false confidence.
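The negative bias of autocorrelation estimates in short series can be illustrated with a small simulation. The sketch below is only a demonstration under simplifying assumptions that are not part of the present study: errors are Gaussian rather than counts, the generating process is a first-order autoregressive series, and all function names, the seed, and the true value of 0.5 are hypothetical choices.

```python
import random


def lag1_autocorrelation(series):
    """Conventional sample lag-1 autocorrelation."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    denom = sum(d * d for d in dev)
    return sum(dev[t] * dev[t + 1] for t in range(n - 1)) / denom


def simulate_ar1(n, rho, rng, burn_in=50):
    """Gaussian AR(1) series of length n with lag-1 autocorrelation rho."""
    value = 0.0
    out = []
    for step in range(burn_in + n):
        value = rho * value + rng.gauss(0.0, 1.0)
        if step >= burn_in:  # discard early values that depend on the start
            out.append(value)
    return out


def mean_estimate(n, rho, replications=2000, seed=1):
    """Average lag-1 estimate over many simulated series of length n."""
    rng = random.Random(seed)
    estimates = [
        lag1_autocorrelation(simulate_ar1(n, rho, rng))
        for _ in range(replications)
    ]
    return sum(estimates) / replications


short_series = mean_estimate(n=6, rho=0.5)   # phase length typical of SCEDs
long_series = mean_estimate(n=50, rho=0.5)   # threshold cited in the text
print(f"true rho = 0.5; n = 6: {short_series:.2f}; n = 50: {long_series:.2f}")
```

Under these assumptions the average estimate from six observations falls well below the generating value, and the gap narrows as the series lengthens.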
Effect Sizes
Concerns with the use of standard statistical approaches extend beyond autocorrelation, Type I, and Type II errors. Effect size estimates are also problematic for the reasons discussed below. Standardized mean difference type effect sizes obtained from SCEDs require correction for small samples and require distributional assumptions that might not fit with typical analytic scenarios (Hedges et al., 2012, 2013). Of course, nonoverlap indices represent a good option because they are free of distributional assumptions and can be applied to count data (Parker et al., 2011); there is reason after all for their long-standing use. However, NAP indices do not help researchers account for the distance between data points and consider only their nonoverlap. This renders nonoverlap between two closely spaced points the same as nonoverlap between two widely spaced points. By logic, however, we expect the effect size in the latter case to be greater than the effect size in the former. Moreover, the standard errors proposed for NAP are not free of distributional assumptions and may be biased in the presence of autocorrelation. Due to space restrictions, we do not review all nonoverlap SCED effect size options (see instead Parker et al., 2011). Furthermore, computing p values and CIs for nonoverlap metrics entails complex procedures (Parker et al., 2011). For all of these reasons, there is a need for different quantitative analytic solutions (e.g., Shadish et al., 2013). We argue that Bayesian methods can yield a viable solution that can overcome these challenges.
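The point about distance can be seen in a short Python sketch of NAP, computed here as the proportion of all baseline-treatment pairs that show improvement, with ties counted as half. The function name, the flag for the direction of improvement, and the three small data sets are hypothetical and chosen only to illustrate the argument.

```python
def nap(baseline, treatment, decrease_is_improvement=True):
    """Nonoverlap of all pairs: share of (baseline, treatment) pairs that improve."""
    pairs = 0
    score = 0.0
    for a in baseline:
        for b in treatment:
            pairs += 1
            if a == b:
                score += 0.5  # ties count as half an improvement
            elif (b < a) == decrease_is_improvement:
                score += 1.0
    return score / pairs


# Hypothetical counts of a behavior targeted for reduction (not real data).
baseline = [7, 6, 8]
small_drop = [5, 5, 4]   # treatment points just below baseline
large_drop = [0, 1, 0]   # treatment points far below baseline

print(nap(baseline, small_drop), nap(baseline, large_drop))
```

Both comparisons return the same NAP even though the second reduction is far larger, which is the limitation described above.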
Bayesian Methods and SCEDs
A fundamental reason for why a Bayesian approach can be of use in the examination of SCED data is that it does not depend on large sample or asymptotic theory (Ansari et al., 2000; Ansari & Jedidi, 2000). Bayesian methods also allow more direct probabilistic interpretation of parameters than do frequentist methods, which generally use OLS and the sort of null hypothesis testing procedures that are based on Fisher’s work (Cohen, 1994). Bayesian estimation entails examining the posterior distributions of parameters such as intercepts, slopes, and effect sizes, and provides the probability (or credibility value) of each value an estimated parameter can take (Kruschke, 2013). Unfortunately, most applied researchers are not trained in Bayesian estimation (Natesan et al., in press). Perhaps as a result, Bayesian methods are not typically used in SCED work. Consider, for example, that of the 239 SCED articles published in the first half of 2018, only four mention the word Bayesian; of these four, none were empirical works (Natesan, 2019).
Fortunately, statistical methodologists have started to work out how Bayesian methods can be deployed to overcome various analytical challenges presented by SCED data. For instance, Moeyaert et al. (2017) compared maximum likelihood and Bayesian estimation of multilevel modeling of SCED data. Natesan and Hedges (2017) proposed a Bayesian unknown change-point model that overcomes the small data and autocorrelation challenges of SCEDs by using Bayesian methodology. Natesan et al. (2020) extended this work further to include multiple phases such as the ABAB design. These works do not, however, address count data because such data require making different distributional assumptions, and as mentioned above, count data are more common in SCED work than interval data. Therefore, the present article describes how to use within-subject Bayesian effect sizes and in particular addresses statistical complexities that arise from using count data (the BRR). It is important to note that the BRR we present is a within-subject effect size and, unlike the one proposed by Hedges et al. (2012), cannot be directly used in meta-analyses. Nonetheless, we see this proposed BRR as the first step toward building an equivalent between-subject effect size for count data that can be used for meta-analyses. Importantly, the programs used to compute the indices for ABAB designs are available to download for free from GitHub (https://github.com/prathiba-stat/Bayesian-rate-ratio) along with annotations so that researchers can modify and input data for their own research. By demonstrating this method, discussing its advantages, and making the software code accessible, this article can help researchers compute the BRR. In addition, since SCED researchers commonly use visual analyses, we show how to visually examine posterior density plots and regions of practical equivalence (ROPE), which we consider to be a part of the BRR process.
With that background, there are three fundamental reasons for why a Bayesian approach should be considered when analyzing SCED data. We present these issues and then describe BRR.
Use of Bayesian Estimation: An Overview of Three Fundamental Issues
In Bayesian methods, each parameter estimate (an outcome that is calculated) represents a distribution of values; in contrast, when using frequentist methods one calculates a point estimate and applies a null hypothesis test. A Bayesian parameter estimate can be of greater utility than a null hypothesis significance test, and its associated CI, derived from frequentist statistics. This is because a frequentist CI (a) is often misunderstood in practice and (b) by itself does not support replication research; these are two fundamental issues that warrant use of Bayesian estimation. As for the first issue, unless the interpreter of frequentist results is well acclimated to how the process works, it can be easy to misconstrue a finding. To explain, when a frequentist 95% CI (or 68%, etc.) is constructed around a point estimate, such as a standardized mean difference effect size (SMD), this does not mean that there is a 95% chance that the observed difference is a true representation of a population difference (Gelman et al., 2013). Yet according to Cohen (1994) this is the incorrect interpretation many will make.
To explain, consider 10,000 samples. If one were to obtain a 95% CI from each of these 10,000 samples, then about 95% of these CIs would contain the true value. In sum, constructing the frequentist CI entails using normal curve theory to provide a sense of how many of some number of (theoretical) sample draws contain the null value and thereby gives a researcher a basis on which to consider whether to reject a null hypothesis. With that background, the frequentist CI is often misinterpreted as representing the probability that the point estimate (in this example, the observed SMD) is the population parameter, or close to the population parameter (again, see Cohen, 1994). But to be clear, one might be highly confident in rejecting a null hypothesis using frequentist methods but still have limited capacity in guessing the actual value of the population parameter. In contrast, in Bayesian estimation the interpretation of a 95% credibility interval is much more straightforward: given the data and the prior, the probability that the population value lies within the interval is 95% (Kruschke, 2015).
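The repeated-sampling meaning of a frequentist CI can be checked with a brief simulation. The sketch below is a simplified illustration, not part of the present analysis: it assumes normally distributed data with a known standard deviation, and the function name, seed, and parameter values are hypothetical.

```python
import math
import random


def coverage(true_mean=5.0, sigma=2.0, n=20, replications=10000, seed=7):
    """Share of 95% z-intervals, one per simulated sample, that contain true_mean."""
    rng = random.Random(seed)
    half_width = 1.959964 * sigma / math.sqrt(n)  # known-sigma 95% interval
    hits = 0
    for _ in range(replications):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / replications


share = coverage()
print(f"Share of 10,000 intervals containing the true mean: {share:.3f}")
```

The 95% describes the long-run behavior of the procedure across samples; it is not a probability statement about any single computed interval.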
This connects to the second fundamental issue. This form of interpretation supports replication research, which has become an important topic in special education (e.g., Cook, 2014). This is because researchers should be using prior information to hypothesize (and empirically test) the size of plausible treatment impacts on some outcome measure. As more information is gathered, the hypothesis becomes more refined. Hence, having a Bayesian mind-set entails continued thought about replication. If researchers are working with distributions of plausible treatment values, they will be in a stronger position to specify the strength of an intervention in advance of a study.
A third big picture issue is that Bayesian methods can more easily accommodate model complexities and several data types in a way that addresses the numerous concerns described earlier in this article. These methods can handle proportion and count data, which are common in SCED work (e.g., Rindskopf, 2014), and Shadish et al. (2013) found that Bayesian estimates of autocorrelation were more accurate than frequentist estimates. In all, we are not advocating that frequentist methods be discontinued in SCED research but we do argue that they are often poorly suited to SCED data analysis so should be used more sparingly. In contrast, Bayesian approaches do not entail the same drawbacks and they can complement visual analyses. With that background, we turn to BRR details.
Applying the BRR to SCED Count Data
The Bayesian model used to analyze SCEDs in the present study is based on an interrupted time-series design that entails Bayesian estimation (Natesan, 2019; Natesan et al., 2020; Natesan & Hedges, 2017, 2019; Natesan et al., under review). As the name implies, an interrupted time-series design is a longitudinal design with time as the independent variable and has an outcome variable of interest tracked across time. A sudden introduction or withdrawal of a stimulus at a certain time-point causes an interruption in the pattern obtained until this time-point. Following this interruption, the outcome variable may follow a different pattern. This is the typical setup of an interrupted time-series design. Thus, SCEDs are variants of these designs. In fact, in the ABAB design that we will illustrate, there are three interruptions: introduction of the intervention after baseline, withdrawal of the intervention (return to baseline), and reintroduction of the intervention. We refer to the Bayesian estimation of an interrupted time-series design as a Bayesian interrupted time-series (BITS) design. In our conceptualization of BITS, intercepts vary by phase (as in AB phases used in most SCEDs). We assume use of count data and in this approach the dependent variable is modeled using Poisson regression. We do not assume trend in the data that may appear due to anything other than autocorrelation because research has shown that BITS CIs of SCED data with autocorrelation and trend due to sources other than autocorrelation severely underperform (Natesan & Hedges, 2019). In fact, the model confounds the patterns due to the two sources of trend (autocorrelation and trend from other sources, such as growth or decline in the outcome variable) to such a degree that it is impossible to separate the variance attributable to trend from the variance attributable to autocorrelation.
Natesan and Hedges (2019) recommend that for SCEDs, models that estimate only autocorrelation or trend due to other sources be estimated and not both in the same model. Therefore, only the simplest model, that is, the model with intercepts and autocorrelations alone is considered in the present study. The observed value at the first time point
In Equation 1,
And
In Equation 2,
Consider a design with only two phases: baseline and treatment. Let the time-points in the baseline phase be
The intercepts are drawn from normal distributions with hyperpriors (i.e., prior on a prior) to reduce the impact of prior specification on the estimates (Natesan et al., 2016). The means of these normal distributions (
Although the use of appropriate priors is very much a growing field and there is no generic guidance on whether there is a prior that works for all parameters (this is probably not possible), the general rule for use of priors is to use reasonable estimates with reasonable uncertainty specification. For instance, Natesan et al. (2016) conducted a study of prior comparisons that showed that using priors that matched the generating distribution produced comparably good estimates as hierarchical priors as used in the present study. However, using extremely less informative priors such as having a very large standard deviation led to improper posteriors. In general, when nothing is known about the estimates, a sensitivity analysis where different priors are tested to see how they affect the posterior distributions is recommended.
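A prior sensitivity analysis can be sketched in a few lines of Python. The example below deliberately swaps in a much simpler model than the one used in this study: a conjugate Gamma prior on a single Poisson rate, with no autocorrelation and no hierarchical structure. The counts, the three prior settings, and the function name are hypothetical and serve only to show the mechanics of comparing posteriors across priors.

```python
def gamma_poisson_posterior(counts, prior_shape, prior_rate):
    """Posterior mean and SD of a Poisson rate under a Gamma(shape, rate) prior."""
    post_shape = prior_shape + sum(counts)   # conjugate update of the shape
    post_rate = prior_rate + len(counts)     # conjugate update of the rate
    mean = post_shape / post_rate
    sd = post_shape ** 0.5 / post_rate
    return mean, sd


# Hypothetical baseline counts (not real data).
counts = [7, 6, 8, 7]

# Hypothetical prior settings, from very diffuse to fairly informative.
priors = {"diffuse": (0.01, 0.01), "weak": (1.0, 1.0), "informative": (10.0, 1.0)}

summary = {
    name: gamma_poisson_posterior(counts, shape, rate)
    for name, (shape, rate) in priors.items()
}
for name, (mean, sd) in summary.items():
    print(f"{name:12s} posterior mean = {mean:.2f}, SD = {sd:.2f}")
```

If the posterior summaries shift noticeably across plausible priors, as they can with only four observations, conclusions should be reported together with the prior that produced them.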
An effect size estimate of the treatment can be obtained from the posterior distribution of the rate ratio of the mean of the distribution from which the intercepts are drawn as given in Equation 9:
The rate ratio can be interpreted as the ratio of the rate between the treatment and the baseline phases. Larger rate ratio values are desirable for positive outcome variables because this would indicate the effectiveness of the intervention in increasing the occurrence of positive outcome variables in the treatment phase compared with that of the baseline phase. In the ABAB design, there will be three rate ratio effect sizes that will measure intervention effect, removal of intervention effect, and reintroduction of intervention effect.
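As a rough illustration of how a rate ratio and its interval can be summarized from posterior draws, the Python sketch below assumes that the phase intercepts are on the log scale, so that the ratio is the exponent of their difference. That assumption, the function name, and the simulated stand-in draws are hypothetical; in an actual analysis the draws would come from the fitted model rather than from a random number generator.

```python
import math
import random


def hdi(draws, mass=0.95):
    """Shortest interval containing `mass` of the draws (highest density interval)."""
    ordered = sorted(draws)
    n = len(ordered)
    window = max(1, math.ceil(mass * n))
    best_low, best_high = ordered[0], ordered[window - 1]
    for start in range(1, n - window + 1):
        low, high = ordered[start], ordered[start + window - 1]
        if high - low < best_high - best_low:
            best_low, best_high = low, high
    return best_low, best_high


# Stand-in posterior draws of log-scale phase means (invented, not model output).
rng = random.Random(3)
log_baseline = [rng.gauss(math.log(8.0), 0.15) for _ in range(4000)]
log_treatment = [rng.gauss(math.log(2.0), 0.30) for _ in range(4000)]

# Rate ratio per draw: treatment rate divided by baseline rate.
rate_ratio = [math.exp(b - a) for a, b in zip(log_baseline, log_treatment)]
low, high = hdi(rate_ratio)
print(f"95% HDI of the rate ratio: {low:.2f} to {high:.2f}")
```

Computing the ratio draw by draw, rather than dividing two summary values, carries the uncertainty in both phases through to the effect size.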
The details of the Gibbs sampler are given in the appendix. We now demonstrate how these concepts can be applied to a real data set obtained from an SCED that implemented a function-based comprehensive behavioral intervention. This intervention was implemented to decrease problem behavior and increase socially appropriate behavior of four children in an elementary school.
ABAB Example
In the study by Lambert et al. (2006), the effect of response cards on disruptive behavior of urban fourth-grade students during math lessons was measured. The baseline phase used single-student responding (SSR), and in the treatment phase each student wrote a response to a question posed by the teacher. Students with frequent disruptive behaviors in the classroom were selected to participate in the study. We chose this study's data because it used an ABAB design with count data, which was appropriate for demonstrating BRR for count data. The number of disruptive behaviors during the SSR and response card (RC) phases for the students was the outcome variable. The data are plotted in Figure 1.

Figure 1. Data plot of Lambert et al. (2006).
Visual Analysis Results and Discussion
Visual analysis is commonly used to determine the existence of a functional relation between the independent and dependent variables and to specifically determine the stability of the behavioral pattern, change in the level of performance, immediacy of effect, direction of the trend line, and consistency in data across similar phases. In addition to visual analyses, the NAP (Parker et al., 2011) effect size was computed to determine the magnitude of effect. NAP is a nonparametric technique to measure overlap for two phases and yields the percentage of improvement data across adjacent phases. It does not account for trend. Although Parker and Vannest (2009) claim that it is appropriate for nearly all data types and distributions, it cannot distinguish between various levels of nonoverlap. For instance, a 100% nonoverlap could be due to outliers or unusually big effects or very small effects.
As shown in Figure 1 and noted by the authors, the mean and median for Group A across blocks of sessions during the first baseline was seven instances of disruptive behaviors. The mean decreased to 0.5 during the first intervention phase (median = 0). Similarly, the mean increased to 7.875 (median = 8) during the second baseline and then decreased to 2 (median = 2) when intervention was reinstituted. The data show an immediate effect: the last three Baseline I data points average 6.33, which decreased to 0.67 for the first three data points in Intervention I. A similar pattern was noted following a reversal and reinstitution of intervention, going from an average of 0.933 in Baseline II to 2.66 for the first three data points for Intervention II. The trend is not clearly discernible from the figure. Finally, the data also show consistency in the pattern across similar phases when the independent variable was manipulated to document replications of effect. Results for the NAP effect size are given in Table 1. The results show that there is no overlap between the phases.
Table 1. NAP Results.
Note. NAP = nonoverlap of all pairs; CI = confidence interval; A1, B1, A2, and B2 refer to Baseline I, Intervention I, Baseline II, and Intervention II, respectively.
Statistical Analyses
JAGS 4.0.0 (Just Another Gibbs Sampler; Plummer, 2003) was used to fit the model to the data. The R package runjags (Denwood, 2016) runs parallel chains and iterates the model estimates until convergence. The package checks convergence using two convergence diagnostics: the multivariate potential scale reduction factor (MPSRF; Brooks & Gelman, 1998) and Heidelberger and Welch's convergence diagnostic (Heidelberger & Welch, 1983). Four chains were run. The corresponding JAGS code is available on GitHub. The estimates are shown in Table 2.
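To convey what a potential scale reduction factor measures, the Python sketch below computes the basic univariate version for a single parameter across several chains. It is a simplified stand-in for the multivariate diagnostic reported by the software, and the function name, the seed, and the simulated chains are hypothetical.

```python
import random
import statistics


def psrf(chains):
    """Univariate potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    between_over_n = statistics.variance(chain_means)  # variance of chain means
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5


rng = random.Random(11)

# Four simulated chains exploring the same distribution (well mixed).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]

# Four simulated chains stuck in different regions (not converged).
stuck = [[rng.gauss(center, 1.0) for _ in range(1000)] for center in (0, 0, 5, 5)]

print(f"well mixed: {psrf(mixed):.3f}; stuck: {psrf(stuck):.3f}")
```

Values close to 1 indicate that the chains agree with one another, whereas values well above 1 indicate that between-chain differences dominate and more iterations are needed.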
Table 2. Parameter Estimates From BRR.
Note. BRR = Bayesian rate ratio effect size.
Level changes
The estimated levels of the outcome variable in both baseline and intervention phases are approximately equal to the means reported in the visual analysis section (6.45 and 0.5, respectively). Readers can compare the exponent of the
Rate ratio effect size
Posterior density plots of the rate ratio effect sizes are given in Figure 2. To recap, a rate ratio below 1 indicates a reduction, and a rate ratio above 1 an increase, in the outcome variable relative to the preceding phase. The outcome variable in Intervention Phase I was therefore 0.06 times what it was in Baseline Phase I. When considering the 95% highest density interval (HDI) of the posterior density, the outcome variable in Intervention Phase I is 0.02 to 0.17 times what it was in Baseline Phase I. This shows that disruptive behaviors in the first intervention phase were only 2% to 17% of what they were in the baseline phase. Similarly, the outcome variable in Baseline Phase II is 2.9 to 23 times what it was in Intervention Phase I, and the outcome variable in Intervention Phase II is 0.144 to 0.38 times what it was in Baseline Phase II. The researcher is now free to choose a ROPE (Kruschke, 2013) where the null hypothesis that there was no effect can be accepted based on what values the researcher deems to be negligibly different from the null. The posterior distribution of the rate ratio in Figure 2 gives both the probable values of the rate ratio and their corresponding probabilities (i.e., probability density). For instance, the effect size between Baseline Phase I and Intervention Phase I is peaked at 0.06 (mode) and its probable values run from approximately 0.02 to 0.2 with 95% of the values lying between 0.02 and 0.17. This is the 95% HDI.

Figure 2. Posterior density plots of the rate ratio of the intercepts.
Suppose we decide that a treatment is effective only if the outcome variable is decreased to not more than 40% of the original frequency of disruptive behaviors. We see that none of the posterior distributions for the phase changes between Baseline I and Intervention I, and Baseline II and Intervention II contain the value of 0.4. Similarly, the posterior for the phase change from Intervention I to Baseline II does not contain the reciprocal of 0.4, which is 2.5. The null can be rejected for all phase changes because the probabilities of the rate ratio being less than 0.4 for the phase changes from baseline to intervention, and greater than 2.5 for the phase change from intervention to baseline, are 100% for all three phase changes as seen in Figure 2. The vertical lines at 0.4 and 2.5 in the figure show the hypothesized value chosen by the researcher and the percentage value in the figure represents the probability mass that falls on the right side of the hypothesized value for the first and the last phase changes. Obviously, this direction is reversed for the change in phases between Intervention I and Baseline II because here the researcher would be looking for an increase in the outcome variable to at least 2.5 times the value in the Intervention I phase.
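The threshold comparison described here amounts to counting the share of posterior draws on the relevant side of the chosen value. The Python sketch below shows that calculation with hypothetical function names and simulated stand-in draws; it does not reproduce the posterior from the study.

```python
import math
import random


def probability_below(draws, threshold):
    """Share of posterior draws that fall below the threshold."""
    return sum(d < threshold for d in draws) / len(draws)


def probability_above(draws, threshold):
    """Share of posterior draws that fall above the threshold."""
    return sum(d > threshold for d in draws) / len(draws)


# Stand-in posterior draws of two rate ratios (invented, not model output).
rng = random.Random(5)
baseline_to_treatment = [math.exp(rng.gauss(math.log(0.1), 0.4)) for _ in range(4000)]
treatment_to_baseline = [math.exp(rng.gauss(math.log(9.0), 0.4)) for _ in range(4000)]

# Effectiveness criterion from the text: a drop to at most 40% of baseline,
# or, for the reversal, a rise to at least 2.5 times the treatment level.
p_drop = probability_below(baseline_to_treatment, 0.4)
p_rise = probability_above(treatment_to_baseline, 2.5)
print(f"P(ratio < 0.4) = {p_drop:.3f}; P(ratio > 2.5) = {p_rise:.3f}")
```

Because the thresholds 0.4 and 2.5 are reciprocals, the same substantive criterion is applied to a reduction and to the reversal of that reduction.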
In sum, we can see that the results of the statistical analysis support the results of visual analysis with respect to the median and mean estimates of the outcome variable in each phase and the presence of immediacy effect. However, what BRR adds over the visual analysis is that it produces a statistically sound effect size with posterior distribution which can be used to make decisions about the statistical significance of the effect and produces estimates of autocorrelation, and other statistics along with their respective posterior distributions. This is clearly an addition to the existing protocol that is generally used for analyzing count data in SCEDs.
Limitations
One key limitation of the BRR that we present, relative to synthesis work, is that it is based on within-subject and not between-subject contrasts. Hence, it is not (yet) appropriate for use in synthesizing effect size estimates derived from group-design studies like randomized controlled trials. Fundamentally, the variance properties in SCEDs and group-design studies tend to be very different, rendering different playing fields (see, for example, Lipsey & Wilson, 2001). We do anticipate that with extensive simulation work a procedure that allows for syntheses across effect sizes is possible, but for now BRR should be seen as a way to gain statistical insights into single studies and facilitate syntheses across multiple SCEDs. There is a learning curve associated with implementing the code that accompanies this article, but the rewards for computing BRR are well worth the effort, as we have demonstrated in the previous sections. The appropriate use of priors also needs to be addressed at this juncture. Improper priors can lead to improper posteriors. Therefore, it is recommended that researchers try various prior specifications for their analyses and test whether the posteriors are sensitive to prior specification. This type of sensitivity analysis can produce more confidence in the posterior estimates. Finally, the fact that this is a ratio presents the researcher with a possible set of two problems: (a) when the denominator value is zero or very close to zero and (b) when the denominator value has a very large posterior standard deviation. Although one could logically truncate the posterior by removing the lowest and the highest, say, 1% of the posterior estimates to compute the rate ratio, this is a crude fix. Thus, this remains an avenue for further research.
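The truncation mentioned above can be sketched as follows. The function name and the example draws are hypothetical, and, as noted, trimming is a crude remedy rather than a recommended procedure.

```python
def trim_draws(draws, fraction=0.01):
    """Drop the lowest and highest `fraction` of the draws."""
    ordered = sorted(draws)
    k = int(round(fraction * len(ordered)))  # number removed from each tail
    if 2 * k >= len(ordered):
        raise ValueError("fraction is too large for the number of draws")
    return ordered[k:len(ordered) - k]


# Hypothetical rate ratio draws with one extreme value, as can happen when
# the denominator draw is close to zero (not real output).
draws = [0.05, 0.06, 0.07, 0.06, 0.05, 0.08, 0.06, 0.07, 0.05, 250.0] * 10
trimmed = trim_draws(draws, fraction=0.10)
print(f"max before: {max(draws)}, max after: {max(trimmed)}")
```

Trimming removes the extreme ratios but also discards legitimate posterior mass, which is why it does not resolve the underlying problem.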
Conclusion
The main purpose of this study was to present the BRR effect size for supplementary use in the analysis of SCED data. The model we presented estimates autocorrelation along with BRR. The estimates of the intercepts from the visual and statistical analyses were similar. The autocorrelation estimates could not be computed using visual analysis, but their rather high values from the statistical analysis show that this statistic cannot be ignored. The BRR analysis also produced plots showing the ROPE, which are a useful addition to the visual plots in SCED analysis. Researchers can visually detect the magnitude of the effect based on the posterior distributions of the rate ratio. However, we cannot altogether abandon visual analysis because of its simplicity and ease of use. As mentioned before, Bayesian estimation has a steep learning curve associated with it, and many articles, workshops, and courses need to be made available to help applied researchers use this method. Shiny apps and easy-to-use software tools would also help improve the user-friendliness of the methodology.
BRR can (a) deal with count data, (b) handle small samples, (c) produce interval estimates of uncertainty, and (d) complement visual analyses via ROPE. In principle, BRR could also be used in SCED research syntheses (a point to be demonstrated in future work). Hence, BRR is a new and useful analytic tool for SCED analyses. We have shared the programs used to produce rate ratio effect size estimates and the ROPE comparisons.
The fact that the rate ratio estimation is made possible even for such short time-series is compelling evidence of the flexibility of Bayesian modeling. To date, the model we presented is the only inferential statistical procedure that estimates intercepts and effect sizes, accounts for autocorrelations and small sample sizes, and works with count data. This form of estimation can thus yield better understanding of SCED data and could, in principle, support SCED research syntheses. However, recall the limitation that the effect size used here is still based on a within-subjects design. Further work is needed to understand if and how Bayesian procedures might quantitatively synthesize within-subject and between-subject effects.
We think that researchers who use SCEDs when combining research and practice (praxis) or researchers who engage in synthetic SCED work will be interested in use of the BRR as either an alternate or complementary analytic approach. Furthermore, given the amount of SCED synthesis work conducted in the field, exploring the use of new effect sizes that account for several difficulties with OLS methods and can be used to complement visual analyses should provide a basis for the long-term viability of the BRR. The BRR will be of further utility within special education and psychological research (and to scientist-practitioners who conduct SCEDs) if user-friendly internet freeware can be developed and tested, which is a step that will be pursued in future work.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
