Abstract
In the single-case design (SCD) literature, five sets of standards have been formulated and distinguished: design standards, assessment standards, analysis standards, reporting standards, and research synthesis standards. This article reviews computing tools that can assist researchers and practitioners in meeting the analysis standards recommended by the What Works Clearinghouse: Procedures and Standards Handbook—the WWC standards. These tools consist of specialized web-based calculators or downloadable software for SCD data, and algorithms or programs written in Excel, SAS procedures, SPSS commands/Macros, or the R programming language. We aligned these tools with the WWC standards and evaluated them for accuracy and treatment of missing data, using two published data sets. All tools were tested to be accurate. When missing data were present, most tools either gave an error message or conducted analysis based on the available data. Only one program used a single imputation method. This article concludes with suggestions for an inclusive computing tool or environment, additional research on the treatment of missing data, and reasonable and flexible interpretations of the WWC standards.
Single-case designs (SCDs) have been used in a variety of research and clinical studies including education, psychology, psychiatry, neuroscience, and social work. Behavior Modification is one of the most important journals that draw researchers’ attention to breakthrough SCD methodologies. Horner et al. (2005) described single-case experimental research as “a rigorous, scientific methodology used to define basic principles of behavior and establish evidence-based practices” (p. 165). In SCDs, the outcome behavior is broadly defined; it can be a change in self-harm, whistling, sitting, showing empathy, mood, or emotion. These outcomes can be expressed as duration (e.g., how long the self-harm endures), frequency (e.g., how often whistling/sitting in a chair occurs in a 2-min interval), or intensity (e.g., the degree of mood swings). Because SCDs assess the outcome behavior repeatedly, the instrument adopted to measure the behavior should be reliable and insensitive to the environment (e.g., time, place, or participants’ mood). Observers of such an outcome behavior should be well trained. According to Horner et al. (2005), Kennedy (2005), Kratochwill and Levin (1992), and O’Neill, McDonnell, Billingsley, and Jenson (2011), the unique characteristics of SCDs, namely, repeated observation of the outcome behavior for each participant, reliable measurements, a combination of baseline and intervention phases, and one or more dependent variables, enable researchers and practitioners to investigate the effectiveness of behavioral interventions in situations where randomized controlled trials are not feasible. Moreover, SCDs enhance researchers’ and practitioners’ understanding of how, when, and for whom the behavioral interventions are most effective.
In the SCD literature, five sets of standards have been formulated and distinguished. The first set is concerned with the design soundness of SCDs (Horner et al., 2005; Kratochwill et al., 2013; Manolov, Gast, Perdices, & Evans, 2014; Smith, 2012; Tate et al., 2008; Tate, Perdices, McDonald, Togher, & Rosenkoetter, 2014; Tate et al., 2013). The second set of standards is concerned with measurement and assessment of the outcome behavior (Kratochwill & Levin, 2014; Smith, 2012). The third set of standards is concerned with data analysis to determine a functional (causal) relationship between the intervention and the outcome behavior (Beeson & Robey, 2006; Horner et al., 2005; Institute of Education Sciences [IES], 2013; Kratochwill et al., 2013; Lane & Gast, 2014; Maggin et al., 2011; Tate et al., 2014). The fourth set of standards is concerned with the reporting of intervention effects found in SCD studies (Beeson & Robey, 2006; Brossart, Vannest, Davis, & Patience, 2014; Heyvaert, Wendt, Van den Noortgate, & Onghena, 2014; Manolov et al., 2014; Wolery, Dunlap, & Ledford, 2011). And the fifth set of standards is concerned with meta-analysis of SCD findings (Heyvaert et al., 2014; Kratochwill & Levin, 2014; Smith, 2012; Tate et al., 2008; Tate et al., 2014; Tate et al., 2013; Wendt & Miller, 2012). If an SCD study does not meet the first set of design standards, the analysis and the reporting of the findings are not as meaningful or interpretable as those derived from studies that meet design standards (Kratochwill et al., 2013). The design standards advocated by Kratochwill et al. (2013) and Smith (2012) include features such as systematic manipulation of an independent variable. Researchers using SCD to investigate a functional relationship between an intervention and the desired outcome should try their best to meet these standards, or minimally, meet the standards with some reservations.
In this article, we focus on the third set of standards, the analysis standards, for determining if a functional (causal) relationship exists between the intervention and the outcome behavior. Specifically, we reviewed computing tools that can assist researchers in determining if such a functional relationship exists, assuming that design standards have been met completely or partially. These computing tools include web-based calculators, free-download packages, and macros/computing codes in statistical software. According to the Institute of Education Sciences’ publication, What Works Clearinghouse: Procedures and Standards Handbook (IES, 2013, hereafter abbreviated as the WWC Handbook), a functional relationship between an intervention and the outcome behavior should be demonstrated in six data features: level/level change, trend, variability, immediacy of the effects, overlap, and consistency of data in similar phases; these are referred to as the WWC standards in this article. Either visual analysis or statistical analysis, or both, can be used to demonstrate these features, and each analysis can be facilitated by computing software or an algorithm. Results from a computing tool can be verified easily for accuracy, and the tool’s treatment of missing data can be examined. Results can also be compiled or compared, as in a meta-analysis. For these reasons, this article aims to review computing tools that can help researchers and practitioners determine the effectiveness of an intervention in SCD studies.
Recently, Neuropsychological Rehabilitation: An International Journal devoted the entire Issue 3-4 of Volume 24 (2014) to conducting SCD experiments and analyzing their data. In that issue, Manolov et al. (2014) published a list of software that executed statistical techniques used in other articles published in the same issue. The present article improves on Manolov et al.’s list by aligning the computing tools with the WWC standards for determining a functional (causal) relationship between an intervention and an outcome behavior in SCDs. Specifically, this article aims to (a) publicize computing tools that can assist researchers in determining intervention effects in SCD studies based on the WWC standards and (b) evaluate computing tools in terms of accuracy and treatment of missing data. We present the WWC standards first, followed by a summary of the computing tools and our evaluation. This article concludes with suggestions for an inclusive computing environment or tool, additional research on the treatment of missing data, and flexible interpretations of the WWC standards.
The WWC Standards
The WWC standards were formulated by the Institute of Education Sciences to help researchers and practitioners determine whether (a) the observed pattern of data in the intervention phase is due to the intervention effects, and (b) the observed pattern of data in the intervention phase is different from the pattern of data predicted from data in the baseline phase (IES, 2013). The WWC standards recommend the examination of six data features collectively to support claims for an intervention effect. These six data features are defined below.
Level/Level Change
Level is defined as “the mean score for data within a phase” (IES, 2013, p. E.6). The determination of a level change can be facilitated by inserting a reference line into the baseline and the intervention data at an average (e.g., a mean or median value). Alternatively, a level change can be documented by standardized mean differences (Busk & Serlin, 1992; Cohen, 1988; Hedges, Pustejovsky, & Shadish, 2012), which use the mean difference between the intervention and the baseline phases, divided by the SD of the baseline phase (hereafter referred to as standardized mean difference) or by the pooled SD (hereafter referred to as pooled standardized mean difference) for each participant. To account for the autocorrelation, Hedges et al. (2012) proposed a different standardized mean difference index, hereafter referred to as the HPS d statistic. Computation details for HPS d are presented in the appendix.
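The two simple standardized mean differences described above can be sketched in a few lines of Python. This is a minimal illustration, not one of the reviewed tools: the function name and the scores are hypothetical, sample (n - 1) SDs are assumed, and the sketch makes no correction for autocorrelation, so it does not compute HPS d.

```python
from statistics import mean, variance


def standardized_mean_difference(baseline, intervention, pooled=False):
    """Return (intervention mean - baseline mean) divided by an SD.

    pooled=False divides by the baseline SD; pooled=True divides by the
    pooled SD of the two phases. Sample (n - 1) variances are assumed.
    """
    diff = mean(intervention) - mean(baseline)
    if not pooled:
        return diff / variance(baseline) ** 0.5
    n_a, n_b = len(baseline), len(intervention)
    pooled_var = ((n_a - 1) * variance(baseline)
                  + (n_b - 1) * variance(intervention)) / (n_a + n_b - 2)
    return diff / pooled_var ** 0.5


# Hypothetical scores for one participant (not from a published data set).
baseline = [8, 7, 9, 8, 8]
intervention = [3, 4, 2, 3, 5]

d_baseline_sd = standardized_mean_difference(baseline, intervention)
d_pooled_sd = standardized_mean_difference(baseline, intervention, pooled=True)
print(round(d_baseline_sd, 2), round(d_pooled_sd, 2))
```

Because the two phases differ in variability in this example, the two indices differ in size, which is one reason to report which SD was used.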
According to Hedges et al. (2012); Hedges, Pustejovsky, and Shadish (2013); and Shadish, Hedges, and Pustejovsky (2014), HPS d and its underlying linear model have limitations. For example, applications of HPS d require a minimum of three participants and five assumptions. The five assumptions are as follows: (a) the intervention effect is constant across all participants; (b) there is no trend over time in either the baseline or the intervention phase; (c) the sampling errors of scores within participants are normally distributed; (d) the sampling errors of scores within participants have a first-order autocorrelation structure; and (e) the variance of scores between participants is a constant. Recently, Pustejovsky, Hedges, and Shadish (2014) extended the logic of HPS d to more complex models, such as hierarchical linear models, in which the intervention effect can vary across participants, the trends are allowed in both the baseline and the intervention phases, and the intervention-by-trend interaction is allowed to vary across participants.
Two newer indices proposed to assess a level change can account for trends. These are the mean phase difference (MPD; Manolov & Solanas, 2013) and the slope and level change (SLC) for the detrended data (Solanas, Manolov, & Onghena, 2010). The MPD is the mean difference between the intervention data and predicted intervention data, projected from the baseline trend. The predicted intervention score,
where nBaseline is the number of scores in baseline, y1 is the first baseline score, and
The SLC proposed by Solanas et al. (2010) is estimated in three steps, outlined in Equations 2 to 4. The first step uses the
where
where
where i = 1, 2, . . . , nIntervention. To interpret SLC properly, Solanas et al. (2010) recommended the visual examination of detrended data as well as the original data, in light of the SLC value and a practitioner’s substantive criteria, to determine the effectiveness of an intervention.
Correlation coefficients can also be used to assess a level change by correlating an outcome variable with a dichotomous dummy variable that takes on two distinct values to denote the baseline or the intervention phase separately (such as 0 for baseline and 1 for intervention). The correlation coefficient can either be nonparametric (e.g., Spearman’s rs) or parametric (e.g., Pearson’s r). If Pearson’s r is used, it is statistically or functionally equivalent to conducting an independent t test that compares the baseline mean with the intervention mean.
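The equivalence between Pearson’s r with a phase dummy variable and the independent t test can be checked numerically. The sketch below uses hypothetical scores and helper names of our own choosing; it relies on the standard identity t = r·sqrt(df / (1 − r²)) with df = N − 2, which holds for the pooled-variance t test.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation computed from deviation sums."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pooled_t(group0, group1):
    """Independent-samples t statistic with a pooled variance."""
    n0, n1 = len(group0), len(group1)
    pooled_var = ((n0 - 1) * variance(group0)
                  + (n1 - 1) * variance(group1)) / (n0 + n1 - 2)
    return (mean(group1) - mean(group0)) / sqrt(pooled_var * (1 / n0 + 1 / n1))


# Hypothetical scores; 0 codes the baseline phase, 1 the intervention phase.
baseline = [8, 7, 9, 8, 6]
intervention = [3, 4, 2, 5, 3, 4]
phase = [0] * len(baseline) + [1] * len(intervention)
scores = baseline + intervention

r = pearson_r(phase, scores)
df = len(scores) - 2
t_from_r = r * sqrt(df / (1 - r ** 2))
t_direct = pooled_t(baseline, intervention)
print(round(r, 3), round(t_from_r, 3), round(t_direct, 3))
```

The two t values agree up to rounding error, so the correlation and the t test carry the same information about the level change. Neither accounts for autocorrelation in the scores.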
If some form of randomization (e.g., phase-order randomization, intervention-order randomization, intervention start-point randomization, and case randomization) is incorporated into an SCD study, the level change between the baseline and the intervention phases can be assessed by permutation tests, also called randomization tests (Ferron & Levin, 2014). The randomization test computes the probability of obtaining the observed mean difference, under the null hypothesis of no treatment/intervention effect. If the probability computed is less than or equal to an alpha level prespecified by the researcher(s), the level change from the baseline phase to the intervention phase is decided to be statistically significant at that alpha level. A randomization test can be performed as a one-tailed test as well as a two-tailed test. The internal validity and scientific credibility of SCD findings, based on the randomization tests, are enhanced by the implementation of randomization (Kratochwill & Levin, 2014). One form of the intervention start-point randomization suitable for multiple-baseline designs is the regulated randomization (Koehler & Levin, 1998). In a regulated randomization multiple-baseline design, participants are randomly assigned to a randomly chosen start point in an interval. These intervals are staggered and structured in such a way that no two participants can be assigned to the same interval.
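For a single AB series with a randomly chosen intervention start point, the logic of the randomization test can be sketched as follows. This is an illustration under stated assumptions, not the algorithm of any tool reviewed here: the function name, the minimum phase length of three scores, and the data are hypothetical, and the test statistic is the intervention mean minus the baseline mean.

```python
from statistics import mean


def start_point_randomization_test(scores, actual_start, min_phase_len=3,
                                   tail="lower"):
    """Randomization test for an AB series with a random start point.

    actual_start is the 0-based index of the first intervention score.
    Every start point that leaves at least min_phase_len scores in each
    phase is treated as equally likely under the null hypothesis.
    """
    n = len(scores)
    candidates = range(min_phase_len, n - min_phase_len + 1)

    def mean_difference(start):
        return mean(scores[start:]) - mean(scores[:start])

    observed = mean_difference(actual_start)
    distribution = [mean_difference(s) for s in candidates]
    if tail == "lower":      # a decrease is the expected effect
        extreme = sum(d <= observed for d in distribution)
    elif tail == "upper":    # an increase is the expected effect
        extreme = sum(d >= observed for d in distribution)
    else:                    # two-tailed
        extreme = sum(abs(d) >= abs(observed) for d in distribution)
    return observed, extreme / len(distribution)


# Hypothetical series: 6 baseline scores followed by 8 intervention scores.
scores = [8, 7, 9, 8, 8, 7, 3, 2, 3, 4, 2, 3, 3, 2]
observed, p_value = start_point_randomization_test(scores, actual_start=6)
print(round(observed, 3), round(p_value, 3))
```

With 14 scores and a minimum of three scores per phase, only nine start points are admissible, so the smallest attainable p value is 1/9. This illustrates why a single short series with start-point randomization may lack the resolution to reach conventional alpha levels.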
Even though the WWC standards do not explicitly specify other measures of central tendency, such as the median or trimmed mean, as evidence for level/level change, other scholars have begun to include them as evidence (Bulté & Onghena, 2013; Kratochwill, Levin, Horner, & Swoboda, 2014). Moreover, a level change may be defined in terms of raw mean difference, as long as the raw measurement (e.g., weight gain in pounds, number of bed-wetting episodes) is meaningful.
Trend
Trend refers to “the slope of the best-fitting straight line for the data within a phase” (IES, 2013, p. E.6). The best-fitting straight line can be a parametric (e.g., the least squares) or a nonparametric (e.g., the Theil–Sen) regression line. The least squares regression line minimizes the sum of squared differences between the observed scores and the scores predicted by the line fitted to each phase. The Theil–Sen regression line is a nonparametric robust regression line for skewed and heteroscedastic data, insensitive to outliers. The Theil–Sen regression line is constructed from the Theil–Sen slope (βTS), which is the median of the slopes determined by all pairs of scores:
\[ \beta_{TS} = \mathrm{median}\left\{ \frac{y_j - y_i}{t_j - t_i} : i < j \right\}, \]
where y_i and y_j are the scores observed at time points t_i and t_j within a phase.
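The Theil–Sen slope, defined as the median of all pairwise slopes within a phase, can be sketched directly from that definition. The function name and data below are hypothetical; pairs that share a time point are skipped because their slope is undefined.

```python
from itertools import combinations
from statistics import median


def theil_sen_slope(times, scores):
    """Median of the slopes determined by all pairs of scores."""
    slopes = [(y2 - y1) / (t2 - t1)
              for (t1, y1), (t2, y2) in combinations(zip(times, scores), 2)
              if t2 != t1]
    return median(slopes)


# Hypothetical phase with one extreme score at the last time point.
times = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 100]
slope = theil_sen_slope(times, scores)
print(slope)
```

In this example, six of the ten pairwise slopes equal 2, so the median slope is unaffected by the extreme final score, whereas a least squares slope would be pulled upward.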
In visual analysis, trends can be demonstrated by the split-middle method (Kazdin, 1982; White, 1972), the resistant trend line fitting (Franklin, Gorman, Beasley, & Allison, 1997), or the running medians method (Morley & Adams, 1991). A split-middle line is constructed for each phase. First, scores in each phase are split into halves by time points, such as the first five scores and the second five scores. For each half, the point corresponding to the median of this half of scores and the middle time point, such as 3 for the first half and 8 for the second half, is identified. Finally, these two median points are connected across each phase. Thus, a trend is readily apparent if the split-middle lines constructed from the baseline and intervention phases show slope and level differences. Because the split-middle line is constructed from medians, it is less influenced by outliers than a trend that is constructed from all scores, such as the least squares regression line (Bulté & Onghena, 2012). The split-middle line demonstrates trends easily for shorter time periods such as between 4 and 14 time points. In contrast, the resistant trend line demonstrates trends better for longer time periods (Bulté & Onghena, 2012). The resistant trend line is constructed similarly to the split-middle line, except that scores in each phase are divided into three equal intervals, instead of two. The slope of the resistant trend line is determined by the two outer intervals, that is, the difference in medians of the first and of the third intervals, divided by the difference in the median time points of the two intervals (Kratochwill et al., 2014). The intercept of the resistant trend line is determined from all scores in three intervals. The running medians method demonstrates the trend by connecting the medians of successive, nonoverlapping intervals of a certain number of scores. The interval width is called the “batch size,” which can be 4, 5, or any number that makes sense in terms of the number of scores and the resulting smoothed curve (Bulté & Onghena, 2012; Kratochwill et al., 2014; Tukey, 1977). Even though the WWC standards do not explicitly refer to nonlinear trends as evidence for trends, other scholars have begun to include them as evidence (Bulté & Onghena, 2013; Kratochwill et al., 2014).
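The split-middle construction can be sketched as follows for a single phase. The function name and scores are hypothetical, time points are assumed to be 1, 2, . . . , n, and, as one possible convention, the middle score is left out of both halves when the phase has an odd number of scores.

```python
from statistics import median


def split_middle_line(scores):
    """Return (slope, intercept) of the split-middle line for one phase.

    Assumes equally spaced time points 1..n. When n is odd, the middle
    score is excluded from both halves (an assumption of this sketch).
    """
    n = len(scores)
    if n < 2:
        raise ValueError("at least two scores are needed")
    half = n // 2
    times = list(range(1, n + 1))
    # Median point of the first half and of the second half.
    x1, y1 = median(times[:half]), median(scores[:half])
    x2, y2 = median(times[n - half:]), median(scores[n - half:])
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


# Hypothetical phase of 10 scores: halves are time points 1-5 and 6-10,
# so the median points sit at time 3 and time 8.
phase_scores = [2, 3, 2, 4, 3, 5, 6, 5, 7, 6]
slope, intercept = split_middle_line(phase_scores)
print(round(slope, 2), round(intercept, 2))
```

Applying the same function to the baseline and the intervention phases separately gives the two lines whose slopes and levels are compared visually.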
Variability
Variability is defined as “the range or standard deviation of data about the best-fitting straight line” (IES, 2013, p. E.6). Specifically, variability can be assessed by the standard error of least squares regression coefficients. Other indices, such as the interquartile range, range, SD, and variance, can also be used to determine the variability in data from the same phase.
To visually assess variability, researchers can use one of three methods: superimposing range lines or trended ranges on raw data, or drawing range bar graphs. Range lines are horizontal lines, parallel to the x-axis, that are drawn at the lowest and the highest scores for each phase. Trended range is created from connecting the maximums, or the minimums, from two halves of each phase, at the middle point of each half phase. A range bar graph is a vertical bar displaying the maximum, the minimum, and a central tendency index (e.g., mean or median) drawn at x-axis points labeled either as baseline or as intervention phase. It is similar to a box plot, with whiskers, but without the 25th and the 75th percentiles.
Immediacy of the Effect
According to the WWC Handbook, “Immediacy of the effect refers to the change in level between the last three data points in one phase and the first three data points of the next. The more rapid (or immediate) the effect, the more convincing the inference that change in the outcome measure was due to manipulation of the independent variable” (IES, 2013, p. E.6).
This data feature can be assessed by visualization or by comparing the mean of the last three data points in one phase with the mean of the first three data points of the next phase. Even though the WWC standards explicitly point to using three data points before and after a phase change as a measure of immediacy, we recommend that researchers take contextual factors into consideration, rather than adhering strictly to three data points in determining immediacy.
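The numerical comparison described above can be sketched in a few lines. The function name and data are hypothetical; the number of data points k defaults to three, as in the WWC Handbook, but is left adjustable so that contextual factors can be taken into account.

```python
from statistics import mean


def immediacy(previous_phase, next_phase, k=3):
    """Mean of the first k scores of the next phase minus the mean of
    the last k scores of the previous phase."""
    return mean(next_phase[:k]) - mean(previous_phase[-k:])


# Hypothetical baseline and intervention scores.
baseline = [8, 7, 9, 8, 8, 7]
intervention = [3, 2, 3, 4]
change = immediacy(baseline, intervention)
print(round(change, 2))
```

Note that if a phase has fewer than k scores, the slices silently use all available scores, so phase lengths should be checked before interpreting the result.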
Overlap
According to the WWC Handbook, “Overlap refers to the proportion of data from one phase that overlaps with data from the previous phase. The smaller the proportion of overlapping data points (or conversely, the larger the separation), the more compelling the demonstration of an effect” (IES, 2013, p. E.6).
Overlap can be assessed by such indices as the percentage of nonoverlapping data (PND; Scruggs, Mastropieri, & Casto, 1987), the percentage of nonoverlapping corrected data (PNCD; Manolov & Solanas, 2009), the percentage of data exceeding the median (PEM; Ma, 2006), the percentage of all nonoverlapping data (PAND; Parker, Hagan-Burke, & Vannest, 2007), nonoverlap of all pairs (NAP; Parker & Vannest, 2009), the improvement rate difference (IRD; Parker, Vannest, & Brown, 2009), and Tau-U (Parker, Vannest, Davis, & Sauber, 2011). The higher the overlap index value, the less overlap is implied between the baseline and the intervention data. Therefore, these overlap indices are technically measures of nonoverlap between these two phases. Guidelines are needed for interpreting each index based on its definition and statistical properties.
PND is defined as the percentage of intervention scores that exceed the highest baseline score, or fall below the lowest baseline score, depending on the nature of the intervention effect (Scruggs et al., 1987). Despite its simplicity, PND suffers from several limitations as an index of overlap: (a) it ignores all baseline data except for the highest, or the lowest, score; this score can be unreliable and subject to much sampling error (Parker et al., 2007); (b) it is undefined when there are no baseline data; (c) it lacks sensitivity to true overlap, or lack thereof, when PND reaches its maximum (i.e., 1.00) or minimum (0.00; Parker et al., 2007); and (d) its confidence interval (or precision in estimation) cannot be constructed (Parker et al., 2007).
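The definition of PND translates directly into code. In the sketch below, the function name and data are hypothetical, and PND is returned as a proportion between 0 and 1; the direction of the expected effect must be supplied by the researcher.

```python
def pnd(baseline, intervention, increase_expected=True):
    """Proportion of intervention scores beyond the most extreme
    baseline score in the expected direction of the effect."""
    if not baseline or not intervention:
        raise ValueError("PND is undefined without data in both phases")
    if increase_expected:
        beyond = sum(y > max(baseline) for y in intervention)
    else:
        beyond = sum(y < min(baseline) for y in intervention)
    return beyond / len(intervention)


# Hypothetical counts of a behavior that the intervention should reduce.
baseline = [8, 7, 9, 8, 6]
intervention = [3, 4, 2, 6, 5, 7]
pnd_value = pnd(baseline, intervention, increase_expected=False)
print(round(pnd_value, 2))
```

Only the lowest baseline score (6) enters the calculation here, which makes limitation (a) concrete: a single atypical baseline score can change PND substantially.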
PNCD was proposed by Manolov and Solanas (2009) to modify PND in the presence of a baseline trend. PNCD is defined as the percentage of detrended intervention scores that exceed the highest detrended baseline score, or that fall below the lowest detrended baseline score, depending on the nature of the intervention effect. Detrended baseline scores are computed according to Equation 2, and detrended intervention scores are computed according to Equation 3 (Manolov & Solanas, 2009). It is worth noting that, when there is neither trend nor autocorrelation in the SCD data, PNCD performs worse than PND (Manolov & Solanas, 2009). Similar to PND, PNCD is undefined when there are no baseline data, and the confidence interval of PNCD cannot be constructed. At present, there has been no published study investigating the sensitivity of PNCD to true overlap, or lack thereof, when it reaches its maximum (i.e., 1.00) or minimum (0.00).
PEM is defined as the percentage of scores in the intervention phase that exceed the median of the baseline scores (Ma, 2006). PEM has an advantage over PND in measuring lack of overlap when it reaches its minimum. Yet PEM is less sensitive than PND to true overlap when it reaches its maximum (Parker & Vannest, 2009). Confidence intervals for PEM can be constructed using the SD from a binomial distribution (Parker & Vannest, 2009).
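A minimal sketch of PEM follows. The function name and data are hypothetical, PEM is returned as a proportion, and intervention scores tied with the baseline median are counted as not exceeding it, which is an assumption of this sketch rather than a rule taken from Ma (2006).

```python
from statistics import median


def pem(baseline, intervention, increase_expected=True):
    """Proportion of intervention scores beyond the baseline median in
    the expected direction; ties with the median are not counted."""
    if not baseline or not intervention:
        raise ValueError("PEM is undefined without data in both phases")
    cutoff = median(baseline)
    if increase_expected:
        beyond = sum(y > cutoff for y in intervention)
    else:
        beyond = sum(y < cutoff for y in intervention)
    return beyond / len(intervention)


# Hypothetical counts of a behavior that the intervention should reduce.
baseline = [8, 7, 9, 8, 6]
intervention = [3, 9, 2, 8, 5, 7]
pem_value = pem(baseline, intervention, increase_expected=False)
print(round(pem_value, 2))
```

For these data the baseline median is 8, and four of the six intervention scores fall below it. PND for the same data would use the lowest baseline score (6) as the cutoff and would therefore be smaller.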
PAND is defined as the “percent of all data remaining after removing the minimum number of data points which would eliminate all data overlap between phases A and B” (Parker & Vannest, 2009, p. 357). Its calculation requires a minimum of 20 data points in an SCD study. PAND is an improvement over PND because (a) PAND uses all data in an SCD study and (b) autocorrelation among scores has little impact on PAND (Parker et al., 2007). Similar to PEM, PAND can measure lack of overlap when it reaches its minimum, but cannot measure true overlap when it reaches its maximum (Parker et al., 2007; Parker & Vannest, 2009). Similar to PEM, the confidence intervals for PAND can be constructed using the SD from a binomial distribution (Parker & Vannest, 2009).
NAP measures the extent to which data collected from one phase (e.g., baseline) dominate the data collected from another phase (e.g., intervention; Parker & Vannest, 2009). NAP is conceptually equivalent to common language (CL), a term coined by McGraw and Wong (1992), as a measure of effect size for continuous data (Peng & Chen, 2014). Vargha and Delaney (2000) extended CL to ordinal data and referred to it as stochastic superiority. NAP is also conceptually equivalent to the area under the Receiver Operating Characteristic (ROC) curve, a plot of true positive intervention rate versus false positive intervention rate at various thresholds (Kraemer & Kupfer, 2006). Parker and Vannest (2009) suggested three methods for estimating confidence intervals for NAP: (a) asymptotic confidence interval for the area under the ROC curve in NCSS (Hintze, 2006), SPSS, Stata (Newson, 2000), StatXact (Mehta & Patel, 2001), or SAS; (b) robust methods proposed by Wilcox (1996) in Minitab (Moore, McCabe, & Evans, 2005) or S-Plus (Snow, Chihara, & Hesterberg, 2005); or (c) methods proposed by Vargha and Delaney (2000) that have not been implemented in statistical software. Even though NAP uses all scores in an SCD study, its calculation and interpretation assume that scores from two phases are independent, which is often not the case in SCDs.
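NAP can be sketched by comparing every baseline score with every intervention score. The function name and data below are hypothetical; following the usual convention for the area under the ROC curve, tied pairs are counted as half, which we state here as an assumption of the sketch.

```python
def nap(baseline, intervention, increase_expected=True):
    """Nonoverlap of all pairs: the share of baseline-intervention pairs
    in which the intervention score shows improvement, with tied pairs
    counted as half."""
    if not baseline or not intervention:
        raise ValueError("NAP is undefined without data in both phases")
    pairs = [(a, b) for a in baseline for b in intervention]
    if increase_expected:
        improved = sum(b > a for a, b in pairs)
    else:
        improved = sum(b < a for a, b in pairs)
    ties = sum(a == b for a, b in pairs)
    return (improved + 0.5 * ties) / len(pairs)


# Hypothetical counts of a behavior that the intervention should reduce.
baseline = [8, 7, 9, 8, 6]
intervention = [3, 4, 2, 6, 5, 7]
nap_value = nap(baseline, intervention, increase_expected=False)
print(round(nap_value, 3))
```

Of the 30 pairs in this example, 27 show improvement and 2 are ties. A value of 0.5 would indicate chance-level separation between the two phases.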
IRD is defined as the difference in the improvement rate (IR) between the baseline and the intervention phases (Agresti, 2002; Parker et al., 2009). The IR of the intervention phase is the number of scores in the intervention phase exceeding all baseline scores, divided by the total number of intervention scores. The IR of the baseline phase is the number of scores in the baseline phase at or exceeding any score in the intervention phase, divided by the total number of baseline scores. IRD is alternatively called the risk reduction, or the risk difference. The confidence interval for IRD can be obtained from “proportion statistics” (Parker et al., 2009, p. 139) or “risk analysis” (Parker et al., 2009, p. 139) in most statistical packages. Similar to PAND and PEM, IRD is inappropriate to document the overlap effect when it reaches its maximum (Parker et al., 2009).
Tau-U is derived from Kendall’s Tau and Mann–Whitney U (Parker et al., 2011). Kendall’s Tau measures “tendency for scores to improve over time” (Parker et al., 2011, p. 288) by applying Kendall’s rank correlation to compare scores within a phase in a time forward direction. The Mann–Whitney U measures overlap between two groups. Parker et al. (2011) showed the functional equivalence of Kendall’s Tau and Mann–Whitney U and proposed a way to consider nonoverlap and trend simultaneously when assessing an intervention effect. Trend in SCDs may appear either in baseline, intervention, or both phases. Assuming a higher (lower) score is a desired outcome, a positive (negative) trend in the baseline phase may imply a naturally occurring improvement without the intervention. Likewise, a positive (negative) trend in intervention may imply an improvement continuing from the baseline phase, not necessarily caused by the intervention. Thus, researchers need to control for, or remove, the effects due to a naturally occurring trend in the baseline phase when assessing an intervention effect. Likewise, such an assessment should incorporate a continuing trend from the baseline to the intervention phase. Tau-U was proposed to control for both of these data tendencies.
Tau-U’s sampling distribution is Kendall’s S distribution. It approaches a normal distribution when the total number of scores is equal to or greater than 10. When the total number of scores is smaller than 10, the significance test of Tau-U is based on a permutation of scores, namely, a permutation test. Four different Tau-Us can be derived from a data matrix to assess the nonoverlap between baseline and intervention phases: (a) without considering trends in either phase, (b) with the baseline trend removed, (c) with the intervention trend incorporated, and (d) with the intervention trend incorporated and the baseline trend removed. Researchers who wish to evaluate nonoverlap under one of these four Tau-U conditions can simply partition the data matrix, because S is additive.
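The additivity of S, and the first two Tau-U variants, can be illustrated with a short sketch. The function names and data are hypothetical. Dividing by the number of baseline-intervention pairs in both variants is an assumption of this sketch; published implementations differ in the denominator used when trend terms are included.

```python
def sign(x):
    """Return 1, 0, or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def s_between(baseline, intervention):
    """Kendall's S over all baseline-intervention pairs."""
    return sum(sign(b - a) for a in baseline for b in intervention)


def s_within(phase):
    """Kendall's S over all time-ordered pairs within one series."""
    n = len(phase)
    return sum(sign(phase[j] - phase[i])
               for i in range(n) for j in range(i + 1, n))


def tau_u(baseline, intervention, remove_baseline_trend=False):
    """Tau for A versus B, optionally with the baseline trend removed.
    The denominator n_A * n_B is an assumption of this sketch."""
    s = s_between(baseline, intervention)
    if remove_baseline_trend:
        s -= s_within(baseline)
    return s / (len(baseline) * len(intervention))


# Hypothetical scores where higher is better and the baseline is rising.
baseline = [2, 3, 3, 4]
intervention = [5, 6, 4, 7]

tau_ab = tau_u(baseline, intervention)
tau_ab_minus_trend = tau_u(baseline, intervention, remove_baseline_trend=True)

# S for the whole series equals the sum of its three partitions.
s_total = s_within(baseline + intervention)
s_parts = (s_within(baseline) + s_between(baseline, intervention)
           + s_within(intervention))
print(tau_ab, tau_ab_minus_trend, s_total == s_parts)
```

Removing the rising baseline trend lowers the index in this example, which reflects the concern that part of the apparent improvement was already under way before the intervention began.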
In summary, PND lacks sensitivity to true overlap, or lack thereof, when it reaches its minimum or maximum, respectively. PEM, PAND, and IRD lack sensitivity to true overlap, or lack thereof, only when they reach their maximum. These three indices still show appropriate sensitivity when they are at their minimum, signaling a total overlap between two phases. Confidence interval estimation is available for all overlap indices, except for PND and PNCD. PAND and NAP are the only two overlap indices that can be converted to Cohen’s d, although assumptions of normality and equal variance held by Cohen’s d are questionable for SCD data. PNCD and Tau-U are the only overlap indices discussed in this article that can account for trends in either baseline or intervention phase.
Consistency of Data in Similar Phases
According to the WWC Handbook, “Consistency of data in similar phases involves looking at data from all phases within the same condition . . . and examining the extent to which there is consistency in the data patterns from phases with the same conditions. The greater the consistency, the more likely the data represent a causal relation” (IES, 2013, p. E.6).
The consistency of data in similar phases can be examined by visualization or by comparing trends across similar phases. For example, if statistically significant trends are found in all intervention phases, but not in baseline phases, a researcher can claim that data show consistency in similar phases.
Computing Tools for Implementing the WWC Standards
To search for specialized computing tools suitable for examining SCD data, we used a two-step strategy. First, we used a long search string, (“single subject” OR “single case” OR “n of 1”) AND (“analysis” OR “software” OR “package” OR “statistical”), to search for articles with these key words in public databases: PsycINFO, Web of Science, Medline, and ERIC. Second, we manually searched for computing tools referenced in published SCD methodology articles. Three inclusion criteria were used in the final compilation of computing tools: (a) computing tools must be specifically designed for SCDs; (b) information on the tools, such as computing codes or algorithms, or user’s guide or help menu for a web-based calculator or a graphic interface, must be provided; (c) references for the type of statistic computed must be provided. The computing tools we located are summarized in Table 1. In addition to these specialized tools, we considered two general-purpose statistical software packages (SAS and SPSS) and one programming language (R) for comparison purposes. Altogether, six types of computing tools were identified as suitable for implementing the WWC standards: web-based calculators, downloadable specialized software, Excel package for randomization tests, SAS procedures (SAS Institute, Inc., 2014), SPSS commands (IBM Corporation, 2014) or Macros, and the R programming language. Except for SAS and SPSS, all tools are free. The aim of the search for computing tools was to implement the WWC standards, not to exhaust all the possible computing tools for SCD. Other computing tools (e.g., SIMSTAT) that did not meet our inclusion criteria may also be useful in SCD studies. Table 1 matches these tools with the six WWC standards to assist researchers in making an informed selection of a computing tool. Two of the tools were developed specifically for performing single-case randomization tests. Except for these two, we tested each computing tool for accuracy and treatment of missing data, using a data set published in Lambert, Cartledge, Heward, and Lo (2006). We used another hypothetical data set published in Koehler and Levin (1998) to test the two tools developed specifically for randomization tests. We describe the two data sets below, followed by the description of each computing tool. The results obtained from each of the computing tools are available at https://iu.box.com/shared/static/f2o18qryc0sb1lzyu7drjx0efe8ieb6m.docx
Table 1. Computing Tools for Implementing the Six WWC Standards.
Note. The information presented is updated as of August 25, 2014. WWC = What Works Clearinghouse; SCR = single-case research; SMA = simulation modeling analysis; ExPRT = Excel package of randomization tests; SLC = slope and level change; MPD = mean phase difference; PNCD = percentage of nonoverlapping corrected data.
RegRand is accessible from http://www.matt-koehler.com/regrand/
SCR is accessible from http://www.singlecaseresearch.org/
SMA can be downloaded at http://www.clinicalresearcher.org/software.htm
ExPRT is accessible from http://code.google.com/p/exprt
Slope and Level Change in SAS/IML is available in Appendix C of Solanas, Manolov, and Onghena (2010).
Page test in SAS (Peng & Chen, 2015, in press).
DHPS Macro can be downloaded at http://faculty.ucmerced.edu/wshadish/software/software-meta-analysis-single-case-design
RcmdrPlugin.SCDA Package can be installed in R by typing install.packages("RcmdrPlugin.SCDA") in the R Console.
Scdhlm Package is accessible from http://blogs.edb.utexas.edu/pusto/software/
MPD can be downloaded at https://dl.dropboxusercontent.com/s/nky75oh40f1gbwh/MPD.R
RcmdrPlugin.SLC can be installed in R by typing install.packages("RcmdrPlugin.SLC") in the R Console.
SSD for R can be installed in R by typing install.packages("SSDforR") in the R Console.
R script written by Kevin Tarlow can be downloaded from http://ktarlow.com/stats/R/Tau.R
PNCD by Manolov is accessible from https://dl.dropboxusercontent.com/s/8revawnfrnrttkz/PNCD.R
The Lambert Data Set
The Lambert data set consisted of the number of intervals in which a disruptive behavior was recorded for each of nine students observed in the baseline (single-student responding [SSR]) phase and the intervention (response card [RC]) phase during the teacher’s instruction (Lambert et al., 2006). A disruptive behavior such as “engaging in a conversation during teacher-directed instruction, provoking others, laughing or touching others” was recorded in 10 intervals of a study session (Lambert et al., 2006, p. 89). The study used a reversal (ABAB) design with two baseline phases (SSR1 and SSR2) and two intervention phases (RC1 and RC2). The dependent variable was the number of intervals in which at least one disruptive behavior was observed and recorded, with 10 as the maximum. Figure 1 presents the findings reproduced from Lambert et al.’s (2006, pp. 93-94) article with permission. Using visual analyses alone, Lambert et al. concluded that the use of response cards was successful in decreasing disruptive behaviors for these nine students.

Figure 1. Number of intervals of disruptive behaviors during SSR and RC conditions.
The Lambert et al. (2006) data were recently reanalyzed in five articles published in a special issue of Journal of School Psychology (JSP; Volume 52, Issue 2, 2014) and by Peng and Chen (2015, in press) to demonstrate alternative ways of analyzing SCD data, beyond the initial visual analysis. The analyses published in the special issue of JSP were complex, relying on statistical models (such as hierarchical linear modeling) or methods (such as the Bayesian approach) that can be difficult for practitioners not trained in these methodologies to conceptualize or implement. Furthermore, breaks in Figure 1, due to student absences (Lambert et al., 2006), were ignored in the JSP analyses. In contrast, Peng and Chen’s (2015, in press) approach was aligned with the WWC standards and rightfully treated student absences as missing data, thereby preserving the design structure for each participant throughout the analysis. Thus, we decided to follow Peng and Chen’s approach by aligning the computing tools with the WWC standards and evaluating each tool on the basis of accuracy and its treatment of missing data. A tool was judged accurate if its results matched the definition of the statistic as originally proposed. If a computing tool could examine two or more data features, it was evaluated on each of those features.
The Koehler and Levin Data Set
The Koehler and Levin data set is a hypothetical data set presumably gathered from a multiple-baseline design (Table 2). The data further assume that an intervention is implemented in three classrooms, each considered a study unit, with 10 data points collected from each classroom. The designated starting intervals for the intervention are 2 and 3 for Classroom 1, 5 and 6 for Classroom 2, and 8 and 9 for Classroom 3. Out of these designated starting intervals, Interval 3 was randomly selected as the actual starting point for Classroom 1, Interval 6 for Classroom 2, and Interval 8 for Classroom 3. Using RegRand, described next, and permutation tests, Koehler and Levin (1998) concluded that the intervention was associated with higher performance (raw mean difference = 3.43, p = .021).
Table 2. Koehler and Levin Hypothetical Data.
Note. Shaded areas denote designated intervals of the intervention start point. Bordered numbers indicate actual intervention start points.
RegRand
RegRand is a web-based calculator, available from http://www.matt-koehler.com/regrand/. RegRand assesses level/level change in regulated randomization multiple-baseline designs (Koehler & Levin, 1998). Level changes in RegRand are assessed by permutation tests on mean differences between the baseline and the intervention phases. Our tests showed that RegRand was accurate. We could not find an explicit reference to the treatment of missing data in the Help of RegRand. In our tests, RegRand could not handle missing values: when missing values were omitted, so that the number of data points was no longer the same across participants, the calculator returned an error message.
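The permutation logic described above can be sketched in a few lines of Python. This is an illustrative sketch, not RegRand's own code: the function names, the one-sided decision rule, and the toy scores are assumptions introduced here, and the reference set is assumed to consist of every unit-to-tier order crossed with every combination of designated start points. Under that assumed scheme, three units with two designated start points per tier yield 3! × 2^3 = 48 arrangements, so the smallest attainable p value is 1/48 ≈ .021.

```python
from itertools import permutations, product
from statistics import mean


def mean_shift(series, start):
    """Intervention mean minus baseline mean; `start` is the 1-based
    interval at which the intervention begins."""
    baseline, treatment = series[:start - 1], series[start - 1:]
    return mean(treatment) - mean(baseline)


def regulated_randomization_test(data, tier_starts, actual_order, actual_starts):
    """One-sided permutation test for a multiple-baseline design.

    data          -- one list of scores per unit (equal lengths, no missing values)
    tier_starts   -- designated candidate start intervals for each tier
    actual_order  -- actual_order[k] is the index of the unit placed in tier k
    actual_starts -- the start interval actually drawn for each tier
    """
    def statistic(order, starts):
        return mean(mean_shift(data[unit], s) for unit, s in zip(order, starts))

    observed = statistic(actual_order, actual_starts)
    # Reference distribution: every unit-to-tier order crossed with every
    # combination of designated start points (an assumed enumeration scheme).
    reference = [statistic(order, starts)
                 for order in permutations(range(len(data)))
                 for starts in product(*tier_starts)]
    extreme = sum(value >= observed - 1e-12 for value in reference)
    return observed, extreme / len(reference), len(reference)


# Hypothetical scores for three classrooms (NOT the Koehler and Levin data).
classrooms = [
    [1, 1, 5, 5, 5, 5, 5, 5, 5, 5],
    [1, 1, 1, 1, 1, 5, 5, 5, 5, 5],
    [1, 1, 1, 1, 1, 1, 1, 5, 5, 5],
]
observed, p_value, n_arrangements = regulated_randomization_test(
    classrooms,
    tier_starts=[(2, 3), (5, 6), (8, 9)],
    actual_order=(0, 1, 2),
    actual_starts=(3, 6, 8),
)
print(observed, round(p_value, 3), n_arrangements)
```

Because the sketch slices each series at a common set of interval positions, it requires the same number of data points for every unit, which mirrors the equal-length requirement noted above.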
The Simulation Modeling Analysis (SMA)
SMA is a free downloadable program, available from http://www.clinicalresearcher.org/software.htm. SMA analyzes data collected from AB, ABAB, and ABC designs, where A stands for baseline, and B and C stand for two interventions. SMA can be used to assess level/level change, trend, and immediacy of the effect. To assess a level change, SMA computes Pearson’s r, Spearman’s rho, and partial correlations. To assess a trend, SMA includes five built-in slope models. In these assessments, SMA computes the autocorrelation coefficient in time-series data with n < 30 per phase. The default in SMA is the lag-1 autoregressive model (Borckardt & Nash, 2014; Borckardt et al., 2008). In statistical significance testing, SMA uses the bootstrapping technique to determine the significance level. SMA generates data graphs to assist researchers and practitioners in determining immediacy of the intervention effect as well as consistency of data in similar phases. Our test of the descriptive statistics yielded by SMA showed that, when missing data were present, the results were similar to those obtained when the missing values were entered as 0. We could not find an explicit reference to the missing data treatment in the user’s guide.
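To make the ideas of a lag-1 autocorrelation and a simulation-based significance level concrete, the following Python sketch computes Pearson's r between a phase indicator and the outcome, estimates the lag-1 autocorrelation, and compares the observed r with correlations from simulated lag-1 autoregressive series. It is only one plausible way to set up such a test and is not SMA's algorithm; the function names, the normal error model, the number of simulations, and the toy data are all assumptions.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def lag1_autocorrelation(y):
    """Lag-1 autocorrelation: lagged cross-products over the total sum of squares."""
    my = mean(y)
    num = sum((y[t] - my) * (y[t + 1] - my) for t in range(len(y) - 1))
    den = sum((v - my) ** 2 for v in y)
    return num / den


def simulated_p_value(y, phase, n_sim=2000, seed=1):
    """Share of simulated AR(1) series whose |r| with the phase indicator
    is at least as large as the observed |r| (assumed decision rule)."""
    rng = random.Random(seed)
    rho = lag1_autocorrelation(y)
    observed = abs(pearson_r(phase, y))
    hits = 0
    for _ in range(n_sim):
        sim = [rng.gauss(0, 1)]
        for _ in range(len(y) - 1):
            sim.append(rho * sim[-1] + rng.gauss(0, 1))
        if abs(pearson_r(phase, sim)) >= observed:
            hits += 1
    return hits / n_sim


# Hypothetical AB data: five baseline and five intervention sessions.
scores = [3, 4, 3, 5, 4, 8, 9, 8, 10, 9]
phase = [0] * 5 + [1] * 5
p_sim = simulated_p_value(scores, phase)
print(round(pearson_r(phase, scores), 3), round(lag1_autocorrelation(scores), 3), p_sim)
```

A sketch like this also shows why entering a missing value as 0 matters: a spurious 0 changes both the phase means and the estimated autocorrelation.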
ExPRT
ExPRT is an Excel package of randomization tests developed by Gafurov and Levin (2014a, 2014b), available from http://code.google.com/p/exprt/. According to Gafurov and Levin (2014a), after downloading ExPRT, “user must go into ‘Trusted Locations’ in Microsoft Office 2010 Excel’s Trust Center (File/Options/Trust Center/Trust Center Settings . . ./Trusted Locations) and add the microcomputer path location that contains the ExPRT programs” (p. 2) to execute the program. ExPRT assesses the level/level change, trend, variability, and immediacy of the effect in AB, ABA, ABAB, and multiple-baseline designs. Specifically, ExPRT conducts randomization tests on the means to assess level change, on the linear slope to assess trend change, and on the sample variance, defined with n in the denominator, to assess variance change (Levin, Ferron, & Gafurov, 2015). Furthermore, ExPRT computes a standardized mean difference for each participant, according to Busk and Serlin (1992). This participant-level standardized mean difference is used to assess the level change for each participant. For each phase in an SCD study, ExPRT generates data graphs to assist researchers and practitioners in determining immediacy of the intervention effect as well as consistency of data in similar phases. Either raw or standardized data can be used in ExPRT. ExPRT accepts different lengths of phases for different participants. Our tests showed that ExPRT was accurate. It omitted sessions or intervals in which a missing score occurred. Yet, missing data are not allowed at the randomized start point for any participant (Gafurov & Levin, 2014a).
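The three per-participant quantities mentioned above (a mean difference, a variance defined with n in the denominator, and a standardized mean difference) can be illustrated with a short Python sketch. It is not ExPRT's code. In particular, dividing by the baseline standard deviation is an assumption made here for illustration; the text above does not say which standardizer ExPRT applies. The function names and scores are hypothetical.

```python
from statistics import mean, pvariance, stdev


def level_change(baseline, treatment):
    """Intervention mean minus baseline mean."""
    return mean(treatment) - mean(baseline)


def variance_change(baseline, treatment):
    """Difference in variances, each computed with n (not n - 1) in the denominator."""
    return pvariance(treatment) - pvariance(baseline)


def standardized_mean_difference(baseline, treatment):
    """Mean difference divided by the baseline SD (assumed standardizer)."""
    return level_change(baseline, treatment) / stdev(baseline)


# Hypothetical scores for one participant.
baseline = [8, 9, 7, 8]
treatment = [4, 3, 5, 4]
print(level_change(baseline, treatment))
print(variance_change(baseline, treatment))
print(round(standardized_mean_difference(baseline, treatment), 3))
```

In a randomization test, any of these quantities would serve as the test statistic that is recomputed for each permissible intervention start point.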
SAS
PROC MEANS, PROC TTEST, PROC UNIVARIATE
These three procedures can be used to assess level and level change by computing means, or testing means or mean differences using a t test. These three procedures can also be used to assess variability by computing SD and variance. PROC UNIVARIATE can further compute median, mode, quantiles, and percentiles for data in each phase. PROC TTEST can be used to assess level change between adjacent phases as well as consistency of data in similar phases. For example, in Lambert et al.’s A and B data sets, where nine students from two classrooms were observed in both baseline and intervention phases, independent-sample t tests can be conducted in each phase to compare the means of sets A and B and thereby assess consistency of data in similar phases. Our tests showed that these three procedures were accurate; they omitted missing data and conducted analyses based on available data only. For example, if there is a missing score for a participant at the third session, out of 10 sessions, PROC MEANS, PROC TTEST, and PROC UNIVARIATE analyze the scores from Sessions 1 through 2 and from Sessions 4 through 10.
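The available-data behavior described above can be mimicked in Python to show exactly which scores enter the computation. This sketch does not reproduce SAS output; the helper names and scores are hypothetical, and the pooled-variance form of the independent-sample t statistic is an assumption about which variant is of interest.

```python
from statistics import mean, stdev


def available_cases(scores):
    """Drop missing sessions (coded as None) and keep the observed scores."""
    return [s for s in scores if s is not None]


def describe(scores):
    """n, mean, and SD based on available data only."""
    observed = available_cases(scores)
    return {"n": len(observed), "mean": mean(observed), "sd": stdev(observed)}


def pooled_t(group1, group2):
    """Independent-sample t statistic with a pooled variance, on available data."""
    a, b = available_cases(group1), available_cases(group2)
    ss_a = sum((v - mean(a)) ** 2 for v in a)
    ss_b = sum((v - mean(b)) ** 2 for v in b)
    df = len(a) + len(b) - 2
    pooled_var = (ss_a + ss_b) / df
    t = (mean(a) - mean(b)) / (pooled_var * (1 / len(a) + 1 / len(b))) ** 0.5
    return t, df


# Hypothetical phase data; the third baseline session is missing.
baseline = [6, 7, None, 5, 6]
treatment = [3, 4, 2, 3]
summary = describe(baseline)
t_stat, df = pooled_t(baseline, treatment)
print(summary, round(t_stat, 3), df)
```

Note that the missing session reduces the baseline n from 5 to 4, and the degrees of freedom of the t test shrink accordingly.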
Slope and level change (SLC)
The SLC program written in SAS/IML is a computing tool authored by Solanas et al. (2010), available in Appendix C of Solanas et al. SLC computes level change and slope change after removing the baseline trend. The SLC program also graphs the detrended data. Thus, SLC also helps researchers or practitioners determine immediacy of the intervention effect as well as consistency of data in similar phases. Our tests showed that the SLC program was accurate; it did not allow missing data. When missing values were entered in the data set as a period, an error message appeared in the log file and no results were returned. An R script for the same SLC analyses is introduced later under RcmdrPlugin.SLC.
PROC REG
This procedure assesses trends by fitting a least squares regression line to data in one or more phase(s). This procedure also assesses variability by computing the standard error of least squares regression coefficients. PROC REG assesses consistency of data in terms of trends in similar phases, whether baseline or intervention. Our tests showed that PROC REG was accurate. It omitted sessions or intervals in which a missing score occurred, in the same manner as PROC MEANS.
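As an illustration of the trend assessment described above, the following Python sketch fits a least squares line of score on session number within one phase and reports the standard error of the slope, after omitting sessions with a missing score. It is a sketch of the textbook formulas, not SAS code; the function name and data are hypothetical.

```python
def ols_trend(scores):
    """Least squares slope, intercept, and slope standard error for one phase.

    Sessions are numbered 1, 2, ...; sessions with a missing score (None)
    are omitted, so the remaining sessions keep their original numbers.
    """
    pairs = [(t, y) for t, y in enumerate(scores, start=1) if y is not None]
    n = len(pairs)
    t_bar = sum(t for t, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxx = sum((t - t_bar) ** 2 for t, _ in pairs)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in pairs)
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    sse = sum((y - (intercept + slope * t)) ** 2 for t, y in pairs)
    se_slope = (sse / (n - 2) / sxx) ** 0.5  # residual df = n - 2
    return slope, intercept, se_slope


# Hypothetical phase with the third session missing.
slope, intercept, se_slope = ols_trend([1, 3, None, 4, 6])
print(round(slope, 3), round(intercept, 3), round(se_slope, 3))
```

Keeping the original session numbers for the observed scores means the gap left by the missing session is respected on the time axis rather than closed up.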
PROC CORR
This procedure can be used to assess a monotonic trend in ranked data in one or more phase(s), using the Page test (Peng & Chen, 2015, in press). PROC CORR can also be used to assess a linear trend in one or more phase(s), using the Pearson product–moment correlation coefficient. Our tests showed that PROC CORR was accurate. It omitted sessions or intervals in which a missing score occurred, in the same manner as PROC MEANS.
Page test in SAS
The Page test program assesses monotonic trends in data collected from ABAB designs (Peng & Chen, 2015, in press). Our tests showed that the Page test program was accurate. Regarding missing data, the Page test program uses a conservative single imputation method to impute the missing values. This method imputes missing data in a way that supports the null hypothesis of no intervention effect. Hence, if the Page test statistic derived from this single imputation method rejects the null hypothesis at a predesignated alpha level, a researcher can be confident that the null hypothesis would also have been rejected with other methods of imputing the missing data.
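For readers unfamiliar with the Page test, the Python sketch below computes Page's L statistic and its large-sample normal approximation for a table whose rows are blocks and whose columns are conditions listed in the hypothesized increasing order. It does not reproduce the SAS program of Peng and Chen or its imputation step; how an ABAB data set is arranged into rows and columns is left open here, and the function names and data are hypothetical.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i + 1 through j + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def page_test(rows):
    """Page's L and its normal-approximation z for n rows and k ordered columns."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    L = sum((j + 1) * rank_sums[j] for j in range(k))
    mean_L = n * k * (k + 1) ** 2 / 4
    var_L = n * k ** 2 * (k + 1) * (k ** 2 - 1) / 144  # no tie correction
    return L, (L - mean_L) / var_L ** 0.5


# Hypothetical data: two blocks, three conditions in hypothesized order.
L, z = page_test([[1, 2, 3], [1, 2, 3]])
print(L, z)
```

A large positive z supports the hypothesized monotonic ordering; with small tables, exact critical values are preferable to the normal approximation.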
SPSS
EXAMINE, T-TEST
These two commands can be used to assess level and level change in data by computing means or medians, or by conducting t tests of means or mean differences. These two commands can also be used to assess variability by computing SD and variance. T-TEST can be used to assess consistency of data in similar phases. For example, in Lambert et al.’s A and B data sets, where nine students from two classrooms were observed in baseline and intervention phases, independent-sample t tests can be conducted in each phase to compare the means of sets A and B and thereby assess consistency of data in similar phases. To execute the EXAMINE command via the pull-down menu, researchers can follow the path: Analyze → Descriptive Statistics → Explore. To execute the T-TEST command via the pull-down menu, researchers can follow the path: Analyze → Compare Means → One-Sample T Test/Independent-Samples T Test/Paired-Samples T Test. Our tests showed that these two commands were accurate. They omitted sessions or intervals in which a missing score occurred, in the same manner as PROC MEANS in SAS.
DHPS Macro
DHPS is an SPSS macro, accessible from http://faculty.ucmerced.edu/wshadish/software/software-meta-analysis-single-case-design. An SPSS graphical user interface for the macro can also be downloaded from the same link. The DHPS macro computes HPS d (Hedges et al., 2012, 2013; Shadish et al., 2014) to assess a level change for multiple-baseline designs and reversal designs. Our tests showed that the DHPS macro was accurate. The user guide of DHPS discusses different ways to code the session number for missing values.
REGRESSION
This command can be used to assess trends by fitting a least squares regression line to data in one or more phase(s). This command also assesses variability by computing the standard error of least squares regression coefficients. REGRESSION assesses consistency of data in terms of trends in similar phases, whether baseline or intervention. To execute the REGRESSION command via the pull-down menu, researchers can follow the path: Analyze → Regression → Linear. Our tests showed that REGRESSION was accurate. It omitted sessions or intervals in which a missing score occurred, in the same manner as PROC MEANS in SAS.
CORRELATIONS
This command can be used to assess a monotonic trend in ranked data in one or more phase(s), using the Page test (Peng & Chen, 2015, in press). This command can also be used to assess a linear trend in one or more phase(s), using the Pearson product–moment correlation coefficient. To execute the CORRELATIONS command via the pull-down menu, researchers can follow the path: Analyze → Correlate → Bivariate. Our tests showed that CORRELATIONS was accurate. It omitted sessions or intervals in which a missing score occurred, in the same manner as PROC MEANS in SAS.
The Single-Case Research (SCR) Website
The SCR website is a web-based calculator, accessible from http://www.singlecaseresearch.org/. To assess trend for a phase and to compare trends from two phases, the SCR website computes the Theil–Sen Slope (Sen, 1968; Theil, 1950). Significance tests of the Theil–Sen Slope are provided based on Kendall’s Tau. To evaluate overlap, the SCR website computes NAP, IRD, and Tau-U. To evaluate consistency of data in terms of trends in similar phases, whether baseline or intervention, researchers can examine the Theil–Sen Slope. Our tests showed that the SCR website was accurate. It appeared to ignore missing values: the results were the same with or without missing values in the data set.
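Two of the indices named above are simple enough to sketch directly. The Python code below computes a Theil–Sen slope as the median of all pairwise slopes within a phase, and NAP as the share of baseline-intervention pairs that show improvement, with ties counted as half. It is an illustration of the published definitions rather than the SCR website's code; the function names, the direction argument, and the data are assumptions.

```python
from itertools import combinations
from statistics import median


def theil_sen_slope(scores):
    """Median of the slopes between all pairs of sessions in one phase."""
    pairs = combinations(enumerate(scores, start=1), 2)
    return median((y2 - y1) / (t2 - t1) for (t1, y1), (t2, y2) in pairs)


def nap(baseline, treatment, increase_expected=True):
    """Nonoverlap of all pairs: improving pairs count 1, ties 0.5, others 0."""
    sign = 1 if increase_expected else -1
    wins = sum(1.0 if sign * (b - a) > 0 else 0.5 if b == a else 0.0
               for a in baseline for b in treatment)
    return wins / (len(baseline) * len(treatment))


# Hypothetical data; a decrease in the behavior is the desired direction.
slope = theil_sen_slope([1, 2, 3, 4, 20])
overlap = nap([5, 6, 7], [2, 3, 5], increase_expected=False)
print(slope, round(overlap, 3))
```

The first example shows the robustness of the Theil–Sen slope: the single outlying score of 20 does not move the median slope away from 1.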
R
R is a free integrated software environment for statistical computing and graphics, available from http://CRAN.R-project.org. The acronym, CRAN, stands for the Comprehensive R Archive Network. R compiles and runs on a wide variety of UNIX platforms, Windows, and MacOS. During the installation of R, a set of core packages (e.g., stats and base) is installed. The various analysis functions are stored in R’s packages. Once R is installed, users can type install.packages(" "), with the name of the R package inside the double quotes, in the R Console to install additional packages. In the subsequent descriptions of R functions suitable for single-case data analysis, the function name (e.g., t.test) appears before the package name (e.g., stats) in braces, such as t.test {stats}.
t.test {stats}, mean {base}, median {base}, summary {base}
Four functions (i.e., t.test, mean, median, and summary) stored in two packages (i.e., stats and base) can assess level and level change by computing means or medians, or by testing means or mean differences using a t test. The summary function reports the minimum, quartiles, median, mean, and maximum for data in each phase. Our tests showed that these four functions were accurate. By default, the mean and median functions return NA when missing values are present (unless na.rm = TRUE is specified); the summary function returns descriptive statistics based on available data only; and the t.test function performs tests based on available data only.
lm {stats}
The lm function, from the stats package, can assess trends by fitting a least squares regression line to data in one or more phase(s). This function also assesses variability by computing the standard error of least squares regression coefficients. It can further assess consistency of data in terms of trends in similar phases, whether baseline or intervention. Our tests showed that this function was accurate. It omitted sessions or intervals in which a missing score occurred, in the same manner as PROC MEANS in SAS.
cor.test {stats}
The cor.test function, from the stats package, can assess a monotonic trend in ranked data in one or more phase(s), using the Page test (Peng & Chen, 2015, in press). The cor.test function can also assess a linear trend in one or more phase(s), based on the Pearson product–moment correlation coefficient. Our tests showed that this function was accurate. It omitted sessions or intervals in which a missing score occurred, in the same manner as PROC MEANS in SAS.
RcmdrPlugin.SCDA
SCDA stands for single-case data analysis. The command install.packages("RcmdrPlugin.SCDA") can be typed in the R Console to install this package. This package integrates three R packages: SCVA (for single-case visual analysis), SCRT (for single-case randomization tests), and SCMA (for single-case meta-analysis).
SCVA enables researchers to visualize (a) levels based on mean, median, broadened median, trimmed mean, or m-estimator; (b) trends based on the least squares regression line, the split-middle method, or the resistant trend line fitting; (c) variability based on range lines, range bar graph, or trended ranges; and (d) immediacy of the intervention effect as well as consistency of data in similar phases based on data graphs. SCRT enables researchers to examine level/level changes by two randomization tests (“systematic” or “Monte Carlo”) of mean differences between the baseline and the intervention data (Bulté & Onghena, 2013). The systematic (also called exhaustive) randomization test calculates the test statistic for all possible permutations, whereas the Monte Carlo (also called nonexhaustive) randomization test calculates the test statistic for a random sample of permutations. The Monte Carlo randomization test is recommended when it is not feasible to obtain the test statistic for all possible permutations. SCMA evaluates the level change based on standardized mean differences or the pooled standardized mean difference (Bulté & Onghena, 2013). To assess overlap, SCMA computes NAP or PEM. Our tests showed that SCVA, SCRT, and SCMA were accurate in their computations. We could not find an explicit reference to the missing data treatment in Bulté and Onghena (2013). When there were missing values in the data, no results were returned.
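The distinction between systematic and Monte Carlo randomization tests described above can be shown with a small Python sketch for an AB design in which the intervention start point is the randomized element. This is not the SCRT implementation: the function names, the minimum phase length of three sessions, the one-sided rule, the convention of adding the observed arrangement to the Monte Carlo reference set, and the data are all assumptions.

```python
import random
from statistics import mean


def ab_statistic(scores, start):
    """Intervention mean minus baseline mean; the intervention begins
    at the 1-based session number `start`."""
    return mean(scores[start - 1:]) - mean(scores[:start - 1])


def permissible_starts(n_sessions, min_phase=3):
    """Start points that leave at least `min_phase` sessions in each phase."""
    return list(range(min_phase + 1, n_sessions - min_phase + 2))


def systematic_test(scores, actual_start, min_phase=3):
    """Exhaustive test: evaluate the statistic at every permissible start point."""
    observed = ab_statistic(scores, actual_start)
    reference = [ab_statistic(scores, s)
                 for s in permissible_starts(len(scores), min_phase)]
    return sum(v >= observed for v in reference) / len(reference)


def monte_carlo_test(scores, actual_start, min_phase=3, draws=1000, seed=1):
    """Nonexhaustive test: evaluate the statistic at randomly drawn start points."""
    rng = random.Random(seed)
    starts = permissible_starts(len(scores), min_phase)
    observed = ab_statistic(scores, actual_start)
    reference = [ab_statistic(scores, rng.choice(starts)) for _ in range(draws)]
    # Assumed convention: count the observed arrangement as one of the draws.
    return (sum(v >= observed for v in reference) + 1) / (draws + 1)


# Hypothetical AB series of 12 sessions; the intervention began at session 7.
series = [2, 3, 2, 3, 2, 3, 7, 8, 7, 8, 7, 8]
p_systematic = systematic_test(series, actual_start=7)
p_monte_carlo = monte_carlo_test(series, actual_start=7)
print(round(p_systematic, 3), round(p_monte_carlo, 3))
```

With only seven permissible start points, the exhaustive test is trivial here; the Monte Carlo version becomes useful when the number of possible arrangements is too large to enumerate.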
The entire RcmdrPlugin.SCDA package runs in a menu-driven environment called the R Commander (Fox, 2005). Similar to SPSS or Excel, the R Commander includes several menus (e.g., File, Data, Statistics, Graphs) and three windows: script, output, and messages. Commands generated by the R Commander appear in the script window. These commands, along with the output, appear in the output window. Error messages, warnings, and notes appear in the message window (Fox, 2005). The RcmdrPlugin.SCDA is a plug-in package (Fox, 2007), written and maintained by I. Bulté and P. Onghena specifically for SCD research. Another plug-in package suitable for SCD research, RcmdrPlugin.SLC, is introduced later.
g_REML {scdhlm}, effect_size_MB {scdhlm}, effect_size_ABk {scdhlm}
The scdhlm package is available from http://blogs.edb.utexas.edu/pusto/software/. The g_REML, effect_size_MB, and effect_size_ABk functions can assess a level/level change by estimating a design-comparable standardized mean difference ES for multiple-baseline designs (Pustejovsky et al., 2014), HPS for multiple-baseline designs (Hedges et al., 2013), and HPS for the ABk designs (Hedges et al., 2012), respectively. Our tests showed that the scdhlm was accurate. Users may refer to the users’ guide of DHPS for ways to deal with missing data.
MPD
MPD is an R script, available from https://dl.dropboxusercontent.com/s/nky75oh40f1gbwh/MPD.R. It assesses a level change based on MPD (Manolov & Solanas, 2013). Furthermore, the MPD script graphs raw data as well as the projected baseline trend. Our tests showed that the MPD script in R was accurate. When there were missing values in the data, no results were returned.
RcmdrPlugin.SLC
The command install.packages("RcmdrPlugin.SLC") can be typed in the R Console to install this package. This package computes level and slope changes after controlling for the baseline trend, as proposed by Solanas et al. (2010). The package also graphs the raw data and the detrended data. Thus, RcmdrPlugin.SLC also helps researchers or practitioners determine immediacy of the intervention effect as well as consistency of data in similar phases. Our tests showed that the RcmdrPlugin.SLC package was accurate. When there were missing values in the data, no results were returned.
R script by Kevin Tarlow
The R script written by Kevin Tarlow is available from http://ktarlow.com/stats/R/Tau.R. The script can assess overlap based on Tau-U for each phase and for comparing two phases, with or without controlling for trends, in an AB design. Before executing this script, a researcher needs to install and load the Kendall package in R. Our tests showed that the R script by Kevin Tarlow was accurate. When there were missing values in the data, an error message appeared and the results for phases without missing values changed as well. We concluded that the R script by Kevin Tarlow could not handle missing values.
PNCD by Manolov
PNCD by Manolov is an R script available from https://dl.dropboxusercontent.com/s/8revawnfrnrttkz/PNCD.R. PNCD by Manolov can be used to assess overlap based on PNCD (Manolov & Solanas, 2009) in AB designs. When there were missing values in the data, no results were returned.
SSD for R
The command install.packages("SSDforR") can be typed in the R Console to install this package. The functions of SSD for R graph data with the options of adding reference lines (e.g., a mean line, or lines 1 SD above and below the mean) to assist visual analysis, and they also compute statistics to quantify the intervention effect. Specifically, SSD for R enables researchers to assess (a) levels based on the mean, median, trimmed mean (10% of the lowest and highest outcome scores are excluded), and standardized mean differences as well as the pooled standardized mean difference; (b) trends based on the least squares method; (c) variability based on standard deviation, range, interquartile range, and box plots; (d) immediacy of the intervention effect as well as consistency of data in similar phases based on data graphs; and (e) overlap based on PND, PAND, PEM, and IRD. Our tests showed that SSD for R was accurate. It omitted sessions or intervals in which a missing score occurred, in the same manner as PROC MEANS in SAS.
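Two of the overlap indices listed above, PND and PEM, can be sketched in Python to make their definitions concrete. This is not SSD for R code. The function names, the direction argument, and the data are hypothetical, and the choice to count only intervention points strictly beyond the baseline extreme or median (so that ties earn no credit) is an assumption of this sketch.

```python
from statistics import median


def pnd(baseline, treatment, increase_expected=True):
    """Percentage of intervention points beyond the most extreme baseline point."""
    if increase_expected:
        beyond = sum(b > max(baseline) for b in treatment)
    else:
        beyond = sum(b < min(baseline) for b in treatment)
    return 100 * beyond / len(treatment)


def pem(baseline, treatment, increase_expected=True):
    """Percentage of intervention points on the improved side of the baseline median."""
    mid = median(baseline)
    if increase_expected:
        beyond = sum(b > mid for b in treatment)
    else:
        beyond = sum(b < mid for b in treatment)
    return 100 * beyond / len(treatment)


# Hypothetical data; a decrease in disruptive behavior is the desired direction.
baseline = [8, 9, 7, 8, 6]
treatment = [5, 4, 6, 3, 7]
print(pnd(baseline, treatment, increase_expected=False))
print(pem(baseline, treatment, increase_expected=False))
```

The example illustrates a known property of PND: a single extreme baseline point (here, 6) can lower the index substantially, whereas PEM depends only on the baseline median.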
Discussion
This article focuses on the standards of analyzing SCD data to determine a functional relationship between the intervention and the outcome behavior, as outlined in the Institute of Education Sciences’ publication, What Works Clearinghouse: Procedures and Standards Handbook (IES, 2013). According to this publication, a functional relationship between an intervention and the outcome behavior should be demonstrated in six data features: level/level change, trend, variability, immediacy of the effects, overlap, and consistency of data in similar phases. These six data features are collectively referred to as the WWC standards in this article.
To help researchers and practitioners examine the six data features under the WWC standards, we discussed a wide range of analysis strategies and indices that have been proposed in the literature. We further provided a comprehensive list of computing tools that can carry out these analyses or compute the indices (Table 1). A majority of the computing tools are free web-based calculators or downloadable software/algorithms/macros, specifically programmed for SCD data analysis. Others are part of the SPSS and SAS packages, or the R programming language. We evaluated each tool for accuracy and its treatment of missing data, using two published data sets from Lambert et al. (2006) and Koehler and Levin (1998). Our evaluations revealed that all tools were accurate for their intended purposes. Regarding the treatment of missing data, most tools in R could not handle missing values. Others often omitted the missing data and yielded results based on available data only. The user’s guides of ExPRT and the SPSS DHPS macro, as well as the Page test program in SAS, explicitly indicated or suggested ways to deal with missing data. The Page test in SAS used a conservative single imputation method to replace the missing score with an observed score in the same phase so as to support the null hypothesis. None of the tools used the principled multiple-imputation method or the expectation–maximization method to deal with missing data.
Given the important contributions made by SCD research toward evidence-based practices, it is imperative that SCD research be conducted at the highest level of rigor to yield credible and creditable results. In the past, visual analysis has dominated approximately 90% of published SCD studies (Parker & Hagan-Burke, 2007). The subjectivity associated with visual analysis and its lack of a framework for testing a scientific hypothesis render the generalization and synthesis of SCD findings difficult, if not impossible. The many quantitative methods, indices, and computing tools reviewed in this article can complement visual analysis in determining a functional relationship between an intervention and an outcome behavior, at both the participant level and the group level. Under the WWC standards, level/level change and trend are two features that can be assessed by any type of computing tool shown in Table 1. Both SSD for R and RcmdrPlugin.SCDA can assess all six WWC data features. The largest number of tools have been developed in the R environment. For people who are not familiar with a programming language, the plug-in packages for the R Commander (e.g., RcmdrPlugin.SCDA) increase the usability of R. Because the WWC standards encourage the demonstration of all six data features, we recommend the development of an inclusive computing tool or environment in which all data features of the WWC standards can be assessed.
A second recommendation is for additional research on the treatment of missing data. Because an outcome behavior is observed repeatedly in SCD settings, missing data are common in such studies. The current practices of assuming no missing data (e.g., RegRand), omitting missing sessions or intervals and yielding results based on available data only (e.g., PROC MEANS in SAS), replacing missing values with 0 (e.g., SMA), or using a single imputation (e.g., Page test in SAS) waste information already collected, misrepresent the results, or are deemed too conservative. Newer and more principled methods, such as multiple imputation and expectation–maximization, need to be applied to SCDs to help retain the richest information possible.
Finally, we recommend that the WWC standards be interpreted more flexibly, in light of SCD characteristics and the nature of measurements. For example, level or level change can be inferred from a median, trimmed mean, or percentiles, in addition to the change in means. Trend can be inferred from a nonlinear model fit to data, as well as a straight line fit. Variability can be inferred from the interquartile range, as well as from the range or SD. Immediacy of the effect can be inferred from more than three data points from each phase. It is our hope that future research in the three areas identified above will advance the credibility of SCD studies and findings.
Appendix
This appendix provides details for computing HPS d in AB designs for the same number of time points at baseline and intervention. Hedges, Pustejovsky, and Shadish (2012) defined the statistical model for each participant in baseline and intervention phases. The statistical model for the baseline phase is
where Yij is the jth score of the ith individual, i = 1, . . . , m (the number of participants), j = 1, . . . , n (the number of time points at Baseline or at Intervention or the phase length of baseline/intervention),
The statistical model for the intervention phase is
where j = n + 1, . . . , 2n.
It is assumed that (a) the individual level effects of
HPS d is used to estimate its parameter δ
The HPS d is the standardized average level change, defined as
where
and
The variance of
where bp and cp are functions of φ and the phase length n. Equations for bp and cp are
and
The variance of S2 is
where
The sampling distribution of S2 is approximately a chi-square distribution with ν degrees of freedom, and ν is given by
HPS d is a constant θ times a random variable with noncentral t distribution with ν degrees of freedom, where θ is
Following the results in Hedges (1981), the bias in HPS d can be corrected by multiplying HPS d by the factor
Hence, the bias-corrected HPS d is as follows:
Bias-corrected HPS d =
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported in part by the Maris M. Proffitt and Mary Higgins Proffitt Endowment Grant of Indiana University, awarded to the second author while the first author worked on the project as a research associate.
