Abstract
Momentary time sampling (MTS), whole interval recording (WIR), and partial interval recording (PIR) are commonly used in applied research. We discuss potential difficulties with analyzing data when these systems are used and present results from a pilot simulation study designed to determine the extent to which these issues are likely to be problematic in the context of single case design studies. Results indicate that WIR and PIR may result in invalid effect size estimations. Although MTS more closely paralleled actual duration, it may induce variability in relatively short sessions, increasing the likelihood of Type II errors. Suggestions for practitioners, consumers, and researchers include careful use and reporting of data collected using interval-based systems and continued investigation of properties of these systems, particularly on the effects on effect size estimations.
In the field of special education, researchers and practitioners commonly use direct observational measurement in the context of single case designs to evaluate change over time in response to an intervention (Odom et al., 2005). Using these designs, a single participant’s behavior is evaluated during an intervention condition in comparison with behavior in a baseline condition, using several features that allow for demonstration of experimental control (e.g., rapid iteration or time-lagged introduction of conditions; Kennedy, 2005). Single case research designs are commonly selected because they allow repeated measurement across experimental conditions and, as such, permit researchers and practitioners to formatively evaluate a participant’s response to a specific treatment or instructional context, unlike non-observational measurements (e.g., traditional paper-and-pencil tests or other standardized instruments) that typically answer summative research questions (Gast & Ledford, 2014; Kazdin, 2011; Wolery & Harris, 1982).
When observing and recording each occurrence of a behavior is not feasible, researchers and practitioners are likely to select interval-based estimation procedures. A recent review found that 60% of single case studies including young children with disabilities used interval-based measurement (in articles published in 2008–2012 in Education and Treatment of Children, Journal of Early Intervention, Journal of Positive Behavior Interventions, and Topics in Early Childhood Special Education; Lane & Ledford, 2014). This review reported that interval sizes between 5 and 30 s (M = 12; median = 10) are often used and that partial interval recording (PIR) is the most frequently used system. This conclusion is specific to the journals selected and the focus on young children but may be an accurate estimate of interval use in special education research.
Using Interval Systems
The assumption behind interval recording is that responses occurring in an interval can be considered estimates of the extent to which the behavior actually occurred. If an interval-based system is used, knowing when to record an occurrence depends on the interval system selected:
Momentary time sampling (MTS): An occurrence is marked when the behavior is occurring when the interval ends.
Partial interval recording (PIR): An occurrence is marked when the behavior occurs any time during the interval.
Whole interval recording (WIR): An occurrence is marked when the behavior occurs for the entire interval duration.
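The three scoring rules can be expressed compactly in code. The following is a minimal sketch, not the procedure used in this study: it assumes a session is stored as one true/false value per second, and the function name `score_intervals` and the handling of a trailing partial interval are illustrative assumptions.

```python
from typing import List


def score_intervals(seconds: List[bool], interval_size: int, system: str) -> List[bool]:
    """Score each complete interval of a session under MTS, PIR, or WIR.

    seconds[i] is True when the behavior occurred during second i.
    A trailing partial interval is ignored (an assumption of this sketch).
    """
    n_intervals = len(seconds) // interval_size
    scored = []
    for k in range(n_intervals):
        chunk = seconds[k * interval_size:(k + 1) * interval_size]
        if system == "MTS":
            scored.append(chunk[-1])    # occurring at the moment the interval ends
        elif system == "PIR":
            scored.append(any(chunk))   # occurring at any time during the interval
        elif system == "WIR":
            scored.append(all(chunk))   # occurring for the entire interval
        else:
            raise ValueError(f"unknown system: {system}")
    return scored


# Toy 12-s session with one 6-s occurrence covering seconds 3-8 (0-indexed).
session = [3 <= s <= 8 for s in range(12)]
for name in ("MTS", "PIR", "WIR"):
    print(name, score_intervals(session, 4, name))
```

In this toy session the behavior occupies 50% of the time, yet with 4-s intervals MTS marks 2 of 3 intervals, PIR marks all 3, and WIR marks 1 of 3.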
Interval-based measurement systems have been used more commonly to estimate duration but can also be used to estimate count. For example, if a behavior occurs in 25 of 100 intervals, results can be interpreted as: (a) the behavior occurred for approximately 25% of the time (25% of intervals; duration) or (b) the behavior occurred 25 times during the session (count). Previous arguments have been made against using intervals for measuring count (also referred to as frequency; Mann, Have, Plunkett, & Meisels, 1991; Repp, Roberts, Slack, Repp, & Berkler, 1976) due to potential inaccuracies, but it is unclear to what extent inaccuracies occur based on interval length and the number of times a behavior occurs in a session.
Potential Problems With Interval Systems
Despite the wide use of interval-based systems, several problems have been discussed or analyzed in previously published studies or are considered as established tendencies: (a) PIR may overestimate actual behavior occurrence and WIR may underestimate actual behavior occurrence (cf. Mann et al., 1991; Powell, Martindale, Kulp, Martindale, & Bauman, 1977); (b) MTS may not provide accurate estimates of counts for behaviors that are of “short” duration (cf. Alvero, Struss, & Rappaport, 2007; G. Murphy & Goodall, 1980); (c) Estimates of occurrence vary widely based on the system chosen (cf. Alvero et al., 2007) and by interval size (Brulle & Repp, 1984); and (d) Limited empirical support exists for determining appropriate interval size, even though commonly cited rules exist (e.g., intervals should be the same size or smaller than duration per occurrence [DPO]; cf. Cooper, Heron, & Heward, 2007; Kennedy, 2005).
In addition to these more widely discussed issues, several other potential difficulties exist that have not been addressed or that have received limited attention. First, the comparison of data collected via interval-based systems has often been analyzed in terms of mean performance, ignoring session-by-session performance (and across-session variability), which is crucial in single case design. Second, the recommendation to use interval sizes based on DPO and/or inter-response time does not take into consideration that these characteristics are likely to change between conditions. Third, common knowledge regarding over- and underestimates by PIR and WIR fails to take into consideration the degree to which estimates may vary by condition (e.g., greater overestimation during an intervention condition), which is particularly problematic when considering the increasingly common use of effect sizes. Finally, session lengths used in previous studies may reflect durations uncommon in special education research. For example, Wirth, Slaven, and Taylor (2014) found that increasing observation lengths from 1 hr to 8 hr systematically decreased measurement error when interval-based systems were used. However, an informal review of studies in recent issues of journals commonly including single case studies found session lengths ranged from 5 to 20 min. Thus, even the minimum session length used in some studies designed to evaluate the accuracy of interval-based systems may not accurately reflect practice.
Based on these potential problems, studies are needed that determine the accuracy of interval-based systems during the relatively short sessions common in behavioral research (e.g., 10-min rather than multi-hour sessions) and in the context of single sessions, rather than as summative estimations. It is important that estimates are accurate on a session-by-session basis because in the context of single case designs, decisions (e.g., condition change decisions) are made on the basis of visual analysis of sessions, not on means across conditions. In addition, consideration of the nature of single case research designs is needed when recommendations are made; for example, if the mean DPO is known for a behavior in baseline, it is possible to choose an interval that is shorter than that duration. However, this seems insufficient, given that behavior change between conditions is likely (or at least hoped for) and current suggestions do not take differences between conditions into account.
To understand the nature of data collected using interval systems in the context of single case designs, a pilot simulation study was conducted. The purpose was to (a) describe the accuracy of MTS, PIR, and WIR in estimating count and duration of a simulated behavior during short sessions; (b) describe differences in accuracy based on interval size and total duration; and (c) plot these estimates in hypothetical single case designs alongside hypothetical actual duration to illustrate potentially problematic error patterns in estimations.
Method
Procedures
To answer the research questions, authors (a) created a spreadsheet, using interval sizes ranging from 2 to 20 s; (b) generated data; (c) calculated estimates for interval-based systems; (d) assessed the accuracy of estimates; and (e) assessed data patterns in the context of hypothetical single case designs. Procedures for all steps are described below and procedures and corresponding screenshots for Steps (a) to (d) are available at http://vkc.mc.vanderbilt.edu/intervals.
Spreadsheet development
The first author prepared a spreadsheet with 601 rows and 13 columns. The first row indicated the interval size for each column; each subsequent row was numbered representing 1 s of a 10-min session. One column was arranged with 2-s intervals by drawing borders around each set of two 1-s rows throughout the column. This was repeated to generate columns for 12 other interval sizes between 3 and 20 s. Columns were used to estimate duration when different interval sizes were used (e.g., to analyze accuracy of each system when interval sizes were smaller, about the same, or larger than the DPO).
Generating data: Consistent DPO
The first and third authors used a random number generator to create data by generating a number between 1 and 600. This number represented the onset of a behavior; this second and the five following seconds were identified as an occurrence by highlighting appropriate rows, resulting in a simulated 6-s behavior occurrence. For example, if the number 6 was generated, the rows corresponding to numbers 6 to 11 were highlighted. Because researchers using interval systems rarely report DPO, and DPOs may vary considerably based on target behaviors, the 6-s duration was chosen for two reasons. First, the number resulted in easy calculations (e.g., 1 occurrence = 1% of intervals). Second, it provided sufficient potential for variability in a 10-min session (e.g., if 1-min DPOs had been used, a single occurrence would account for a large percentage of the session). After the first 6-s occurrence was generated, another number was identified, with the stipulation that occurrences could not overlap (e.g., two occurrences always lasted for a total of 12 s) but could be adjacent (e.g., 12 consecutive seconds could be coded as two occurrences). If an occurrence was generated that would have overlapped with one previously generated, authors discarded this occurrence and replaced it with another. Ten sets of data were coded with 10, 20, 30, 40, 50, 60, and 70 behavior occurrences for a total of 70 sets. Each set represented a single 10-min session. Each sheet held one set of data; each column represented the measurement of that set with various interval sizes. Sets were not coded with 80% or 90% total duration because authors could not consistently identify 80 or 90 non-overlapping occurrences. Thus, the actual count of occurrences was between 10 and 70 and the actual duration of occurrences ranged from 10% to 70% of a session.
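The generation procedure was carried out by hand in a spreadsheet; a programmatic analogue might look like the sketch below. It is an illustration under stated assumptions: the function name `generate_session` is hypothetical, seconds are 0-indexed, onsets are restricted so that every occurrence fits inside the session (the text does not say how onsets near the end of the session were handled), and a cap on attempts is added so the loop cannot run indefinitely when no gap of sufficient size remains.

```python
import random


def generate_session(n_occurrences, dpo=6, session_length=600, seed=None,
                     max_attempts=100_000):
    """Place non-overlapping occurrences of `dpo` seconds at random onsets.

    Returns (occupied, onsets): one boolean per second, plus sorted onsets.
    Adjacent occurrences are allowed; overlapping draws are discarded and redrawn.
    """
    rng = random.Random(seed)
    occupied = [False] * session_length
    onsets = []
    attempts = 0
    while len(onsets) < n_occurrences:
        attempts += 1
        if attempts > max_attempts:
            raise RuntimeError("could not place all occurrences without overlap")
        # Assumption: the onset is drawn so the occurrence ends within the session.
        onset = rng.randrange(session_length - dpo + 1)
        if any(occupied[onset:onset + dpo]):
            continue  # overlaps an earlier occurrence: discard and redraw
        occupied[onset:onset + dpo] = [True] * dpo
        onsets.append(onset)
    return occupied, sorted(onsets)


occupied, onsets = generate_session(30, seed=1)
print(sum(occupied), "of", len(occupied), "seconds occupied")
```

Because occurrences never overlap, 30 occurrences of 6 s always occupy 180 of 600 s (30% actual duration). The attempt cap reflects the difficulty noted above of placing 80 or 90 non-overlapping occurrences at random.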
Generating data: Varied DPO
Because sufficient data were not available to determine whether consistent versus varying behavior durations are differentially estimated with interval systems, we also generated 70 sets of behavior with an average DPO of 6 s, but with individual durations that varied between 2 and 10 s. In these sets, each of the durations 2, 3, 4, 5, 7, 8, 9, and 10 s accounted for 10% of occurrences; the remaining 20% of occurrences had durations of 6 s. Occurrences were generated as described for consistent DPO data.
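The duration mix described above averages exactly 6 s, which can be checked with a short sketch. The function name `varied_durations` is hypothetical, and the shuffle is an assumption about how durations might be ordered within a session.

```python
import random


def varied_durations(n_occurrences, seed=None):
    """Return occurrence durations (s): 10% each of 2, 3, 4, 5, 7, 8, 9, 10 and 20% of 6."""
    if n_occurrences % 10 != 0:
        raise ValueError("n_occurrences must be a multiple of 10")
    per_tenth = n_occurrences // 10
    durations = []
    for d in (2, 3, 4, 5, 7, 8, 9, 10):
        durations += [d] * per_tenth
    durations += [6] * (2 * per_tenth)
    random.Random(seed).shuffle(durations)  # ordering within a session is an assumption
    return durations


durations = varied_durations(20, seed=0)
print(sum(durations) / len(durations))  # mean DPO: 6.0
```

With this mix, a varied-DPO set has the same total duration as a consistent-DPO set with the same number of occurrences.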
Estimating number and duration
After generating the simulated behavior data, authors calculated the number of intervals with behavior occurrences for each interval system and interval size. An occurrence was scored if the final cell was highlighted for the designated interval for MTS (i.e., it was “happening” at the end of the interval), if any cell was highlighted within the designated interval for PIR (i.e., it “happened” at any time during the interval), and if all cells were highlighted within the designated interval for WIR (i.e., it “happened” during the entire interval). The number of intervals during which the behavior was coded as occurring was considered the estimated count. This number was divided by the total number of intervals for each interval size to determine the percentage of intervals counted as occurrences and will be referred to as the estimated duration (i.e., an estimate of the percentage of time a behavior occurred). For each interval size, actual duration, and system, 20 duration estimates and 20 count estimates were produced using data generated in an Excel spreadsheet (40 estimates × 13 interval sizes × 7 actual durations × 3 interval systems = 10,920 estimates).
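As a rough illustration of how estimated count and estimated duration follow from the scoring rules, the sketch below scores one toy session at two interval sizes. The function name `estimate`, the evenly spaced toy session, and the choice to drop a trailing partial interval are assumptions made for illustration; they are not taken from the study spreadsheet.

```python
def estimate(seconds, interval_size, system):
    """Return (estimated count, estimated duration in %) for one session."""
    rule = {"MTS": lambda c: c[-1], "PIR": any, "WIR": all}[system]
    n_intervals = len(seconds) // interval_size  # trailing partial interval dropped
    chunks = (seconds[k * interval_size:(k + 1) * interval_size]
              for k in range(n_intervals))
    scored = sum(1 for c in chunks if rule(c))
    return scored, 100.0 * scored / n_intervals


# Toy 600-s session: ten 6-s occurrences, one per minute, each starting at second 3.
session = [False] * 600
for onset in range(3, 600, 60):
    for s in range(onset, onset + 6):
        session[s] = True

for size in (6, 12):
    for system in ("MTS", "PIR", "WIR"):
        count, pct = estimate(session, size, system)
        print(f"{size:>2}-s {system}: count={count}, duration={pct:.0f}%")

# Tally of estimates reported in the text.
print(40 * 13 * 7 * 3)  # 10920
```

In this deliberately regular session (actual count = 10, actual duration = 10%), 6-s intervals yield 10 scored intervals for MTS, 20 for PIR, and 0 for WIR; with 12-s intervals, MTS drops to 0 because no occurrence happens to span an interval boundary moment. Randomly placed occurrences would not show this exact alignment.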
Interobserver agreement
The first author calculated estimates for all data sets. The third author randomly selected 14 sets (10% of total sets; 2 sets for each interval length) and independently determined (a) whether behavior simulation was accurate by counting the number of highlighted cells (actual occurrence), and (b) interval estimations for each set (estimated occurrence). Agreement was 100% for generating the correct actual occurrence and 99.94% for estimated occurrence for each interval system (99.87% for MTS, 100% for PIR, and 99.95% for WIR). A lower than suggested percentage of sets (i.e., 10% rather than 20%) was selected for reliability coding. Given that coding was done on randomly selected sets and was consistently near-perfect, we did not feel additional coding was necessary to increase confidence in the veracity of coding.
Choosing single case designs for analysis
To show the effects of estimates on behaviors measured in the context of single case designs, the first author randomly selected one study using each of three designs (multiple baseline designs, A-B-A-B designs, and alternating treatments designs [ATD]) from a database of studies previously determined to have used interval systems (Lane & Ledford, 2014). Designs were chosen because they are commonly used (Hammond & Gast, 2010) and are often listed as the primary types of single case designs (e.g., What Works Clearinghouse [WWC], 2014). The first author estimated the median data point in baseline and intervention conditions for each of these published articles using printed graphs and a straight edge. Then, she determined the closest duration to that number that had been used in the previously generated data (e.g., rounded to the nearest 10; see above for procedures for generating data). Using these median points, she graphed a stable “actual” duration percentage for all conditions. For example, if the median data point in a baseline condition was at 38% and the median data point in an intervention condition was 63%, the generated graph showed stable responding at 40% during baseline conditions and stable responding at 60% for the intervention conditions. Five data points per condition were generated, consistent with current recommendations for single case research designs (Horner et al., 2005; WWC, 2014). Thus, stable data were predicted using values found in randomly selected published studies, but data were graphed using contemporary standards so conclusions would not be hindered if too few data points existed. After graphs were generated, the researcher randomly selected previously generated estimated data using PIR and MTS and plotted these data (see Figures 3 and 4) for interval sizes of 6 and 12 s. 
These sizes were systematically chosen because (a) common suggestions include the use of interval sizes that approximate duration of a single behavior occurrence, which was 6 s, and (b) a recent review suggested the mean interval size used in the context of special education research for young children was 12 s (Lane & Ledford, 2014).
Results
Consistent Versus Varied DPO
Estimates generated for consistent 6-s behavior sets and for varied 2- to 10-s behavior sets were not discrepant. Because differences that existed were small (less than 5% average differences and similar ranges) and did not result in divergent conclusions, we combined data for varied and consistent behavior durations for ease of analysis and because of limited publication space.
Accuracy of Interval Systems at the Summary Level
PIR, WIR, and MTS performed poorly for estimating count (see Table 1). MTS estimated accurately only when the interval size closely approximated the DPO of the behavior (6 s). WIR never provided accurate estimates, and PIR provided accurate estimates differentially based on interval size and actual counts (10–70 occurrences). For example, when using PIR, choosing an interval size of 8 s would result in accurate estimates when counts were high (n = 70) but estimates of more than 1.5 times the actual value when counts were low (n = 10). Two specific data patterns would result in accurate count estimates: (a) when using PIR, if both count and DPO systematically increased and the initial interval size chosen was much larger (e.g., 3 times) than the baseline DPO but only slightly larger than the intervention DPO, and (b) when using MTS, if DPO remained unchanged and the interval size very closely mirrored DPO. Given that we are unlikely to be able to estimate DPOs to within 1 s for most behaviors, these scenarios are not likely to be feasible suggestions for use. Because all systems performed poorly for estimating count, further analysis was conducted with duration estimates only.
Table 1. Average Estimates of Count for Each System and Interval Size.
Note. Darkly shaded cells represent average estimates within 10% of the actual value, lightly shaded cells represent average estimates within 20% of the actual value. DPO = duration per occurrence.
Estimations of duration varied by system, with patterns for summary-level estimations consistent with previous research, including (a) MTS overestimated duration for some sessions and underestimated duration for others, (b) PIR consistently overestimated duration, and (c) WIR consistently underestimated duration. These summary data are shown in Table 2 and show that, on average, MTS produced estimates that were close to actual behavior duration. Many of the average estimates, across interval size and actual duration, were within 90% of actual duration (87 of 91 estimates, 96%). The remaining estimates were within 80% of the actual duration. In contrast, PIR and WIR were almost never accurate, with only 8% and 7% accuracy within 80% of actual duration; all accurate estimations were calculated with the smallest interval size (2 s). Thus, at the summary level, MTS was highly accurate and PIR and WIR were highly inaccurate.
Table 2. Average Estimates of Duration for Each System and Interval Size.
Note. Darkly shaded cells represent average estimates within 10% of the actual value, lightly shaded cells represent average estimates within 20% of the actual value. DPO = duration per occurrence.
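The accuracy bands used in the tables can be written as a simple classification. The sketch below assumes that "within 10% of the actual value" refers to relative error (so that an estimate of 42% for an actual duration of 40% qualifies); that reading, and the function name `accuracy_band`, are assumptions rather than definitions given in the text.

```python
def accuracy_band(estimate, actual):
    """Classify an estimate by its relative error against the actual value."""
    rel_error = abs(estimate - actual) / actual
    if rel_error <= 0.10:
        return "within 10%"   # darkly shaded cells
    if rel_error <= 0.20:
        return "within 20%"   # lightly shaded cells
    return "outside 20%"      # unshaded cells


print(accuracy_band(42, 40))
print(accuracy_band(47, 40))
print(accuracy_band(100, 70))

# Share of MTS average estimates in the tighter band, as reported in the text.
print(round(100 * 87 / 91))  # 96
```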
Accuracy of Interval Systems at the Single Session Level
Accuracy of duration estimates at the single session level can be evaluated in part by looking at the lowest and the highest session estimates (e.g., to answer the question: In any given session, what is the lowest or highest estimated duration an interval system might produce given X actual duration?). Individual estimates produced using each interval system are shown in Figures 1 and 2. Comparative analysis of (a) the accuracy of all systems for a given actual duration can be done by comparing graphs in a single row; (b) the degree to which estimates for each system change as the interval size increases can be done by looking left to right within a single graph; and (c) the degree to which estimates for each system change as actual duration increases can be done by analyzing the graphs in a single column.

Figure 1. Estimated behavior duration (10%–40%) by interval system (columns) and actual duration levels (rows).

Figure 2. Estimated behavior duration (50%–70%) by interval system (columns) and actual occurrence (rows).
MTS estimates are considerably closer to actual duration (shown as a thick straight line) for all interval sizes and actual durations. With regard to the degree to which estimates change as interval size increases, PIR and WIR become considerably less accurate, whereas MTS becomes less accurate and more variable. Regarding estimations for different actual duration values as interval size increases, MTS becomes slightly more variable; PIR becomes less accurate, less variable, and has ceiling effects (estimates of 100% duration were common); WIR becomes less accurate, more variable, and has floor effects (estimates of 0% duration were common).
Minimum and maximum estimation values for each system, interval size, and actual duration values (10%–70%) are reported in Tables 3 and 4. These results answer the question of the outer limits of accuracy for each interval system (e.g., how inaccurate a system might be in a single session). MTS estimated more accurately than PIR and WIR, with 71% of minimum estimations within 80% of actual duration compared with 11% accuracy with PIR and 2% accuracy with WIR (see Table 3). MTS performed better with smaller intervals (6 s or smaller = 91% accuracy; 15 s or larger = 33% accuracy) and when actual duration was greater (10%, 20%, or 30% actual duration = 51% accuracy; 40% or greater actual duration = 88% accuracy). Again, PIR and WIR only produced accurate minimum estimations with very small (2–3 s) intervals. Similar results were produced for maximum estimations (MTS = 73% of estimations within 80% of actual values; PIR = 4%; WIR = 11%; see Table 4). In summary, single session estimates using MTS were usually accurate, with overestimations and underestimations; single session estimates using PIR and WIR were rarely accurate and changed based on actual duration.
Table 3. Minimum Estimates of Duration for Each System and Interval Size.
Note. Darkly shaded cells represent average estimates within 10% of the actual value, lightly shaded cells represent average estimates within 20% of the actual value. DPO = duration per occurrence.
Table 4. Maximum Estimates of Duration for Each System and Interval Size.
Note. Darkly shaded cells represent average estimates within 10% of the actual value, lightly shaded cells represent average estimates within 20% of the actual value. DPO = duration per occurrence.
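The single-session question (how low or high an estimate might be for a given actual duration) can be explored with a small simulation. The sketch below is illustrative only: the helper names, the seed, the 0-indexed onsets restricted to fit within the session, and the choice of 20 sessions at 30% actual duration with 12-s intervals are all assumptions, and the printed ranges will not reproduce the values in Tables 3 and 4.

```python
import random


def simulate_session(n_occurrences, dpo=6, length=600, rng=None, max_attempts=100_000):
    """Place non-overlapping occurrences of `dpo` seconds at random onsets."""
    rng = rng or random.Random()
    seconds = [False] * length
    placed = 0
    for _ in range(max_attempts):
        if placed == n_occurrences:
            break
        onset = rng.randrange(length - dpo + 1)
        if not any(seconds[onset:onset + dpo]):
            seconds[onset:onset + dpo] = [True] * dpo
            placed += 1
    if placed < n_occurrences:
        raise RuntimeError("could not place all occurrences")
    return seconds


def estimated_duration(seconds, interval_size, system):
    """Percentage of complete intervals scored as occurrences."""
    rule = {"MTS": lambda c: c[-1], "PIR": any, "WIR": all}[system]
    n = len(seconds) // interval_size
    hits = sum(1 for k in range(n)
               if rule(seconds[k * interval_size:(k + 1) * interval_size]))
    return 100.0 * hits / n


rng = random.Random(0)
actual_pct = 30.0
sessions = [simulate_session(30, rng=rng) for _ in range(20)]
ranges = {}
for system in ("MTS", "PIR", "WIR"):
    ests = [estimated_duration(s, 12, system) for s in sessions]
    ranges[system] = (min(ests), max(ests))
    print(f"{system}: min={ranges[system][0]:.0f}%, max={ranges[system][1]:.0f}% "
          f"(actual={actual_pct:.0f}%)")
```

Whatever the seed, two properties hold when the session length is a multiple of the interval size: the PIR minimum cannot fall below actual duration and the WIR maximum cannot exceed it, whereas MTS estimates can fall on either side.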
The Use of Interval Systems in Single Case Design
To demonstrate the accuracy of MTS and PIR in the context of single case designs, simulated data were randomly selected for A-B-A-B, multiple baseline, and ATDs with actual duration data estimated using median values found in published studies. WIR estimates were not generated due to publication space constraints and because WIR is rarely used (Lane & Ledford, 2014). Figure 3 shows data for MTS and PIR systems in the context of a multiple baseline design with behavior changing from consistent 10% actual duration in baseline to consistent 40% actual duration during intervention sessions. These data show that 6-s and 12-s MTS estimates were both accurate and consistently showed a functional relation consistent with the data. PIR estimates consistently supported the conclusion of a functional relation, but the estimates themselves were inaccurate (e.g., for the 12-s system, baseline estimates are closer to actual duration during intervention than during baseline). In addition, estimates were more inaccurate for intervention than for baseline sessions. Thus, estimates are likely to overestimate intervention effectiveness for increasing behaviors, a highly undesirable trait from a conservative standpoint and one that makes “effect size” estimates invalid. This same problem is likely to occur when WIR is used for studies in which the researcher intends to decrease a target behavior. In Figure 4, graphs comparing MTS and PIR for an A-B-A-B design with behavior changes from 40% to 70% actual duration show similar patterns. When 12-s intervals are used, 70% actual duration is consistently estimated at ceiling levels (100%). Also in Figure 4, estimates in the context of an ATD (with data at 20% and 30% actual duration for two conditions) show larger-than-actual differences between conditions for PIR estimates (increasing the likelihood of Type I errors). 
However, the data for MTS estimates illustrate a different problem: The variability induced by using MTS might preclude the conclusion that a functional relation exists when small changes exist between conditions.

Figure 3. Effects of using estimated duration data in the context of a multiple baseline design.

Figure 4. Effects of using estimated duration data with an A-B-A-B design (top row) and an alternating treatments design (bottom row).
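The distortion of between-condition differences can be illustrated with a simulated A-B comparison that mirrors the 10% to 40% change described for the multiple baseline design. This is a sketch under assumptions: the helper names and seed are hypothetical, sessions are newly simulated rather than drawn from the study data sets, and five sessions per condition with 12-s intervals are used for illustration.

```python
import random
from statistics import mean


def simulate_session(n_occurrences, dpo=6, length=600, rng=None, max_attempts=100_000):
    """Place non-overlapping occurrences of `dpo` seconds at random onsets."""
    rng = rng or random.Random()
    seconds = [False] * length
    placed = 0
    for _ in range(max_attempts):
        if placed == n_occurrences:
            break
        onset = rng.randrange(length - dpo + 1)
        if not any(seconds[onset:onset + dpo]):
            seconds[onset:onset + dpo] = [True] * dpo
            placed += 1
    if placed < n_occurrences:
        raise RuntimeError("could not place all occurrences")
    return seconds


def estimated_duration(seconds, interval_size, system):
    """Percentage of complete intervals scored as occurrences."""
    rule = {"MTS": lambda c: c[-1], "PIR": any}[system]
    n = len(seconds) // interval_size
    hits = sum(1 for k in range(n)
               if rule(seconds[k * interval_size:(k + 1) * interval_size]))
    return 100.0 * hits / n


rng = random.Random(1)
design = [("baseline", 10)] * 5 + [("intervention", 40)] * 5
rows = []  # (condition, actual %, MTS %, PIR %) for each session
for condition, n_occ in design:
    s = simulate_session(n_occ, rng=rng)
    rows.append((condition, 100.0 * sum(s) / len(s),
                 estimated_duration(s, 12, "MTS"),
                 estimated_duration(s, 12, "PIR")))

for col, label in ((1, "actual"), (2, "MTS"), (3, "PIR")):
    base = mean(r[col] for r in rows if r[0] == "baseline")
    treat = mean(r[col] for r in rows if r[0] == "intervention")
    print(f"{label}: baseline={base:.1f}%, intervention={treat:.1f}%, "
          f"difference={treat - base:.1f}")
```

Comparing the printed differences against the actual 30-point change gives a rough sense of how much each system inflates or obscures the apparent effect; plotting the session-level values in `rows` would show the variability that MTS introduces.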
Discussion
In this article, we described a series of steps taken to further explore the accuracy of interval systems for estimating count and duration and to explore the potential effects of interval-based estimates in the context of a single case design. There are several conclusions. First, for estimating counts, MTS was only accurate when the interval size closely resembled DPO; WIR was rarely accurate; and PIR was differentially accurate, with an interaction between interval size and number of behavior occurrences. Second, for PIR, the common suggestion for using an interval size approximately the same size as the DPO was not supported. For estimating count, larger intervals are needed; however, their use is hindered because accurate interval size varies based on the number of behavior occurrences and DPO (e.g., has the potential to result in differential accuracy between conditions). For estimating duration, very small intervals (1/3 of the DPO) are needed. Third, previous assumptions about the average accuracy of MTS, PIR, and WIR for estimating duration were replicated (e.g., on average, MTS results in estimates that approximate continuous recording; on average, PIR overestimates and WIR underestimates occurrence). However, on a session-by-session basis, variability in single, short-duration sessions makes MTS less desirable because it may result in the appearance of variability that does not actually exist; this induced variability may preclude decisions of a functional relation when one actually exists (especially when between-condition changes are small). In addition, inaccuracies produced by PIR and WIR may allow for similar decisions on functional relation status when compared with actual duration measurements but may considerably distort the size of the effect (differences between conditions). 
Finally, inaccuracies of PIR and WIR may make effect size estimations, which are increasingly common and deemed highly desirable by many researchers, invalid (e.g., if the estimates of occurrence are differentially accurate across conditions, any calculation of the “size” of an effect would be invalid).
Limitations
Although this study does address some problems with past research, limitations include (a) use of only 6-s DPO, (b) use of generated rather than actual data, and (c) use of a limited data set to generate data in the context of single case designs. The use of 6-s DPOs is not problematic if data are considered in the context of ratios; for example, data for 6-s intervals would be similar to data for 1-min intervals if a specific target behavior of interest had a DPO of 1 min. The generated data were used for practical reasons (coding 1,400 min of video across 140 sessions and estimating occurrence for each of those minutes with 13 interval sizes would have been prohibitive). Simulated data also offer some advantages over actual data, for example, the ability to manipulate the percentage of the session during which the behavior occurs (see Wirth et al., 2014, for a discussion of the advantages of simulated data). The limited data set used to generate data for single case designs was also the result of practical constraints; we acknowledge that these data may not accurately portray “average” single case design data and suggest additional research is needed. In addition, we did not attempt to elucidate the potential interactions between reliability and accuracy of interval systems, although previous authors have done this (e.g., M. J. Murphy & Harrop, 1994; Rapp, Carroll, Stangeland, Swanson, & Higgins, 2011).
Implications and Suggestions for Practitioners
Data presented in this article suggest interval systems may not accurately estimate behavior occurrence, especially when large intervals are used and when estimates of count are needed. Using small intervals may not decrease response effort when compared with continuous recording; thus, small intervals may not be practical. It may be preferable to conduct continuous recordings less often or for shorter sessions rather than using interval systems more often or for longer sessions, although research is needed to decide whether this is sufficiently accurate and feasible. If interval recording is needed for estimating duration, these data suggest MTS should be used (it is also likely the system requiring the least response effort); users should be aware, however, that variable data may result as an artifact of the use of this system. Because the error in data estimated with MTS increases with interval size, we suggest using interval sizes no larger than DPO in baseline if you expect DPO to remain unchanged or to increase, and using interval sizes no larger than expected intervention DPO if you expect DPO to decrease. Additional research may be needed to determine the number of data points needed for MTS to accurately reflect a generalized tendency in single case data. Based on these data, the number of data points needed will be larger when interval sizes are larger (relative to DPO) and smaller when interval sizes are smaller (relative to DPO). Until more research is available, it may be prudent to plan to collect data for more rather than fewer opportunities when MTS is used.
Implications and Suggestions for Consumers
When researchers use interval-based methods to estimate occurrence of target behaviors, consumers must exercise caution in interpretation. Analysis can be assisted when consumers consider the relation between the system used, interval size, average DPO, and total duration per session across conditions. However, researchers rarely report average DPO. Although most readers of behavioral research and single case design will have familiarity with the bias involved with interval systems, there is a need to consider biases in the context of visual analysis and the manner in which these under/over estimations may appear on a graph—sometimes far divorced from the true values. The most alarming issue from a consumer perspective is the possibility of increased Type I error resulting from overestimation of treatment effects and the possibility with PIR of observing near perfect 100% estimated duration as the result of a ceiling effect.
Implications and Suggestions for Single Case Design Researchers
As researchers design experiments, they make numerous choices to answer their questions while guarding against threats to internal validity. When deciding to use an interval system to estimate duration, current data suggest that researchers avoid PIR and WIR unless very short interval lengths (e.g., 3 s) are used. MTS sacrifices less accuracy than PIR and WIR with longer intervals, but induces variability. This may be less problematic, from a conservative visual analysis standpoint. Furthermore, while a researcher may not consider the systematic bias of the PIR and WIR problematic in general (given that the direction of the bias in estimation is understood), conclusions drawn from these data should be considered suspect and “effect sizes” calculated from them invalid. From a pragmatic standpoint, with PIR in particular, researchers risk terminating treatment after concluding positive effects despite the true value of the response duration not reaching criterion level. To assist readers in interpreting data derived from interval systems, researchers should also consider including the average DPO in baseline and treatment conditions for each participant. Although laborious, this practice will help estimate the amount of error imbued by the interval system.
Future Research Needs
Several interrelated areas of research around interval methods need to be addressed to improve interval-based systems. Although a long history of research exists (see Lane & Ledford, 2014), systematic manipulations of important variables in a series of comparisons have not been conducted. Specifically, studies are needed which assess:
The degree to which DPO in baseline can be used to determine interval sizes that will accurately estimate duration.
The degree to which DPO measurements in both conditions can be used to make post hoc corrections to data to more accurately estimate duration.
Whether there are simple relationships between most accurate interval size and DPO (e.g., Is an interval size of half the DPO consistently accurate or are small intervals needed even with long-duration behaviors?).
Whether using a type of “confidence interval” for MTS allows researchers to illustrate potentially “false” variability and better estimate actual duration across conditions. For example, when an MTS system yields a 10% duration estimate for a single session, what is the range of likely actual duration? This range may assist in making accurate decisions regarding functional relations when variability exists in the data. Additional research is needed to determine whether it is feasible to develop such ranges, and what information might be needed for their use (e.g., DPO of baseline occurrences).
The degree to which other alternative measurement systems (e.g., estimating total duration by measuring DPO while sampling a small portion of the session) produce accurate estimations when compared with interval-based systems.
The degree to which practitioners find other alternative measurement systems feasible in comparison with interval-based systems.
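One way to begin exploring the range question raised above for MTS is by simulation. The Python sketch below approaches it from the forward direction: it holds true duration fixed, places bouts at random, and reports the spread of the resulting MTS estimates. It is a hedged illustration only, not a method proposed or tested in this study. The function name `mts_estimate_range`, its parameters, the fixed bout length standing in for DPO, the random placement rule, and the 5th to 95th percentile band are all assumptions.

```python
import random


def mts_estimate_range(session_len=600, interval_len=10, n_bouts=12,
                       bout_len=5, n_sessions=2000, seed=0):
    """Empirical 5th-95th percentile band of MTS estimates (percent).

    Illustrative assumptions: every bout lasts exactly bout_len seconds,
    bouts are placed at random without overlapping (adjacent bouts may
    touch), and MTS samples the last second of each interval.
    """
    rng = random.Random(seed)
    free = session_len - n_bouts * bout_len
    if free < 0:
        raise ValueError("bouts do not fit in the session")
    n_intervals = session_len // interval_len
    estimates = []
    for _ in range(n_sessions):
        # Sorted random offsets into the free time keep bouts in order
        # and prevent overlap once each is shifted by the bouts before it.
        offsets = sorted(rng.randint(0, free) for _ in range(n_bouts))
        stream = [0] * session_len
        for i, off in enumerate(offsets):
            start = off + i * bout_len
            for t in range(start, start + bout_len):
                stream[t] = 1
        hits = sum(stream[k * interval_len + interval_len - 1]
                   for k in range(n_intervals))
        estimates.append(100.0 * hits / n_intervals)
    estimates.sort()
    return {
        "true": 100.0 * n_bouts * bout_len / session_len,
        "low": estimates[int(0.05 * (n_sessions - 1))],
        "high": estimates[int(0.95 * (n_sessions - 1))],
    }


# True duration is 10% by construction; the band shows the MTS spread.
print(mts_estimate_range())
```

In this sketch the width of the band depends strongly on bout length relative to interval size. When the two are equal, every bout covers exactly one sampling moment and the band collapses to the true value. That sensitivity is one reason DPO would likely be needed to construct such ranges.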
The need for research in this area is extensive; the questions listed above are a sample of those that must be answered so that researchers and practitioners can make accurate decisions about behavior and the degree to which it changes across conditions. In addition to further research, changes in research practice are also warranted; when researchers use interval-based systems, they should explicitly describe their data as estimates of true behavior duration. In addition, they should explicitly state the relationship between their estimations and actual behavior duration (e.g., in the case of PIR, “During both conditions, the interval-based estimates are likely to overestimate actual duration; this overestimation is likely larger during the intervention condition.”). This information will allow consumers of research to more accurately assess the level and degree of change of the target behaviors of interest.
Interval systems can increase feasibility when collecting data, allowing researchers to estimate total behavior duration. However, results from this study show the importance of exercising caution when analyzing data to make decisions and when interpreting results. More research is needed in this area to determine how to generate more accurate estimates of behavior duration using discontinuous systems so both researchers and consumers can make more informed decisions regarding the validity of studies using interval systems.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
