Abstract
National attention on whole-of-school approaches to decrease children’s sedentary behavior and increase physical activity includes movement integration (MI) in classrooms. The purpose of this study was to describe instrument development, reliability, and validity of the System for Observing Student Movement in Academic Routines and Transitions (SOSMART), designed to assess MI in elementary classrooms. An a priori conceptual framework was developed based on existing literature. The framework was expanded/refined using videos from elementary classrooms and a Delphi survey. The survey, sent to 85 experts, yielded a 38% response rate. The final system includes 11 MI variables (three categories of teacher variables, two categories of student variables) and uses a 20-second continuous interval recording format. Reliability and validity data were collected in 12 classrooms across four elementary schools. Instrument reliability was tested using interval-by-interval percentage agreement for each category. Construct validity was tested by estimating multilevel random effects logistic regression models comparing student accelerometer derived activity with the presence/absence of each MI variable. Intraobserver reliability resulted in 97.5% agreement and exceeded 80% on all variables. Construct validity was supported for 8 out of 11 MI variables. SOSMART can provide valid, reliable, and objective data about MI in elementary schools.
Schools are a key setting to promote physical activity (PA) of children and reduce sedentary time (Centers for Disease Control and Prevention [CDC], 2013; Institute of Medicine [IOM], 2013; National Physical Activity Plan Alliance, 2012; Pate et al., 2006; U.S. Department of Health and Human Services, 2008). Recommendations for increasing PA and reducing sedentary time include using a multicomponent approach that includes movement integration (MI) in academic classrooms (CDC, 2013; IOM, 2013). In elementary schools, the academic classroom is where generalist classroom teachers instruct students in academic subjects (e.g., math, language arts) and where students spend the majority of the school day. MI is defined as opportunities that allow for reduced sedentariness and/or increased PA, at any level of intensity, among children during classroom time (Webster, Russ, Vazou, Goh, & Erwin, 2015). Integrating movement into the classroom setting has empirical support for increasing children’s PA (Bartholomew & Jowers, 2011; Beighle, Erwin, Beets, Morgan, & Le Masurier, 2010; Erwin, Beighle, Morgan, & Noland, 2011; Holt, Bartee, & Heelan, 2013; Mahar et al., 2006), decreasing sedentary time (Gortmaker et al., 1999; Robinson, 1999; Salmon, 2010; Salmon et al., 2005), improving on-task behavior (Grieco, Jowers, & Bartholomew, 2009; Howie, 2013; Mahar, 2011; Mahar et al., 2006), increasing positive affect (Howie, Newman-Norlund, & Pate, 2014), and enhancing cognitive function (Donnelly & Lambourne, 2011; Elmakis, 2010; Howie et al., 2014).
Despite these benefits, little is known about the extent or nature of how teachers are integrating movement in schools (Webster et al., 2015). Research to date is limited to teacher self-reports (Bartholomew & Jowers, 2011; Cothran, Kulinna, & Garn, 2010; Cradock et al., 2014; Elmakis, 2010; Evenson, Ballard, Lee, & Ammerman, 2009; Gibson et al., 2008; Holt et al., 2013; Howie et al., 2014; Kohl, Moore, Sutton, Kibbe, & Schneider, 2001; Kibbe et al., 2011; Skrade & Vazou, 2013; Stewart, Dennison, Kohl, & Doyle, 2004; Webster et al., 2013; Williamson et al., 2007; Woods, 2011) and is devoid of objective accounts derived from systematic observation. Systematic observation is a proven method of capturing contextual and behavioral variables that are useful in operationally defining, advancing, and evaluating best practices in teaching (Flanders, 1970, 1976; van der Mars, 1989) and PA promotion in an array of settings (McKenzie, Sallis, & Nader, 1991; McKenzie, Cohen, Sehgal, Williamson, & Golinelli, 2006; McKenzie, Marshall, Sallis, & Conway, 2000; Weaver, Beets, Webster, & Huberty, 2014). The benefits of systematic observation for assessing PA include flexibility, low levels of inference during data collection, the ability to capture information about physical and social environments at the same time, minimal interference with participants, and results that are easily quantifiable and often summarized in a way that is easy for policy makers, administrators, and practitioners to understand (i.e., frequency, duration, percentage of total time; McKenzie & van der Mars, 2015).
The purpose of this study was to describe the development, reliability, and validity of a systematic observation instrument designed to measure MI. The instrument—named the System for Observing Student Movement in Academic Routines and Transitions (SOSMART)—will be useful in future research to determine the extent of MI, describe MI intervention implementation, identify possible limitations of MI, and develop optimal strategies for increasing the effectiveness and sustainability of MI as a key component of school-based PA promotion.
Method
Four phases were used to develop SOSMART and establish its reliability and validity. Each phase is detailed below.
Phase 1: Establishing an A Priori Framework
An extensive review of the literature concerning MI (Webster et al., 2015), including research and recommendations, was conducted to establish an a priori conceptual framework. For the initial framework, we conceptualized MI into two major categories, deliberate and incidental, based on the range of MI strategies identified in the literature. We defined deliberate MI as a PA opportunity that is directed by the teacher and can include morning movements, PA infused into academic lessons, and PA breaks between lessons. We defined incidental MI as a PA opportunity that is not directed by the teacher at the moment it happens and that would usually occur as a result of an established classroom routine/procedure (e.g., a procedure requiring students to walk around the perimeter of the classroom to sharpen a pencil), the type of furniture/materials used in the classroom (e.g., using elliptical desks instead of traditional desks), or the way furniture/materials are arranged (e.g., having materials for different academic subjects placed around different parts of the classroom so that students must walk around the room to retrieve/replace materials). Often, deliberate MI might promote more moderate- or vigorous-intensity PA, whereas incidental MI might promote more light-intensity PA. The inclusion of both deliberate and incidental MI strategies in the framework was based on research that has established unique health benefits related to moderate and vigorous PA as well as to light PA and reductions in sedentary time (IOM, 2013).
Phase 2: Expanding and Refining the A Priori Framework
Phase 2 occurred in two steps: (a) videotape analysis and (b) a Delphi survey.
Videotape Analysis
Permission to videotape normal classroom time in four elementary schools in two school districts (two schools per district) was obtained from the lead researchers’ institutional review board, the school districts, and the school principals. Permission from these entities was also obtained to collect student PA data using accelerometers (see section below on SOSMART validity). All schools were in the Columbia, South Carolina, area and were selected as a convenience sample for the study. The two schools in the first district served a combined total of approximately 964 students in Grades K-5 with 58.6% of the students eligible for free and reduced lunch (South Carolina Department of Education, 2013). The two schools in the second district served a combined total of approximately 376 students across Grades K-5. Eligibility for free and reduced lunch data was not available for the schools in the second district at the time of the study.
Purposeful sampling was used to ensure access to classroom teachers demonstrating MI in and across diverse classroom contexts for videotaping. This was achieved through administering a survey to all classroom teachers across the four schools, which asked the teachers to report their use of MI as well as demographic information. The survey was developed and adapted with insight from previous research (Elmakis, 2010; Webster et al., 2013), two MI scholars, and three classroom teachers not at participating schools to ensure content validity. The survey responses were coded, categorized, and then sorted by teacher-reported grade level, number of students, number of assistants, content areas used for MI, frequency of using MI, range of MI strategies used, and the highest combined score for frequency and range of MI use. Out of 80 survey respondents, 20 teachers (N = 20, Mage = 34.9 years, SD = 10.4) were purposefully selected because of their reported use of MI in their classroom.
Before collecting video data, informed consent was obtained from participating teachers and the students’ parents. Videotaping was conducted by trained researchers using a single digital video camera with a tripod in each classroom to record all regular classroom events including teacher and student behavior. To minimize participant reactivity, the camera was set up unobtrusively in a corner of the classroom. Across all classrooms, 32.4 total hours of videotaped observations were collected with an average observation lasting 1.6 hours. The observations were conducted (a) during approximately 2 months in Fall 2014 and at times that did not overlap with state-mandated testing or with the first month of the school year, (b) on multiple days of the school week, (c) at multiple times during the school day, (d) across Grades 1 to 5, (e) with teachers whose teaching experience ranged from 1 to 35 years, (f) in classrooms with approximately 11 to 24 students and both with and without teaching assistants, and (g) during academic lessons in multiple subject areas.
As video data were collected, the lead researcher reviewed videos to catalogue examples of MI. The a priori conceptual framework guided initial observations, although the researcher also remained sensitive to unanticipated MI behaviors or opportunities. Video examples and initial categories of MI were discussed with a second researcher whenever unanticipated behaviors or opportunities emerged. In such cases, if the identified behavior/opportunity was not readily catalogued using the a priori conceptual framework, the framework was revised. Consistent with established instrument development procedures, video viewings and discussions continued throughout data collection and afterward to confirm and expand MI concepts until the observations yielded no further insight (Thomas, Nelson, & Silverman, 2011; Weaver et al., 2014).
Delphi Survey
Following the development of initial MI concepts from the literature review and video data, a Delphi survey was used to confirm, refine, and/or expand these concepts. Whereas the concepts and definitions from the a priori framework and the videotaped sessions were used to develop the initial definitions for each MI strategy (i.e., the category system) in the instrument, the results from the Delphi survey were subsequently used to finalize the category system for SOSMART. The survey was sent to individuals considered to have expertise in MI and classroom teaching. These individuals included 46 MI scholars/researchers identified from the published literature on MI and a sample of 39 elementary classroom teachers who did not participate in Phase 2 of the study.
The survey was administered in two rounds, first to explore participants’ conceptions of MI and then to finalize the category system for the instrument (Thomas et al., 2011; Weaver et al., 2014). In the first round, participants were provided with the definition of MI (Webster et al., 2015) and then asked to respond to an open-ended prompt: “Classroom movement integration (MI) involves reducing your students’ sedentary time (e.g., sitting) and/or increasing their PA during normal classroom time (i.e., in elementary general education classrooms). Please list all examples and/or strategies you can think of that represent MI.” A total of 32 individuals (12 scholars/researchers and 20 classroom teachers) responded to the first round of the survey. Slight modifications were made to the instrument based on participant input. In the second round, the category system with the MI concepts, definitions, and examples was sent to all respondents from the first round. Participants were asked to provide any additional feedback about the content of the instrument. The second round yielded no further insights; therefore, no further rounds were pursued.
The final MI concepts and their operational definitions of classroom strategies that represent MI are presented in Table 1 alongside interobserver reliability scores for each MI concept. The instrument uses a two-stage decision-making process focused first on teacher involvement and then on student responses. Teacher involvement is described by three categories: the person giving the directive to be active (i.e., classroom teacher or other), instructional variables (i.e., the teacher led the activity or technology was used to lead the activity), and movement type variables (i.e., deliberate MI as a reward/incentive, opening activity, transition, and/or other movement that was academic or nonacademic in nature). Student involvement is described by two categories: the part of the class that was active (i.e., whole class, part class, or small group) and the reason for it (i.e., in response to the deliberate teacher directive, or incidentally as a result of the physical environment or a non–teacher-directed transition).
Table 1. Classroom Strategies That Represent Movement Integration.
Note. “—” indicates the behavior was never observed; therefore, percentage agreement was not calculated.
Phase 3: Devising a System for Coding and Interpretation
SOSMART was designed to be an interval recording system to capture the frequency and types of MI opportunities, which are theorized to lead to physically active student responses. An interval recording system was selected because it allows not only for recording multiple events simultaneously but also for recording both the occurrence and nonoccurrence of a variable (i.e., MI; McKenzie & van der Mars, 2015). Inactive versus active responses are operationally defined as follows:
Inactive: Student(s) engaged in sedentary or low-active behaviors (i.e., lying down, sitting, standing quietly; Marshall & Merchant, 2013; McKenzie et al., 1991; Weaver et al., 2014). Note. This excludes standing and stretching (i.e., performing nonlocomotor movements while sitting and/or standing); these behaviors are included in “active” (see below).
Active: Student(s) engaged in locomotor movement (ranging from walking to running) and/or isolated upper body and/or lower body movements (nonlocomotor), whether sitting or standing. Note. Using these definitions, sitting on an exercise ball is not sitting at rest; therefore, it is active.
Coding Procedure
For each interval, decisions must be made about teacher involvement and student response. The first stage requires a decision to be made about the involvement of the teacher by answering the following question: Did the teacher give a direction to be active? If the answer is “Yes,” the observer moves on to code teacher involvement behaviors (teacher directive variables, instruction variables, and movement variables), then proceeds to Stage 2 (student response variables). If the answer is “No,” the observer moves on directly to code Stage 2 (student response variables).
The second stage requires a decision to be made about the response of the class by answering the following question: How did students respond? If the answer to the previous stage was “Yes,” the observer records what portion of the class is active (whole class, part class, or small group). Context variables identify how much of a student’s body is active (upper body only, lower body only, or full body) and off-task behavior. If the answer to the previous stage was “No,” the observer records what part, if any, of the class is active and the observable reason for that movement (as a result of something in the physical environment or as a result of a non–teacher-directed transition, like getting supplies or using the bathroom). Within these categories, context variables identify the presence of added activity and/or off-task behavior. A flow chart illustrating the two-stage decision-making process is presented in Figure 1.

Figure 1. SOSMART decision flow chart.
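The two-stage decision process described above can be sketched in code. This is a minimal illustration and not part of the published instrument: the `IntervalRecord` class, the `code_interval` function, and all field names are hypothetical, and the code abbreviations are borrowed from the note to Table 2.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntervalRecord:
    """Codes recorded for one 20-second interval (hypothetical structure)."""
    directive: str                           # "CT", "O" (other individual), or "N" (no directive)
    instruction: Optional[str] = None        # "T" (teacher-led) or "C" (technology-led)
    movement: Optional[str] = None           # "R", "O", "TT", "OMna", or "OMa"
    active_portion: Optional[str] = None     # "whole class", "part class", or "small group"
    incidental_reason: Optional[str] = None  # "E" (environment) or "NT" (non-teacher-directed transition)


def code_interval(directive_given: bool, *, directed_by: str = "CT",
                  instruction: Optional[str] = None,
                  movement: Optional[str] = None,
                  active_portion: Optional[str] = None,
                  incidental_reason: Optional[str] = None) -> IntervalRecord:
    """Apply the two-stage decision: teacher involvement first, then student response."""
    if directive_given:
        # Stage 1 answered "Yes": code teacher involvement, then the active portion of the class.
        return IntervalRecord(directive=directed_by, instruction=instruction,
                              movement=movement, active_portion=active_portion)
    if instruction is not None or movement is not None:
        # Teacher involvement codes only make sense when a directive was given.
        raise ValueError("teacher involvement codes require a directive")
    # Stage 1 answered "No": go straight to Stage 2 and record any incidental movement.
    return IntervalRecord(directive="N", active_portion=active_portion,
                          incidental_reason=incidental_reason)


# A teacher-led, teacher-directed transition with the whole class active.
directed = code_interval(True, instruction="T", movement="TT",
                         active_portion="whole class")
# No directive; a small group moves during a non-teacher-directed transition.
incidental = code_interval(False, active_portion="small group",
                           incidental_reason="NT")
```

Keeping the "No" branch separate mirrors the flow chart: incidental reasons are recorded only when no directive was given.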
On prepared coding forms (Figure 2), trained observers list all relevant codes present during continuous observation for 20-second intervals. The 20-second interval length was chosen through trial and error with various interval lengths. For instance, a 30-second interval was too long because too many variables occurred in one interval to be coded, whereas a 15-second interval was too short to record all of the variables occurring in one interval. When coding, the observer lists the appropriate code(s) in the appropriate 20-second cell as soon as evidence is observed. The observer should list a code only once in a given 20-second cell on the coding form, even if it is present more than once during that interval. Context codes should be written as a subscript to the major variable code. Coding a dash (—) is acceptable for consecutive cells when the movement continues across multiple consecutive intervals.

Figure 2. SOSMART coding sheet.
Interpretation Procedure
SOSMART is designed to capture observable MI variables and translate findings into an easily quantifiable format. The summary sheet (Figure 3) provides space to calculate the total number of intervals for each category. Total percentage of occurrence can be calculated as follows:

Figure 3. SOSMART scoring summary.
A percentage of occurrences can be calculated for each code, as well as a tally mark for each unique instance of the code (i.e., each time the code appears, not including the dashes). There is no benchmark for high versus low MI frequencies or percentages of total time. SOSMART should be used to document the frequency and types of MI strategies evidenced in academic classrooms. Continued research with this instrument may provide information that can help to establish an appropriate standard for MI in the classroom setting (Webster et al., 2015).
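The two summary values described above, percentage of occurrence and the tally of unique instances, can be sketched as follows. This is an illustration under stated assumptions: percentage of occurrence is taken to be the number of intervals in which the code (or its continuation dash) appears, divided by the total number of intervals, times 100. The function name, the one-row-per-code layout, and the example cells are hypothetical.

```python
CONTINUATION = "—"  # dash marking that the movement continues from the previous interval


def summarize_code(cells, code):
    """Summarize one code across the 20-second cells of a coding sheet row.

    cells: one entry per interval, either the code itself, the continuation
           dash, or "" when the code was absent (hypothetical layout).
    """
    total = len(cells)
    if total == 0:
        raise ValueError("at least one interval is required")
    # Assumption: an interval counts toward occurrence if the code or its dash is present.
    present = sum(1 for cell in cells if cell in (code, CONTINUATION))
    # Unique instances: each time the code itself is written, excluding dashes.
    instances = sum(1 for cell in cells if cell == code)
    return {"percent": 100.0 * present / total, "instances": instances}


# Ten intervals: one transition lasting three intervals, then a second, shorter one.
row = ["TT", "—", "—", "", "", "TT", "", "", "", ""]
summary = summarize_code(row, "TT")
# summary["percent"] is 40.0 and summary["instances"] is 2
```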
Phase 4: Reliability and Validity Testing
Observer Training and SOSMART Reliability
Reliability training followed a specific sequence of steps: (a) orientation to systematic observation and the SOSMART instrument, (b) committing behavior categories/codes to memory, (c) video practice, and (d) reliability testing (McKenzie & van der Mars, 2015; Pope, Coleman, Gonzalez, Barron, & Heath, 2002). Five observers not involved in developing the instrument were trained until 80% agreement was reached against a criterion score for a video with multiple examples of MI. Approximately 5 hours (across 2 days) of training were required to achieve this reliability standard. The standard was also achieved for intrarater reliability using the same video and interrater reliability in a live classroom setting.
SOSMART Validity
In addition to the previous steps used to establish content validity in Phases 1 and 2 of the study, statistical analysis was used to test the hypothesis that the presence of MI variables (teacher directives, instructional, and movement types) would contribute to student activity and/or decrease student inactivity. Construct validity of the instrument was evaluated by comparing the presence/absence of teacher MI with students’ activity and/or sedentary behaviors measured with accelerometers (ActiGraph wGT3X-BT). Twelve of the video observations conducted in Phase 2 of the study were selected for analysis. Ten of these observations were randomly selected within and across each grade level at each school to provide a representative sampling of MI. Two additional observations were purposefully selected because they provided the greatest likelihood of observing a variety of MI concepts from the SOSMART category system. Table 2 reports the number of students in each class observed (wearing accelerometers) and the MI summary across the subsample of observations (n = 12). Two of the participating classes experienced low return rates of the parental consent forms, leading to a low number of participants in those classes.
Table 2. SOSMART Number of Students Observed (Wearing Accelerometers) and MI Summary Across Subsample of Observations (n = 12).
Note. SOSMART = System for Observing Student Movement in Academic Routines and Transitions; MI = movement integration; Obs = observation; Std = number of students in the class on day of observation; StdAcc = number of students wearing accelerometers on day of observation; Int = number of intervals in the observation; CT = classroom teacher; O = other individual; N = no directive given; T = teacher-led; C = technology-led; R = reward; O = opening activity; TT = teacher-directed transition; OMna = other movement, nonacademic; OMa = other movement, academic; E = environment; NT = non–teacher-directed transition. MI variables reported as percentage of total intervals.
Statistical analyses were completed using STATA (Version 13.0, College Station, TX). Reliability for SOSMART was calculated for agreement with the criterion score, interobserver reliability (IOR), and intraobserver reliability. All reliabilities were calculated as interval-by-interval agreement for each category (MI concept) using the following formula (Howie et al., 2014; Mahar, 2011):
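Interval-by-interval percentage agreement is conventionally computed as agreements / (agreements + disagreements) × 100. The Python sketch below assumes that conventional form; the function name and the two observers' records are hypothetical and are not data from this study.

```python
def percent_agreement(observer_a, observer_b):
    """Interval-by-interval percentage agreement between two coding records.

    Each argument is a list with one code per 20-second interval for a single
    category; both lists must cover the same intervals.
    """
    if len(observer_a) != len(observer_b):
        raise ValueError("both records must cover the same number of intervals")
    if not observer_a:
        raise ValueError("at least one interval is required")
    agreements = sum(1 for a, b in zip(observer_a, observer_b) if a == b)
    disagreements = len(observer_a) - agreements
    return 100.0 * agreements / (agreements + disagreements)


# Hypothetical records for ten intervals; the observers differ on interval 6 only.
first = ["N", "N", "TT", "TT", "N", "NT", "N", "N", "TT", "N"]
second = ["N", "N", "TT", "TT", "N", "N", "N", "N", "TT", "N"]
agreement = percent_agreement(first, second)  # 90.0, above the 80% training standard
```

The same function covers agreement with a criterion score, interobserver agreement, and intraobserver agreement; only the source of the second record changes.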
A 2-week interval was used to obtain scores for intraobserver reliability (Thomas et al., 2011). Validity of SOSMART was evaluated by comparing the presence/absence of MI variables with the activity counts per minute from the accelerometers using unconditional multilevel random effects logistic regression (Guo & Zhao, 2000). The activity data were aggregated at the class level (i.e., class median counts/minute) and matched minute by minute to the completed SOSMART coding sheet for the same observation. To match the coded 20-second SOSMART intervals with the 30-second accelerometer epochs, each outcome was summarized by minute and then matched at each minute of the observation. The 20-second SOSMART intervals were identified as the ideal interval length because longer (e.g., 30-second) intervals reduced interobserver reliability and shorter (e.g., 15-second) intervals were too short to observe and record all of the SOSMART variables. Separate models were estimated for girls and boys on each of the 11 MI variables. A cut point of 100 counts/minute was used (Matthews et al., 2008), where greater than 100 counts/minute was considered active (i.e., total activity, regardless of intensity) and 100 counts/minute or less was considered inactive.
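The minute-level matching and the 100 counts/minute cut point can be illustrated with a short sketch. The text does not specify how each outcome was summarized within a minute, so two assumptions are made here and labeled in the code: a minute counts as MI-present if the variable was coded in any of its three 20-second intervals, and a student's counts/minute is the sum of the two 30-second epochs. All names and example values are hypothetical.

```python
from statistics import median

CUT_POINT = 100  # counts/minute; greater than this is "active" (from the text)


def mi_present_by_minute(interval_flags, intervals_per_minute=3):
    """Collapse 20-second presence flags into one flag per complete minute.

    Assumption: a minute is MI-present if any of its intervals was coded.
    """
    n_minutes = len(interval_flags) // intervals_per_minute
    return [any(interval_flags[m * intervals_per_minute:(m + 1) * intervals_per_minute])
            for m in range(n_minutes)]


def class_active_by_minute(student_epoch_counts, epochs_per_minute=2):
    """Classify each complete minute as active using the class median counts/minute.

    student_epoch_counts: one list of 30-second epoch counts per student.
    Assumption: counts/minute is the sum of the epochs within that minute.
    """
    n_minutes = min(len(s) for s in student_epoch_counts) // epochs_per_minute
    active = []
    for m in range(n_minutes):
        start, stop = m * epochs_per_minute, (m + 1) * epochs_per_minute
        per_student = [sum(s[start:stop]) for s in student_epoch_counts]
        active.append(median(per_student) > CUT_POINT)
    return active


# Two minutes of hypothetical data: MI is coded only in the second minute.
flags = [False, False, False, True, False, False]
counts = [[10, 20, 80, 90], [5, 5, 60, 70], [30, 30, 40, 50]]
paired = list(zip(mi_present_by_minute(flags), class_active_by_minute(counts)))
# paired is [(False, False), (True, True)]
```

Each pair is one minute-level observation of the kind a logistic model would take as input, with MI presence as the predictor and class activity as the outcome.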
Results
Reliability
Interobserver reliability agreement and total reliability exceeded 80% in live and video reliability testing (Table 1). Intraobserver agreement across 2 weeks resulted in 97.5% agreement. Three MI variables were not observed (i.e., reward, other movement [academic], physical environment); therefore, reliability was not calculated for these variables.
A total of 3,123 intervals were coded, averaging 260.3 intervals per observation. Across all observations, the MI variables with the greatest median percentage of occurrence were “no directive given” (76.5%) and “non–teacher-directed transition” (22.8%). The next greatest median percentages of occurrence were for MI “directed by classroom teacher” (22.8%), “teacher-led” (22.7%), and “teacher-directed transitions” (21.2%). Table 2 reports the full summary of MI variables across observations.
Validity
Logistic regression models of MI variables related to total activity (i.e., activity counts/min) are presented in Table 3. Results support the hypothesis that students were more likely to be active when MI variables were present with 8 out of 11 variables achieving statistical significance (see Table 3). The odds ratios for these variables show that the odds were higher that students would be physically active when the variables were coded. For example, the strongest predictor of student activity was the presence of “other movement, academically infused,” with an odds ratio of 2.3. Therefore, the odds were 2.3 times higher that students would be active when this variable was coded versus when this variable was not coded. Put another way, students were more likely to be active when MI that included teaching or reviewing academic content was present. When comparing boys to girls, girls were more likely to be active when MI was technology-led or when MI was a movement integrated with academic content.
Table 3. Construct Validity of the SOSMART Based on Multilevel Random Effects Logistic Regression Models.
Note. SOSMART = System for Observing Student Movement in Academic Routines and Transitions; OR = odds ratio; CI = confidence interval. — indicates too few observations to estimate. Statistically significant relationships are in boldface.
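The odds-ratio interpretation above can be made concrete with a simple worked example. The sketch computes an unadjusted odds ratio from a 2 × 2 table, which is a simpler stand-in for the multilevel random effects logistic regression used in the study and ignores the clustering of minutes within classes. The function name is hypothetical and the counts are invented for illustration; they are not study data.

```python
def odds_ratio(active_coded, inactive_coded, active_not_coded, inactive_not_coded):
    """Unadjusted odds ratio of students being active when an MI variable is coded."""
    if 0 in (inactive_coded, active_not_coded, inactive_not_coded):
        raise ValueError("zero cell: the odds ratio is undefined for this table")
    odds_when_coded = active_coded / inactive_coded
    odds_when_not_coded = active_not_coded / inactive_not_coded
    return odds_when_coded / odds_when_not_coded


# Hypothetical minutes: 30 active and 20 inactive when the variable was coded,
# 20 active and 40 inactive when it was not.
example = odds_ratio(30, 20, 20, 40)
# Odds of 1.5 versus 0.5, so the odds of activity are 3 times higher when coded.
```

An odds ratio above 1 indicates higher odds of activity when the variable is coded, which is the direction reported for the statistically significant variables in Table 3.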
Figure 4 visually represents a sample demonstrating construct validity. When MI was coded, student activity (class median) was more likely to be present. These data were purposefully selected from a teacher demonstrating the greatest frequency and variety of MI strategies during observations. As expected, when a teacher directive to be active occurred, students were more likely to be active; similarly, in the absence of a teacher directive, students were typically not active (i.e., registered ≤100 counts/minute on the accelerometer). The exceptions to this were instances where no teacher directive occurred but activity counts still exceeded 100 counts per minute. In these cases (i.e., minutes 8:13 a.m.–8:18 a.m.), the SOSMART variable coded was NT (non–teacher-directed transition), indicating at least one student was engaged in incidental types of movements (e.g., getting a supply or going to the bathroom) with no teacher directive given.

Figure 4. Sample illustration of SOSMART construct validity.
Discussion
To our knowledge, SOSMART is one of the first systematic observation tools for measuring MI in the academic classroom. This instrument fills the need for objectively measuring MI, which is a key strategy in coordinated and comprehensive approaches to PA promotion through schools (CDC, 2013; IOM, 2013). While SOSMART was found to be valid and reliable overall, three MI variables (reward, opening activity, and physical environment) were not observed often enough to establish construct validity. The use of established literature, video observations from elementary classrooms, and the Delphi survey to derive SOSMART variables suggests that while these three MI strategies might not be as commonly used as others, they should contribute to children’s daily PA. Further research is needed to investigate the effects of these and other MI strategies on children’s PA/sedentary time in the academic classroom setting.
The variables occurring most frequently included instances where no directive was given, non–teacher-directed transitions (i.e., incidental movements occurred), and teacher-led transitions. Many of the recommendations for MI focus exclusively on teacher-directed opportunities, particularly movement breaks and active lessons (Webster et al., 2015). The relatively high frequency of incidental types of MI observed in this study gives reason to consider focusing on classroom management/organization strategies that teachers can use to increase students’ PA. For example, strategically arranging classroom furniture and materials so that students must walk around the classroom to get what they need or exchanging traditional seats for exercise balls and other alternative options are simple strategies that do not require daily planning or much time to implement in classroom routines. Adopting such strategies may reduce teachers’ perceived barriers to using MI, many of which relate to time constraints (Webster et al., 2015).
Our results showed that girls were more likely to be physically active during MI that was either technology-led or academic-infused but boys were not. These activity response differences could be related to the types of MI activities teachers use. For example, it was common for teachers to play dance videos using websites like YouTube or GoNoodle. Girls, particularly in the upper elementary grades, may be inclined to enthusiastically participate in such activities, while boys are not. In addition, educational research indicates that girls tend to be more on-task, attentive, and motivated in academic tasks than boys (Cornwell, Mustard, & Van Parys, 2013). Girls’ higher PA levels during activities that integrate movement with academic content may be a reflection of a general tendency for girls to be more engaged in classroom learning. Future research should examine gender differences in children’s classroom PA and identify reasons for these differences, so that teachers can tailor MI strategies and activities to the interests and needs of both girls and boys.
This study has several limitations. The relatively small sample size (i.e., 12 classes in 4 schools) in Phase 2 of the study limits the generalizability of the results. While contextual variables were considered in selecting classrooms for observation, it is possible that the SOSMART category system does not fully represent the range of MI strategies that would be found across diverse classroom settings. In addition, based on the relatively lower reliability results of the student response variables “whole class” and “part class” and the global focus on whether students are active or inactive (i.e., no distinction between PA intensities), we suggest that SOSMART should be used with greater emphasis on documenting MI strategies than on determining the effects of these strategies on children. Finally, while SOSMART captures different types of MI, it does not provide detailed information about the specific MI activities and strategies implemented in a given classroom. Researchers should consider collecting additional contextual information through field notes or videotaping when using SOSMART or should revise the instrument to increase its sensitivity to such information.
Despite these limitations, SOSMART can find valuable application in both scholarship and practice. For example, researchers can use the instrument as an objective measure of classroom-based PA promotion, school administrators can use the instrument as a way to assess and evaluate teacher performance, and teacher educators can use the instrument to help current and future teachers learn to use MI. SOSMART will enable the development of a descriptive research base for MI and will enhance evaluation of classroom PA interventions irrespective of the specific MI programs or packages implemented. We expect that the information SOSMART generates will also reveal how to maximize the value of routine classroom practices to align academic and health goals in schools. Ultimately, research incorporating SOSMART can help build the evidence base needed to establish clear benchmarks for MI and advance the recommendations for best practice in classroom teaching and school-based PA promotion.
Footnotes
Acknowledgements
We acknowledge the contributions of Kristen Crawford, Savannah Radenbaugh, Savannah Starling, and Parker Smith for being the first four trained SOSMART coders.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
