Abstract

Although most scholars identify Wolf’s (1978) seminal article as describing the origins of social validity, a more careful analysis reveals earlier precursors. An inductive process of research began in 1967 that culminated in a set of goals and procedures that defined the concept of social validity (Fixsen, 2019). This process began when Wolf and his doctoral students established The Achievement Place Model in Lawrence, Kansas, which was a group home for six teenagers who were referred from the juvenile justice system. Lonnie and Elaine Phillips were the first “teaching home” parents and served a dual role as teachers and surrogate parents. Wolf and his students—Lonnie was one of his students—adopted single-case research methods as a mechanism to systematically address questions that arose during the daily treatment of the six youths. In the late 1960s and early 1970s, many interventions being evaluated focused on problem behavior because the individuals who were the focus of the interventions resided in restrictive environments that often occasioned problem behavior. Many of these interventions included an aversive component based on the principle of punishment and pursued the goal of behavior reduction.
Wolf and his students questioned the ethics, as well as the efficacy, of these interventions. Their response, as reflected in a prolific program of research, was to relinquish some level of control to allow the youths “to make decisions about their own lives and those of their peers” (Fixsen, 2019, p. 2) that resulted in a form of self-government. As part of this self-government component, the youths’ opinions and ratings were sought to evaluate how they were being treated (i.e., consequences for rule compliance and infractions) and whether they liked the teaching-parents and found them to be fair, consistent, and humorous. As an aside, these evaluations led to a provocative and fundamental quest to understand the concept of relationship or how well the teaching-parents related to the youths (Kirigin, Braukmann, Atwater, & Wolf, 1976). In addition to asking the youths for their feedback, Wolf and his students explored systematically the opinions of other consumers, relevant in the lives of the youths—parents, teachers, court personnel, police officers—to determine the social validity of the Achievement Place goals, methods, and outcomes. This now familiar concept was chronicled and documented in an extensive program of research lasting more than a decade.
Although the concept of social validity was pioneered more than 40 years ago and identified as a quality indicator for single-case research almost 15 years ago (Horner et al., 2005), the practice of social validity has not yet delivered on its promise. Too often investigators either omit it from their studies (Snodgrass, Chung, Meadan, & Halle, 2018) or assess it in such a routine manner that its yield rarely informs the field about how to move forward. My goal in writing this brief piece is to critique two features of social validity and to recommend options to address them. My hope is to motivate researchers in early childhood special education to delve more deeply into the social validity of their investigations.
In a recent manuscript submission to a journal for which I am a reviewer, the authors conducted an intervention study in which an investigator taught peers strategies for interacting with their fellow students who had severe disabilities. They conducted the typical, almost formulaic, social validity assessment by asking consumers (i.e., the students with disabilities, the peer partners who interacted with the students, and school staff) whether they liked spending time with each other, whether they enjoyed participating in the study, and whether the staff would continue using the intervention. The responses reflected their strong satisfaction with the intervention and the outcomes. However, before unquestioningly accepting these results, as skeptical evaluators, we need to understand the context in which these data were gathered. Consumers’ verbal responses are subject to distortion (Skinner, 1953) due to the conditions in which they are gathered (e.g., pleasing the investigator, positive bias due to having participated in the study). This does not imply that all social validity assessments are biased; however, it ought to motivate us to reduce potential sources of bias as much as possible. Please refer to Barton, Meadan-Kaplansky, and Ledford (2018) for a detailed discussion of recommendations.
To improve the objectivity of the social validity assessment in the research study mentioned in the previous paragraph, an investigator might recruit experts or relevant consumers who were not involved in the study to view randomly selected video clips drawn from baseline and intervention sessions and presented in random sequence. Without knowing which clips were drawn from baseline and intervention, consumers would be asked to rate each one on a global measure of student/peer interaction to determine whether their ratings reflect differences that distinguish baseline from intervention clips. Wolf and his collaborators often conducted this type of social validity evaluation in the 1970s. Admittedly, this would be at some expense in terms of both time and effort. However, if social validity assessments are to be meaningful and to offer direction for future research and practice, then we need to expend resources to improve their yield. Akemoglu et al., this issue, provide an example of this type of measurement. In my opinion, social validity assessments have become an expected but often humdrum, hollow, and formulaic addition to our intervention research.
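The blind-rating procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the procedure used by Wolf and his collaborators or by any study cited here: the function names, the clip identifiers, and the number of clips per condition are all hypothetical. It samples clips at random from each condition, shuffles them into a single presentation order under neutral codes, and later compares the mean rating for each condition.

```python
import random
from statistics import mean


def build_blind_sequence(baseline_clips, intervention_clips, n_per_condition, seed=None):
    """Sample clips from each condition and return them in random order.

    Returns (codes, key): `codes` is the presentation order that raters see,
    and `key` maps each neutral code to (clip, condition). The key stays with
    the investigator so raters cannot tell the conditions apart.
    All names here are hypothetical.
    """
    rng = random.Random(seed)
    sampled = [(clip, "baseline") for clip in rng.sample(baseline_clips, n_per_condition)]
    sampled += [(clip, "intervention") for clip in rng.sample(intervention_clips, n_per_condition)]
    rng.shuffle(sampled)  # random sequence across both conditions
    key = {f"clip_{i + 1:02d}": item for i, item in enumerate(sampled)}
    return list(key), key


def summarize_ratings(ratings, key):
    """Unblind the ratings and return the mean global rating per condition."""
    by_condition = {"baseline": [], "intervention": []}
    for code, score in ratings.items():
        _, condition = key[code]
        by_condition[condition].append(score)
    return {condition: mean(scores) for condition, scores in by_condition.items()}
```

In use, the investigator would show raters only the coded clips, collect one global rating per code, and then call `summarize_ratings` to see whether the ratings distinguish baseline from intervention clips.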
Another question, germane to social validity, has plagued me for much of my professional life. Do the changes resulting from an intervention maintain when the study has ended (i.e., the contingencies of the study no longer exist)? Kennedy (2002) aptly referred to maintenance of behavior change as a potential measure of social validity. Adopting a scenario similar to the study mentioned above in which a teacher is tasked with training peers to interact socially with a student with severe intellectual disabilities, what happens when the study ends? That is, the teacher who was the interventionist is no longer tasked with training the peers; the observers no longer come to the school to gather data on student–peer interaction; and the peers’ obligation to participate in the training and interact with the student has ended. Will the teacher continue to prompt the peers to interact with the students with disabilities? Will the peers, with or without prompting, continue to interact? Will the student with disabilities continue to respond to peers’ overtures and, on some occasions, continue to initiate interactions? We almost never know the answers to these questions and yet they are fundamental to the success of an intervention (see Lucyshyn et al., 2007, for an exemplary exception). Can we produce changes that extend beyond the confines of the study itself?
It may be important here to digress and distinguish between maintenance of behavior change within a study (i.e., the intervention is withdrawn to determine if behavior endures in the absence of the intervention, but in the presence of ongoing conditions that define an experimental investigation) and behavior durability after the study has ended (i.e., the conditions that define the study are completely withdrawn, including research personnel and data gathering). This distinction is fundamental because many features of participating in a study are unique to that context and findings may not transfer to practice (natural) settings where conditions vary from those present in an experimental investigation.
Although many applied studies include assessments of maintenance within a study, often these efforts are fraught with problems. After a week or a month or 6 months, when we (as investigators or observers) return to the setting in which the intervention was conducted to collect maintenance data, our very presence is obtrusive and potentially reactive (Kazdin, 1979). That is, in the example above, if observers suddenly reappear in the setting of the study to gather data, the three consumer groups likely will know (i.e., see or hear the observers) they are there. The presence of the observers may trigger behavior from the teacher/interventionist, the peers, and/or the student with disabilities that would not have occurred in their absence. In fact, this phenomenon of reactivity has been observed and demonstrated (Halle, Baer, & Spradlin, 1981). If reactivity is occurring during a maintenance check, then the investigator’s conclusion about behavioral durability may be flawed. That is, the presence of observers may be an essential feature producing the intervention effect or, in other words, without the presence of observers, the behavior change observed in the context of the study will not maintain—it is not durable under conditions that differ from participating in the study such as naturally occurring conditions.
The experimental example of reactivity with which I am most familiar is my dissertation (Halle et al., 1981). An essential feature of this study was an analysis of the effects of obtrusive recording. Our primary focus, however, was to examine the effects of my training two preschool teachers to implement a delay procedure with the children in their classrooms. After observing teacher–child routines, we identified a set of opportunities in which a brief delay or wait by the teacher might encourage the children to use words that they already knew, but rarely emitted. We also identified a set of generalization opportunities that were similar to the training occasions, but were never divulged to the teachers. During the final 2 weeks of the formal study (i.e., the period after which the teachers were led to believe the study had ended), each teacher was using delays in almost 90% of the training opportunities and approximately 80% and 70%, respectively, of the generalization opportunities (see Figure 1).

This is a reprint of a figure that originally appeared in Halle, Baer, and Spradlin (1981).
To conduct the analysis of obtrusive recording, I thanked the teachers for their participation and assistance with the study, leaving the impression the study had ended. Furthermore, the observers stopped coming to the classrooms to record teacher and child behavior. However, unknown to the teachers, the observers began recording again at three intervals—1, 2.5, and 5 months—from an observation booth with a one-way mirror. The data are captured in Figure 1 and reflect a reduction of about 20% in teacher use of delays in training opportunities at the 1-month maintenance check with further decrements at the 2.5- and 5-month checks.
We interpreted these findings as a positive outcome because the two teachers were implementing delays on many more occasions than they were during baseline. However, it became obvious that some occasions were not convenient, motivating, or compelling for the teachers and these were omitted over the 5-month unobtrusive recording from the observation booth. To complete the analysis of the effects of obtrusive recording, I asked the two teachers for permission to allow the observers to return to their classrooms to record maintenance data (they believed that it had been 5 months since we had observed them). Figure 1 reveals that both teachers increased their use of delays in training opportunities by about 40%. This surge likely reflects reactivity to the observers’ presence. That is, the teachers increased their use of delays in training opportunities when observers were present in their classrooms. The analysis was similar to a B-A-B single-case research design where B = obtrusive recording and A = unobtrusive recording. It would not meet contemporary evidence-based standards, which require three demonstrations of a basic effect, but I would argue the data are quite compelling.
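The point about design standards can be made concrete with a small Python sketch. It rests on a simplifying assumption that is not stated in the text: each change between adjacent, differing phases is treated as one opportunity to demonstrate a basic effect. The threshold of three comes from the preceding paragraph; the function names are hypothetical.

```python
def count_phase_changes(design):
    """Count changes between adjacent, differing phases in a design string.

    `design` is written with hyphens between phase labels, e.g. "B-A-B".
    Each change is treated here as one opportunity to demonstrate an effect
    (a simplifying assumption, not a formal standard).
    """
    phases = design.upper().split("-")
    return sum(1 for prev, curr in zip(phases, phases[1:]) if prev != curr)


def meets_three_demonstrations(design, required=3):
    """Return True if the design offers at least `required` phase changes."""
    return count_phase_changes(design) >= required


for design in ("B-A-B", "A-B-A-B"):
    print(design, count_phase_changes(design), meets_three_demonstrations(design))
```

Under this assumption, a B-A-B sequence offers two phase changes and falls one short of the threshold, whereas an A-B-A-B sequence offers three.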
Although I will not elaborate on the details here, some readers may be wondering about the ethics of observing teachers without their knowledge or permission. As part of the Institutional Review Board approval process, we were required to fully debrief the teachers about the deception we perpetrated, a requirement we entirely supported, and to offer the option of eliminating their maintenance data if they so requested. I am raising this issue because I am advocating the gathering of unobtrusive data both during a study and after its termination to determine the influence of obtrusive recording and to reduce the likelihood of reactivity as a factor affecting the generalization or external validity of research findings.
After collaborating with the editors of this issue of Topics in Early Childhood Special Education, I have identified three strategies for gathering unobtrusive data and, for two of these strategies, I am citing studies that contain exemplars of their application. One strategy, as just described, capitalizes on settings in which observation booths with one-way mirrors are available. In addition to the Halle et al. (1981) study, Barton and colleagues leveraged this strategy in two recent publications (Barton, Pokorski, et al., 2018; Barton, Rigor, Pokorski, Velez, & Domingo, 2018).
A second strategy capitalizes on recruiting observers who are already members of the setting. Due to their status, they likely can record unobtrusively. One example from my past research (Phillips & Halle, 2004) involved recruiting university supervisors as clandestine observers of student interns who were the recipients of an intervention to teach the interns to implement environmental arrangement and delayed-prompting strategies. The data, collected weekly by two university supervisors, were considered unobtrusive because the interns and cooperating professionals in the classroom were not informed that university supervisors would be gathering data for the study. Before and during the study, the university supervisors were present weekly to assess interns’ progress in their practicum—a typical schedule of intern supervision. Therefore, their data gathering for the purposes of the study presumably went unnoticed by those in the classroom. The results revealed that all four interns employed the instructional strategies at lower rates during maintenance when observed unobtrusively. However, reactivity was not the only possible explanation for the findings. For example, competing expectations when a supervisor is present or subtle differences in the ongoing activities could be factors that influence the recording.
Another example of employing an indigenous member of the focal setting occurred in a study by Martin, Drasgow, and Halle (2015). In this case, we engineered the situation by recruiting a doctoral student in school psychology, Jack, to assist us. He visited the school 3 months before the study began and introduced himself as a future professional who wanted to learn more about young children with disabilities. In consultation with the director of special education, he negotiated an agreement to volunteer at the school on a regular basis throughout the year. The school staff considered Jack’s presence to be entirely independent of our study.
In this study, the investigator taught four teachers to encourage play skills in children with developmental disabilities by embedding instructional interactions during outdoor playtime. Although observations during both baseline and intervention were obtrusive, in baseline the teachers were unaware of the precise strategies that defined the intervention. We hypothesized that the most likely occasions in which reactivity might be operating would be after the investigator was paired with the intervention and the teachers knew what was expected of them. Jack covertly recorded data in both the presence (to assess interobserver agreement) and absence (to assess reactivity) of the primary observer on the number of target teachers present on the playground and on whether they were engaged in interactions with children. Reactivity did appear to be operating in this study, but this conclusion was drawn with qualifications.
A third strategy for gathering unobtrusive observational data leverages the ubiquitous presence of cameras that capture video in settings throughout our communities—in homes, schools, commercial establishments, roads, parking lots, and so on. I cannot cite studies in our literature that have included this type of observational recording, but I am familiar with traffic studies in community psychology in which traffic patterns and use are recorded with the assistance of cameras. In computer science, new technologies are creating opportunities for innovation in unobtrusive observation and recording (Kientz, Goodwin, Hayes, & Abowd, 2013). Privacy and ethics will be essential challenges in the future as we endeavor to extend the horizons of our observational recording strategies to permit us a window into the world of maintenance and durability of behavior change.
Almost 50 years ago, Wolf and his group of doctoral students generated strategies for assessing the social validity of the goals, procedures, and outcomes of their Achievement Place Model by inviting consumers to provide them with feedback. One of the fundamental lessons they learned was to invite feedback to monitor progress (formative evaluation) from the perspective of those who were affected by their Model. Feedback can be delivered in many different forms, some unsolicited. One form that had a particularly substantial impact on Wolf was delivered when he attempted to replicate the Achievement Place Model in a new community: “Before we really knew that they had complaints about our program they had ‘fired’ us” (Wolf, 1978, p. 206). This outcome (and a combination of other factors) caused Wolf to consider methods for obtaining consumer feedback before being “fired.” As I have re-read Wolf’s seminal article multiple times, I have been impressed by the creative manner in which he and his doctoral students innovated strategies to obtain feedback from the consumers of Achievement Place. I believe many of us have become somewhat rigid in our efforts to assess and examine the social validity of the dependent variables, the methods, and the findings of our research and I am hopeful that the ideas recommended here might function as a stimulus for more creative, thoughtful, and penetrating efforts in the future.
