Abstract
One quality indicator of intervention research is the extent to which the intervention has a high degree of social validity, or practicality. In this study, I drew on Wolf’s framework for social validity and used qualitative methods to ascertain five middle school teachers’ perceptions of the social validity of System 44®, a phonics-based reading intervention for secondary students. Findings derived from teacher interviews and classroom observations conducted during the course of one school year indicate that the ways in which teachers make decisions about social validity are complex and predicated on the interaction of several factors related to an intervention’s goals, outcomes, and procedures. By using qualitative methods and applying Wolf’s framework to an academic intervention, I expand the social validity construct and delineate its sub-components.
The concept of social validity emerged in the 1970s with seminal articles by Wolf (1978), Kazdin (1977), and Van Houten (1979). They urged scholars in the field of behavioral sciences to ensure interventions were important to clients’ lives and could be sustained in community settings. Since the 1970s, the importance of social validity has been accepted widely and is now considered an imperative aspect of intervention research in special education (Horner et al., 2005). This is in part because experts posit a relationship between social validity and intervention fidelity (Heckaman, Conroy, Fox, & Chait, 2000; McDuffie & Scruggs, 2008). When the social validity of an intervention is low, teachers are less likely to implement it as it was intended, if at all. The importance of social validity has also entered conversations about the longstanding research-to-practice gap. When teachers do not consider interventions to be feasible, acceptable, or relevant to their work, they may be less likely to adopt and sustain them over time (Greenwood & Abbott, 2001).
A Framework for Social Validity
Social validity rests on the idea that consumers of an intervention and other stakeholders apart from researchers should participate in the evaluation process (Storey & Horner, 1991). In 1978, Wolf proposed a three-part framework for validating the social importance of interventions. The framework can be used to ascertain the social validity of an intervention along three dimensions: goals, procedures, and effects. Wolf posited that intervention goals must be what society wants; procedures must be considered acceptable and feasible; and consumers should be satisfied with the effects, both intended and unintended. Under this framework, social validity is a multidimensional construct and should be evaluated along a continuum (Wolery & Gast, 1990). In other words, an intervention cannot be evaluated dichotomously as in it does or does not “have” social validity (Foster & Mash, 1999).
Horner and colleagues (2005) added to the discussion by establishing additional criteria for evaluating social validity. They asserted that social validity is enhanced when an intervention can be implemented with fidelity in authentic contexts by “typical intervention agents (e.g., teachers or parents)” (p. 172). Interventions should also be feasible given available resources. Finally, intervention agents should choose to continue to use interventions even after formal support is removed.
The Use of Qualitative Methods in Social Validity Research
Typically, social validity is examined using questionnaires, rating scales, or direct observations by trained raters (Finn & Sladeczek, 2001), and there are several benefits to using such measures. Surveys or questionnaires, for example, are relatively inexpensive and easy to administer, particularly when working with large sample sizes. They generate quantitative data, which can be used in descriptive and inferential statistical analyses, and are thought to be an objective way of measuring subjective data (Wolf, 1978).
Quantitative rating scales and surveys, although advantageous in many ways, are limited in the amount and type of information they provide. When constructing these instruments, the researcher is the arbiter of what information is considered relevant and ultimately measured; thus, potential data deemed important by participants might be overlooked or only examined superficially. Questionnaires and other similar measures often ask participants to rate interventions according to categorical (i.e., yes/no) or ordinal (i.e., Likert-type scales) variables. Unless researchers include open-ended questions, this process does not allow for “Yes, but . . .” statements and other qualifiers, and so oversimplifies matters that may be quite complex from participants’ perspectives.
Qualitative designs, however, allow for systematic, in-depth, holistic examinations of phenomena in natural settings with participants’ voices at the forefront of the study (Brantlinger, Jimenez, Klingner, Pugach, & Richardson, 2005; Creswell, 2007; Denzin & Lincoln, 2005; Merriam, 2009). Some such designs are well-suited for examining social validity as it is based on ascertaining consumers’ perceptions in authentic contexts (McDuffie & Scruggs, 2008). Qualitative methods also facilitate open-ended investigations that can help researchers uncover unanticipated findings or avenues for further exploration. According to Patton (2002), qualitative methods “can tell the stories behind the numbers, capture unintended impacts and ripple effects, and illuminate dimensions of desired outcomes that are difficult to quantify” (p. 152). This strength aligns with experts’ recommendations of examining social validity over time rather than only at the end of a study (Schwartz & Baer, 1991). Qualitative methods can also be useful when verifying self-report data. For example, in a study of the social validity of an early reading intervention, even though participants rated themselves as having a high understanding of the intervention, qualitative interview data indicated they actually had low levels of understanding (Lyst, Gabriel, O’Shaughnessy, Meyers, & Meyers, 2005). Finally, as presented in this special issue, qualitative methods are instrumental when attempting to understand enduring challenges and issues that are multifaceted and complex.
The extant literature includes some examples of researchers who used qualitative designs to conduct in-depth investigations of social validity. For example, in one study, researchers ascertained consumers’ (i.e., students and parents) perspectives on the co-teaching service delivery model to support students with disabilities in the general education classroom (Gerber & Popp, 1999). The researchers conducted focus groups with 123 students and their parents across elementary, middle, and high schools in Virginia. At each school, a member of the research team conducted a focus group composed of students with and without disabilities and a separate focus group for parents. After conducting inductive thematic analyses, the authors learned the majority of students and parents were satisfied with the model and thought students with disabilities were excelling socially and academically. They were unhappy, however, with conflicting messages that arose because of having two teachers. Some parents reported the model was not working for their children and there should be alternative options. An interesting finding that arose during the parent focus groups was that parents of students without disabilities, although supportive of co-teaching, were upset by the lack of communication about it, particularly the ways in which decisions were made about who would be placed in co-taught classrooms.
Another study that used focus groups was conducted by Copeland and colleagues (2004). The research team conducted focus groups with 32 students without disabilities who served as Peer Buddies in the Peer Buddy Program—a social and academic support program for secondary students with moderate or severe disabilities (Hughes et al., 1999). The Peer Buddies reported numerous benefits of the program for themselves and students with disabilities. Perhaps more importantly, the participants provided several insights into why little interaction occurs between high school general and special education students. The students identified physical and social segregation, differential expectations, communication differences, behavioral challenges, negative attitudes, and insufficient support as key challenges to including students with disabilities in general education social and academic settings. This study’s qualitative findings not only provided evidence of social validation for the Peer Buddy Program but also shed light on a persistent and complex problem in special education—inequitable educational and social outcomes for students with significant disabilities.
Broer, Doyle, and Giangreco (2005) interviewed 16 students with intellectual disability about their experiences working with paraprofessionals to determine the social validity of this support. Using thematic analyses, the research team identified four perspectives students held about paraprofessionals. Although some students viewed working with paraprofessionals positively, many students reported feeling lonely, disenfranchised, embarrassed, and stigmatized; thus, the authors raised concerns regarding the social validity of paraprofessional support. By conducting in-depth interviews with students, the authors were able to report rich, detailed accounts of the various relationships between students with disabilities and paraprofessionals, thereby uncovering hidden complexities regarding a special education support that is used widely and assumed to be positive.
Lyst et al. (2005) conducted an ethnographic study of teachers’ and caregivers’ perceptions of the social validity of the Check and Connect with Early Literacy (CCEL) Support. The CCEL intervention was an early reading program that included weekly literacy tutoring, progress monitoring, and relationship building between home and school. In addition to evaluating the social validity of CCEL, the researchers explored the relationship between social validity and treatment acceptability as theorized by Reimers, Wacker, and Koeppl (1987). Eleven caregivers and six teachers participated in focus groups, in-person interviews, and interviews over the phone. The research team coded data deductively and inductively, which resulted in a five-factor model that replicated and expanded the model proposed by Reimers et al. The model’s factors included reasonableness, understanding, disruption, effectiveness, and ecological characteristics. The authors concluded that high levels of reasonableness and effectiveness coupled with low levels of disruption resulted in high social validity ratings. This study highlights another merit of qualitative research: its ability to advance theory, in this case theory about social validity.
In-depth qualitative studies such as the four described above are atypical in the social validity and intervention literature base. Given that social validity emerged from the field of behavioral sciences with strong roots in positivism, qualitative research may be discounted based on ontological and epistemological differences. Moreover, qualitative research is often thought to be unfeasible in terms of time and resources, particularly when working with large sample sizes.
Although qualitative methods are not used extensively in intervention research, such methods are not entirely absent either. Particularly in recent years, scholars increasingly use mixed-method designs in which social validity data are collected from interviews with teachers, students with disabilities, their family members, or peers, typically post-intervention. Such study designs span disability types, service delivery models, grade levels, and content areas. See, for example, (a) Carter, Moss, Hoffman, Chung, and Sisco’s (2011) study of peer support arrangements for secondary students with low-incidence disabilities in inclusive classrooms; (b) Kamps et al.’s (1998) study of elementary students’ perceptions of social skill groups for students with autism; or (c) Danoff, Harris, and Graham’s (1993) study of strategy instruction in writing for fourth and fifth graders with and without learning disabilities. A few other studies made use of focus groups as opposed to individual interviews (e.g., Jimenez, Browder, Spooner, & Dibiase, 2012), and an even smaller number collected social validity data at multiple points in time (e.g., Bouck, Bassette, Taber-Doughty, Flanagan, & Szwed, 2009).
The emergence of qualitative methods in the aforementioned studies is promising but limited in scope. Qualitative methods can offer many benefits as researchers engage in research on the social validity of interventions. The aim of this article is to (a) illustrate the ways qualitative methods can be differently yet equally valuable to researchers and (b) examine the construct of social validity when situated in secondary special education contexts. To accomplish these aims, I present findings from a qualitative research study that examined teachers’ perceptions of the social validity of the System 44® reading intervention (Scholastic Incorporated, 2009b). The research questions were as follows: (a) To what extent do teachers perceive System 44® to be socially valid? and (b) How do teachers who implement System 44® reach decisions about its social validity?
System 44® Intervention
System 44® is designed to address the decoding needs of older struggling readers who are unable to succeed in Scholastic’s Read 180® program (Scholastic Incorporated, 2005). System 44® and Read 180® are designed to address an area of critical concern and national significance—the poor reading achievement of many secondary students. The most recent data from the National Center for Education Statistics (NCES, 2011) indicate that roughly a quarter of the nation’s eighth graders are reading below a basic level while 62% of eighth graders with disabilities are reading below a basic level.
System 44® follows the tenets of direct instruction to teach students the 44 phonemes in the English language. The program includes three primary components: computer-based instruction, small-group instruction, and independent reading. The computer component directly teaches the 44 English phonemes and sight words using adaptive technology, which adjusts instruction based on individual students’ progress. In small-group instruction, teachers use manipulatives and workbooks to support students as they practice decoding and spelling. Finally, during independent reading, students practice their oral reading fluency in connected texts by reading high interest, low readability books and listening to companion CDs. The program is best implemented in a 90-min instructional block but can be modified to fit into 60 min. A daily lesson begins with the teacher delivering 5 to 10 min of whole group instruction followed by students rotating among computer, small group, and independent reading stations.
To enter System 44®, students take the Scholastic Phonics Inventory™ (SPI; Scholastic Incorporated, 2009a). Based on students’ SPI scores, the program determines which skills they are missing and creates an individualized program plan for every student. As students progress through their designated skill modules, the computer collects progress-monitoring data and provides a series of reports to help teachers group students for differentiated small-group instruction. Once students have completed all of their assigned skill modules successfully, they graduate out of System 44®, ostensibly moving on to Read 180®.
Method
This study is part of a larger collaborative endeavor with a large mid-Western school district that aimed to increase the reading achievement of its middle and high school students. District administrators asked me to assist them in a pilot study of System 44® to determine its effectiveness when implemented in the district. In this larger evaluation study, the research team, which included me and a cadre of my graduate and undergraduate students, used a quasi-experimental mixed-method design to investigate the effect of System 44® on students’ reading achievement and engagement. We also examined the influence of the program on teachers’ knowledge, beliefs, and practices over the course of one school year. As part of our evaluation, we also examined the social validity of the program. The research team’s role was that of external evaluator. We collected and analyzed the evaluation data and then presented our findings to district administrators who took the information into account in making decisions about scaling up the district’s implementation of System 44®.
Although a lengthy discussion of the quasi-experimental study of students’ reading achievement is beyond the scope of the present article, a critical aspect of intervention research is determining whether the intervention results in desired outcomes (in this case, improvement in students’ decoding skills). Determining the social validity of an intervention without also determining its effectiveness is misguided in intervention research. Thus, it is important to note that results from the quasi-experimental study indicated there were significant differences in decoding and sight word recognition between students who received System 44® instruction and students who received business-as-usual reading instruction, with System 44® students outperforming the comparison students (Leko & Roberts, 2013).
Positionality
In qualitative research, it is important for researchers to acknowledge their stance in relation to the research study as this enhances credibility (Patton, 2002). I approached this study from a constructivist perspective. Tenets of this perspective are that there are multiple realities that are socially constructed (Denzin & Lincoln, 2005; Glesne, 1999). A researcher’s task is to understand and interpret participants’ construction of the world (Glesne, 1999).
As my role was to interpret participants’ constructions of social validity, it is important for me to present my background and potential biases. I am a White, monolingual female faculty member at a large mid-Western public university. Prior to becoming a faculty member, I was a classroom teacher with specific expertise in reading instruction for secondary students with disabilities. I believe high-quality reading instruction for secondary students with disabilities should be age appropriate, research-based, and aligned with guidelines from the National Reading Panel (NRP; 2000) and the Reading Next Report (Biancarosa & Snow, 2006). Briefly, such recommendations include (a) explicit and systematic instruction in the five areas of reading; (b) collaborative learning opportunities; (c) the incorporation of technology; (d) high-quality professional development for teachers; (e) intensive, individualized interventions; and (f) the use of diverse texts. When conducting intervention research, I believe intervention agents (in this case, teachers) are critical to the success of the intervention and its evaluation; thus, I believe an important responsibility of researchers is to partner with teachers and solicit their opinions throughout a research study. This stems from my experiences as a classroom teacher when I was often “handed” new programs or innovations and expected to make them work with my particular group of students without much follow-up guidance or support.
Setting
This mid-Western district serves more than 24,000 students enrolled in Grades Pre-K–12 (WINSS, 2011). Approximately 45% of the students come from culturally and linguistically diverse backgrounds, 16% are identified with a disability, and 49% come from families who are economically disadvantaged (WINSS, 2011). District administrators selected four middle schools to pilot System 44®. These schools were selected because they had (a) similar demographic and contextual characteristics, (b) administrators willing to devote the necessary resources to the pilot, and (c) veteran teachers with experience in reading interventions. Table 1 provides information on these schools.
Table 1. Teacher, School, and Classroom Characteristics.
Note. Years Exp. = years of teaching experience; FRL = free/reduced lunch rate; CLD = culturally/linguistically diverse; SWD = students with disabilities; SR = struggling reader; ELL = English language learner; LA = Language Arts.
Master’s degree in reading.
Participants
For the present study, I chose to focus on teachers’ perceptions of the program as they had the most intimate working knowledge of the program during the pilot study. The teachers implementing the program were Ben, Jill, Monica, Pam, and Stacy (all pseudonyms), and they had been teaching for between 10 and 36 years in various roles including special education teachers, reading intervention teachers, and language arts teachers. Jill and Pam were certified in special education, and Monica and Ben had master’s degrees in reading/literacy. Table 1 provides more information on these teachers and their classrooms.
Data Sources
After obtaining participants’ informed consent, members of the research team conducted three semi-structured interviews and three classroom observations with each teacher at the beginning, middle, and end of the school year.
Interviews
Interviews were designed to understand (a) teachers’ perceptions of the program, (b) challenges they encountered implementing the program, and (c) their evaluation of students’ progress. Illustrative examples of interview questions include (a) What do you consider to be strengths of System 44®? (b) What has been challenging about implementing System 44®? (c) How have you adapted the program? and (d) How have students benefitted from the program? Interviews lasted between 45 and 80 min, were audio recorded, and transcribed verbatim.
Observations
Members of the research team went in pairs and took detailed field notes on a laptop computer. Observers attended to a number of details about the classroom generally as well as teachers’ implementation of System 44® including (a) physical set up of the classroom, (b) timing of various classroom activities, (c) aspects of System 44® teachers implemented, (d) students’ reaction to various System 44® instructional tasks, and (e) teacher and student interaction with System 44® materials and resources. The observations spanned the length of the class period, generally 50 min. Following each observation, the observers debriefed and compared field notes to discuss their interpretations, make note of important events, and record topics to be further explored during interviews. They also completed the In-Classroom Observation and Action Plan tool—an informal measure developed by Scholastic that assesses teachers’ implementation of System 44®. The measure includes 20 items focused on classroom setup and management, instructional support, and data interpretation that are rated using a 4-point Likert-type scale with a score of 1 indicating novice, 2 apprentice, 3 practitioner, and 4 expert. Individual item scores are summed to provide an overall rating of teachers’ implementation. For this study, the tool was most important in judging whether teachers’ perceptions of the program’s social validity were based on implementing it as it was intended. Finally, whenever available, we collected artifacts from teachers including lesson plans, assessment printouts, and student work.
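The summed scoring of the observation tool described above can be illustrated with a minimal sketch. It assumes only what the text states (20 items, each rated 1 to 4, summed into an overall rating); the function and variable names are hypothetical, and this is not Scholastic's own scoring procedure.

```python
# Illustrative sketch only: names are hypothetical, not part of the Scholastic tool.
# Rating labels for the 4-point Likert-type scale described in the text.
LEVELS = {1: "novice", 2: "apprentice", 3: "practitioner", 4: "expert"}
N_ITEMS = 20  # number of items on the observation tool, per the text


def overall_rating(item_scores):
    """Sum the 20 item ratings into one overall implementation score."""
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(item_scores)}")
    for score in item_scores:
        if score not in LEVELS:
            raise ValueError(f"invalid rating {score!r}; must be 1, 2, 3, or 4")
    return sum(item_scores)


# A teacher rated "practitioner" on every item receives 20 * 3 points.
example_total = overall_rating([3] * N_ITEMS)
```

Under these assumptions, overall scores range from 20 (novice on every item) to 80 (expert on every item).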
Data Analysis
Although data collection was a team effort, as the principal investigator of the program evaluation, I was chiefly responsible for analysis and used methods espoused by Miles and Huberman (1994). Miles and Huberman propose a three-step method including (a) data reduction, (b) data display, and (c) drawing conclusions. During data reduction, I engaged in two levels of coding. First, I assigned descriptive or first-level codes to the interview data inductively and deductively using the Wolf (1978) framework. For the observation and artifact data, I memoed as a means of triangulation. In the second round of coding, pattern coding, I grouped descriptive codes into more abstract, interpretative units and looked for themes. While conducting data reduction, I also engaged in the second step of analysis, data display, whereby I used tables, charts, and figures to help me make sense of the patterns. A key component of this step was for me to graphically represent the interpretative codes and their interrelationships. For example, the display for the theme of “goals” comprised two sub-units: micro- and macro-goals. Finally, I set forth key conclusions and then verified them across the five participants, making note of exceptions or negative evidence.
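The grouping of first-level codes into pattern codes described above can be pictured with a small data-structure sketch. It is illustrative only and is not the procedure or software used in the study: the theme and code labels are taken from this article, but treating each label as a first-level code, along with the function names and the example segments, is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical mapping from pattern-level themes to first-level codes.
# Labels come from the article; their arrangement here is an assumption.
PATTERN_CODES = {
    "goals": ["macro-goals", "micro-goals"],
    "outcomes": [
        "instructional quality",
        "student achievement",
        "socio-emotional development",
        "student engagement",
        "stigmatization",
    ],
}


def theme_of(code):
    """Return the pattern-level theme a first-level code belongs to."""
    for theme, codes in PATTERN_CODES.items():
        if code in codes:
            return theme
    return "uncoded"  # flags codes that have not yet been grouped


def tally_themes(segment_codes):
    """Count how many coded segments fall under each theme."""
    return Counter(theme_of(code) for code in segment_codes)


# Hypothetical first-level codes assigned to three interview segments.
counts = tally_themes(["macro-goals", "micro-goals", "student engagement"])
```

A tally of this kind is one simple form of data display; it can show which themes recur across participants and where exceptions or ungrouped codes remain.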
Promoting Credibility and Trustworthiness
Members of the research team and I took several steps to promote credibility and trustworthiness. While conducting interviews, we embedded member checks—instances in which we verified our understandings and interpretations with the teachers. We also engaged in peer debriefing following observations and interviews. Drawing from multiple data sources across time enabled us to triangulate data. During data analysis, I remained reflexive and actively searched for exceptions and negative evidence in the data. Finally, I presented the social validity findings to other research team members to obtain their feedback.
Findings
The ways in which the teachers evaluated the social validity of the System 44® intervention were multifaceted; they evaluated the goals, procedures, and outcomes according to multiple components. In the end, all of the teachers expressed a strong desire to continue to use the program in the future. Of most importance to this decision was that the program goals and outcomes served the needs of their population of older students. In fact, although teachers viewed aspects of the intervention procedures negatively, they overlooked this in favor of wanting to sustain the program. In the following sections, I describe the social validity components and relationships among them as identified by the teachers.
Goals
Teachers evaluated the System 44® program according to two categories of goals—macro and micro. At the macro-level, System 44® is designed to help older struggling readers improve their reading skills, and all of the teachers believed this to be an important goal. Pam said System 44® “is kind of the missing link” because middle schoolers who are nonreaders are “outliers because we have this assumption that they come to school with at least some reading skills and a couple of kids we saw in the sixth grade class, really, not so much” (Interview [Int.] 3). Similarly, Jill felt the program was useful because, “it’s hard to find programs that meet the needs of these kiddos that are in middle school and not reading” (Int. 1).
At the micro-level, the program specifically emphasizes decoding by helping students learn the 44 phonemes of the English language. Such skill-based instruction aligned with the teachers’ beliefs about quality reading instruction for students. Moreover, the teachers thought this was the most important attribute of the program, as they did not feel they could provide such instruction to students without the program. According to Jill, “it’s hard in middle school to work on phonics, but I think that’s what these kids need, and I can say that I wouldn’t have been able to provide that without the program” (Int. 2). Similarly, Stacy said, “the fact that it’s breaking it down to real basic skill levels is very helpful because it’s hard to find something that could interest middle school kids and still cover the basic skills” (Int. 2). Despite teachers’ overall support of the micro-level goal, Ben and Pam were adamant the program should be just one part of a larger picture of reading instruction. Ben talked about how the program does not emphasize comprehension and said that, to provide a comprehensive approach to reading instruction, he “works with students in a more holistic way” using other techniques like reading workshop (Int. 2). Pam perhaps said it best: “it’s [System 44®] not the answer to everything in the whole world but it’s a huge tool on our workbench so to speak” (Int. 3).
Outcomes
Although the primary intended outcome of System 44® was for students to become more skilled decoders, the teachers talked about additional outcomes—both intended and unintended—that resulted from implementing the program: instructional quality, student achievement, socio-emotional development, student engagement, and stigmatization.
Instructional quality
Teachers believed their instructional quality improved by using the program. All spoke about how the computer program provided the individualized, direct instruction students needed, instruction that would have been otherwise impossible for the teachers to deliver by themselves given the wide range of student skill levels within a class. Monica relayed how with so many students needing direct instruction for specific phonemes, she “can’t sit and work with having them [students] say the sounds,” but the computer can (Int. 2). With the computer doing the bulk of the direct instruction, teachers are freed up to work individually with students, something that Ben commented was “great” (Int. 2). Moreover, some of the teachers acknowledged that without the program, they would not have known how to provide age-appropriate instruction to the older students they taught, especially given the large amount of repetition that students needed. Both Pam and Jill talked about how repetition is so important for their students with disabilities, with Pam expanding further by stating,
I like that the computer obviously is more consistent. I always tell the kids, the computer is a meaner teacher than I am because they complain often, “it’s making me do it again!” and I said, “Yeah, the computer is meaner than me.” The computer makes you do it until you’re practically perfect, whereas sometimes teachers are like alright, move on . . . the computer is probably more consistent than most human beings are.
Although as a group the teachers thought the program was essential in providing high-quality reading instruction, Ben and Pam noted a drawback of relying so much on the computer—“the lack of personal touch” (Ben, Int. 1). Ben went on to discuss how the computer cannot read students’ body language whereas a teacher can sense a student’s hesitation or pick up on partial knowledge the student may have (Int. 1). Similarly, Pam did not want to lose sight of the “humanity” aspect and reconciled her concern by stating, “System 44® will give them the skills, I’m hoping that I’m the one that infuses the confidence” (Int. 1).
Student achievement
Based on informal and formal assessments, teachers discussed how students’ reading achievement had improved because of the program. Based on formal measures including the SPI and the Scholastic Reading Inventory™ (SRI; Scholastic Incorporated, 2000), the teachers spoke about how their students’ scores had increased over the course of the year. Some of the students even graduated out of System 44® and moved on to the partner program Read 180®. Informally, teachers collected data based on running records and having students read aloud to them. Several teachers, however, noted that at times the program moved too slowly, thus obstructing students’ progress. Ben described situations in which students only missed one question on a computer assessment, but had to complete the entire module over again explaining, and it’s frustrating to kids because they’ll keep track and they’ll have one mistake and it will hold them back. And it’s like wait a minute, you might have just clicked on the wrong thing. You might have moved your mouse down too far. And we’ve all made those mistakes. So if it has to be perfect to move you along, that’s frustrating. That doesn’t fit into my philosophy of if the kid knows 95% of it, that’s fluent, let’s move on. (Int. 3)
We witnessed this during our first observation in Ben’s classroom. One student who had completed a computer module was confused as to why she had to start the module from the beginning. Ben explained that she must have made a mistake. The student became upset and essentially shut down for several minutes. Pam talked about it in terms of “fatigue factor” with the computer, but tried to put a positive spin on the situation by equating the computer with video games. She told students, “you know when you’re doing a video game and you’ll do the same thing over and over again until you get your best score? That’s a little bit how this computer works” (Int. 2).
Socio-emotional development
Although the teachers acknowledged the primary goal of System 44® was to help students acquire phonics knowledge, they were more excited about the positive socio-emotional changes they witnessed in their students. All five teachers talked about how their students were becoming more motivated readers. Monica described her students as “owning it [reading]” whereas Pam was ecstatic to note that one of her especially reluctant readers was finally taking some reading risks in front of his peers, and Stacy said her students were “blossoming.” The students were beginning to see themselves as competent readers. Pam said, kids have finally figured out that if they are consistent and worked hard that they get bumped up, they test out of a level and it’s really kind of sweet. They actually cheer and carry on and go crazy because they see that as a huge success. (Int. 3)
Observation data reinforce Pam’s statements. During one observation, she asked whether someone would volunteer to read aloud from one of the workbooks. We watched as students enthusiastically waved their hands and shouted out hoping that she would pick them.
Student engagement
The final outcome that teachers reported as improved was students’ overall engagement in instruction. Jill said students “buy into it [the program]” (Int. 1). The teachers believed the variety of activities and the incorporation of technology kept students engaged in reading instruction. Pam described the “electronic buzz” of the program with the computer, headphones, microphones, and CD players as being exciting for students. During our first observation in her classroom, Pam asked students what they liked about the program. One student named the computer, and another named the videos that are part of the computer’s instruction. Observational data from several teachers’ classrooms indicated students were reluctant to leave the computers and often asked teachers whether they could stay on longer.
Most notably, teachers noticed how the students were motivated to track their progress on the computer. Pam said, the feedback they [students] get from the computer telling them, watching those numbers, besides teaching kids metacognition to have them understand what the data means to them. I don’t think they totally understand it but in the instance of bumping up it really is powerful and I’m not sure in my old fashioned teaching days that we did enough of that. They got a grade, they got comments. We edited, we had little conferences but including them in the data, if you will, is another powerful way to help kids invest in what they’re learning. (Int. 3)
The students also enjoyed the high-interest books, and teachers described these books as “phenomenal,” “awesome,” and “relevant” with topics and pictures that piqued the interests of older learners. Monica said, I do also really enjoy those System 44® books. All the kids like them. They’re engaging. Kids will reread them without any problem. Kids will take a test on them, pass the test, and then a couple of months later pick up the books and read them again. That’s phenomenal. (Int. 2)
Students were not, however, always engaged in instruction. One aspect of the program that led to students disengaging was the built-in repetition and teaching to mastery. Students would express discontent by sighing, putting their heads down, or getting up out of their chairs when they were forced to repeat a module or when the computer functioned too slowly. The lack of direct teacher supervision could also lead to students being off task. With two of the three rotations being independent (computer and independent reading), there were many opportunities for students to get distracted or misbehave if there was not a strong behavior management plan in place. All of the teachers discussed how students must be able to work independently at certain times for the program to be successful. Monica expanded by talking about how this is particularly difficult when working with middle schoolers who may exhibit immaturity and need a lot of redirecting. Observational data confirmed these assertions. For example, during our second observation in Jill’s classroom, a student was supposed to be reading a high-interest book independently. Instead of reading the book, the student flipped through the pages quickly and announced she was done. Jill had to stop work with her small group of students to reinforce that the student should be reading the book and needed to follow directions.
Stigmatization
One negative and unintended outcome was the stigmatization some students experienced. The teachers who taught System 44® in the same class as Read 180® had to work hard to ensure that System 44® students did not feel self-conscious or stigmatized because they were not in the more advanced Read 180® program. Ben said, and then it becomes hard again, if I’m working with magnetic letters over in System 44® in a small group and the Read 180 kids are looking at that going “Really?” That becomes another way to stigmatize those kids. I’m stupid. I’m dumb. We’ve had a hard enough trouble getting [across] that this room is not for stupid students. (Int. 1)
Monica described how she put the consumable worksheets in unmarked binders so it was difficult for students to know who was in System 44® and who was in Read 180®. Pam, however, felt differently. Unlike Ben, Monica, and Stacy, Pam did not teach System 44® and Read 180® in the same class period and felt the high-tech equipment that is part of System 44® actually decreased the stigma that students in her class experienced. She said, What thrills me about teaching System 44® is that other kids peek in the room and they get all excited by the equipment they see. I had a kid show up at my door last week saying, “Why can’t I be in this class?” (Int. 1)
Procedures
When evaluating the System 44® procedures, teachers focused primarily on the feasibility of implementing the program in relation to their school contexts and available resources. In some ways, they believed the program expedited their work. At the same time, they reported it could be overwhelming and difficult to navigate, especially in the first year of implementation. Such findings align with the results of the implementation fidelity ratings, which are reported in depth by Leko, Roberts, and Pek (2013). In general, the teachers implemented the program with a high degree of fidelity. Ben and Monica consistently received ratings at the expert level, whereas Stacy and Jill were at the practitioner level. Pam’s ratings were the lowest at the apprentice level. As described in the following sections, when teachers implemented the program in ways that were not intended, it was usually due to factors outside their control or because they were navigating the “learning curve” that accompanies initial implementation of a complex, multifaceted intervention.
Planning
The teachers presented mixed views about the feasibility of the planning and preparation required by the program. On the positive side, teachers appreciated the varied reports they could run on students’ progress. Such reports allowed them to “keep a finger on the heartbeat of every student” according to Monica (Int. 1) and pinpoint “when a student is struggling” (Stacy, Int. 1). The reports indicated skills students needed more instruction in and suggested potential ways to group students and organize instruction. Running the reports and analyzing the data, however, was time-consuming and often left teachers feeling overwhelmed about which reports were most useful or necessary. Stacy said, “there’s a lot of valuable data we can pull up, but finding time to read it all and put it into practice using it to design my lessons is a challenge” (Int. 1).
Although the teachers talked about how it was great that the program “tells you what to teach” (Ben, Int. 1) and provides “a research-based developmentally planned program” (Pam, Int. 2), they described the requisite preparation as complicated. Ben summed up the teachers’ sentiments when he said, “the prep is crazy.”
Instruction
When discussing the feasibility of providing instruction using System 44®, teachers all reported that it would take them a couple of years to learn and use most effectively. Monica said, “there are a lot of different pieces to the System 44® . . . It’s almost too much to digest in one year” (Int. 1). Teachers mentioned several times they hoped to have the program the following school year, because they believed by then, they would be expert implementers and would be able to use all of the materials as intended. Pam said, I figure next year, after I’ve cycled through the program, my learning curve will have lessened a little and I’ll be more familiar with the materials . . . I think the first year you teach anything, it’s not necessarily your best. (Int. 1)
During the study, teachers talked about how they had difficulty following the program as specified because it was so complicated, and they were low on time and personnel. As a result, teachers selected certain aspects to implement first while leaving other components aside. For example, Jill said, one of the struggles has just been getting familiar with the program and I know that I’m not implementing the program as it should be currently and so just trying to learn how to make everything work has been a challenge. (Int. 1)
Based on the observational data, like Jill, other teachers had to make decisions about how to make everything work. In our observations, few of the teachers were able to fit the three rotations daily and so they divided the three components over 2 or 3 days.
Teachers also talked about the compromises they had to make because of lack of personnel. All teachers said the program would be best implemented with multiple adults in the classroom. Some teachers were assigned special education assistants (SEAs) who were helpful in implementing the program. Other teachers were resourceful and recruited volunteers or worked with university practicum students. Without such extra help, teachers had trouble managing small-group instruction, monitoring students’ progress on the computers, and listening to students read individually. On top of this, teachers had to navigate how to use the program within the confines of their schools’ schedules. When asked what she disliked about the program, Stacy replied, “the challenges with managing time. The programs were designed for 90 minutes classes and we have a 45-minute class. So managing time is difficult” (Int. 1).
Most difficult for teachers was implementing combined classes of System 44® and Read 180®. Although the two programs were designed to be implemented together, all of the teachers who were asked to do so reported how this was nearly impossible. Ben said, one thing is my concern that we’re trying to mash the System 44® and Read 180 in one classroom. That becomes very hard to do. It becomes confusing for the kids, one set of kids is on System 44® and one is on Read 180 . . . and the different philosophies. Where Read 180, I’m doing most of the teaching where System 44® is where the computer is doing most of the teaching and I’m supporting. Then just the planning. You have tons of prep just for those two classes. (Int. 1)
Although teachers viewed System 44®’s incorporation of technology as an incentive for students, they noted that from an instructional standpoint, it could cause major complications. Technology glitches commonly arose, such as when microphones, earphones, or CD players were broken, and these technology-related issues could bring instruction to a halt. Even more disruptive was when a computer froze or when the school’s server was down, because the computer was the primary instructional component. When asked what she disliked about the program, Pam responded, “it’s so reliant on the computer. Yesterday when we had a bad server day, it kind of throws kids off their groove” (Int. 1). A common occurrence we observed was teachers having to stop their instruction with a small group to troubleshoot technology, which resulted in a breakdown of instruction and an elevation of problem behaviors. Our third observation in Stacy’s class shed light on how disruptive technology problems could be. In her 50-min class period, we counted eight different instances of technology problems with the computer, headphones, or CD players. Each time, Stacy or one of the SEAs had to stop instruction with other students to solve the technology issue. Such disruptions were observed at one time or another in all five of the teachers’ classrooms.
Assessment
The bulk of the student assessments in System 44® is computerized and done automatically as students progress through the skill levels. This aspect was helpful to teachers who knew that a hallmark of good instruction is continual progress monitoring but who often felt pressed for time when trying to administer such frequent assessments. As reported earlier, teachers also liked the various data that the System 44® assessments generated, but running and interpreting all of the possible reports seemed unfeasible because of the time it took. As part of the independent reading component, students must read the high-interest texts aloud multiple times, but teachers voiced frustrations about not having enough time to conduct these informal assessments repeatedly with each student. According to Stacy, “the one thing that’s hardest to implement is listening to them read individually and that’s due to the size of this class and the fact that it’s a blend of Read 180 and System 44®” (Int. 2).
Materials
The aspect of the program teachers emphasized the most was the instructional materials. As mentioned previously, teachers reported that the high-interest books and computer engaged and motivated students, but the teachers also appreciated the teaching guides and consumable workbooks. Jill, who taught students with more significant disabilities, was grateful to have resources that would meet the needs of her students. In fact, all of the teachers wanted Scholastic to produce more of the high-interest books as so many of their students, even ones who were not in System 44®, enjoyed reading them.
All of the materials combined, however, did not always align well. Monica said, “they [materials] don’t jive as smoothly as I’d like them to” (Int. 1). Pam talked in more specific detail when she described, “it’s aligning their print work with where they are on the computer and that does not go perfect” (Int. 1). Observation data provide more detail. We observed some students progressing through the computer and high-interest books faster than teachers could keep up with in the small groups given the limited amount of time they could spend with each small group. Another aspect some teachers mentioned was that the materials could be too complex both in their layout and text level. Monica talked about how the program’s self-monitoring sheet had a lot of tiny print and the words used on it were at reading levels that were too high for some students. In her opinion, the program developers “overshot” and made some of the materials and program components more complicated than necessary. Ben felt similarly as he observed some of his students having difficulty completing tasks because the directions used language that was too sophisticated.
Discussion
Results of this qualitative study expand Wolf’s (1978) framework by applying it to an academic intervention and delineating sub-components of the social validity construct. When assessing the social validity of academic interventions for secondary students, it is important to evaluate (a) macro- and micro-goals; (b) procedures for planning, delivering, and assessing instruction; (c) intervention materials; and (d) outcomes related to instructional quality, stigmatization, and students’ achievement, socio-emotional development, and engagement.
Based on these sub-components, teachers perceived the System 44® goals and outcomes to have a high degree of social validity. The social validity of the procedures, however, was lower as teachers reported several aspects of the program to be unfeasible. Despite this, all of the teachers expressed a strong desire to continue implementing the program. I believe this can be attributed to three things. First, teachers believed the benefits to students outweighed the implementation difficulties. Second, they perceived the implementation difficulties to be temporary, and the answer to such difficulties was time: more time to use the program and become comfortable with it. Third, the thought of trying to provide basic phonics instruction to older struggling readers without System 44® seemed more aversive than overcoming implementation difficulties posed by the program. This finding, in particular, is not surprising as secondary teachers report feeling ill-prepared to deliver basic reading instruction (Greenleaf, Schoenbach, Cziko, & Mueller, 2001), especially considering the dearth of age-appropriate materials available for such instruction (Rog & Burton, 2001). These findings indicate teachers do not make decisions about social validity based solely on whether they simply like or dislike an intervention. Nor do they evaluate the social validity of academic interventions based only on whether the intervention leads to improved student academic achievement. For teachers working with secondary students, it seems particularly important that interventions address a broader range of outcomes, including those related to students’ affective needs. Teachers’ evaluation process is much more complicated, somewhat resembling a cost-benefit analysis.
The identification of the aforementioned sub-components and teacher evaluation process corroborates other scholars’ assertions that social validity is not a static and finite construct but is much more complex than is reflected in intervention literature (Finn & Sladeczek, 2001). Thus, continued research on the construct of social validity is warranted. Not only will such research help scholars in the field better understand and measure the social validity of interventions—a quality indicator of intervention research (Horner et al., 2005)—but also it may prove to be a critical element in several important and longstanding discussions in special education including the research-to-practice gap, long-term sustainability of interventions, and implementation fidelity.
There are several unanswered questions and directions for future research based on this study’s findings. For example, in what ways would the themes I presented be similar or different depending on various characteristics of the intervention, students, and teachers? Would there be differences between an intervention like System 44®, which is considered to be a curricular program, and something like collaborative strategic reading (CSR; Klingner, Vaughn, & Schumm, 1998), which is a specific strategy designed to be used across curricula and texts? What if students were in elementary school where, ostensibly, the stressors of adolescence are not as prevalent, or what if they were in high school? Finally, how would the results be different if teachers implementing System 44® were novices as opposed to teachers with extensive classroom experience? It would also be important to conduct more longitudinal research to understand how participants’ perceptions of social validity change over time. A limitation of the present study is that I did not follow the teachers into their second year of implementation to see whether their evaluation of the program procedures improved. Last, how would the findings be different if students and their parents were included in the study? Although I was able to infer how students perceived System 44® based on teacher interviews and classroom observations, not interviewing students, their parents, and other stakeholders is another study limitation.
The study I presented is just one of many possible designs. Conducting observations and interviews with all study participants over the course of a year is probably unfeasible in the eyes of researchers who work with large sample sizes. In this instance, conducting in-depth interviews and observations with a few key informants might be a more effective and economical design. Key informants could be selected because they are representative of the larger sample or because it has been determined that they have particular characteristics of interest. For example, if researchers are studying the effectiveness of CSR (Klingner et al., 1998) when implemented in middle school science classrooms, researchers could evaluate the intervention’s social validity by selecting two teachers from each grade level to interview and observe. This would keep the sample size small to enhance feasibility but would provide a closer look at how the intervention functions across the grade levels represented in the overall sample. Other criteria for selecting key informants could be teachers’ or parents’ level of intervention fidelity (i.e., expert vs. novice implementers) or changes in student achievement or behavior.
Focus groups, such as those used by Copeland et al. (2004) or Gerber and Popp (1999), present another alternative for understanding individuals’ perceptions of social validity. Focus groups can be particularly useful when studying social validity because they (a) document individuals’ shared experiences, (b) capture multiple perspectives, and (c) may make students more comfortable than individual interviews (Brotherson & Goldstein, 1992; Eder & Fingerson, 2003). Reflective journals completed by intervention agents are yet another alternative to individual interviews. Using this method, participants are asked to keep a written log of their experiences and perceptions so that researchers can later conduct content analysis.
Qualitative methods could also be used early in a study to inform the development of summative quantitative rating scales. By interviewing teachers and conducting observations soon after an intervention is implemented, researchers can tailor post-intervention surveys and rating scales to include items of interest that resulted from the qualitative analyses.
In closing, my thoughts return to the broader intervention literature base and the contributions qualitative methods can make to it. My argument is not that intervention researchers should abandon quantitative measures to evaluate social validity. Rather, my goal was to illustrate how qualitative methods can be differently informative yet equally valuable and should be an integral part of researchers’ repertoire of tools. The use of qualitative methods to examine the social validity of interventions is particularly important in special education research, because as a field, we are grounded in individualized educational approaches. Therefore, we need tools that can capture and accommodate outcomes that may vary widely from one student to another. Qualitative methods can “show the human faces behind the numbers . . . and provide critical context when interpreting statistical outcomes as well as make sure that the numbers can be understood as representing meaningful changes in the lives of real people” (Patton, 2002, p. 152). To understand and evaluate social validity comprehensively, the best research designs most likely do not reside in “either or” camps but capitalize on the strengths of both quantitative and qualitative traditions.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Wisconsin Alumni Research Foundation (WARF; Grant Number PRJ35IJ).
