Abstract
The study reported in this article examines how teachers read and respond to their students’ Stanford Achievement Test 10 (SAT 10) scores with the goal of investigating the assumption that data-based teaching practice is more “objective” and less susceptible to divergent teacher interpretation. The study uses reader response theory to frame teachers’ responses to their students’ SAT 10 test scores as interpretations of a text shaped through unexamined assumptions and political interests related to accountability, rather than strictly statistical “official” interpretations of “objective” data. The findings illustrate that teachers’ interpretations of SAT 10 data that inform their data-based practice are vulnerable to the pervasive influence of local school responses to accountability pressures. More specifically, the findings reveal how moral and discursive texts imbricated in accountability discourses mediate the ways in which teachers read and respond to their students’ SAT 10 scores.
Over the past 30 years, accountability in education has gained international currency, holding educators responsible for student achievement, primarily as measured by regularly administered standardized tests (Riffert, 2005). In the United States, as a central emphasis of the No Child Left Behind Act (NCLB; 2002) and the more recent school reform program, Race to the Top (2009), standardized testing for accountability purposes exerts weighty influence in schooling contexts, from school district central offices to the smaller settings of thousands of classrooms. Standardized test scores, as “the coin of the realm in public education in the United States” (Haladyna, Nolen, & Haas, 1991, p. 2), bring widespread ramifications not only for schools but also for communities and businesses:
Student achievement scores are now the barometer of student, teacher, principal, school, and district effectiveness. Student performance on standardized tests also affects the community, business and industry, real estate values and the overall vitality of a state and community. (Shen & Cooley, 2008, p. 319)
The emphasis on test scores as indicators of student achievement, school effectiveness, and community prosperity brings with it increased efforts and urgencies to improve those scores, especially in schools that struggle to meet state-determined goals. Based on the assumption that student test data and accountability requirements bring about changes in teaching practice that lead to higher test scores and better student achievement, policy makers, lawmakers, and educators have directed attention to looking at how to use data (primarily test scores) more effectively to inform data-based teaching practice. More specifically, school administrators and teachers have been encouraged to use test score data to implement instructional practices based on what Ingram, Louis, and Schroeder (2004) identify as the “unexamined assumption that external data and accountability systems will lead to positive change in the daily interaction between teachers and students” (p. 1258), which results in higher test scores. Toward this end, districts and schools have instituted professional development programs in which teachers are asked to examine test score data and make decisions about students and about instruction.
But when measures of and decisions about progress in student learning, teacher quality, and school effectiveness rely primarily on improved test scores, it seems inevitable that attention will narrow to using test score data strategically and superficially to raise test scores rather than address more complex issues related to student learning, educational resources, and social inequities (Haladyna, 2006; Schildkamp & Kuiper, 2009). Such narrow interpretations of test score data lead to the appropriation of educational strategies and practices that have been called “educational triage” (Booher-Jennings, 2006; Gillborn & Youdell, 2000). In professional development, this can lead to a focus on teacher learning about and analyzing data that precludes other learning that might otherwise be appropriate. At the classroom level, this strategic use of test data in the name of data-based practice can constrain teaching and learning to a form of “target practice” that has troubling repercussions not only for students but also for teachers, as suggested in the study reported in this article. This project examined how teachers read and respond to their students’ Stanford Achievement Test 10 (SAT 10) scores with the goal of investigating the assumption that data-based teaching practice is more “objective” and less susceptible to divergent teacher interpretation. The study used reader response theory to analyze teachers’ responses to their students’ SAT 10 test scores as interpretations of a text shaped through unexamined assumptions and political interests related to accountability, rather than strictly statistical “official” interpretations of “objective” data. The findings illustrate that teachers’ interpretations of SAT 10 data that inform their “data-based practice” are vulnerable to the pervasive influence of local school responses to accountability pressures. 
More specifically, the findings reveal how moral and discursive texts imbricated in accountability discourses mediate the ways in which teachers read and respond to their students’ SAT 10 scores.
SAT 10
The SAT 10 is the current version of a norm-referenced standardized measure of student achievement first written in 1926. Since then, it has been used to compare individual student achievement with norms established from representative national samples of the referent group, usually by grade level. Although the SAT series had long been used to measure student achievement in various places nationwide, its use grew with the NCLB legislation mandating achievement testing in reading and math during Grades 3 to 8 and once in Grades 10 to 12 (Ding & Navarro, 2004). Although the federal government does not require the administration of the SAT 10 specifically, the test has become more commonly used for NCLB-related accountability purposes, especially in those states that did not already have a state achievement test. It is used in the state where this study took place.
The SAT 10 instrument is offered in two forms, traditional paper/pencil or electronic, and at 13 levels from kindergarten to Grade 12. It consists of an arrangement of multiple choice, short answer, and “extended response” items that may require students to show work, write several sentences, or draw a graph (Pearson Education, 2011). Pearson, the publisher of the SAT series, offers a range of types of score reports that schools can purchase. These include four different versions of individual student reports, class reports, and reports by school, which were used by these schools. The student report displaying national percentile rankings, stanines, and subject area clusters of content and skills tested for that grade level was the version used in all of the schools in the study.
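To make the relationship between the two reported metrics concrete, the following minimal sketch converts a national percentile rank into a stanine. It assumes the conventional stanine bands (4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of the norm group); these cut points are a general measurement convention and are not taken from this article or from the publisher's scoring documentation. The names `STANINE_LOWER_BOUNDS` and `percentile_to_stanine` are hypothetical.

```python
from bisect import bisect_right

# Lowest percentile rank falling in stanines 2 through 9 under the
# conventional 4-7-12-17-20-17-12-7-4 percent split. This is an assumption
# for illustration; the article does not list cut points.
STANINE_LOWER_BOUNDS = [4, 11, 23, 40, 60, 77, 89, 96]


def percentile_to_stanine(percentile_rank: int) -> int:
    """Map a national percentile rank (1-99) to a stanine (1-9)."""
    if not 1 <= percentile_rank <= 99:
        raise ValueError("percentile rank must be between 1 and 99")
    # Count how many lower bounds the rank has reached, then shift to 1-9.
    return bisect_right(STANINE_LOWER_BOUNDS, percentile_rank) + 1


if __name__ == "__main__":
    for rank in (3, 22, 50, 77, 96):
        print(rank, percentile_to_stanine(rank))
```

Under this convention the middle stanines cover wide percentile ranges, so two students with the same stanine may have quite different percentile ranks.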
The Interpretive Context of Test Score Data
This project grew from the recognition that test score data alone “provides no judgment or interpretation, and no sustainable basis of action” (Schildkamp & Kuiper, 2009, p. 482). In order for test score data to be informative, it must be read, interpreted, and transformed into information, processes that involve making meaning of the data within the context of the teaching and learning situation. This makes the interpretation of test data vulnerable to multiple interpretations and competing interests, especially in the context of high-stakes accountability. At the classroom level, the inevitable processes mediating how teachers interpret test score data into information used for instructional practice are often overlooked by policy makers, making commonly held assumptions that data-based decision making is more “objective” and somehow free of human interpretation seem naïve. Attempts to bypass or direct teachers’ interpretations of quantitative data because of faith in the objectivity of data discount the mediated nature of teachers’ assessments of knowledge.
Even before the heavy-handed mandates of NCLB’s accountability requirements, research into the effects of standardized testing for accountability warned about attempts to separate the context in which teachers enact their knowledge of practice from numerical representations of their students’ learning outcomes (Haladyna et al., 1991; McNeil, 2000; Smith, 1991). In her 1991 study of the effects of testing on teachers, Smith concluded, “numeric test scores mean little to the teachers we studied, particularly without the interpretive context that teachers alone possess” (p. 9). Her findings suggest that it is simplistic to assume that teachers separate their knowledge of students and the schooling context from their students’ test scores. In addition to any numerical indicators of statistical significance, the meanings teachers ascribe to students’ test scores reflect the teachers’ knowledge of the microcontext for testing: classroom conditions and students’ individual experiences on the days of testing, and the implications of poor performance for students’ futures as well as for teachers’ sense of their own professional expertise and job tenure. Teachers’ knowledge of the macrocontext for testing (repercussions for poor performance at the community and state levels, and newspaper publication of test scores and school rankings) also mediates how teachers interpret their students’ test scores. The point here is that the process of turning data into information inheres in teachers’ interpretive processes, which are socially, culturally, historically, and professionally informed in irreducible ways that have real consequences for students as well as teachers. Examining how teachers make meaning of and interpret student SAT 10 score data therefore calls for a more robust theory of interpretation than that put forward by the simplistic assumptions undergirding testing policy.
Reader Response Theory
Reader response theory (Fish, 1980; Iser, 1974, 1978; Rosenblatt, 1978), also known as the transactional theory of the literary work suggested by Rosenblatt (1978), offers an approach to looking at texts and readers’ responses to texts that emphasizes the reader’s role in constructing the meaning of a text. Rosenblatt emphasized the give and take between the reader and a text as one of “Transaction . . . [that] permits emphasis on the to-and-fro, spiraling, nonlinear, continuously reciprocal influence of reader and text in the making of meaning” (p. xvi). Although reader response theory includes a variety of approaches to analyzing a reader’s relationship with a text, there is general agreement that neither the text nor the reader exerts sole authorship of a text’s meaning, but that meaning is transacted through their mutual engagement (Fish, 1980; Iser, 1974, 1978; Rosenblatt, 1978).
Iser (1974) and Fish (1980) suggest that the reader’s transaction with a text produces a “virtual text,” a reconstruction of the text as mediated through her interpretive resources. This is not restricted to literary texts, although reader response theory has been located primarily in a range of literary criticism and theory, including those influenced by feminist and poststructuralist theory (Derrida, 1974; Fish, 1980; Iser, 1974, 1978; Sheriff, 1989; Tompkins, 1980). Fish asserted emphatically, “linguistic and textual (scientific) facts are not objects of interpretation, but its products” (p. 9). Barone (2001) and others extend the use of reader response theory as an approach useful for educational research, especially that concerned with narrative (Atkinson & Rosiek, 2008, 2010; Zeek, Foote, & Walker, 2001).
Furthermore, the respective “interpretive communities” (Fish, 1980) of which readers and writers are members mediate the meaning of texts. Fish described interpretive communities as groups of people who share common experiences such as traditions, habits, practices, and attitudes that provide semiotic resources for interpretation of human activity. Specifically, these interpretive resources mediate the work of writers in constructing texts as well as that of readers in receiving, reconstructing, and appropriating responses to texts. The concept of readers and writers as members of various communities draws attention to the production of meanings for texts as communal as well as individual. So when teachers read their students’ test scores, they appropriate interpretive discourses shaped by their experiences as educators in their schools and communities characterized by social class, race, ethnicity, local history, and politics.
More specifically, the concept of interpretive communities suggests that the politics of federal and state accountability policy, prevailing educational ideologies and discourses, and the particular urgencies and status of the local school and its district mediate meanings and responses teachers construct for their students’ test scores. Local influences in particular create constraining conditions in which teachers interpret test score data following the direction of their local school and district leadership. The meanings they transact with their students’ SAT 10 scores and put into place through a variety of instructional practices produce discursive and strategic texts by which students are read as particular kinds of learners and teachers as certain kinds of teachers. This study illustrates how teachers’ responses to test score data reveal the pervasive influence of local school interests and stereotypical discourses of cultural and moral deficits related to low-income students on how they interpret test scores and make decisions based on those interpretations.
Teachers’ Interpretations of SAT 10 Test Scores: Methods and Materials
This qualitative study consisted of 18 interviews, 2 with each of the nine third- through fifth-grade teachers from two different school districts who agreed to participate. The nine participants were located through connections with the university’s College of Education partnership with the three schools represented in the study. I went through official channels, first requesting permission to carry out the study from the two school boards involved and then obtaining permission from the school principals, who initiated the request for participation to the teachers. I then met with those who had expressed interest in participating to explain the study and their involvement, which, in addition to responding to interview questions, required that they bring de-identified class and individual student SAT 10 reports to the interviews as reference points for the interview questions. Those who chose to participate were guaranteed confidentiality for themselves and their students, and signed consent forms. Participants received transcriptions of their interviews and preliminary summaries of findings accompanied by requests for revisions or corrections.
School Demographics and Teacher Participants
The teachers worked in three different elementary schools within a county that encompasses a university-centered midsize urban area located in the southeastern region of the United States. All three schools are Title I schools in primarily working-class and economically disadvantaged communities. The schools have populations composed primarily of African American and European American students, with small percentages of Latino/a students. South, a city school, is the most homogeneous of the three schools with a majority population of 90% African American students. East and Southeast, both county schools, are located in what is commonly referred to as the urban fringe of a midsized city. A total of nine teachers participated in the study, four from South, two from East, and three from Southeast. Three African American teachers, all faculty at South, participated in the study. All of the other participants were European American (see Table 1).
Table 1. School Demographics and Teacher Participants
Note: Data source—http://schooltree.org
Although all three schools have struggled in the past with making “Adequate Yearly Progress” (AYP), a state-determined measurement of school improvement mandated by NCLB, all three had achieved “All Clear” status for the past few years and were aiming to maintain that status. However, the possibility of failing to make AYP consistently surfaced as a subtext in teachers’ conversations during the study, as SAT 10 scores figured in the determination of AYP for the state.
Data Collection
I interviewed each of the nine teachers twice, using a semistructured interview protocol with follow-up questions (see appendix). After opening questions about the teachers’ educational background and length of teaching experience, succeeding questions fell into three broad categories: (a) using data, (b) interpreting data, and (c) significance of testing to their teaching. The interviews lasted from 45 min to an hour; they were audiotaped and transcribed. I also took field notes about the timing and conditions of each interview. The interviews all took place in each teacher’s respective classroom at the end of the school day. In each case, the teacher was currently working with the class whose scores we were discussing but had not been the students’ teacher for the previous year when they took the test. For the city school, South, and County School 1, East, the interviews took place in January, February, and March before that year’s SAT 10 testing. The interviews for County School 2, Southeast, took place in May after the SAT 10 testing had been completed and at the conclusion of the school year. (Please note that school names are pseudonyms.)
Data Analysis
The 18 transcriptions of the interviews provided the major source of data, with field notes used minimally. The unit of analysis was the teachers’ responses to the interview questions. The reader response theoretical frame informing the study suggested that readers’ responses to texts can be understood in three ways according to how readers use texts to construct meaning, including conventional, visionary, and critical (Barone, 2001; Fish, 1980; Iser, 1974). As reader response theory has only recently been extended to educational inquiry, I was interested in investigating the constraints and possibilities of these analytical categories and how they could initially map the data. Data analysis began with a deductive-like examination of the individual teachers’ interview transcripts to code elements in each that illustrated the response types.
Conventional
Readers respond to texts in conventional ways that conform to prevailing discourses, common practices, and dominant ideologies that reinforce their own beliefs and experiences. In this study, teachers’ responses that described common school practices and confirmed stereotypical discourses and deficit ideologies related to low income and student achievement were labeled as conventional. For example, comments attributing low test scores to students or their families not valuing education were labeled conventional in that they reflected deficit discourses associated with low-income students who quite frequently are also students of color (Lynn, Bacon, Totten, Bridges, & Jennings, 2010).
Visionary
Visionary responses are transacted by readers who, in Barone’s (2001) words, “pragmatize the imaginary” (p. 178); they are open to generating new ideas (the imaginary) for practical considerations (pragmatize), or think reflexively on taken for granted assumptions. Teachers’ expression of new ways to think about practice stimulated by SAT 10 results would be labeled visionary, as would new ways to think self-reflexively. For example, one teacher described how he looked at state reading achievement scores “as a reflective piece . . . on what I need to do to change my teaching.”
Critical
Critical responses, like visionary responses, involve new thinking about unexamined assumptions. The difference is that critical responses emerge from an ideological, political, or professional stance that the reader brings to the text to reveal taken-for-granted assumptions in the text. Visionary responses, in contrast, illustrate a reader’s awakening to or realization of a different perspective or practical strategy suggested by the text, which may address the reader’s own taken-for-granted assumptions.
Critical responses can take three forms. Some offer an informed and political critique of the text, drawing informally or formally on critical theory to challenge sociopolitical implications in the text’s propositions. This type is illustrated by comments such as one teacher’s observation that the SAT 10 measured not what students had learned but the “have’s and the have not’s.”
A second type of critical response points to the silences, voices, and experiences not represented in the text but nonetheless present as “surplus of meaning” (Derrida, 1974), which brings to the surface unacknowledged meanings for texts by virtue of what is not said or included. The teachers’ comments about their experiences of feeling silenced, as well as their silences about the misuses of testing data, exemplify this type of critical response.
A third form of critical response includes those that dispute the premise of the text; readers may offer a counter narrative of an explanation in the text or redefine terms. One teacher’s comment that the SAT 10 was basically “a test of reading,” rather than a test of multiple skills and achievement, illustrates this third type of critical response.
Inductive Analysis
I did not expect individual teachers’ transcripts to sort neatly into the suggested response types but for each one to manifest a messy and complicated mix of responses. Although the initial approach mapped out particular elements of response types within each transcript, I conducted a more inductive analysis of the coded data, grouping them by response types to support a more nuanced and robust analysis (LeCompte & Schensul, 1999).
I then completed multiple examinations of the grouped data fragments, coding again for recurring ideas, constructs, and processes—such as “targeted instruction” and “target student”—as well as repeated patterns—such as references to students by labels as “my 2,” “bump kids,” or “special ed student.” These emerging codes represented meanings associated with how the teachers understood and interpreted test scores within their school context. Working with the newly coded data within each response type, I generated categories and where needed, subcategories, based on shared meanings such as “targeted.” Then I completed axial coding by looking for relationships among categories to see which could be collapsed into larger categories (Strauss & Corbin, 1998). For example, I examined how “targeted” and “student labels” could be collapsed into “using data with students,” and then how “using data with students” expressed a conventional response to SAT 10 scores.
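The collapsing of codes into categories, larger categories, and response types described above can be pictured as a simple chain of lookups. The sketch below is only an illustration of that hierarchy, not the analytic procedure actually used: the dictionary names and the `trace` helper are hypothetical, and the assignment of individual codes to the “targeted” and “student labels” categories is an assumption inferred from the example in the text.

```python
# Codes -> categories (assignment of each code is an inferred assumption).
CODE_TO_CATEGORY = {
    "targeted instruction": "targeted",
    "target student": "targeted",
    "my 2": "student labels",
    "bump kids": "student labels",
    "special ed student": "student labels",
}

# Categories -> larger category, following the axial-coding example.
CATEGORY_TO_LARGER = {
    "targeted": "using data with students",
    "student labels": "using data with students",
}

# Larger category -> response type from the reader response frame.
LARGER_TO_RESPONSE = {
    "using data with students": "conventional",
}


def trace(code: str) -> tuple[str, str, str, str]:
    """Return (code, category, larger category, response type) for one code."""
    category = CODE_TO_CATEGORY[code]
    larger = CATEGORY_TO_LARGER[category]
    return code, category, larger, LARGER_TO_RESPONSE[larger]


if __name__ == "__main__":
    print(" -> ".join(trace("bump kids")))
```

A lookup chain of this kind necessarily flattens the iterative, interpretive work of coding, so it should be read only as a map of the end result for this one example.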
Identifying themes
The purpose of the study was to see if there were variations in how teachers interpreted SAT 10 data and what influences mediated those variations. Asking questions about how the larger categories and their respective response types could be connected enabled me to identify the political, professional, social, and cultural discourses the teachers appropriated in their explanations (Ely, Anzul, Friedman, Garner, & Steinmetz, 1991; Hatch, 2002). What discourses did the teachers draw on to appropriate conventional responses to student SAT 10 data in explaining their actions and decisions? Similarly, how did professional, political, and social discourses mediate the teachers’ critical responses that acknowledged the inequities between student “have’s” and “have not’s” but did not question the validity of the SAT 10 and their schools’ misuse of data? Following this process of inquiry, I derived themes and subthemes for each response type that connected the meanings, practices, and discourses present in the categories of coded data associated with each of the response types.
The analysis of the conventional responses yielded two themes across all of the schools: “target practice” and “personal responsibility.” Critical responses suggested the presence of three subthemes and an overarching theme of “constraints and contradiction.” I did not find any visionary responses across all the schools, although two teachers at East offered visionary responses suggesting a theme of “utilitarian reflection.” The critical and conventional response themes play across and through each other to uncover discursive and moral texts energized by deficit discourses and classist ideologies operating through educational accountability policies to position students and teachers as subjects in need of control and regulation in the name of equity and accountability.
Findings and Discussion
The teachers’ responses evidenced a complex divergence of interpretations of SAT 10 data not anticipated by policy makers who assume that reading “objective” data results in “objective” interpretation and information dissemination. If teachers’ interpretations of test score data were as straightforward and “objective” as is commonly assumed, the teachers would have all offered conventional responses similarly infused with statistical language related to content and/or performance standards, and unrelated to whether a child “tried” that day, or “didn’t care.” The occurrence of multiple interpretations indicates that the teachers read and constructed more complex meanings in the test scores than just numerical comparisons or indications of learning achievement; they read them within “the interpretive context” (Smith, 1991) of their knowledge of prevailing educational ideologies, local context, and their experiences with students. The following sections present findings organized by themes and the three response types: conventional, critical, and visionary. Then I discuss differences by school and individuals.
Conventional Responses: Target Practice and Personal Responsibility
The majority of the teachers’ comments evidenced awareness of the implications for achievement test scores associated with students from poverty and low-income communities who attend Title I schools. Their responses appropriated mainstream discourses of cultural deficits related to low-income students and achievement to explain low test scores. That, along with accountability interests in assigning blame and leadership pressures to raise test scores, produced a form of schooling from which I derived the theme “target practice.”
Target practice
The recurrence of the word target in various forms distinguished its meanings as a highly significant concept to the teachers across almost all of the interviews. The theme of “target” or “targeting” was used to describe strategic processes directed by the schools’ priority of maintaining their “All Clear” AYP status. I repeatedly heard phrases like “target instruction,” “target students,” and “targeted for” particular intervention strategies or special placement. The SAT 10 and the many different standardized tests taken by the students enabled an increasingly individualized and focused accumulation of data that fostered a sort of typology of students. Students whose SAT 10 scores ranged in Stanines 1 to 3 and/or were quite a bit below grade level were labeled as the “low, lows.” The “really really high” students whose scores ranged in Stanines 7 or higher were seen as students, according to two teachers from Southeast, “who can teach themselves” or “the ones who will get it anyway.” “Bump kids,” in other literature referred to as “bubble kids” (Booher-Jennings, 2006; Madaus & Russell, 2010), were those scoring in the low-average to average range who were most likely to raise or “bump up” their test scores with “targeted” instruction. The primary focus for teachers’ “data meetings” was to identify where individual students fell in these categories so as to locate the “bump kids.” A South teacher remarked, “We get those ones in the middle so we can bump those up.” As one teacher pointed out, “average kids are the ones we like because they will show growth.”
Teachers at each school reported distributing time according to these attributions to students’ potential to improve their test scores. One teacher commented that she was told “not to worry about the ‘low low’s’ or the ‘really really high’s’” but to “get the middle to bump up.” Several teachers at different schools observed that they met every day with their “bump kids” but only twice a week with the “low” and “high” kids because, as a South fourth-grade teacher put it,
[bump kids] are the ones you focus on because you want the kids in the middle to be up here at the top, . . . the middle kids who are almost there (grade-level average) . . . you focus on them more. You want them to move on up. And then ones that are at the bottom, you do the best you can because it [SAT 10] doesn’t measure progress.
A Southeast teacher commented similarly, “I pull the middle group every day . . . because those are those bump ones and that’s what we’ve been told—is to get them . . . and then to pull the lower ones and the high ones about twice a week or three times a week.”
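The sorting logic the teachers described can be sketched as a short function, offered here only to show how mechanically a stanine becomes a label. The bands for the “low, lows” (Stanines 1 to 3) and the “really really high” students (Stanine 7 or higher) come from the teachers’ accounts; treating Stanines 4 to 6 as the “bump kids” band is an assumption inferred by elimination, since the teachers described that group only as low-average to average. The function name `label_for_stanine` is hypothetical, and the sketch ignores the grade-level judgments that also entered into the labels.

```python
def label_for_stanine(stanine: int) -> str:
    """Return the informal label the teachers described for a stanine (1-9)."""
    if not 1 <= stanine <= 9:
        raise ValueError("stanine must be between 1 and 9")
    if stanine <= 3:
        return "low, lows"
    if stanine >= 7:
        return "really really high"
    # Assumed middle band (Stanines 4 to 6); not stated numerically in the text.
    return "bump kids"


if __name__ == "__main__":
    for stanine in range(1, 10):
        print(stanine, label_for_stanine(stanine))
```

That a single integer suffices to assign a label is the point of the illustration: nothing about a student’s learning beyond the score enters the sorting.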
Personal responsibility
During the interviews, the teachers at all the schools insisted that student test scores reflected student effort as well as student achievement. In response to questions about individual students’ test scores, all spoke of the importance of students caring about doing well. A fifth-grade teacher at East said, “I feel like 80% will do well and 20% will fall short whether they can do it or not because but they are not willing to try and do any better than what they are doing.” This seemed to express the commonly held meaning of “caring”—“to try and do better than they are doing”—implying motivation and effort to score higher. Explanations for students’ low scores centered around the importance of a student’s “caring” or “trying.”
Teachers attributed lack of effort to individual student apathy or laziness, usually identified as
“those students aren’t working as hard, you know, they aren’t trying their best . . . ”
“it’s because of their attitude towards this test and they didn’t really care about the test.”
“some of them don’t like school; they don’t care . . . ”
“It’s attitude for him . . . I think he can do better than what he does, if he would try.”
Some teachers saw lack of effort as the consequence of economic background or family values and circumstances:
“But when you’re in the classroom and you know this child knows this information, they just told it to you and they are sitting here just half asleep or they didn’t eat breakfast, or they are tired taking the test, . . . they don’t care, and some of our kids don’t know if they are going to have a meal when they get home, so what are they worried about bubbling in a test?”
“Some of them are trained . . . you are going to work or you are going to live off the government, you are not going to college most likely, or if you do you are going to grow up and be a wrestler, . . . ‘I am going to be a rapper,’ ‘I am going to work at Mercedes (local production plant).’”
“This child is a behavior problem and I don’t think anybody really cares about him at home. He’s my child (who) will come in with snot across his face—dried snot—and I’m just thinking, ‘did nobody look at this baby before he left?’”
“They’ve been told by people in the past or they think they can’t do it, so they give up.”
Efforts to encourage students to “take ownership” of their test scores were the tonic prescribed by the school leadership to address students’ apparent lack of effort. The recurring phrase “taking ownership of their test scores” underscored the deficit discourses appropriated by the teachers and school leadership that placed responsibility for low test scores on the students and their families rather than on structural inequalities or schools’ indifference. A South teacher expressed this stereotypical thinking:
I think because it is [a] lower socioeconomic school and you don’t have the parental economics that plays a big role . . . at some of my students’ houses, there are not any books to read. They don’t go to the library . . . they don’t take their textbooks home like they are supposed to. You call them if they [students] get into trouble or sick and you can’t get anybody, so I mean that this is not important to them.
The fact that these schools are Title I schools in low-income communities, coupled with the schools’ ascription of deficits to students and their families, reinscribes commonly held stereotypes that low-income communities do not value education (Hale, 2001; Lynn et al., 2010; Noguera, 2007). A 1st-year teacher at Southeast shared that her African American university supervisor commented that the parents of the students at the primarily African American low-income school where she did her student teaching “don’t care . . . because they don’t want their children to do better than they did.” Both comments speak to the racialized overtones of deficit discourses associated with low-income students, who are disproportionately students of color. Nowhere was this more dramatically symbolized and embodied than in the practice of “coloring in the bars.”
“Coloring in the bars.” Teachers at all three schools described similar practices intended to guide students to “take ownership” of their SAT 10 scores. In meetings with each class and its teacher, the counselor/coordinator distributed blank grids resembling bar graphs similar to those displayed in SAT 10 student reports. Each student was also given his individual SAT 10 score report. Students were directed to use markers to color code their percentile ranks and stanines in each of the content and skills areas on the blank grid. Students colored below-grade-level scores red, grade level and just below yellow, and above grade level green. One teacher from Southeast explained,
The counselor came in to go over the SAT scores with the class so that they could color in their bars and see where they fail. So that they would know, “I made a one on these or I made a two on these and I have to be responsible for my own learning.”
Another teacher at South, where the same activity occurred, said that students with scores landing in the “red and yellow” bars learned they needed to “be” in the “green bar.” The act of coloring in the bars embodied the schools’ transfer of responsibility for achievement to the students. Students inscribed their own learner subjectivity into small, colored bar graphs that located their distance from the norm and learned how far off “target” they were, or as one teacher said, “where they fail.”
The “coloring in the bars” activity was followed by what seemed to be intended as a “pep talk” by each school’s counselor. In a private conference with individual students to guide them to “take ownership” of their SAT 10 score, the counselor and the student together looked at the charts with the colored-in bars and discussed what the student needed to do to raise her scores.
Critical Responses: Constraints and Contradictions
Critical response findings proved to be messy indeed. I analyzed the three types of critical responses separately for subthemes but also considered them collectively as dimensions of a theme. This collectivity provided an overarching theme of “constraints and contradictions.” The subtheme of “Testing as a Technology of Power” for Type 1 responses reflects the teachers’ awareness of the sorting and stratifying uses of the SAT 10 and the consequences for students and themselves. The subtheme for Type 2 responses, “Silenced Voices,” speaks to teachers’ marginalization as professionals held accountable but given no voice or respect for their professional knowledge and experience. It also reveals the teachers’ silence concerning the validity and misuses of the SAT 10 data. Type 3 responses suggest the subtheme of “Real Teachers” by which the teachers redefined “real teachers” as those with experiences “in the trenches.”
Critical Response Type 1: Testing as a technology of power
Each teacher’s comments revealed awareness of how the SAT 10 scores were used as a mechanism of power and control over students, which reinforced differences in income levels and foreclosed opportunities for more stimulating curriculum and pedagogy. They reported that students doing well at school invariably represented families with higher incomes. A fourth-grade teacher at Southeast commented, “These scores, I see . . . I look at it more as a have and a have not kind of thing.”
Test scores determined curriculum content, quality, and depth, and instructional approach for many students. All of the schools used homogeneous small groups based on test scores for within-classroom instruction. At South, the highest scoring students, Stanines 7 to 9, were often, but not consistently, placed in “advanced” type of classes that offered more challenging learning experiences not bound to scripted programs nor subject to curriculum pacing guides. Classes at the other schools included the range from above to below grade level and adhered to the various scripted curriculum programs required by the state and local district. As another teacher observed in reference to the curriculum resources offered with the required programs, “It’s supposed to be a level playing field because you get resources and all that, but it’s not.” A South teacher who worked with “average” or “struggling” students offered that highest scoring students and their teachers “get to go to another style of teaching and you know and do something fun and I hate that because my classroom can be the same as the gifted class . . . if I didn’t have to adhere . . . to follow every program.”
This points to how testing policy leverages educational opportunity and teacher autonomy in ways that maintain social inequities under the rhetoric of educational accountability. Engaging teachers in the regulatory work of “target practice” through testing’s technology compels teachers to act as enforcers that maintain the boundaries separating the “have’s from the have not’s.”
Critical Response Type 2: Silenced voices
Type 2 critical responses offer experiences and voices not represented in a text but present in its margins as leftover or “surplus meaning” (Derrida, 1974). Considering testing policy as the text, the teachers find themselves pushed to the margins. Regularly monitored, they are required to speak publishers’ scripted words because their own are seen as inadequate. The fifth-grade teacher from East observed, “I can’t bring anything inside the program.” A teacher from South expressed frustration at being directed to change his teaching but “(they’re) not giving me room to change because I have to follow a program.” Scripted data meetings, intended to be collaborations among professionals sharing data, also silence teachers’ voices, as noted by a Southeast teacher, “Most of the time they [data meetings] were scripted—she [Reading Coach] knew what she wanted to say.”
Furthermore, the surplus of meaning in the margin of the test scores was the students’ progress made possible by the teachers’ work with them. Several teachers expressed frustration at witnessing how their students had progressed, realizing that if they did not perform well on the SAT 10 or the state achievement test, both the teachers’ and students’ work would not be valued. A male third-grade teacher at South expressed concern that the SAT 10 did not show the progress of students who
came in writing on the wrong side of the paper . . . This test doesn’t show where they are when they leave the classroom. I mean that’s what upsets me. This test doesn’t know my kids . . . and this test doesn’t show what that child has learned over a period of time being with me.
I did not hear any critiques of the validity of the SAT 10 or how the schools used the data, a silence that provokes troubling questions about teacher political disengagement, fatigue, disempowerment, or surrender.
Critical Response Type 3: Real teachers
Critical Type 3 responses develop alternative definitions for terms or concepts proposed in a text or offer counter narratives to a text’s “plot.” The teachers asserted that “real teachers” are found “in the trenches,” not monitoring fidelity to curriculum or checking up on pacing guides. As a third-grade teacher at South observed, “We’re fed up with people coming in who couldn’t handle the trenches and telling us how to deal with the trenches. Only real teachers understand this struggle every day.” Policy makers and state department consultants are seen as outsiders who left or were never in the trenches, which makes their authority questionable in the eyes of many of these teachers. A fourth-grade teacher from Southeast shared, “I get ill because a lot of the times the people that are telling you how to do your job have never done your job . . . the people who make the policy have never taught in a classroom.” A third-grade teacher from East expressed aggravation at the irony as “the tests are pushed by people who have not been in a classroom in a very, very, very, long time . . . don’t tell me how to do my job if you are not in the trenches with me.”
Even differences in school contexts call into question a consultant’s authority to “tell us what to do,” according to a South fourth-grade teacher,
If you . . . taught all your . . . teaching career in a school where the kids had enough money, they had food at home, they have nice clothes, they don’t have to worry about where they are going to sleep—you have no idea what I go through every day, because my kids don’t have that and I’ve never taught at a school where kids had that.
The awareness that teaching in Title I schools presents challenges that teachers in middle class or affluent schools rarely encounter speaks to the teachers’ thinking that somehow their schools are “deficient,” or not “normal.” When asked how they thought the Educational Testing Services arrived at the norms established for the average percentile rankings, several teachers indicated a disjuncture between their own students and the students in the imaginary who “are the norm.” One observed, “I know there’s a norm out there . . . Some kids in a different school could probably handle that type of question or that type of test or whatever. But here, you know, some of ours just can’t.” Most of the teachers shared the opinion that they did not feel that their students were represented in the normed rankings. Of course, the irony of the situation is that even though these teachers felt that their schools and students were not represented in the norm, they “enforced” the norm on their students and urged them to “take ownership” of test scores calculated by that same norm.
Considered collectively, these findings reveal contradictions between the teachers’ recognition and criticism of the reductive processes and constraining conditions to which they are subject and their shortsightedness relative to similarly reductive processes to which students are subjected. As dimensions of the overarching theme “constraints and contradictions,” themes from the three response types illustrate how the constraints of accountability testing contribute to contradictions imbricated in teachers’ critical interpretations of their experiences with test data. Teachers’ demands for respect for their professional judgment and knowledge contradict their lack of professional and knowledgeable critique of the validity and use of SAT 10 data. They critique the applicability of SAT 10 norms to their students but compel their students to “take responsibility” for not performing to those norms. Finally, there is the contradiction in teachers assigning responsibility for low test scores to students and their families without acknowledging their own responsibility to reflect on how their teaching may be contributing to students’ low scores.
Visionary Responses: Utilitarian Reflection
Visionary responses indicate that a reader has generated new ideas to apply to practical considerations in response to a text. Two teachers at East described responses to state achievement scores that evoked a theme of utilitarian reflection on personal efforts to raise them. The third-grade teacher offered that the scores were “a reflective piece” that caused him to reflect generally on how he needed to change his teaching. The fifth-grade teacher recounted how the high incidence of the below-grade-level score of “2” on state math achievement tests led her to analyze patterns in her students’ answers. Based on her analysis, she spent time “training” students to fill out the required answer grids and saw fewer errors, despite her frustration at having to devote time to that.
These examples illustrate how visionary responses could range from practical analysis of the testing data to a more general reconsideration of one’s teaching. Both involve a certain amount of reflection. Both show responses to test data that do not blame the students. Each teacher’s interpretation of the test data drew from imagining reasons for low test scores outside of the deficit thinking discourses that characterized the other responses.
Differences by Schools and Individuals
Presenting findings grouped by the three response types highlights the numerous commonalities shared by the individual teachers across schools, specifically the similar ways in which deficit discourses are appropriated to explain school practices and variations in student achievement. This section presents differences by school and individuals mostly distinguished by differences in years of teaching experience and racialized discourse.
South
South teachers, three African American and one European American in an almost homogeneous African American Title I school, offered stronger and more numerous critiques of the effects of accountability policy on their practice and their sense of themselves as professionals. This seemed mostly directed toward the state department monitors, as noted in the third-grade male teacher’s response, “you’re regulating from the state department because you couldn’t handle it” and may reflect South’s previous years of “Needs Improvement” status. Their discontent was directed to accountability pressures and state monitors, not to the school leadership.
The one European American teacher in the group appropriated racialized discourses to explain students’ low test scores in terms of “their culture.” In response to a question about students’ “caring” about education, she said, “I have had kids say ‘I don’t like White people and I don’t have to listen to what you say.’” However, she also spoke of her perception of students’ sense of entitlement as a reason many didn’t care about school, “you owe me something because one, I don’t have any money, two I used to be a slave.” She saw the students’ objection to a White teacher as an obstacle, but not her own attitude, a deeply disturbing finding.
East
The visionary responses offered by two European American teachers from East, a predominately European American Title I school, distinguish them from the others. The fifth-grade teacher, who had to deal with the SAT 10, reading tests, and three different kinds of state achievement tests, was more outspoken about the state’s disregard of teachers’ input on testing and curriculum. By contrast, the third-grade male teacher, who had fewer tests to prepare and account for, offered mild critiques of the district’s guidance.
Southeast
The inexperience of the three European American female fourth-grade teachers at this majority African American Title I school distinguished their responses from those of teachers at the other schools. Whereas the more experienced teachers at South and East referred to previous years’ teaching when they exercised more autonomy, these novices made no such comparisons. Their concerns about staying on track with the pacing guide and weekly collaborations on teaching strategies were not tinged with the fatigued and/or angry overtones I heard in the comments of the more experienced teachers at South or of the fifth-grade teacher at East.
Implications for Data-Based Practice
In asking what the findings of this study suggest, several implications and questions emerge for: (a) reader response analysis, (b) teachers and the text of the test, (c) teacher education and research, and (d) the accountability of school leaders to teachers. After offering reflections on limitations and possibilities of reader response analysis suggested by this study, I discuss implications for teachers and the text of the test using the reader response framing of the text as the regulator and co-constructor of its meaning. That is followed by questions for teacher education and research suggested by the study. Finally, implications for educational leaders conclude the article.
Reader Response Analysis
In doing this, or any analysis, the researcher is reading and responding to the data and co-constructing a virtual text. Keeping that in mind, I found that beginning data analysis by coding for the three response types may have narrowed my initial examination. It left open the possibility of excluding significant data that might have been included by initial inductive analysis. Reader response theory encompasses a broad range of theoretical frameworks from social theory to feminism, any of which could be used to inform an inductive analysis from which multiple responses and perspectives could be generated and put alongside each other for deeper understanding.
Teachers Reading the Text of the Test
Rosenblatt (1978) suggests that the product of a reader’s transaction of meaning with a literary text is a poem. The poststructuralist semiotics claim in reader response theory used in this analysis generalizes that concept to propose that the products of the teachers’ transactions of meaning with their students’ SAT 10 scores are texts. Findings from this study suggest that teachers read much more in test scores than just decontextualized numbers and charts on the page. They read their students’ SAT 10 scores as a moral and a discursive text that together prescribe classroom practices and teachers’ expectations in an economy of merits and deficits.
However, the teachers did not expand the text on their own. Rosenblatt (1978) reminds us, “the text regulates what shall be held in the forefront of the reader’s attention” (p. 11), suggesting that readers do not construct meaning independently of the texts they read. Rather, they co-construct meaning regulated by what the text puts in front of them. So if teachers interpreted moral and discursive meanings in SAT 10 scores, the resources for those meanings were already in place in the interpretive context shaped by accountability policy and deficit discourses associated with low-income students. SAT 10 scores act metonymically for the moral and discursive texts inhering in and imbricated with the percentile scores and stanines on students’ test reports.
The moral text
The term accountability implies moral intent, so it should come as no surprise that teachers read and produce a moral text in their transactions with SAT 10 scores. In a democracy that funds public education for all, so the thinking seems to go, it is the moral responsibility of students to do their best, teachers to teach well, and schools to effectively use resources so that public funds are not wasted and students are produced as contributing participants in society. SAT 10 scores interpreted as measures of achievement and effort are products of meritocratic ideologies that inscribe these moral and discursive meanings into testing and test scores.
Research since the Coleman report has supported the relationship between lower socioeconomic status (SES), minority status, and lower educational achievement (Borman & Dowling, 2010; Ladson-Billings, 2006; Oakes, 2005). The complex relationships among SES, race, ethnicity, and immigrant status muddy understandings about how school achievement relates to these variables. However, many educators influenced by popular discourses of cultural deficits linking low income, race, low achievement, and disregard for the value of education blame students and their families for low test scores (Hale, 2001; Lynn et al., 2010; Noguera, 2007). In this way, low test scores seen as a consequence of students’ disregard for the value of education or lack of effort are read as a moral weakness, even failure.
This perspective sustains the “culture of low expectations” (Landsman, 2004) associated with deficit thinking that was clearly evidenced in the schools’ practices of “coloring in the bars” and the face-to-face conferences with counselors. These activities enact the “public spectacle of failure” identified by Lipman (2004), which embarrasses and humiliates students publicly. But even more insidiously, these practices symbolically transfer the failure of the structures of schooling and its agents, in the form of the teachers, principals, and counselors, to the vulnerable shoulders of children.
The “target” language wove itself into teachers’ explanations normalizing the attributions of blame for low scores to students’ and their families’ behaviors and attitudes. Conversely, that same action points to the personal and professional responsibility avoided by the teachers, school leaders, and ultimately by policy makers as they assign ownership solely to students. This fosters a morally bankrupt climate of self-interest and self-protection at the expense of the most vulnerable. Whether intentional or not, the moral text is deployed as a politically expedient text by passing the blame for low scores to students and their families.
The discursive text
These teachers also read their students’ SAT 10 scores as a discursive text that “told” them “what was wrong,” or “what we needed to work on.” Test scores “told” the teachers what kinds of students they had and located students in certain learner subject positions. This prescriptive emphasis on what the test scores “tell” constructs a passive teacher as the listener and reader who receives this information uncritically, as evidenced in the absence of teachers’ critique of the SAT 10 data and how it was used.
The SAT 10 scores were not the only testing outcomes shaping the discursive text. With almost every teacher, questions about their students’ SAT 10 scores invariably evoked mention of students’ other achievement scores. The repeated comments about student performance on these other tests suggest that each test result informed the teachers’ thinking in the context of the other scores. When explaining a student’s SAT 10 score, teachers often referred to the student as “my 2,” or said “she’s an intensive,” meaning she receives frequent reading intervention work. Students were ranked comparatively to each other and to a discursively regulated norm, their identities produced as “differentiated identities,” broken, and in need of fixing and/or intervention (Lipman, 2004, p. 63).
Furthermore, one of the many consequences of these school practices is the discursive production of teachers’ practice as strategic “target practice” directing teachers to hunt for “bump kids” and take aim at higher scores. That leaves the “low low kids” as “collateral damage” (Nichols & Berliner, 2007), and “really really high kids” to fend for themselves when placed in mixed ability classes. In this way, teachers, leaders, and schools enact moral and discursive texts that mutually support and inform each other to uphold a kind of meritocratic imperialism that maintains the status quo of social and economic inequities.
Teacher Education and Teacher Research
The examples of teachers’ thinking about and interpreting SAT 10 scores presented in this article illustrate that teachers’ data-based practice is vulnerable to a variety of contingencies and interests. Data-based decision making did not prevent students from being targeted as merits or deficits, and actually seemed to be the mechanism by which students were “branded” into specific learner subjectivities. This study reminds us that schools are contested sites of control between federal, state, and local leaders. As “docile bodies” (Foucault, 1977/1995) in the hierarchy of accountability, students and teachers often become the leverage point for these conflicts.
This presents a challenge to teacher education and research to support new teachers in “dealing with data” in productive ways that foster their own learning and growth as well as that of their students. When asked, all of these teachers said that their teacher education programs did not include training in how to read and interpret standardized test scores. Without that, teachers are left vulnerable to training and direction provided by the local district, which may not be correct, and may foreclose opportunities to think in more reflective or innovative ways. Data-based practice can be more generative and productive than that reported by this group of teachers, especially if allied with critical reflection (Joyce, Calhoun, & Hopkins, 1999; Mid-continent Research for Education and Learning, 2010).
The other challenge is to provoke teachers’ realization that schools are sites of controversy and control. They should realize that engaging with and understanding the political processes that produce scenarios such as those these teachers described requires strategic dialogue and collaborative conversations that challenge the prevailing political expediencies overshadowing good teaching practice.
The Accountability of Educational Leaders to Teachers
Finally, it is important for principals and other school leaders to recognize their own complicity in evading responsibility for data-based practice that finds itself transformed into “target practice.” How do principals and leaders actually facilitate the generative and critical reflection on student data that appeared to be absent from the experiences reported by the teachers in this study?
It is clear that the achievement tests that currently supply the data on which data-based practice is based are highly questionable in their composition, validity, and the uses to which they are put (Popham, 2010). However, standardized testing as a weighty presence in schools is here to stay. This is the time when educational leaders themselves need to develop their own data analysis skills and knowledge, what Popham (2010) calls “assessment literacy,” so that they can speak out about the “instructional insensitivity” of most accountability tests. This needs to be accompanied by a commitment to critical reflection in order to create conditions in which faculty can exercise their powers of critical thought in their deliberations on their students’ test scores. Leaders must be accountable for how they support and provide for their teachers’ engagement in an activity that has so much potential to harm and is so vulnerable to political exigencies. Teachers and leaders in the field need to develop critiques of accountability testing and policies that counter attitudes of resignation to the inevitability of testing. A waiting game sustained in the hope that accountability testing will go away is counterproductive.
In conclusion, we need further conversations about reconceptualizing teacher education and educational policy making in such a way as to recognize the irreducible and unguarantee-able nature of interpretation inherent to human practices such as teaching and learning, and more specifically, reading and interpreting data. Such teacher education and practice need the data-informed knowledge important to evaluating student learning as authentically as possible, as well as the resilience and reflexivity of teachers’ informed and critical reflection for evaluating its motivations and consequences.
Footnotes
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
The author(s) received the following financial support for the research, authorship, and/or publication of this article: a research grant from the College of Education at The University of Alabama, and a Faculty Research Grant from The University of Alabama. The grants paid for the transcription of the interviews conducted for this study.
