Abstract
Value-added approaches for attributing student growth to teachers often use weighted estimates of building-level factors based on “typical” schools to represent a range of community, school, and other variables related to teacher and student work that are not easily measured directly. This study examines whether such estimates are likely to be accurate in “outlier” schools where building-level characteristics, such as demographics and faculty qualifications, are at the outer edges of the distribution of schools on which the “typical school” estimates are based. We examined whether building-level factors correlate with grade-level ratings in one of the most widely used approaches to value-added modeling, thus impacting interpretation of value-added ratings of teachers. The findings indicate that reliable interpretation of a model using typical school estimates is affected by aspects of the school, even when a weighted model is used, and urban schools may be particularly affected. More correlations were found than would be expected by chance, many of them fairly large. Correlations tend to cluster around particular variables, possibly an effect of system accommodations for demographic or economic factors. A greater range and number of correlations were found for mathematics than for reading. Finally, the number of correlations and the strength of relationships increase with grade level.
Introduction
Accountability for public schools’ performance has been a topic of interest for decades. Public awareness of an achievement gap affecting minorities and lower socioeconomic students, as well as the increased numbers of school levy campaigns, continue to energize the demand for accountability. Citizens want to know that their tax dollars are being spent wisely and that their children are growing academically. States have responded by adopting statewide content standards and proficiency tests that align with the state standards (Linn, 2004). The rationale is that all children should be able to pass the state proficiency tests if all classroom teachers are using the state standards to guide high-quality teaching.
In response to this increased interest in student performance, longitudinal statewide student performance databases are being developed to study how individual students have grown over a specific time period. The percentage of students that pass an annual proficiency test is a status measure rather than a growth measure. Status answers the question of how high students score on state assessments (Gong, Perie, & Dunn, 2006). A growth measure answers the question of how much learning took place while a student was in one class or one grade. The problem in determining growth is that all students start at different places on a scale. Differences in scale units also complicate interpretations of growth (Braun, 1988).
“Status versus growth” is a popular topic among policy makers, educators, and educational researchers; current accountability interests demand some measure of growth. In a comparison of status and growth measures over a 3-year period in mathematics, school factors (i.e., resources, policies, leadership) were significantly correlated to growth scores whereas nonschool factors (i.e., community characteristics) were significantly correlated to status scores (Stevens & Zvoch, 2006). This indicates that the two measures reflect different effects of an educational system.
Although it has been shown that social, economic, and family characteristics outside the control of educators predict a great deal of student academic performance at a given point in time (see, for example, Coleman et al., 1966; Jencks, 1972), more recent research that analyzes data at multiple levels (such as student, classroom, and school) and research that focuses on student progress over time as opposed to single-point in time “status” test scores indicates that schools and teachers can make a large difference in student achievement. A comprehensive overview of this research by Wenglinsky (2002) concludes that “the effects of classroom practices, when added to those of other teacher characteristics, are comparable in size to those of student background, suggesting that teachers can contribute as much to student learning as the students themselves” (p. 1).
As urban schools are of particular interest to the teacher-preparation institutions and educators working to close achievement gaps among student groups, and as those schools often have student demographics and faculty characteristics which are very different from the average school, it is very important for researchers and policy makers relying on value-added ratings to understand any possible interactions of building-level variables with grade- and teacher-level results.
Literature Review
As a result of the variety of value-added models being implemented, the term value added does not have a universal definition. The term is used by states throughout the nation to refer to an alternative way to provide feedback on an educational establishment’s effectiveness based on analysis of individual student growth. Some models involve multivariate statistical methods that take into account socioeconomic status, prior testing results, and student factors such as race, gender, native language, and mobility. Some models include the effect of previous teachers as well. To further confuse the issue, it is not customary for states to specify the model being used in computing value-added scores. A value-added score of, say, 1.3 has a very different meaning in Ohio and in California as they use different methods to determine scores.
As stated earlier, a growth score reflects student change over time. A value-added score is a specific type of growth score. A value-added score represents student change attributed to a specific time, agent, or experience (Gong, Perie, & Dunn, 2006). Sanders’ Educational Value-Added Assessment System (EVAAS) is just one of the models that is available for calculating a value-added score (Sanders & Rivers, 1996). California, Texas, Arizona, and New York have each adopted their own unique growth/value-added models based on gains in test scores over more than 1 year (Linn, 2004). Policy makers cannot agree on a standardized value-added model. Each of these models is slightly different in its approach; one preferred model for computing value-added scores has not surfaced. The commonality in all models involves the use of the difference in test scores (the gain) in determining a value-added score. The state of Ohio uses the EVAAS model, and this model was the source of data for this research study.
Value-Added Analysis and the EVAAS/Ohio Model
Value-added models are specialized growth models, designed not only to analyze student achievement progress over time but also to understand the amount of student progress that can be reasonably attributed to the student working with a particular teacher and in a particular school building, relative to other teachers and schools. EVAAS accommodates several sources of influence on individual student progress.
The first group of variables influencing any student’s progress consists of student characteristics, such as his or her prior achievement level and motivation, and socioeconomic factors that are strongly linked to achievement score results such as poverty level and parental education. The EVAAS value-added model takes student-level characteristics into consideration by calculating each student’s progress individually and looking at his or her assessments over time, so that the effects of the student characteristics are attributed to the student and not to the teachers or school. These are called student effects.
By taking advantage of the longitudinal aspect of the data, each student serves as his or her own “control.” In other words, each child can be thought of as a “blocking factor” that enables the estimation of school system, school, and teacher effects free of the socioeconomic confoundings that historically have rendered unfair any attempt to compare districts and schools based on the inappropriate comparison of group means. (Sanders, Saxton, & Horn, 1997, p. 138)
The second group of factors that most influence a student’s progress is, naturally, the work that the student does with his or her teacher. These are called teacher effects.
Each student’s test data are accumulated over time and are linked to that student’s teacher(s), school(s), and school system(s). TVAAS [Tennessee Value-Added Assessment System, the earlier name of EVAAS] utilizes the scaled scores students make over time to model their learning patterns. By taking advantage of the longitudinal aspect of the data, it is possible to note when the normal pace of academic growth deviates. [F]ollowing growth over time . . . enables the partitioning of school system, school, and teacher effects free of the exogenous factors that influence academic achievement and that are consistently present with each child over time. (Sanders & Horn, 1998, p. 249)
The findings presented in this report are an analysis of the third group of influences, building-level effects, which affect both student and teacher effects. These effects include a number of variables that describe the context in which all the students, instructional faculty, and staff in the school must work while learning and teaching. This includes not only the physical and cultural working environment but also aspects of delivering education such as resources, curriculum offerings, school policies, and leadership. Such school-level variables are often also highly affected by and/or representative of community characteristics. As it would be impossible to identify and accurately measure every single variable in a school that might affect student progress, the EVAAS value-added model estimates building-level effects statistically by calculating a weighted effect using data from the population of school buildings across the entire state.
. . . in EVAAS applications a projection is the score that a student would be expected to make assuming that the student has the average schooling experience in the future. The means should therefore be those of an average school within the population of schools of interest. (Wright, Sanders, & Rivers, 2006, emphasis in original)
Often the term typical school effect is substituted for the “weighted effect” and/or “average schooling experience” when this aspect of the value-added model is described (Yeagley, 2007).
Given the enormous variance among schools, districts, and classrooms, and the difficulty collecting the appropriate data, accounting for the average schooling experience by using a statistical estimation is probably the only realistic approach. Sanders and colleagues summarize the impact of each of the three groups of factors on student growth as follows:
If the variability in student academic progress is partitioned into three “buckets”—among Districts, among Schools within Districts, and among Teachers within Schools within Districts—what is the relative amount of the variability that will go into each bucket?
Among Districts about 5%,
Among Schools within Districts (the building-level effect) about 30%,
Among Teachers within Schools within Districts (the teacher-level effect) about 65%. (Sanders, 2004)
But what if the school is not average? Reviewing the value-added methodology with regard to the context in which new teachers work—an important aspect of understanding the effects of teacher preparation—prompted these researchers to wonder if there might be schools where some building-level aspects—faculty, policies, resources, the characteristics of the student body as a whole—are so different from the weighted average school effect that statistical estimates become unstable or inadequate. As the EVAAS model purports to represent building-level effects in its model using weighted average estimates, we propose that discovering residual correlations of building-level variables with grade-level effects would indicate that the model must be interpreted with extreme caution for “outlier schools.” We extend our reasoning to propose that “atypical” schools may face a substantially different educational task, and thus the usual expectations of apportionment of building and teacher effects may be quite different. We will suggest in discussion of our findings that these schools may need a modified use of value-added ratings to ensure that district- and school-level effects are adequately interpreted, so that expectations for student achievement growth are not unrealistically placed at the classroom level. We do not suggest reducing expectations for teachers’ work, but rather ensuring that these expectations are accompanied by adequate building- and district-level supports when developing value-added policies.
Research Questions
For this analysis, teachers’ work is represented by grade-level value-added ratings and the average expected school/building effect is represented by student body characteristics and known building-level variables for faculty as a whole. Specific research questions are as follows:
Research Question 1: Are the effects of building-level variables on teachers’ impact on student achievement adequately accounted for by use of the weighted estimates in the value-added ratings model?
Research Question 2: Can a school be sufficiently outside the “typical” parameters such that interpretation of the value-added rating becomes suspect?
Research Question 3: Are there school placement settings [urban/rural/suburban, SES, percentage of students with an Individual Education Plan (special education plan), size, etc.] where value-added rating results may need cautious interpretation, either because of building-level effects or because the underlying building-level data are missing or unstable?
Method
This study was conducted by members of the Teacher Quality Partnership (TQP) research team when developing a large-scale, longitudinal study of novice mathematics and reading teachers in Ohio. All 50 institutions that prepare teachers in Ohio were partner participants in the larger study. More information about TQP and other personnel involved is available online at http://www.tqpohio.org. Ohio legislation requires the use of value-added progress measures as part of the accountability system for all schools and districts in the state. On a wider scale, the same value-added approach used in Ohio (and other very similar models) is at work in a number of states to evaluate the work of thousands of schools. The EVAAS model, used in Ohio and explained in more detail above, relies in part on estimates of an average, weighted expected building effect to permit final calculations of the “value added” by a particular school and, in the case of this study’s data, by a particular grade level of teachers. The terms building level and school level are synonymous in this article.
The TQP research team worked collaboratively in study design and pilots with the staff and organizations most closely connected to the Ohio accountability measures, to make sure that value-added ratings were used appropriately and effectively. William Sanders, a pioneer in value-added methods, and the SAS EVAAS Group are providing Ohio with value-added analysis. Also involved are key data personnel from the Ohio Department of Education (ODE) and Battelle for Kids (BFK), an Ohio nonprofit organization that has been working to implement value-added progress measures in the state. An agreement between ODE and SAS limits availability of value-added scores to researchers. Because of this, TQP did not have access to actual value-added ratings or the model that calculated the ratings. This research was possible with the full cooperation of the SAS Institute. Analyses were performed at the institute site in North Carolina in March 2006, conducted by SAS personnel with the authors’ direction during a visit to the institute.
Although TQP researchers acknowledge and share concerns about value-added progress measures for accountability purposes (see, for example, Ballou, 2002; Kupermintz, 2003), there is reason to believe that this study can shed light on the work of teachers and schools from a research perspective. Value-added and growth measures have been shown to be much more robust than single-point-in-time status assessments. The authors are aware of the criticisms of Sanders and the EVAAS methodology (see, for example, Amrein-Beardsley, 2008; Bracey, 2004, 2006; Lockwood, 2002; McCaffrey, Lockwood, Koretz, Louis, & Hamilton, 2004; Raudenbush, 2004; Tekwe et al., 2004). It is not the purpose of this article to further discuss such criticisms. Numerous schools are being rated by these models (e.g., in the states of Tennessee, Ohio, and Pennsylvania; see Hershberg, Adams Simon, & Kruger, 2004; Schaeffer, 2004), so it is important to examine concerns where possible, despite the fact that specific EVAAS calculations are proprietary and therefore cannot be directly manipulated in traditional ways. The present study examined a concern regarding how broadly the typical school weighted estimates can be interpreted. We take a practical approach in these analyses by using the EVAAS value-added scores that schools are receiving annually, along with building-level data available from the ODE. This article is an extension of analysis work conducted in part for a broader dissertation by one of the authors; further details can be found in Franco (2006).
This is an exploratory analysis of a fairly large data set available through the 4-year Project SOAR pilot program of value-added measures in the state of Ohio, led by BFK, an organization working with the SAS EVAAS Group and the ODE. This article reports the findings of an analysis of value-added ratings for a large number of Ohio schools, for reading and mathematics Grades 3 through 8. The data mining goal was to identify any characteristic related to the context of the students’ education that might support or hinder the progress of students and teachers as they work together. Causal inferences related to findings could be speculated in both directions—variables to student achievement as well as student achievement to the variables—as, for example, some teachers may select which schools they must work in based, in part, on student characteristics and prior achievement, and some parents may choose where to live and send their children to school based, in part, on building success. It is possible that multiple causalities in various directions may be equally applicable to different schools among the group that lies sufficiently outside the typical estimates. It will be important to examine the range of potential cause and effect in the areas identified as unstable by our findings of residual correlations. As schools are expected to succeed with every child regardless of how the school is affected by economic and community realities, it is important to understand where policy choices, building-level supports, leadership climate, culture and other systemic work may potentially affect student achievement.
Data and Participants
Data variables related to student learning, teacher and classroom effects, and school building characteristics were used at building and grade levels:
Grade-level value-added ratings
Calculated from multiple achievement scores over time for each individual student, residual scores (actual vs. predicted) are aggregated and analyzed at grade level to estimate how much of the progress that students made can be reasonably attributed to the work of the teacher(s) at each grade level. These grade-level value-added ratings were used from 928 of the 1,138 schools in 78 districts that participated in the BFK SOAR pilot project. These ratings were for Grades 3, 4, 6, 7, and 8 in mathematics and Grades 3, 4, 5, 6, and 8 in reading. A total of 9,135 grade-level ratings are part of this study. Value-added ratings were calculated by the SAS EVAAS Group in late 2005, using the EVAAS longitudinal value-added growth model with multiple years of student assessment data culminating in 2004-2005 end-of-year Ohio Achievement Tests (OAT) and Proficiency Tests (OPT). The characteristics of the districts and the schools participating in the pilot SOAR project were statistically very similar to those of the state of Ohio as a whole, the districts and schools having been intentionally selected to be representative for purposes of the pilot project.
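The aggregation of student residuals to grade level can be illustrated with a minimal Python sketch. It is not the EVAAS computation, which is proprietary and based on a longitudinal model; it only shows the idea of averaging actual-minus-predicted scores within each school and grade. The record layout, the function name grade_level_residuals, and all scores are hypothetical.

```python
from collections import defaultdict

# Hypothetical student records: (school, grade, actual score, predicted score).
records = [
    ("A", 4, 410.0, 400.0),
    ("A", 4, 395.0, 405.0),
    ("A", 4, 420.0, 408.0),
    ("B", 4, 380.0, 390.0),
    ("B", 4, 385.0, 389.0),
]

def grade_level_residuals(rows):
    """Return the mean (actual - predicted) residual for each (school, grade)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for school, grade, actual, predicted in rows:
        sums[(school, grade)] += actual - predicted
        counts[(school, grade)] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Positive means students outperformed their predictions on average.
grade_means = grade_level_residuals(records)
```

In this toy data set, Grade 4 in school A averages above its predictions and Grade 4 in school B averages below them, which is the kind of contrast a grade-level rating is meant to summarize.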
The EVAAS model divides students into quintiles based on their prior performance to allow schools to determine if there are gaps in performance among all levels of students, regardless of whether a student was previously a poor or high performer. Similar to the idea of disaggregating data on minority and underserved children for accountability purposes in No Child Left Behind (2001), the quintile analysis approach often uncovers results that might otherwise be hidden in an average of all performances. For example, a group of grade-level teachers who are working very effectively with the lowest quintile students (the 20% of students who had the lowest prior achievement) may not be working effectively with the students in the highest quintile. Such a discrepancy would be hidden if all students’ progress results were averaged together. Also, as schools are receiving value-added scores by quintiles and using the results to inform improvement plans, we felt it was important to look at quintiles as well.
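The point that an overall average can hide quintile-level discrepancies can be shown with a small Python sketch. The quintile rule used here (rank students by prior score and cut into five equal groups) is a simplifying assumption, not the documented EVAAS procedure, and the scores and gains are invented for illustration.

```python
def assign_quintiles(prior_scores):
    """Assign each student a quintile from 1 (lowest 20%) to 5 (highest 20%)."""
    order = sorted(range(len(prior_scores)), key=lambda i: prior_scores[i])
    n = len(order)
    quintile = [0] * n
    for rank, idx in enumerate(order):
        quintile[idx] = rank * 5 // n + 1
    return quintile

def mean_gain_by_quintile(prior_scores, gains):
    """Average the gains separately within each prior-achievement quintile."""
    totals = {q: [0.0, 0] for q in range(1, 6)}
    for q, gain in zip(assign_quintiles(prior_scores), gains):
        totals[q][0] += gain
        totals[q][1] += 1
    return {q: s / c for q, (s, c) in totals.items() if c}

# Hypothetical grade of 10 students: low prior achievers gain, high ones lose.
prior = [310, 320, 350, 360, 400, 410, 450, 460, 500, 510]
gains = [8, 6, 4, 4, 0, 0, -3, -5, -6, -8]

by_quintile = mean_gain_by_quintile(prior, gains)
overall = sum(gains) / len(gains)
```

Here the overall mean gain is zero, while the lowest quintile shows a clear positive mean and the highest quintile a clear negative one; only the disaggregated view reveals the pattern.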
Building-level data variables
These data include student body characteristics, faculty characteristics, and school performance and adequate yearly progress (AYP) ratings. These data were obtained from the 2004-2005 Educational Management Information System of the ODE. Details on how these data variables are measured, aggregated, and reported in the state can be accessed at the ODE website, http://www.ode.state.oh.us, under “Data and Statistics.”
The student body characteristics, or building-level demographics, are individual student characteristics grouped at the whole building level. These include
The percentage of students in poverty as represented by free/reduced-price lunch status (Pct FRL).
The percentage of students classified by the school as “disadvantaged” (Disadv).
The percentage of minority students (PCT Minority).
The percentage of Black students (Black, n.b.: Black students are the state’s largest minority group).
The percentage of disabled students (Pct Disability).
The rate of mobility of the students in the school (percentage of students attending classes in a building for less than 1 year [Long 1]), for 1 to 2 years (Long 1 to 2), and for more than 3 years (Long 3+).
Faculty characteristics are teacher-related data variables, grouped for the entire building’s instructional staff, novice and experienced alike, and they include
The percentage of faculty certified to teach the grade and content to which they are assigned (Certified).
The percentage of faculty that have a master’s degree (With_MA).
The percentage of faculty that are NOT classified as “highly qualified” (HighQual). Highly qualified teachers (HQT) are those who have met specific elements of professional development. HQT requirements apply only to faculty in high-stakes content areas such as mathematics and reading.
The attendance rate of the faculty (Tchr Attd Rate).
The average years of teaching experience of the faculty (Yrs_Teach).
The reader will note that a number of these variables are related (e.g., HQT and certified teacher placements; percent minority students and percent African American students). Correlations among the building-level demographics and faculty characteristics were all significant, and the related variables reflect different emphases on similar aspects of the school building; for example, HQT staff are usually certified to teach their assigned content area, and therefore percentage of HQT is highly correlated with percentage of certified. All building demographic and faculty characteristics were included in the initial data mining because distinctions among these collinear variables related to value-added ratings may be important to policy, and because the subtle differences between the variables might also shed light on interpretation of findings. Moreover, for this study of potential building-level impacts on grade-level value-added ratings, findings related to the building and faculty characteristics are likely to be proxies for larger issues of school policies, financial and leadership supports, community characteristics, and the like.
Procedures
All analyses were performed at the school building level by content (mathematics and reading) and grade. Analyses were performed to uncover and understand the relationships among the data variables. The correlation of each building-level variable (all of the building demographics and faculty characteristics, listed above) and each grade-level value-added estimate was analyzed. The grade-level data also included mathematics value-added ratings for the entire grade and for quintiles (students separated into five groups, lowest 20% to highest 20%, by their prior achievement) for Grades 3, 4, 6, 7, and 8; reading value-added ratings for the entire grade and for quintiles (students separated into five groups, lowest 20% to highest 20%, by their prior achievement) for Grades 3, 4, 5, 6, and 8.
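The pairing of every building-level variable with every grade-level rating can be sketched in Python as follows. This is an illustrative reconstruction of the general procedure, not the analysis code run at SAS; the two variables, the two rating series, and all values are hypothetical, and each list position stands for one school building.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical building-level variables, one value per school.
building_vars = {
    "Pct FRL": [10, 20, 30, 40, 50],
    "Yrs_Teach": [12, 9, 11, 8, 10],
}

# Hypothetical ratings keyed by (subject, grade, student group).
ratings = {
    ("math", 4, "all"): [2.0, 1.0, 0.0, -1.0, -2.0],
    ("reading", 4, "all"): [1.0, 0.5, 0.8, -0.2, 0.1],
}

# One correlation per (building variable, rating series) pair.
correlations = {
    (var, key): pearson_r(values, series)
    for var, values in building_vars.items()
    for key, series in ratings.items()
}
```

With the full set of building variables, two subjects, five grades, and six student groupings per grade, the same loop would produce several hundred correlations, which is why a screening criterion is needed afterward.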
In addition to general data mining and testing correlations, general linear model (GLM) regression analyses were conducted (a) to examine the interactions of identified building-level factors with value-added results for grade levels and for quintiles of students within grade levels and (b) to examine the relationship of several key variables with value-added ratings (percentage of FRL, percentage of Black students, teachers’ attendance rates, percentage of teachers with master’s, average teacher experience, average teacher experience squared [to correct for the known nonlinear relationship between teacher experience and student achievement, Tabachnick & Fidell, 2001, p. 114], and percentage of certified teacher placements). The key factors emerged from the initial data mining (Franco, 2006). GLM was used due to the fact that some of the characteristics are categorical. GLM is preferred when some independent variables are categorical and some are continuous (Kirk, 1995; McNeil, Newman, Kelly, & McNeil, 1996). The GLM confirmed statistically significant correlations, indicating a likely school effect in some schools that was not fully accounted for in the grade-level value-added model.
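The role of the squared experience term can be illustrated with a small least-squares sketch in Python. It fits only a rating against experience and experience squared; the study's GLM included additional predictors and was run in SAS, so this is an assumption-laden simplification. The helper names, the synthetic data, and the coefficient values are all hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    design = [[1.0, xi, xi * xi] for xi in x]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)

# Synthetic, noiseless data: ratings rise with experience and then level off.
years = [1, 3, 5, 10, 15, 20, 25]
rating = [-1.0 + 0.3 * t - 0.01 * t * t for t in years]

b0, b1, b2 = fit_quadratic(years, rating)
# A negative b2 with a positive b1 captures the rise-then-plateau shape.
turning_point = -b1 / (2 * b2)
```

A straight-line model would force a constant return to each additional year of experience; adding the squared term lets the fitted curve flatten, which matches the pattern the squared term is meant to accommodate.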
A correlation was deemed strong enough for further consideration in the findings if r² ≥ .04. This level was chosen as a slightly broader interpretation of the rule of thumb offered by Cohen that “the estimated effect is medium (and could have some importance) if r² is in the general vicinity of .06, and the estimated effect is large if r² is in the vicinity of, or exceeds, .14” (Cohen, 1988, as cited in Witte & Witte, 2001). Even if the relationship is not large, its potential impact (should the relationship be a true effect) depends on the situation. For some real-world examples, see Newman and Newman (2000, pp. 6-7), who discuss a variety of small effects that, when consistent over time, add up to a huge impact. An r² of .04 representing a true direct effect would mean much in terms of education for real students.
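The screening rule can be expressed as a short Python sketch. The threshold of .04 comes from the text; the function names and the sample correlations are hypothetical. Because r² ≥ .04 is equivalent to an absolute correlation of at least .20, the rule keeps both positive and negative relationships of that size.

```python
import math

R2_THRESHOLD = 0.04  # screening level stated in the text

def r_squared(r):
    """Proportion of variance shared by two variables with correlation r."""
    return r * r

def passes_screen(r, threshold=R2_THRESHOLD):
    """True if the correlation is strong enough for further consideration."""
    return r_squared(r) >= threshold

# The equivalent cutoff on the absolute value of r itself.
min_abs_r = math.sqrt(R2_THRESHOLD)

# Hypothetical correlations between building variables and ratings.
sample = {"var_a": 0.25, "var_b": 0.15, "var_c": -0.56}
retained = {name: r for name, r in sample.items() if passes_screen(r)}
```

Squaring the correlation discards its sign, so direction has to be reported separately when a retained relationship is interpreted.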
Findings
With regard to the research questions, we found more strong correlations than one would expect if indeed building-level variables are adequately accommodated in the EVAAS value-added calculations, at least for the state of Ohio. For most schools—those sufficiently within the parameters of the average schooling experience—the value-added ratings seem stable. For those schools where the context of the educational experience differs from the average in the state, the value-added ratings at the grade level may not be reliably interpretable. We believe it is likely that the educational needs presented by groups of students in some schools contribute to a school effect that is well outside the average schooling effect estimated in the value-added model. However, as the discussion below emphasizes, it is neither appropriate nor statistically reliable to address these school effect anomalies by adjusting the value-added model for socioeconomic factors at the building level. To do so risks differentiating expectations of education for groups of students based on their socioeconomic circumstances. There is no reason to believe from this study or from a review of data available to the state that such an adjustment could be done effectively even if one wished to make it.
More Strong Correlations Than Expected
Contrary to what one might expect if the approach of accounting for school-level effects using weighted estimates were truly adequate, we found a number of residual correlations between school-level student body and faculty variables and grade-level value-added ratings. In Grade 4 mathematics, weak (r² between .02 and .05) but significant (p < .001) correlations were found with percentage of student body on free/reduced lunch and mobility, as well as with faculty certifications and teacher attendance rates. In Grades 7 and 8 mathematics, correlations were even stronger (r² ranging between .03 and .31; p < .001), with the largest being r² = .31 for the negative relationship of a high percentage of minority students in the school with eighth-grade mathematics value-added scores. Having a very high percentage of minority students places a school well outside the average school in Ohio.
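A rough sense of how many significant correlations chance alone would produce can be obtained with the Python sketch below. The counts are taken from the variable and rating lists given in this article (eight student body and five faculty variables; two subjects, five grades each, with a whole-grade rating plus five quintile ratings). Treating every variable-by-rating pair as one independent test is an assumption made only for illustration; the building variables are known to be intercorrelated, so the true figure would differ.

```python
# Counts drawn from the lists in this article; independence is assumed.
N_BUILDING_VARIABLES = 8 + 5  # student body variables + faculty variables
N_SUBJECTS = 2                # mathematics and reading
N_GRADES = 5                  # grades rated per subject
N_GROUPINGS = 6               # whole grade + five prior-achievement quintiles

def expected_chance_findings(n_tests, alpha):
    """Expected count of 'significant' results if no true relationship exists."""
    return n_tests * alpha

n_tests = N_BUILDING_VARIABLES * N_SUBJECTS * N_GRADES * N_GROUPINGS
expected_at_05 = expected_chance_findings(n_tests, 0.05)
expected_at_001 = expected_chance_findings(n_tests, 0.001)
```

Under these assumptions there are 780 tests, so roughly 39 would reach p < .05 by chance but fewer than one would reach p < .001, which is the level reported for the correlations above.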
Grouping of Significant Correlation Variables
We also find that certain variables are more consistently related to value-added results than others, for example, percent of disadvantaged (Disadv) or free/reduced price lunch eligible students (Pct Free Lunch) in the school. As there is reason to believe that the longitudinal analysis of the students’ academic growth adequately captures their individual poverty status in the value-added model (see the description of student effects above), it is likely that the strong correlation of buildingwide poverty to value-added ratings is a proxy for other causal factors directly related to student progress. The patterns of correlations found were also consistent when we looked at students disaggregated by quintiles of prior performance (low through high, the way that the EVAAS system breaks student groups into quintiles), and altogether raise the suspicion that there is an “institutional level” or building effect related to accommodating high levels of poverty and traditionally underserved populations of students. As the discussion section will detail, the grouping of the relationships found with certain variables may indicate that these variables are all proxies for how the school interacts with socioeconomic and family factors known to predict student achievement.
Variables related to teacher qualifications—teacher certification, a master’s degree, more than 1 year of teaching and the percentage of HQT status teachers in a building—are positively correlated to both mathematics and reading value-added results. The analysis did not use a composite index for these data variables representing teacher qualifications. As teacher experience and other qualification “proxy” measures have a positive relationship with student growth up to a point and then level off, we included a squared term (teacher experience squared) to correct for the nonlinear relationship (see Tabachnick & Fidell, 2001, p. 114). These teacher related findings are consistent with those of Darling-Hammond, Holtzman, Gatlin, and Heilig (2005) in their examination of certification and teacher effectiveness. They
found that, relative to teachers with standard certification, uncertified teachers and those in most other substandard certification categories generally had negative effects on student achievement, after controlling for student characteristics and prior achievement, as well as teacher experience and degrees. (p. 15)
We believe our findings extend this by identifying the correlation of the overall qualifications of the entire faculty with the success of the individual teachers as represented by grade-level value-added ratings. Simply having highly qualified teachers in tested areas may not be enough, which makes sense given that the students’ educational experience is shaped by all faculty teaching the range of subjects across the full school day.
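The squared experience term mentioned above can be illustrated with a minimal sketch. The data are synthetic and purely hypothetical (they are not the study's data), and `fit_quadratic` is an illustrative helper rather than the procedure used in the analysis. The sketch only shows that a negative coefficient on the squared term lets a fitted curve rise and then level off.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    # Design-matrix columns: intercept, experience, experience squared.
    rows = [(1.0, x, x * x) for x in xs]
    # Build X'X (3x3) and X'y (length 3).
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    # Solve the 3x3 system by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[i] + [xty[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]


# Hypothetical, synthetic data: growth rises with experience, then flattens.
years = list(range(0, 31))
growth = [0.10 + 0.04 * x - 0.001 * x * x for x in years]

b0, b1, b2 = fit_quadratic(years, growth)
peak_years = -b1 / (2 * b2)  # vertex of the fitted parabola
print(f"b0={b0:.3f}, b1={b1:.3f}, b2={b2:.4f}, levels off near {peak_years:.1f} years")
```

A model with only a linear term would force the same slope at every level of experience, which is why the squared term is a common correction when the relationship is expected to flatten.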
Building-level teacher attendance rate is significantly related to building-level mathematics and reading value-added scores, and the relationship is stronger for those buildings having lower value-added scores. Although it may be that regular teachers are more effective than the substitutes who replace them when they are absent, it is unlikely that this correlation is such a direct relationship. It is more likely, as discussed below, that low teacher attendance is an indicator of larger problems in the building.
Sensitivity to Type of Achievement Test
We also note that correlations changed for different types of tests given in different grade levels, leading us to suspect that the value-added calculations are more sensitive to test types than purported. The correlations between student body variables and Grade 6 reading value-added scores were positive (percentage of minority students, r2 = .04, p < .01 or less), whereas the correlations for Grades 5 and 8 were negative (percentage in poverty, r2 = .04, p < .001 or less). The Grade 6 standardized reading test for 2004-2005 was a proficiency test, whereas the Grade 5 and Grade 8 tests were achievement tests, so called because they had been redesigned to focus on higher order skills. At that time, Ohio was in the process of migrating from proficiency testing to achievement testing. Schools may have had time to teach to the test for the proficiency test. The difference in the direction of the correlations between those grades administering proficiency tests and those administering achievement tests indicates that the value-added scores may be more sensitive to the type of test than previously reported.
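Because r2 is unsigned, the direction of a relationship has to be read from r itself. The following minimal sketch, using hypothetical building-level values that are not drawn from the study, computes a Pearson correlation and reports both its square and its sign. The variable names are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical building-level values (not the study's data).
pct_poverty = [10, 25, 40, 55, 70, 85]
value_added = [1.2, 0.4, 0.9, -0.3, 0.5, -0.6]

r = pearson_r(pct_poverty, value_added)
r_squared = r * r  # share of variance explained; always non-negative
direction = "negative" if r < 0 else "positive"
print(f"r = {r:.2f}, r2 = {r_squared:.2f}, direction = {direction}")
```

Two variables with opposite relationships to value-added scores can therefore show the same r2, as in the Grade 6 versus Grade 5 and 8 comparison.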
A Clear Difference Between Mathematics and Reading
Many more statistically significant correlations of building-level factors with grade-level value-added scores were found for mathematics than for reading, and the mathematics correlations tend to be larger than those found for reading. Correlations of schoolwide faculty characteristics (such as the percentage of teachers certified) with value-added results increase in magnitude and frequency for both mathematics and reading, but much more so for mathematics. It is interesting to note that for mathematics, the percentage of significant correlations with teacher-related variables increases as the grade level increases; student-related significant factors disappear after Grade 6.
Correlations Increase in Number and Strength as Grades Increase
More relationships are identified as grade levels increase, and these relationships also become stronger. Figures 1 and 2 illustrate this fairly drastic change of relationships as grade levels increase. Figure 1 shows the relationship of facultywide certification with value-added results for each of the five quintiles of students, becoming much more positive and pronounced as grade levels increase. Correlations are almost at zero for all five quintiles at Grade 3, and correlations are between .2 and .3 for all five quintiles in Grade 8. Figure 2 shows a similar effect for a related data variable, the NEGATIVE relationship of value-added results with a LACK of highly qualified teachers on the faculty. Again, note the much stronger correlation at the higher grades. On the whole, the correlations become much stronger as one goes from third to eighth grade. Also worth noting is the fact that, although there is variation among quintiles, the pattern across the grade span is quite similar from Q1, the 20% of students with the lowest prior achievement, up through Q5, the 20% of students with the highest prior achievement. In addition to what we have shown here, we found that the pattern of more, stronger relationships in the upper grade levels also tends to be greater for buildings that have a larger number of students in the lowest two quintiles when students were grouped for analysis by their prior achievement results (Franco, 2006).

Figure 1. Mathematics ratings correlated with percentage of faculty certified, by quintiles (y-axis: R2; x-axis: grade).

Figure 2. Mathematics ratings for quintiles correlated with percentage of faculty NOT highly qualified (y-axis: R2; x-axis: grade).
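The quintile grouping described above (Q1 as the lowest 20% of students by prior achievement, Q5 as the highest 20%) can be sketched as follows. This is a minimal illustration with hypothetical student identifiers and scores. The rank-based rule and the `assign_quintiles` helper are assumptions for illustration, since the exact grouping procedure used by EVAAS is not specified here.

```python
def assign_quintiles(prior_scores):
    """Map each student id to a quintile 1..5 by rank of prior score (1 = lowest)."""
    ranked = sorted(prior_scores, key=prior_scores.get)
    n = len(ranked)
    # Integer arithmetic splits the ranked list into five near-equal groups.
    return {sid: (rank * 5) // n + 1 for rank, sid in enumerate(ranked)}


# Hypothetical prior-achievement scores for ten students.
scores = [410, 455, 390, 520, 480, 430, 505, 470, 445, 495]
prior = {f"s{i}": score for i, score in enumerate(scores)}

quintiles = assign_quintiles(prior)
q1 = sorted(sid for sid, q in quintiles.items() if q == 1)
q5 = sorted(sid for sid, q in quintiles.items() if q == 5)
print("Q1 (lowest prior achievement):", q1)
print("Q5 (highest prior achievement):", q5)
```

Correlations such as those plotted in Figures 1 and 2 would then be computed separately within each quintile and grade.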
Discussion
The nature of the data examined requires that interpretation of the findings proceed with caution; however, the study did accomplish its goal of identifying areas at the building level in need of closer study to be sure that one is interpreting value-added ratings appropriately for some schools. The fact that strong correlations were not found across the full range of grades and variables provides some assurance that, for most schools, the value-added model is adequately “capturing” the effects of exogenous variables in the grade-level ratings. The presence of more strong correlations than one would expect by chance indicates that for some schools there are building effects that warrant close scrutiny. Unfortunately, it is the schools that are farthest from the mean school effect used in EVAAS that are least represented by the weighted estimate in the model, and these are the schools most likely to be impacted by policy and reform efforts based on value-added scores. Our findings show that, at best, interpretation of value-added scores is difficult in such schools.
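The idea of finding "more correlations than one would expect by chance" can be made concrete with a minimal sketch. The counts below are hypothetical and are not the study's figures, and the calculation assumes the tests are independent, which correlated building-level variables would violate. Under those assumptions, the number of chance findings among m tests at level alpha follows a binomial distribution.

```python
from math import comb


def prob_at_least(k, m, alpha):
    """P(X >= k) for X ~ Binomial(m, alpha): chance of k or more false positives."""
    return sum(comb(m, i) * alpha**i * (1 - alpha) ** (m - i) for i in range(k, m + 1))


# Hypothetical counts for illustration only.
m_tests = 200              # number of correlations tested
alpha = 0.05               # significance threshold
observed_significant = 30  # correlations flagged as significant

expected_by_chance = m_tests * alpha
tail = prob_at_least(observed_significant, m_tests, alpha)
print(f"expected by chance: {expected_by_chance:.0f}")
print(f"P(at least {observed_significant} by chance) = {tail:.2e}")
```

When the observed count sits far above the expected count and the tail probability is very small, chance alone becomes an implausible explanation, though dependence among tests would weaken that conclusion.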
Another possible interpretation deals with residual error randomly associated with the value-added scores. However, it is highly unlikely that these patterns are random error. The findings resonate with existing literature about school and teacher effects. As the adage goes, where there is smoke, there is fire. The existence of these building/grade relationships with value-added ratings points to the existence of some institutional-level impact on grade-level growth in some schools that is persistent, outside the regular calculations of value-added impacts, and not trivial. The relationships indicated could come from several sources—for example, teacher and/or parent selection of schools, building choices about curriculum, hiring, required instructional practices, climate and culture issues—and it is impossible to know without more in-depth data just what real world situations these significantly correlated building-level variables represent. It is likely that cause and effect runs in multiple directions, and that these particular building variables are proxies, alone and in combination, for a range of policies and practices.
Implications of Numerous Strong Correlations
Even Sanders and colleagues have found residual correlations of socioeconomic factors with value-added ratings in some circumstances (Sanders, 2006), but at the same time they present compelling evidence that the EVAAS model used in Ohio results in far fewer and smaller correlations of building-level variables—what one might call “school-effect residuals”—than other Hierarchical Linear Model (HLM) and growth-model approaches. The pattern of relationships found in this study indicates that schools must have a sufficiently different “educational task” or building educational need to see building/grade correlations; that is, they must have a very high population of mobile or poor students, or they must have possible indicators of difficult building climate or lower faculty expertise (as indicated by attendance rates, certified teacher placements, etc.). Finally, it is important to note that this study found the patterns to be consistent across quintiles. This is hopeful for schools and resonates with other examinations of the EVAAS model. In relationship to the Tennessee Value-Added Assessment System (TVAAS) model, the following is documented:
For grades three through eight, the cumulative gains for schools across the entire state have been found to be unrelated to the racial composition of schools, the percentage of students receiving free and reduced-price lunches, or the mean achievement level of the school. These consistent findings have verified the contention that by allowing each student to serve as his or her own control… the inclusion of exogenous co-variables to ensure fairness in the estimates of system, school, and teacher effects is not necessary. Schools, systems, and teachers who do best under TVAAS are those who provide academic growth opportunities for students of all levels of prior academic attainment. (Sanders & Horn, 1998, p. 250)
It may be that Ohio has more schools that fall sufficiently outside the average schooling experience used in the EVAAS model than did Tennessee. This study did find a pattern of relationships that supports the contention that impacts are similar across all levels of prior academic achievement (the quintile analyses). We also agree that including additional adjustments via exogenous covariables in the value-added model would not adequately address the correlations found. First, adjusting for socioeconomic and buildingwide faculty variables at the institutional level would create differentiated expectations for certain groups of students in certain schools. Second, the positive and negative fluctuations in some of the correlations found indicate that it may be very difficult to decide the exact statistical adjustment or policy to apply; the underlying effects may be very idiosyncratic to each school in question. It is also possible that for these “atypical” schools the data are unstable, or that, given the pressures of NCLB and AYP expectations, each school is focusing limited resources in unique ways (more on this below). As Sanders et al. articulate:
The unfairness of these simple comparisons . . . has led some educational researchers to propose a methodology that would eliminate these biases by including a number of covariables into an analysis to adjust for socioeconomic differences. However, this approach immediately creates another huge problem. It is a hopeless impossibility for any school system to have all the data for each child in appropriate form to filter all of these confounding influences via these more traditional statistical approaches. (Sanders et al., 1997, p. 138)
Implications of Variables Persistent in Correlations
The significant correlations indicate that there are a number of interactions among school-level factors related to the socioeconomic status of the student body, teacher attendance, teacher placement, and student achievement which are not being fully accounted for in the value-added calculations for some school buildings. One interpretation that is supported by the extant research is that the relationships between these variables and student growth are indicators of the effects of institutional policies, practices, and resources (or lack thereof) where there is a large group of traditionally underserved students in the building (Dronkers & Robert, 2008). As schools cannot and should not refuse these students, it is reasonable to consider what these relationships may mean for policies and practices that will better impact educational progress.
The proliferation and strengthening of these correlations as grade levels increase may be an indication of some troubling trends that have been found elsewhere in school improvement research using value-added approaches. These schools may be making short-term choices that are negatively impacting students’ long-term progress. This has been found before using EVAAS results:
In the short run by restricting the focus to students perceived to be near proficient, while overlooking those who are very low or high achieving, this strategy (consciously or sub-consciously adopted) may result in increasing the percent proficient in the short term, but in the longer run may be a detriment . . . Not only will those students at the lower end of the achievement spectrum fall farther behind, but also the higher achieving students who consistently experience suppressed growth will profile closer to the proficient/nonproficient cut, decreasing their probability of demonstrating proficiency on a subsequent academic milestone. (Sanders, 2003)
The building-level faculty-related variables also show some interesting patterns that are consistent with the literature. Teacher preparation and professional development (represented in the study by HQT status, as professional development is required to be designated HQT) are related to value-added ratings. More importantly for this study, an adequate mass of teachers who are certified and/or HQT within the building is strongly correlated with grade-level results. The larger effect of certified, HQT, and master’s degree teachers at the higher grades may be the result of compounding the teacher effect with the more advanced and specialized curriculum options offered to students. It is likely that these variable correlations are also proxies for hiring and resource practices, as well as for some teacher-school selection. Other studies with the EVAAS model have shown that the discrepancies in value-added ratings related to student socioeconomic variables are not inevitable, however:
African American students and White students with the same level of prior achievement make comparable academic progress when they are assigned to teachers of comparable effectiveness. However, at least in the system studied, Black students were disproportionately assigned to the least effective teachers. Regardless of race, students who are assigned disproportionately to ineffective teachers will be severely academically handicapped relative to students with other teacher assignment patterns. (Sanders & Horn, 1998)
All of this may have implications for the effectiveness of “quick fix” approaches utilizing volunteer tutors or other amateur teachers, especially if there is no specialist overseeing the tutoring and/or amateur teaching process. It is also very possible that the lack of HQT or certified teachers is really an indicator of something else, such as a school with few resources (i.e., they can’t afford to keep highly qualified educators) or poor working conditions. Only the next stages of the longitudinal study will permit possibly untangling these variables. The current findings are very consistent with other recent research examining student progress and schools, as Wenglinsky summarizes:
Schools that lack a critical mass of active teachers may indeed not matter much; their students will be no less or more able to meet high academic standards than their talents and home resources will allow. But schools that do have a critical mass of active teachers can actually provide a value-added; they can help their students reach higher levels of academic performance than those students otherwise would reach. Through their teachers, then, schools can be the key mechanisms for helping students meet high standards. (Wenglinsky, 2002)
Nye, Konstantopoulos, and Hedges (2004) also noted that teacher experience effect was not negligible, with “magnitudes of the estimated positive effects . . . ranging from 0.06 to 0.19 standard deviations, or from about one half to slightly less than two times the small class effect on achievement gains found in previous analyses of these data” (p. 249, emphasis added). Zvoch and Stevens (2006) found that teacher education related more strongly to growth than status results, so the findings may echo this, assuming that certification is confirmed to be an adequate proxy measure for preparation in the state of Ohio. As Darling-Hammond et al. (2005) note,
certification is, of course, only a proxy for the real variables of interest that pertain to teachers’ knowledge and skills. These include knowledge of the subject matter content to be taught and knowledge of how to teach that content to a wide range of learners, as well as the ability to manage a classroom, design and implement instruction, and work skillfully with students, parents, and other professionals. (p. 23)
With regard to other variables having residual correlations with grade-level ratings, it may be that poor faculty attendance is representative of a poor climate and/or poor attitude and a lower level of commitment to the students, which negatively impacts student performance, or it may be a reflection of poorly designed school attendance policies. Research unanimously supports the contention that school climate affects job satisfaction on the part of staff personnel (Hoy & Miskel, 1996; Patrick, 1995; Taylor & Tashakkori, 1994). When job satisfaction is positive, staff personnel are motivated toward serving the organization and goal achievement. Such an attitude leads to improved attendance. Poor faculty attendance may also be a proxy measure for the academic expectations in the school for the faculty and students. This effect overlaps with high-poverty school status as well. Goddard, Sweetland, and Hoy (2000) are very specific about both aspects:
Our multilevel analysis demonstrates that a 1-unit increase in a school’s academic emphasis score is associated with a 16.53-point average gain in student mathematics achievement and an 11.39-point average gain in reading achievement. In other words, an increase in academic emphasis of 1 standard deviation is associated with a gain of nearly 40% of a standard deviation in student achievement in mathematics and more than one third of a standard deviation in reading achievement. The magnitude of the effect of academic emphasis in comparison with that of the within-school student-level variables is also noteworthy. For example, although students receiving a free or reduced-price lunch scored on average 2.41 points below their schools’ mean reading scores . . . the school means averaged 11.39 points higher where there was a strong academic emphasis. (Goddard et al., 2000, p. 698)
Still others (e.g., Dworkin et al., 1990; Norton, 1998) have hypothesized that faculty absence is a reflection of an unpleasant building climate, which also has a negative effect on students. Related to climate, stress, and leadership, Dworkin et al. (1990) found a low but statistically significant relationship between job stress and reported stress-induced illness among urban public school teachers. A second hypothesis of that study, that stress-induced illness is lower among teachers assigned to schools where the principal is seen as supportive, was also supported by a test of significance.
Implications of Differences Between Mathematics and Reading and Lower and Higher Grades
Some have contended that it is to be expected that one would find stronger value-added ratings for mathematics than for reading because value-added results are designed to identify the portion of student progress that is attributable to the school. Mathematics is usually only taught in schools, but students learn some reading skills outside the school (for example, from parents). Nye et al. (2004) suggest:
It is also interesting that across all grades, the variance of the teacher effects in mathematics is much larger than that in reading. In fact, in Grades 1 to 3 the variance in mathematics is nearly twice as large. This may be because mathematics is mostly learned in school and thus may be more directly influenced by teachers, or that there is more variation in how (or how well or how much) teachers teach mathematics. Reading, on the other hand, is more likely to be learned (in part) outside of school and thus the influence of school and teacher on reading is smaller, or there is less variation in how (or how well or how much) reading is taught in school. (p. 247)
This difference between the content areas is reflected in the strength of the value-added results. The correlations of these results with building variables as found in this study may be an indication of greater variation in the “active” involvement of the school-level policies and resources with mathematics teaching than with reading. Factors warranting further examination related to the discrepancies between the content areas include the curriculum expectations and offerings in the schools as well as the instructional expectations. In Ohio, for example, many schools have implemented Reading First (see http://www.readingfirstohio.org/page/districts) curriculum but there is no similar statewide curriculum preference for mathematics. Furthermore, mathematics offerings greatly increase in number and variety around the sixth-grade level (sixth-grade Math, pre-Algebra, Algebra), whereas explicit reading instruction tends to disappear or to be incorporated into more comprehensive language arts coursework. The fact, however, is that all students are NOT proficient in reading (see Lee, 2006, comparing NAEP and NCLB results nationwide, as well as the Ohio specific data at http://www.ode.state.oh.us), so common sense might tell us that a drop in reading effects in the higher grades means that less is being done to support student reading achievement in those grades than is needed.
Finally, it is possible to interpret the proliferation of variables correlating with value-added results in higher grades as a good thing. Although it is conventional wisdom that the work of closing achievement gaps should focus first on early grades, “catching up” students who are at risk early, the patterns found in this study might be interpreted to mean that focus on maintaining support for early achievers (rather than focusing on “bubble kids”) and continuing efforts into higher grades may all have a strong impact. It may even be that student achievement results are more responsive in higher grade levels to building-level systemic choices. A recent study of Ohio’s “turn around” schools, those that moved from high to low state ratings or from low to high in value-added ratings in a short period, presents some evidence that building-level efforts can make an impact:
. . . findings suggest that many of the schools with positive turnaround incidents were working within the model of three types of building-wide activities needed to coherently manage external and internal demands. On the other hand, many of the schools with negative turnaround incidents pointed to one or two factors only, and the most significant explanatory factor was a change in personnel. Among schools with positive turnarounds . . . principals reported to be not merely implementing changes, but to be striving toward constructing coherency or consistency within an infrastructure that, to varying degrees, managed the intersection of internal and external demands. These actions were largely missing from the principal interviews of schools with negative turnarounds. If ‘noise’ in the measurement was the primary reason for the observed categorical changes in school quality, then there would likely be no difference in the quantity and quality of school factors observed across both groups of schools studied. (BFK, 2006)
Next Steps and Further Research
The correlations of school-level variables that potentially affect grade-level teacher work as represented by value-added ratings identified in this exploratory study provide focus for follow-up investigations into school building support for reading and mathematics achievement related to the study of teacher preparation. These areas may also be potentially fruitful for school leaders and policy makers who desire to close achievement gaps and improve educational results for all children.
The larger relationships in the upper grades may mean that placing properly trained and experienced teachers in the higher grades could contribute to closing the achievement gap that currently exists. This could be tracked over the life of a longitudinal study. It may also be that curricular options given to seventh- and eighth-grade students are related to the relationship patterns found. The choices could be affected at the building level (i.e., what is being offered) as well as at the student level (what is requested). It is also possible that reading value-added ratings will always drop after fourth grade as long as reading is not taught as a skill in addition to being applied across the curriculum. Future findings related to these correlations may well have implications for school resources, curriculum planning, and different alignment of tests used for accountability (as there may be a disconnect between the test measurements and the actual applications of reading in later grades).
By looking across these multiple dependent variable measures and considering teacher performance over time in relationship to the aspects of teacher preparation, we will be better able to fit a model that controls for unobservables in the evidence chain, from incoming teacher education student, through teacher preparation, to school context and early career impact. By considering new teacher preparation along with early career support and building-level variables, we hope to be able to identify some promising policies and practices that will allow schools to more positively impact both mathematics and reading achievement.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
