Abstract
Achievement outcomes for students taught by recent program completers of Louisiana’s teacher preparation programs (TPPs) are examined using hierarchical linear modeling of State student achievement data in English language arts, reading, mathematics, science, and social studies. The current year’s achievement in each content area is predicted using previous achievement data, student characteristics, classroom characteristics (e.g., percentage of students with disabilities), school characteristics, and attendance of teachers and students. The contribution of a teacher having recently completed a specific TPP is modeled at the classroom level as an indicator variable for each TPP. Results for programs with 25 or more new teachers are reported. Results demonstrate substantial overlap in confidence intervals (CI) among programs. In some instances, 68% and/or 95% CI for programs in specific content areas did not overlap results for the average new teacher or experienced teachers (i.e., they were lower than average new teachers or higher than average experienced certified teachers). Results varied across content areas for some programs.
The extent to which students are prepared for postsecondary education and/or to effectively participate in the global economy after completing their primary education is linked to the quality and quantity of educational experiences they have while they are in school, most of which may be mediated by their teachers. When students leave school underprepared for postsecondary study or for the workplace, educators and schools have commonly been indicted as having failed those students (e.g., Anderson, 2011; DeWeese, 2007; Postal & Roth, 2011). Although data clarifying the direct impact of teacher preparation on student achievement have been limited (Koretz, 2002), some have cited the underpreparation of new teachers who may not be effective at the point they enter the profession as an important contributor to poor achievement (Boyd, Grossman, Lankford, Loeb, & Wyckoff, 2009; Sawchuk, 2010). This argument and others have led to concerns regarding new teachers’ readiness for the workforce, to calls for improving teacher preparation, and more recently, to interest in examining the achievement of students who are taught by new teachers who enter the profession through different programs or pathways (Boyd et al., 2009).
Although there is a clear need and desire for teachers and schools to help students overcome challenges to their success, the extent to which this is possible as well as the specific teacher characteristics and practices that might contribute to this end remain an active area of professional discourse and inquiry (Cochran-Smith & Zeichner, 2005a, 2005b; Darling-Hammond & Bransford, 2005; Rice, 2003; Wilson, Floden, & Ferrini-Mundy, 2001). In addition, policies such as the No Child Left Behind Act (NCLB; 2001) and the Individuals With Disabilities Education Improvement Act (IDEA; 2004) that place considerable emphasis on test-based accountability increase pressure on educators to identify factors contributing to teacher effectiveness. Teacher preparation programs (TPPs) are an obvious potential source of variability in teacher effectiveness; however, their impact is poorly understood, and their effect on student achievement in particular has received limited attention (Boyd et al., 2009; Cochran-Smith & Zeichner, 2005b). Part of the challenge in studying the impact of teacher preparation is that it is a complex enterprise that includes many potentially operative elements. For example, TPPs are engaged to varying degrees in recruiting potential teachers, selecting who will be trained, providing new educators content knowledge, transmitting professional knowledge, conveying professional values, teaching new educators professional skills, and selecting who has shown enough promise in preparation to be recommended for licensure. Despite the intuitive appeal that these activities should matter either in isolation or in how they are aggregated within programs, limited evidence exists that TPPs produce different student achievement results (Duckworth, Quinn, & Seligman, 2009; Goldhaber & Brewer, 1997).
Historically, one of the most important barriers to studying the impact of TPPs, in this case indicating the ensemble of activities above, is that data systems did not exist that made the necessary connections to complete these kinds of analysis. At a minimal level, data are needed that link students to themselves to provide longitudinal achievement histories, students to teachers, and new teachers to the programs that prepared them. In addition, other key data about students may be needed such as attendance data and/or disability status. Until fairly recently, data systems providing this web of longitudinal linkages have not been available. As these data become available, they provide researchers, teacher educators, and policy makers the opportunity to examine a dimension of TPP effects that it has not been possible to study previously: the extent to which programs, pathways, or practices in teacher education influence student outcomes as measured by state-administered standardized tests. In addition, TPPs prepare educators to work in content areas that are not tested, prepare educators who contribute to outcomes that are not assessed by state-administered standardized tests, and may contribute to other outcomes such as teacher persistence within the profession.
In addition to the challenges associated with data availability, examining whether there is variability across TPPs in the achievement of students taught by their graduates presents consequential methodological challenges (Goldhaber & Brewer, 1997). Interpreting scores from annually administered achievement tests as reflecting the quality of teachers’ work is not practical because current year achievement may primarily reflect prior achievement as well as intervening influences such as disability, attendance, or poverty (Anderman, Anderman, Yough, & Gimbert, 2010; Ballou, Sanders, & Wright, 2004; Hershberg, Simon, & Lea-Kruger, 2004; McCaffrey, Lockwood, Koretz, & Hamilton, 2003). In response to these challenges, value-added analysis methods have been developed to examine current educational inputs such as teachers or educational services after controlling for prior achievement and other key variables (Anderman et al., 2010; Lasley, Siedentop, & Yinger, 2006). Research using value-added methods for measuring teacher performance has grown dramatically (Papay, 2010), and these methods are increasingly used in the development of public policy (Heitin, 2011; McNeil, 2009; Olson, 2007).
The use and misuse of value-added assessments in evaluating the work of individual teachers is the source of an active professional and policy debate (Baker et al., 2010; Glazerman et al., 2010; Hanson, 1988; Harris, 2009; Raudenbush, 2004; Tekwe et al., 2004). Although space limitations preclude an inclusive review of the arguments supporting and challenging the use of value-added assessment to evaluate teacher effectiveness, a few critical points warrant review. Concerns have been raised regarding the precision and error rates in differentiating between teachers; variation across statistical models; the narrow focus of test-based, value-added assessments; narrowing of the curriculum; the impact of unmeasured school/community factors; and teacher demoralization (Baker et al., 2010; McCaffrey et al., 2003). Other authors have argued that the issue is not value-added per se, but how the information is used. They suggest that value added provides additional outcomes that have not been available, that design features/policy can be used to provide mitigation against errors, and that the use of value-added data can improve current teacher evaluation practices (Glazerman et al., 2010). Although some of these concerns may be equally relevant to assessing TPPs (e.g., unmeasured variables), it is important to note that examining TPPs differs from teacher assessment in at least two critical ways. First, it is based on the assessment of multiple and potentially many teachers, and second, those teachers can be observed across many schools and potentially many school years. The availability of data across teachers and contexts provides some opportunities for mitigation against some of the concerns regarding value-added assessment in schools.
Evaluating Teacher Preparation in Louisiana
Beginning in 1999-2000, Louisiana’s Blue Ribbon Commission for Teacher Quality identified 60 recommendations to recruit, select, prepare, and support quality teachers. The Louisiana Board of Regents (BoR) approved new policies (see Noell & Burns, 2006, for further information) that required all universities to redesign their TPPs, and the Board of Elementary and Secondary Education (BESE) approved new policies that created greater rigor for teachers to become certified to teach in Louisiana. Universities were required to create teams involving college of education, college of arts/sciences, school/district, and other representatives who were responsible for creating redesigned programs that contained more rigorous content courses that were aligned with state/national content standards, more rigorous pedagogy courses that were aligned with state and national teacher standards, and more clinical experiences that allowed teacher candidates to practice skills earlier in their programs and as a result, over a longer period of time. Concurrent with their recommendations for improving teacher quality, the Blue Ribbon Commission recommended that a Teacher Preparation Accountability System be implemented that examined (a) institutional performance (i.e., Praxis passage rates, new teacher satisfaction ratings), (b) quantity (i.e., quantity of program completers and quantity of program completers in teacher shortage areas), and (c) achievement of students taught by TPPs’ new teachers. Although evaluation of TPPs in terms of the first two items had been ongoing, a data-based system for linking student achievement to teachers and new teachers to TPPs had not previously existed in Louisiana. As the professionals engaged in this study examined methods to link student achievement and teacher preparation, interest in the viability of value-added analyses emerged and was reinforced by the emergence of data systems that would support this type of analysis. 
Around the year 2003, interest in value-added methods as an additional source of evaluative information regarding TPPs in Louisiana emerged as a result of the State’s ongoing work to strengthen teacher preparation.
The initial issue examined within the value-added analyses in Louisiana was whether there was sufficient variability in teacher effectiveness between TPPs after providing extensive controls for student history, classroom composition, and school composition to make its use desirable in the evaluation of TPPs. In addition, propensity matching studies were undertaken as one test to examine the possibility of bias emerging based on the classes and schools new teachers served (Noell, Porter, & Patt, 2007; Noell, Porter, Patt, & Dahir, 2008). Analyses finding no or very limited variability across TPPs would suggest that focusing analytic and policy work on the quality of teacher preparation would have limited merit. The value-added work within Louisiana also provided a context for examining variability in achievement patterns across content areas within TPP. Whether a TPP’s effectiveness estimates differed among content areas or whether the effects were generalized to the program could provide information that would assist in evaluation of programs. The intent of the work was to examine whether differences could be demonstrated between TPPs that would then set the occasion for subsequent work to examine the potential differential impact of the numerous features around selection, preparation, and endorsement for certification. The remainder of this article describes the procedures that were employed under the sponsorship of the Louisiana BoR to examine the relationship between TPP completion and student achievement outcomes.
Method
Participants
The data sets describing students, teachers, classes, and schools were merged using procedures described in the following sections. Standardized achievement test data for all students in the state who took the assessments were in the original data set, after which eligibility for the analysis was determined. Based on prior research examining the ratio of variability within programs to variability between programs, a minimum threshold of 25 teacher-by-year results was required for results to be reported for a TPP (see Noell et al., 2007). A combination of 3 years of data was used to develop the estimates reported here. For example, a TPP might have had only 10 new teachers (i.e., in their first or second year of teaching) who were teaching in tested grades in a tested subject in their areas of certification in a single year. However, each year, achievement data, new teacher data, classroom data, and school data were merged to determine how many new teachers that year could contribute to the TPP’s effect estimate in the overall analysis. When a TPP’s total number of new teachers in a given content area over 3 years of data was 25 or more, that TPP was included in the analysis.
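The reporting threshold can be illustrated with a minimal Python sketch. The record layout, program names, years, and teacher identifiers below are hypothetical and serve only to show how teacher-by-year results accumulate across 3 years; the sketch is not the procedure used in the original analyses.

```python
from collections import Counter

MIN_TEACHER_YEARS = 25  # reporting threshold stated in the text


def reportable_programs(records, min_results=MIN_TEACHER_YEARS):
    """Return the (program, content_area) pairs that meet the threshold.

    `records` is an iterable of (program, content_area, year, teacher_id)
    tuples, one per new teacher per year (a hypothetical layout).
    """
    # A set removes accidental duplicates of the same teacher-year result.
    unique = set(records)
    counts = Counter((prog, area) for prog, area, _year, _tid in unique)
    return {key for key, n in counts.items() if n >= min_results}


# Made-up data: "TPP A" has 10 new teachers in each of 3 years (30 results),
# whereas "TPP B" has 10 new teachers in a single year only.
records = [("TPP A", "math", 2006 + y, f"a{y}_{i}")
           for y in range(3) for i in range(10)]
records += [("TPP B", "math", 2008, f"b{i}") for i in range(10)]

eligible = reportable_programs(records)
```

In this illustration, a teacher observed in both the first and second year of teaching would contribute two teacher-by-year results, which is one reading of the counting rule described above.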
Eligibility for analysis
Each year’s records were limited to students who completed at least one content area assessment in Grades 4 through 9. The state collects achievement data in Grades 3 through 9, and this allowed for the use of 1 year of prior achievement scores for each student in each content area. In state databases, teachers are linked to their students at only the beginning of the academic year each year. As a result, it was necessary to determine which students were taught by their beginning of the year teachers for the bulk of the year. Students and teachers linked in the analyses were enrolled or teaching at the same school, based on employment (teacher attendance) or student attendance records from September 15 to March 15. The Louisiana school year begins in mid- to late August, and preliminary analyses revealed significant shifting in enrollment records during August, leading to the choice of September 15. For the years reported herein, standardized assessment was typically conducted in early to mid-April.
It was also necessary for a student to have been promoted 1 grade ahead to be included in the analyses. The meaning of achievement scores is clearly different for a student who took the same test 2 years in a row versus one who took two different tests over the same 2 years. Following all of the linking and application of eligibility rules, almost 80% of the achievement records with which the analyses began remained in the database. Students were excluded predominantly due to grade retention and changing schools, which prevented linking them to teachers. For each content area, between 162,500 and 237,000 students contributed to the analyses per year. These students were taught by between 5,100 and 7,300 teachers, in 1,050 to 1,250 schools. Table 1 shows how two representative databases were attenuated as a result of cleaning and matching.
Table 1. Cases Available for 2008 Analysis in Mathematics and Science
Note: The percentage in parentheses within each cell is the percentage of the total records available for analysis in that content area at that stage of database construction.
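The eligibility rules described above (same school from September 15 to March 15, and promotion by 1 grade) can be sketched as follows. This is a simplified illustration under stated assumptions: the function names and the representation of enrollment or employment as (school, first day, last day) spells are hypothetical, and a single continuous spell is assumed to cover the window.

```python
from datetime import date


def window_for(school_year_start):
    """Return the Sept 15 to Mar 15 window for a school year that
    begins in the given calendar year."""
    return date(school_year_start, 9, 15), date(school_year_start + 1, 3, 15)


def covers(spells, school_id, start, end):
    """True if one spell at `school_id` spans the whole window.

    `spells` is a list of (school_id, first_day, last_day) tuples
    (a hypothetical layout for attendance or employment records).
    """
    return any(s == school_id and first <= start and last >= end
               for s, first, last in spells)


def eligible(student_spells, teacher_spells, school_id, school_year_start,
             prior_grade, current_grade):
    """Apply the linking window and the promotion rule to one
    student-teacher pair."""
    start, end = window_for(school_year_start)
    promoted = current_grade == prior_grade + 1
    return (promoted
            and covers(student_spells, school_id, start, end)
            and covers(teacher_spells, school_id, start, end))
```

A student who enrolled after September 15, or who was retained in grade, would return `False` under this sketch.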
Research Design
Given the nature of the study question and data, an experimental design was not possible. This study examined the degree to which coefficients for recent program completers from specific TPPs varied across institutions and content domains. The data were fit within a hierarchical linear model (HLM; Raudenbush & Bryk, 2002); students were nested within teachers who were nested within schools (illustrated within Figure 1). Predictor equations were developed for the student, teacher, and school levels of the model based on procedures described in the appendix.

Figure 1. Nesting structure of students within teachers/classrooms and teachers/classrooms within schools
Assignment of teacher level predictors
Teachers were designated as members of specific teacher groups depending on several factors. First, teacher experience data and certification data were used to designate teachers with less than 2 years teaching experience as new teachers as long as they possessed full certification status and were certified in the relevant content area. Based on prior work looking at gains in teaching efficacy over time, for purposes of these analyses, new teachers were defined as teachers in their first 2 years teaching post program completion (Noell et al., 2007; Noell et al., 2008). In addition, teachers had to be teaching within 5 years of completing their TPP to be included.
New teachers were coded as a completer of a specific TPP based on the Title II report data that TPPs submit to the BoR. For these analyses, programs within institutions are differentiated. For example, some universities have an undergraduate and a practitioner pathway to certification. TPPs in Louisiana consist of (a) Undergraduate Programs, (b) Practitioner Teacher Programs, (c) Master’s Degree Programs, and (d) Non-Master’s Certification-Only Programs (described in the following). The TPPs represented in the analysis are listed according to their program and institution in Table 3.
All Louisiana pathways to teacher certification require candidates to address the same content and teacher standards, albeit in differing structures. They offer courses in the following major areas: knowledge of the learner and learning environment, methodology, and internship/student teaching. All programs or pathways also require completers to pass the same Praxis examinations to become certified to teach. The undergraduate pathway to teaching requires a minimum of 180 hr of direct teaching experience in clinical settings prior to student teaching and a minimum of 270 clock hours in student teaching with at least 180 hr spent actually teaching. Credit hours must be earned at a regionally accredited college or university. To be admitted to alternate pathways (i.e., Practitioner Teacher Program, Master of Arts in Teaching, and Certification-Only Program), candidates are required to pass all Praxis I Basic Skills examinations (or an equivalent), pass Praxis II Content examinations, and possess a baccalaureate degree from a regionally accredited university. The Practitioner Program is a fast-track program that can be completed in 1 year after completing 21 to 33 credit hours. The program begins with an intensive summer preparation program followed by acting as the teacher of record as candidates complete their training. There is no degree awarded at its completion. The Master of Arts in Teaching often requires candidates to meet the requirements for admission into a graduate program at that university and consists of 33 to 39 credit hours. The Certification-Only Program requires 27 to 33 credit hours for certification as a teacher and does not result in a degree. Most Master’s and Certification-Only Programs permit candidates to serve as teachers of record at schools while being paid as full-time teachers and meeting the requirements for their internships under the supervision of school and university personnel.
Teachers who had full certification status in a content area and had been teaching more than 2 years were designated as experienced, certified teachers. This code was omitted, serving as the reference group. Practitioner teachers were teachers of record who had not yet completed their TPP and were not yet certified to teach. Teachers who either had no certification or were certified but teaching out of their certification area were designated as not certified.
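The group assignments described in this section can be summarized as a small decision function. The sketch below is one reading of the rules, not the coding scheme used in the original analyses: the argument names are hypothetical, the first 2 years of teaching are treated as `year_of_teaching <= 2`, "within 5 years" is treated as `years_since_completion <= 5`, and cases the text does not address return `None`.

```python
def teacher_group(fully_certified, certified_in_area, year_of_teaching,
                  years_since_completion=None, in_practitioner_program=False):
    """Assign a teacher to one of the groups described in the text.

    year_of_teaching: 1 for the first year of teaching, 2 for the second, ...
    years_since_completion: years since TPP completion, if known.
    """
    # Teachers of record still completing their program.
    if in_practitioner_program:
        return "practitioner"
    # No certification, or certified but teaching out of area.
    if not (fully_certified and certified_in_area):
        return "not_certified"
    # Reference group (omitted indicator in the models).
    if year_of_teaching > 2:
        return "experienced_certified"
    # New teachers must also be within 5 years of program completion.
    if years_since_completion is not None and years_since_completion <= 5:
        return "new_teacher"
    # The text does not say how remaining cases were coded.
    return None
```

In the analyses, each new teacher would additionally carry an indicator for the specific TPP completed; that step is omitted here.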
Student achievement
Test databases included scores for the iLEAP and the LEAP in five content areas: English-language arts (ELA), reading, mathematics, science, and social studies. The iLEAP is a version of the Iowa Tests of Basic Skills (ITBS) that has been modified to include Louisiana curriculum-specific items. The LEAP is the Louisiana Educational Assessment Program, and is used for high-stakes testing in Grades 4 and 8. All other included grades take the iLEAP. The iLEAP and LEAP were reviewed against content standards by a committee of experts who confirmed alignment with state standards and content validity. Cronbach’s alpha was used to assess reliability for all grades and content areas and was described as excellent (iLEAP α ranging from .81 to .95; Louisiana Department of Education, 2008a; LEAP α ranging from .86 to .92; Louisiana Department of Education, 2008b). Scores for all students who took the tests at the spring administration were included in the databases and raw scores were converted to z scores by year, grade, and content area to create consistent scaling across years, grades, and content areas.
For purposes of reporting results, the outcome variable was scaled up to a standard deviation of 50. This standard deviation was chosen because it is the most common approximate standard deviation of scaled scores across content areas, grades, and years; it places the TPP outcomes in a metric that is familiar to educators in Louisiana. In all analyses, experienced certified teachers were the reference (omitted indicator) group, so that results for TPPs are relative to experienced certified teachers.
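The standardization and rescaling steps can be sketched in a few lines of Python. The record layout is hypothetical, and the use of the population standard deviation (`pstdev`) is an assumption, because the text does not state which estimator was used. Each group is also assumed to contain some variation in raw scores.

```python
from collections import defaultdict
from statistics import mean, pstdev

REPORTING_SD = 50  # reporting metric described in the text


def standardize(rows):
    """Add a z score and a rescaled score to each record.

    `rows` is a list of dicts with keys "year", "grade", "area", and "raw"
    (a hypothetical layout). Scores are standardized within each
    year-by-grade-by-content-area group.
    """
    groups = defaultdict(list)
    for r in rows:
        groups[(r["year"], r["grade"], r["area"])].append(r["raw"])
    stats = {key: (mean(vals), pstdev(vals)) for key, vals in groups.items()}

    out = []
    for r in rows:
        m, sd = stats[(r["year"], r["grade"], r["area"])]
        z = (r["raw"] - m) / sd
        out.append({**r, "z": z, "scaled": z * REPORTING_SD})
    return out


# Made-up raw scores for two groups on different raw scales.
rows = [
    {"year": 2008, "grade": 4, "area": "math", "raw": 280},
    {"year": 2008, "grade": 4, "area": "math", "raw": 300},
    {"year": 2008, "grade": 4, "area": "math", "raw": 320},
    {"year": 2008, "grade": 5, "area": "math", "raw": 10},
    {"year": 2008, "grade": 5, "area": "math", "raw": 20},
]
scored = standardize(rows)
```

Adding a constant to the rescaled outcome (for example, to give it a mean of 300) would shift the model intercept but would not change coefficients for teacher group indicators.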
Student demographic data
State school records for students’ gender, race, special education status, attendance, and free lunch status were included in the analysis. All student variables are listed in Table 2.
Table 2. Student, Teacher, and School-Level Variables
Course and class data
Course codes were reviewed and categorized based on their primary content. For example, seventh-grade English was categorized as ELA and algebra was categorized as mathematics. Courses such as jazz band did not receive a content category and were not included in the analyses as these courses are not aligned to the five areas assessed by the standardized assessments.
In addition, summary demographic data regarding teachers’ class composition were included in the database. For example, the percentage of their class receiving special education services and class prior year mean achievement in all content areas were included in the analyses. The number of absences each teacher had during the year was included as well. These variables are listed in the second column of Table 2.
School measures
Summary school demographic data such as mean prior year achievement in each content area and the percentage of students receiving free or reduced price lunch were included in the models. These variables are listed in the third column of Table 2.
Procedures
Linking databases
Multiple distinct annual data systems were linked connecting new teachers to TPP, teachers to students, teachers to their descriptive data, test data to demographic data, students to themselves across years, and students to teachers within years. Due to the number of data sources, years, and the complexity of the merge process, substantive collaboration was necessary with the Louisiana Department of Education and the Louisiana BoR.
Once teachers, students, courses, and schools were linked, each content area database was checked for students who had multiple teachers in a subject. If one student had two different mathematics courses taught by two different teachers, that student contributed to the estimated classroom result at one half the weight of students for whom that was their only course in mathematics. This convention was also used for situations in which two teachers were instructors of record for one course due to a team-teaching situation.
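The weighting convention can be illustrated with a short Python sketch. The text specifies only the two-teacher case (a weight of one half); extending it to a weight of 1/n for n teachers is an assumption made here for illustration, and the identifiers are hypothetical.

```python
from collections import Counter


def classroom_weights(links):
    """Return a weight for each (student_id, teacher_id) link.

    `links` lists student-teacher pairs within one content area
    (a hypothetical layout). A student linked to n distinct teachers
    contributes 1/n to each teacher's classroom result.
    """
    unique = set(links)  # ignore repeated links to the same teacher
    teachers_per_student = Counter(student for student, _teacher in unique)
    return {(s, t): 1.0 / teachers_per_student[s] for s, t in unique}


# Made-up example: student s1 has two mathematics teachers, s2 has one.
weights = classroom_weights([("s1", "tA"), ("s1", "tB"), ("s2", "tA")])
```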
Analytic models
The analysis is based on HLM (McCulloch & Searle, 2001; Raudenbush & Bryk, 2002) using achievement scores from the State’s mandated testing programs in Grades 3 through 9; demographic data for the students, teachers, and schools; attendance data for teachers and students; teachers’ certification information; and the TPP they completed. The result estimates the degree to which students taught by new teachers from specific TPPs achieve more or less than would be predicted based on an extensive set of student, class, and school predictors in ELA, reading, mathematics, science, and social studies. Additional technical information concerning the analytic procedures may be found in the appendix.
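For readers unfamiliar with this class of model, a generic three-level random-intercept specification is sketched below. It is an illustration in notation loosely following Raudenbush and Bryk (2002), not the exact specification used in these analyses, which is described in the appendix. The symbols are assumptions introduced here: $Y_{ijk}$ is the achievement of student $i$ taught by teacher $j$ in school $k$; $X$, $W$, and $Z$ are student, classroom/teacher, and school predictors; and $\mathrm{TPP}_{mjk}$ indicates that teacher $j$ is a recent completer of program $m$.

```latex
\begin{aligned}
\text{Level 1 (students):}\quad
  Y_{ijk} &= \pi_{0jk} + \sum_{p} \pi_{p} X_{pijk} + e_{ijk} \\
\text{Level 2 (teachers):}\quad
  \pi_{0jk} &= \beta_{00k} + \sum_{q} \beta_{q} W_{qjk}
             + \sum_{m} \delta_{m}\,\mathrm{TPP}_{mjk} + r_{0jk} \\
\text{Level 3 (schools):}\quad
  \beta_{00k} &= \gamma_{000} + \sum_{s} \gamma_{s} Z_{sk} + u_{00k}
\end{aligned}
```

Under this sketch, each $\delta_{m}$ is the TPP coefficient of interest. Because experienced certified teachers are the omitted indicator group, $\delta_{m}$ is interpreted relative to that group.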
Results
There were sufficient data to provide estimates for between 7 and 10 programs per content area. The number of programs for which results are available was reduced due to a historical artifact resulting from the State’s redesign of TPPs (see Noell & Burns, 2006, for a description). This analysis was completed for a period during which the State was transitioning from its original programs to redesigned programs. The decision was made by the BoR to provide data only for currently operational programs (i.e., post-redesign), which meant that not all programs had yet completed the transition and were producing enough graduates from their post-redesign programs to be included in the analysis.
The transition from the original programs to the redesigned programs created another issue. Within each content area, six to eight programs were alternate certification programs and only one or two were undergraduate programs. This occurred because alternate programs generally went through the redesign process first and required less time to complete. The imbalance between undergraduate and alternate programs in these data is a result of the reality that it took longer to transition undergraduate programs and longer to complete those programs. Data from a period further removed from program redesign should not evidence this imbalance.
For all data discussed, the results represent student achievement in Grades 4 through 9. For each content area, a mean new teacher effect estimate was calculated. Data are reported herein using the same 68% confidence intervals (CI) that were included in the original reports to the Louisiana BoR. The 68% CI was adopted in that work for several reasons. First, the estimate obtained for each institution was based on the population of all program completers in the tested grades and subjects. As a result, typical uncertainty deriving from sampling issues was attenuated. Second, because the intent of the analyses was to develop data for program improvement (formative data), a more aggressive CI was adopted because not acting when improvement was needed was weighed as a more serious error than acting somewhat too often. Third, the measurement was repeated annually (repeated observation), providing an opportunity for any aberrant results to be identified and interpretations corrected. Fourth, prior experience with these measures over annual reports demonstrated a degree of stability that exceeded the level that would be expected based on even the 68% CI.
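The practical difference between the 68% and 95% intervals can be seen in a small Python sketch. It assumes a normal approximation (so the 68% CI is roughly the estimate plus or minus 1 standard error) and treats the comparison value as a fixed point. The function names and the example estimate and standard error are made up for illustration and are not results from this study.

```python
from statistics import NormalDist


def confidence_interval(estimate, std_error, level=0.68):
    """Two-sided normal-approximation CI at the given level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


def excludes(interval, reference):
    """True if the reference value lies outside the interval."""
    low, high = interval
    return reference < low or reference > high


# Hypothetical program coefficient of 3.0 points with a standard error of 2.0.
ci68 = confidence_interval(3.0, 2.0, level=0.68)
ci95 = confidence_interval(3.0, 2.0, level=0.95)

# Compare against the experienced certified teacher referent (the 0 point).
flag_68 = excludes(ci68, 0.0)
flag_95 = excludes(ci95, 0.0)
```

With these made-up values, the narrower 68% interval excludes 0 whereas the 95% interval does not, which illustrates why the choice of interval changes which programs are identified as distinct from a referent.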
Effects are reported relative to average student achievement for each content area for experienced certified teachers, which is represented by the x axis, or 0 point. In other words, if a program’s effect estimate in mathematics was 1.0, this would indicate that students taught by new teachers from that program scored 1 point higher than would be predicted for those students based on prior achievement, demographics, and attendance in an experienced certified teacher’s class, based on an achievement test whose mean is 300 and standard deviation is 50. TPP coefficient estimates may be found in Table 3. They are represented graphically in Figure 2.
Table 3. Teacher Preparation Program Coefficients with 68% Confidence Intervals in All Content Areas

Figure 2. Coefficient of difference between predicted and actual achievement for teacher preparation programs
Average New Teacher Effects
All teachers with 2 or fewer years of experience were averaged to produce the mean effect for new teachers. Average new Louisiana teachers’ effect estimates in ELA, mathematics, and reading clustered between −2.7 and −2.9 points. Average new teachers scored closer to experienced teachers in science, where the effect estimate was −1.4 points. In social studies, the effect estimate for new teachers was −2.1 points.
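Because the outcome was scaled to a standard deviation of 50, these point estimates can be restated in standard deviation units by dividing by 50. The conversion below is simple arithmetic on the values reported above, not an additional result from the study.

```latex
\[
\frac{-2.7}{50} = -0.054, \qquad
\frac{-2.9}{50} = -0.058, \qquad
\frac{-1.4}{50} = -0.028, \qquad
\frac{-2.1}{50} = -0.042
\]
```

In these units, the average new teacher effect ranges from roughly 0.03 to 0.06 standard deviations below experienced certified teachers, depending on the content area.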
TPP Effect Estimates
Results were obtained for exemplars of all four types of TPPs in Louisiana: two undergraduate programs, two Master of Arts in Teaching alternate certification programs, two Non-Master’s Certification-Only alternate certification programs, and four Practitioner Teacher Programs (two at institutions of higher education and two private, nonuniversity providers). Figure 2 presents the TPP results. It is important to recognize that at least two reasonable standards for comparison are available. One could choose to compare programs to the average experienced certified teacher (the 0 point). Comparison with this referent would determine the extent to which new teachers from specific TPPs obtain student achievement results that are distinct from average veteran teachers. Alternatively, one could choose to compare results to the average new teacher result in that content area. Comparison to this referent essentially asks how program completers fare compared with other new teachers. Choice of the comparison point changes which programs will be identified as outlying.
For programs with results in multiple content areas, TPP coefficients predominantly clustered either in a range similar to average new teachers or in a range distinct from average new teachers. In the case of programs whose coefficients were generally higher than the average new teacher result, in some content areas, the 68% CI did not overlap with the average experienced certified teacher result either (the 0 point), and in one case, the 95% CI would not have overlapped with the 0 point (Private Practitioner Program 2, mathematics).
The three programs whose coefficients were predominantly higher than the average new teacher result and whose 68% CI did not overlap with the average new teacher result were Private Practitioner Program 2, Master’s Alternate Certification Program 1, and University Practitioner Program 2. All three programs had one content area with a result whose CI did overlap the average new teacher effect (Private Practitioner Program 2, social studies; Master’s Alternate Certification Program 1, mathematics; and University Practitioner Program 2, mathematics).
University Practitioner Program 1 had mixed results, with three content areas in which the CI overlapped with the average new teacher estimate and two content areas in which the 68% CI did not overlap with the average new teacher estimate. Four programs’ coefficients were predominantly similar to other new teachers (CI overlapping the average new teacher result). These were Undergraduate Programs 1 and 2, Non-Master’s/Certification-Only Program 1, and Private Practitioner Program 1. In addition, among these programs, two TPPs had results that were lower than the average new teacher result and for which the 68% CI did not overlap with the average new teacher result. This occurred for Private Practitioner Program 1 in reading and the Non-Master’s/Certification-Only Program 1 in ELA. It is interesting to note that in both cases, the result would also fall outside the 95% CI for a comparison to experienced teachers.
Discussion
This study describes the output of 1 year’s analyses of a systematic approach to examining student achievement outcomes for recent program completers across TPPs in Louisiana. The analysis was made possible through the integration of distinct data systems from the Louisiana Department of Education and the Louisiana BoR and was built on longitudinal data regarding student achievement as well as demographic variables such as attendance, free lunch status, and disability status. Similarly, completing the analyses modeled here also required data regarding teacher certification and experience in addition to TPP completion data. Analyses were completed using a three-level HLM with students nested within teachers nested within schools and TPP coefficients extracted at the teacher level. Results demonstrated considerable overlap in CI between programs, with some programs having coefficients whose CI did not overlap with substantive anchors such as the average new teacher or the average experienced certified teacher in that content domain with either a 68% or a 95% CI. One of the critical challenges that this sort of data creates is arriving at reasoned and reasonable decision-making rules for a context in which measures are repeated annually, potentially provide information for program improvement, and include the population of new teachers from a program in tested grades and subjects. This is a decision-making context that diverges from the typical hypothesis testing research context in a number of important ways.
Although contributing to student achievement is a hallmark of teaching, an inadequate literature base exists directly linking student achievement outcomes to teachers, teaching, or how teachers are prepared (Boyd et al., 2009; Cochran-Smith & Zeichner, 2005a). One of the notable gaps in the literature is the extent to which teacher preparation matters. It is certainly within the realm of possibility that individual teachers vary widely in effectiveness but that this variation at the individual level is not related in a systematic way to the preparation programs that recruited, admitted, prepared, and recommended them for professional licensure. Although this is possible, it seems somewhat improbable. The absence of a compelling literature base regarding variability in teacher preparation may have more to do with the complexity of studying it than with a lack of interest in the topic on the part of researchers. Studying the contribution of teacher preparation to student achievement requires elaborate longitudinal databases that link students, teachers, and preparation across multiple years. It also requires data across multiple programs to make comparisons. The availability of data that permit study based on large representative administrative databases is a relatively recent phenomenon.
Although value-added methods for evaluating educational inputs have existed since at least the 1970s, their complexity, computational challenges, and data demands have limited their application to problems of teacher preparation. Value-added analyses have grown dramatically in popularity in education over the last two decades as the methods have become more widely understood, computational resources have become more readily available, and the necessary data have become more accessible (Papay, 2010). They have used previous achievement, student demographic variables, and contextual variables to estimate the contribution of a range of educational inputs, including, in their most controversial application, individual teachers (McCaffrey et al., 2003; McCaffrey, Lockwood, Koretz, Louis, & Hamilton, 2004; Todd & Wolpin, 2003). This study illustrates a method for extending this type of analysis to the study of teacher preparation.
One interesting challenge in examining the methods and data is conceptualizing what independent variable is being studied or should be studied. Is the independent variable simply the TPPs, which vary in a variety of ways? This conceptualization might be of interest to policy makers who would like to identify programs whose graduates make unexpectedly positive or negative contributions to student achievement. Another possibility is that the independent variable should be conceptualized as pathways to certification. For example, are practitioner programs more effective than undergraduate programs in preparing teachers? Although the data presented here do not contain enough information on enough different programs within each pathway to answer that question, they do provide clear cautionary data. Examination of the differences between Private Practitioner Programs 1 and 2 suggests the possibility of considerable variability within pathways. A final possibility is interest in features within preparation programs (e.g., Boyd et al., 2009). For example, what is the relative importance of the amount of supervised practice with feedback provided to candidates versus the degree of selectivity of program admissions? The challenges underlying this sort of contrast are particularly daunting given the difficulty of measuring program features and of obtaining sufficient controls to isolate the impact of individual program features. This type of analysis was beyond the scope of the current study.
These data also illustrate a potential process for providing TPPs feedback on the achievement of students taught by new graduates. These types of data provide one key element of continuous improvement models: the ability to obtain repeated measurements of a relevant, meaningful outcome of interest (Reusser, Butler, Symonds, Vetter, & Wall, 2007). This is not to argue that it is the only outcome that should be of interest to teacher educators. However, if sufficient controls are in place that policy makers and teacher educators trust the evaluation to be fair, and if results are stable or have clear trends over time, these data offer a substantive new tool for program improvement by providing feedback in a domain in which it has not been available in the past.
Louisiana’s BoR is using value-added analysis to provide TPPs with information on how students taught by recent program completers are faring on standardized achievement tests. This has prompted some programs to examine their preparation in domains in which they were dissatisfied with their results, either relative to their performance in other content areas or relative to the state, to identify potential points of program improvement. The difficulty of this task has highlighted a key challenge underlying this work. As with all value-added data, the results do not answer why a particular result occurred or what might be done to improve on it; rather, they provide feedback on performance, focus program improvement efforts, and provide a benchmark that helps sustain a focus on continuous program improvement (Hart & Bogan, 1992; Reusser et al., 2007). It is interesting to note that at least two programs that engaged in the sort of self-study occasioned by the data identified plausible hypotheses regarding actions that they could and did take to improve their results.
Limitations and Future Directions
There are relatively few TPPs that have yet had data sufficient to report publicly. New undergraduate programs have a minimum of 4 years from the time they are redesigned to produce their first graduates, and those graduates have to teach for a year before their data can appear in the analyses. New alternate programs generally take less time to complete, so their graduates appear in the schools earlier than their undergraduate colleagues do. As Louisiana moves further from the redesign process, more of the new programs’ graduates will contribute to the analyses and produce effect estimates for an increasing number of programs. This creates the inevitable tension between disseminating what is known now and withholding information until some later date when more complete information is available.
In addition, the analyses reported herein reflect only those teachers in tested grades and subjects. These analyses do not provide any information regarding early elementary teachers or teachers in subjects such as art, band, or physical education. It is exceedingly unlikely that data will become available to use this sort of framework to examine the preparation of teachers in these domains. Subsequent distinct work is needed to develop practical methods for TPPs to obtain information regarding the impact of these educators on student outcomes of interest.
The absolute magnitudes of the coefficients for TPPs are not large. However, given that in some cases the coefficient represents the deviation from predicted achievement for large groups of students taught by new teachers from specific programs, one could argue that they may represent socially significant outcomes. Specifically, although a 5-point (0.1 standard deviation units) difference score for a single student may have limited importance, when that difference emerges in 2,000 students distributed across many schools, it may take on social significance. It is also important to recognize that these differences for current year total achievement emerge after controlling for an extensive series of predictor variables, including prior achievement, and that this year’s result will in turn affect subsequent years’ expected outcomes. Finally, it is worth noting that socioeconomic disadvantage is widely accepted as having a socially significant impact on student achievement (Krovetz, 2008). In many cases, the TPP coefficients are two times the magnitude of the coefficients for free lunch status as an indicator of socioeconomic disadvantage (see Table 4).
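The scale arithmetic in this paragraph can be made explicit with a short sketch. The only figure taken from the text is the equivalence of 5 points to 0.1 standard deviation units, which implies a standard deviation of 50 scale points; the function and constant names are hypothetical and introduced purely for illustration.

```python
# Implied by the text: 5 points = 0.1 SD units, so 1 SD = 5 / 0.1 = 50 scale points.
SCALE_SD = 50.0


def points_to_sd_units(points: float, scale_sd: float = SCALE_SD) -> float:
    """Convert a difference in scale points to standard deviation units."""
    return points / scale_sd


def sd_units_to_points(sd_units: float, scale_sd: float = SCALE_SD) -> float:
    """Convert a difference in standard deviation units back to scale points."""
    return sd_units * scale_sd


effect_in_sd = points_to_sd_units(5.0)      # the 5-point difference from the text
effect_in_points = sd_units_to_points(0.1)  # and the reverse conversion
```

Expressing coefficients in standard deviation units in this way makes it easier to compare a TPP coefficient with other coefficients in the same model, such as the one for free lunch status, without relying on the raw test scale.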
Base Hierarchical Linear Model for Mathematics Achievement 2007-2008
The obvious core future work in Louisiana will be to continue to accumulate data to permit a broader representation of TPPs. The data raise the questions of why the students of Private Practitioner Program 2’s graduates obtained mathematics achievement that was so much higher than predicted and why the students of Private Practitioner Program 1’s graduates did more poorly than predicted in reading. Although the TPPs have been informally engaged in self-appraisal with and without these data, it has become clear that unpacking the elements of TPPs that may link to student achievement is exceedingly challenging. In some ways, it is not much different from developing the value-added analysis of student achievement. TPPs have many moving parts that may be complementary, compensatory, or conflicting. It is possible that rigor in either admissions or preparation may mask a weakness in the other domain. Similarly, the experiences that candidates have in field placements, student teaching, or internship may reinforce didactic preparation, or they may actively encourage candidates to disregard the didactic preparation they were provided. In addition, the features of a program may play out very differently based on its day-to-day management and faculty, so two programs that have similar designs may achieve very different results based on the program faculty. The authors’ point is not to suggest that there is not an urgent need for research on what aspects of TPPs support improved student achievement outcomes in K-12 schools. Quite the opposite: we believe it is a critical need. Rather, we raise these concerns to surface the complexity and subtlety of the work needed to generate the sort of broadly applicable knowledge the field needs.
Acknowledgements
The authors would like to thank Mike Collier; Allen Schulenberg; David Elder; the Division of Planning, Analysis, and Information Resources of the Louisiana Department of Education; and the Louisiana Board of Regents, without whom this work would not have been possible.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research has been supported by award CT-06/07-VAA-01 from the Louisiana Board of Regents, which was funded by a grant from the Carnegie Corporation of New York.
