Abstract
Principals are the second-largest school-based contributor to K–12 students’ academic progress. However, there is little research evaluating whether efforts to develop principals’ skills improve school effectiveness. We conducted two randomized controlled trials to estimate the impacts of a professional development program called the Executive Development Program (EDP) and the incremental effects of coaching to help principals implement the EDP curriculum. We find that, within 3 years, the EDP alone influenced principals’ practices but not student achievement. Coaching had a small positive effect on students’ English Language Arts achievement but no effect on math achievement or on principals’ practices. Coaching had the largest effects in disadvantaged schools. We hypothesize that coaching enhanced the quality of implementation of recommended practices.
Keywords
Given the important roles principals play, investments in improving their performance could be an efficient way to improve schools’ effectiveness. Accordingly, the Every Student Succeeds Act (ESSA), passed in 2015, allows states and districts to use federal funds to enhance the quality of school principals. However, there is relatively little evidence about what types of investments can enhance the skills or performance of principals and improve student learning, or about the conditions or contexts in which such investments may be more or less effective (Herman et al., 2017).
This article presents the results of two randomized controlled trials (RCTs) evaluating the impacts of the National Center on Education and the Economy’s (NCEE) Executive Development Program (EDP) and of coaching on schoolwide practices and performance. Our study was designed to identify the separate effects of the year-long EDP course and of more intensive one-on-one coaching aligned with that curriculum. To our knowledge, it is the first rigorous study of principal coaching to find effects on schoolwide student achievement, and it is also the largest-scale study to date of a principal professional development program. The U.S. Department of Education’s Supporting Effective Educator Development (SEED) program funded both our study and the provision of the EDP and coaching.
Building upon our prior research describing the implementation of the EDP and coaching (Wang et al., 2019), we examine variation in program participation rates across contexts, and we evaluate the effects on principal and teacher practices and on student achievement. We also explore which school characteristics or contexts appear to moderate these effects, and we place our findings in the context of previous research on principal professional development and coaching. In this way, our study provides new insights into the feasibility of improving the performance of current school principals through coaching and professional development activities, as well as the conditions that may be more or less conducive to implementing such programs successfully.
We begin by situating our study in the existing body of research about the effects of professional learning interventions for current principals. We then describe the intervention that is the focus of our study and our research questions. After detailing our data and methodology, we present our results. We conclude with a discussion of the implications of our findings.
Background
Given the dual focus of NCEE’s work on formal group learning for principals (i.e., the EDP) and one-on-one coaching for principals, we start by reviewing prior research about the impacts of each of these activities on student and school outcomes. We then discuss research about the characteristics associated with higher quality professional development more broadly.
Impacts of Professional Development for School Principals
Few studies of principal professional development have rigorously evaluated impacts on student academic achievement, and those that have done so have yielded mixed results. Of the five experimental or quasi-experimental studies that we identified on this topic, two found positive effects of principal professional development on student achievement and three found none. For example, a 3-year RCT study (Jacob et al., 2015) of McREL International’s Balanced Leadership Professional Development Program, a 2-year program implemented in 126 rural public schools in Michigan, did not identify effects on teacher practices or student achievement within 3 years but did find impacts on principal-reported practices and a reduction in staff turnover. A separate RCT study of a 2-year principal professional development program in 100 elementary schools in five states similarly did not find evidence of impacts on school practices or achievement within 3 years (Herrmann et al., 2019).
The remaining three of the five professional development studies are of the EDP itself. Two of these were quasi-experimental studies, one implemented in 38 schools in Massachusetts (Nunnery, Ross, et al., 2011) and one in 101 schools in Pennsylvania (Nunnery, Yen, & Ross, 2011). Both found sizable positive effects on student achievement within 3 years or less. Each of these studies compared the performance of schools that had volunteered for and fully participated in the EDP program to comparison samples of schools matched according to baseline school achievement and other characteristics. These programs also included a component of district-level leader professional development, which the current SEED-funded study did not. A more recent RCT that our research team conducted, of the EDP paired with one-on-one coaching for novice principals in 323 middle schools in three states, found no evidence of impacts on student achievement within 3 years and only limited evidence of impacts on principals’ practice (Master et al., 2020). In that study, a national pool of NCEE-trained coaches who were former principals provided the coaching, although only around one third of treatment-assigned principals fully participated in it.
Impacts of Coaching Provided to School Principals
There is even less research published in the last decade that has rigorously tested the impacts of principal coaching on school and student outcomes. With the exception of the previously mentioned study of the EDP combined with coaching (Master et al., 2020), there is a dearth of rigorous quasi-experimental or experimental studies evaluating the impacts of coaching on student achievement (Barnett & O’Mahony, 2008; Grissom & Harrington, 2010).
There is, however, some evidence of associations between coaching and principal effectiveness. Quantitative research on this question suggests that coaching may positively influence principals’ behaviors such as supporting teachers’ leadership development, as reflected in actions like sharing feedback results with teachers, emphasizing continuous improvement, and holding teachers accountable for supporting students (Goff et al., 2014; Grissom & Harrington, 2010; Warren & Kelsen, 2013; Wise & Hammack, 2011). Recent qualitative studies of various coaching programs also suggest that coaching can enhance principals’ abilities to lead school improvement efforts (e.g., Klar et al., 2019; Lackritz et al., 2019; Wang et al., 2019).
Characteristics of Higher Quality Professional Learning Programs
Given the limited evidence of impacts of professional learning programs for principals, we know very little about the contexts or conditions that may be conducive or necessary to facilitate effective principal professional development or coaching. There are, however, potentially useful insights from research about professional development more generally. Desimone and Pak (2017) distilled five research-based features of high-quality professional development: (a) a focus on subject matter content; (b) opportunities for participants to observe, receive feedback, and engage actively; (c) coherence between the content and goals of professional learning and school and staff goals, knowledge, beliefs, and context; (d) sustained duration, including at least 20 hours of contact time; and (e) collective participation in an interactive learning community. Other qualitative research has identified characteristics of coaching that principals or observers perceive to be more effective (Huff et al., 2013; Silver et al., 2009; Wise & Cavazos, 2017). These include personalized coaching from an experienced administrator with a deep knowledge base; coaching that is tailored to the specific school context; sustained coaching that involves revisiting issues and action plans; and a degree of trust between principal and coach. More successful coaching implementation has also been associated with more rigorous screening, selection, and training of coaches (Barnett & O’Mahony, 2008; Hobson, 2003).
Overview of the EDP and of EDP-Aligned Coaching
NCEE’s Theory of Change
Informed by research on effective principal professional development, the EDP and coaching interventions were intended to enhance principal performance through sustained engagement and hands-on practice to help motivated principals apply core concepts over an extended period of time. NCEE’s theory of change for how these efforts would influence schoolwide outcomes is shown in Figure 1. As the left-most box in Figure 1 shows, NCEE views the EDP and coaching as complementary interventions, each aimed at developing the same skills and leadership practices of principals. The EDP is intended to influence principals’ practices through formal monthly group seminars coupled with principals’ efforts to implement a customized “Action Learning Plan” (ALP) aligned with the EDP curriculum in their own schools. One-on-one coaching from veteran principals trained as coaches and experienced at implementing the practices recommended by the EDP curriculum is intended to provide more personalized and sustained professional development. Coaching is expected to strengthen the implementation of principals’ ALPs through hands-on guidance, collaborative planning sessions, and constructive feedback. The EDP and coaching are also supported by providing principals access to readings about research-based practices and online tools to facilitate self-assessments of their leadership practices and of their schools’ needs.
Figure 1. NCEE’s logic model for the EDP and coaching.
Both the EDP and coaching are intended to develop leaders’ skills as strategic thinkers and as instructional leaders while also emphasizing the importance of establishing high standards and equitable opportunities for students in their schools. As described in the second box in Figure 1, the EDP and coaching are intended to influence both the direct leadership activities of principals (such as setting professional standards and goals and providing knowledgeable instructional coaching) and the strategic contributions of principals toward establishing systems that can support learning and drive change. The specific topics covered by the EDP are shown in Figure 2. They include exposure to key concepts from research such as how students and adults learn, the characteristics of effective instruction in general and in specific subject areas, and recommended systems for schoolwide instructional improvement.
Figure 2. Topics covered by the EDP.
NCEE theorizes that the leadership changes shown in the second box of Figure 1 will, particularly after completion of the 1-year EDP program, improve the day-to-day core functions of the school shown in the third box. NCEE anticipates that the changes in leadership should, over time, make instruction throughout the school more coherent and of higher quality and that the culture among teachers and students should also improve. Finally, the improvements in culture and instructional quality referenced in the third box should result in higher student achievement, higher academic and behavioral expectations for students, improved student attendance, and fewer student disciplinary actions.
The horizontal box at the bottom of Figure 1 shows contextual factors, such as English language arts (ELA), school and principal accountability systems and the principal’s disposition and other traits, that NCEE hypothesizes will moderate the theorized effects of the EDP and coaching.
Implementation of the EDP
The EDP is delivered by an NCEE-certified facilitator over a total of 24 full-day sessions. The sessions typically occur on two workdays per month over 12 months. EDP sessions in this study were held regionally within states, usually in groups of principals coming from multiple districts. Figure 2 shows the topics covered in the 12 EDP units.
At the end of the third month of the EDP, school leaders begin developing an ALP for their school that outlines one or more goals and the strategies and action steps needed to reach them. School leaders refine the ALP throughout their professional learning and, after completing the EDP, use it to guide their continued work on the topic they specified in it. The following are examples of actual ALP topics that we profiled in a prior report on the implementation of the EDP that included nine case studies (Wang et al., 2019): wider use of formative assessment practices in the school, improving school culture and climate, instituting small-group reading instruction, aligning the ELA curriculum across and within grade levels, and creating teacher Professional Learning Communities.
Implementation of Coaching
In this SEED-funded study, NCEE-trained coaches offered 60 or more hours of one-on-one coaching over a period of up to 30 months, primarily in face-to-face meetings of 3 to 4 hours at the principal’s school. NCEE also allowed a minority of the coaching hours to occur via phone, email, and Web meetings. In a survey we fielded of the coaches, the coaches reported spending 70% as much time traveling to schools as they spent in the actual meetings with principals.
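As a rough illustration of what the reported 70% travel ratio implies for coach workload, the sketch below multiplies meeting hours by that ratio. It assumes, as a simplification, that all of a principal’s 60 minimum coaching hours were face-to-face meeting time; because a minority of hours occurred remotely, actual travel would likely be somewhat lower. The function name is hypothetical.

```python
def coach_time_budget(meeting_hours: float, travel_ratio: float = 0.70) -> tuple[float, float]:
    """Return (travel_hours, total_hours) for a given number of meeting hours.

    Simplifying assumption: every coaching hour is an in-person meeting,
    and travel time is a fixed fraction (travel_ratio) of meeting time.
    """
    travel_hours = meeting_hours * travel_ratio
    return travel_hours, meeting_hours + travel_hours


# 60 hours is the minimum amount of coaching offered per principal.
travel, total = coach_time_budget(60.0)
print(f"travel ~ {travel:.0f} h, total ~ {total:.0f} h per coachee")
```

Under the same simplification, a former principal coaching 13 principals would commit roughly 13 × 102, or about 1,300 hours, over the coaching period.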
At the in-person meetings, coaches spent most of the time on the principal’s ALP, using it as the vehicle through which principals put into practice the skills the EDP conveys, such as strategic thinking. For principals who had completed the EDP before receiving coaching, coaches referred to the ALPs those principals had previously developed and worked with them to further refine those plans.
Selection, Training, and Support for Coaches
A total of 48 coaches provided coaching to one or more principals in the study. These coaches held the title Distinguished Principals (DPs), which means they were either current or retired principals who were NCEE trained and certified to provide coaching. The DPs had worked in the same state as the principals they coached. We include more detail about the recruitment and training of the coaches in the Technical Appendix in the online version of the journal.
NCEE matched DPs to principals so that the coach had reasonable travel time to the principal’s school, experience in similar settings (rural, urban, suburban) and similar levels (elementary, middle, high) or similar student populations (e.g., majority students of color, high English Learner population) as the principal they were coaching. NCEE avoided matching coaches who were active principals in the same district as the coachee. Most (39) of the DPs were working principals and were assigned just one or two coachees. However, nine of the DPs were former principals who supported between 6 and 13 coachees each.
NCEE provided two forms of ongoing support to DPs during their 2 to 3 years of providing coaching to ensure consistently high-quality, on-topic coaching: shoulder-to-shoulder coaching from NCEE-certified national coaches, who had prior experience coaching principals around the country, and Distinguished Principal Institutes, a series of 2-day professional learning and networking opportunities for DPs that took place quarterly for 1 year following the credentialing process. Each of those institutes was tailored to the needs of the DPs and had components focused on learning theory, high-performance organization and management, and effective coaching to support them in their current roles as both school leaders and peer coaches. In addition, NCEE hosted separate monthly Web-based meetings for the DPs in each state to confer about coaching.
Research Questions
We designed this study to address the following four questions about participation in and the impacts of NCEE’s EDP and coaching interventions:
Study Design
Our study evaluated the effects of the EDP and of EDP-aligned coaching in school districts that volunteered to have their principals participate as part of a grant-funded opportunity. Districts in our sample were familiar with the EDP, and many of the participating districts had principals who had previously completed it. However, unlike the typical arrangement for the EDP program, in which districts pay for it, the districts in our study did not pay for the programs themselves. The implementation and effects that we measure therefore may not generalize to a context where a participating district or school pays for the program and thus has more “skin in the game.” Instead, our study was intended to generalize to a policy context where an external provider (e.g., state or federal policymakers) offers a district the program for free.
As shown in Table 1, we used two randomized controlled trials to answer our research questions, separately testing (a) the effect of offering the EDP alone and (b) the incremental effect of offering coaching. The samples of schools in the two RCTs were drawn from the same states and partially overlapped (see Table 1). In the first RCT, we randomly assigned schools and principals in participating districts to be offered the EDP either immediately (the treatment group) or 28 to 31 months later (the control group).
Table 1. Treatment and Control Group Conditions and Initial Sample Size of All Study Schools K–12
Note. High schools were randomized in separate design blocks from nonhigh schools. EDP = Executive Development Program.
Delayed EDP started 28 to 31 months after the study began.
In the second RCT, we randomly assigned schools and principals to either receive the offer of coaching (the treatment group) or not (the control group). The second RCT was designed to test the incremental effect of offering coaching to a mixed sample of current and former EDP participants, with the expectation that the effects of coaching would be similar across these two groups. Principals who were eligible to be offered coaching thus included some principals from the treatment arm of the first RCT who had been assigned to participate in the EDP as well as additional principals from the same school districts who had already completed the EDP program in the recent past. NCEE coaches were available only in certain regions, so not all of the principals assigned to receive the EDP in the first RCT were eligible to be part of the second RCT that examined the effects of coaching.
A total of 139 school districts across three states¹ agreed to participate in either or both RCTs, with a total of 779 unique schools participating in the EDP study, the coaching study, or both. The EDP intervention began for the treatment-assigned group in June 2016. The coaching intervention began for the treatment-assigned group between November and February of the 2016–2017 school year.
Of the 779 initially randomized schools, a small portion (56 schools in total) closed before the end of the study (N = 27), were randomized in error despite their principals being ineligible for treatment (N = 11), or lacked outcome data from statewide data sources (N = 18), and thus could not be included in some or all of our analyses.² A total of 583 study schools had student outcome data in any of Grades 3 to 8 and are included in our primary analyses of student outcomes. However, we included all participating schools spanning Grades K–12 in our survey data collection and analyses.
Data and Analytic Samples
In this section, we describe our data, some of the key measures we constructed using these data, and our analytic samples.
Student and School Administrative Data
State Departments of Education in each of the three states provided us with data for all students and schools in this study to facilitate our analyses of the impacts on student outcomes. These data span school year 2014–2015 through school year 2018–2019, and data were provided in all years for all students who attended a study school at any time during this period. These data included student demographic and socioeconomic indicators, student achievement and attendance data, information about students’ classification as special education or English Language Learner students in each year, and enrollment records indicating the school(s) and grade levels students attended in each year.
Primary Grades 3 to 8 Analytic Sample
Our primary analysis of the effects of the EDP and/or coaching on schoolwide student achievement outcomes included students in tested Grades 3 to 8 only, as all three states had comparable achievement exams in only these grade levels. All students present in study schools and with achievement outcomes in a given study year were included in analyses for that year of the study. However, some study schools (including all high schools, which were randomized in separate design blocks) were not included in our analyses of effects on student achievement. Across the treatment and control samples for the EDP and coaching studies, most of the 10 baseline characteristics did not differ significantly. However, in the coaching study, treatment schools had significantly smaller average enrollment (514 students in the treatment group vs. 670 in the control group), and an F-test of joint significance indicated a significant difference between the treatment and control samples (these differences are shown in Online Technical Appendix Table B.1). To account for this chance imbalance, our models control for all observable baseline school characteristics, including enrollment size.
In Table 2, we describe the characteristics of students and schools in the analytic samples for our evaluation of the effects of the EDP and of coaching on student achievement as of Study Year 3. The majority of the students in our study sample were White (55%–63%, depending on the analysis), and most were eligible for free or reduced-price lunch (66%–67%), an indicator of low-income background. Relative to the nation’s public school student population, 52% of whom were eligible for free or reduced-price lunch as of SY 2016–2017, this sample was more disadvantaged (National Center for Education Statistics, 2018).
Table 2. Baseline Student and School Characteristics for the Sample Used in Our Estimates for Student Achievement Impacts in Schools Spanning Grades 3 to 8
Note. Unless otherwise noted, all statistics shown are as of school year 2015–2016. Sample includes only students with outcome data in Year 3. Analytic sample does not include schools such as high schools where no students attend Grades 3 through 8. School and student characteristics differ somewhat due to systematic differences in the characteristics of students who attend larger versus smaller schools. EDP = Executive Development Program; ELL = English language learner; SPED = special education; FRPL = free or reduced-price lunch.
The average total school enrollment (across all grade levels served) in our analytic sample of schools ranged from 531 to 550 students across the two RCT samples. Study schools varied in locale and included both rural and urban districts, but schools in the coaching study were drawn from relatively fewer rural districts, had somewhat larger average enrollment, and served more students of color.
EDP and Coaching Participation Data
NCEE maintained attendance records for all EDP sessions and shared these data with the research team. These data allowed us to identify which of the 12 EDP units each study participant attended. The 48 NCEE coaches also maintained electronic logs that documented the number of coaching sessions each individual had, the mode and length of those sessions, and the topical focus of each session. We used these data to document the amount of coaching that each participating principal received. Finally, NCEE maintained and shared with the research team a study roster that tracked treatment-participating schools and school leaders during the period of the study. This allowed us to track when treatment-assigned principals left their schools. In addition, in most cases where principals opted not to participate, or not to participate fully, in the EDP or coaching, the roster documented the reason for this decision (to the degree NCEE knew it).
Survey Data, Samples, and Measures
We designed and fielded a baseline survey of treatment and control group principals in 2016 and, in fall 2018, a final survey of principals and a separate survey of up to 17 randomly sampled teachers at each study school. The final surveys are the focus of our outcome analyses. To create the surveys, we first reviewed several existing surveys for validated items to include in our surveys of principals and teachers. However, we found that they did not directly capture the concepts covered in the EDP (see NCEE’s logic model in Figure 1), so we wrote most of the items ourselves. The principal and teacher surveys covered the following topics:
Leadership practices;
School culture;
School safety;
Standards, curriculum, instruction, and assessment;
Teacher collaboration, professional development, and leadership opportunities;
Principals’ views of the EDP and coaching (for those assigned to it); and
Respondents’ background characteristics.
Both the teacher and the principal surveys took approximately 30 minutes to complete. Additional details about the survey administration are set out in the Online Technical Appendix.
In Table 3, we list the timing and types of the survey data that we collected to evaluate the effects of treatment on principals’ and teachers’ practice, alongside response rates. For the principal survey, response rates were very similar between treatment (64%) and control (62%) schools. For the teacher survey, treatment (97%) and control (97%) schools included at least one teacher respondent at a similar rate, but teacher-level response rates were somewhat higher for treatment school teachers (66%) than for control school teachers (59%).
Table 3. Researcher-Collected Survey Data
Note. The survey response rate is the number of completed surveys divided by the number of eligible, invited individuals. Some schools and districts opted out entirely from participation in surveys.
The difference in teacher-level response rates between treatment and control school teachers was statistically significant at p < .001.
b This is the percentage of schools we surveyed where we had at least one teacher response. This is distinct from the percentage of eligible teachers we invited (up to 17 randomly selected teachers per school) who took the survey. Additional details about the survey administration process are provided in the Technical Appendix in the online version of the journal.
Among principal respondents to our final survey, 84% self-reported as White, 15% as African American, and 1% as Hispanic. At baseline, principals averaged 7 years of leadership experience, including 5 years in their current school. Among teachers, 87% self-reported as White, 8% as African American, and less than 1% as Hispanic and Asian, with others not identifying their race/ethnicity. Thirty-nine percent were self-contained classroom teachers and 45% were subject-specific teachers. Many teachers taught multiple subject areas: 54% taught subjects such as ELA, reading, writing, or literature; 48% taught mathematics; 36% taught science; and 36% taught social studies.
We summarized the final teacher and principal survey data into scales after examining the dimensionality and internal structure of the survey results with exploratory factor analyses (EFAs). The primary objective of these analyses was rank reduction (Alwin, 1973; Bollen & Lennox, 1991; Cronbach, 1976). Specifically, we used factor analysis as our primary approach to justify the creation of summary scales, reducing the number of variables included in subsequent analyses. To the extent possible, we sought to form scales whose items were related in ways consistent with theory and interpretation. In all, we identified 14 factors from the principal survey and 11 factors from the teacher survey. Additional details about our EFA methodology are provided in the Online Technical Appendix. In addition, Online Technical Appendix Tables D.1 to D.23 list the full wording and response scales of each survey item alongside each factor that we identified.
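The authors’ specific EFA criteria are documented in the Online Technical Appendix and are not reproduced here. As a generic illustration of how factor analysis supports rank reduction, the sketch below applies one common heuristic, the Kaiser criterion (retain as many factors as there are item-correlation eigenvalues above 1), to simulated responses generated from two known constructs. The simulated data, the loadings, and the choice of heuristic are all assumptions, and treating Likert-type items as continuous is a simplification.

```python
import numpy as np


def kaiser_factor_count(responses: np.ndarray) -> int:
    """Count eigenvalues of the item correlation matrix that exceed 1."""
    corr = np.corrcoef(responses, rowvar=False)  # items are columns
    eigenvalues = np.linalg.eigvalsh(corr)       # symmetric matrix -> real eigenvalues
    return int(np.sum(eigenvalues > 1.0))


rng = np.random.default_rng(0)
n_respondents = 2000
latent = rng.standard_normal((n_respondents, 2))  # two hypothetical constructs

# Items 1-3 load only on construct 1; items 4-6 load only on construct 2.
loadings = np.zeros((2, 6))
loadings[0, :3] = 0.8
loadings[1, 3:] = 0.8
noise = rng.normal(0.0, 0.6, size=(n_respondents, 6))  # unit item variance overall
items = latent @ loadings + noise

n_factors = kaiser_factor_count(items)
print(n_factors)  # 2
```

Here six items collapse to two scales, which is the kind of reduction that lets later impact models use a handful of factors rather than every individual survey item.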
Method
Randomization Procedures and Implications for Analysis
We randomized study schools separately for the two RCTs. Our approach to randomization differed by RCT. In the EDP randomization, we employed a stratified randomization in which we sorted schools in each state into four achievement-related strata, separately for elementary and middle schools and for high schools, for a total of eight strata per state. We created the achievement strata based on publicly available school achievement data from the school year prior to randomization. Within each stratum, each school had an equal probability (50%) of treatment assignment.
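A minimal sketch of the stratified assignment described above follows. It assumes that strata are formed by splitting each state-by-level group into four equal-sized bins of prior-year school achievement; the exact cut points are not stated here. The `School` type, its field names, and the seed are hypothetical. Within each stratum, half the schools are assigned to treatment, with a coin flip for the extra school when a stratum has an odd count, so every school keeps exactly a 50% chance of assignment.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class School:
    school_id: str
    state: str
    is_high_school: bool
    baseline_achievement: float  # prior-year school achievement (hypothetical scale)


def assign_strata(schools, n_bins=4):
    """Map school_id -> (state, level, achievement bin); bins are equal-count ranks (assumption)."""
    groups = defaultdict(list)
    for s in schools:
        groups[(s.state, s.is_high_school)].append(s)
    strata = {}
    for (state, is_hs), members in groups.items():
        ranked = sorted(members, key=lambda s: s.baseline_achievement)
        for rank, s in enumerate(ranked):
            strata[s.school_id] = (state, is_hs, rank * n_bins // len(ranked))
    return strata


def stratified_assignment(schools, seed=2016):
    """Randomize to treatment within strata; returns ({school_id: treated}, strata)."""
    rng = random.Random(seed)
    strata = assign_strata(schools)
    by_stratum = defaultdict(list)
    for s in schools:
        by_stratum[strata[s.school_id]].append(s.school_id)
    treated = {}
    for key in sorted(by_stratum):  # fixed order makes the draw reproducible
        ids = by_stratum[key]
        rng.shuffle(ids)
        n_treat = len(ids) // 2
        if len(ids) % 2 == 1 and rng.random() < 0.5:
            n_treat += 1  # coin flip for the odd school keeps P(treatment) = 0.5
        for i, sid in enumerate(ids):
            treated[sid] = i < n_treat
    return treated, strata


# Usage with 80 hypothetical schools in two states (16 of them high schools).
gen = random.Random(1)
schools = [School(f"S{i:03d}", "AB"[i % 2], i % 5 == 0, gen.gauss(0.0, 1.0))
           for i in range(80)]
treated, strata = stratified_assignment(schools)
```

With two school levels and four achievement bins, each state ends up with the eight strata described in the text.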
In the coaching randomization, we created randomly ordered lists of schools within each geographic region in which coaching was available. We provided the ordered lists to NCEE, which went down these region-specific lists in order, offering coaching to schools’ principals. If a treatment-assigned principal refused the offer, that school was treated as a noncompliant treatment school in our analysis, and NCEE then made the offer of coaching to the next principal on the randomly ordered list. Schools further down the list did not receive the offer of coaching once all available coaching slots were filled. We provide additional details about the process we used to create the lists in the Online Technical Appendix. The coaching randomization yielded varying probabilities of treatment assignment across the sampling frame, which we account for in our analyses.
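The list-based offer procedure can be sketched as follows. Function and argument names are hypothetical, and principals’ accept-or-refuse decisions are passed in as a callback because in the study they were real decisions, not simulated ones. The key property the sketch preserves is that every school reached on a list counts as treatment-assigned, including refusers (noncompliers), while schools never reached remain in the control group.

```python
import random
from typing import Callable, Dict, List, Set, Tuple


def offer_coaching(region_schools: Dict[str, List[str]],
                   slots: Dict[str, int],
                   accepts: Callable[[str], bool],
                   seed: int = 0) -> Tuple[Set[str], Set[str]]:
    """Walk a randomly ordered list per region, offering coaching until slots fill.

    Returns (offered, enrolled). `offered` is the treatment-assigned group and
    includes refusers; schools never reached on the list form the control group.
    """
    rng = random.Random(seed)
    offered: Set[str] = set()
    enrolled: Set[str] = set()
    for region, schools in region_schools.items():
        order = schools[:]   # copy so the caller's list is not reordered
        rng.shuffle(order)   # the randomly ordered list for this region
        filled = 0
        for school in order:
            if filled >= slots[region]:
                break        # remaining schools never receive an offer
            offered.add(school)
            if accepts(school):
                enrolled.add(school)
                filled += 1
    return offered, enrolled


# Usage: two hypothetical regions; principals at N1 and N4 would decline.
regions = {"North": [f"N{i}" for i in range(1, 11)], "South": ["S1", "S2"]}
refusers = {"N1", "N4"}
offered, enrolled = offer_coaching(regions, {"North": 3, "South": 5},
                                   accepts=lambda s: s not in refusers)
```

Because a school’s chance of being reached depends on the number of slots, the region’s size, and refusals higher on the list, assignment probabilities vary across schools, which is why the analysis needs design weights.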
Our approach to randomization had implications for our resulting analyses. When analyzing the effects of offering the EDP, all schools had an equal probability of treatment assignment and therefore it was not necessary to include strata fixed effects in our statistical models. However, as a guardrail against any imbalance in the analytic sample and to increase the precision of our estimates of the effects of the EDP, our models include controls for baseline school achievement and an indicator for high school status (as well as other characteristics). We opted not to directly control for the individual strata indicators because a lack of available outcome data for some randomized schools led to some imbalanced strata that did not include both treatment and control schools in some analyses. However, as a specification check, we confirmed that our results were very similar regardless of whether we used strata fixed effects (see Online Technical Appendix Table B.3).
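To illustrate the specification check, the sketch below simulates school-level data (all values hypothetical) and estimates the treatment coefficient by ordinary least squares in two ways: with linear controls for baseline achievement and high school status, as in the main models, and with stratum fixed effects instead. For simplicity, treatment here is an unstratified coin flip, so both specifications are unbiased and should give similar estimates; the study’s actual models include additional baseline characteristics.

```python
import numpy as np


def ols_treatment_effect(y, treated, covariates):
    """OLS coefficient on the treatment indicator, with an intercept and covariates."""
    X = np.column_stack([np.ones_like(y), treated, covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


rng = np.random.default_rng(42)
n = 4000
baseline = rng.standard_normal(n)     # hypothetical standardized prior achievement
high_school = rng.random(n) < 0.2     # hypothetical share of high schools
treated = (rng.random(n) < 0.5).astype(float)
y = 0.10 * treated + 0.8 * baseline + 0.2 * high_school + rng.normal(0.0, 0.3, n)

# Main specification: linear baseline control plus a high school indicator.
est_controls = ols_treatment_effect(y, treated, np.column_stack([baseline, high_school]))

# Specification check: stratum fixed effects (baseline quartile x school level).
quartile = np.searchsorted(np.quantile(baseline, [0.25, 0.5, 0.75]), baseline)
stratum = quartile + 4 * high_school
strata_dummies = (stratum[:, None] == np.arange(1, 8)).astype(float)  # stratum 0 omitted
est_strata = ols_treatment_effect(y, treated, strata_dummies)

print(round(est_controls, 3), round(est_strata, 3))
```

One practical reason to prefer the linear controls, as the text notes, is that a stratum containing only treatment or only control schools contributes nothing to the fixed-effects estimate of the treatment effect.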
When analyzing the effects of coaching, we had to account for schools’ varying probabilities of assignment to receive the offer of coaching. To do so, we weighted student observations based on their schools’ probability of treatment assignment. We used these design weights in all analyses of the effects of coaching. This approach is consistent with What Works Clearinghouse (WWC, 2020) Standards.
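The exact form of the design weights is given in the Online Technical Appendix rather than here. A standard choice, assumed in the sketch below, is the inverse of the probability of a school’s realized assignment: 1/p for treatment-assigned schools and 1/(1 − p) for control schools, with each student inheriting the school’s weight. The weighted difference in means shown is a simplification of the study’s regression models, and the data are hypothetical.

```python
import numpy as np


def design_weights(treated, p_treat):
    """Inverse-probability-of-assignment weights (assumed form)."""
    treated = np.asarray(treated)
    p_treat = np.asarray(p_treat, dtype=float)
    return np.where(treated == 1, 1.0 / p_treat, 1.0 / (1.0 - p_treat))


def weighted_mean_difference(y, treated, p_treat):
    """Design-weighted treatment mean minus design-weighted control mean."""
    y = np.asarray(y, dtype=float)
    is_t = np.asarray(treated) == 1
    w = design_weights(treated, p_treat)
    return np.average(y[is_t], weights=w[is_t]) - np.average(y[~is_t], weights=w[~is_t])


# Hypothetical schools with different probabilities of being offered coaching.
y = [0.2, 0.1, 0.0, -0.1]
treated = [1, 1, 0, 0]
p_treat = [0.5, 0.25, 0.5, 0.75]
print(design_weights(treated, p_treat))            # [2. 4. 2. 4.]
print(weighted_mean_difference(y, treated, p_treat))
```

Upweighting schools whose realized assignment was unlikely makes each group represent the full sampling frame, so differences in assignment probability across regions do not bias the comparison.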
Confirmatory and Exploratory Contrasts
For each of our primary research questions about the impacts of the EDP and of coaching, we conducted only one prespecified confirmatory test within each of the two subject area domains (i.e., ELA or mathematics) using pooled samples across the three states. We treated these as tests of distinct outcomes in each domain, consistent with WWC guidelines and Schochet (2008). We did not correct for multiple comparisons, as we only made one comparison per tested subject area domain for each treatment contrast.
For both the EDP contrast and the coaching contrast, our prespecified confirmatory tests were of impacts as of Study Year 3. For the EDP contrast, there was a risk of potential bias in our estimates in Year 3 because the control group received the offer to participate in the EDP as early as the fall of Study Year 3. However, we determined that our confirmatory test would be of impacts as of Study Year 3 (as opposed to Year 2), conditional on our not identifying significant effects of the EDP in Study Year 1. In other words, we sought to test the impacts of the EDP after the longest possible period of implementation available within the constraints of the grant period, but were cognizant of the potential for our Year 3 estimates to be influenced by the control group beginning the EDP program in that year. As we found no significant effects of the EDP in Study Year 1, we retained Study Year 3 as our confirmatory test year.
We also conducted a range of exploratory analyses of impacts on achievement and on a large number of factors from teacher and principal surveys. These analyses were meant to inform hypothesis generation around the interpretation of our primary findings. For this purpose, we did not formally test whether individual exploratory results would remain robust to corrections for multiple comparisons. Instead, we acknowledge here and elsewhere in the article that our exploratory findings may include some false-positive or false-negative findings and that they should be interpreted with caution.
Intent-to-Treat (ITT) Methodology
To answer our confirmatory research question (RQ2), we used a school-level ITT analysis to identify the effects of offering the EDP and of offering coaching to principals. We focused on schoolwide average student outcomes rather than outcomes only for the individual students initially present in study schools at the time of randomization. This allowed us to measure the effects of the EDP and coaching on the largest possible sample of schools with available student outcomes. As a consequence of our analytic approach, the students included in our analyses varied in terms of the amount of time they were enrolled in their study school following random assignment. As of Study Year 3, approximately 78% of students in our analytic sample were students present in a study school at the time of initial randomization, and the average student had attended their study school for 2.5 years.
We conducted our ITT analysis using multivariate linear regression models estimated with student-level data. We estimated separate models for the effect of the EDP only and for the incremental effect of coaching: for the EDP-only model, we used ordinary least squares, and for the coaching model, we used weighted least squares with the design-based weights. We also fit separate (i.e., cross-sectional) models for each of the three treatment years, which yielded year-specific effects for each outcome.
Our overarching approach was to compare outcomes for schools assigned to each treatment and control condition while controlling for pretreatment observable student and school characteristics. The primary pretreatment covariates were the school-level average ELA and math test scores of students who were tested in these schools within 2 years before treatment was offered. A second set of pretreatment covariates were the school-level value-added to ELA and math scores from the year before treatment (details of how we estimated value-added are provided in the Online Technical Appendix). We did not control for student-level baseline scores because they were missing for a large number of students who were not in a tested grade level in the baseline period. Because we pooled math (or ELA) scores across all grade levels tested in that subject, we included in the model grade-level fixed effects, and we also included fixed effects for each of the three states. In addition, when estimating effects of the EDP-only, we controlled for whether or not schools had been randomly assigned to be offered coaching.
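The general form of our ITT model can be expressed as follows:
$$Y_{igs} = \beta_0 + \beta_1 T_s + \mathbf{X}_s'\boldsymbol{\gamma} + \delta_g + \theta_{k(s)} + \varepsilon_{igs}$$
where $Y_{igs}$ is the standardized test score of student $i$ in grade $g$ in school $s$; $T_s$ indicates that school $s$ was randomly assigned to the treatment condition of interest (the offer of the EDP or the offer of coaching); $\mathbf{X}_s$ is the vector of pretreatment school-level covariates, including baseline average ELA and math scores, prior-year ELA and math value-added, an indicator for high school status, and other school characteristics (and, in the EDP-only model, the indicator for random assignment to the offer of coaching); $\delta_g$ and $\theta_{k(s)}$ are grade-level and state fixed effects, respectively, with $k(s)$ denoting the state of school $s$; and $\varepsilon_{igs}$ is an error term that allows for the clustering of students within schools. The coefficient $\beta_1$ is the ITT estimate.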
We conducted two sets of sensitivity checks to ensure the robustness of our results to alternative analytic approaches. The first one entailed using two-level hierarchical linear models with students nested within schools as an alternative approach to account for the clustering of students within schools. The second one employed a backward elimination procedure for selecting a more parsimonious set of school-level covariates. This procedure was potentially informative for subgroup analyses that included a smaller number of schools than the pooled analyses. We concluded that our results are indeed robust to these alternative approaches.
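The main ITT estimator (ordinary least squares for the EDP-only model, weighted least squares for the coaching model) can be sketched with NumPy. This is a hedged illustration, not the authors' code. It assumes cluster-robust (CR0) standard errors at the school level and uses a single simulated baseline covariate in place of the full covariate set. All data, names, and effect sizes are hypothetical.

```python
import numpy as np

def dummies(values):
    """One-hot encode a categorical array, dropping the first level as reference."""
    values = np.asarray(values)
    levels = np.unique(values)
    return (values[:, None] == levels[None, 1:]).astype(float)

def wls_cluster(y, X, clusters, weights=None):
    """Weighted least squares with cluster-robust (CR0, no small-sample
    correction) standard errors. weights=None gives ordinary least squares."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    clusters = np.asarray(clusters)
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)
    Xw = X * w[:, None]                   # rows of X scaled by their weights
    bread = np.linalg.inv(X.T @ Xw)       # (X'WX)^-1
    beta = bread @ (Xw.T @ y)             # (X'WX)^-1 X'Wy
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        in_g = clusters == g
        score = Xw[in_g].T @ resid[in_g]  # sum of w * x * e within school g
        meat += np.outer(score, score)
    vcov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(vcov))

# Hypothetical simulated data: 200 schools with 30 tested students each.
rng = np.random.default_rng(1)
n_schools, per_school, true_effect = 200, 30, 0.3
school = np.repeat(np.arange(n_schools), per_school)
treat_s = rng.permutation(np.repeat([0.0, 1.0], n_schools // 2))
baseline_s = rng.normal(size=n_schools)           # school baseline mean score
state_s = rng.integers(0, 3, size=n_schools)      # three states
grade = rng.integers(3, 9, size=school.size)      # grades 3 to 8
school_shock = rng.normal(scale=0.1, size=n_schools)
y = (true_effect * treat_s[school] + 0.8 * baseline_s[school]
     + school_shock[school] + rng.normal(size=school.size))

X = np.column_stack([
    np.ones(school.size),      # intercept
    treat_s[school],           # T_s: random assignment to treatment
    baseline_s[school],        # pretreatment school covariate
    dummies(grade),            # grade fixed effects
    dummies(state_s[school]),  # state fixed effects
])
beta, se = wls_cluster(y, X, clusters=school)
itt, itt_se = beta[1], se[1]
```

Passing a vector of design weights as `weights` turns the same function into the weighted coaching model. Clustering the standard errors by school reflects that treatment was assigned to schools, not students.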
Treatment-on-the-Treated Methodology
The ITT analysis provided an unbiased estimate of the effects of offering the treatment to study school principals. However, not all principals chose to take up the offer of treatment, and others either left their school or stopped participating partway through. To gauge the effects of actively participating in the treatment, we also performed a treatment-on-the-treated (TOT) analysis. This analysis can yield unbiased estimates of the effects of complying with the offer of treatment if we assume that the treatment schools that never took up the treatment were unaffected by the offer and that the effect on the few control schools that did take up the treatment is not systematically different from the effect on the treatment schools that took up the treatment (Angrist et al., 1996).
We calculated our TOT effect estimates by dividing the estimated ITT effect by the proportion of schools in the sample whose principals participated to any extent in the treatment (either the EDP or coaching, depending on the analysis). This is akin to the Wald estimator or the Bloom adjustment for noncompliers in an experimental study (Bloom, 1984). 3 However, a TOT analysis is complicated to interpret when schools’ degree of “compliance with treatment” is on a sliding scale, as was the case in this study, in which principals in the treatment group could, and often did, partially complete the EDP or coaching. In this context, our TOT estimate, which captures the effects of any participation in treatment, should be considered a lower bound for the effects of full participation in treatment. In describing the extent of participation in the EDP and coaching treatments and when interpreting the TOT effect estimates, we distinguish between partial and full implementation, based on criteria that reflected NCEE’s goals for principals’ participation in each intervention. 4 NCEE defined full participation as a principal completing at least 10 (of 12) EDP units when offered the EDP, completing the desired 60 or more hours of coaching when offered coaching, or meeting both criteria when offered both.
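The adjustment described above reduces to a single division. The sketch below writes it in the general Wald form, dividing by the difference in take-up between the treatment and control groups. When no control schools take up the treatment, this reduces to the Bloom adjustment. The function name and inputs are hypothetical, and scaling the standard error by the same factor is a common approximation that ignores sampling error in the take-up rates.

```python
def wald_tot(itt, itt_se, takeup_treat, takeup_control=0.0):
    """Treatment-on-the-treated via the Wald/Bloom adjustment.

    Divides the ITT estimate by the treatment-minus-control difference in
    take-up rates; with no control take-up this is ITT / take-up rate.
    The standard error is scaled the same way (approximation that ignores
    sampling error in the take-up rates).
    """
    compliance = takeup_treat - takeup_control
    if compliance <= 0:
        raise ValueError("treatment take-up must exceed control take-up")
    return itt / compliance, itt_se / compliance

# Hypothetical inputs: an ITT of 0.036 SD with 80% take-up and no control take-up.
tot, tot_se = wald_tot(0.036, 0.018, 0.80)
```

Because any participation counts as take-up here, principals who only partially participated dilute the denominator's meaning, which is why such an estimate is read as a lower bound for the effect of full participation.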
Exploratory Analyses of Specific Hypotheses
In cases where we had a specific hypothesis related to a potential moderator of treatment effects, we conducted our ITT and TOT analyses separately for individual subgroups of interest. We also ran these analyses in pooled models using interaction terms to test whether any differences in estimated treatment effects between groups were statistically significant. In particular, we explored whether principals experienced different effects of coaching depending on whether they were taking the EDP concurrently or had completed the EDP in the past. We also explored whether principals experienced different effects of the EDP depending on whether they were in regions in which coaching was offered. These groups differed markedly in the rate at which they participated in the treatment, suggesting possible differences in implementation quality or principals’ motivation that we hypothesized might moderate treatment effects. 5
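A pooled interaction model of this kind can be sketched as follows. The sketch is a simplification under stated assumptions. It uses simulated student-level data and a single hypothetical binary moderator. It also uses classical (homoskedastic) standard errors rather than standard errors that account for clustering of students within schools.

```python
import numpy as np

def subgroup_interaction(y, treat, group):
    """Pooled OLS of y on treatment, a subgroup indicator, and their product.

    Returns the estimated effect in group 0, the effect in group 1, their
    difference (the interaction coefficient), and its t-statistic. Classical
    standard errors are used for brevity (no clustering).
    """
    y, treat, group = (np.asarray(a, dtype=float) for a in (y, treat, group))
    X = np.column_stack([np.ones_like(y), treat, group, treat * group])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta[1], beta[1] + beta[3], beta[3], beta[3] / se[3]

# Hypothetical data: the treatment helps only in subgroup 1
# (e.g., principals taking the EDP concurrently).
rng = np.random.default_rng(7)
n = 4000
treat = rng.integers(0, 2, n)
group = rng.integers(0, 2, n)
y = 0.5 * treat * group + rng.normal(size=n)
eff0, eff1, diff, t_stat = subgroup_interaction(y, treat, group)
```

The interaction coefficient is exactly the difference between the two subgroup-specific treatment effects, so its t-statistic is the test of whether those effects differ.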
Exploratory Causal Forest Analyses
In addition to exploring specific theory-driven hypotheses about subgroup effects, we also looked for evidence of heterogeneity in both the coaching and EDP treatment effects using a machine learning approach inclusive of all school and student characteristics observed in our data. These included school baseline achievement and attendance levels and school and student demographics, socioeconomic indicators, school size, schools’ initial EDP/coaching eligibility, and state indicators. In the absence of clear hypotheses around which of these many characteristics might moderate treatment effects, we used a machine learning technique developed by Athey, Imbens, and Wager (Athey & Imbens, 2015, 2016; Athey & Wager, 2019) and implemented by Davis and Heller (2017). 6 This technique, referred to as “causal forest analysis,” uses an algorithm to identify differences in treatment effects that correspond to the observed characteristics of schools and students.
Each iteration of the technique involves randomly splitting the study sample into two portions and using one portion of the split sample to identify characteristics of schools and students where treatment effects appear largest, whereas the other portion is used to estimate the treatment effect at the resulting splits. In each iteration, the procedure runs through all of the school and student characteristics and sequentially splits the identification portion of the sample across those characteristics, starting with the characteristic that provides the largest variation in treatment effects between the resulting two subsamples (Athey & Imbens, 2016; Wager & Athey, 2018). This process continues until no further splits on observable characteristics produce substantive differences in predicted treatment effects. Using this causal forest methodology, we identified which school characteristics were associated with larger treatment effects, and we generated individual schools’ predicted treatment effects as a function of those characteristics. We then used our standard regression models to estimate the observed treatment effects corresponding to individual schools’ forest-predicted effects.
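To make the honest sample-splitting idea concrete, the sketch below performs a single honest split. It searches covariates and a few candidate cut points on one random half of the data, then estimates treatment effects on each side of the chosen split using only the other half. This is a deliberately simplified illustration, not the Athey and Wager causal forest or the Davis and Heller (2017) implementation, which average over many trees, use subsampling, and use different splitting criteria. The quartile cut points, the `min_leaf` parameter, and the simulated data are assumptions.

```python
import numpy as np

def effect(y, t):
    """Treated-minus-control difference in mean outcomes."""
    return y[t == 1].mean() - y[t == 0].mean()

def both_arms(t):
    """True if a leaf contains both treated and control units."""
    return t.min() == 0 and t.max() == 1

def honest_split(y, t, X, seed=0, min_leaf=50):
    """Choose one split on a random half (build) and estimate effects on the other half (est)."""
    y, t, X = np.asarray(y, dtype=float), np.asarray(t, dtype=int), np.asarray(X, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(y))
    build, est = idx[: len(y) // 2], idx[len(y) // 2:]
    best = None
    for j in range(X.shape[1]):                                   # every covariate
        for cut in np.quantile(X[build, j], [0.25, 0.5, 0.75]):   # candidate cut points
            left = build[X[build, j] <= cut]
            right = build[X[build, j] > cut]
            if min(len(left), len(right)) < min_leaf:
                continue
            if not (both_arms(t[left]) and both_arms(t[right])):
                continue
            # Split criterion: size of the gap in estimated effects across the two sides.
            gap = abs(effect(y[left], t[left]) - effect(y[right], t[right]))
            if best is None or gap > best[0]:
                best = (gap, j, cut)
    if best is None:
        raise ValueError("no admissible split")
    _, j, cut = best
    # Honest estimation: effects on each side come only from the held-out half.
    left, right = est[X[est, j] <= cut], est[X[est, j] > cut]
    return j, float(cut), effect(y[left], t[left]), effect(y[right], t[right])

# Hypothetical data: the effect is 1.0 only where characteristic 0 exceeds 0.5.
rng = np.random.default_rng(3)
n = 20000
X = rng.uniform(size=(n, 2))
t = rng.integers(0, 2, n)
y = np.where(X[:, 0] > 0.5, 1.0, 0.0) * t + rng.normal(size=n)
feature, cut, effect_low, effect_high = honest_split(y, t, X)
```

Keeping split selection and effect estimation on separate halves prevents the search for large differences from inflating the reported effects in the resulting subgroups.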
Analyses of Survey Outcomes
When analyzing the effects of the EDP and/or coaching on survey-reported outcomes, we examined the effects of the offer of treatment. In other words, we compared survey responses from all treatment-assigned schools with principal or teacher respondents to those from all control-assigned schools with respondents. However, because not all schools had principals or teachers who opted to complete our surveys, we could not be sure that our analyses of survey outcomes were unbiased or fully representative of our randomized samples of schools. Therefore, we deem these analyses nonexperimental. We conducted survey analyses at the school level, using either principal responses linked to their school or teacher responses linked to and aggregated at their school. To at least partially account for potential bias related to survey nonresponse, we generated and used weights for the teacher and principal survey analyses that we estimated as a function of baseline school and principal characteristics.
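One common way to build nonresponse weights of this kind is to model the probability of responding with a logistic regression on baseline characteristics and then weight each respondent by the inverse of its predicted probability. The sketch below assumes that approach, since the text does not specify the model. It fits the logistic regression by Newton-Raphson in NumPy and uses a single hypothetical simulated covariate.

```python
import numpy as np

def fit_logit(Z, responded, iters=25):
    """Logistic regression of a 0/1 response indicator on covariates (Newton-Raphson).
    Returns coefficients (intercept first) and fitted response probabilities."""
    Z = np.column_stack([np.ones(len(responded)), np.asarray(Z, dtype=float)])
    r = np.asarray(responded, dtype=float)
    b = np.zeros(Z.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Z @ b))
        grad = Z.T @ (r - p)                       # score of the log-likelihood
        hess = Z.T @ (Z * (p * (1 - p))[:, None])  # negative Hessian
        b += np.linalg.solve(hess, grad)
    return b, 1.0 / (1.0 + np.exp(-Z @ b))

def response_weights(Z, responded):
    """Inverse estimated response probability, for responding schools only."""
    _, p_hat = fit_logit(Z, responded)
    return 1.0 / p_hat[np.asarray(responded, dtype=bool)]

# Hypothetical data: schools with higher baseline scores respond more often.
rng = np.random.default_rng(11)
n = 20000
baseline = rng.normal(size=n)
p_true = 1.0 / (1.0 + np.exp(-(0.5 + baseline)))
responded = rng.uniform(size=n) < p_true
w = response_weights(baseline, responded)
naive_mean = baseline[responded].mean()                        # biased upward
weighted_mean = np.average(baseline[responded], weights=w)     # close to the population mean of 0
```

Such weights can only correct for nonresponse that is related to the observed baseline characteristics, which is why they only partially address potential bias.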
For both the teacher and principal survey outcomes, we conducted regression analyses similar to those described for our student achievement analyses, but that included the response weights to compare survey outcomes for the relevant treatment and comparison samples. We analyzed each principal or teacher survey factor or item of interest in a separate regression and included controls for available school-level covariates defined in the pretreatment school year. When analyzing effects on the principal survey responses, we also controlled for baseline principal responses on the same factor or item from our 2016 principal survey administered at the same school. Finally, when analyzing the effects of the offer of coaching, we accounted for varying probabilities of treatment assignment by controlling for the contributing design variables that jointly influenced probabilities of coaching treatment assignment.
Additional Methodological Details and Analyses
In the Online Technical Appendix, we include additional technical details on how we handled missing data, our approach to randomization, our monitoring of overall and differential school attrition rates, and the baseline equivalence of our randomly assigned samples. School attrition rates were low and similar across treatment and control conditions in both RCTs. In the Online Technical Appendix, we also detail our approach to conducting the surveys, our methods for analyzing whether effects on survey factors mediated effects on achievement, and the extent to which the principal and teacher survey factors were correlated with measures of schools’ value-added to student achievement. The results of these analyses related to survey factors are discussed briefly in the following section.
Results
RQ1: Participation in the EDP and Coaching
The degree of participation in the offer of the EDP and of coaching varied across schools, driven both by districts’ and principals’ willingness to participate in the programs and by principals transferring to different schools midstream. In the first section of Table 4, we show the participation of the 301 principals offered the EDP. Overall, we found that 66% of principals offered the EDP fully participated, in the sense of attending at least 10 of its 12 units according to NCEE’s attendance records, with another 11% partially participating.
Extent of Principal Participation in the EDP and in Coaching When Offered, Overall and by Eligibility Groups
Source. NCEE rosters.
Note. EDP = Executive Development Program; NCEE = National Center on Education and the Economy.
We also describe in the first section of Table 4 the EDP participation rates separately for schools that were included in the coaching study versus schools in regions where no NCEE coach was available and that thus could not be included in the coaching study. The differences in EDP participation rates were significant and substantial across these two groups, with 75% of principals located in coaching-eligible regions fully participating, whereas only 47% of principals fully participated in regions where no coaches were available. However, this difference across regions was not related to whether an individual principal was offered coaching (offers to participate in coaching were not made until 4 to 6 months after principals began participating in the EDP). In coaching-eligible regions, when principals did not fully participate, NCEE records indicated that it was usually because they had transferred to a new school. In contrast, in coaching-ineligible regions, most nonparticipation was due to either individual principals turning down the offer of the EDP or whole districts withdrawing from the project prior to the start of the EDP. We hypothesize that the availability of an NCEE-trained coach within a region was a proxy for a prior relationship between NCEE and at least one district within that region. In other words, the places where NCEE had enough institutional ties to identify and train a coach were also the places where there was more district buy-in and familiarity with NCEE and the EDP.
Next, in the second section of Table 4, we show the participation behavior of school principals offered coaching, both overall and separately for those offered coaching concurrent with the EDP versus those who had previously completed the EDP. Across both groups, 58% fully participated in coaching, another 23% partially participated, and 19% opted not to participate. 7 The average coaching participant received 58 hours of coaching over an average of 16.4 months, according to National Institute for School Leadership (NISL) coach logs.
The pattern of participation in coaching was significantly and substantially different between prior and current EDP participants. Among principals currently engaged in the EDP, 73% fully participated in coaching, whereas among former EDP graduates, only 42% fully participated. NCEE records of the reasons for nonparticipation indicated that most of the difference related to former EDP graduates declining the offer of coaching at a higher rate than current EDP participants. It appears that principals who were offered coaching as a stand-alone offer were more selective in whether or not to participate and in the extent of their participation.
RQ2: Effects of the EDP and of Coaching on Student Achievement
As shown in Table 5, we found no significant effects of the offer of the EDP on student achievement in ELA or mathematics. The estimated effects of offering the EDP were directionally negative (although statistically insignificant) in both ELA and math, particularly in Years 2 and 3.
Estimated Effect of Offering the EDP and/or Offering Coaching on Schoolwide Student Achievement in Grades 3 to 8, by Study Year
Note. EDP = Executive Development Program; ELA = English language arts.
*p < .05. **p < .01. ***p < .001.
The offer of coaching, when provided to principals who were either currently participating in the EDP or had previously completed the EDP, had a small, statistically significant positive effect on student achievement in ELA by Year 3 (effect size = 0.036, p = .041), but not in math. The size of the incremental effect of the offer of coaching was equivalent to shifting students 1.4 percentile points higher in the distribution of student achievement in ELA (e.g., moving the median student from the 50th percentile in ELA performance to the 51st percentile). Our corresponding estimate of the effects of treatment on the treated (not shown) indicated that principals who participated to any degree in coaching increased schoolwide ELA achievement by 1.7 percentile points by Year 3, with potentially larger impacts for those who participated fully in the coaching.
Finally, for the subset of study schools that took part in both the EDP and coaching studies, we also examined the combined effects of offering both the EDP and coaching concurrently. Relative to schools whose principals were offered neither the EDP nor coaching, schools that were offered both treatments experienced a small, statistically significant positive effect on schoolwide student achievement in ELA by Year 3 (effect size = 0.045, p = .022), but not in math, and no significant effects as of Year 2. The Year 3 effect is equivalent to shifting the average student approximately 1.8 percentile points higher in the distribution of ELA achievement. Subsequent analysis of the impacts of treatment on the treated indicated that actual participation to any degree in both the EDP and coaching concurrently increased schoolwide ELA achievement by 2.1 percentile points by Year 3.
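The percentile conversions reported above are consistent with assuming normally distributed scores, so that an effect of d standard deviations moves a student at the median to percentile 100·Φ(d). Under that assumption, which the text does not state explicitly, the conversion can be sketched as follows (function names are hypothetical):

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def percentile_shift(effect_size):
    """Percentile-point change for a student starting at the median, assuming
    normally distributed scores shifted by effect_size standard deviations."""
    return 100.0 * (normal_cdf(effect_size) - 0.5)

shift = percentile_shift(0.036)  # about 1.4 percentile points
```

Under this assumption, the same function gives about 1.8 points for the 0.045 effect and about −3.0 points for the −0.076 effect, in line with the figures reported in this section.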
RQ3: Heterogeneity in the Effects of the EDP and Coaching on Student Achievement
In addition to addressing our primary research questions about the overall impacts of the EDP and coaching, we also conducted a range of exploratory analyses intended to generate hypotheses that might help to explain our primary findings and suggest directions or considerations for future research. In this section, we summarize our most noteworthy exploratory findings related to the characteristics of schools associated with heterogeneity in the effects of the EDP or coaching on student achievement. However, given the large number of hypotheses we considered, there is a reasonable chance that some of these exploratory findings are false positives or false negatives due to random chance; as such, they should be interpreted with caution.
Heterogeneity in the Effects of the EDP
By and large, across a wide range of school, student, and principal characteristics that we examined, we found only limited evidence of heterogeneity in the effects of the EDP. However, as shown in Table 6, we did identify a significant difference in the estimated effects of the EDP as a function of whether schools were in a region of the study where NCEE coaches were available and were thus included in the coaching study.
Estimated Effect of Offering the EDP on Schoolwide Student Outcomes in Grades 3 to 8, for Schools in Regions With and Without NCEE Coaches
Note. EDP = Executive Development Program; NCEE = National Center on Education and the Economy; ELA = English language arts.
Indicates a subgroup-specific result that, in a pooled model, was significantly different from estimated effects of the same program for the rest of the sample.
*p < .05. **p < .01. ***p < .001.
For the majority of study schools located in regions that were part of the coaching study (Sample A), the estimated effects of the offer of the EDP only were slightly positive in direction but not significantly different from zero. However, in regions where NCEE coaching was not available (Sample B), we found a significant negative effect of the offer of the EDP on ELA achievement in both Years 2 and 3. In a pooled model, this negative effect estimate in Year 2 in these coaching-ineligible regions was significantly different from the estimated effect of the EDP in coaching-eligible regions. The estimated effect of the EDP in coaching-ineligible regions was −0.076 standard deviations in ELA by Year 3, which corresponds to a 3.0 percentile point decrease in student test scores. The corresponding estimated effect of any degree of participation in the EDP for these schools was a 5.4 percentile point decrease in ELA achievement.
The estimated negative effects of participation are large in part because only 53% of principals participated in the EDP in these regions, and we assume that any effects are related to the participation in the program and not simply due to the offer of the program. These results suggest that in regions where NCEE did not have as much prior history working with districts, the EDP had negative effects on school effectiveness.
Heterogeneity in the Effects of Coaching
Next, we present the most noteworthy results of our exploration of heterogeneity in the effects of coaching. We did not identify evidence of significant heterogeneity of coaching effects on math achievement. However, through our causal forest analysis, we did identify a cluster of school characteristics that were associated with substantial heterogeneity in coaching effects on ELA achievement.
In Table 7, we rank order the top five school-level characteristics that best predicted heterogeneity in coaching effects in our forest analysis. We also show the mean characteristics of the top one third of observations in our sample identified as having the largest predicted treatment effects, relative to the remaining two thirds of our sample that the forest analysis predicted to have smaller treatment effects. 8 The forest analysis predictions indicated that disadvantaged schools, as measured by poverty, benefited the most from coaching, as did schools with lower achievement levels and more students of color.
Characteristics of Schools With the Largest Effects From Coaching
Note. Covariates are listed in descending rank order, where rank order is defined as the amount of standardized variation in the covariate across quantiles. ELA = English language arts; FRPL = free or reduced-price lunch.
To validate these predictions, we compared the forest-predicted school average treatment effects with effects that we identified using our standard regression models. In Table 8, we show the forest-predicted treatment effects alongside observed treatment effects for the top one third of schools with the highest forest predictions, relative to the bottom two thirds. The offer of coaching in the schools with the top tercile of forest-predicted treatment effects had an observed treatment effect of 0.079 standard deviations in ELA, significantly larger than effects in the remaining schools and more than twice as large as the overall average estimated effect of coaching in ELA. The estimated effects of offering coaching for this top tercile of schools correspond to an average increase of approximately 3.2 percentile points in student ELA achievement (i.e., moving the median student from the 50th percentile in ELA to the 53rd percentile). The estimated effect of any degree of participation in coaching for this top tercile of schools was a 3.8 percentile point increase in ELA achievement. Variation in the effects of coaching appeared directionally similar in math, but there was no significant difference in math effects across the terciles. These results suggest that NCEE coaching was more effective in schools that serve more lower achieving, low-income, and non-White students.
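The tercile comparison can be sketched as follows, under simplifying assumptions. Observed effects are computed as simple treatment-control differences in means rather than from the full regression model. The forest predictions, outcomes, and assignment indicators are hypothetical simulated values, and the function name is illustrative.

```python
import numpy as np

def tercile_validation(pred, y, t):
    """Split units at the 2/3 quantile of forest-predicted effects and compare
    observed treatment-control differences in the top third vs. the rest."""
    pred, y, t = (np.asarray(a, dtype=float) for a in (pred, y, t))
    top = pred > np.quantile(pred, 2 / 3)

    def diff(mask):
        return y[mask & (t == 1)].mean() - y[mask & (t == 0)].mean()

    return diff(top), diff(~top)

# Hypothetical data: the true effect is 0.3 SD in the top tercile of predictions, 0 elsewhere.
rng = np.random.default_rng(9)
n = 9000
pred = rng.uniform(size=n)
t = rng.integers(0, 2, n)
y = 0.3 * (pred > 2 / 3) * t + rng.normal(size=n)
top_effect, rest_effect = tercile_validation(pred, y, t)
```

Checking predicted heterogeneity against effects observed in the randomized comparison guards against reading too much into the forest's predictions alone.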
Comparison of Forest-Predicted and Observed Treatment Effect Heterogeneity
Note. No tests of significance are shown for the forest-predicted treatment effects. ELA = English language arts.
Indicates a subgroup-specific result that, in a pooled model, was significantly different from estimated effects for the rest of the sample.
*p < .05. **p < .01. ***p < .001.
RQ4: Effects of the EDP and Coaching on Survey-Reported Outcomes
Next, we summarize the results of our exploratory analyses of the effects of the EDP and of coaching on school instructional practices and school culture as measured by surveys of principals and of teachers in the fall of Study Year 3. Our exploratory analyses of survey outcomes included 25 distinct factors and four individual items, which we organized into two sections: (a) principal leadership practices and (b) school instructional practices and climate. Given the large number of factors we considered as potential outcomes in these exploratory analyses, we caution that some findings could be the result of random chance.
Effects on Principal Leadership Practices
Looking across both the EDP and coaching treatments, only the EDP showed any evidence of possible effects on principals’ and teachers’ reports about principals’ leadership. Results for the effects of the EDP are shown in Table 9. Across 11 factors, the only significant effect was that teachers in schools where the EDP was offered reported being observed more frequently by an administrator. In separate exploratory analyses, shown in Figure 3, we found that this effect was not simply a uniform increase in observations but instead involved a statistically significant shift in the distribution of observations (as measured via a Kolmogorov–Smirnov test for equality of distributions, p = .006), with fewer treatment-assigned schools conducting infrequent observations but also fewer treatment-assigned schools conducting very frequent observations. The EDP appears to have encouraged principals to prioritize regular observations of teachers, but in moderation.
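For readers unfamiliar with the Kolmogorov–Smirnov test, the two-sample statistic is the largest gap between the two groups' empirical cumulative distribution functions. The sketch below computes it in NumPy with an asymptotic p-value. The function names and simulated data are hypothetical. The simulated treatment group has the same mean but a narrower spread, loosely mirroring a shift away from both very infrequent and very frequent observations. In practice, a library routine such as `scipy.stats.ks_2samp`, which can use exact small-sample p-values, would normally be preferred.

```python
import numpy as np

def ks_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of samples a and b."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])            # the supremum is attained at a sample point
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

def ks_pvalue_asymptotic(d, n, m, terms=100):
    """Asymptotic two-sided p-value from the Kolmogorov distribution."""
    lam = d * np.sqrt(n * m / (n + m))
    if lam < 0.2:
        return 1.0                           # true p-value exceeds 0.999999 here
    k = np.arange(1, terms + 1)
    p = 2.0 * np.sum((-1.0) ** (k - 1) * np.exp(-2.0 * k**2 * lam**2))
    return float(min(max(p, 0.0), 1.0))

# Hypothetical observation-frequency scores: same center, narrower spread in treatment.
rng = np.random.default_rng(5)
treat_scores = rng.normal(0.0, 0.5, size=1000)
control_scores = rng.normal(0.0, 1.0, size=1000)
d = ks_statistic(treat_scores, control_scores)
p = ks_pvalue_asymptotic(d, len(treat_scores), len(control_scores))
```

Unlike a comparison of means, this test is sensitive to changes in the shape of a distribution, which is why it can detect a narrowing of observation frequencies that leaves the average largely unchanged.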
Effects of Offering the EDP on Principal Leadership Practices
Note. Principals from 347 schools responded to the principal survey and are used to estimate effects of offering the EDP-only. Of these, the full participation analysis excludes schools in which the responding principal did not fully participate as well as control schools that did participate, leaving a total of 261 schools. The corresponding sample sizes for the teacher survey are 507 total schools (each with multiple teacher respondents per school), of which 415 schools are included in the analysis of effects in schools in which a principal fully participated. EDP = Executive Development Program; P = principal survey; T = teacher survey.
*p < .05. **p < .01. ***p < .001.

EDP-only treatment and control group comparison of the distribution of the teacher survey factor “administrators observe teachers’ classrooms.”
EDP effect estimates across a variety of other factors, including teachers’ reports of receiving actionable feedback and principals’ reports about the use of student assessment data and the alignment of the curriculum, were directionally positive but not statistically significant. Only two thirds of those offered the EDP fully participated in it, which means that our estimates of the effects of the offer of treatment are attenuated relative to the effects of full participation.
We found no significant effects of coaching on principal leadership practices, nor any clear pattern of directional differences in the point estimates. These results are provided in Online Technical Appendix Table C.1. However, given the smaller sample sizes associated with the coaching contrast, we note that we were less well powered to detect any true effects on survey-reported outcomes that may have been small in size.
Effects on School Instructional Practices and Climate
Looking across the 18 dimensions of school instructional practices and climate that we measured, we see more evidence of effects from the EDP and, again, no evidence of effects of coaching. As shown in Table 10, for those offered the EDP there is a cluster of effects related to increased instructional quality, such as teachers in treatment schools agreeing more strongly than teachers in control schools that the school has features of high-quality academics. Principals in schools offered the EDP also reported giving higher priority to practices emphasized by NCEE, such as devoting significant time in a majority of staff meetings to discussing instruction (as opposed to logistics), addressing student preconceptions during instruction, and getting students to articulate the concepts they are trying to learn. Teachers in schools where principals were offered the EDP were also more likely to prioritize formative assessment. Separately, principals offered, or fully participating in, the EDP were less likely to indicate that their school offered smaller classes to struggling students, which may or may not be an indicator of less academic tracking or less use of pull-outs for lower performing students.
The Effects of Offering the EDP-Only on School Culture, Academics, and Practices
Note. A higher score for “School is unsafe and disorderly” corresponds to greater disorder/disaffection. EDP = Executive Development Program; NCEE = National Center on Education and the Economy; P = principal survey; T = teacher survey.
ᵃThe three NCEE-aligned features are (a) “Instructional collaboration among the teaching staff,” (b) “The clarity of vision for the direction of our school,” and (c) “The focus on narrowing the achievement gap.”
ᵇThis item refers to the percentage of teachers who selected both “Adapting teachers’ daily instruction based on data from frequent, short, informal assessment” and “Providing regular feedback to students about their work to help clarify classroom learning goals” among the most important assessment practices.
Principals from 347 schools responded to the principal survey and are used to estimate effects of offering the EDP-only. Of these, the full participation analysis includes 261 schools. The corresponding sample sizes for the teacher survey are 507 total schools (each with multiple teacher respondents per school), of which 415 are in schools in which a principal fully participated.
*p < .05. **p < .01. ***p < .001.
We found no significant effects of coaching on school instructional practices and climate, nor were there any apparent trends in the directionality of the point estimates. These results are detailed in Online Technical Appendix Table C.2.
Mediation Analyses
Finally, in supplemental analyses that are detailed in the Online Technical Appendix, we explored whether impacts on survey outcomes mediated impacts on student achievement, and whether scores on survey factors were correlated with schools’ value-added performance. We found no evidence that the impacts of the EDP on survey-reported outcomes mediated impacts on student achievement, neither overall nor in the subgroups of schools defined by their coaching eligibility. However, we did find that scores on many of the principal and teacher survey factors, including some affected by the EDP, were significantly correlated with schools’ value-added performance in the same school year. These results are detailed in Online Technical Appendix Table C.3.
Limitations
A key limitation of our study is that our estimates of the effects of the EDP can be assumed to be fully unbiased only in Study Years 1 and 2, because these years precede the offer of the EDP to the control group in fall of the third study year. In Year 3 of the study, control group principals who started the EDP (approximately 65% of them) would have completed, on average, about five of the 12 EDP units by the time students took the standardized tests in April 2019, the final student achievement measure in the study.
Although most of the control group started the EDP in Year 3, we still believe the Year 3 estimates are the most policy relevant because (a) we hypothesized that participation in the EDP would take time to affect students; (b) the control group had completed only roughly half of the year-long EDP sessions by the time of our final student achievement measures and only one of the 12 EDP units by the time of our final survey measures; and (c) we did not see clear evidence of effects of the EDP alone on student outcomes in the first year, when principals first received the EDP, so we do not expect the Year 3 estimates to be substantially biased by control group principals starting the EDP that year. Nevertheless, we cannot rule out the possibility that our estimates of the impacts of the EDP (or of the EDP concurrent with coaching) in Study Year 3 are biased. Our estimates of the effects of coaching do not share this limitation, because the control group in the coaching study never received coaching.
Our survey analyses have an additional limitation. Some principals and teachers did not respond to our surveys, and nonrespondents may differ systematically from respondents, which makes it harder to generalize from the survey results to the entire set of schools in the study. For this reason, our survey findings are substantially more subject to potential bias than our experimental estimates of the interventions’ impacts on student achievement.
Discussion
Ours is the largest study to date of the impacts of a principal professional development program on student achievement. It is also the first study, experimental or otherwise, to find effects of providing coaching to principals on student achievement. Our results, as well as a comparison with prior studies of the EDP, also provide new insights about the potential mechanisms by which principal professional development activities impact schools, and the conditions in which they may be more effective.
However, several findings from this study illustrate the sobering difficulty of improving school leader effectiveness at scale through professional development. These include the small impact of coaching on achievement relative to the substantial effort and resources required to provide extensive one-on-one coaching, the lack of impacts of the EDP alone on achievement, the puzzlingly mixed results of the EDP versus coaching on principals’ practices, and apparent heterogeneity in effects across school contexts. Nevertheless, because such programs focus on principals rather than teachers, even resource-intensive professional development models like this one may still be cost-effective and thus worth pursuing, perhaps especially when there is school and district buy-in, as we detail next.
Summary of Findings
The EDP itself appears to have affected principals’ practices in some ways that its developer, NCEE, intended. For example, teachers in treated schools reported that principals conducted more classroom observations and they described greater personalization of instruction to students through formative assessment practices, both of which have been associated in prior research with improvements in student learning (Grissom et al., 2013; Kingston & Nash, 2011). Principals reported focusing more on NCEE’s core concepts related to narrowing achievement gaps, increasing instructional collaboration among staff, and providing the school with clarity of direction.
However, in contrast to some prior studies (Nunnery, Ross, et al., 2011; Nunnery, Yen, & Ross, 2011), the EDP alone did not increase student achievement within three school years of the start of the 12-month EDP program. Thus, while the intervention appears to have changed some principals’ practices, these changes in practice do not appear to have been sufficient to influence student achievement. It is also possible that principals spent more time on the practices recommended by the EDP but that the training alone did not help them to implement those practices skillfully.
While the EDP influenced principal practices but not achievement, we surprisingly found the opposite for coaching. Even so, principals viewed coaching highly favorably, and their feedback about it was almost universally positive. For example, in interviews with 41 EDP-trained principals, many said that their coach helped them think more strategically and intentionally (Wang et al., 2019). The lack of coaching effects on survey-reported practices may indicate that our survey did not probe the mechanisms through which coaching influenced practice. For example, coaching may have improved the quality with which principals implemented the changes introduced in the EDP workshops. Implementing new leadership practices after learning about them in a workshop may not be as effective as cocreating and implementing a new instructional leadership system in one’s own school with the help of an expert coach.
The NCEE coaching required substantial time and resources to implement. Coaches provided an average of 58 hours of support to participating principals and spent around 40 additional hours on travel plus additional time spent in planning and preparation. Coaches also had to be selected, trained, and supported—processes that required significant oversight and strong connections between NCEE and veteran principals in participating districts. Large-scale implementation of a comparable coaching program in other contexts would require active coordination of coaches’ activities and access to experienced principals or former principals who share a coherent vision of school instructional leadership.
Although coaching demanded much from participants and its effects were small, intensive interventions of this type still have the potential to be cost-effective due to the large number of students affected by each participating principal. Although we were not able to gather data on the costs of the NCEE coaching program as part of this study, estimates from comparable interventions suggest that intensive principal coaching of this type might be expected to cost schools in the range of US$5,000 to US$15,000 per treated principal, in addition to the costs to schools of the EDP itself, which has been priced at US$4,000 in previous research (Gates et al., 2019; Lochmiller, 2014; Nunnery, Yen, & Ross, 2011). As a point of comparison, instructional coaching for teachers, a promising intervention with typical costs estimated to be in the range of US$3,000 to US$5,000 per teacher, has demonstrated average effect sizes on student achievement roughly 5 times larger than the effects we identified of principal coaching on ELA achievement in this study (Kraft et al., 2018). However, because the average principal influences around 25 times as many students as are in an average teacher’s classroom (National Center for Education Statistics, 2018), a principal-focused program with small effect sizes such as those we observed has the potential to be similarly cost-effective as teacher coaching.
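The cost-effectiveness argument above can be made concrete with a back-of-envelope calculation. The Python sketch below is illustrative only and is not an analysis from this study: it normalizes the principal coaching effect on ELA to 1 unit, assumes that benefits scale linearly with the number of students reached, and uses only the ratios and cost ranges quoted in the preceding paragraph. Counting the EDP fee as part of the cost of the principal coaching bundle is our conservative assumption, and the function names are hypothetical.

```python
"""Back-of-envelope comparison of principal vs. teacher coaching cost-effectiveness.

Illustrative sketch only. Assumptions (not estimates from the study):
- the principal coaching ELA effect is normalized to 1 unit of effect size;
- benefit scales linearly with the number of students reached;
- the EDP fee is counted as part of the principal coaching cost (conservative).
"""

PRINCIPAL_EFFECT = 1.0                       # normalized principal coaching effect
TEACHER_EFFECT = 5.0 * PRINCIPAL_EFFECT      # teacher coaching effects ~5x larger
STUDENTS_PER_TEACHER = 1.0                   # one classroom, normalized
STUDENTS_PER_PRINCIPAL = 25.0 * STUDENTS_PER_TEACHER  # ~25x as many students

EDP_COST = 4_000                             # US$ per principal
PRINCIPAL_COACHING_COST = (5_000, 15_000)    # US$ per principal (low, high)
TEACHER_COACHING_COST = (3_000, 5_000)       # US$ per teacher (low, high)


def benefit_per_dollar(effect, students, cost):
    """Effect-size units summed over students reached, per dollar spent."""
    return effect * students / cost


def relative_cost_effectiveness(principal_cost, teacher_cost):
    """Ratio of principal to teacher benefit per dollar (>1 favors principals)."""
    principal = benefit_per_dollar(PRINCIPAL_EFFECT, STUDENTS_PER_PRINCIPAL, principal_cost)
    teacher = benefit_per_dollar(TEACHER_EFFECT, STUDENTS_PER_TEACHER, teacher_cost)
    return principal / teacher


# Least favorable pairing for principals: most expensive principal bundle, cheapest teacher coaching.
worst = relative_cost_effectiveness(PRINCIPAL_COACHING_COST[1] + EDP_COST, TEACHER_COACHING_COST[0])
# Most favorable pairing: cheapest principal bundle, most expensive teacher coaching.
best = relative_cost_effectiveness(PRINCIPAL_COACHING_COST[0] + EDP_COST, TEACHER_COACHING_COST[1])

print(f"principal/teacher cost-effectiveness ratio: {worst:.2f} to {best:.2f}")
# prints "principal/teacher cost-effectiveness ratio: 0.79 to 2.78"
```

Under these assumptions, principal coaching would range from somewhat less to nearly three times as cost-effective as teacher coaching, which is consistent with the claim that the two approaches can be similarly cost-effective. The result is sensitive to the linear-scaling assumption, which ignores any dilution of a principal’s influence across many classrooms.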
Hypotheses About Differential Take-Up and Effects of the EDP and Coaching
Importantly, the most disadvantaged schools appeared to benefit the most from coaching. These results may indicate that principals who felt their schools had the most to gain from coaching were also the most engaged or the most likely to implement recommended practices. It may also be the case that the EDP-aligned practices emphasized by the coaches are particularly useful in more disadvantaged school contexts.
It is curious that the effects of coaching were apparent only in ELA and not in math. While the NCEE curriculum addresses effective instructional practices in both subject areas in the EDP and in coaching, it is possible that the ELA content was of higher quality or more useful to principals. However, we think the more likely explanation is that there were positive effects in both math and ELA, but they were too small for us to detect reliably in each subject. We find some evidence for this hypothesis in the larger positive (although still not significant) effect estimates in math for coaching in disadvantaged schools, similar to the differential effects identified in ELA in these schools (as shown in Table 8 earlier in this article).
School principals in districts where NCEE had a limited historical footprint were less likely to take up the offer of the EDP, and in these regions, we found negative effects of the program. One way in which the EDP might reduce student achievement is through the substantial amount of time the program requires participating principals to spend outside their schools (24 workdays). However, the negative effects we observed in this subgroup did not manifest in the year in which principals participated in the EDP and thus do not clearly support this hypothesis. Conversations with NCEE suggest that the result could also reflect lower district buy-in to the EDP, and even some resistance to elements of the program, such as its emphasis on comparisons of the United States with international benchmarks and on redistribution of school resources to disadvantaged students. Principals may have struggled to implement EDP-aligned reforms effectively in contexts where they or their staff were conflicted about elements of the program. Alternatively, their emphasis on unpopular reforms may have crowded out other effective school leadership efforts that would otherwise have taken place.
The Importance of District Buy-in
Comparing the findings from this study with prior studies of the EDP further suggests that district and principal buy-in could be important predictors of positive effects. Two prior studies (Nunnery, Ross, et al., 2011; Nunnery, Yen, & Ross, 2011) found positive effects of the EDP alone (coaching was not part of either study) on math and on ELA, with effects more than 3 times as large as those we identified in ELA. In these two studies, the school districts, and sometimes the state department of education, paid for the EDP rather than receiving it free through a grant, as was the case in our study. Also, district leaders in those two prior EDP studies received some professional development and thus directly participated as well. Finally, districts in those studies sought volunteers to participate in the EDP as part of a districtwide initiative, rather than the superintendent assigning only a portion of principals to attend the EDP immediately, as was the case in our study.
Having the intervention paid for with federal rather than local or state funds, and assigning principals to take the EDP, may have reduced district and principal investment in the program. In another recent study of the EDP and coaching in which we found no significant effects, the program was similarly not paid for by districts, and principals’ rate of participation in the coaching component of the intervention was very low (Master et al., 2020).
Qualitative evidence about the implementation of the EDP and coaching also reinforces the importance of district buy-in. Interviewees from case study research about principals participating in the EDP and coaching (Wang et al., 2019), as well as interviews of NCEE state coordinators, indicated that district leaders’ support for principals to take the time to leave their buildings to attend the EDP was an important signal to principals about the value of investing their own time and effort in the EDP or coaching.
In summary, the results from our study and a comparison with prior research suggest that principal professional development and coaching are difficult interventions to implement effectively at a large scale and across diverse contexts. In particular, they may be less effective when assigned or when offered free with limited commitment required from participants. Across all four studies of the EDP to date, including this one, effects have generally been more positive in contexts where principals and districts were familiar with the EDP and its recommended practices, volunteered to participate, demonstrated a desire for the professional development by paying for it, and/or where principals may have felt a greater need to transform school performance, such as in low-performing schools.
Therefore, we recommend that developers of principal professional development, technical assistance providers, and funders prioritize securing robust principal and district buy-in for professional development and coaching. Extensive professional development like the EDP and coaching requires a substantial investment of principal time, which makes principals’ and their supervisors’ endorsement of the professional development and attendant reforms to school practice all the more important. Successfully identifying and enlisting principals and schools in districts that are motivated to adopt new professional development may be necessary if this investment is to reliably improve outcomes for students. In line with this, we suggest that future research on principal professional development programs document not just principal or school staff participation in the professional development but also district championship of it. This championship could appear in district communications to principals, in districts’ commitment of their own resources to the professional development, and, equally importantly, in supports such as release time for principals to participate or encouragement from principal supervisors. It would also be important to know whether districts plan to adopt the professional development selectively or universally.
Supplemental Material
Supplemental material, sj-docx-1-epa-10.3102_01623737211047256 for Developing School Leaders: Findings From a Randomized Control Trial Study of the Executive Development Program and Paired Coaching by Benjamin K. Master, Heather Schwartz, Fatih Unlu, Jonathan Schweig, Louis T. Mariano, Jessie Coe, Elaine Lin Wang, Brian Phillips and Tiffany Berglund in Educational Evaluation and Policy Analysis
Footnotes
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: The authors prepared the work as employees of RAND Corporation.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Authors
BENJAMIN K. MASTER is a policy researcher at the RAND Corporation. His research focuses on issues of human capital and workforce development, typically in K–12 school contexts.
HEATHER SCHWARTZ is the program director of Pre-K to 12 Educational Systems within the Education and Labor division at the RAND Corporation. She researches education and housing policies intended to reduce the negative effects of poverty on children and families. She also codirects the American School District Panel.
FATIH UNLU is a senior economist and the director of the Labor, Workforce Development, and Postsecondary Education Program at the RAND Corporation. His project portfolio includes evaluations of interventions that aim to increase students’ access to and success in postsecondary education, career and technical education programs, and alternative routes to teacher and principal preparation. He has also published on a wide array of research methodology topics.
JONATHAN SCHWEIG is a social scientist at the RAND Corporation. His research focuses on education policy and the measurement of classroom and school climate, including teacher working conditions.
LOUIS T. MARIANO is a senior statistician at the RAND Corporation. His education research includes evaluation of policies and practices aimed at improving student outcomes and experimental and quasi-experimental design methodology.
JESSIE COE is an associate economist at the RAND Corporation and a core faculty member at the Pardee RAND Graduate School. She is trained as a methodologist with specializations in panel data methods, generalized method of moments (GMM), missing data, and, most recently, discrete choice analysis. Her application interests include program evaluation, poverty alleviation, food security, and maternal and child health outcomes.
ELAINE LIN WANG is a policy researcher at the RAND Corporation. Her research focuses on English language arts curriculum and instruction, and on system-level factors that influence student learning and achievement in this area, including the quality of school leadership.
BRIAN PHILLIPS is a senior quantitative analyst at the RAND Corporation. His research focuses on educational program evaluation, transitions to postsecondary education and employment, and workforce development and management.
TIFFANY BERGLUND is a quantitative analyst at the RAND Corporation. Her research addresses issues of college access, school improvement, and military recruiting resource allocation.
References
