Abstract
Background:
Italy has low math achievement, especially in its Southern regions. Moreover, national student assessments were introduced only recently, and rigorous policy evaluation is lacking. This study presents the results of one of the first randomized controlled trials implemented in Italian schools to measure the effects of a professional development (PD) program for teachers on student math achievement. The program was already operating at scale when it was evaluated.
Objective:
Assessing the effects of a PD program for math teachers on their students’ achievement and making suggestions for future policy evaluations.
Design:
A large-scale clustered randomized controlled trial was conducted in 175 lower secondary schools (sixth to eighth grade) in four of Italy's lowest performing regions. Alongside the national standardized math assessments, the project collected a wide range of additional information.
Subjects:
Math in lower secondary schools.
Measures:
Math achievement as measured by standardized tests provided by the National Education Assessment Institute (Istituto Nazionale per la Valutazione del Sistema di Istruzione e Formazione); teacher and student practices and attitudes collected through questionnaires.
Results:
Findings suggest that the program had no significant impact on math scores during the first year (when the program was held). Nonetheless, some heterogeneity was detected, as the treatment does seem “to work” with middle-aged teachers. Moreover, some effects on teaching practice and student attitudes were found.
Conclusion:
Some effects attributable to the intervention have been detected. Moreover, this project shows that a rigorous approach to evaluation is also feasible in a context that pays little attention to evidence-based policies, such as the Italian school system.
Introduction
In many countries, the drive to improve education has triggered a season of rigorous research on which instructional practices, curricula, and interventions are effective. Italy still lags behind for several reasons: Data on student achievement are limited, there is widespread aversion to testing, and there is little tradition of evidence-based policy evaluation. Given the relative weakness of Italian students in international assessments of math and science (namely, the Trends in International Mathematics and Science Study [IEA-TIMSS] and the Organization for Economic Cooperation and Development Programme for International Student Assessment [OECD-PISA]), there has been a recent boost in initiatives to help schools and teachers improve student achievement (many of them supported by European Union [EU] funding), and there is an urgent need to understand their effectiveness. Educational research today broadly agrees that teachers have a fundamental influence on student results and are crucial to improving students’ achievement (Scheerens 2000; OECD 2009; Goe 2007; Rivkin, Hanushek, and Kain 2005; Hanushek and Rivkin 2012) as well as their future labor market outcomes (Chetty, Friedman, and Rockoff 2011). Teacher effects have also been estimated through experiments and quasi-experiments, with results consistent with previous observational analyses (Nye, Konstantopoulos, and Hedges 2004; Kane et al. 2013). Notwithstanding the influence of factors such as socioeconomic background, family, and school context, student learning is influenced by what and how teachers teach (e.g., Ma and Papanastasiou 2006; Muijs and Reynolds 2010).
The average age of Italian teachers is among the highest in Europe, and they did not necessarily acquire specific teaching skills during or after university. Recruitment is centralized; teacher mobility is driven essentially by seniority and by the positions teachers hold in provincial lists rather than by schools’ choices. High teacher turnover leads the more experienced teachers to self-select into their preferred schools, usually those with better performing students (Barbieri, Rossetti, and Sestito 2011). In this context, several factors suggest that the existing staff could be generally improved through in-service training. Indeed, professional development courses (PD hereafter) can activate two important levers to increase student achievement: first, the introduction of new skills, especially among older teachers without specific training; second, the development of teacher communities within schools, an element often associated with effective schools (Anderson 2004; Ma and Ma 2004; Scheerens 2000; Stoll et al. 2006). Training for teachers already in service can follow different approaches: It can help them understand more deeply not only the contents they teach and the ways students learn but also provide them with alternative solutions, methods, and materials for presenting these contents.
This study investigates the effects of a specific teacher PD program called M@t.abel. 1 The program, supported by the Italian Ministry of Education, covers a substantial fraction of the national lower secondary school curriculum. 2 The program deals with math topics belonging to four thematic areas (numbers and algorithms, geometry, relations and functions, and probability and forecasting 3 ). More precisely, the intervention aims to improve student knowledge and cognitive processes regarding algorithms and procedures, forms of math representation, problem solving, measurement tools, logical thinking, and quantitative information. One of the most important components of the program is the training of tutors (by now about 100 “expert” teachers), who are then responsible for training almost 4,000 teachers nationwide, making it one of the most popular and extensive teacher PD projects implemented to date (Sinclair et al. 2010). After several years of implementation, mainly in Northern Italy, M@t.abel is now being eagerly promoted in the four regions of Southern Italy receiving EU funding (the so-called “PON” regions).
According to international and national tests, the regions of Southern Italy show the lowest levels of math achievement in the country. The OECD-PISA 2009 findings reveal that one out of three students is unable to properly master the most elementary and routine math tasks (the ratio is only 1 in 10 in Northern Italy); 4 the TIMSS 2007 and 2011 surveys show that Italian students, while performing above the international average in math in fourth grade, fall significantly behind by eighth grade, placing Italy among the more poorly performing countries (Mullis et al. 2008, 2012). In the 2010 Istituto Nazionale per la Valutazione del Sistema di Istruzione e Formazione (INVALSI) 5 national assessment of sixth-grade students, the share of correct answers in Southern Italy was on average 4 percentage points lower than in Italy as a whole. Available empirical evidence suggests that the differences in student performance between Northern and Southern Italy accumulate over time: Narrow in primary school, the gap widens in lower secondary school. Educational literature explains this North–South gap by focusing on the economic and social resources available in the communities where schools are located (Ministry of Education and Ministry of Economy 2007; Bratti, Checchi, and Filippin 2007; Fondazione Agnelli 2011). More recently, OECD-PISA 2012 confirmed the difficulties of Southern Italy, where math achievement once again remained very low despite a moderate increase at the national level over the previous 3 years.
Using a randomized controlled trial, we seek to detect whether the M@t.abel program makes a measurable difference in promoting student achievement and attitudes and in modifying teaching practices. To our knowledge, this is a new attempt in the Italian school system 6 and it meets standard requirements for rigorous evidence in the field of education. 7 There are other reasons why this evaluation is worthy of international attention. Quite unusually, both the intervention and its evaluation are carried out at scale rather than through a small pilot study. Hence, we are observing and assessing a PD program facing the real constraints of everyday school life, not an idealized implementation in an artificial setting. This allows us to test the effectiveness of M@t.abel while avoiding the issues related to scaling up interventions. Moreover, our evaluation is based on an unusually large sample of teachers and students, allowing us to explore impact heterogeneity. Finally, this randomized controlled trial evaluates an EU-funded project, in line with the recent push for counterfactual evaluation from the EU (in this regard, see the European Commission’s guidance on increasing counterfactual evaluations in the next European Social Fund program).
In this article, we present the effects estimated at the end of the first year of the experiment. 8 The article is organized as follows. In the second section, we briefly describe relevant literature; in the third section, we discuss the Italian school context and the M@t.abel program; in the fourth section, we illustrate the design of the experiment and the data collected; in the fifth section, we discuss the effects of the M@t.abel program on student achievement and attitudes and, in the sixth section, on teacher practice.
Features of Effective PD Programs
PD initiatives vary widely in their organizational structure, their contents, and their ultimate goals. 9 For example, teacher training programs can aim at improving teaching skills or at updating teachers on their subject content, curricular changes, or other educational reforms; courses can be structured as intensive summer schools, 1-day seminars, or series of lectures in which teachers attend and/or collaborate with their peers on specific topics under the supervision of a subject expert. This huge heterogeneity may explain why current research struggles to draw unambiguous conclusions and recommendations about the features that enhance PD effectiveness.
The PD program evaluated in this article focuses on training teachers in new teaching methods. The key idea underlying M@t.abel is that math should be taught in a very practical, hands-on way, linking math concepts to everyday problems. The idea of teacher PD as a factor boosting student performance has gained momentum: To help students learn, it is crucial to have a well-trained and up-to-date teaching force. This point is stressed by several observers (Desimone et al. 2002; Schleicher 2011) as well as by international organizations (OECD 1998, 2005; Eurydice 2003). This line of reasoning rests on the key assumption that a change in teachers’ skills will lead to a change in their students’ performance.
However, empirical evidence about which characteristics affect teaching practices and student achievement is still controversial and inconclusive (Guskey 2003; Fraser et al. 2007). Most research on PD consists of small-scale studies based on teacher surveys or of attempts to identify good practices. Using U.S. cross-sectional data, Garet et al. (2001) identified extended duration (i.e., long-term activities rather than 1-day generic workshops), opportunities for active learning, and coherence with other learning activities as key features of successful training programs. These elements were confirmed by a further analysis based on longitudinal data (Desimone et al. 2002). Other nonexperimental evidence suggests slightly different features: Student performance is likely to improve when PD focuses on academic content and is based on teachers’ collective participation (e.g., Kennedy 1998; Ingvarson, Meiers, and Beavis 2005; Timperley et al. 2007).
From this perspective, M@t.abel appears to be a promising training program, as it has features considered successful in the international literature. Indeed, it focuses both on teacher practice and on math content, offering specific teaching materials and promoting group work in the classroom. It lasts the whole school year, and, finally, it involves teachers in continuous interaction through an online virtual community.
However, the evidence on which these considerations are based is weak: Most previous studies are seriously challenged by self-selection issues, because even with longitudinal data it is not possible to completely rule out endogeneity problems and establish causal links.
In recent decades, some experimental evidence has been gathered, mostly in the United States, showing less consistent results than previous observational research. In their review, Yoon et al. (2007) highlight a generally positive effect of PD on student achievement, although it is not always statistically significant given the small samples used in many studies. Such results are confirmed by further experiments showing positive effects of PD on student achievement, especially in reading and comprehension skills (Vaughn et al. 2011; Sailors and Price 2010; Kim et al. 2011). At the same time, some programs have proved ineffective, at least in the short run (Gersten et al. 2010; Garet et al. 2008).
There is also recent experimental evidence of effective PD in math. Looking at middle school teachers, Santagata et al. (2011) found a positive effect of a series of video-based modules designed for U.S. teachers. Other successful interventions in math are those reported by Saxe, Gearhart, and Nasir (2001), Sloan (1993), Cole (1992), and Carpenter et al. (1989). Saxe and colleagues evaluated the impact of a weeklong summer school, accompanied by 13 follow-up meetings, focused mainly on fractions: Teachers revised their subject knowledge and gained a deeper understanding of how students learn and of their motivation to learn. The authors report good results for treated students, who outperformed controls on a standardized test of conceptual knowledge of fractions; at the same time, they also detected a negative effect on fraction computation that was not statistically significant but substantively relevant. Carpenter et al. (1989) evaluated a 4-month workshop on the cognitive strategies students use to learn math and on the relationship between math problems and the processes students use to solve them: They found a positive effect that was not statistically significant, probably because of the small sample. A similar approach, focused on students’ cognitive responses, was also found to improve math competence in Cole (1992), even though the course analyzed was not discipline-specific and was delivered to both language and math teachers. Finally, Sloan (1993) reports positive effects of a 2-month course designed to support teachers’ instructional and questioning behavior.
Alongside successful cases, however, many PD courses (even those designed according to best practice recommendations) proved ineffective: Garet and colleagues (2010, 2011) evaluated a large-scale PD program focusing on teachers’ knowledge of rational number topics (including specialized math knowledge that may be useful for teaching these topics). The program was structured as an intensive summer school with several follow-ups during the school year, for a total of 114 hr of training. Despite the effort, no effect was found either at the end of the training or after the subsequent year. Nonetheless, this is one of the few cases, and to our knowledge the only one in math, in which medium-term effects were measured on both teachers and students. A lack of effects was also reported in other experiments on math PD programs: For example, Cavalluzzo et al. (2012) evaluated a program that combined training on teaching methods with the use of online resources in the classroom, and Randel et al. (2011) used an approach similar to that of Sloan (1993) but focused on math, without finding any effect on math performance.
These puzzling results are not confined to math PD programs; they may also apply to similar initiatives on reading and comprehension and on science. Best Evidence Encyclopedia researchers have produced systematic reviews of the effectiveness of PD and of other types of classroom interventions (computer-assisted instruction and curricular or textbook modifications) across different subjects (Slavin and Lake 2008; Slavin et al. 2008; Slavin, Lake, Chambers, et al. 2009; Slavin, Lake, Cheung, et al. 2009; Slavin, Lake, and Groff 2009). They conclude that, on average, the most effective classroom interventions are those designed to change teachers’ daily practices, rather than pure technological integration or a mere change of textbooks. However, the numerous cases of ineffective programs call into question whether the structural features identified in the literature are enough to guarantee a program’s effectiveness. Finally, a more recent comprehensive review of this literature also confirms that the existing evidence is not sufficient to identify key features for effectiveness (Romano 2012).
As mentioned previously, a large part of the literature on PD is based on the assumption that the change in student achievement is driven by a prior change in teachers. Only recently, however, have researchers started to measure effects on teachers. This aspect is theoretically crucial, because the transfer of competences to students must pass through a change in teachers’ instructional practice (self-reported or observed) and/or knowledge (measured through surveys or tests). PD programs (not necessarily focused on math) that are effective on students also show effects on teachers, mainly on their subject knowledge (Masters et al. 2010; Heller et al. 2012) or on their teaching practices (Matsumura et al. 2010; Sailors and Price 2010; Greenleaf et al. 2011). Nonetheless, there are cases where an observed improvement in teacher practices/knowledge was not accompanied by a significant increase in student performance (Garet et al. 2008; Gersten et al. 2010; Garet et al. 2011).
To sum up the state of the art, we must acknowledge that much remains to be done to fully understand which factors drive PD effectiveness. Even when designed according to the criteria suggested in the literature, PD programs do not per se lead to effects on student achievement or on teacher instructional practices. More needs to be explored to understand how to shape teacher training so that it is effective and through which channels the expected change should occur. Moreover, little is known about whether a program can have a similar impact when delivered across a range of typical settings and when its delivery depends on multiple trainers or tutors. Finally, most studies concern the United States, providing little indication of context-specific features that might matter in the European and Italian framework.
The M@t.abel Program
Promoting effective PD programs for teachers is a challenge all around the world. In Italy, such programs face additional challenges, including having to address the oldest lower secondary teaching force in the world, 10 a career advancement system based only on teaching seniority (OECD 2007), a recruitment system that for many decades did not require specific training in teaching, and a lack of compulsory training during teachers’ careers. Italian teachers report more frequently than teachers in other countries that they lack feedback on their work (OECD 2009). Most math teachers did not graduate in math or physics and could be considered “out-of-field.” 11
Although in-service training is formally indicated as a professional duty, schools have limited resources to actually carry out such programs. Incentives to attend in-service training opportunities are limited at school and individual level, since there is no link to any form of career advancement or salary increase (Eurydice 2003). Training supply is fragmented and delivered mainly through 1-day seminars based on formal lectures (Moscati 2000).
The M@t.abel program evaluated in this study has some novel features. It aims to increase lower secondary school math achievement by providing teachers with alternative solutions and methods for presenting traditional contents. The main idea is that students, rather than learning abstract formulas and ideas, should be engaged in solving real-life problems through mathematical tools and concepts. The program is addressed to tenured 12 math teachers in Grades 6–8 (middle school) and 9–10 (first 2 years of high school). It is based on formal and online tutoring, and it lasts a full school year. A repository offers teaching materials that cover different math curricular contents and adopt a problem-solving perspective. Teachers are required to use at least four of these teaching materials (precisely one per major math content area) in their classrooms and to report on the experience to their tutor and peers through a structured diary. Moreover, the program encourages a virtual community of teachers to exchange views through online forums and discussion groups, even from home. 13
Schools and individual teachers within schools enroll in the program on a voluntary basis (but not in the experiment, see subsequently). When registering, they also indicate their preferred location among those in their area delivering the program’s formal training sessions. Delivery takes place in selected schools with appropriate facilities for tutor and teacher meetings.
A M@t.abel course can be activated when at least 12 teachers sign up for the same location. During the school year, enrolled teachers have to attend seven face-to-face meetings (for a total of 27 hr of training) and about 40 hr of online training sessions through an e-learning platform.
Experiment Design, Implementation, and Data
To evaluate the effects of M@t.abel, we designed a large-scale randomized controlled trial involving 175 schools, 666 teachers, and roughly 11,000 students 14 in four southern regions of Italy. 15 It was designed as a 3-year experiment starting in 2009/2010 and addressed only to lower secondary school teachers (Grades 6–8) who had already expressed their intention to enroll in the program. A large amount of primary and secondary data was collected on both teachers and their students.
Our main outcome measure for student performance is the students’ INVALSI math score; 16 however, to investigate more thoroughly the process underlying the program’s impact, we also look for effects on students’ (self-reported) attitudes and on teachers’ beliefs and classroom practice. 17 Student data are derived from the INVALSI Student Questionnaire, 18 which collects students’ demographic and background characteristics, previous school experience, and learning attitudes (e.g., attitudes toward math/reading, intrinsic/extrinsic motivation, test anxiety, and attributions).
As previously highlighted, schools’ and teachers’ enrollment in M@t.abel is basically voluntary. Thus, an observational study on its effectiveness, based on the comparison between enrolled and not enrolled schools/teachers, would suffer from self-selection issues at school and individual level. Hence, an experiment among the enrolled teachers is necessary to establish a causal link. Moreover, due to M@t.abel’s extended duration, after the enrollment, teachers could drop out or not implement all the activities scheduled by the program. We kept these considerations in mind while designing the experiment and we will discuss their implications further, when dealing with compliance issues.
Randomization and Validity
The identification strategy is based on a standard treatment–control comparison between students clustered by classes (and therefore by teachers) and by schools. Given the importance of peer collaboration in the M@t.abel approach, only schools with at least two enrolled teachers were considered for this experiment, which left 175 schools in the randomized controlled trial. Schools were randomly assigned to two groups: The enrolled teachers in the first group of schools received the specialized training in 2009/2010 (treatment group), whereas those in the second group had their admission delayed by 1 year (control group) and were admitted to the program in 2010/2011. We stratified the 175 schools by geographical criteria (namely, by province, with the cities of Naples and Palermo treated as separate strata) and by peer participation at the school level (schools with fewer than five enrolled teachers and schools with five or more), obtaining 31 non-empty strata. Then 55 schools were randomly assigned to the control group, in proportion to the distribution of schools across strata. The remaining 120 schools were assigned to the treatment group and invited to participate in M@t.abel immediately. 19 The 175 schools yielded a sample of 666 teachers: 473 were invited to attend the program immediately and 193 formed the control group (invited to attend the program during the following school year); 85 dropped out after the randomization but before the beginning of the school year. 20 We finally ended up with 174 schools (120 assigned to treatment and 54 controls) because all the enrolled teachers in one control school dropped out. Our final sample of teachers consists of 409 individuals assigned to immediate treatment and 172 controls.
Only one class, randomly chosen by the research team among the classes taught by each teacher, was assigned to each teacher for evaluation purposes. 21 This random selection was stratified across sixth-, seventh-, and eighth-grade classes (for both the treatment and control groups), so that the students observed in this experiment are about equally distributed across the three grades. Teachers in the treatment group were asked to implement the M@t.abel teaching materials in the assigned class, while teachers in the control group were informed that their class was involved in the experiment and would be tested at the end of the school year.
Thanks to the large amount of information collected, we were able to test the equivalence between the treatment and control groups across an unusually wide range of variables at the school, teacher, and student levels (about 50 22 ). These checks confirm the internal validity of the experiment. 23 As for external validity, we found that the characteristics of our sample (based on self-selected schools in four southern regions) are generally comparable to those of the population of schools, teachers, and students in all eight regions of Southern Italy, 24 but not to those of the rest of the country. 25
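The proportional stratified assignment described above can be illustrated with a short Python sketch. It assumes a hypothetical mapping from school identifiers to stratum labels and spreads the 55 control slots across strata in proportion to stratum size using a largest-remainder rounding rule; the rounding rule, the function names, and the seed are our assumptions, not the procedure actually used by the research team.

```python
import math
import random
from collections import defaultdict


def allocate_controls(stratum_sizes, n_control):
    """Split n_control slots across strata in proportion to size (largest remainder)."""
    total = sum(stratum_sizes.values())
    quotas = {s: n_control * size / total for s, size in stratum_sizes.items()}
    alloc = {s: math.floor(q) for s, q in quotas.items()}
    leftover = n_control - sum(alloc.values())
    # Hand the remaining slots to the strata with the largest fractional parts
    ranked = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in ranked[:leftover]:
        alloc[s] += 1
    return alloc


def stratified_assignment(schools, n_control, seed=2009):
    """schools: dict school_id -> stratum label (hypothetical format).
    Returns dict school_id -> 'control' or 'treatment'."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for school_id, stratum in schools.items():
        strata[stratum].append(school_id)
    alloc = allocate_controls({s: len(ids) for s, ids in strata.items()}, n_control)
    assignment = {}
    for stratum, ids in strata.items():
        shuffled = sorted(ids)
        rng.shuffle(shuffled)  # random order within the stratum
        for i, school_id in enumerate(shuffled):
            assignment[school_id] = "control" if i < alloc[stratum] else "treatment"
    return assignment


# Hypothetical example: 175 schools spread over 31 strata
schools = {f"S{i:03d}": f"stratum_{i % 31}" for i in range(175)}
assignment = stratified_assignment(schools, n_control=55)
print(sum(v == "control" for v in assignment.values()))  # 55
```

Because the per-stratum quotas always sum to the requested number of control slots, the total of 55 controls is preserved however the schools are spread across strata; only which schools within each stratum become controls depends on the random draw.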
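One possible way to pick a single observed class per teacher while keeping the three grades roughly balanced is a greedy rule with random tie-breaking, sketched below in Python. This is a hypothetical illustration: the input format (teacher identifiers mapped to lists of (class, grade) pairs) and the balancing rule are assumptions, since the text does not describe the exact stratification algorithm.

```python
import random
from collections import Counter


def pick_observed_classes(teacher_classes, seed=0):
    """teacher_classes: dict teacher_id -> list of (class_id, grade) pairs, grades 6-8.
    Returns dict teacher_id -> the (class_id, grade) pair chosen for observation."""
    rng = random.Random(seed)
    grade_counts = Counter()
    teachers = sorted(teacher_classes)
    rng.shuffle(teachers)  # random processing order, so no teacher is systematically favored
    chosen = {}
    for teacher in teachers:
        options = teacher_classes[teacher]
        grades = sorted({grade for _, grade in options})
        # Prefer the grade least represented so far; break ties at random
        target = min(grades, key=lambda g: (grade_counts[g], rng.random()))
        chosen[teacher] = rng.choice([c for c in options if c[1] == target])
        grade_counts[target] += 1
    return chosen


# Hypothetical example: 30 teachers, each teaching one class in every grade
classes = {f"T{i:02d}": [(f"T{i:02d}-{g}", g) for g in (6, 7, 8)] for i in range(30)}
picks = pick_observed_classes(classes)
print(Counter(grade for _, grade in picks.values()))
```

When every teacher teaches all three grades, the greedy rule ends with exactly 10 observed classes per grade; with real timetables, where teachers cover only some grades, the balance would be approximate.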
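A common way to summarize balance over many covariates is the standardized mean difference, computed for each variable and compared with a conventional threshold such as 0.1. The Python sketch below assumes this approach for illustration only; the text does not state which statistical test was applied to the roughly 50 variables, and the covariate names in the example are hypothetical.

```python
import math
import statistics


def standardized_difference(treated, control):
    """Difference in means divided by the pooled standard deviation.
    Assumes the covariate is not constant in both groups."""
    pooled_sd = math.sqrt((statistics.variance(treated) + statistics.variance(control)) / 2)
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd


def imbalanced_covariates(covariates, threshold=0.1):
    """covariates: dict name -> (treated values, control values).
    Returns the covariates whose absolute standardized difference exceeds the threshold."""
    flagged = {}
    for name, (treated, control) in covariates.items():
        d = standardized_difference(treated, control)
        if abs(d) > threshold:
            flagged[name] = d
    return flagged


# Hypothetical toy covariates
example = {
    "teacher_age": ([1, 2, 3], [1, 2, 3]),
    "class_size": ([2, 3, 4], [1, 2, 3]),
}
print(imbalanced_covariates(example))  # {'class_size': 1.0}
```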
Noncompliance
The estimated effects could be diluted because some teachers assigned to the treatment group did not actually complete the M@t.abel program or did not participate for the whole school year. Complier teachers are those who fulfilled the treatment protocol, defined as follows: (a) they received the final certificate issued by the tutor, which attests to their full attendance at both formal and online training and to the tutor’s validation of the teaching units they were required to try out in their classes (this validation is based on the evaluation of teachers’ written reports on the implementation of the teaching activities with their students and on their ongoing reflection on the teaching/learning process); (b) they tried out at least four teaching units in their observed class; (c) these teaching units came from four different math content areas. The first two criteria were adopted to ensure sufficient treatment intensity; given the available teaching material, we added the third criterion to guarantee homogeneity of the treatment.
Only 39% of the teachers assigned to the treatment group complied with the treatment protocol (Table 1). Nonetheless, thanks to the large sample size, we can rely on 160 treated teachers and their 3,053 students. Noncompliance takes several forms: 34% quit at the very beginning of the program; 4% did not obtain the end-of-training certificate (typically because they tried out fewer than four teaching units); 10% did not implement the program properly (they did not use teaching units from different math content areas or did not use them in the assigned class); 13% had intended to participate, but the training course never started in their area. The compliance rate is similar across the three school grades: 38.2% in sixth grade, 41.5% in seventh grade, and 37.8% in eighth grade. We did not observe any crossover among control schools and teachers. 26
Table 1. Compliance in Terms of Teachers and Students.
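The teacher counts reported in the randomization and compliance paragraphs can be cross-checked with a few lines of Python arithmetic. The split of the 85 post-randomization dropouts between arms is derived from the reported figures (473 minus 409, and 193 minus 172) rather than stated directly in the text.

```python
def compliance_summary():
    # Teacher counts as reported in the text
    invited = {"treatment": 473, "control": 193}
    final = {"treatment": 409, "control": 172}
    compliers = 160
    # Percentages of the final treatment group, by type of noncompliance
    noncompliance_pct = {
        "quit at the beginning": 34,
        "no end-of-training certificate": 4,
        "improper implementation": 10,
        "course never started": 13,
    }
    dropouts = {arm: invited[arm] - final[arm] for arm in invited}
    return {
        "dropouts": dropouts,
        "total_dropouts": sum(dropouts.values()),
        "final_sample": sum(final.values()),
        "compliance_rate": compliers / final["treatment"],
        "noncompliance_pct": sum(noncompliance_pct.values()),
    }


summary = compliance_summary()
print(summary["dropouts"])                  # {'treatment': 64, 'control': 21}
print(summary["total_dropouts"])            # 85
print(f"{summary['compliance_rate']:.1%}")  # 39.1%
print(summary["noncompliance_pct"])         # 61, i.e., 100 minus the 39% compliers
```

The figures are internally consistent: the 85 dropouts reconcile the 666 invited teachers with the final 581, and the four noncompliance categories add up to the complement of the 39% compliance rate.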
Features of noncompliance have been explored using multivariate binary logistic models, which point to self-selection in the subsample of teachers fulfilling the treatment protocol. Compliance is associated with younger age, familiarity with information and communication technology (ICT), participation in previous in-service training, and personal motivation to enroll in the program (as opposed to being registered for M@t.abel by their school on their behalf). Teachers who enrolled on their own show a higher compliance rate than both those informed by their school principals and those involved as part of a school initiative. The recruitment channel based on schools rather than on individual teachers, however, was not ineffective per se: The vast majority of teachers (about 76%) actually enrolled upon the suggestion of their school principals.
The probability of completing M@t.abel is greater when teachers are located in more urbanized and less mountainous areas. Given that in several cases teachers were not assigned to their preferred location for the treatment, these factors could have affected their willingness to attend the formal training sessions or how easy it was for them to actually get there. 27 Indeed, the main reason noncompliers gave for dropping out is the distance to the course location, followed by time constraints. The program requires time to reach and attend formal lectures, classroom time to use the materials with students, time to report on those experiences, and digital skills to download materials and exchange comments with colleagues. This should be kept in mind given that the large majority of teachers in Italian lower secondary schools are female 28 (OECD 2012) and often face the difficulty of reconciling work and family demands. It is therefore no surprise that younger teachers, who are more familiar with ICT, are more likely to comply with the course requirements and the experimental protocol. 29
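A binary logistic model of compliance, of the kind mentioned above, can be sketched in plain Python by maximizing the log-likelihood with batch gradient ascent. The predictors (a centered age variable and an ICT-familiarity dummy) and the toy data are hypothetical and only illustrate the direction of the associations described in the text; an actual analysis would use a statistical package, more covariates, and standard errors.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(X, y, lr=0.5, iters=5000):
    """Logistic regression by batch gradient ascent on the mean log-likelihood.
    X: list of predictor tuples (no intercept column); y: list of 0/1 outcomes.
    Returns [intercept, coef_1, ..., coef_k]."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(iters):
        grad = [0.0] * (k + 1)
        for row, yi in zip(X, y):
            z = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
            resid = yi - sigmoid(z)  # score contribution of this observation
            grad[0] += resid
            for j, x in enumerate(row):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Hypothetical toy data: (age in decades, centered; ICT familiarity 0/1; complied 0/1)
toy = [
    (-1, 1, 1), (-1, 1, 1), (-1, 1, 0), (-1, 0, 1), (-1, 0, 0),
    (0, 1, 1), (0, 1, 0), (0, 0, 0), (0, 0, 1), (0, 0, 0),
    (1, 1, 1), (1, 1, 0), (1, 0, 0), (1, 0, 0), (1, 1, 0),
]
X = [(age, ict) for age, ict, _ in toy]
y = [c for _, _, c in toy]
beta = fit_logit(X, y)
# In this toy data, the age coefficient comes out negative and the ICT coefficient positive
print([round(b, 2) for b in beta])  # intercept, age, ICT
```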
Estimating the Effects
The short-term effects of the M@t.abel program are estimated as intention-to-treat (ITT) effects using ordinary least squares (OLS) models. Namely, we test whether being assigned to the treatment group has an impact on students and teachers. Given the high noncompliance rate, the overall effect on those actually receiving the treatment could be diluted, so we also estimated the average treatment effect (ATE) for those who actually received the treatment, instrumenting compliance with the random assignment to treatment. 30 The ATE estimates are reported despite self-selection among compliers. However, the relevance of the observed characteristics associated with compliance suggests that effects estimated on compliers cannot reasonably be extended to noncompliers.
In the following sections, we present the base models for students and teachers, which control only for the randomization stratification variables and for the presence of an external observer during the national math assessment, 31 with standard errors clustered at the class level. We ran several robustness checks using different sets of control variables; 32 the results of our experiment do not change.
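With a single binary instrument and no covariates, the instrumental-variable estimate reduces to the Wald ratio: the ITT difference in mean outcomes divided by the difference in treatment take-up between the assigned and non-assigned groups. The Python sketch below shows this simplified case with made-up scores; it omits the stratification controls and clustered standard errors used in the actual models, so it illustrates the logic rather than reproducing the estimates.

```python
from statistics import mean


def itt_and_wald(outcome, assigned, treated):
    """outcome: student scores; assigned: 1 if randomized to treatment, else 0;
    treated: 1 if the student's teacher complied with the protocol, else 0."""
    y1 = [y for y, z in zip(outcome, assigned) if z == 1]
    y0 = [y for y, z in zip(outcome, assigned) if z == 0]
    d1 = [d for d, z in zip(treated, assigned) if z == 1]
    d0 = [d for d, z in zip(treated, assigned) if z == 0]
    itt = mean(y1) - mean(y0)
    first_stage = mean(d1) - mean(d0)  # equals the compliance rate when there is no crossover
    return itt, first_stage, itt / first_stage


# Made-up data: 10 assigned students (4 with complier teachers scoring 510, 6 scoring 500)
# and 10 control students scoring 500
scores = [510] * 4 + [500] * 6 + [500] * 10
assigned = [1] * 10 + [0] * 10
treated = [1] * 4 + [0] * 6 + [0] * 10
itt, first_stage, wald = itt_and_wald(scores, assigned, treated)
print(itt, first_stage, wald)
```

Here an ITT of 4 points with 40% take-up implies an effect of 10 points for the treated, which shows why a low compliance rate can dilute ITT estimates.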
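Clustering by class acknowledges that students taught by the same teacher are not independent observations. The Python sketch below computes a cluster-robust (sandwich) standard error for the treatment coefficient in a regression of the score on an intercept and a treatment dummy. It is a simplified, hypothetical illustration: it omits the stratification and observer controls, and the optional finite-sample factor G/(G - 1) x (N - 1)/(N - K) is one common convention (used, for example, by Stata), which we assume rather than take from the text.

```python
import math
from collections import defaultdict


def ols_cluster_se(y, t, clusters, small_sample=True):
    """OLS of y on [1, t] with cluster-robust SE for the coefficient on t.
    Returns (coefficient on t, clustered standard error)."""
    n = len(y)
    s_t = sum(t)
    s_tt = sum(ti * ti for ti in t)
    det = n * s_tt - s_t * s_t
    inv = [[s_tt / det, -s_t / det], [-s_t / det, n / det]]  # (X'X)^-1
    xty = [sum(y), sum(ti * yi for ti, yi in zip(t, y))]
    a = inv[0][0] * xty[0] + inv[0][1] * xty[1]
    b = inv[1][0] * xty[0] + inv[1][1] * xty[1]
    # Sum of x_i * u_i within each cluster (class)
    scores = defaultdict(lambda: [0.0, 0.0])
    for yi, ti, g in zip(y, t, clusters):
        u = yi - a - b * ti
        scores[g][0] += u
        scores[g][1] += ti * u
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for s in scores.values():
        for i in range(2):
            for j in range(2):
                meat[i][j] += s[i] * s[j]
    # Sandwich (X'X)^-1 M (X'X)^-1; only the [1][1] element is needed
    row = [sum(inv[1][k] * meat[k][j] for k in range(2)) for j in range(2)]
    v_bb = sum(row[k] * inv[k][1] for k in range(2))
    if small_sample:
        G, K = len(scores), 2
        v_bb *= G / (G - 1) * (n - 1) / (n - K)
    return b, math.sqrt(v_bb)


# Hypothetical example: four classes of two students each
y = [2, 4, 6, 8, 1, 3, 5, 7]
t = [1, 1, 1, 1, 0, 0, 0, 0]
classes = ["A", "A", "B", "B", "C", "C", "D", "D"]
b, se = ols_cluster_se(y, t, classes, small_sample=False)
print(b, se)  # 1.0 2.0
```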
Effects on Student Math Performance and Attitudes
Math achievement is the key outcome for evaluating the effectiveness of M@t.abel and the one measured most accurately in our project. For this crucial outcome, we consider three target variables: the overall math score from the INVALSI national assessment, estimated using a Rasch model and scaled to a mean of 500 and a standard deviation of 100 for the seventh-grade scores; 33 the frequency of skipping or double marking at least 1 item, which signals strong hesitancy in responding (excluding items left blank because the student did not reach the end of the assessment); and the frequency of not completing the assessment (i.e., not reaching 1 or more of the last items in order of presentation).
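The two response-behavior indicators depend on separating items a student skipped from items the student never reached. A minimal sketch of one way to do this follows, assuming each student’s responses are stored in presentation order with `None` for a blank item and a hypothetical `MULTI` marker for a double-marked item (the actual INVALSI coding is not described here). The trailing run of blanks is treated as not reached, and any earlier blank or double mark counts as a skip.

```python
MULTI = "MULTI"  # hypothetical code for a double-marked item


def response_flags(responses):
    """Return (skipped_or_double_marked, not_completed) for one student.

    responses: answers in order of presentation; None marks a blank item.
    """
    # Walk back over trailing blanks: these items were not reached.
    last_reached = len(responses)
    while last_reached > 0 and responses[last_reached - 1] is None:
        last_reached -= 1

    reached = responses[:last_reached]
    skipped = any(r is None or r == MULTI for r in reached)
    not_completed = last_reached < len(responses)
    return skipped, not_completed


def share(flags):
    """Share of students for whom a flag is True."""
    return sum(flags) / len(flags)


students = [
    ["A", "C", None, "B"],        # skipped one item, completed the test
    ["A", "C", "D", None, None],  # did not reach the last two items
    ["A", MULTI, "B", "D"],       # double-marked one item
]
flags = [response_flags(r) for r in students]
print(share([s for s, _ in flags]), share([nc for _, nc in flags]))
```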
The results reported in Table 2 show that the treatment has no significant impact on the main outcome, the average math competence score. 34 Students in classes whose teachers participated in the M@t.abel program do show, ceteris paribus, a slight average advantage in performance, but it is neither statistically significant nor substantively relevant. However, the program seems to have had an undesired effect: It increased the propensity of treated students to skip at least 1 item during the assessment. This behavior did not affect the overall number of items skipped or double marked, nor the propensity not to complete the assessment. 35 This undesired impact is quite surprising and calls for interpretation. We suggest that it reflects treated students’ concern about doing well on the test, which led them to avoid simply guessing when they did not know the exact answer.
Table 2. Average Impact on Student Math Performance.
Note. OLS = ordinary least squares; IV = instrumental variable; ATE = average treatment effect; ITT = intention to treat.
The symbols ***, **, and * indicate that coefficients are statistically significant at the 1%, 5%, and 10% levels, respectively.
To test this interpretation, we investigated the effects of M@t.abel on students’ attitudes as measured in the INVALSI questionnaire (Table 3). 36
M@t.abel showed significant impacts on the following measures: self-confidence in math (4 items, which we also summarize in a single factor score); 37 curriculum pace (for which, unfortunately, we can rely on only 1 item, asking students whether they had the feeling that the class moved on even when some classmates had not understood the topic); attribution of failures to bad luck, a score ranging from 0 to 6 based on students’ agreement with 6 dichotomous items associating school failure with random factors; and test anxiety (4 items, only 1 of which is strongly affected by M@t.abel). 38
Table 3. Average Impact on Student Attitudes Toward Math and School.
Note. ATE = average treatment effect; ITT = intention to treat.
The symbols ***, **, and * indicate that coefficients are statistically significant at the 1%, 5%, and 10% levels, respectively.
Students of treated teachers show greater self-confidence in math and are less likely than the others to attribute academic failure to bad luck, but they also feel more anxious while taking the test. Taken together, these elements suggest that students perceive greater responsibility for their achievement outcomes (Zeidner 1998), which may make giving no answer preferable to guessing (Hagtvet and Benson 1997). Our interpretation, supported by several empirical clues pointing in the same direction, is that the treatment produced a more “perfectionist” attitude (Mills and Blankstein 2000), characterized by conscientious and goal-oriented behavior (Tsui and Mazzocco 2007). Unlike disengagement, perfectionism in students is generally correlated with greater test anxiety and concern about doing well in exams, because of high personal standards of academic performance and internal attributions for failure (Kramer 1988).
Another unexpected effect concerns students’ perception of time constraints in learning: Treated students report more frequently than controls that they have not had enough time on a given topic. This finding is consistent with treated teachers’ complaints about the limited time available to actually implement the new approach in the classroom.
These effects on students’ attitudes are statistically significant, although small in magnitude. Although partly negative in the short term, they could lead to future improvements in student math performance: Feeling responsible for their achievement, being more interested in math and more confident about their skills, and being perfectionist in solving problems could become an advantage in building math competences, especially as teachers become more familiar with the M@t.abel approach.
Because specific subgroups could have benefited more from the program than others, and given the large sample size, we explored the heterogeneity of effects across different groups of schools, teachers, and students. Hints of differential program effectiveness appear only with regard to teachers’ age, as shown in Table 4. Students of middle-aged teachers (in our sample, aged 50–55 39) show a significant positive effect of M@t.abel on their average math score (ITT: 14.8; average treatment effect on the treated [ATT]: 41.6), whereas the effect is absent (or even negative) for the other age groups. Looking at attitudes, students of treated middle-aged teachers show additional differences relative to their peers: They are less likely to consider math more difficult for themselves than for their classmates, they less frequently report learning difficulties due to the fast pace of the curriculum, they less frequently attribute failures to bad luck or chance, and they feel less nervous while taking the test. These clues suggest that middle-aged teachers managed the new teaching approach better than the others, giving students the opportunity to be fully involved and to learn actively.
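A subgroup analysis of this kind amounts to estimating the ITT contrast separately within each teacher age group. The sketch below uses hypothetical data, hypothetical age-band labels other than 50–55, and a hypothetical function name; it computes the raw difference in mean scores between assigned and nonassigned classes within each group and, unlike the estimates in Table 4, includes no controls or clustered standard errors.

```python
from collections import defaultdict
from statistics import mean


def itt_by_group(outcomes, assigned, groups):
    """Difference in mean outcomes (assigned minus not assigned) per group.

    Groups lacking either assigned or nonassigned observations are skipped.
    """
    cells = defaultdict(lambda: {0: [], 1: []})
    for y, z, g in zip(outcomes, assigned, groups):
        cells[g][z].append(y)
    return {g: mean(c[1]) - mean(c[0])
            for g, c in cells.items() if c[0] and c[1]}


# Hypothetical data: one score per class, labeled by the teacher's age band.
scores = [510, 500, 540, 500, 490, 495]
assigned = [1, 0, 1, 0, 1, 0]
age_band = ["<50", "<50", "50-55", "50-55", ">55", ">55"]
print(itt_by_group(scores, assigned, age_band))
# → {'<50': 10, '50-55': 40, '>55': -5}
```

Because the contrast is computed for several groups at once, a significant result in one of them deserves the caution the text recommends.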
Table 4. Average Impact on Student Performance and Attitudes Toward Math and School by Teachers’ Age Group.
Note. ATE = average treatment effect; ITT = intention to treat.
The symbols ***, **, and * indicate that coefficients are statistically significant at the 1%, 5%, and 10% levels, respectively.
We think that better implementation of the program could also have fostered a more cooperative environment, which improved students’ perception of their ability relative to their peers rather than their self-confidence or interest in math. From this perspective, a possible explanation for the program’s lack of effectiveness among the other teacher age groups lies in the time constraints in implementing the activities. These constraints, in turn, could have made it difficult to adapt the intervention to the needs of both stronger and weaker students, fostering greater competitive pressure and performance anxiety in the classroom.
Our interpretation is based on the figures reported in Table 4 but also on evidence collected through computer-assisted telephone interviews (CATI) and focus groups with teachers and tutors. The small number of teachers in each age group and their distribution across a nonrandom subsample of schools call for caution in drawing strong conclusions from this result. Nevertheless, these results are interesting, as they raise the issue of heterogeneity in treatment effects due to teachers’ characteristics.
Effects on Teacher Attitudes and Classroom Practice
Before it can affect students, M@t.abel should trigger changes in teachers’ knowledge, practices, and attitudes. 40 It is therefore relevant to examine whether the way teachers teach has changed. Results in that direction would be promising for student improvement in subsequent years.
Relevant information was collected through a CATI based on an ad hoc questionnaire, including items on attitudes (toward math teaching, self-efficacy perception, and job satisfaction) and instructional practices (classroom activities, evaluation of students, materials and instruments used to teach, and interaction with colleagues). Posttreatment interviews were administered in December 2010/January 2011, after the PD program for teachers in the treated group had ended and a new school year had already started. 41 Because consolidated international measurement practices and tools were lacking in this area, the research group built an original questionnaire, designed through intense consultation with experienced math teachers and with the experts on the M@t.abel scientific board. Where suitable, we also used items from the TIMSS questionnaire (Mullis et al. 2008) and a reduced version of Bandura’s (2006) scale on teacher self-efficacy. We estimated the effects on teachers’ instructional practices and attitudes using linear models for pseudo-continuous variables (whose values, originally on a 1–10 point scale, we standardized) and linear probability models for dichotomous variables. 42 Table 5 shows only the questionnaire items 43 for which we found effects robust to different model specifications (controlling for nonequivalent covariates) in both the ITT and ATE estimates. 44
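The two model types can be illustrated with a minimal sketch using hypothetical data and function names. It assumes the 1–10 items are standardized on the pooled sample with the population standard deviation, which the text does not specify. With only a treatment dummy on the right-hand side, the OLS coefficient equals the difference in group means: On a standardized item it is an effect in standard-deviation units, and on a 0/1 item (a linear probability model) it is a difference in proportions, which multiplied by 100 gives percentage points.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def treatment_coefficient(outcome, assigned):
    """OLS slope on a 0/1 treatment dummy, i.e. the difference in group means."""
    treated = [y for y, z in zip(outcome, assigned) if z == 1]
    control = [y for y, z in zip(outcome, assigned) if z == 0]
    return mean(treated) - mean(control)


assigned = [1, 1, 1, 0, 0, 0]

# Hypothetical pseudo-continuous item on a 1-10 scale: effect in SD units.
agreement = [8, 7, 9, 6, 5, 7]
print(treatment_coefficient(standardize(agreement), assigned))

# Hypothetical dichotomous item (1 = yes): linear probability model.
collaborates = [1, 1, 0, 1, 0, 0]
print(100 * treatment_coefficient(collaborates, assigned))  # percentage points
```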
Table 5. Differences Between Treatment and Control Groups in Teachers’ Attitudes and Practices (ITT and ATT).a
Note. ATE = average treatment effect; ITT = intention to treat; SE = standard error.
a Coefficients of the battery “Interaction with colleagues” are expressed in percentage points. The values reported for controls on attitudes toward math teaching and self-efficacy perception refer to the original scaling of the variables (ranging from 1 to 10).
The symbols ***, **, and * indicate that coefficients are statistically significant at the 1%, 5%, and 10% levels, respectively.
The observed effects concern dimensions that are relevant to everyday teaching and closely linked to the program: collaboration with peers and attitudes toward teaching (namely, how teachers think about teaching math and how they can do it). 45 In the school year following the treatment, treated teachers became significantly more eager to collaborate with peers at work, both in preparing teaching materials and in discussing better ways to present a concept to students.
These effects are sizable. Since collaboration among teachers can be considered one of the effectiveness-enhancing factors (Scheerens 2000) and M@t.abel explicitly promoted it, this result seems particularly promising. The program also seems to have affected the way teachers feel about teaching math: Those in the treatment group agree less than the control group with the statement that math ability is a sort of “fate” or depends on students’ own endowment (which cannot be substantially affected by teachers’ effort). Finally, the treatment also seems to have led teachers to recognize their limits, making them feel less confident about their effectiveness in promoting collaboration among students. This could be the consequence of more frequent group work in the classroom while using the M@t.abel teaching units, which gave teachers a deeper understanding of the challenges of a more interactive teaching style.
As repeatedly reported in the literature, our experiment also shows effects on teachers that are not accompanied by effects on their students’ results. This conclusion is still confined to the first year of the experiment, and it is possible that the levers touched by the program (as well as the training itself) will translate into better math performance in subsequent years. Above all, however, these results cast some doubt on the top-down approach that characterizes many PD programs in education.
Conclusions and Lessons Learned
This article studies the effects of the M@t.abel teacher training program on students’ math performance and on teachers’ behavior and instructional practices. We ran the first large-scale randomized experiment in the Italian school system, on a sample of schools in four Southern regions. We learned some lessons about the program, but also about implementing rigorous policy evaluation in the Italian educational system.
Findings show no significant impact of the M@t.abel program on student math performance in the short run, but some effects on students’ attitudes and on teachers’ practices and beliefs do appear. Students of treated teachers show a more positive attitude toward math, and there are signs of a greater sense of responsibility among students for their own learning, even when facing the standardized math assessment. Their greater attention to math performance could become an advantage in building their future math competences. Looking at the effects on teachers, we observe a shift toward a more innovative way of leading the classroom, especially through more frequent exchanges with colleagues. At the same time, we find that complier teachers faced tighter time constraints in combining ordinary school lessons with implementing the M@t.abel approach and attending the training course. Finally, we also detected some heterogeneity in the treatment’s effectiveness, with a positive effect when middle-aged teachers are involved. These results suggest that the lack of overall effectiveness could be due to the additional effort required of teachers during the first year of the program.
The study is continuing, building a longitudinal sample of the same students over 2 additional years. This approach will allow us to detect possible longer-term effects, or effects that emerge as teachers gain familiarity with the M@t.abel program. Nonetheless, the combination of low teacher compliance, tight time constraints, and absence of positive effects on the crucial outcome raises doubts about the top-down approach characterizing this PD program. To improve student achievement, changes in teacher attitudes and practices may not be enough, especially if these changes create additional difficulties in the daily school routine. At the same time, because the intervention was evaluated at scale, we have been able to identify a number of intervention features that should be modified in order to retain enrolled teachers and improve the program’s implementation. 46 For example, given that age is a relevant factor for full compliance, the choice to address the program only to tenured (usually older) teachers could be questioned. Similarly, it might be useful to ensure that participants have sufficient ICT skills before they enroll in this kind of training program. Moreover, the requirement of completing four teaching units in one school year seems too challenging for teachers.
Although a considerable amount of work remains to be done to extract richer knowledge from this experiment, the evaluation has contributed to a substantial rationalization of the program design itself and has highlighted some strong and weak features of the intervention. Developing the evaluation design ex ante, in cooperation with the Ministry of Education and the M@t.abel Scientific Committee, affected the program’s structure. Once a randomized experiment was agreed upon, the institutional actors leading the program were driven to streamline the activities and reinforce their key features in terms of duration, content focus, and peer collaboration. In practice, this meant ensuring that the program actually started in early autumn, so that it could be completed by the time the INVALSI standard math assessment was held in May. It also implied setting more precise and homogeneous training requirements, such as requiring the same number of teaching units to be used in the classroom and encouraging school-level rather than teacher-level participation. The evaluation also fostered deeper collaboration among the different institutions, because they had to collect and exchange data about the intervention and its features.
Before randomization, there was substantial fear among institutions and researchers that teachers would complain about being excluded from the program, disregard the recommendation about which class was to practice M@t.abel, and/or be unwilling to collaborate in the data collection. The actual picture was very different. Many teachers asked for explanations about the logistics of the evaluation through the website contact e-mail; almost all participated in the pre- and posttreatment interviews 47 and allowed their students to be assessed. Within the treatment group, only a minor share of teachers decided to practice M@t.abel in a class other than the randomly assigned one. This evaluation has shown that randomized experiments can be conducted in Italy and can be useful in a context where an evaluation culture is lacking and there is some resistance to such an approach (see also Argentin, Romano, and Martini 2012; Abbiati et al. 2013). We see this experience as an encouraging sign for future randomized trials and evidence-based policies in the Italian education system.
Supplemental Material
Supplemental Material, Final_paper_-_Appendix for Trying to Raise (Low) Math Achievement and to Promote (Rigorous) Policy Evaluation in Italy: Evidence From a Large-Scale Randomized Trial by Gianluca Argentin, Aline Pennisi, Daniele Vidoni, Giovanni Abbiati and Andrea Caputo in Evaluation Review
Footnotes
Acknowledgment
Special thanks for their support and precious advice to Alberto Martini (who contributed to design and set up the project), Piero Cipollone, Annamaria Leuzzi (Ministry of Education), Annamaria Fichera (Ministry of Education), Erich Battistin, Alessandro Cavalli, Maria Pia Perelli d’Argenzio, Enrico Rettore, and Jaap Scheerens.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
This research was funded by the National Operational Programme “Competenze per lo sviluppo” – FSE -2007-IT 05 1 PO 007, Project ‘Valutazione Matabel - Plus’ (I-3-FSE-2009-2; B15B09000020006).
Notes
References
