Abstract
Professional development is central to teacher competence and, with particular attention to larger-scale, structured and formal professional development interventions (PDIs), this article discusses how the impact of professional development can be conceptualized and evaluated. Various kinds of impacts on teachers, organizations and systems, and students that PDIs can have are considered, and examples from actual PDIs in the field of English language teaching (ELT) are used to illustrate how such impacts can be assessed. Throughout, various challenges that arise in trying to demonstrate that PDIs make a difference are noted, and the article concludes by suggesting some criteria that can be used to guide the development of impact evaluation frameworks on PDIs.
Keywords
Introduction
The centrality of teacher competence to student outcomes has been highlighted repeatedly in the literature in recent years (Hattie, 2012; Schwille, 2007) and one corollary of this argument has been that enhancing teacher quality is key to improving the quality of an educational system more generally. Professional development (PD) is a key strategy for teacher improvement and includes any activity which is designed to bring about positive change in practising teachers’ competence. PD can take many forms, from informal, brief and individual activities such as reading professional articles to participation in larger-scale and extended teacher development projects organized by institutions and organizations such as Ministries of Education. In this article (without in any way, of course, dismissing the value of other kinds of PD) it is the more formal and structured forms of PD that I will focus on and which I will refer to here as professional development interventions (PDIs). My specific concern will be the evaluation of impact, which has been defined as ‘the lasting or significant changes … in people’s lives brought about by a given action or series of actions’ (Roche, 1999: 21); in more practical terms I have found it useful to think about the impact of PDIs via this question: To what extent does professional development make a difference in any way (the ‘what’) to anyone (the ‘who’) at any point in time (the ‘when’) and how can we find out (the ‘how’)?
This question highlights key elements in the process of evaluating the impact of PDIs: who the target beneficiaries are, the nature of the hoped-for benefits, when evidence of impact will be collected, and what strategies will be used to do so. I will examine these issues in the discussion below.
My analysis here has a strong practical grounding and throughout I draw on examples of PDIs in English language teaching (ELT) I have worked on in recent years. In terms of underlying theory, research papers over the years have reported on how different kinds of PD impact on practising language teachers (for example, Borg, 2011; Curtis and Szesztay, 2005; Kubanyiova, 2012; Lamb, 1995; Lamie, 2004; Scott and Rodgers, 1995; Turner et al., 2016), but my concern here is not to present, as these studies do, evidence of impact from a specific intervention, but to examine more broadly how the evaluation of impact in PDIs can be conceptualized and implemented. A few descriptive accounts of the evaluation of PDIs in ELT projects are also available (see Hayes, 2014b) and these are another source of relevant reading here. In education generally, Guskey (2000) is widely cited on the evaluation of PD (but see also Earley and Porritt, 2014; Goodall et al., 2005; Kutner et al., 1997; Muijs et al., 2005). Another important theoretical source is the literature on teacher competence, as this highlights the multi-dimensional nature of this concept and points to the different ways in which the impact of PDIs can be conceptualized. In ELT, for example, frameworks exist (such as the British Council CPD Framework or Cambridge English Teaching Framework) which define the competences language teachers need. In education more generally, an example of a competency framework for teachers is that proposed by the National Board for Professional Teaching Standards in the USA (National Board for Professional Teaching Standards, 2016). An analysis of these frameworks is beyond my scope here (but see Borg, 2018); their relevance to this analysis, though, is that they demonstrate that teacher competence includes a wide range of skills, knowledge, dispositions and behaviours which are all potential target impacts in PDIs. 
Finally, I have also found the literature on the monitoring and evaluation of development projects to be a particularly insightful source of ideas (for example, Bell and Aggleton, 2016; Görgens-Albino and Kusek, 2009; Gosling and Edwards, 2003; Markiewicz and Patrick, 2016) and I would recommend this to individuals who are responsible for the evaluation of PDIs. Overall, though, while substantial amounts of evaluation work do take place on ELT PDIs around the world and a range of relevant theoretical sources can inform this work, the evaluation of impact in PDIs is not an issue that has been widely discussed in the applied linguistics literature.
Markiewicz and Patrick (2016: 4–5) suggest six reasons why the evaluation (ongoing and summative) of development interventions is important and their analysis is pertinent to this discussion of PDIs in ELT. The reasons they give are as follows (with my comments in brackets):
Results (understanding what the impacts are)
Management (such as understanding the extent to which targets are being met)
Accountability (examining whether resources are being used effectively)
Learning (understanding what works and what does not)
Improvement (such as making formative adjustments in response to evaluation results)
Decision-making (using evaluation data to make decisions about the design, direction and resourcing of PDIs).
There are, therefore, many good reasons for evaluating the impact of PDIs. Overall, if an institution or organization is going to invest significant resources in a PDI, including the time of the participating teachers, questions about whether the PDI actually makes any difference are eminently justified. Impact evaluation should thus be seen as an important element – rather than an administrative formality or superficial afterthought – in the design of PDIs. In relation to the ‘when’ question noted earlier, a distinction can be made (Gertler et al., 2011) between prospective impact evaluation, which is built into an intervention and collects ongoing evidence (including at baseline), and retrospective evaluation, which takes place only on completion of a programme. One general point that will be noted throughout the discussion below is that prospective impact evaluations are (because they allow for comparisons of evidence collected at different points in time before, during and after a PDI) more likely to generate robust insights into impact compared to those which only collect exit evaluation data.
The Impact of PDIs
PDIs can have many kinds of impact and I will discuss several of these below. In most projects I have worked on, the hoped-for outcomes of the PDI are defined at the outset and these provide a reference point for the evaluation of impact. In fact, without a clear sense of intended impacts, it is not possible to think systematically about impact evaluation (because success cannot be measured without reference to targets). In project management terms, the outcomes of a PDI are often called key performance indicators (KPIs) – criteria against which the success of an intervention will be measured (for a discussion of KPIs, including critical perspectives, see Bakewell and Garbutt, 2005; Gertler et al., 2011; Görgens-Albino and Kusek, 2009).
The framework for evaluating the impact of business training proposed by Kirkpatrick (Kirkpatrick and Kirkpatrick, 2006) was adapted for use in educational settings by Guskey (2000). The original framework defined four progressively deeper levels at which impact could be assessed – reactions (i.e. immediate satisfaction), learning (knowledge, attitudes and skills), behaviour (real-life practice) and results (broader benefits for organizations). Guskey proposed five levels – reactions, learning, organizational support and change, use of new knowledge and skills, and pupil learning outcomes. This last item was a significant addition that emphasizes the point that, ultimately, the purpose of PD for teachers is to enhance student outcomes, and this is an issue I return to in the discussion below. That discussion is not explicitly structured around the levels listed above, although I do refer to them in analysing the different kinds of impact that a PDI can have.
Inputs, Reach, Participation and Outputs
In PDIs, impacts can be defined with reference to inputs (activities), reach (how many people are affected), participation (how much participants engage) and outputs (products). Excessive focus on these elements at the expense of deeper kinds of impact is viewed critically in the literature (Markiewicz and Patrick, 2016). Nonetheless, several projects I have worked on have examined such issues because they were important variables in understanding the extent to which deeper kinds of impact were achieved.
Reach refers to the number of individuals who are affected by a PDI. We can distinguish between those who are directly affected (i.e. the primary beneficiaries) and those who may benefit less directly (for example, the students of teachers who take part in a PDI). Reach is not a measure of impact; it is a description or estimate of the number of people who may be affected by the PDI. One project I worked on had as one of its objectives the improvement of English for all children from age 11 upwards; as there were several million such children in the country, this was a description of potential reach. To take another example, the TEJAS project in Maharashtra in West India, 1 refers to a reach of 18,000 teachers. While on smaller PDIs it may be possible to verify what the actual reach is, on larger-scale projects this becomes increasingly difficult.
Inputs are PD activities – for example, the number of workshops that were delivered. In the ELISS project in Maharashtra, 2 a questionnaire was distributed to the participants (mentors) at the end of the project and they were asked questions about how often they visited teachers in schools and how often these visits included pre-observation and post-observations. Such questions focus on PD activities rather than on impact. Knowing that activities have taken place, though, is often necessary because this is the assumption on which the subsequent assessment of impact (for example, whether teachers’ classroom practices changed) is predicated.
Participation captures information about participants’ engagement in a PDI. For example, an online course may be provided (that is an activity) but participation levels may vary – some teachers will complete the whole course while others will not. Analytics often provide information about online participation by recording how many hours teachers are logged on to the PDI platform (although being logged on may not in itself mean the teachers are engaged). To take another example, if, as in the ELISS project, WhatsApp groups are set up to promote teacher discussions, participation can be analysed by looking at the number of messages that are posted (Parnham et al., 2018 present such an analysis). In face-to-face PDIs, teacher attendance at sessions is a measure of participation.
Outputs are products that are created as a result of a PDI – for example, a set of teaching materials, a new language test or an edited collection of reports written by teachers (this is a key output, for example, on the Cambridge English-English UK Action Research Scheme). 3 Outputs do provide evidence of what an intervention has achieved but, again, as with activities, reach and participation, outputs should not be confused with the deeper kinds of impact discussed below.
Impact of PDIs on Teachers
I will now move on to discuss the different kinds of impact that PDIs can have on teachers.
Satisfaction
As noted above, the first level in Kirkpatrick’s framework was ‘reactions’. This is a measure of how satisfied participants are at the end of a PDI. For example, one PDI I worked on involved weekly workshops over 10 weeks and at the end of the final session teachers were asked to fill in an evaluation sheet where they rated the course content, the trainer, course delivery and training facilities. The questions about the trainer, for example, included the following:
Donovan and Townsend (2004) discuss the pros and cons of such ‘happy sheets’ (as they tend to be called in business training contexts). On the plus side, they are quick to administer and analyse, but, less positively, they provide no actual evidence of impact – for example, participants may be (or appear to be) highly satisfied but may not have acquired new knowledge or may not transfer anything they learned back to their workplace. Yan and He (2014) used a questionnaire at the end of a PDI to get feedback from teachers, and although the title of their article suggests a focus on longer-term impact, the evidence they report only captures teachers’ immediate reactions at the end of a 30-hour in-service course. One other general concern attached to self-reported impact evaluation data of this kind is that, even when this is provided anonymously, factors such as a reluctance to be self-critical or critical of others may generate responses that do not reflect how participants actually feel. Clearly, then, while it is possible and desirable to collect evaluation data from teachers immediately after a PDI, such information is, on its own, limited in what it reveals about how PD affects teachers and should ideally be combined with other forms of evidence collected earlier in the process (i.e. at baseline) or even (where possible) some time after the end of the PDI.
Language Proficiency
On projects I have worked on and in others where reports have been made public, 4 observations of lessons and/or proficiency tests have shown that many teachers have levels of English at A1 or A2 on the Common European Framework of Reference for Languages (CEFR). This is against the general recommendation that teachers of English should be at least B2 and ideally C1 (Hayes, 2014a). Improving teachers’ English proficiency is thus often a key target of PDIs. For example, in one project I am working on, 600 teachers completed a standardized English test at the start of a one-year PDI and again at the end. Comparisons of their entry and exit scores will provide evidence of the impact the PDI had on their English. In the EfECT project (Borg et al., 2018), over 1600 teacher educators completed English language assessments at baseline, mid-point and end-project. Again, these were compared to assess the impact of the PDI on the participants. Of course, if improvements in teachers’ language proficiency are being targeted it is essential that the following conditions are in place: the availability of an appropriate test; standardized administration and marking procedures; explicit language development work; and sufficient time for language development to occur (the general advice is that around 100 hours of study are required for learners to progress from A1 to A2 and another 150 to progress from A2 to B1). 5 Real-world project conditions (such as funding) may mean that not all of these conditions can be met; for example, in one project I am currently involved in, teachers’ exit levels of spoken English are being assessed via a two-minute online discussion with their tutors. A moderator also takes part to standardize the process somewhat, but this is of course a less robust form of assessment than that which a standardized test would provide.
However, PDIs are always subject to real-world constraints and decisions about how to evaluate impact will be governed by what is feasible in any particular context.
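The point about allowing sufficient time for language development can be illustrated with a small feasibility check. The sketch below is illustrative only: it takes the approximate study-hour figures quoted above (around 100 hours from A1 to A2 and a further 150 from A2 to B1) at face value, and the function names and the 120-hour example are hypothetical rather than drawn from any of the projects described.

```python
# Approximate guided-study hours per CEFR step, as quoted in the text.
# These are rough guidance figures, not precise requirements.
HOURS_PER_STEP = {("A1", "A2"): 100, ("A2", "B1"): 150}
LEVELS = ["A1", "A2", "B1"]


def hours_needed(start, target):
    """Sum the approximate study hours needed to move from start to target."""
    i, j = LEVELS.index(start), LEVELS.index(target)
    if j < i:
        raise ValueError("target level must not be below start level")
    return sum(HOURS_PER_STEP[(LEVELS[k], LEVELS[k + 1])] for k in range(i, j))


def is_target_feasible(start, target, available_hours):
    """Check whether the language study hours in a PDI cover the estimate."""
    return available_hours >= hours_needed(start, target)


# Hypothetical PDI offering 120 hours of explicit language development work.
print(hours_needed("A1", "B1"))             # approximate hours for A1 to B1
print(is_target_feasible("A1", "A2", 120))  # one-level gain
print(is_target_feasible("A1", "B1", 120))  # two-level gain
```

A check of this kind does not predict individual progress; it only flags proficiency targets that look unrealistic given the time a PDI allocates to language work.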
Teacher Knowledge
Teacher knowledge is a complex concept and the knowledge teachers have or need has been conceptualized in a variety of ways in the literature (Carter, 1990; Shulman, 1986; Verloop et al., 2001; Woolfolk Hoy et al., 2006). Overall, though, it is generally accepted that higher levels of teacher knowledge contribute positively to the quality of teaching and learning. One form of teacher knowledge that lends itself to assessment is variously called pedagogical knowledge (Zohar and Schwartzer, 2005) or (in ELT) teaching knowledge (Khalifa et al., 2014); this is teachers’ explicit knowledge about educational theory, teaching and learning. For example, in the Teaching Knowledge Test (TKT), teachers of English are assessed on their theoretical knowledge of the principles and practice of ELT. A PDI may, then, seek to improve this kind of knowledge. Courses for teachers which prepare them for tests such as the TKT do not very often incorporate a prospective evaluation framework – teachers will study the prescribed modules and do the test, and success may be measured retrospectively in terms of test results or pass rates, rather than against any baseline. PDIs, though, may approach the assessment of teacher knowledge prospectively too. For example, on the EfECT project mentioned earlier, teachers’ knowledge of interactive teaching methodology was assessed through a specially designed matching test before a methodology module and again at the end, several months later. Comparisons of the two measures were then used to evaluate whether teacher knowledge had improved as a result of the project. In another project I worked on a few years ago, teacher knowledge at the end of the PDI was examined during an interview, where teachers were asked to explain their understanding of a number of concepts they had been introduced to, such as lead-in, mingling and pre-reading. 
In both the examples I have given, teacher knowledge was being examined at a basic level – receptively (through matching terms and definitions) or in propositional terms (by giving a short oral explanation). No claims can be made from such assessments about how well teachers can make use of such ideas in the classroom.
The literature on teacher knowledge does, though, posit more complex notions of teacher knowledge which are more closely connected to actual practice and which emphasize the point that theoretical knowledge is an insufficient basis for effective teaching. For example, the notion of pedagogical content knowledge (PCK) introduced by Shulman (1986) or offshoots from it such as mathematical knowledge for teaching (Silverman and Thompson, 2008) place more emphasis on the knowledge teachers need to make subject matter meaningful to students. None of the PDIs I have worked on have, for evaluation purposes, included an explicit focus on such kinds of knowledge for teaching, although it must be acknowledged that what PCK means in the context of ELT has not been widely discussed (in the context of grammar teaching, though, see Johnston and Goettsch, 2000; Sanchez and Borg, 2014). One final point to make about assessing teacher knowledge here is that, while the formal testing of teacher knowledge does occur (I gave examples from English above), there will be contexts where teachers react negatively to the idea of having their knowledge tested. This is a further reminder that the evaluation of PDIs is not just a technical exercise; it must be sensitive to the participants and more generally to the context in which the PDI operates.
Instructional Skills
PDIs can also target teachers’ instructional skills – i.e. their ability to implement specific teaching strategies in controlled or contrived settings, such as micro- or peer teaching or some kind of practical assignment (such as designing instructional materials). 6 None of these involve actual teaching in real classrooms – I discuss this kind of impact separately below. One current large-scale PDI for English teachers uses peer teaching to assess the extent to which teachers have acquired communicative teaching skills such as managing pair or group work. Whilst teaching their colleagues, teachers are observed three times during the PDI and assessed against specific observation criteria. A comparison of the teacher ratings provides a measure of how much impact the PDI had. One interesting point that arises in such contexts and is relevant to comparisons of pre- and post-PDI evaluation data more generally is the distinction between improvement and competence. If teachers’ instructional skills are being assessed on a scale of 1–4 where, for example, 1 = limited ability and 2 = basic ability, teachers who (across several items) receive an average observation rating of 1 at baseline and 2 at exit will have improved but not achieved the skill levels (3–4) that equate to higher competence. How target outcomes are defined, then, has implications for the way that impact evaluation results will be interpreted. For example, if the KPI is that 90% of teachers will achieve a better rating on their exit skills assessment compared to baseline, this is very different to a KPI which states that 90% of the teachers will achieve an average rating of 3 by the end of the PDI. The latter is criterion-referenced and provides evidence of outcome (i.e. a result) but, unless a comparable baseline measure is available, not of impact (i.e. change).
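The difference between an improvement-based KPI and a criterion-referenced KPI can be made concrete with a short calculation. The following is a minimal sketch using invented ratings for ten teachers on the 1–4 scale described above; the function name `kpi_summary` and all the data are hypothetical and are not taken from any actual project.

```python
def kpi_summary(baseline, exit_ratings, criterion=3.0):
    """Return (share of teachers who improved, share at or above criterion).

    baseline and exit_ratings are paired lists of average observation
    ratings, one pair per teacher.
    """
    if len(baseline) != len(exit_ratings) or not baseline:
        raise ValueError("ratings must be paired and non-empty")
    n = len(baseline)
    improved = sum(1 for b, e in zip(baseline, exit_ratings) if e > b)
    at_criterion = sum(1 for e in exit_ratings if e >= criterion)
    return improved / n, at_criterion / n


# Invented average ratings for ten teachers (1 = limited, 2 = basic ability).
baseline = [1.0, 1.0, 1.5, 2.0, 1.0, 2.5, 1.0, 1.5, 2.0, 1.0]
exit_ratings = [2.0, 2.0, 2.5, 3.0, 2.0, 3.5, 2.0, 2.0, 3.0, 1.0]

share_improved, share_at_criterion = kpi_summary(baseline, exit_ratings)
print(f"Improved on baseline: {share_improved:.0%}")
print(f"Average rating of 3 or above at exit: {share_at_criterion:.0%}")
```

With these invented figures, nine of the ten teachers improve, so a KPI of 90% improvement would be met, while only three reach an average rating of 3, so a criterion-referenced KPI of 90% would be missed by a wide margin. The same data set can therefore support quite different conclusions depending on how the target outcome is worded.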
Attitudes and Beliefs
PDIs also often target changes in teachers’ attitudes and beliefs. In Kirkpatrick’s framework, attitudes (along with knowledge and skills) are included under the level of impact called ‘learning’. This was seen as a pre-requisite to changes in ‘behaviour’ (which was a subsequent, deeper level of impact). Guskey (2000), though, argues that behaviour can sometimes be a precursor to changes in learning (i.e. that teachers may first change their behaviour and consequently develop new attitudes, skills and knowledge). This is an interesting point and I agree that, while mastery experience (Bandura, 1997) (in our context, repeated first-hand experience that an innovative teaching activity works) is an essential part of behaviour change and belief formation, sustained change in behaviour is unlikely unless it is accompanied by deeper changes in teachers’ attitudes and beliefs (and unless the context in which teachers work is supportive of the change being promoted). However, irrespective of how the relationship between attitudes and beliefs, on the one hand, and teaching behaviour, on the other, is conceptualized, the key point here is that changing teachers’ attitudes and beliefs may be a legitimate goal of PDIs and hence an appropriate focus when the impact of a PDI is being evaluated. The precise difference between beliefs and attitudes is not straightforward but generally attitudes are broader positive or negative dispositions whereas beliefs are more specific views about what is considered to be true. Much has been written both in the psychological and the educational literature on the assessment of attitudes and beliefs (Crawley and Koballa, 1994; Vogel and Wanke, 2016) and in the context of language teacher cognition research there has also been much discussion of the study of teachers’ beliefs (Borg, 2015). This literature highlights many different ways in which teachers’ attitudes and beliefs can be evaluated in the context of PDIs.
A detailed review of this material is beyond my scope here, but it is important to note that it is characterized by a number of debates. One relates to the extent to which attitudes and beliefs can be validly measured via decontextualized questionnaires and similar self-report tools which separate belief and behaviour (Kagan, 1990; Kubanyiova and Feryok, 2015). In experimental psychology, questionnaires have been the standard way of measuring attitudes, but this approach has been the focus of substantial criticism from more critical approaches to attitude research (see Stainton Rogers, 2011 for a summary), which argue for the need to study attitudes within specific social contexts. It will not always be possible for individuals responsible for assessing the impacts of PDIs to engage in the deep academic study of this material, but some awareness of the options available for assessing beliefs and attitudes and the strengths and limitations of these is desirable.
The examples I provide below, which are taken from actual projects I have worked on, focus mostly on beliefs, which is perhaps a reflection of the greater attention this concept has received in language teacher research compared to attitudes.
In one PDI in sub-Saharan Africa, teachers’ beliefs about teaching English in the primary school were examined by three interview questions at the end of the 12-week project:
What in your opinion are the characteristics of an effective English lesson?
What do you try to do in your lessons to help your pupils learn English more effectively?
What in your opinion is the best way to help your pupils learn English?
These questions were designed to tap into teachers’ beliefs indirectly, by asking them about their teaching. This is generally more productive (in generating a response) than direct questions which ask teachers for their beliefs on a particular issue. Teachers may find it hard to respond to such direct questions because, for example, the questions are too abstract or because teachers have not had previous opportunities to make their beliefs explicit and will be unsure of how to respond.
In another PDI, this time in Denmark, teachers completed a questionnaire at the beginning, mid-point and end of a two-year project. The focus of the project was plurilingualism in early foreign language learning and several of the questionnaire items focussed on teachers’ beliefs about early foreign language learning, such as:
It is important to formulate distinct objectives for the teaching of English in the first grade.
It is important to give the students of English in the first grade feedback.
It is good that the students of Denmark learn English as their first foreign language.
It would be an advantage if the students had more languages to choose as their first foreign language.
A comparison of teachers’ responses to such items at project entry and exit was then used to assess the extent to which changes in teachers’ beliefs about plurilingualism and early foreign language learning had occurred during the project.
The evaluation of PDIs also often examines teachers’ beliefs about the impact they feel the project has had on them. For example, teachers may be asked to rate their knowledge of a particular topic or how well they can complete specific practical tasks. Thus, on the ELISS project, an exit questionnaire asked mentors for their views on the impact of the PDI on their practices and on those of the teachers they were supporting. One of the questions was this:
4.1. Describe the impact of the mentoring project on
In interpreting the responses to such questions, it is important to indicate that these are beliefs or reported impacts rather than a direct measure of (in this case) whether mentees’ teaching had changed. To take another example, in a project with 600 teachers a self-assessment questionnaire was administered to teachers before and after a one-year intervention which focussed on improving both their spoken English and their approach to teaching speaking. Sample items in the instrument were as follows:
In this case a comparison of the pre- and post-project self-assessments allowed conclusions to be reached about the extent to which teachers felt their ability to speak and teach English had improved.
Teachers’ beliefs about their ability to complete certain tasks, as in the example above, can be framed as an evaluation of teacher confidence (this is often discussed in the literature using the term self-efficacy – Bandura, 1997). I have worked on projects where improvements in teacher confidence have been identified as a KPI, and the issue merits a brief comment here as the assumption that teacher learning equates with greater confidence may not always be warranted. In the EfECT project (Borg et al., 2018), participants were asked to rate their confidence in using English and in teaching interactively at baseline, mid-project and end-project. The assumption was that increased confidence would be one indicator of impact, but comparisons of the baseline and exit data did not show a consistent pattern of improved confidence. Rather, there were cases where reported confidence at end-project was lower than at baseline. There were two explanations for this outcome: at baseline teachers’ ratings were unrealistically high, and at end-project they had a better understanding of how much they did not know. In this case, lower end-project confidence could actually be seen as a positive impact associated with greater self-awareness. Increased teacher confidence may thus only be an appropriate target impact when teachers have sufficient time to gain mastery experience (through repeated successful practice). While questions do arise about the validity of self-assessments of this kind (see Borg and Edmett, 2018), attention to such issues in evaluating the impact of PDIs can be justified with reference to the view that confidence influences how much effort a teacher invests in teaching (Tschannen-Moran and Hoy, 2001). Teachers are also more likely to engage in behaviours they feel they can fulfil competently.
Still on the subject of attitudes and beliefs, perceptions of impact can also be assessed by asking teachers to write about the ways in which they feel that their participation in a PDI has made a difference to them. One project I worked on used a technique called Most Significant Change (MSC – see Davies and Dart, 2005) to collect impact data of this kind. This technique asks teachers to produce a short written account (approximately 300 words) in which they reflect on one significant change they have experienced as a result of the intervention they have taken part in. 7
Below is an example from the EfECT project (Borg et al., 2018):
Before the project began, I didn’t know how to ask questions to develop HOT [higher order thinking] skills. I was disappointed because my students didn’t get a chance to think critically and to expand their answers. During the EfECT project I have thought about questioning. I started pre-preparing questions which are LOTS [lower-order thinking skills] and HOTS [higher-order thinking skills] including thinking time and planned which activities I should use. For example, think-pair-share is good for HOTS students can think individually without other dominance and in pair-work, they can check their answers with their partner. Then they can consider when they share their answers. As a result of this, I have improved my questioning. Now I can ask questions them to develop their HOT skills. For example, in my last observation, I asked the whole class for closed recall questions and for open questions. I give thinking time to discuss answers in pairs and nominating individual students to answer. I am really satisfied that I have been able to improve my questioning.
With any kind of qualitative impact evaluation data it is important to ensure that the resources (such as time and expertise) required for its analysis are available. On another project where the MSC technique was used to assess 100 teachers’ perceptions of how they had changed, resources had not been allocated for the analysis of the written accounts and consequently these accounts were not used in the evaluation of the project. Another reason for this was that it was not clear which of the project’s KPIs the MSC accounts were relevant to; this reflects another point I made above – that the collection of evaluation data should be linked to the objectives of the PDI.
Classroom Practice
PDIs can also target change that occurs in teachers’ routine classroom practices. In Kirkpatrick’s framework and those derived from it, this is Level 3 of impact, which focusses on ‘behaviour’.
Observation is the most obvious strategy to use to assess this kind of impact. For example, in the TEJAS project in Maharashtra, the evaluation involves visits to classrooms by observers who rate what teachers do against a number of criteria that are aligned with the communicative goals of the PDI. In the project with 600 teachers of English mentioned above, every teacher was observed once at the start of the project and again at the end (about 12 months later). A structured observation sheet was used, the content of which was aligned with the topics covered during the PDI. Observers were required to assess the extent to which specific teacher behaviours or pedagogical strategies were in evidence. Here is an extract from the tool:
Observers attended a training session in which the observation tool was introduced and where some standardization work took place using a video-recorded lesson.
In the above example, the observation tool was a rating scale. In another project the tool used time-sampling, with the classroom behaviours of teachers and learners being recorded against predefined criteria every 60 seconds, as follows:
No baseline observations were available but a control group (teachers not involved in the PDI) was also observed at end-project using the same tool and the results of the PDI and non-PDI groups were compared to assess the extent to which classroom practices had been influenced by the project.
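To make the logic of a time-sampled comparison concrete, the following is a minimal sketch of how interval codes from two lessons might be summarized. The behaviour categories and the data are invented for illustration and are not taken from the project's observation tool.

```python
from collections import Counter


def time_sample_profile(codes):
    """Share of observation intervals in which each coded behaviour was recorded."""
    if not codes:
        raise ValueError("at least one observation interval is required")
    counts = Counter(codes)
    return {code: count / len(codes) for code, count in counts.items()}


# One code per 60-second interval. The category labels are hypothetical.
pdi_lesson = [
    "teacher_talk", "pair_work", "pair_work",
    "teacher_talk", "group_work", "pair_work",
]
control_lesson = [
    "teacher_talk", "teacher_talk", "teacher_talk",
    "pair_work", "teacher_talk", "teacher_talk",
]

pdi_profile = time_sample_profile(pdi_lesson)
control_profile = time_sample_profile(control_lesson)

# Compare the share of intervals spent on each behaviour in the two lessons.
for code in sorted(set(pdi_profile) | set(control_profile)):
    print(code, round(pdi_profile.get(code, 0.0), 2), round(control_profile.get(code, 0.0), 2))
```

In a real evaluation such profiles would be aggregated across many lessons in each group, and, without a baseline, any difference between the groups could still reflect pre-existing differences rather than the PDI.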
The availability of a baseline will assist in judgements about the extent to which behaviour has changed, though, given the potential for reactivity (changes in the observee’s performance as a result of the presence of an observer) multiple observations over time are likely to produce a more reliable picture of what teachers can do (Borg, 2018); additionally, other potentially less intrusive strategies for assessing changes in teacher behaviour are available, such as video-mediated observations (platforms such as VEO or IRIS Connect allow teachers to record lessons and upload them for others to view - see Hockly, 2018) or teacher portfolios (Alwan, 2007; Gelfer et al., 2015; Xerri and Campbell, 2016). Decisions about which specific methods to use in assessing changes in teaching behaviours during or as a result of a PDI will be made, though, not only with theoretical concerns in mind but also on the basis of the resources available; technology-assisted observation using commercial products has cost implications, while there is no point in making portfolios a key strategy for evaluating the impact of a PDI if the resources required to assess them are not available (this applies to the collection of qualitative impact evaluation data more generally).
While observations of what teachers do will always be the most direct way of understanding the impact of a PDI on what happens in the classroom, the resources required to observe large numbers of teachers mean that there is also scope for examining reported impacts of a PDI on teaching and learning. For example, in one project, teachers were asked in an exit questionnaire questions such as these:
Teachers’ responses to such items cannot be interpreted as evidence of actual change in the classroom but of their perceptions of the extent to which change has occurred.
Portfolios were also mentioned above as a strategy that can be used to assess the impact of a PDI on what teachers do. ‘A teaching portfolio is a collection of materials compiled by teachers to exhibit evidence of their teaching practices, school activities, and student progress … portfolio materials are collected and created by the teacher for the purpose of evaluation and are meant to exhibit exemplary work’ (Goe et al., 2008: 30).
As noted by Borg (2018: 27), portfolios ‘provide a holistic, authentic and evidence-based picture of teachers’ work over time and thus provide a sound basis for the assessment of teacher competence’. In some ways, it could be argued that the analysis of a teaching portfolio which documents what teachers do over time is a more effective measure of the impact of a PDI than a one-off classroom observation. However, portfolios make heavier demands on teachers and on those responsible for examining the work the teachers produce; in one project I worked on teachers were required to compile simple portfolios over one school year, but because resources had not been allocated for these to be analysed they did not contribute to the evaluation of the impact of the PDI (this does not mean, of course, they did not support teacher development).
Reflective Competence
The last kind of impact on teachers that a PDI may have which I will discuss here is reflective competence. A number of projects I have worked on in recent years have targeted improvements in teachers’ ability to reflect on their teaching and I will illustrate how this has been approached. Reflective practice is of course a complex area supported by an extensive literature (e.g. Mann and Walsh, 2017; Schön, 1983; Sellars, 2017) and one of the immediate challenges in assessing the impact of a PDI on reflective practice is to stipulate how exactly reflection is being defined. Only then can it be assessed. For example, different levels of reflection have been identified (Hatton and Smith, 1995), from that with a more practical focus on what teachers do to more critical perspectives (Morgan, 2017) that invoke issues such as morality and social justice. In the PDIs I have worked on, reflection has often been a novel idea for teachers and it has generally been conceptualized in basic terms as teachers’ capacity to ‘analyze, discuss, evaluate and change their own practice, adopting an analytical approach towards teaching’ (Calderhead and Gates, 1993: 2). In the EfECT project already mentioned, reflection was assessed at baseline and end-project through pre-lesson and post-lesson discussions between EfECT trainers and the teacher educators taking part in the PDI. The results showed that just over 68% of the participants obtained an improved rating when end-project and baseline assessments were compared. As acknowledged in the report of this evaluation (Borg et al., 2018), this was a rather limited way of assessing reflective capacity, relying as it did on one quantitative indicator.
Greater insight would have been obtained, for example, via regular conversations between trainers and teacher educators over time, or through teacher portfolios; these were discussed above as a way for teachers to provide evidence of what they can do, but portfolios can be enhanced when they include a reflective component too (i.e. when teachers not only describe their work but examine its rationale and effectiveness). Strategies such as teacher journals are also noted in the literature (e.g. Farrell, 2007) as a way of obtaining insight into teachers’ reflective skills; these have not been used in any of the ELT PDIs I have worked on; there will be various reasons for this, such as the added demands that writing a journal makes on teachers, the linguistic proficiency required for teachers to reflect in writing in English, and the resource implications that arise when large numbers of teacher journals need to be reviewed. In the ELISS and TEJAS projects, WhatsApp groups have been used to create online communities of practice where teachers can share ideas and experiences and reflect collaboratively on their work; in this case, assessing the quantity and quality of reflection that occurs requires the analysis of the WhatsApp transcripts (see Parnham, Gholkar and Borg, 2018), which is, again, a time-consuming task for which resources and expertise may often not be available.
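A single indicator such as the proportion of participants whose rating improved between baseline and end-project can be computed as in the sketch below. The rating scale and the data are hypothetical (the scale used on EfECT is not described here); the example also reports unchanged and lower ratings to show what an 'improved' percentage alone leaves out.

```python
def rating_change_shares(baseline, end_project):
    """Shares of participants whose rating improved, stayed the same or declined."""
    if not baseline or len(baseline) != len(end_project):
        raise ValueError("both lists must be non-empty and cover the same participants")
    pairs = list(zip(baseline, end_project))
    total = len(pairs)
    return {
        "improved": sum(after > before for before, after in pairs) / total,
        "unchanged": sum(after == before for before, after in pairs) / total,
        "declined": sum(after < before for before, after in pairs) / total,
    }


# Invented ratings for eight participants on a hypothetical 1-4 scale.
baseline_ratings = [1, 2, 2, 3, 1, 2, 3, 2]
end_ratings = [2, 2, 3, 3, 2, 3, 4, 1]

shares = rating_change_shares(baseline_ratings, end_ratings)
print(shares)
```

Even with the extra categories, a summary of this kind says nothing about the quality of the reflection behind each rating, which is why sources such as portfolios or repeated conversations over time add insight.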
One final example of how teacher reflection can be evaluated on PDIs is through video-based observations. I am aware of projects where this strategy is currently being explored; in one I was involved in recently, teachers were given access to a platform called VEO and encouraged to upload recordings of their lessons for other teachers to view and comment on. While this has exciting potential for creating online reflective communities, various factors that I discussed earlier hindered the process, including technical challenges in recording and uploading videos as well as professional issues related to teachers’ lack of reflective experience. Ethical issues will also arise with video recordings, especially when children are included, and relevant permissions will need to be obtained. Another use of video to promote collaborative reflection is film clubs, where teachers watch videos of lessons together and discuss them. I am aware of one large-scale ELT PDI where this is being trialled.
A key point here, then, is that while there are many possible ways of encouraging teachers to reflect, for the purposes of evaluating PDIs it is necessary to define what the evidence of impact (in this case of teacher reflection) will be and how it will be captured. While reflective competence is a justifiable objective in a PDI, its assessment does raise a number of theoretical and practical challenges. In my experience, one difficulty has been that because reflection is often one of several objectives in a large-scale PDI and because it is not as concrete as improvements in teacher language proficiency and teaching strategies, it tends to be given less systematic attention when a PDI is being evaluated.
Impact of PDIs on Organizations or Systems
In Kirkpatrick’s model, the final level of impact is described as ‘Results’. As originally defined, this refers to the broader benefits to a business of training. This may be expressed in terms, for example, of profit, productivity and quality. It is not wholly obvious how these apply to educational settings and in the adaptation of the Kirkpatrick framework provided by Guskey (2000) it is not included. One way of thinking about this in educational terms, though, is in relation to the broader impact that professional development has on an organization or educational system more generally. For example, it might be hoped that as a result of a PDI schools might become characterized by a stronger collaborative ethos. This is the case with the KfK (‘Competence for Quality’) teacher development scheme in Norway, for example, where one objective is ‘collective learning and development of professional community in each school’ (Ministry of Education, 2015: 5). Teachers who attend this scheme are thus expected to share what they learn with colleagues in their schools (although as far as I am aware no mechanisms exist for formally evaluating the extent to which this impact is achieved). This is an example of how PDIs can seek to promote change that extends beyond the immediate beneficiaries and to contribute positively to schools and systems more generally. Another example can be seen in PDIs (such as TEJAS and PALTAGs in Palestine) which deploy Teacher Activity Groups (TAGs); these create communities of teachers from different schools who meet monthly and continue to interact online outside the face-to-face meetings. TAGs seek to create a more collaborative ethos within educational systems; assessing the extent to which this occurs, though, is quite challenging given the ‘soft’ nature of PDI impacts of this kind.
Impact of PDIs on Students
In adapting Kirkpatrick’s work, Guskey (2000) added a fifth level to his framework for assessing the impact of professional development: ‘student outcomes’, i.e. the difference that a PDI makes to students. Goodall et al. (2005) analysed how teacher professional development in the UK was evaluated and concluded that while student outcomes were the kind of impact that was the least likely to be assessed, it was in fact the most important. Student outcomes are typically defined in terms of achievement but can also include other kinds of outcome such as student engagement and well-being (Timperley, 2011). In recent years there has been much discussion in the teacher evaluation literature about ways of establishing links between what teachers do and student outcomes and this has highlighted various difficulties in demonstrating that teaching causes learning (such debates have been particularly prominent in the context of the use of value-added metrics to evaluate teacher quality, see Braun, 2005; Corcoran, 2016). If it is difficult to make direct links between teaching and student outcomes (see also Hayes and Chang, 2012), then establishing such links between professional development and student outcomes is even more complex. Nonetheless, it is becoming increasingly common for Ministries who sponsor PDIs to ask for evidence that these are making a difference to student learning; such requests are perfectly reasonable and cannot be ignored, but it is important to approach the task of evaluating how PDIs affect learning with a full understanding of the various challenges that this task creates. Let me explore some of these, using a few real examples.
In the case of a PDI involving over 16,000 teachers and many thousands of students, no baseline assessment of students’ levels of English took place. At the end of the project, a small sample of students were asked via survey about their motivation to learn English. In this case, a student outcome (motivation) was assessed, but without a baseline, a control group and a representative sample it was not possible to make any links between this outcome and the PDI. In other words, even if the surveyed students said they were motivated to learn English, there is no evidence that they were more motivated than before the PDI, that they were representative of all students taught by project teachers or that students not taught by teachers on the PDI were any less motivated.
On another project, the students of teachers participating in the PDI were assessed using a standardized test at the start of the project, together with a sample of students from similar classes whose teachers were not taking part in the project. At the end of the project both groups of students were assessed again. This allowed for the assessment of whether the experimental group (students whose teachers were on the PDI) improved more than the control group (students whose teachers were not). If the experimental group did improve more (and statistically speaking the difference between the two groups was significant), this would support the claim that the PDI made a difference, with the proviso that the students in both groups were not chosen randomly (in a statistical sense) and that other variables beyond the control of the PDI may have influenced the results.
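The comparison described here can be sketched as a comparison of gain scores. The example below is a minimal illustration using invented test scores, not data from the project. It computes each student's gain (end-project minus baseline), the difference in mean gain between the two groups, and Welch's t statistic as one possible way of gauging whether that difference is larger than chance variation would suggest. The choice of Welch's test is an assumption made for illustration; the source does not say which statistical procedure was used.

```python
from math import sqrt
from statistics import mean, variance


def gain_scores(pre, post):
    """Per-student gain: end-of-project score minus baseline score."""
    if len(pre) != len(post):
        raise ValueError("pre and post must list the same students in the same order")
    return [after - before for before, after in zip(pre, post)]


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom for two independent samples."""
    se_a = variance(sample_a) / len(sample_a)  # squared standard error, group A
    se_b = variance(sample_b) / len(sample_b)  # squared standard error, group B
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(sample_a) - 1) + se_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Hypothetical scores for five students per group (invented for illustration).
pdi_pre = [40, 45, 50, 55, 60]
pdi_post = [50, 54, 61, 63, 72]
control_pre = [42, 44, 51, 54, 59]
control_post = [45, 48, 53, 58, 62]

pdi_gains = gain_scores(pdi_pre, pdi_post)
control_gains = gain_scores(control_pre, control_post)

# How much more the PDI group improved, on average, than the control group.
extra_gain = mean(pdi_gains) - mean(control_gains)
t_stat, dof = welch_t(pdi_gains, control_gains)
print(extra_gain, t_stat, dof)
```

Converting the t statistic into a p-value requires a t-distribution table or a statistics library. As the proviso above makes clear, even a significant difference supports a causal claim only tentatively when students have not been randomly assigned to groups.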
To take a third example, student achievement was assessed after a PDI using the national tests administered by the Ministry of Education at the end of each school year. The students in classes taught by project teachers were compared to those of non-project teachers and no significant differences were found. While prima facie this might be seen as evidence that the PDI did not make a difference, such a conclusion would be inappropriate. For example, questions would arise about the lack of a baseline (initial differences in the student groups may have been missed) and about the quality of the national test being used and how it related to the focus of the PDI. The extent to which teachers transferred to the classroom what they learned during the PDI could also be a significant factor in explaining why the project students did not perform better than the non-project students.
To take one final example, in the Danish project on early foreign language learning mentioned earlier, pupils (both experimental and control groups) were administered a test at the start of the project and again at the end, with a view to evaluating the impact of the PDI on pupil outcomes. One challenge here was that the official end of the project was perhaps too soon to fully evaluate the impact of the PDI on the children; during the project teachers had developed new materials to support early foreign language learning and at the end of the project (when the pupils were retested) these materials were only just starting to be used in classrooms. This highlights one key point relevant to the assessment of student learning – enough time must have elapsed to allow for a realistic assessment of the impact a PDI may have had.
Focussing on non-cognitive forms of student outcome such as student engagement and motivation can alleviate some of the difficulties that arise when the focus is exclusively on achievement and especially when achievement is defined as test scores – such outcomes may be assessed more qualitatively and longitudinally too (for example, through the ongoing and systematic analysis of the work students produce – see Easton, 2009). The broader challenge of demonstrating that a PDI is the cause of student outcomes will, though, remain and can only be addressed to some degree by robust impact evaluation designs of the kind which PDIs may lack the resources and expertise to implement (Gersten et al., 2014 analysed over 900 studies for evidence that PDIs for mathematics teachers impacted on student achievement and concluded that in only five of these were the evaluation procedures robust enough to produce trustworthy evidence). Time is another common adverse factor here; on short PDIs, the expectation that significant measurable impacts on student learning can be obtained is unrealistic. Despite the arguments above, though, it cannot be denied that attention to student outcomes is an important dimension in the evaluation of PDIs. Timperley (2011) does in fact argue that enhancing student outcomes is the primary purpose of professional development.
Conclusion
The purpose of this article has been to argue for the importance of evaluating the impact of PDIs, to discuss ways in which such impact may be defined and to illustrate how impact has been evaluated on a range of PDIs I have worked on in recent years. My analysis has drawn on examples of larger-scale and structured PDIs which operate at the level of educational systems or whole organizations because such PDIs normally entail a substantial investment of time, money and effort and an understanding of whether they make a difference is justified from an accountability perspective. However, as discussed earlier, there are additional important benefits of evaluating the impact of a PDI, including formative decision-making and broader understandings of what works when it comes to professional development. Despite my focus on larger-scale PDIs, many of the points I have made apply to smaller initiatives too, where in fact the evaluation of impact will be facilitated by easier access to participants and the more modest resources required to examine the difference that a PDI has made. I have referred to several actual PDIs to illustrate both what gets evaluated and how; further possibilities do exist though – for example, teacher motivation is another impact that a PDI might want to examine, although none of the PDIs I have worked on have explicitly examined this issue (for recent discussions, see Lauermann, 2017; UNESCO-IICBA, 2017; World Bank, 2018).
Key points to emerge from the analysis here are the following:
The impacts of PDIs can be defined in various ways to include a range of cognitive, affective and behavioural outcomes affecting teachers, students, organizations (and educational communities more broadly) at different points in time.
Defining the hoped-for impacts of a PDI is a necessary early step in formulating a strategy for evaluating how far these impacts have been achieved.
The impact of PDIs on students is typically defined in cognitive terms (achievement) but can also include various non-cognitive outcomes such as motivation, engagement and confidence.
Establishing causal links between a PDI and student outcomes is complex and requires robust impact evaluation designs.
In evaluating PDIs, it is essential to distinguish between direct (such as observation) and indirect (such as reported behaviours) measures of impact.
It is difficult to talk about impact in the absence of a baseline against which subsequent measures (quantitative and/or qualitative) can be compared.
Various strategies for evaluating the impact of PDIs are available and it is important that choices about these are compatible with the kinds of impact being targeted.
In PDIs, decisions about how to evaluate impact will be shaped not only by theoretical considerations but also (and often more powerfully) by practical constraints such as the availability of time, funding and expertise.
By way of conclusion, Table 1 lists a number of criteria which, based on my experience on various projects in recent years, can be used in assessing proposals for evaluating impact on a PDI. Most of these criteria have been mentioned in the discussion above and attention to them can enhance the effectiveness with which the impact of a PDI is evaluated. An integral approach to evaluation builds the analysis of impact into the design of a PDI, rather than simply adding it at the end, while evaluation which is objectives-driven will ensure that PDIs are being evaluated against their actual targets (this does not preclude, though, some focus on any unintended outcomes a PDI might have). Multi-staged evaluation involves data collection at various points of time – various options exist before, during, at the end of and after (short, medium and long-term) a PDI. To enhance the quality of impact evaluations it is important that data collection and analysis are rigorous – evaluation tools need to be well-designed and fit for purpose and the information that is collected should be analysed systematically. Evaluations can also be enhanced when they are multi-method and draw on different kinds of quantitative and qualitative information. In many contexts, the evaluation of impact in PDIs can also be enhanced through the use of technology and this potential should be considered when an impact evaluation framework is being designed. The two final criteria here are that evaluations should be formative, feeding into ongoing adjustments to the design and delivery of PDIs, and feasible – doable, given the resources available.
Table 1. Some Criteria for Impact Evaluation on PDIs.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
