Abstract

This book is a recent addition to the British Educational Research Association’s (BERA) research methods series, which already covers qualitative research, ethics, case studies, action research and ethnography. The five co-authors are from the Centre for Evidence and Social Innovation of Queen’s University Belfast.
The introduction and Chapter 1 are duly cautious in their claims for randomised controlled trials (RCTs). The authors remind us that each child is different, and that RCTs can only tell us what is ‘likely to work, for what types of children and in what contexts’. This is an important qualifier in a period when some politicians are prone to talk about ‘what works’ full stop. The authors reject as unhelpful the notion that RCTs are the ‘gold standard’ in research, although there is little discussion of what else might be valuable other than qualitative investigations to accompany, qualify and illuminate quantitative experimental studies. In our neoliberal policy environment, where other research is underfunded and undervalued and where education is frequently viewed simply in terms of providing ‘human resources’, it is important to defend philosophical or critical policy studies which discuss educational aims and values, and ethnographies and sociologies which develop a stronger sense of who is (or is not) being educated.
I have some sympathy for the authors’ annoyance at critics writing off their chosen methodology in a ‘constant recycling of stylised objections’. However, by insisting that RCTs are simply a neutral method, without particular ideological leanings, the authors fail to acknowledge the neoliberal context in which their own work takes place, and the reasons why it is politically in favour. They quote from Blair’s inaugural manifesto, in 1997, ‘What counts is what works’ without recognising either what this ‘end of ideology’ discourse might imply or its consequences in terms of educational aims. They speak of the £125 million awarded by the Department for Education to the Education Endowment Foundation (EEF) as if this were a politically neutral decision in the interests of greater efficiency. This fails to recognise the dubious functioning of the EEF.
Let us consider just one example. The unit where these co-authors work, and which is headed by one of them, has just been awarded over £1 million in partnership with Ruth Miskin Training to evaluate synthetic phonics courses produced by her own company! It is well known that synthetic phonics is strongly promoted by England’s schools minister. My intention here is not to argue ad hominem but to call for greater awareness, particularly among educational statisticians, of the power structures – the regime of untruth – in which they currently operate.
The authors are correct in pointing out the way some fads have been adopted by teachers without supporting evidence. They convincingly argue the moral case for evaluating teaching methods but are less successful in arguing that RCTs are the best method.
Consider, for example, the classic studies by Douglas Barnes and others on patterns of classroom language use, based on detailed classroom observation and transcripts. These studies had a powerful influence in the 1970s in persuading teachers to reflect upon their own practices and adopt more dialogic approaches to develop pupils’ spoken language. They provided rich examples of the quality of language and thinking which can result from small group discussion. By contrast, RCTs tend to lose the key evidence, reducing it to a number. They also tend not to identify how and why particular practices impact on learning. Although the authors recognise these problems in Chapter 1, calling for more multi-method evaluations, this awareness fades in the rest of the book.
Critical realism
Chapter 1 also summons Critical Realism as a philosophy which overcomes the empiricist tendency to rest content with surface features. Crucially, critical realists stress the importance of looking for causal mechanisms and forces which may be hidden, rather than staying with surface data:
Whilst they believe that there is an external reality that is reflected in measurable regularities and patterns, it is argued that these cannot be captured by a simple set of variables but are generated through complex and inter-connected sets of human activities that are essentially open and located within particular social contexts. (p. 25)
This is a weak reading of critical realism. For a brief introduction, readers should examine for themselves the early chapters of Ray Pawson’s (2006) Evidence based policy: A realist perspective, which Connolly et al. cite but fail to engage with. Pawson (2006: 21) quotes Sayer (2000: 14) in making a fundamental distinction between regularity and causality:
For realists, causation is not understood on the model of the regular succession of events, and hence does not depend on finding them or searching for putative social laws. The conventional impulse to prove causation by gathering data on regularities, repeated occurrences, is therefore misguided: at best these might suggest where to look for candidates for causal mechanisms. What causes something to happen has nothing to do with the number of times we observe it happening.
Pawson also reminds us that, in science and medicine, experiments try to ‘remove any shred of human intentionality from the investigation of whether treatment brings about cure’ (p. 28). This is impossible in educational experiments. He insists that, whether in natural sciences or social policy, interventions should be underpinned by theory and data must be evaluated in theoretical terms: ‘causal inferences are secured by theory-building’. Such considerations take us well beyond the minor adjustments recommended by Connolly et al. for making educational RCTs fit for purpose.
Logic models and outcome measures
Chapter 2 ‘Logic models and outcome measures’ outlines how a trial should be planned, to respond to an issue in sufficient complexity. The authors particularly favour trials which value classroom practitioners and involve them in the design.
There is emphasis here on theory but only in the sense of a generic theory of change or theory of intervention. Too little attention is given to asking how measurable outcomes relate to achievements and skills which are more important but less easily measured. The mountains of (largely) US-based RCTs on which recent systematic reviews rely are distorted by their frequent use of simplistic (but easy to administer) tests of limited sub-skills of literacy or numeracy.
There is a useful worked example of a logic model, based on health visitors distributing free books to 2-year-olds. The discussion shows how RCTs can be complemented by interviews and focus groups which explore details of the process, and case studies to identify ‘generative mechanisms’.
Designing trials
Chapter 3 discusses research designs for RCTs. This includes the warning not to mistake correlation for cause and effect, though it is questionable whether the RCT process itself can make this judgement. The importance of random allocation is stressed, though not what to do when pre-existing school classes result in an imbalance between experimental and control groups. To use another synthetic phonics example, a recent EEF-funded trial of Ruth Miskin’s programme for struggling Y7 readers announced ‘three months additional progress’ though the data inside the report shows this could be entirely due to poor randomisation.
The subjective factor of teachers’ buy-in to the intervention is emphasised as important – but doesn’t converting teachers to believers distort the quasi-scientific experimentation? This creates a paradox which puts the entire method in doubt as far as education is concerned: human volition is both necessary and a contaminator.
Data analysis
Chapter 4 explains the basic methods of data analysis. The authors take us patiently through linear regression and provide a formula for calculating constant, gradient and ‘error’. Error is the term often given to the deviation of individuals from the line of best fit; in other words, the extent to which each individual’s y value differs from what is expected on the basis of the x value. Constant and gradient are very clearly explained.
The notion of ‘error’ is far more problematic: there is no serious discussion of what the term means in real terms. Is it (a) an error of measurement? or (b) individual differences? or (c) the presence of other causal factors which had supposedly been ‘controlled for’? And how can we distinguish between these? This is a crucial discussion, as it raises the question of how important the intervention might be against the many other factors which affect outcomes in real situations. In some respects, the ‘error’ is the most interesting part of an RCT result.
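The point can be made concrete with a minimal sketch of a least-squares fit. The scores below are invented for illustration and are not taken from the book; the function names are likewise hypothetical. The sketch computes constant, gradient and each pupil’s ‘error’ (residual), and shows that the residual is a single number which cannot, by itself, say whether it arises from measurement error, individual difference or an uncontrolled causal factor.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = constant + gradient * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and of cross-products of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    constant = y_bar - gradient * x_bar
    return constant, gradient


def residuals(xs, ys, constant, gradient):
    """'Error' for each individual: observed y minus the y predicted from x."""
    return [y - (constant + gradient * x) for x, y in zip(xs, ys)]


# Hypothetical pre-test (x) and post-test (y) scores for five pupils.
pre = [1, 2, 3, 4, 5]
post = [2, 4, 5, 4, 5]

constant, gradient = fit_line(pre, post)
errors = residuals(pre, post, constant, gradient)

for x, y, e in zip(pre, post, errors):
    # Each residual is one number; nothing in it distinguishes measurement
    # error from individual difference or from an omitted causal factor.
    print(f"pre={x} post={y} error={e:+.2f}")
```

Distinguishing the three sources would require information from outside the regression itself, such as repeated measurement or qualitative evidence about individual pupils.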
Chapter 5 ‘Analysis of more complex RCT designs’ provides an opportunity to teach readers about multilevel modelling, the analysis of cluster RCTs, statistical significance and Hedges’ g. The problems of binary outcomes, missing data and non-randomised trials are considered. There is unfortunately no warning of the common problem of misreading statistically ‘significant’ as important.
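The distinction between ‘significant’ and important can be illustrated with a short sketch. It is not drawn from the book: the summary statistics are invented, the function names are hypothetical, and Hedges’ g is computed with the commonly used approximate small-sample correction. The same half-point difference in means is evaluated at two sample sizes.

```python
from math import sqrt


def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation of two independent groups."""
    return sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))


def hedges_g(m1, s1, n1, m2, s2, n2):
    """Standardised mean difference with the usual approximate correction."""
    d = (m1 - m2) / pooled_sd(s1, n1, s2, n2)
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * correction


def t_statistic(m1, s1, n1, m2, s2, n2):
    """Pooled-variance two-sample t statistic."""
    return (m1 - m2) / (pooled_sd(s1, n1, s2, n2) * sqrt(1 / n1 + 1 / n2))


# Hypothetical trial: intervention mean 50.5, control mean 50.0, both SD 10.
for n in (50, 5000):
    g = hedges_g(50.5, 10, n, 50.0, 10, n)
    t = t_statistic(50.5, 10, n, 50.0, 10, n)
    # The effect size barely changes with n, while the t statistic grows
    # with the square root of the sample size.
    print(f"n per group={n}: g={g:.3f}, t={t:.2f}")
```

With 5,000 pupils per group the t statistic passes the conventional 1.96 threshold although the effect remains about one-twentieth of a standard deviation; statistical significance here reflects sample size, not educational importance.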
Meta-analysis
Chapter 6 discusses systematic reviews, and in particular meta-analysis. It includes some important pieces of advice, including guarding against publication bias, the need for adequate theorisation and precise definition of the issue.
There is a welcome note on the need to define what ‘business as usual’ (the control group) actually means. Unfortunately (given that Hattie is mentioned in various places in the book), the writers seem unaware of the controversy that has surrounded his work, in terms of lack of clarity about the null condition and the ‘hinge point’. To take a key example, in an RCT on open questions, would the control group receive entirely closed questions or should their teacher simply be asked to ‘carry on as normal’? As Pawson (2006: 51) states,
And what of the control? This is not a piece of apparatus at idle. This is not the world in repose. This is no vacuum, because there is no such thing as a policy vacuum. Control groups or control areas are in fact kept very busy.
This chapter gives good advice on different ways of calculating effect size. It mentions the need to avoid aggregating dissimilar interventions, but too briefly to do it justice. The ‘apples and oranges’ problem is widely recognised as a central danger in meta-analysis. Unfortunately, the most important use of meta-analysis in schooling in the United Kingdom, the EEF Toolkit, sins grievously and systematically by casually calculating average gains regardless of type of intervention, outcome measure, age and other characteristics of pupils, or curriculum area.
Conclusion
Overall, the book has some useful guidance but is disappointing given its BERA accreditation. Although the authors are clearly aware of many of the difficulties with RCT methodology, they do not deal with the problems with sufficient rigour. The substantial literature around the whole question of ‘evidence’, in education and other fields, should not be written off as subjective prejudice against evidence and systematic evaluation.
