Abstract
This study analyses how studies on disadvantaged schools, improvement and test-based accountability relate to each other. The analysis covers 69 studies on disadvantaged schools reported in prestigious educational journals and conducted in 1995–2015. Educational policies related to evaluation and accountability define the official goals of schooling, and the aim in this article is to analyse how the chosen studies discuss these educational policies and understand school success and failure. The following questions were asked: What typologies related to test-based accountability can be constructed in research on disadvantaged schools? What understandings of good schools are embedded in the identified typologies? Disadvantaged schools are at the centre of improvement and therefore also the target of evaluative policy practices. The results show that research supports test-based accountability practices, and that critical studies on school improvement are in the minority.
Introduction
This study analyses the relations among studies on disadvantaged schools, school improvement and test-based accountability practices. The focus is on how a good school is constructed in the interrelations between studies and hard accountability practices. The data consist of 69 English-language articles reporting research on disadvantaged schools and published in 1995–2015 in six prestigious educational journals. The data were identified following a systematic literature review and subjected to qualitative content analysis.
Educational policies define the aim and targets of schooling. Many education systems have adopted accountability practices utilising hard techniques such as the standardised testing of student outcomes to control the achievement of these policy aims. Various studies have raised critical discussions on how test-based accountability, along with large-scale assessments such as the Programme for International Student Assessment (PISA), narrow educational goals. Evaluation based on standardised high-stakes testing outcomes in particular has caused concern. Schools have a lot at stake in achieving acceptable outcome levels, and this has had profound consequences on everyday life and understanding related to schooling (Ball, 2003; Fullan, 2011; Hamre, Morin, & Ydesen, 2018; Kauko, Rinne, & Takala, 2018; Kelly, 2018; Lindblad, Pettersson, & Popkewitz, 2018; Lingard, Martino, Rezai-Rashti, & Sellar, 2016; Mons, 2009; Ozga, Dahler-Larsen, Segerholm, & Simola, 2011; Ranson, 2003; Ravitch, 2010; Sahlberg, 2011; Thrupp, 1999; Whitty, 2002; Wrigley, 2003, 2011).
Previous research has shown that schools with a somewhat marginalised student body have difficulty in achieving success measured in terms of learning outcomes (Coleman et al., 1966; Muijs, Harris, Chapman, Stoll, & Russ, 2004). Consequently, so-called ‘disadvantaged schools’ tend to be taken as targets of improvement practices to produce better outcomes. As a concept, disadvantage(d schools) is context-related and it cannot be universally defined. It commonly refers to a school with a student body that is predominantly low in terms of socio-economic status. Disadvantaged schools in this study are understood through the lenses of the studies that were chosen for the analysis, which could be classified as studies on disadvantaged schools based on database indexes.
The research objective of this study is to explore how academic studies on disadvantaged schools are constructing, maintaining and challenging an understanding of the perceived necessity for test-based accountability and related improvement practices in the process of achieving the politically determined aims of schooling. A further aim is to analyse how a good school is fabricated in the process, in other words the role of educational policies in the definition of successful and failing schools in educational research. This matter has previously raised intense discussions among scholars (Cuban, 2003; Gewirtz, 1998; Reay, 2004; Slee & Weiner, 2001; Teddlie & Reynolds, 2001; Thrupp, 2001; Thrupp, Lauder, & Robinson, 2002; Townsend, 2001; Wrigley, 2003, 2011). The research questions addressed are: What typologies related to test-based accountability can be constructed in research on disadvantaged schools? What understandings of good schools are embedded in the identified typologies?
The argumentation in this article proceeds as follows. Previous research on evaluation politics, and especially on test-based accountability, is reviewed next, and after that the history, political connections and formation of quality assurance and evaluation (QAE) policies are discussed, as are various critical comments on the problems involved and the alternatives to test-based accountability. Then, this study is presented in terms of aims, data, methods and analysis. The results are presented through the two main typologies identified. The concluding discussion assesses the critical implications of the findings for future research.
An intensified testing culture as a means of school improvement: background
The national testing of pupil achievement became more widespread in Europe during the 1990s. Since then, and especially in England, tests have been established more fundamentally to monitor and improve the quality of education, and to make educational systems more effective (Eurydice, 2009). This follows the trend in the United States and Australia, for example, as well as New Zealand, which nevertheless changed course and discontinued using standardised assessments rather recently (see Thrupp, 2018). The political, societal and theoretical background behind the spreading of testing regimes can be traced to and understood from different but related perspectives. One consequence of globalisation is the accelerating convergence in educational policies worldwide. Included in this convergence of educational systems is the ‘governance turn’; the concept of governance or steering from a distance is used to conceptualise the increased potential for monitoring and self-control arising from the explosion in the production of numerical data (Lindblad et al., 2018; Ozga et al., 2011; Simola, Ozga, Segerholm, Varjo, & Normann Andersen, 2011; see also Kauko, Takala, & Rinne, 2018). Tests influence actions on the grass-roots level and then provide information to the top level, thereby creating a circle of quantified data and social control. International large-scale assessments (LSAs) such as PISA tend to affect practices in national contexts, thereby enhancing test-based accountability (Thrupp, 2018).
The neoliberal ethos of competition and the global turn towards new public management (NPM) have served to mediate the new knowledge-governance relationship, and perhaps also vice versa (Gunter, Grimaldi, Hall, & Serpieri, 2016; Simola et al., 2011; Thrupp, 2018). According to Nathalie Mons (2009), the theoretical background of standardised assessment comprises NPM and political evaluation on the macro-level, and the economics of education as well as research on school effectiveness on the micro-level. NPM, political evaluation and the economics of education highlight the cost-benefit ratio of education, pupil achievement being understood as the quantified output of resources invested in public-sector education. The research on school effectiveness connects rather easily with this theoretical background, given the focus on the organisational and strategic structures within schools and the outcomes they produce (see also Thrupp, 1999; Wrigley, 2003).
Standardised assessment practices appear to be rational and objective, but their connection to these specific schools of thought makes them political (Ball, 2018; Lindblad et al., 2018). National assessment practices should be perceived as political and as parts of larger shifts in educational politics related to processes of globalisation and convergence, and to advanced technologies in QAE (Ozga et al., 2011; see also Ozga, Seddon, & Popkewitz, 2006). These larger shifts have been referred to as ‘policy or education by the numbers’ (Lingard et al., 2016, p. 1), ‘the infrastructure of accountability’ (p. 3) and ‘a comparativistic paradigm’ (Lindblad et al., 2018, p. 5).
Pasi Sahlberg (2011a, 2011b), building on the work of Andy Hargreaves, gives a theoretical and critical summary of recent developments in educational improvement in terms of the Global Educational Reform Movement (GERM) (or GERMS, if one considers the path-dependent vernacular versions – see Lingard et al., 2016, p. 6), which is spreading like a virus, whereas Michael Fullan (2011) writes about wrong drivers in educational reforms. Decision makers in many countries have resorted to hard QAE techniques, meaning national testing, ranking lists, inspection for evaluation, benchmarking and national goal setting, in order to improve education (Gray et al., 2011). According to GERM, six globally common features are perceptible in attempts to improve school quality, and student achievement in particular: standardisation, an increased focus on core subjects such as literacy and numeracy as well as predetermined results in the curriculum, the transfer of models from the corporate world, increased control and high-stakes accountability policies (Sahlberg, 2011a, 2011b).
Problems with and alternatives to test-based accountability
Improvement through hard QAE, and more specifically test-based accountability, provokes severe criticism and arouses concerns among scholars. First, there seems to be no empirical consensus with regard to testing, not even for the alleviation of educational inequalities (Mons, 2009). Second, the relationship between testing and what is considered to be education’s core purpose appears to be contested; testing conflicts with inclusive education, social justice, democracy, equal opportunities and recognition, and may even be counterproductive to these aims (Bingham, 2001; Hamre et al., 2018). Thus the question arises about the goals of education. ‘Should education be a production factor in the achievement of economic prosperity or should education serve personal development values, Bildung, and making students capable of leading rich and fulfilling lives?’ (Hamre et al., 2018, p. 254; see also Ball, 2018; Kauko, Takala, & Rinne, 2018; Lingard et al., 2016).
The accountability system, which is based on high-stakes testing, seems to be a double-edged sword for disadvantaged schools. On the one hand, given that schools with a disadvantaged student body are most likely to ‘fail’ (e.g., Muijs et al., 2004), the idea behind demanding better results from these schools is to prevent the staff from using the socio-economic background of pupils as an excuse, and to make sure that disadvantaged pupils have equal chances to undertake further studies. On the other hand, the assumption that the success of a school equates with good outcomes in standardised high-stakes tests requires the staff of disadvantaged schools to excel personally on a daily basis, and overshadows other aims that schooling might have (Stahl, 2017; Whitty, 2002).
Apart from narrowing educational aims, hard accountability measures have had other consequences. As the concept implies, pupils, teachers and schools have a lot at stake. For pupils, such measures determine their school career. Permanently excluding pupils from schools also seems to be connected with hard accountability (Daniels, Thompson, & Tawell, 2019). As for teachers, pupil success and failure are tied to rewards and punishments related to salary, for example, whereas for schools, failing tests means special measures for improvement or closure. High-stakes tests seem to evoke the ‘teaching to the test’ phenomenon, and the offering of support to those who are close to achieving an acceptable result (Mons, 2009; Stahl, 2017). Tests emphasise individualism and encourage individuals to aim at ‘self-optimization’ (Hamre et al., 2018, p. 257), which also leaves the responsibility for success or failure to the individual. All in all, the chosen models shape the identities and thinking of teachers and students, which in turn shape practices in schools and the overall ethos of education (Ball, 2003; Ranson, 2003).
Alternatives to GERM have also been explored extensively in the research literature. What is common to them all is that they require a larger shift in the converging neoliberal ethos of (educational) politics. More effort could be put into redistributing wealth and funding more justly. Schools could focus on values such as democratic education and teach more complex skills such as problem-solving and social skills instead of a couple of core subjects (Lupton, 2005; Mons, 2009; Olssen, Codd, & O’Neill, 2004; Paulle, 2013; Thrupp, 1999; Whitty, 2002; Wrigley, 2003). The staff should be allowed to take risks; the emphasis should be on high professionalism and intrinsic motivation, and therefore responsibility would occur as a consequence (Fielding, 2001; Fullan, 2011; Sahlberg, 2011a). Hard QAE practices seem to be at odds with these proposals (see also Gray et al., 2011). Alternatively, accountability could be context-related, used for developmental purposes, and it should benefit grass-roots-level work instead of acting as a control device (Kauko, Rinne, & Takala, 2018; Lingard et al., 2016).
The study
The debate, especially on test-based accountability, is not only between critical researchers and policy practitioners, but also between research disciplines. Critics claim that studies emphasising standardised school efficiency have managed to create a discourse that dominates current educational practices (Cuban, 2003; Gewirtz, 1998; Reay, 2004; Thrupp et al., 2002; Wrigley, 2003, 2011). The testing culture has indeed changed the nature of educational research, and even of what counts as educational research (Lingard et al., 2016). Recent studies in the European context have identified the governing mechanisms through which actors on different levels influence educational research; evaluation systems, competition and economic competitiveness have an effect not only on the grass-roots level but also on research (Powell, Zapp, Marques, & Biesta, 2018; Zapp, Marques, & Powell, 2018).
This study is an analysis of literature reporting research on disadvantaged schools in 1995–2015. The focus of the analysis is on how academic research literature discusses the relationship between test-based accountability policies and disadvantaged schools, and on the role of hard accountability practices in the definition of what constitutes a good school: policies define the goals that schooling has, but what is the role of research in this? The concept of a good school is understood as politically articulated and entangled in historical and discursive formations, which makes ‘success’ a social phenomenon. As a consequence, studies on disadvantaged schools are understood as being involved in creating and maintaining the ethos behind school improvement (see Thrupp, 1999; Wrigley, 2003, 2011). Disadvantaged schools are often targeted for improvement. It is therefore necessary to take a critical view of how the desired quality is understood: if it is understood in terms of outcomes in standardised tests, it might not, in reality, improve the lives of marginalised students (Whitty, 2002; Ravitch, 2010).
To capture the relationship between research and test-based accountability practices the following questions are addressed. What typologies related to test-based accountability can be constructed in the research on disadvantaged schools? What understandings of good schools are embedded in the identified typologies?
The data consist of 69 English-language articles published in six prestigious educational journals, identified following a systematic literature review (Petticrew & Roberts, 2006) and analysed by means of qualitative content analysis (Schreier, 2012). The articles (1) relate to research on disadvantaged schools, (2) focus on primary or lower-secondary schools, (3) are peer-reviewed and (4) were published between 1995 and 2015. This 20-year period was chosen because it is a manageable time span that still captures the recent history of research on disadvantaged schools. It also covers the rise of test-based accountability. The focus was on primary and lower-secondary levels because practically all children go through these educational stages. The data collection proceeded as follows. First, the search was targeted on databases containing educational studies. Within ProQuest and EBSCOhost, the search was targeted at the Academic Search Complete, Education Research Complete, SocINDEX with Full Text, ERIC, PsycINFO and Sociological Abstracts databases, as well as Scopus. The following Boolean search string was applied: (disadvantaged OR deprived OR ‘low socioeconomic’ OR ‘low socio economic’ OR poor OR failing OR marginali* OR low-performing OR underachieving OR ‘under achieving’) NEAR/3 schools OR challenging NEAR/1 schools.
The search was further targeted on abstracts or ‘all but full texts’. 1 The first extraction limited the number of articles. All the abstracts were read through, and the articles that did not meet the selection criteria were discarded. After this, two criteria drove the choice of journals: they contained several of the studies that were chosen based on the abstracts, and had a high impact factor (IF, Journal Citation Reports, year 2015). Journals that engaged in discussions about how disadvantaged schools should be studied were also emphasised in the selection (see for example, Downey & Condron, 2016; Slee & Weiner, 2001; Teddlie & Reynolds, 2001; Thrupp, 2001; Townsend, 2001). The six journals chosen were the American Educational Research Journal (IF 2.924), the British Educational Research Journal (IF 1.124), Educational Evaluation and Policy Analysis (IF 2.02), the Journal of Education Policy (IF 2.174), School Effectiveness and School Improvement (IF 1.333), and Sociology of Education (IF 2.000). In all, they contained 69 studies on disadvantaged schools. It is worth taking a critical look at the journals and articles this enquiry reaches. The ranking of journals and their being considered prestigious based on a numerical value assigned to them could be perceived as a circle maintaining itself; publishing in restricted journals is desirable, and this consequently strengthens the position of these prestigious journals. However, the fact that the journals chosen for this study are likely to have a major impact on further research justifies the analysis of articles contained in them.
The theory-related qualitative content analysis (Schreier, 2012) proceeded as follows. It was targeted at the introduction and conclusion of the articles in question. If there was a discussion, it was also included. First, all the articles were read through. Next, the introduction, conclusion and possible discussion were extracted and fed into ATLAS.ti, scientific software for qualitative analysis. All references to neoliberalism, (social) justice, (in)equality, community, outcomes (all related expressions), purpose/aim, success, quality, fail(ure), improvement, effective(ness), and learning were coded, and the excerpts were extracted from each article. The excerpts were re-read, and the most relevant ones were gathered and collated into a table. The perspectives in each article were then summarised critically in line with the theoretical focus of this study. This enabled the construction of typologies from each article, and of two main typologies. The main typologies could be understood as analytical constructions: although not all the articles are clearly one or the other, they all have traits of either one of the extremes. The process was hardly straightforward, which is typical for qualitative analysis. The reading and understanding proceeded as a dialogue, not only with the theoretical part but also with the data itself; there were 69 studies on disadvantaged schools and it was natural to use them so as to understand what matters when studying disadvantaged schools. The sharpening of the research and the analytical focus was a backwards-and-forwards process. The data were perused several times, and a final round was executed to increase the reliability of the analysis.
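The keyword-coding step described above can be illustrated with a minimal sketch. The coding in this study was carried out in ATLAS.ti and involved interpretive reading; the Python below is not that procedure but a hypothetical stand-in, assuming that a first pass amounts to flagging sentences that contain one of the listed terms. The regular-expression patterns, the function name code_excerpts and the sample text are all assumptions made for illustration.

```python
import re

# Code labels follow the term list in the text; the regex patterns are
# assumptions about how "all related expressions" might be matched.
CODE_PATTERNS = {
    "neoliberalism": r"\bneoliberal\w*",
    "justice": r"\b(?:in)?justice\b",
    "equality": r"\b(?:in)?equalit(?:y|ies)\b",
    "community": r"\bcommunit(?:y|ies)\b",
    "outcomes": r"\boutcomes?\b",
    "purpose/aim": r"\b(?:purposes?|aims?)\b",
    "success": r"\bsuccess\w*",
    "quality": r"\bqualit(?:y|ies)\b",
    "failure": r"\bfail\w*",
    "improvement": r"\bimprov\w*",
    "effectiveness": r"\b(?:in)?effective\w*",
    "learning": r"\blearning\b",
}


def code_excerpts(text):
    """Return (sentence, sorted codes) for every sentence matching a code."""
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption;
    # it will mis-split abbreviations such as "e.g.").
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    coded = []
    for sentence in sentences:
        codes = sorted(
            label
            for label, pattern in CODE_PATTERNS.items()
            if re.search(pattern, sentence, flags=re.IGNORECASE)
        )
        if codes:
            coded.append((sentence, codes))
    return coded


# Hypothetical sample text, not taken from the analysed articles.
sample = (
    "Effective schools raise test outcomes. "
    "The weather was mild. "
    "Social justice requires tackling inequality."
)
for sentence, codes in code_excerpts(sample):
    print(codes, "-", sentence)
```

A mechanical pass of this kind could only pre-select candidate excerpts; re-reading them, judging their relevance and summarising each article's perspective remain interpretive steps that the sketch does not attempt.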
Results
In answer to the first research question of what typologies related to test-based accountability can be constructed in the research on disadvantaged schools, two main typologies were identified in the data. The aim in the first one is to improve disadvantaged schools to achieve better high-stakes outcomes, and the second one severely criticises current policies.
According to the effectiveness or quality-as-numbers typology, success and failure are based on national educational policies. Limits of goodness and badness are determined by the respective national accountability system, in other words the standardised high-stakes testing carried out on the school level. 2 Twenty-four articles fall clearly into this typology, and a further 12 consider contextual factors, but do not question high-stakes testing outcomes as the goal of improvement. 3 The research settings of these studies reflect hard accountability practices, and the aim of many is to inform policymakers; school success is straightforwardly equated with success in measurable outcomes.
At the other extreme, 16 of the 69 studies openly refer to schools as institutions for developing more than academic learning outcomes and cognitive skills. 4 Most of them are critical of test-based accountability structures and their tendency to narrow down the purpose of schooling. 5 Instead, they demand quality that is beyond measurable test outcomes, 6 a focus on motivation, engagement 7 and intersectional troubles in learning, 8 practices and structures that promote social justice and social inclusion, 9 the social formation of pupils as citizens 10 and the development of a democratic character. 11 School quality relates to an internalised understanding of democratic societies. These 16 studies also depict schools as part of a bigger entity trying to affect disadvantage. Consequently ‘succeeding schools’ are part of larger themes of social justice.
The following paragraphs describe the two main typologies and how they relate to accountability policy and practices, improvement goals and perceived goodness, in other words quality that becomes constructed through different understandings of the goals. The results are presented through examples given the extensive data and space restrictions. This description also addresses the second research question concerning the various understandings of good schools embedded in the identified typologies. A ‘good school’ means different things depending on the improvement goal: it may mean good individuals in schools managing to achieve good test outcomes, for example, or it could imply a complex interplay among structural-level solutions, families, students and schools.
Most of the studies that use standardised test outcomes as empirical data or in descriptions of the features of case schools build their research setting so as to improve the results. In other words, school success is straightforwardly attributed in these studies. 12 Schools and the people within them succeed by producing good-enough learning outcomes, and what is under scrutiny is whether they succeed in this, and if so, how. Such an approach assigns specific meanings to some expressions: ‘making a difference’ equals producing maximised test-based learning outcomes, ‘effective’ equals high-performing, while the low-performing are less effective, and ‘performing’ refers to test-based outcomes, for example. As Anzia and Moe (2014) write, ‘these are the schools in greatest need of improvement, and thus in greatest need of high quality teachers’ (p. 84). Improvement and quality here refer to performance, which refers to test outcomes. Carlson, Cowen, and Fleming (2013) ‘[–] show that low-performing voucher students tend to move from the voucher sector into lower performing and less effective public schools than the typical public school student attends, whereas high-performing students transfer to better public schools’ (p. 179). This quotation illustrates how ‘better’ refers to schools performing well in test outcomes.
The performance of individuals becomes of major importance and comes into focus when the goal is to produce good test outcomes. This creates ‘kinds of people’ (see Popkewitz, 2018) in schools: teachers, principals and students. Good teachers are created through an internalised appreciation of the importance of children’s succeeding in tests. Nye, Konstantopoulos, and Hedges (2004) and Xu, Özek, and Hansen (2015), for example, suggest tackling inequality by allocating ‘effective’ teachers to disadvantaged schools. Teacher turnover is understood as a means of improving ineffective schools. Ylimaki, Jacobson, and Drysdale (2007) describe the desired principal of disadvantaged schools as [having traits of] ‘persistence, empathy, passion, and flexible, creative thinking’ – [and having] ‘empathy for the barriers to learning that poverty can produce, [but not allowing] these conditions to be used as excuses for poor performance’ (p. 378). A picture of a good pupil, or rather a bad one, is also constructed: disadvantaged pupils are at-risk because they do not perform as well as their more advantaged peers, but disadvantage associates only with poor outcomes from the school’s perspective. If schools were to emphasise good behaviour, emotional control and better learning outcomes they might make these children into better learners in terms of test-based accountability. Inequality is understood as unfairness in school outcomes.
There are also studies that fall in between the two main typologies. Teacher turnover as a means of school improvement is considered problematic given its consequences beyond numbers to remaining teachers, for example (Ronfeldt, Loeb, & Wyckoff, 2013; see also Cohen-Vogel, 2011; Ingersoll & May, 2012). These ‘in-between’ studies aim at good outcomes but not regardless of context, 13 and therefore share traits with those promoting the view that the wider social and political context is inseparable from school functioning, and that the focus should be on system-level, long-term and deeper-rooted problems of social inequality instead of only on individual actors in schools. 14 Merry (2013; see also Chudgar & Luschei, 2009) investigated inequality using LSAs and test outcomes as data. According to the results of the tests and the LSAs, the differences in achievement between the United States and Canada were attributable to social conditions rather than the ineffectiveness of the US school system. The author calls for contextualisation in research, and also shows how LSAs and tests can be used to make contextualised mechanisms of inequality visible (see also Butler, Hamnett, Ramsden, & Webber, 2007; Charlton, Mills, Martino, & Beckett, 2005; Gorard, Taylor, & Fitz, 2002; Harris & Williams, 2012). Downey, von Hippel, and Hughes (2008) measured school effectiveness by analysing the effects on learning based on times when children are not at school. They claim that a good school cannot be judged on test achievements, and that schools should be evaluated based on their added value (see also Alexander, Entwisle, & Olson, 2001; Borman & Dowling, 2006). Value-added league tables are also criticised in the data, however: ‘[–] the need to distinguish “external” and “internal” variables [is] a distinction which is ontologically confused and empirically elusive’ (Power & Frandji, 2010, p. 391).
Many of the articles representing the typology that takes a critical stand on testing argue against the notion of school improvement based solely on test outcomes. Hard QAE practices in particular are fiercely criticised: 15
A single, externally imposed definition of which knowledge is worth knowing, where expertise lies, and what a good school looks like becomes foregrounded in these settings; learning rooted in students’ background and culture becomes secondary, at best. (Trujillo & Woulfin, 2014, p. 288)
These researchers openly consider, and call for politicians to consider, the aim and purpose of schooling, and juxtapose (measurable) cognitive, academic achievements and other aims in education. 16 Understanding schools as feeders for economic competition is also subjected to criticism (Lupton & Hempel-Jorgensen, 2012; Trujillo & Woulfin, 2014). As Angus (2012) states, for example, neoliberal policies and policy as numbers conflict with social justice, economic opportunity and democratic outcomes. Araújo (2009) calls for ‘becoming a person’ for democratic societies as the outcome of schooling instead of measurable test-based outcomes. The aim of Gewirtz, Dickson, Power, Halpin, and Whitty (2005), Lupton and Hempel-Jorgensen (2012), and Power and Frandji (2010) is to shed light on structural, system-level problems through analyses of recognition and recognitive justice (see also Thomas, 2013). They suggest that politics and research should focus both on pedagogies that make a difference and on practices that tackle inequality on the system level.
Some of these studies shed critical light on policies aimed at helping disadvantaged pupils and show that the policies face (unexpected) problems; the practices seem to be controversial in the sense that they may inflict damage on and exclude disadvantaged children, even though the aim is the opposite. 17 As Smith and Stovall (2008) state, ‘strategies to redevelop poorly performing “inner city” schools and the poor neighborhoods where they are usually located has real potential to do more harm than good for the families the policy is supposed to benefit’. According to Stevens and van Houtte (2011), evaluation systems have an effect on the teacher-pupil relationship, and hard accountability systems do not function for the benefit of the worst performers. Other ways of improving disadvantaged schools and lives are suggested, such as fostering an understanding of local contextual conditions to support families and communities, 18 as well as reshaping structures to create more equal societies. Success becomes constructed as the complex interplay between relationships within schools and policies that work on the level of families.
The concepts of quality and success are also under critical scrutiny in the ‘critical typology’. Hard QAE narrows the possibilities for innovative work in schools (Milbourne, 2004). It is claimed, for example, that it restricts not only other taught subjects but also subjects that are measured in terms of tests, such as mathematics (Hernandez-Martinez & Williams, 2013). This resonates with Luyten, Peschar, and Coe (2008), who express concern that engaged and motivated reading might, in fact, conflict with performance requirements. Hard QAE narrows the understanding of quality; Mintrop and Trujillo (2007) write about their research schools, which were surprisingly similar even though some of them were succeeding in terms of measured outcomes and others were not. The writers conclude that the ‘succeeding’ schools took the accountability system more seriously and aimed to succeed in that manner, whereas the others also emphasised other initiatives and aspects of schooling (see also Riley, 2013). These examples reveal how policies affect the way quality and success become understood and internalised within schools, and how these internalisations differ. It is not necessarily the case that one school is good and another one is bad; it is more a question of colliding understandings of quality (Mintrop & Trujillo, 2007). Singh, Heimans, and Glasswell (2014, p. 838) describe these colliding understandings as ‘collateral realities’ and claim that researchers cannot separate themselves from them: when they are doing their work they are creating reality, an understanding of what good teaching, effective schooling and quality learning are.
Conclusion
This study addressed two questions: what typologies related to test-based accountability can be constructed in research on disadvantaged schools, and what understandings of good schools are embedded in them. It became evident that the outcomes of standardised, high-stakes tests play a major role in defining what schools and people within them should be aiming for. They also construct a picture of a good school and a good human being within a school through success or failure in test outcomes. Current policies affect the themes and execution of research, which often avoids open or embedded criticism. To consider this connection critically, what kind of research is left undone or is not accepted for publication in journals that are considered the most prestigious? Educational politics and research resemble a circle in which good and bad schooling are defined in terms of how well pupils do in standardised high-stakes tests, after which the focus turns to how bad schools and the people in them could succeed better, and this is then monitored on the same scale. It is impossible to improve schools in other directions within this circle. Hard QAE remains the dominant practice of educational policy, fuelled and maintained by educational research. Kauko, Rinne, and Takala (2018, p. 183) describe the phenomenon as follows: [The results of this project –] ‘indicate that standardised testing feeds a need for more testing [–] and that quality becomes simultaneously a means of problematizing education and providing a solution for it’ [–].
Does this really improve disadvantaged lives? Concentrating on outcomes highlights a rather mechanistic view of people, their relationships and the work they are doing. When people are evaluated only through the lens of outcomes, little room is left for human error, caring beyond effective results and stability. An American researcher and former advocate of standardised high-stakes testing, Diane Ravitch (2010), claims that schools cannot improve unless poverty is tackled and unless the goal of schooling, what education is for, is (re)defined. This follows the conclusion in many of the articles analysed for this study. Thomas S. Popkewitz (2018) claims that ‘the paradox of the international comparisons is its inscription of difference that “makes” differences so that some can never be at the “top”’ (p. 231). If testing and comparisons indeed maintain the division between ‘good and bad’, and the bad are usually the disadvantaged, and if performativity pressures therefore do create exclusion and do more harm than good, as some of the data articles argue, this is the phenomenon on which research could shed more light.
Disadvantaged people might benefit from a more critical perspective in the research on school improvement, including studies on mechanisms of socioeconomic inequality (e.g., Chudgar & Luschei, 2009; Merry, 2013), recognition (e.g., Gewirtz et al., 2005; see also Bingham, 2001; Fraser, 2009) and socially just pedagogies (e.g., Lupton & Hempel-Jorgensen, 2012). Both pedagogies that make a difference and practices that tackle inequality on the system level could be examined (Lupton & Hempel-Jorgensen, 2012), but such studies are clearly in the minority. Sixteen of the 69 studies comprising the data openly demand wider discussion on the purposes of schooling, and radical change especially in how school success is currently understood. They also consider the wider social and political context of disadvantaged schools. If the aimed-at success were understood as forging a connection between the human being, society and morality, it would place more comprehensive demands on accountability systems (Autio, 2014; Fullan, 2011; Sahlberg, 2011), but it might put quality back on the agenda instead of emphasising measurement in itself (see Kauko, Rinne, & Takala, 2018). As Lingard et al. (2016) conclude: ‘We also argue that we cannot reject the need for accountability in education; rather, what we need to do is reconceptualize it, so that systems and schools are held accountable for their educative and social justice purposes, but in ways that are productive, democratic, and socially just’ (p. 15). ‘Importantly, accountability systems must operate differently in different contexts’ (p. 154).
The results of this study emphasise the need to put complexity back on the agenda. In all probability, there is not one simple measurable solution to problems concerning disadvantaged schools and the people within and around them.
Acknowledgements
The author is grateful to the anonymous referees and the journal editor for their valuable and perceptive comments, which improved this article considerably. The author also thanks Assistant Professor Sonja Kosunen, Doctor Helena Rajakaltio and Professor Piia Seppänen for their comments. Doctor Hannele Pitkänen gave invaluable help in the final phases.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Jenny and Antti Wihuri Foundation and the University of Helsinki.
