Abstract
A comprehensive literature review conducted a few years ago found that, contrary to a commonly held belief, research on public policy implementation was still alive and had continued to develop well into the 21st century in quantitative terms. We pursue this line of inquiry by asking whether this conclusion also holds for progress in qualitative terms. To investigate this question, we used all articles published in core journals of political science, public administration/management and public policy with implementation or implementing in their title. Key defining features of the call for a more rigorous third generation research paradigm were used as a benchmark to measure progress, or the lack thereof, in the policy implementation literature over more than four decades of research. Our investigation largely supports the conclusions of earlier, more intuitively based state-of-the-art reviews: (1) policy implementation research has reached a relatively mature stage of development, (2) more progress has been achieved on methodological than on theoretical fronts and (3) the field progressed fairly rapidly through two earlier generations of research, which roughly coincided with the 1970s and 1980s and displayed some of their assumed characteristics, whereas progress since 1990 has been much slower and more incremental. The latter can probably best be explained by the very demanding nature of the third generation research paradigm itself and by some inherent tensions and contradictions between its defining features. Our contribution is to provide more detailed and nuanced information about exactly where and when progress has been achieved, as well as where it is still lagging.
Introduction
The previous decade saw several state-of-the-art evaluations of the research literature on policy implementation (Barrett, 2004; deLeon and deLeon, 2002; Hill and Hupe, 2009; O'Toole, 2000, 2004; Schofield, 2001; Schofield and Sausman, 2004; Winter, 2012a). The degree of consensus on the main conclusions is surprising, considering how much implementation scholars otherwise tend to disagree on conceptual, methodological and theoretical issues. The shared message, however, is mixed. While reporting substantial progress on a number of fronts since the 1970s, these scholars also point out that much remains to be done. In particular, more progress is reported on methodological than on theoretical issues.
This author believes these state-of-the-art reviews to be, on the whole, quite plausible. Nevertheless, they share at least three shortcomings. The first is that they are usually phrased in fairly general terms, though some, like O'Toole (2000), are more specific than others. Second, they are all somewhat dated. A third and more serious problem is the lack of systematic data to support their statements and conclusions. Rare exceptions include the somewhat dated reviews of the research literature by O'Toole (1986) and Sinclair (2001). The review by Saetren (2005) was both more up to date and quite comprehensive, but was intended only to assess the literature in quantitative terms. Nevertheless, that review clearly demonstrated the difference between believing and knowing something about the evolution and structure of this research field. It is now time to examine whether similar discrepancies exist with respect to evaluative statements about the state of the field in qualitative terms. This is the purpose of this article.
Our objective is to provide an up-to-date, empirically based state-of-the-art assessment of policy implementation research. What is the status of this research field today, more than a decade after O'Toole (2000) assessed its progress in qualitative terms? How accurate and up to date is that review? In exactly which areas of research methodology has progress presumably taken place, and where does it remain elusive? Likewise, what does it mean when progress on theoretical issues is said to be less evident? Does this imply that there has been no progress at all in this respect? These are the main research questions we will pursue. Our aim is thus to provide more nuanced, detailed and reliable information than previous reviews.
Evaluative benchmark: The third generation research paradigm
All assessments of this kind require some sort of evaluative criteria or benchmark. In this context, the third generation research paradigm promulgated by some leading implementation scholars more than two decades ago (Goggin, 1986; Goggin et al., 1990) serves as an obvious and natural benchmark. The unique trait of third generation research is its rigorous research design (Goggin et al., 1990: 19). Hence, we will examine to what extent the third generation research paradigm has been implemented in research on policy implementation; that is, we will use it as a benchmark to assess the degree of progress towards the scientific standards it entails. Whenever we subsequently use terms such as progress, advances or developments, they are therefore used descriptively rather than normatively, to denote the degree to which key elements of the third generation research paradigm have been implemented in policy implementation research.
Nevertheless, the overly ambitious and demanding nature of this research paradigm must also be noted (O'Toole, 2000; Winter, 2012b). Hence, it should be regarded as an ideal-type analytical construct rather than something expected to be fully implemented in single research projects, or even in the research field as a whole.
The following defining features of the third generation research paradigm can be gleaned from a careful reading of Goggin (1986: 335–342), Goggin et al. (1990: 15–19) and Lester et al. (1987: 210–213):
(1) Key variables must be clearly defined.
(2) Hypotheses derived from theoretical constructs should guide empirical analysis.
(3) More use should be made of statistical analysis of quantitative data to supplement qualitative analysis.
(4) There should be more comparison across different units of analysis, within and across policy sectors.
(5) Research designs should be more longitudinal (i.e. a research time frame of at least 5 to 10 years).
Not all of these aspects of the third generation research paradigm have been investigated by this author. This applies particularly to conceptual and definitional issues, despite their crucial importance. The main reason for this omission is that these issues have been discussed thoroughly quite recently (Winter, 2012b), so there is no need to use scarce space reiterating at length what has already been said. The bottom line is that scholarly disagreement about what constitutes implementation persists after 40 years of research on this phenomenon. Furthermore, our assessment will focus more on methodological than on theoretical issues, though we will deal with both. The rationale behind this choice is twofold. First, as we explain below, there is less agreement on what might constitute progress in theoretical and normative terms than in methodological terms. Second, theoretical development depends critically on the quality and rigour of research designs, which are the essence of the third generation research paradigm.
The very notion of a third generation implies that there have been two previous generations of implementation research. Furthermore, it is generally taken for granted that research developed in an orderly and linear manner in qualitative terms from the first to the second generation, the challenge for researchers now being to move forward into the third (Goggin et al., 1990). The conventional story has it that implementation research in the 1970s was dominated by exploratory, a-theoretical single case-studies based primarily on qualitative data. We know that the first analytical and theoretical frameworks were developed from the mid to late 1970s onwards. This allegedly resulted in more comparative, theoretically and deductively oriented implementation research and in more hypothesis testing on quantitative data during the 1980s (Lester et al., 1987). This change in scholarly orientation and research from the 1970s to the 1980s was later referred to as a transition from a first to a second generation of implementation research (Goggin, 1986; Goggin et al., 1990; Hill and Hupe, 2009; Lester et al., 1987). By the late 1980s, the call for a third generation research paradigm emerged on the premise that further theoretical development was contingent on implementing its more scientifically rigorous research design.
Research questions and hypotheses
First, we ask whether implementation research during the 1970s and 1980s really developed in the orderly and linear manner suggested by the terms first and second generations of research. Did the first generation of implementation research really start in the early 1970s, as universally claimed, or perhaps even earlier? Furthermore, is the change in research focus and content from the 1970s to the 1980s as clear-cut as the generation metaphor suggests? We then proceed to the main research question of this article: What has happened to policy implementation research after 1990, compared with the decades before, in terms of implementing key elements of the third generation research paradigm? Has there been any progress at all, as is often assumed, and if so, on which fronts? And how much or how little progress are we talking about? What is the timeline of this progress? Has it happened gradually and increasingly over the last four decades, or has it already stagnated or even declined?
We also ask whether significant regional differences exist in the implementation literature along the research dimensions investigated and, if so, on which continent, if any, scholars are leading the way towards the third generation of implementation research. Is there any reason to look to and learn from European scholarship in this respect, as suggested by a recent reviewer (O'Toole, 2000)? In short, what is the status today, more than a decade into the 21st century? What have we achieved and what remains to be done? Have we achieved as much progress as can reasonably be expected, given the very ambitious and demanding nature of the third generation research paradigm? And finally, how reasonable and realistic is the third generation research paradigm as a benchmark in this respect?
After summarizing and discussing our major findings, we conclude with some comments on the accuracy of previous literature reviews in this respect and on the feasibility and prospects of further progress towards the third generation research paradigm.
Data and methodology
Our data set includes all English-language articles published in core journals between 1953 and 2009 with implementation or implementing as a title word. Core journals are those primarily devoted to general theory development in political science or its two sub-disciplines, public administration/management and public policy. The rationale behind this selection is the assumption that contributions from political science and these two sub-disciplines are more critical to theory development in the field of policy implementation than those from other academic disciplines. We recognize that other disciplines, such as management and organization theory, also play an important role in this respect, though on the whole a lesser one. We therefore classify journals related to these other academic fields as near-core.
A caveat is in order with respect to our data source: books, another major type of publication, are not included in our investigation of the research literature. How much bias this introduces into our findings and conclusions is hard to know. This bias is, however, somewhat mitigated by the fact that empirical chapters of many books are often published as articles in core journals before the book itself appears.
Some 547 articles were identified using the search criteria explained below. The primary data sources were the Social Science Citation Index, JSTOR and Google Scholar. Most of these articles were downloaded in full text where feasible; otherwise, copies were ordered through inter-library loan. Every article was then registered electronically using the Cardbox software program. Names of authors and journals, titles of articles and their publication year were systematically registered, as was other information pertaining to our research questions; the latter, however, could only be gleaned from closer scrutiny of the articles themselves. The program indexed all this information in a retrievable form for purposes of analysis.
The choice of the title words implementation and implementing as criteria for selecting articles for analysis is justified on three grounds. First, after the publication of Pressman and Wildavsky's (1973) seminal book Implementation, this concept seems to have become institutionalized among policy scholars as a common denominator for the execution stage of the policy process. Second, it is assumed that those who want to contribute to the field of policy implementation are more likely to use these title words than those who do not share this ambition. Third, this concept provides a reasonably well-defined universe of research literature that can be subjected to independent critical evaluation and replication by anyone who wishes to do so.
Regardless of the selection criteria chosen, however, two types of bias should be kept in mind. The first is false positive selection: publications that use our key word haphazardly in their title without any real focus on or interest in implementation phenomena. These should obviously be excluded. This bias is unavoidable when performing computer searches in digitized literature databases, but it can be detected and rectified by carefully reading the selected publications, provided their number is manageable. This has been done in our investigation of the literature on policy implementation. The second bias, false negative selection, is much more difficult to handle. It refers to publications that should have been included because of their relevance but were not, because they use key words other than implementation in their titles or abstracts. The problem is that we do not know what these other key words are, as they often depend on the author's frame of reference. Compliance, enforcement and execution are frequently used, but there are no doubt many other substitute concepts as well. We could have tried to reduce this type of bias by expanding the key words used in our literature searches, but decided against it for the reasons stated above.
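The selection procedure and its two biases can be illustrated with a minimal sketch in Python. The titles and the relevant flag below are hypothetical and do not come from our sample; the flag stands in for the judgement reached by reading each article, and whole-word, case-insensitive matching of the title words is an assumption of the sketch. The automated title search retrieves candidates, careful reading removes false positives, while a relevant article titled with a substitute concept such as compliance is never retrieved and remains a false negative.

```python
import re
from dataclasses import dataclass


# Hypothetical records: titles and the "relevant" flag are invented for
# illustration. "relevant" stands in for the judgement reached by reading
# each article in full.
@dataclass
class Article:
    title: str
    relevant: bool  # does the article really focus on implementation?


# Assumed title-word rule: whole words "implementation" or "implementing", any case.
TITLE_WORDS = re.compile(r"\b(implementation|implementing)\b", re.IGNORECASE)


def title_search(articles):
    """Automated step: keep every article whose title contains a title word."""
    return [a for a in articles if TITLE_WORDS.search(a.title)]


def screen(candidates):
    """Manual step (simulated): drop false positives after careful reading."""
    return [a for a in candidates if a.relevant]


corpus = [
    Article("Implementing welfare reform in local agencies", True),
    Article("Implementation of a new method for estimating turnout", False),  # false positive
    Article("Street-level compliance with environmental rules", True),  # false negative
    Article("Policy implementation and street-level discretion", True),
]

candidates = title_search(corpus)
selected = screen(candidates)
false_positives = [a for a in candidates if not a.relevant]
# False negatives: relevant articles that the title search never retrieved.
false_negatives = [a for a in corpus if a.relevant and a not in candidates]

print(len(candidates), len(selected), len(false_positives), len(false_negatives))
```

With these four hypothetical titles the search retrieves three candidates, screening keeps two, and one relevant article is missed by the search altogether. In a real literature search that last count is unknowable, which is why the false negative bias is so much harder to handle.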
Research results and discussion
The research generations metaphor: From a first generation to a second?
The 1970s do indeed stand out in many ways from later decades in terms of research profile. The pioneering publications of this decade were indeed less theoretically and empirically oriented. The dominant mode of investigation was the single case-study based almost exclusively on qualitative data (Tables 3 and 4). This made sense in a relatively young, emerging research field, where exploratory and detailed descriptions of single implementation phenomena, aimed at generating hypotheses for future research, were particularly valuable and hence appropriate. We must also bear in mind that the first analytical and theoretical frameworks that caught the attention of policy scholars (Barrett and Fudge, 1981; Sabatier and Mazmanian, 1979, 1980; Van Meter and Van Horn, 1975) were not published until the mid to late 1970s and early 1980s. A couple of earlier, similar efforts to provide analytical and theoretical roadmaps for implementation researchers (Bunker, 1972; Smith, 1973) were largely ignored in comparison. Another important finding, reported earlier by Saetren (2005) in his more comprehensive literature survey, is that the first generation of implementation research did not really start with Pressman and Wildavsky's seminal work published in 1973, or even earlier in that decade. No fewer than 19 core journal articles with the title words implementation or implementing were published before 1970, four of which date back as early as 1956. The equivalent number reported by Saetren (2005), which includes near-core journals as well, was close to 150. This means that the beginning of the first generation of implementation research must be backdated by at least a decade, to the early 1960s (see Figure 1).
Articles with title words implementation or implementing published in core journals (political science, public policy and public administration/management) by year of publication. Absolute numbers per year.
Theory relevant features of articles in core journals by time periods
Percentage base: all articles. N=547.
Level of empirical analysis in core journal articles by time periods
Percentage base: all articles with some empirical data.
To sum up, the start of the first generation must be backdated by at least a decade. Furthermore, it seems appropriate to describe what happened around 1980 as the transition from a first to a second generation of implementation research. This can probably best be explained by the sudden growth of analytical and theoretical frameworks during a relatively short period from the mid to late 1970s to the early 1980s. Many of these frameworks, and later books, were efforts to offer a first general analytical and theoretical framework for implementation research (cf. Hill and Hupe, 2009). This watershed gave implementation scholars of the 1980s the opportunity to pursue hypotheses in a more deductive manner, which required different and more sophisticated research designs than those appropriate for inductive research approaches.
What about the third generation research paradigm?
Let us start with research methodology, where progress is allegedly most noticeable. There appears to be substantial validity to this claim. While the frequency of theoretical references in journal articles after 1990 is at about the same level as in the 1980s, the share of articles that derive hypotheses from theoretical constructs and subject them to some sort of empirical testing has increased substantially, from only 10% before 1980 to almost 50% in the 2000s. More important, the use of more sophisticated statistical techniques to analyse empirical data has increased even more in relative terms over the same period, from 3% to 24%: an eightfold rise, compared with an almost fivefold rise in hypothesis testing (see Tables 1 and 2).
Type of data in core journal articles by time periods
Percentage base: empirical articles.
Research design of articles in core journals by time periods
Percentage base: empirical articles.
This latter trend touches on the familiar debate about the relative merits of qualitative versus quantitative research methodologies, a debate that is unavoidable when assessing scientific progress in implementation research. Our comments so far have been premised on the common assumption that certain research design features (e.g. comparative case-studies) are more valuable for theory development than others (e.g. single case-studies). We believe this to be generally, but not absolutely, true. The research methodology literature recognizes that the theoretical utility of even single case-studies can be substantially enhanced by careful selection of crucial or critical cases for testing a particular theory (Eckstein, 1975; Gerring, 2007; King et al., 1994; Stinchcombe, 1968; Yin, 2009). Unfortunately, single case-studies of this kind are rare in implementation studies, as elsewhere; for exceptions, see Lundquist (2001) and Lundin (2007).
The ideal, of course, would be for implementation studies to combine qualitative and quantitative methodologies. Unfortunately, that is still not the case. The more even distribution of the two methodologies is a feature not of individual implementation studies but of the research literature as a whole. Nevertheless, we consider this more balanced relationship at the aggregate level a substantial improvement over the earlier predominance of one methodology over the other.
Various types of comparative research design by time periods
Percentage base: all empirical articles.
This very slow, almost imperceptible progress in the application of comparative research designs since the 1980s is disappointing. Nevertheless, it is no small achievement that slightly more than 50% of policy implementation studies nowadays take a comparative approach. Whether this is good enough in terms of the third generation research paradigm is a matter of judgement. Comparison per se is no panacea. It is well known that the usefulness of comparative studies in answering research questions depends critically on careful selection of substantively comparable cases and of theoretically interesting dependent and independent variables (Ragin, 1987). Unfortunately, these requirements receive insufficient attention in many comparative studies, making them less useful than they otherwise could have been.
The longitudinal research design is another defining feature of the third generation research paradigm. It was intended as an antidote to cross-sectional research instruments such as questionnaires and surveys, whose data usually have no inherent time or process dimension. The main trend here is as disappointing as for the comparative research design: its application has not increased since 1990.
Why is this so? We will offer an answer below.
The issue of theoretical progress, or rather the lack thereof, in the implementation literature has been dealt with thoroughly by a number of implementation scholars in recent years (Hill and Hupe, 2009; Matland, 1995; O'Toole, 2000, 2004; Winter, 2012b). There is broad consensus that no general implementation theory is close at hand. The third generation research paradigm does not privilege one particular theoretical approach at the expense of others, but it does imply that a theoretical, explanatory orientation towards the analysis of policy implementation is preferable to a-theoretical and merely descriptive approaches. Hence, we have limited our assessment here to recording which types of theoretical constructs have been in vogue during the last decade or two, with a view to commenting on differences from and similarities to previous decades.
In his assessment, O'Toole (2000: 273–282) identified a handful of broadly defined theoretical approaches as particularly promising for future implementation scholarship: (1) rational choice institutionalism, (2) the governance approach, (3) networks and management, (4) formal rational choice models and (5) the policy design and instrument approach. How well do these predictions hold up against our empirical results?
We have merged the institutional approach, the first discussed by O'Toole, with another that he does not mention, namely some forms of political and policy regime theory (Stoker, 1989), although their authors do not always use that label. The main reason is that both approaches tend to refer to macro-level features of the political-administrative units of analysis. Institutional approaches focusing more on the meso level of the political-administrative system and its constituent organizations have been grouped under the label organization theory, since they represent new and old versions of that type of theory (March and Olsen, 1984; Ostrom, 1999; Powell and DiMaggio, 1991; Selznick, 1957). We have also merged the networks and governance approaches, which O'Toole discusses separately, with a one-sided bottom-up approach, partly because of their shared emphasis on inter-organizational actor networks and partly for pragmatic reasons related to the relatively small n.
Table 15 reports our results for theoretical approaches discussed more generally (e.g. in non-empirical articles), and Table 16 for theoretical constructs subjected to some sort of empirical testing. The macro-level institutional and regime approach is the most prominent theoretical framework, both among those mentioned more generally and among those subjected to empirical testing, overall and especially during the last two decades. The networks-governance approach has seen the greatest increase in popularity since 1990 in the latter respect, sharing second place in the ranking (Table 16). Organization theory was more prominent in the implementation literature during the 1970s and 1980s but has receded substantially as a preferred theoretical framework since then. The policy design approach was initially much mentioned in the implementation literature at large (Table 15), but less so after 1990. Moreover, in the literature that confronts theoretical constructs with empirical data, it is almost untraceable. Hence, it seems to have suffered the same decline in popularity as the organizational approach (Table 16).
The prominent role of macro-institutional and governance-network theories in the implementation literature after 1990 clearly reflects a broader trend in political science. O'Toole (2000) thus seems to have been on target with at least three of his five favoured theories, but with one important qualification. It is not his preferred rational choice type of institutionalism that implementation scholars refer to and use. Rather, it is another, less rationalist version expounded by scholars such as March and Olsen (1984), Hall (1986) and Steinmo et al. (1992). In the same vein, the more formal, deductive rational choice models, which O'Toole seems to favour as a potentially promising and parsimonious explanatory framework, are scarcely used, if at all, in our sample of the implementation literature.
As a supplement to the foregoing analysis, we have examined the extent to which one of the most clearly operationalized top-down analytical frameworks, formulated by Mazmanian and Sabatier in various versions (1981, 1983), has been actively used by implementation scholars in our sample. Its shift from a dominant role before 1990 to being one among several, and no longer the leading one, after 1990 is an interesting development. This is consistent with another observation made by O'Toole (2000) and Winter (2012b), namely the gradual fading of the protracted and unproductive debate between top-down and bottom-up scholars that plagued much of the 1980s, as the interest of implementation researchers has shifted towards integrating and synthesizing the analytical and theoretical frameworks offered by several scholars during the last two decades (Linder and Peters, 1987, 1990; Matland, 1995; Ryan, 1996; Sabatier, 1986; Winter, 1990, 2012a).
Considering the greater interest in integrative and synthesizing frameworks, it is something of a paradox that, a good decade into the 21st century, we observe more rather than fewer major contending theoretical frameworks (Table 16). There is no agreement on whether this constitutes progress. O'Toole (2000: 268) seems to deplore the multitude of different theoretical constructs in implementation research when he observes that ‘what has not happened is a careful winnowing of the mass of potential explanatory variables towards parsimonious explanation’. Winter (2012b: 265), on the other hand, argues in his recent review of the literature that it is premature to expect any scholarly agreement on a common analytical and theoretical framework and that diversity in this respect is more a strength than a weakness. Thus, he suggests that implementation research can be improved by ‘(1) accepting theoretical diversity rather than looking for one common theoretical framework and (2) developing and testing partial theories and hypothesis rather than trying to reach for utopia in constructing a general implementation theory’.
Regardless of these differing opinions of two leading contemporary implementation scholars, we interpret our empirical observations as consistent with the claim that there has been more progress on methodological than on theoretical issues.
Type of research design and type of data used
Percentage base: all empirical articles of the respective type.
Hypotheses formulation by type of data used
Percentage base: empirical articles.
Regional origin of authors or their studies in core journals by time periods
Percentages add to more than 100% since some articles have authors and/or data from more than one region. Percentage base: all articles.
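Why regional percentages can exceed 100% in total can be made concrete with a minimal sketch using hypothetical region codes (not our data). Each article is counted once in every region it belongs to, but the percentage base remains the number of articles, so the regional shares can sum to more than 100%.

```python
from collections import Counter

# Hypothetical region codes per article (invented for illustration); an
# article with authors or data from several regions carries several codes.
articles = [
    {"US"},
    {"Europe"},
    {"US", "Europe"},  # counted once under each region
    {"Europe"},
]

# Count each region once per article in which it appears.
counts = Counter(region for regions in articles for region in regions)
n = len(articles)  # percentage base: all articles, not all region codes
shares = {region: 100 * c / n for region, c in counts.items()}

print(shares)               # {'US': 50.0, 'Europe': 75.0}
print(sum(shares.values()))  # 125.0
```

Dividing by the number of region codes instead would force the shares to sum to 100%, but the result would no longer show what proportion of articles involves each region.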
Research design of articles in core journals by region of focus or author
Percentage base: all empirical articles.
Different comparative research designs in core journal articles by region of focus or author
Percentage base: all empirical articles.
Different comparative research designs in core journal articles by region of focus or author
Percentage base: comparative studies. Since categories are not mutually exclusive, percentages add to more than 100.
Type of data used in studies by region of focus
Percentage base: empirical articles.
Types of data used in comparative studies by region of focus
Percentage base: articles with comparative focus.
Theory relevant features of core journal articles by region of focus or author
Percentage base: all articles.
Percentage base: empirical articles only.
Types of theories referenced in core journal articles by time periods
Percentage base: articles with some theoretical references.
Types of theories or analytical frameworks most frequently subjected to some form of empirical testing (i.e. both statistical and non-statistical)
Percentage base: all articles with some form of empirical testing.
Another noticeable difference is the much larger share of cross-national comparative studies in Europe (46%) than in the US. In the US, however, this is to some extent compensated for by a similarly large share (51%) of studies that compare across states (Table 10). This reflects, of course, the different political organization of the two regions being compared.
Conclusion
O'Toole (2000: 283) concluded that ‘the study of policy implementation is in many respects in a relatively mature stage of development’ (emphasis added). Among the most important indicators of this, he mentions that inductive approaches have increasingly been supplanted by more deductive, hypothesis-testing approaches. As noted above, this is the most consistent positive trend towards the requirements of the third generation research paradigm in our sample of the implementation literature, and it has lasted from the field's beginnings well into the 2000s. It is also consistent with the recommendation by Winter (2012b: 265) on the best way to enhance theory development in implementation research: to test partial theories and related hypotheses. With few exceptions, most of these achievements were already made in the 1980s. Progress since then has been much slower and more incremental. Our analysis thus confirms the conventional first- and second-generation description of the evolution of implementation research up to the 1990s as reasonably accurate. The slow and incremental advances made since then bolster the impression of several reviewers in recent years that many important defining features of the third generation of research are still far from being implemented. As suggested at the outset, this implementation deficit can to a large extent be explained by the very demanding nature of the third generation research design and, as we discovered, by some inherent dualities and tensions between its essential features that make it hard to optimize them all simultaneously. Nevertheless, this has not made the third generation research paradigm useless or irrelevant for analytical purposes: we have used it as an ideal-type analytical construct, a benchmark against which to measure greater or lesser progress on some of its salient features.
On the whole, our main conclusion does not differ much from those of other recent reviewers. We have largely confirmed their more or less intuitively based, but still quite accurate, assessments. What distinguishes ours is primarily that it documents, through an empirical and up-to-date investigation of the research literature, more detailed and nuanced accounts of the allegedly mixed results. Thus, we agree with previous reviewers such as O'Toole (2000) and Winter (2012b) that much has been achieved, but also that a lot remains to be done. Hopefully, our empirical investigation has helped provide more detailed information on both accounts.
