Research Governance and the Role of Evaluation

Abstract

Through a comparative study of the United Kingdom and Spain, this article addresses the effect of different research governance structures on the functioning and uses of research evaluation. It distinguishes three main evaluation uses: distributive, improvement, and controlling. Research evaluation in the United Kingdom plays important distributive and improvement roles while the Spanish evaluation system plays, mainly, a controlling function and a minor distributive role. The differences that the article identifies should not be attributed to alleged different positions of the two countries in a putative research evaluation learning curve. Evaluation practice fits its national research governance structure.

Keywords

research policy, research evaluation, governance, Spain, United Kingdom

The role of evaluation within the policy process is influenced by the governance structure within which it is inserted. Although this point has long been recognized, its implications for evaluation practice and learning are often overlooked. This article analyzes the importance of the research governance structure for the way in which research evaluation is conducted. To this end, it will compare the academic research evaluation practices in two contrasting research systems (the United Kingdom and Spain).

The analysis builds on a specific strand of comparative studies of research evaluation systems. Although the literature has identified some common trends across countries in terms of the introduction of more systematic reporting and regular monitoring, and the inclusion of research evaluation into broader national evaluation systems (Cozzens & Turpin, 2000), comparative studies have tended to identify substantial differences across national research evaluation systems. These have been analyzed by differentiating evaluation approaches, assessing their merits (Coryn, Hattie, Scriven, & Hartmann, 2007; Geuna & Martin, 2003) or presenting differences in the way they have evolved (von Tunzelmann & Mbula, 2003). A different strand in the literature has investigated the reasons behind this diversity of evaluation practices. A straightforward argument is that the proportion of research that is evaluated and the way in which it is evaluated are a direct function of how resources are distributed. For instance, if a country increases the percentage of research funds that are distributed through competitive mechanisms, the proportion of research that is evaluated increases; on the other hand, a funding structure based on block funding would not be conducive to evaluation practice (ab Iorwerth, 2005, pp. 18–19). The evidence presented in this article is, however, more complex, as many countries display systems mixing block funding with project-based competitive allocations, and some block funding strategies are grounded on performance evaluation. Previous comparative research has offered a more nuanced view, including broader political and administrative considerations in the analysis of different approaches to research evaluation. Gibbons and Georghiou argued that the different approaches “reflected” the different political and administrative cultures of the countries concerned (Gibbons & Georghiou, 1987, p. 35). Reporting on a comparative set of studies, Georghiou (1995, p. 4) attributed the differences to three factors: (1) the state of development of the research infrastructure; (2) the organization of science; and (3) the general governance practices beyond the research domain. In a study on Spain, part of the collective work in which Georghiou developed this argument, Sanz-Menéndez argued that “the evolution of research evaluation activities or practices could be viewed as embedded in the institution for governance of the R&D system and in the general characteristics of the system for making public policy” (Sanz-Menéndez, 1995, p. 80).

This article reconnects with this strand of literature focusing on how research governance structures affect the research evaluation system. Although “governance” is a contested and highly ambiguous term (Jordan, 2008), I am using it here because it conveys the notion that the way a society or an organization is ruled goes beyond the formal institutions and processes of government. One problem we face is that similar laws and regulations may not have the same implications in different countries and may not be an indicator of policy convergence; in other words, visible measures may not provide an adequate picture of “policy” (Féron & Crowley, 2003, p. 371). By governance structure, I mean all the processes through which public policies are defined and implemented, the actors involved in these processes, and the relationships among them. The relations of authority among actors and the administrative processes used in the management of research are both part of the governance structure, but the notion of governance focuses our attention beyond the administration of academic research and underlines the variety of relevant actors, and their wide-ranging and sometimes fluid organizational affiliations. As Féron and Crowley note, a governance perspective fits two levels of analysis: the level of the state itself, “defined institutionally,” and “the level at which research is actually performed” (Féron & Crowley, 2003, p. 374). These levels are not self-contained areas separated by sharply defined boundaries: Through mechanisms like peer review, individual academics perform research and also operate as members of the institutional arrangements through which research is managed. The evaluation system is one of the bridges linking these two levels.

By research evaluation system, I mean all the activities and practices related to the systematic determination of the quality or value (Scriven, 1991, p. 4) of research activities (whether proposed, underway, or already conducted), and of the individuals, institutions, and organizations involved in such activities. This is a broad concept. It includes the evaluation of the scientific outputs of an activity or set of activities when the main objective of the evaluation is to assess such activity or those who have performed it, rather than the scientific merit of the outputs in themselves. It extends to the ex ante and ex post evaluation of projects and programs, and of those who perform them, be it individuals, institutions or groupings of the former. Therefore, the research evaluation system includes very different types of activities: it is not the same to assess a research project proposal or an individual’s research accomplishments, to evaluate the results of a program or to assess the research performance, however this may be defined, of a whole country. Evaluation takes place at different levels, but the concept of “research evaluation system” is necessary because the way evaluation at one level is organized is not independent from how evaluation at other levels is conducted; the different levels are related with each other (Darvas, 1997, p. 18). Consequently, a research evaluation system can be characterized by the evaluation procedures at different levels and the links across them. It has been argued, for instance, that research evaluation is becoming more complex as it involves assessments at different levels generating new types of assessment systems and procedures (Frederiksen, Hansson, & Wenneberg, 2003, pp. 161–162). In contrast, the Spanish system has been described as one in which individual evaluations remain prevalent, being more important than organizational evaluations (Cruz-Castro & Sanz-Menéndez, 2008). 
In short, to analyze a research evaluation system, we have to study how different evaluation levels are addressed and combined.

It has been argued that evaluation systems can play a central role in the definition of new institutional frameworks for research; for instance, in a review of science evaluation practices in Eastern Europe after the end of the Cold War, evaluation emerges as a central building block in the efforts to build a new institutional framework for research: the move from a hierarchical system based on block funding in the communist era to a competitive structure where grants could be awarded on the basis of merit required the implementation of an evaluation system based on peer review (Frankel & Cave, 1997). From this perspective, an evaluation system can help construct a new research governance structure: evaluation becomes one driver of institutional change. It follows that the form of evaluation adopted can help define the nature of the policy process into which it is inserted. As a factor for change, evaluation emerges autonomously from extant institutional structures and evolves to shape them. This view is consistent with the argument that countries can advance through different evaluation culture stages, progressing through increasingly sophisticated evaluation systems (Toulemonde, 2000). The policy message that can be derived from this argument is simple: countries with comparatively less experience of evaluation should adopt the practices of countries with more developed evaluation cultures. The aim to improve the “transfereability” of “best practice” is at the center of some comparative studies of evaluation practice. Even when recognizing that “broad contextual framework conditions” are critical success factors for the transfer of “best practice” (Teirlinck, 2011, p. 20), analysts still contend that “the heterogeneous visions and practices” in evaluation and impact assessment can be addressed through “further conceptual and practical harmonization” (Teirlinck, 2011, p. iii).

This message has further practical implications: Countries that are relative newcomers to the field of evaluation are importing evaluation methodologies, together with foreign experts and consultants, to help them develop and implement evaluation strategies, while paying little attention to the research governance structure within which evaluation operates. It is argued, for instance, that in countries like Spain, the evaluation culture is lagging, and this lag is attributed to a dearth of evaluation experience, lack of formal training in evaluation for professionals and civil servants, and lack of formally established evaluation standards (Bustelo, 2006).

In contrast with this line of argument, this article stresses that the practice of research evaluation will be influenced by the broader research governance structure. Although governments may try to manage public science through the introduction of different evaluation systems (Whitley & Glässer, 2007), their implementation will take place within a governance structure that will affect how evaluation is implemented in practice. Political agency can influence the way evaluation is conducted but it is not its only driver and may not necessarily determine it.

To show how the research governance structure affects the research evaluation system, I will compare two countries with different research governance structures, the United Kingdom and Spain. Noting the diversity of research evaluation practices, I propose three main dimensions along which to compare the approaches to evaluation in different countries, and then use them to compare the research evaluation practices of the United Kingdom and Spain.

Research Evaluation and its Purposes

A Statement of the Problem

The term research evaluation covers a very varied set of activities, can be applied to different evaluands and can be carried out for many different purposes. This diversity is obviously not unique to the field of research evaluation: The variety of tasks that can be conducted under the label of “evaluation” has even been identified as a problem facing the evaluation community. Evaluation has been called a polysemous notion covering many different forms and activities, heterogeneous practices and “unstable” theoretical frameworks and epistemologies (Lascoumes, 1998). This diversity complicates comparative analysis and opens the door to possible misunderstandings as the same word is used to refer to different practices. To establish a basis for comparison, we need, in addition to the common definition of research evaluation system offered above, a way of structuring a comparison of different approaches to evaluation. This is developed in the following section.

Different Approaches to Research Evaluation

A comparative analysis of research evaluation practices requires a framework that can help structure the practices studied. Any comparative analysis needs to address the participants in the different evaluation activities that shape the evaluation systems under study, their main activities and the reasons why they undertake them. We need to distinguish between the subject of the evaluations (the evaluands), who is responsible for organizing and performing them, and the uses made of their results. The nature of the activities (what is being done) will be addressed when describing who implements them.

The subject of the evaluation: the evaluands

Research evaluation can involve different evaluands. The focus may be on individual researchers, groups of researchers, whole institutions, research projects, groups of projects “wrapped” in a program, research support policies, or on the research system as a whole. Obviously, the activity to be conducted and the potential uses of the evaluation task differ greatly depending on the evaluand. However, the overall approach to research evaluation is bound to “favor” some sets of evaluands over others. It has been argued that a systemic approach to evaluation should involve a multilevel strategy addressing, in a planned and structured way, all levels of the innovation system (Arnold, 2004). This is a normative statement, but in practice, different countries are likely to strike different balances in the composition of the evaluands they primarily address in their research evaluation practice.

Who is responsible for organizing and performing the evaluation?

Evaluation work involves many participants playing different roles: clients, designers, coordinators, caseworkers, respondents, technical and information specialists, evaluation trainers, researchers, developers, and “metaevaluators” (Stufflebeam & Shinkfield, 1985). Importantly, not all these different roles will be present in every evaluation and many of the tasks can, at times, be carried out by a single individual or group of individuals. In all cases, however, there must be a “client” for the evaluation (somebody who commissions the work), somebody who defines the work to be done, somebody who carries out the work, and a potential user. Again, the same group of individuals may undertake these roles or there may be a division of work. Further, the position of those carrying out the evaluation in relation to what is being evaluated can affect whether and how the evaluation results are used; for instance, analyzing the early years of research program evaluations in the European Commission, Ben Martin concludes that if the evaluators are close to the organization in charge of implementing the program under evaluation, the evaluation results are more likely to be considered in policy definition and implementation but the evaluators are also more vulnerable to political pressure (Martin, 1997).

Who should be involved in evaluation is an enduring question in science policy. The role of peer review remains for many a key defining component of research evaluation; what this means in terms of roles is that, at least, the design of the evaluation and its implementation should be carried out by scientists knowledgeable in the relevant field. Peer review is typically applied to the evaluation of projects and their outputs, the academic performance of individuals and to the assessment of the value and merit of research results but can be extended to the evaluation of programs or whole policy strategies. The scope of academic peer review and the extent to which it should be complemented with the contributions of other evaluators have been the subject of much debate. At one end of the spectrum is the argument that, although research can be funded with a strategic end in mind, scientific activity should not be steered from outside the scientific community. As Polanyi stated, any “attempt at guiding scientific research toward a purpose other than its own is an attempt to deflect it from the advancement of science” (Polanyi, 1962, p. 62). In this “republic of science,” the governance of science is best left to scientists. From this perspective, research evaluation is understood mainly as the assessment of scientific merit and quality by peers and the participation of “outsiders” is regarded with suspicion. The underlying assumption is that scientific research will lead, incontestably but in ways that cannot be predicted or controlled, to the generation of new knowledge, a public good from which technological advances and eventually socioeconomic development will grow.

The challenge to this perspective came from several fronts and has gradually increased since the 1970s. The notion of “strategic science” combines the long-term view of the socioeconomic returns to research investment with a call for expected relevance (Rip, 2003, p. 35). When expected relevance is added to the set of criteria that guide public investment in research, external assessment becomes necessary to take into account the relevance of the scientific activities. This change, as Rip notes, has important implications for the evaluation of science: Evaluation needs to identify both the expected and the unexpected impacts of R&D (a notoriously difficult endeavor) and has to deal with new dimensions of research and broader stakeholder communities. Evaluation thus becomes a key component in the “new social contract for science” (Rip, 2003) and must involve stakeholder communities beyond the scientists themselves. Professional evaluators are also required as the challenges posed by the identification and assessment of research impacts call for specialized expertise. The resulting evaluation systems will tend to combine the contributions of scientific peers (experts evaluating), evaluation professionals (evaluation experts), and other stakeholders; the way these combinations are made will vary across countries and across organizations. Determining and comparing who performs the main roles in research evaluation activities, and in particular where and how peer scientists and evaluation professionals participate in them, will be used to characterize and differentiate research evaluation systems.

The uses of evaluation

The new “social contract” implies not only a widening of the groups involved in research evaluation but also of its goals and uses. In the “republic of science,” the policy contributions generated by peer-review evaluation are mainly geared to determining the distribution of resources through the selection of meritorious projects worthy of funding or of individuals to be employed as researchers in academic organizations. When new actors get involved in research policy and its evaluation, and there is concern about the applied value of research outcomes, the potential roles of research evaluation expand to cover the many different uses identified by mainstream evaluation literature. These include, among others, the legitimation of policies and past initiatives, the provision of an accountability mechanism to the legislature and the public, the support of policy formulation and design, the allocation of resources, the provision of evidence to implement operational improvements, the provision of a negotiating arena for different stakeholders, a way to support broader policy participation by providing a forum for debate, the development of agreement on policy goals and strategies, and the provision of policy management data and tools.

This variety has brought about a debate about the relative merits and viability of the various purposes (Chelimsky, 2006, p. 33). Many evaluators argue that, out of its many potential purposes, the essential function of evaluation lies in its contribution to organizational learning (Rich, 1979, p. 80), or the improvement of the planning, implementation, and effectiveness of programs (Chen, 2005, p. 3). This view of evaluation as an activity that contributes to learning and improvement is often preferred by evaluators and constitutes a defining characteristic of the “formative” (Scriven, 1967) approach to evaluation. This preference, however, does not remove the variety of potential uses and analysts typically summarize them into a few categories, either as a way to structure the analysis or as a conclusion of an empirical enquiry into the practice of evaluation in a specific context. An example of the latter in the field of research evaluation is the classification that Cruz-Castro and Sanz-Menéndez offer when analyzing the Spanish case: The two main uses of evaluation results they encounter are as a steering and management tool and as a distributive process to allocate resources (Cruz-Castro & Sanz-Menéndez, 2008). These categories may be adequate to describe a specific context but do not encompass all the uses listed above; significantly, they do not cover the potential use of evaluation as a learning tool.

Coryn proposes a more comprehensive classification of fundamental purposes for evaluating research: accountability and efficiency; resource allocation; improvement; synthesis; and decision making (Coryn, 2007, p. 73ff.). These categories are comprehensive but, as the author points out, partially overlapping. To structure a comparative analysis, it would be more helpful to use fewer, more sharply delineated categories. An example of a generic classification that is often used is the distinction between three main uses: improvement, accountability, and enlightenment (Stufflebeam & Shinkfield, 1985, p. 7). Improvement refers to the provision of information to assure the quality of a service or to improve it. “The second main role of evaluation is to produce accountability or summative reports” (Stufflebeam & Shinkfield, 1985, p. 7). Finally, the information produced by evaluation efforts can be used to evolve and test theory; this is equivalent to the ascriptive purpose of evaluation in which its function is merely the generation of knowledge. This distinction has been broadly used but, for my purposes here, it has some limitations. First, my focus is on the policy use of evaluation. The enlightenment or ascriptive purpose is important but it is not directly applied to the policy process; as such it falls beyond the scope of this article. Second, although accountability is present in many of the categorizations of evaluation uses, it is a problematic concept. Accountability is widely recognized to be a key policy use of evaluation, often in contradistinction with learning or improvement functions (Anderson, 2002; Kuhlmann, 2003). The role of accountability has been argued to be particularly important in research policy. The introduction of university research evaluation systems is said to reflect “global demands for greater accountability” (Geuna & Martin, 2003, p. 277). Yet, as with many overused terms, accountability has acquired a rather vague meaning. 
Stufflebeam and Shinkfield equate accountability to the production of “accountability or summative reports” (Stufflebeam & Shinkfield, 1985, p. 7). This is a somewhat circular way of going about things but reflects the dictionary definition of accountability as the situation of being obliged to report, explain, or justify something (being responsible is another, very different, commonly accepted meaning). Instead of asking what the objective of such reporting is, the emphasis is placed on the means or implications of such accountability-as-reporting. In the field of research evaluation, it has been argued that evaluation can be used to “increase the accountability of researchers, policymakers and funding bodies, by making the research process, its outputs and impacts more transparent” (Marjanovic, Hanney, & Wooding, 2009, p. 6); in other words, accountability becomes tantamount to transparency. Yet, other authors have perceived a tension between accountability and autonomy (Cozzens, 2003), thus implicitly equating accountability with auditing and control. Transparency and control are very different things, but the problem does not end here. I have just mentioned that another accepted meaning of accountability is “being responsible”; from this perspective, other authors have argued that a culture of accountability should empower individuals and eliminate control systems (Lebow & Spitzer, 2002). The same term is therefore being used to refer to the imposition of controls and to their elimination.

The preceding discussion explains why, if we want to use the different uses of evaluation as one way of comparing evaluation systems, we need a slightly different classification of uses from the ones normally found in the literature. My suggestion is to focus on the policy tasks to which an evaluation activity can contribute; this is a simple approach but delivers a clear classification of different uses. Any policy, at any level of aggregation, will require:

Resources to be allocated.

Activities to be conducted to reach the policy goals.

Controls on the application of the allocated resources to the activities to be performed.

Policy practice may invest differential attention to each of these tasks, and, accordingly, evaluation practice may contribute differentially to each of them, resulting in three main purposes:

A distributive use will seek to inform or determine the distribution of resources across the potential actors and beneficiaries of a specific policy or program. The allocation of resources can be decided according to the merit attributed by the evaluation to different individuals, groups, or organizations. Examples of this type of evaluation include, but are not limited to, the ex ante evaluation of research projects, and the distribution of rewards to individuals or groups that have done well according to performance assessments based on preestablished criteria.

An improvement use will focus on deriving lessons from past experience to adapt the activities conducted to what evaluation studies conclude is better practice. The improvement purpose therefore relies on the existence of feedback mechanisms and the operational flexibility needed to function as a learning organization.

A controlling use will scrutinize how organizations and individuals use public resources to carry out activities to achieve public policy objectives. It focuses on the direct audit of how resources are spent. In the case of research policy, the controlling purpose will typically focus on the analysis of inputs and the audit of direct research outputs, and will fit with traditional bureaucratic models of administration.

Research Evaluation From a Comparative Perspective

The United Kingdom

The evaluands

The U.K. research evaluation system is directly affected by the way in which academic research is funded. The United Kingdom is characterized by a “dual-support” funding structure composed of (1) a stream of core funding administered by the Funding Councils, allowing universities to fund infrastructural investments and support long-term, open-ended research strategies and (2) funding for clearly defined, time-bounded specific research initiatives (projects, centers, etc.) administered through the Research Councils.

Core funding is distributed according to a formula approach that allocates money to universities according to their past research performance. The ratings that the “Research Assessment Exercise” (RAE; which is being replaced by the Research Excellence Framework [REF]) awards to universities for 67 (in the 2008 RAE) fields of research (called Units of Assessment) are key parameters in the formula. Typically, the assessment exercises have taken place every 4–7 years and have addressed the research performance of university departments; these are therefore the main subjects of this large periodic evaluation exercise. The size of the exercise, the resources it consumes and its centrality in the U.K. university and research policy have grown over time (Barker, 2007; Martin, 2011; Martin & Whitley, 2010) and have placed the university department as a main evaluand. Submissions to the “exercise” are prepared at departmental level and it is, ultimately, the work of individual academics that is assessed (more than 50,000 in the last exercise). In this way, the whole of the U.K. academic world, down to the individual researchers, is affected by the assessment, although it must be noted that this is not an exercise that intends, at least directly and formally, to assess individuals.

The second stream of financial support, project funding, uses a variety of instruments spanning from individual doctoral grants to funding for specific research projects, programs bringing together several related projects, and multimillion pound, multiyear research centers. These activities of varying size and scope are the units that are being evaluated, either before funding or ex post through a variety of impact assessment and evaluation initiatives, which are briefly described in the following section.

Organization and performers

The two streams of funding described above bring to the evaluation tasks different sets of participants. The RAE/REF evaluation approach is managed by the Higher Education Funding Councils in the various U.K. regions that are also in charge of defining and implementing the funding instruments. Actual assessment is the responsibility of panels (one for each of the 67 Units of Assessment in the 2008 RAE) of academics and a few experts from industry or government. The panels review the evidence of scientific production and value presented by the universities according to the guidelines defined by the Funding Councils but each panel establishes its own criteria and working methods. It must be noted that, although the process aims to assess the value of the scientific production reflected in the submissions, this is not a pure peer-review system. The structure of the exercise and its main criteria and guidelines are defined by the Funding Councils with a determining contribution from their officials. The assessments themselves are carried out mostly by academics but the majority of panels include one or two experts from industry or government (of the total 10–20 members per panel).

The second stream, project funding, is managed mainly by the U.K. Research Councils. Again, the organizations in charge of policy implementation are responsible for evaluation. Ex ante proposal appraisal and ex post assessment of final reports are conducted through a peer-review system organized by the Research Councils that can include, again, nonacademic experts. Reviewers’ comments are usually detailed and are distributed to project applicants; some Councils allow applicants to respond with comments before funding decisions are made. In addition to the necessary appraisals of projects, the Research Councils have implemented substantial ex post evaluations focusing on the impacts of specific investments. These are often carried out under contract by specialist consultants. Research Councils United Kingdom, an organization that brings together the main U.K. academic research funding organizations, has a “Performance and Evaluation Group” that is responsible for “providing strategic direction on all issues relating to evaluation and benchmarking including the evaluation of Science Budget investments in research, training, knowledge transfer, science and society activities and operational performance” (http://www.rcuk.ac.uk/aboutrcuk/executivegroup/subgroups/pegroup.htm). Among other objectives, the group seeks to coordinate the evaluation activities of the different Research Councils and share best practice. 
Within the Research Councils, there are different groups in charge of different evaluation tasks; for instance, in the Economic and Social Research Council (ESRC), ex post evaluation is organized by the Research Evaluation Committee, and in the Engineering and Physical Sciences Research Council a similar task is performed by the “Performance and Evaluation Team.” These departments are in charge of commissioning ex post evaluations of their organizations and have paid considerable attention to the development of evaluation research methodologies, which are generally based on building a detailed understanding of the processes through which impact takes place. For instance, a review of the economic impact assessments commissioned by the Research Councils yields dozens of publicly available reports that include methodological reflections or novel methodological developments (Luiz de Campos, 2010). As a consequence, a competitive evaluation marketplace has evolved with a number of consultancy companies and university groups and departments actively vying to provide evaluation services to the various Research Councils.

In short, research evaluation is the responsibility of organizations in charge of policy implementation and is carried out in a decentralized way, involving mainly academic peers in the case of project appraisals and often independent, paid consultants for the ex post impact assessments.

The uses

The main role of the U.K. research evaluation system is distributive, and there is also a secondary, but still important, improvement use. The RAE/REF exercises are very large evaluation activities the results of which determine the distribution of core research funding to U.K. universities. The ex ante assessment of research proposals also plays a strong distributive role. Significantly, a large share of these grants is used to fund personnel costs: Council-funded researchers contracted to carry out specific research projects are an important component of the U.K. academic system. An improvement purpose is also present in the ex ante proposal evaluation routines. Peer reviews of proposals tend to be detailed assessments and are always distributed to the researchers concerned. Although their primary role is distributive (to support decisions related to the allocation of funds), they are delivered in such a form that researchers can use the information contained in the assessments to derive lessons for future proposals, and therefore, to adapt their research strategies. Ex post evaluations to assess the impacts of Research Councils’ investments and the processes through which these impacts materialize play an improvement and distributive role: They seek to acquire information on impact processes and to use this information to inform the design of research support programs, and the results can be used in the process of arguing for future budgetary allocations.

Spain

The evaluands

The Spanish research system is characterized by the prominence of core funding channeled through the salaries paid to tenured academics working in public universities and several public research establishments, the most important being the Spanish Council for Scientific Research (Consejo Superior de Investigaciones Científicas [CSIC]), which employs some 2,200 full-time tenured researchers (the Spanish Council for Scientific Research is a research performance establishment while the U.K. Research Councils are funding organizations). It is common for early-career academics to orient their efforts toward obtaining a tenured position and the stability that comes with it. Within this structure, the Spanish research evaluation system revolves around the assessment of individuals. Formal procedures rule the progression of individuals through the different stages and accreditation levels that allow the academic to enter the public sector as a civil servant. Once they have obtained the coveted tenured position, the main tool for their evaluation is the “sexenio” process: All Spanish tenured academics can submit, every 6 years, evidence of the results of their research activity. A specialized agency (see below) is in charge of assessing whether the evidence (crucially a list of the five most relevant publications during the period) shows that the individual has been “research active”; if so, it awards a “sexenio,” an official confirmation of research activity that carries with it a modest permanent salary increase (Cruz-Castro & Sanz-Menéndez, 2008; Jiménez-Contreras, de Moya Anegón, & López-Cózar, 2003). Over time, the “sexenio” has started to be seen as a basic assessment of quality. Having research activity recognized by the award of a sexenio has become a de facto precondition for promotion and participation in selection committees. The “sexenio” system is today one of the main tools in the catalog of Spanish academic evaluation processes.

Project evaluation plays a much less important role than in the United Kingdom but still consumes sizable resources. Spanish funding of academic research projects revolves around the “National R&D&I Plan.” Since 1988, the National Plan has been the main instrument establishing the goals and priorities of Spain’s Research, Development and Innovation policy (Ministerio de Ciencia e Innovación, 2009). The Plan includes a National Program for Fundamental Research Projects, which is the main central government instrument to fund research projects submitted by academics. It funds a large number of projects submitted by academics from the university system, CSIC, and other groups. The projects cover mostly marginal costs and occasionally doctoral grants associated with research projects. This is a very important program in terms of the volume of projects submitted and funded, and it makes the project an important evaluand in the Spanish research evaluation system. However, the role and importance of the project is different from that in the United Kingdom. It is rare for projects within this program to fund the salaries of researchers other than doctoral students and therefore, unlike in the United Kingdom, research projects funded by public national sources are not an important source of employment for researchers. Projects are very small: An average 2- to 3-year project in the social sciences involving a sizable research team will usually receive a budget between US$26,000 and US$40,000, and the largest projects will receive no more than €155,000 (in comparison, similar projects funded by the U.K. ESRC will typically receive more than US$250,000, and the funding given to programs and centers will run into several million dollars).
The projects are, however, important in the research system because participating in and leading them is considered a necessary merit for job promotions, and they are the main source of research funding open to Spanish tenured academics to cover the marginal costs of research projects and attendance at conferences and meetings.

Performance evaluations at group level are sometimes organized by universities or research organizations and practices differ across institutions. These assessments are part of the internal management strategies developed at organizational level. Some universities, for instance, evaluate the research and teaching performance of their departments and research institutes using batteries of indicators. CSIC has also launched its own internal evaluation system addressing the performance and research strategies of its institutes. Every 4 years, the whole organization develops a new strategic plan informed by the individual plans drafted by all CSIC research centers and institutes.

There is also an annual evaluation exercise that takes as its subject the whole of the National Plan. The so-called Integral Monitoring and Evaluation System (SISE) has produced five SISE reports since 2005, mainly based on descriptive statistics: number of calls for tender versus planned calls, budgets planned and spent, publication dates of calls, number of proposals submitted and funded, distribution of projects and funds across ministries and regions, and so on. There are no evaluations of specific program outputs or outcomes; the National Plan sets very general targets for the whole of the Spanish innovation system, many of them referring also to inputs to the innovation system (percentage of R&D/gross domestic product, growth in firms’ R&D, etc.). These main indicators are monitored by SISE and if the country as a whole does not reach the levels set up by the National Plan as “targets,” this is interpreted as a failure of the Plan; however, the extent to which a set of policy interventions might be expected to affect nation-wide indicators, and how, is not analyzed.

Organization and performers

In contrast to the United Kingdom, the evaluation tasks described above are the responsibility of a set of specialized evaluation agencies and foundations. The individual evaluations that underpin the “sexenios” are carried out by the “National Commission for the Evaluation of Research Activity” (CNEAI). CNEAI is staffed by a rotating group of seconded academics appointed for fixed periods, therefore deploying a form of peer review.

At the project level, the planning and evaluation of the programs that constitute the National Plan is the responsibility of the Ministry of Economy and Competitiveness, which in 2011 absorbed the Ministry of Science and Innovation. The organization that the Ministry entrusts with the evaluation of proposals and the monitoring of projects is another agency: the National Evaluation and Prospective Agency (ANEP). ANEP reports to the Directorate General for Research and Management of the National Plan in the Ministry of Economy, organizes the peer-review evaluation of research proposals, and reviews the interim and final project reports. It provides similar evaluation services to other ministries, regional governments, foundations, and universities. ANEP is also led by seconded academics and has a lean core organization of some 190 seconded experts organized into 26 coordinating teams. They draw on ANEP’s database of more than 30,000 academics to deal with approximately 25,000 funding applications per year to National Plan programs and other initiatives launched by central and regional governments and private foundations. Like CNEAI, ANEP is an agency that organizes a peer-review system.

The autonomous evaluations carried out by universities and research organizations vary across institutions but are also dominated by forms of peer review. At CSIC, for instance, every institute presents its 4-year strategic plans that are then assessed by committees of foreign peer reviewers who can suggest changes to the objectives and targets defined. The process is organized internally.

The evaluation of the National Plan through the “Integral Monitoring and Evaluation System” (SISE) is carried out by the Spanish Foundation for Science and Technology (FECYT), an organization set up in 2001 and answerable to the Ministry of Economy. The Foundation is responsible for a variety of support tasks related with science and technology; it is not a specialized evaluation agency and draws its staff from both the private and the public sector.

The uses

The way the results yielded by the evaluations described above are used depends on the type of evaluation. The “sexenio” works mainly as a salary bonus scheme and it plays, therefore, a distributive role. The resources available to CNEAI to assess thousands of academics per year are very limited, and its decision cannot be used for purposes other than justifying the concession or denial of a “sexenio.” In practice, CNEAI’s decisions are driven by publication indicators, mainly the number of journal articles published in journals listed in the Web of Science and other academic publications, preferably foreign-language scholarly books and journals. There are thresholds that candidates must achieve, but assessment of the individual quality of the publications submitted is effectively precluded by lack of resources. Accordingly, the way decisions are communicated to candidates is extremely brief: A phrase justifying the decision is accompanied by a long pro forma paragraph explaining how, in the case of a nonaward, the academic can seek redress, and providing the legal basis for this process (specifically three legal articles from two different pieces of legislation, stemming from two different ministries). The way in which the decision is communicated reflects the administrative nature of the process, which, in practice, focuses many of its efforts on confirming the authenticity of the academics’ claims. In this way, the distributive role providing an incentive for academics is accompanied by a controlling function: The submissions are carefully audited and as most tenured academics will make at one point or another of their careers at least one submission and sometimes up to six, the sexenio plays an important role as a way of checking that the research that academics are assumed to be doing is, in fact, carried out.

Project evaluations, carried out mainly by ANEP, have as their goal the distribution of financial resources. The system here is also shaped by the large throughput of proposals and project assessments that need to be dealt with by an organization that manages on very limited resources compared to the size of the task it performs. The evaluations received by the applicants will vary across programs, but they tend to be cursory and, for some programs, reviewers’ comments are only forwarded to unsuccessful applicants. In practice, the review is not seen as a source of advice to researchers but as a piece of evidence to support an administrative decision. The interaction between applicants, the managing agency, and peer reviewers is minimized; researchers and reviewers do not engage in any exchange of views. The reviewing process provides very little information to applicants about how to improve their future project proposals, and therefore has few, if any, chances of being used to improve practice. Monitoring and ex post evaluation focus on controlling the use of resources and operate mainly as an audit mechanism.

Group evaluations carried out by universities and research organizations seem, in principle, moved by an improvement goal. Instead of adding to the already heavy burden of administrative compliance, the main objective of such organization-driven evaluations should be to guide the development of research strategies and policies at the organization level. Yet, a lot of effort is invested again in comprehensive data entry and in verifying researchers’ claims. At CSIC, for instance, the strategic plan formally pursues improvement purposes and adds to them a distributive goal. The strategic plans that are adopted after peer review set objectives down to the level of institutes and centers and establish a program for the allocation of resources (including, importantly, tenured positions) to each of them. In practice, however, the approved strategic plan does not represent a commitment from the participating partners (central CSIC administration, the management structure for the different research areas, and the research centers and institutes), does not determine the allocation of researcher positions, and there is no established mechanism to follow the implementation of its recommendations. In its implementation, the process has led to a limited set of annual quantitative targets (publications, funding raised, etc.) against which groups and institutes are measured each year; if deemed successful, their employees are awarded a small productivity bonus. Again, claims need to be verified, and what was designed as an evaluation strategy with a strong improvement function has developed into an auditing system adding to the control mechanisms already in place, and backed up by a weak resource distribution tool in the shape of an economic incentive.

At a higher level, the “Integral Monitoring and Evaluation System,” which as we have discussed generates an annual evaluation of the National Plan, describes its objectives as follows:

The Integral Monitoring and Evaluation System (SISE) is the tool designed for management control of all public programs in support of R&D&I, to improve transparency and the publicity given to all interventions, so that Spanish citizens are kept informed about the activities being supported by public funds (my translation of the Spanish version available at www.plannacionalidi.es/gestion/seguimiento.php).

This paragraph is clear about the main function allocated to the “system”: SISE is a monitoring tool to provide control of public actions. It is apparent therefore that the improvement function is neither an objective of current evaluation practice nor one of the goals for the development of an evaluation system. Although it is sometimes presented as a “Permanent Observatory of the Spanish Science-Technology-Society System,” SISE has its roots in a group set up to monitor the National Plan (the National Plan Monitoring Commission), and its activity focuses on continuing this monitoring role.

Discussion: Evaluation and Governance

The two cases presented above show the different evaluation practices that have emerged in the United Kingdom and Spain and how they are linked to the ways in which science is organized and governed. The systems reviewed are complex. The United Kingdom and Spain are large countries with complicated administrative structures implementing a variety of research policies and many institutional actors who can deploy different evaluation strategies. Nevertheless, comparison of the United Kingdom and Spanish research evaluation systems reveals that they have little in common in terms of their dominant practices and goals. Research groups and projects are the main evaluands in the U.K. system, while the Spanish system addresses all evaluands, but one of its most important tools focuses directly on the individual researcher. The responsibility for organizing large evaluation exercises in the United Kingdom tends to lie with the same funding agencies, while this role in Spain is transferred to specialized agencies, probably to provide a degree of independence that can legitimize their controlling functions. Spain’s processes rely mostly on peer review, while the United Kingdom deploys some modified peer-review approaches (with a small participation of experts from industry and government) and makes extensive use of contracted evaluation specialists. But perhaps the most important difference between the two systems lies in the use that is made of evaluation results. While the U.K. system extensively uses evaluation for distributive and improvement purposes, the Spanish system does not attend to the improvement function of evaluation, being, for instance, unconcerned with analyzing the processes through which outcomes arise. The Spanish research evaluation system has some weak distributive functions and has developed a thicket of administrative processes serving controlling purposes.

These differences can be linked to the research governance structures. The British dual funding system, where project funding plays a key role and core funding allocations are reviewed periodically, can be more easily shaped and steered. It is not uncommon for even senior academics in the United Kingdom to derive a part or the whole of their salaries from research project funding during extensive periods of their careers. The research project, then, is a crucial tool to sustain the individual activity of many academics and to define research priorities and activities. Since they must support salaries as well as all other research costs, U.K. research grants tend to be large in relation to the staff formally involved, particularly if compared to countries such as Spain where the majority of academic personnel formally participating in research projects is tenured and their costs are not charged to the project. Consequently, in the United Kingdom, research grants are an important determinant of research activity, research activity tends to be project-based, and large projects result in less fragmented portfolios of research activities in terms of a relatively (to the size of the budget) lower throughput of proposals and projects. In this context, the evaluation of both research proposals and final reports and ex post program evaluation need to serve an improvement as well as a distributive purpose. Detailed peer reports are important for the development of future projects, and the lessons from previous research initiatives identified through program evaluations can be translated into new priorities and funding mechanisms and can have a direct effect on the overall research strategy of the academic community. The importance of projects makes the system more flexible and responsive, and therefore the lessons derived from evaluation can be conveyed into practice.
The controlling purpose of evaluation is not significant: It is through the constant redistribution of resources that the system can ensure accountability.

The governance of academic research in Spain is very different. First, it is a system dominated by tenured appointments: Most established academics are tenured public employees, working either for their public universities or for public research establishments such as CSIC. There are some private Spanish universities but their relative importance is small except in some specific disciplines such as business administration. Access to academic tenured jobs is regulated, as in the case of all public servants, by standard bureaucratic controls administered centrally with the formal objective of ensuring fair and equal access conditions. Additional controls are established to implement a degree of accountability for the performance of the functions that are the responsibility of scientists/public employees. The controlling purpose becomes a cornerstone of the management system, including evaluation practice. In addition, the bureaucratic administration of research imposes further constraints on the ways in which academics can organize their activities. Hiring processes, in particular, are cumbersome and subject to numerous conditions. Consequently, the personnel structure in Spanish academia is very rigid. The system that emerges from these practices is difficult to adapt to the recommendations that could emerge from evaluations with an improvement purpose. Recommendations related to a perceived necessity to incorporate new skills or to change research priorities would be difficult to implement in practice, at least in the short term.

Further, within this structure, the role of the research project is different from its function in the U.K. case. Projects such as those funded by the National Plan and other academic research programs implemented by regional authorities and other organizations cover only “marginal costs.” Consequently, research projects tend to be relatively small in financial terms, although the size of projects will naturally vary across subjects. More importantly, because in many disciplines, the majority of research resources are financed through “core funding” streams, academics are unlikely to organize their activities on a project basis. This situation makes it more difficult to influence research agendas by changes to project-funding programs. Taken together, the relatively minor role of projects and the tenured status of research scientists combine to constitute a system that is not responsive to attempts to “fine tune” its management or to steer its priorities. The improvement and distributive purposes of evaluation cannot be as effective as in a more flexible governance structure; instead, more bureaucratic research management systems call for auditing and verification processes, and therefore for an evaluation strategy that emphasizes controlling purposes. In comparison, the distributive functions are relatively weak: The dominance of a stable core of tenured academics in the civil service has so far relegated the type of flexible funding schemes that can be affected by evaluation results to the role of incentive systems or to the funding of relatively small research projects covering only marginal costs.

This article is therefore consistent with the insights of analysts who more than a decade ago were suggesting that national approaches to evaluation reflect the local administrative culture and practices. More specifically, the main purposes of evaluation practice are influenced by the research governance structure. Research evaluation tools may become similar across countries (like, for instance, the widespread use of bibliometric indicators) but the way in which these techniques are applied and the purposes of evaluation practice will vary. Therefore, learning from evaluation practices applied in other countries requires much more than the straightforward adoption of evaluation tools and techniques; it calls for an understanding of the different contexts within which research evaluation practice will develop.

Further, the differences between the United Kingdom and Spain we have identified are so profound that they can lead to a different understanding of what “evaluation” means. The dominant notion of evaluation in the U.K. context is aligned to current program evaluation practices that emphasize its distributive and improvement functions. In Spain, evaluation is seen predominantly as part of a broader system to ensure “accountability” through controlling mechanisms. The increasingly common calls to “strengthen evaluation” are likely to be interpreted very differently by communities of research policy practitioners in different countries. There is considerable potential for confusion if we assume, as we often implicitly do, that we are all speaking the same evaluation language.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The author of this article is a full-time employee of the Spanish Council for Scientific Research (CSIC).

References

ab Iorwerth. (2005). Methods of evaluating university research around the world. Ottawa, ON: Department of Finance.

Anderson. (2002). Evaluation, policy learning and evidence-based policy making. Public Administration, 80, 1–22.

Arnold. (2004). Evaluating research and innovation policy: A systems world needs systems evaluations. Research Evaluation, 13, 3–17.

Barker. (2007). The UK research assessment exercise: The evolution of a national research evaluation system. Research Evaluation, 16, 3–12.

Bustelo. (2006). The potential role of standards and guidelines in the development of an evaluation culture in Spain. Evaluation, 12, 437–453.

Chelimsky. (2006). The purposes of evaluation in a democratic society. In I. F. Shaw, J. C. Greene, & M. M. Mark (Eds.), The SAGE handbook of evaluation (pp. 33–55). London, England: Thousand Oaks.

Chen, H.-T. (2005). Practical program evaluation. Assessing and improving planning, implementation and effectiveness. London, England: Thousand Oaks.

Coryn, C. L. S. (2007). Evaluation of researchers and their research: Toward making the implicit explicit. Kalamazoo: Western Michigan University.

Coryn, C. L. S., Hattie, J. A., Scriven, & Hartmann, D. J. (2007). Models and mechanisms for evaluating government-funded research: An international comparison. American Journal of Evaluation, 28, 437–457.

Cozzens, S. E. (2003). Frameworks for evaluating S&T policy in the United States. In Shapira & Kuhlmann (Eds.), Learning from science and technology policy evaluation. Experiences from the United States and Europe (pp. 54–64). Cheltenham, England: Edward Elgar.

Cozzens, S. E., & Turpin. (2000). Processes and mechanisms for evaluating and monitoring research outcomes from higher education: International comparisons. Research Evaluation, 9, 3–4.

Cruz-Castro & Sanz-Menéndez. (2008). Research evaluation in transition: Individual versus organisational assessment in Spain. In Whitley & Gläser (Eds.), The changing governance of the sciences. The advent of research evaluation systems (pp. 205–223). Dordrecht, The Netherlands: Springer.

Darvas. (1997). The political and economic context of research evaluation in Eastern Europe. In M. S. Frankel & Cave (Eds.), Evaluating science and scientists. An East-West dialogue on research evaluation in post-communist Europe (pp. 18–27). Budapest, Hungary: Central European University Press.

De Campos, A. L. (2010, October). Economic impact assessment within the research councils. Report to RCUK Strategy Unit and Performance Evaluation Group.

Féron & Crowley. (2003). From research policy to the governance of research? A theoretical framework and some empirical conclusions. Innovation, 16, 369–393.

Frankel, M. S., & Cave. (1997). Introduction. In M. S. Frankel & Cave (Eds.), Evaluating science and scientists. An East-West dialogue on research evaluation in post-communist Europe (pp. 1–6). Budapest, Hungary: Central European University Press.

Frederiksen, L. F., Hansson, & Wenneberg, S. B. (2003). The agora and the role of research evaluation. Evaluation, 9, 149–172.

Georghiou. (1995). Research evaluation in European national science and technology systems. Research Evaluation, 5, 3–10.

Geuna & Martin, B. R. (2003). University research evaluation and funding: An international comparison. Minerva, 41, 277–304.

Gibbons & Georghiou. (1987). Evaluation of research: A selection of current practices. Paris, France: OECD.

Jiménez-Contreras, de Moya Anegón, & López-Cózar, E. D. (2003). The evolution of research activity in Spain: The impact of the national commission for the evaluation of research activity (CNEAI). Research Policy, 32, 123–142.

Jordan. (2008). The governance of sustainable development: Taking stock and looking forwards. Environment and Planning D. Government and Policy, 26, 17–33.

Kuhlmann. (2003). Evaluation as a source of “strategic intelligence.” In Shapira & Kuhlmann (Eds.), Learning from science and technology policy evaluation. Experiences from the United States and Europe (pp. 352–375). Cheltenham, England: Edward Elgar.

Lascoumes. (1998). Pratiques et Modèles de l’Evaluation. In M.-C. Kessler, Lascoumes, Setbon, & J.-C. Thoenig (Eds.), Evaluation des Politiques Publiques (pp. 23–34). Paris, France: L’Harmattan.

Lebow & Spitzer. (2002). Accountability. Freedom and responsibility without control. San Francisco, CA: Berrett-Koehler.

Marjanovic, Hanney, & Wooding. (2009). A historical reflection on research evaluation studies, their current themes and challenges. Santa Monica, CA: RAND.

Martin. (1997). Factors affecting the acceptance of evaluation results. In M. S. Frankel & Cave (Eds.), Evaluating science and scientists. An East-West dialogue on research evaluation in post-communist Europe (pp. 28–45). Budapest, Hungary: Central European University Press.

Martin. (2011). The research excellence framework and the impact agenda: Are we creating a Frankenstein monster? Research Evaluation, 20, 247–254.

Martin & Whitley. (2010). The UK research assessment exercise: A case of regulatory capture? In Whitley, Gläser, & Engwall (Eds.), Reconfiguring knowledge production. Changing authority relationships in the sciences and their consequences for intellectual innovation (pp. 51–80). Oxford, England: Oxford University Press.

Ministerio de Ciencia e Innovación. (2009). Programa de Trabajo '09. Plan Nacional de I + D + I. Madrid: Fundación Española para la Ciencia y la Tecnología (FECYT).

Polanyi. (1962). The republic of science: Its political and economic theory. Minerva, 1, 54–74.

Rich, R. F. (1979). Translating evaluation into policy. London, England: Sage.

Rip. (2003). Societal challenges for R&D evaluation. In Shapira & Kuhlmann (Eds.), Learning from science and technology policy evaluation. Experiences from the United States and Europe (pp. 32–53). Cheltenham, England: Edward Elgar.

Sanz-Menéndez. (1995). Research actors and the state: Research evaluation and evaluation of science and technology policies in Spain. Research Evaluation, 5, 79–88.

Scriven. (1967). The methodology of evaluation. In R. W. Tyler, R. M. Gagné, & Scriven (Eds.), Perspectives of curriculum evaluation (Vol. 1, pp. 39–83). Chicago, IL: Rand McNally.

Scriven. (1991). Evaluation thesaurus (4th ed.). Newbury Park, CA: Sage.

Stufflebeam, D. L., & Shinkfield, A. J. (1985). Systematic evaluation. A self-instructional guide to theory and practice. Boston/Dordrecht/Lancaster: Kluwer-Nijhoff.

Teirlinck (Ed.). (2011). Optimizing the research and innovation policy mix: The practice and challenges of impact assessment in Europe. Brussels.

Toulemonde. (2000). Evaluation culture(s) in Europe: Differences and convergence between national practices. Retrieved March 7, 2008, from http://www.danskevalueringsselskab.dk/pdf/Toulemonde_paper.pdf

von Tunzelmann & Mbula, E. K. (2003). Changes in research assessment practices in other countries since 1999 (Report for the Joint Funding Bodies’ Review of Research Assessments). Brighton, England: SPRU.

Whitley & Gläser (Eds.). (2007). The changing governance of the sciences: The advent of research evaluation systems. Dordrecht, The Netherlands: Springer.