Abstract
Organizational evaluation capacity building has been a topic of increasing interest in recent years. However, the actual dimensions of evaluation capacity have not been clearly articulated through empirical research. This study sought to address this gap by identifying the key dimensions of evaluation capacity in Canadian federal government organizations. The methodology used, based on Leithwood and Montgomery’s Innovation Profile approach, featured semistructured interviews with evaluation experts and a validating exercise conducted in four government organizations. The framework developed as a result of the study identifies six main dimensions of evaluation capacity (human resources, organizational resources, evaluation planning and activities, evaluation literacy, organizational decision making, and learning benefits), each one broken down into further subdimensions. The evaluation capacity of organizations on each of these dimensions and subdimensions can be described using four levels: low, developing, intermediate, and exemplary. The study found that government organizations vary in terms of their capacity from one dimension to the next, and indeed, from one subdimension to the next.
Introduction
Interest in evaluation capacity building (ECB) has increased in recent years, following an initial treatment of the issue in a volume of New Directions for Evaluation published by Compton, Baizerman, and Stockdill in 2002. Much of this work has focused on ECB in organizations and there is a growing body of conceptual and empirical work on the topic (see, e.g., Cousins, Goh, Clark, & Lee, 2004; Preskill & Boyle, 2008a). Yet, although knowledge is advancing about building the capacity of organizations to do evaluation and, to a lesser extent, use evaluation, little attention has been directed toward defining organizational evaluation capacity itself. In this article, we develop and empirically validate a framework for organizational evaluation capacity and consider implications of the framework for ongoing research and practice.
Results-based management (RBM) is an important feature of a new public management government framework applied in service organizations around the world. Managing for results requires a comprehensive system of performance measurement and program evaluation to foster increased accountability in public organizations (Jorjani, 2008; Mayne, 2009). Despite RBM’s potential, in practice many challenges exist in its implementation. For example, in the Government of Canada, the responsibility for performance measurement is placed in the hands of program managers because of their substantive knowledge (Treasury Board Secretariat, 2010). However, program managers often have neither the appropriate expertise nor guidance to undertake complex performance measurement exercises. This results in a scarcity of high-quality performance measurement data. Similarly, in the United States, the passage of the Government Performance and Results Act (GPRA) in 1993 and the implementation of the Program Assessment Rating Tool (PART) in 2004 required federal agencies to focus on establishing quantifiable measures of progress and reporting on their success. Although promising, these initiatives have not fully achieved their objectives; studies show that even if they have resulted in an increased availability of performance information, questions remain as to the tool’s use for budgetary allocation and program decision making (Mark & Pfeiffer, 2011; Mathison, 2011). More recent initiatives, such as the Performance Improvement Council (PIC), aim at making the PART process more transparent and incorporating input from various sources. These new initiatives further recognize the need to increase the capacity of organizations and individuals to use data to make fundamental program decisions (Mark & Pfeiffer, 2011). 
Other countries have also moved in the direction of increasingly sophisticated performance measurement or centralized national evaluation functions, but have not necessarily been successful at integrating performance data and evaluation findings into budgetary allocation processes (see, e.g., Talbot’s presentation of the United Kingdom’s performance and evaluation system, 2010, and a discussion of the Spanish context by Feinstein & Zapico-Goni, 2010).
Aside from budgetary allocations and ongoing program administration, one of the main uses of performance measurement data in RBM systems is for periodic evaluation studies. Authentic engagement with evaluation, however, may be easier said than done. In Canada, for example, given increased requirements for evaluation coverage (as per the Treasury Board’s Policy on Evaluation, 2009) and a relatively conservative level of resources allocated to the evaluation function, departmental evaluators must use available data whenever possible to increase their efficiency. The implementation of ECB initiatives in this and other federal government contexts, therefore, offers a potential bridge between the technical expertise required to conduct evaluative activities and the substantive knowledge of program managers and staff.
ECB refers to the changes undertaken by organizations to integrate evaluation practice and use at all levels (Boyle, Lemaire & Rist, 1999; Cousins et al., 2004; Sanders, 2002; Stockdill, Baizerman, & Compton, 2002). One of the most commonly used definitions of ECB is provided by Stockdill and her colleagues (2002): … a context-dependent, intentional action system of guided processes and practices for bringing about and sustaining a state of affairs in which quality program evaluation and its appropriate uses are ordinary and ongoing practices within and/or between one or more organizations/programs/sites. (p. 8)
Combined with the pressures described above, growing concerns about evaluator recruitment and training in the federal community have made ECB an issue of interest in recent years (Mayne, 2009; Preskill & Boyle, 2008a, 2008b). This is also true of other jurisdictions; for example, Compton and MacDonald (2008) propose ECB as a strategy to strengthen evaluation services and program effectiveness in the face of fluctuating program funding.
In their comprehensive review of the literature on the integration of evaluation into organizational culture, Cousins and his colleagues (2004) identify two types of ECB: direct ECB, which involves planned ECB activities that occur either within or outside of actual evaluation projects (e.g., training on statistical data analysis), and indirect ECB, which results from involvement of stakeholders in processes that produce evaluation knowledge. In essence, indirect ECB is akin to participatory evaluation, that is, evaluations that are conducted in partnership between those trained in evaluation logic and methods and members of the program or stakeholder organization community (Cousins & Chouinard, 2012). However, these ECB processes differ from participatory evaluation approaches in two ways: They are typically integrated into the organization’s practices and they are ongoing rather than episodic or event-driven (Preskill & Torres, 1999; Rowe & Jacobs, 1998; Stockdill et al., 2002).
ECB processes have been linked to two consequences for organizations: evaluation use and organizational learning (Cousins et al., 2004). Evaluation becomes better understood and more useful in organizations that implement intentional ECB strategies. In this way, ECB initiatives foster the development of a culture of systematic self-assessment and reflection (Cousins et al., 2004) that, in turn, can lead to increased organizational learning, referred to as “the vehicle for utilizing past experiences, adapting to environmental changes and enabling future options” (Berends, Boersma, & Weggeman, 2003, p. 1036). Thus, ECB represents one of the ways through which individual-level learning may be transferred to the organizational level (Berends et al., 2003; Popper & Lipshitz, 2000) and sheds light on how organizations can move beyond single-loop (or incremental) learning into double-loop learning (Argyris & Schön, 1978).
Organizational Factors Contributing to the Success of ECB
A number of factors or conditions leading to successful ECB in organizations have been identified in recent years. In order to clarify and organize these factors, we have classified them into the four categories outlined below.
External environment. External accountability requirements often create a demand for evaluation results and so act as a motivator for developing evaluation capacity (Gibbs, Napp, Jolly, Westover, & Uhl, 2002; Katz, Sutherland, & Earl, 2002; Mackay, 2002; Stockdill et al., 2002; Sutherland, 2004; Toulemonde, 1999).
Organizational structure. The systems and staffing structures of organizations mediate organizational members’ ability to interact, collaborate, and communicate with each other (Preskill & Torres, 2000). Successful ECB depends on the flexibility of organizational roles, since individuals must be able to step away from their main responsibilities to participate in evaluation activities (Torres & Preskill, 2001).
Organizational culture. The culture of an organization reflects the traditions, values, and basic assumptions that are shared by its members and that establish its behavioral norms. The culture of an organization involved in ECB must encourage questioning of organizational processes and experimenting with new approaches (Goh, 2003; Preskill & Torres, 1999; Rowe & Jacobs, 1998; Torres & Preskill, 2001; Toulemonde, 1999).
Organizational leadership. Managerial support is necessary to the implementation and sustainability of evaluation capacity within an organization (Cousins et al., 2004; Goh, 2003; Goh & Richards, 1997; King, 2002; Milstein, Chapel, Wetterhall, & Cotton, 2002; Owen & Lambert, 1995).
Although there is general support for these categories in the literature, a stronger empirical basis is warranted.
State of Research on ECB
As we have shown, the factors likely to influence the success of ECB in an organization, as well as its ultimate consequences, have been identified in the theoretical evaluation literature. In addition to the anecdotal reports of ECB that have been published (see, e.g., Diaz-Puente, Yague, & Afonso, 2008; Garcia-Iriarte, Suarez-Balcazar, Taylor-Ritzler, & Luna, 2011; Lawrenz, Thomas, Huffman, & Covington Clarkson, 2008; Taut, 2007; Volkov, 2008), work has been done to identify the stages through which organizations move as they develop their evaluation capacity (Bourgeois & Cousins, 2008), and how ECB might best be conceptualized (Huffman, Thomas, & Lawrenz, 2008; Preskill & Boyle, 2008a; Taylor-Powell & Boyd, 2008). However, few empirical studies have focused on how evaluation capacity is manifested in organizations and how it can be assessed (one recent example is found in Nielsen, Lemire, & Skov, 2011). Such information would advance our knowledge and provide a backdrop for further work. Thus, in this article we attempt to identify the key dimensions of evaluation capacity in organizations, operationalized through a framework based on the Innovation Profile approach developed by Leithwood and Montgomery (1987). From a practical perspective, this framework offers organizations a model that their members can use to reflect on their capacity development activities. The framework can also be used as the basis for the development of an instrument focusing on organizational self-assessment of evaluation capacity. Accordingly, we addressed the following research questions in the current study: (a) What are the essential dimensions of evaluation capacity in Canadian federal government organizations? (b) How are minimal and exemplary performance on each of these dimensions characterized? (c) What are the steps required to move from minimal to exemplary performance?
Method
Data collection encompassed three phases, reflecting an adaptation of the innovation profile approach (Leithwood & Montgomery, 1987). Conceptually, this approach—which was developed in the education sector within the context of implementing planned changes in classroom practices—focuses on growth defined by observable change from a current state of practice toward an ideal state. The process involves identifying concrete behavioral manifestations of the current state and building a series of manageable steps for multiple dimensions of the desired innovation. These steps should be challenging enough to represent observable change from the previous state, but be feasible in order to enable step attainment or success in moving from one step to the next (Leithwood & Montgomery, 1987). The descriptions developed for each behavioral change are generally based on a qualitative data collection process. Application of the innovation profile approach thus results in a multidimensional matrix describing growth in performance or, in the case of this study, evaluation capacity development in organizations.
The innovation profile strategy was used by Cousins, Aubry, Smith-Fowler, and Smith (2004) as an alternative approach to process evaluation in their study of mental health case management (Cousins et al. refer to the approach as key component profiles). We argue that it is well suited to the study of organizational evaluation capacity because of its focus on the incremental steps required to move from low to high capacity and its flexibility, defined in terms of the inclusion of varying numbers of levels across dimensions as well as its accommodation of a wide array of dimensions (and subdimensions). The three phases undertaken as part of the current study are summarized below.
Phase 1: Identification of Key Dimensions of Evaluation Capacity (Divergent Phase)
The first phase focused on identifying the key dimensions of evaluation capacity through an in-depth literature review and a series of expert interviews. An important aspect of the literature review involved moving beyond descriptions of capacity building initiatives undertaken in various organizations to definitions and features of evaluation capacity itself.
Once the literature review was completed, we conducted semistructured interviews with expert informants who have a broad view of evaluation in the Canadian federal government. We recruited four individuals for the first phase of the study; two were external consultants who have worked with several departments and agencies on evaluation studies and two were former or current senior officials of a central agency of the government of Canada who have worked on interdepartmental evaluation issues and are familiar with the challenges faced by different departments and agencies as they develop their evaluation capacity. Their point of view, as insiders of the federal evaluation community but outsiders with respect to the evaluation function of specific departments and agencies, informs their overall vision of how evaluation capacity appears in various organizations. The purpose of these interviews was to obtain these experts’ definitions of evaluation capacity as well as to solicit their views on behavioral manifestations of capacity.
In our content analysis of the literature review and interview data, potential dimensions and markers of evaluation capacity were used to identify the main categories for coding purposes. We summarized the results of this analysis in a draft framework of evaluation capacity.
Phase 2: Review and Feedback on Draft Framework (Convergent Phase)
The second phase of data collection focused on confirming the key dimensions of evaluation capacity derived from Phase 1. We once again used key informant interviews with the four experts consulted in the first phase of the study. We asked participants to review the draft framework and provide feedback on its clarity and contents. Based on this review, we could confirm existing dimensions and subdimensions or identify challenges that warranted changes to the framework.
Phase 3: Triangulation of Findings Included in the Framework
The third phase was a validation exercise undertaken to finalize the draft evaluation capacity framework. It focused on key informant interviews with evaluators and decision makers from four federal government departments and agencies. The participating organizations were selected on the advice of the experts consulted previously and were chosen to ensure varying levels of evaluation capacity as assessed by the experts. The representatives were asked to implement the framework in their own settings and provide feedback on its utility in terms of organizational reflection and improvement. We contacted three individuals in each organization: the Head of Evaluation, a senior evaluator, and a decision maker. In total, we conducted 11 interviews in this phase of the study, because a suitable evaluation user could not be found in one of the organizations.
As with the previous interviews, we used a qualitative content analysis to identify trends in the data. Because of the increased complexity associated with the use of four different organizations and three different organizational roles, data coding and analysis were more detailed than in the first two phases and took these types of variables into account. First, the data were aggregated by organizational role; this analysis enabled us to validate and further refine the categories of evaluation capacity included in the draft framework. Second, data were aggregated and analyzed by organization; the findings from this analysis have been reported elsewhere (Bourgeois & Cousins, 2008).
Results
The final version of the framework, presented in Tables 1–6, provides a summary of our key findings. A more detailed description of these results follows.
Structure of Framework
The framework presents the dimensions of evaluation capacity as identified in Canadian federal government organizations. Several structural elements were utilized to ensure clarity and consistency. Six main dimensions emerged from the three data collection phases, which we divided into two broad categories: “capacity to do” evaluation and “capacity to use” evaluation. Most participants focused on the “capacity to do” category, likely because the dimensions included here are easier to control and speak to the more operational facets of evaluation. Each dimension is further organized into a number of subdimensions; again, these were based on interview data and focus on more specific descriptions of the dimension. The final components of the framework distinguish the differing levels of evaluation capacity: “low capacity,” “developing capacity,” “intermediate capacity,” and “exemplary capacity.”
The first main dimension (see Table 1), Human Resources, addresses the composition of the evaluation unit itself and is divided into five subdimensions. The first subdimension, Staffing, refers to the balance of evaluation positions within the organization and whether these are sufficient to manage the workload identified in the evaluation plan. It also includes career progression for evaluators, which deals with employee retention, and succession planning, two issues crucial to capacity building and maintenance. The second and third subdimensions focus on the technical and interpersonal skills required of evaluators. Skills related to the identification of evaluation issues, the use of appropriate data collection methods, the generation of evidence-based recommendations, and project management are part of the technical abilities required of evaluators. “Softer” skills such as building client trust, communicating evaluation messages in a clear and transparent way, and meeting program stakeholders’ informational needs are part of the communications and interpersonal skills used by evaluators. The fourth subdimension involves professional development and includes elements related to both internal and external professional development activities, as well as the development of learning plans for evaluation staff members and ongoing assessments of the skill set that exists within the evaluation unit. Finally, the fifth subdimension refers to the quality of the leadership within the evaluation unit. Good leaders should have both evaluation and management experience, be able to translate the information needs of senior managers into concrete project plans, and act as mentors or coaches for team members.
Table 1. Capacity to Do Evaluation, Dimension 1: Human Resources.
Participants, especially those directly involved with evaluation, focused heavily on the Human Resources dimension during the interviews. This observation suggests that they see the essence of evaluation capacity as residing mainly in human resources, rather than holding a more balanced view that spans all six dimensions.
The second dimension (Table 2) is Organizational Resources. Three subdimensions are included: budget, ongoing data collection, and organizational infrastructure. Budget refers to the stability of the evaluation budget and whether it provides sufficient funding to complete the activities outlined in the evaluation plan. Ongoing Data Collection speaks to the performance measurement systems that are in place within the organization and that produce information that is fed into evaluation studies. Organizational Infrastructure is the stability of the governance structure, the existence of organizational evaluation policies, and the organizational supports that help or hinder the work of evaluators, such as procurement services.
Table 2. Capacity to Do Evaluation, Dimension 2: Organizational Resources.
Note. RPP = Report on Plans and Priorities.
The third dimension (Table 3) focuses on the activities undertaken by evaluators as part of their regular duties. The development of an organization-wide evaluation plan is key among the subdimensions that make up this section. It is characterized by the development of an evaluation plan in consultation with other stakeholders, the inclusion of a risk assessment process in the identification of evaluation priorities, ongoing intelligence gathering, and a systematic review of the evaluation unit itself. Evaluators in most departments use consultants to some extent, so the use of consultants was included as a subdimension. Information sharing within the unit was included here as well, since evaluation staff members spend a considerable amount of time sharing with their colleagues information related to their progress on certain files or on general project management issues. Evaluators in some organizations also establish linkages with external supports such as professional associations, program stakeholders, and other organizations likely to provide assistance, such as the Treasury Board Secretariat. In addition, evaluation staff may establish linkages within their own organizations through formal or informal ties in order to remain informed regarding policy decisions likely to affect their work and to better share the results of evaluations conducted by members of the unit.
Table 3. Capacity to Do Evaluation, Dimension 3: Evaluation Planning and Activities.
The fourth dimension is the first one included under the overarching “capacity to use” evaluation category and reflects a less operational perspective (see Table 4). It focuses on Evaluation Literacy within the organization and is divided into two subdimensions: involvement in evaluation and results-management orientation. Involvement in evaluation refers to the participation of program staff and other stakeholders in the evaluation process. Participatory evaluation theory holds that the greater the involvement of stakeholders in all phases of an evaluation, the greater the instrumental, conceptual, and process use of evaluation (Cousins & Chouinard, 2012). Therefore, in order to build evaluation capacity, organizations must pay attention to the involvement of staff members in the evaluation process. Results-management orientation refers to the larger organizational culture and the messages that are brought forward by senior managers. A results-management orientation can be manifested through the development of results chains for programs and the implementation of performance measurement strategies.
Table 4. Capacity to Use Evaluation, Dimension 4: Evaluation Literacy.
The fifth dimension (Table 5) focuses on the integration of evaluation information with organizational decision-making processes. At the outset, management processes such as the development of Memoranda to Cabinet (MC) and Treasury Board (TB) submissions should consider evaluation in order to ensure that sufficient resources are provided for the eventual evaluation of new initiatives. At the final stage of the evaluation process, the findings and recommendations made in an evaluation study should be clearly linked to budget allocation and other high-level organizational and policy decisions. An organization with exemplary capacity searches out evaluation information in its decision-making process and relies on this information on an ongoing basis.
Table 5. Capacity to Use Evaluation, Dimension 5: Integration With Organizational Decision Making.
Finally, the sixth dimension, Learning Benefits, addresses the types of uses that can be made of evaluation information within an organization (see Table 6). At a more operational level, the evaluation findings can be used as a basis for action and change through the implementation of evaluation recommendations (instrumental use). The evaluation findings can also have an impact on stakeholders’ understanding of, and attitudes toward, a program by clarifying certain operational aspects or by highlighting specific program results (conceptual use). At a broader level, participation of organizational members in the evaluation process can result in behavioral or cognitive changes within these individuals based on their exposure to evaluation (process use).
Table 6. Capacity to Use Evaluation, Dimension 6: Learning Benefits.
Organizational Variation
The specific elements included in each level of evaluation capacity (i.e., the bullets within each cell in the matrix) varied somewhat over the course of the development of the framework. Elements were added as necessary to increase the clarity of the description and to differentiate between levels. It is probable that there would be within-organization variation in the profile of any given organization. The purpose of the framework is to describe organizational evaluation capacity and to provide organizations with a means of generating information that can be used to identify the particular elements that require improvement in order to reach desired levels of evaluation capacity. Therefore, variation that may be observed within an organization between its levels of evaluation capacity on different subdimensions is to be expected, and may facilitate discussion of next steps for the organization in terms of developing its capacity.
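The within-organization variation described above can be made concrete with a small sketch. The Python example below is an illustration only, not part of the study's method: it represents an organizational profile as a mapping from (dimension, subdimension) pairs to one of the four capacity levels, then lists the subdimensions that fall below a chosen target level. The ratings are invented for illustration and are not drawn from the study's data; the subdimension labels paraphrase the descriptions given in the text, and the names `Level`, `profile`, `below_target`, and `levels_by_dimension` are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    """The four capacity levels named in the framework, coded as ordered ranks."""
    LOW = 1
    DEVELOPING = 2
    INTERMEDIATE = 3
    EXEMPLARY = 4


# Hypothetical ratings for one organization (invented for illustration).
# Keys are (dimension, subdimension) pairs; only a few subdimensions are shown.
profile = {
    ("Human Resources", "Staffing"): Level.INTERMEDIATE,
    ("Human Resources", "Professional development"): Level.DEVELOPING,
    ("Organizational Resources", "Budget"): Level.EXEMPLARY,
    ("Organizational Resources", "Ongoing data collection"): Level.LOW,
    ("Evaluation Literacy", "Involvement in evaluation"): Level.DEVELOPING,
    ("Learning Benefits", "Process use"): Level.LOW,
}


def below_target(ratings, target):
    """Return the subdimensions rated below `target`, lowest level first."""
    return sorted(
        (key for key, level in ratings.items() if level < target),
        key=lambda key: (ratings[key], key),
    )


def levels_by_dimension(ratings):
    """Group the levels observed within each dimension to expose variation."""
    grouped = {}
    for (dimension, _subdimension), level in ratings.items():
        grouped.setdefault(dimension, set()).add(level)
    return grouped


if __name__ == "__main__":
    for dimension, subdimension in below_target(profile, Level.INTERMEDIATE):
        level = profile[(dimension, subdimension)]
        print(f"{dimension} / {subdimension}: {level.name.lower()}")
```

In this invented profile, a single dimension (Organizational Resources) contains both an exemplary and a low subdimension, which is the kind of uneven profile the framework is meant to surface as a starting point for discussion.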
Validation Exercise
In the third phase of the study, four different federal government departments were asked to assess their organization based on the dimensions and subdimensions developed in the first two phases. The purpose of this exercise was twofold: First, it helped us identify missing elements and verify the clarity of the wording used; second, it enabled us to test the framework as a complete organizational self-assessment of evaluation capacity (as reported in Bourgeois & Cousins, 2008). Overall, data obtained in this phase of the study validated the framework: The capacity levels of the participating organizations that had been identified by the experts consulted in the first phase of the study were consistent with the results produced through the application of the framework. Further, participants felt that the framework enabled them to document specific resource requirements based on their vision of evaluation in their respective organizations, and provided them with a guide for measuring the success of their ECB activities. Participants expressed an interest in obtaining a final version of the framework for use in their organizations, and stated that a self-assessment tool based on a more quantitative measure of organizational evaluation capacity could be useful. A longer term recommendation for both research and practice, therefore, is the transformation of the framework into an instrument for assessing such capacity. This work is currently underway. Broader methodological issues to be addressed include the instrument’s reliability, as well as the weightings of subdimensions based on their importance to the organization, as was done by Cousins, Aubry, Smith-Fowler, and Smith (2004).
This last element is important because the structure of the framework assumes that the dimensions and subdimensions are equally weighted. In practice, this may not be true. One can imagine, for example, an organization at an early stage of evaluation capacity development being more interested in focusing on the capacity to do evaluation rather than the capacity to use it. Once evaluation systems and functions are developed, implemented, and to some preliminary degree, institutionalized, we might expect more pronounced interest in improving organizational capacity to use evaluation.
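The effect of relaxing the equal-weighting assumption can be illustrated with a brief sketch. The Python example below rests on stated assumptions and is not the instrument under development: it assumes the four levels can be coded 1 to 4 and averaged, which treats an ordinal scale as numeric, and it uses invented ratings and weights. The names `LEVEL_SCORES` and `weighted_capacity` are hypothetical.

```python
# Assumption: the four ordinal levels are coded 1-4 so they can be averaged.
LEVEL_SCORES = {"low": 1, "developing": 2, "intermediate": 3, "exemplary": 4}


def weighted_capacity(ratings, weights=None):
    """Weighted mean of level scores across subdimensions.

    `ratings` maps subdimension name -> level name.
    `weights` maps subdimension name -> non-negative weight; any
    subdimension without an entry gets weight 1.0 (equal weighting).
    """
    weights = weights or {}
    total = 0.0
    total_weight = 0.0
    for subdimension, level in ratings.items():
        weight = weights.get(subdimension, 1.0)
        total += weight * LEVEL_SCORES[level]
        total_weight += weight
    if total_weight == 0:
        raise ValueError("at least one subdimension needs a positive weight")
    return total / total_weight


# Invented ratings: two "capacity to do" and two "capacity to use" subdimensions.
ratings = {
    "Staffing": "intermediate",
    "Budget": "developing",
    "Instrumental use": "low",
    "Process use": "low",
}

# Hypothetical weights for an organization that, at an early stage,
# places more importance on the capacity to do evaluation.
early_stage_weights = {
    "Staffing": 2.0,
    "Budget": 2.0,
    "Instrumental use": 0.5,
    "Process use": 0.5,
}

if __name__ == "__main__":
    print(weighted_capacity(ratings))
    print(weighted_capacity(ratings, early_stage_weights))
```

With these invented values, the same ratings yield a higher summary score once the "capacity to do" subdimensions are weighted more heavily. This shows why the choice of weights would need to be made explicit in any quantitative instrument.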
Conclusion
Although much has been published on ECB, the actual characteristics and attributes of evaluation capacity itself have rarely been defined and described based on empirical data. This study concluded that evaluation capacity in Canadian federal government departments and agencies can be described functionally and operationally through six main dimensions that reflect an organization’s ability to do evaluation and use evaluation: human resources, organizational resources, evaluation planning and activities, evaluation literacy, organizational decision making, and learning benefits. Each of these dimensions was broken down into a number of subdimensions, with evaluation capacity being assessed using four levels: low, developing, intermediate, and exemplary. Although the Leithwood and Montgomery (1987) approach permits variation across dimensions in terms of the number of levels, interview respondents felt that a common structure across all dimensions would provide a clearer picture of evaluation capacity and make the resulting framework more useful. The number of subdimensions varies from one dimension to the next, in an attempt to develop a comprehensive framework of evaluation capacity.
The study yields important clues as to what a theory of change of evaluation capacity might look like, by suggesting that organizational development in this domain does not occur in linear fashion across a series of elements or dimensions. In addition, the framework enhances our understanding of the potential impacts of targeted organizational improvement initiatives by showing the steps required to move between levels of capacity. These lessons extend well beyond a discussion of organizational evaluation capacity.
Continuing research may focus on expanding the scope of the framework to other types of organizations or government organizations in different jurisdictions and contexts. It seems likely that the dimensions and subdimensions identified here would generalize well, given the commonalities in application of measurement and evaluation systems in governance frameworks that embrace RBM and new public management. It would be instructive to examine the applicability of the framework to the voluntary sector. Preliminary findings from other research on evaluation capacity suggest that governmental and nongovernmental (voluntary sector) organizations differ significantly in their capacity to conduct and use evaluation. Despite higher ratings of capacity to do evaluation in government settings, the capacity to use it was seen as lower than in the voluntary sector (Cousins, Goh, Elliott, & Aubry, 2008). This finding may be at least partly attributable to the fact that many voluntary organizations, due to their smaller scale, would directly assign managers and decision makers to evaluation roles, rather than having a self-standing evaluation unit or function. One can imagine process use being higher in such instances, since evaluation would be more integrated into the organizational decision-making function. In any case, additional research is required to determine the applicability and relevance of the framework across organizational sectors.
The context within which this study was undertaken poses certain limitations to the interpretation of its findings. The focus on Canadian federal government organizations, in particular, generated findings that are applicable to these organizations but may not be appropriate in other contexts. Further, the small number of participating organizations has resulted in some data loss, especially in the case of the low capacity organization, in which a suitable evaluation user could not be found who might offer a balancing perspective to the assessment of the Head of Evaluation and senior evaluator.
As discussed previously, the major practical implication of this study is the potential transformation of the proposed framework into an instrument for assessing evaluation capacity in government organizations. Such an instrument could serve as a valuable self-reflection tool within organizations, generating serious discussion and debate about evaluation capacity, and optimal strategies for improving it. As is the case with innovation profiles, the use of such a tool would best be restricted to formative, developmental challenges within the organization, as opposed to more summative, accountability-oriented demands. Ongoing research on the use of such a tool and its associated benefits and drawbacks would further knowledge development in this area, and represents another valuable avenue to pursue.
Acknowledgments
The authors would like to thank Eleanor Toews for her support in updating the literature review.
Authors’ Notes
The opinions expressed in this article are those of the authors and do not reflect the views of the Government of Canada. An earlier version of this article was presented at the American Evaluation Association’s Annual Conference 2008 in Denver, Colorado.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
