Abstract
Researchers have conducted numerous empirical studies on evaluation capacity (EC) and evaluation capacity building (ECB) in Western cultural settings. However, little is known about these practices in non-Western contexts. To that end, this study identified the major dimensions of EC and feasible ECB approaches in Taiwanese elementary and junior high schools. Using a Delphi technique with 23 experts, the research sought consensus on the components of EC organized in three categories (evaluation culture, evaluation infrastructure, and human resources) and on approaches to building it in Taiwanese schools. The study also identified school-driven and government-driven approaches to ECB in this context. Although the findings support the major dimensions and approaches identified in the Western literature, unique differences emerged in the Taiwanese context. The article concludes with implications for theory and practice.
Evaluation capacity building (ECB), the intentional work to increase an organization's ability to conduct and use evaluation, has recently attracted a great deal of attention in the field of evaluation. The accountability movement has placed demands on public organizations, including schools, to prove their effectiveness, and program evaluation is one effective approach to improving educational programs and responding to accountability demands (Nevo, 2009). ECB attempts to address the problems inherent in doing and using evaluation in organizations (King & Volkov, 2005). Clinton (2014) proposes that willingness to engage in evaluation and to use its findings increases the likelihood of achieving program outcomes and sustainability, making ECB a worthwhile endeavor. How to build evaluation capacity (EC) has become the focus of a growing body of empirical research. Scholars agree that more research is needed (Leviton, 2014; Naccarella et al., 2007; Nielsen, Lemire, & Skov, 2011; Preskill, 2014), and one obvious gap in the literature is the lack of empirical studies that explicitly examine ECB in non-Western settings.
Although they are related, the concept of EC differs from that of ECB: increased EC is an outcome of ECB. Milstein and Cotton (2000) proposed a conceptual framework of EC, and several empirical studies of EC have been conducted in American, Canadian, and Danish organizations (e.g., Bourgeois & Cousins, 2013; Cousins et al., 2008; General Accounting Office [GAO], 2003; Nielsen et al., 2011; Taylor-Ritzler, Suarez-Balcazar, Garcia-Iriarte, Henry, & Balcazar, 2013). Additionally, several studies of EC or ECB have been conducted in schools or districts in countries including the United States and Israel (e.g., Huffman, Thomas, & Lawrenz, 2008; King, 2002; Rosenstein & Englert, 2008; Trevisan, 2002). However, ECB is a complex and contextual process (Cousins & Bourgeois, 2014; Cousins, Goh, Elliott, Aubry, & Gilbert, 2014; Naccarella et al., 2007; Stockdill, Baizerman, & Compton, 2002), and little is known about the forms of EC and the approaches to ECB that may prove valid in other, more diverse contexts. The field could benefit from research that validates current frameworks of EC and provides guidance for ECB in settings around the world (Rosenstein & Englert, 2008; Suarez-Balcazar & Taylor-Ritzler, 2014).
To address this gap, this study sought to identify the major dimensions of EC in an Asian setting, namely Taiwanese elementary and junior high schools, and feasible approaches to building such capacity in these schools. Using a Delphi technique, this study addressed two research questions: (a) What are the major dimensions of EC in Taiwanese elementary and junior high schools? (b) What approaches to ECB are feasible in these schools?
This article begins with a brief literature review that distinguishes between EC and ECB, followed by a description of the research context and methods. The results of the three-round Delphi study are then presented, followed by a discussion of the results and conclusions.
EC and ECB
As noted, the outcome of ECB is EC, which makes a thorough analysis of EC important for studying ECB. Naccarella et al. (2007) and Nielsen et al. (2011) argue that, to date, many studies have either blurred or ignored this distinction between EC and ECB. The research reported here paid systematic attention to the difference. The following analysis covers studies published before 2015 that explicitly focused on the dimensions of EC as well as on the ECB process.
EC
Frameworks for EC have been proposed at different levels, including the individual (e.g., Tseng, 2011), program (e.g., Martin & Carey, 2012), organizational (e.g., Nielsen et al., 2011), and societal (e.g., Boyle, Lemaire, & Rist, 1999) levels. Naccarella et al. (2007) note that differing views about what is being built (EC) will inevitably result in varying conceptualizations of ECB. Definitions of EC nonetheless share a common feature: they encompass both the capacity to conduct evaluation and the capacity to use it. The capacity to use evaluation usually includes using findings as well as the teaching and learning that occur when people are involved in evaluation, that is, process use (Bourgeois & Cousins, 2013; King & Volkov, 2005). Put another way, some conceptualize EC in terms of the supply side and others in terms of the demand side (Boyle et al., 1999; McDonald, Rogers, & Kefford, 2003), which implies a necessary balance between being able to do and to use evaluation (Cousins et al., 2008; Cousins, Goh, Elliott, & Bourgeois, 2014).
Numerous studies have been conducted on ECB; however, most of them pay little attention to the concept of EC. Several describe the objectives or outcomes of ECB in terms of individual evaluation knowledge, skills, and attitudes and/or sustainable evaluation practices (Bourgeois, Chouinard, & Cousins, 2008; King & Volkov, 2005; Labin, Duffy, Meyers, Wandersman, & Lesesne, 2012; Mackay, 2002; Preskill & Boyle, 2008; Suarez-Balcazar & Taylor-Ritzler, 2014). These viewpoints offer some insight into the analysis of EC.
Only a few studies have specifically characterized the dimensions of EC, and all of them treat EC as a multidimensional construct. Milstein and Cotton (2000), for example, conceptualized EC as five interrelated dimensions: motivational forces, organizational environment, workforce and professional development, resources and support, and learning from experience. By contrast, and based on data from American federal agencies, the GAO (2003) suggested four dimensions of EC: evaluation culture, data quality, analytic expertise, and collaborative partnerships. Taylor-Ritzler et al. (2013) developed a validated EC model for nonprofit organizations in the U.S. context. In this model, EC had three dimensions: EC outcomes, individual factors, and organizational factors, the last of which mediate the relationship between individual factors and EC outcomes.
In addition, Nielsen et al. (2011) developed an evidence-based EC model for Danish local administration with four dimensions: objectives, structures and processes, technology, and human capital. In the Canadian context, Cousins et al. (2008) proposed a conceptual framework distinguishing the capacity to do evaluation from the capacity to use it. The former included formal requirements, tools/resources, and support for evaluation practices; the latter included use of findings and use of evaluation processes. Bourgeois and Cousins (2013) later expanded the capacity to do and to use evaluation into six dimensions: human resources, organizational resources, and evaluation planning and activities (capacity to do), and evaluation literacy, integration with organizational decision-making, and learning benefits (capacity to use). Several empirical studies have been based on these two frameworks (e.g., Bourgeois, Whynot, & Thériault, 2015; Cousins, Bourgeois, & Associates, 2014; Cousins, Goh, Elliott, Aubry, et al., 2014). Across the studies that directly address the dimensions of EC, four overarching themes emerge, each of which is discussed below: evaluation structure, organizational resources, human resources, and evaluation culture.
Evaluation structure
First, scholars include in their frameworks the structures that support evaluation, under dimensions variously labeled objectives, motivational forces, planning and activities, organizational environment, resources, structures and processes, organizational factors, and the capacity to do evaluation (Bourgeois & Cousins, 2013; Cousins et al., 2008; GAO, 2003; Milstein & Cotton, 2000; Nielsen et al., 2011; Taylor-Ritzler et al., 2013). Labin et al. (2012) and Labin (2014) also propose that processes, policies, and practices are outcomes of ECB.
Several scholars include evaluation policy, planning, guidelines, and professional standards for quality evaluation as part of evaluation structure (Bourgeois & Cousins, 2013; Cousins et al., 2008; Milstein & Cotton, 2000; Nielsen et al., 2011), and some stress the importance of an evaluation unit and evaluation positions for EC (Bourgeois & Cousins, 2013; Milstein & Cotton, 2000; Nielsen et al., 2011). Some also stress access to experts and networking/partnerships with governments or the evaluation community (Bourgeois & Cousins, 2013; GAO, 2003; Milstein & Cotton, 2000; Taylor-Ritzler et al., 2013), which is sometimes viewed as an approach to ECB. Some scholars identify professional development, which is also frequently identified as an approach to ECB, as an aspect of EC (Bourgeois & Cousins, 2013; Milstein & Cotton, 2000; Taylor-Ritzler et al., 2013). In addition, Bourgeois and Cousins (2013) and Taylor-Ritzler et al. (2013) propose information sharing within the evaluation unit or the organization, and Bourgeois and Cousins (2013) highlight organizational linkages between evaluators and program managers as part of EC in the Canadian federal government.
Organizational resources
Second, scholars identify the importance of technical resources for data collection and analysis to assure the quality of evaluation, including evaluation models, methods, tools, data systems, databases, software, and hardware (Bourgeois & Cousins, 2013; Cousins et al., 2008; GAO, 2003; Milstein & Cotton, 2000; Nielsen et al., 2011; Taylor-Ritzler et al., 2013). Some also point to the importance of financial resources in EC (Bourgeois & Cousins, 2013; Cousins et al., 2008; Milstein & Cotton, 2000; Nielsen et al., 2011). Time allotted for evaluation is another element of the EC framework (Cousins et al., 2008; Taylor-Ritzler et al., 2013). In addition, Labin et al. (2012) and Labin (2014) propose that resources are outcomes of ECB.
Human resources
Third, scholars address the human aspects of EC in dimensions labeled workforce and professional development, human capital, human resources, individual factors, and the capacity to do evaluation (Bourgeois & Cousins, 2013; Cousins et al., 2008; GAO, 2003; Milstein & Cotton, 2000; Nielsen et al., 2011; Taylor-Ritzler et al., 2013). Several scholars also propose that knowledge, skills/behaviors, and attitudes are outcomes of ECB (Bourgeois et al., 2008; Labin, 2014; Labin et al., 2012; Preskill & Boyle, 2008). Some emphasize the evaluation knowledge and skills of evaluators (Bourgeois & Cousins, 2013; Milstein & Cotton, 2000; Nielsen et al., 2011). Nielsen et al. (2011) also focus on evaluators' experience and qualifications. Milstein and Cotton (2000) go further, including the knowledge and skills stakeholders need to be involved in evaluation. Additionally, some scholars propose leadership as a component of EC: Milstein and Cotton (2000) focus on the leader as advocate, Bourgeois and Cousins (2013) stress the leadership quality of the evaluation unit, and Taylor-Ritzler et al. (2013) include effective leadership by program managers. Labin et al. (2012) and Labin (2014) also propose that leadership is an outcome of ECB. Nielsen et al. (2011), by contrast, claim that leadership is more an aspect of the process of building EC than of EC itself.
Evaluation culture
Fourth, among EC studies, the GAO (2003) identified evaluation culture as one dimension of EC. In that study, evaluation culture meant routinely planning, implementing, and using evaluation results to inform program improvement. Although Nielsen et al. (2011) do not include evaluation culture in their framework, they suggest a future revision that would include it, because they see the lack of a belief system as a weakness of their model. Additionally, Cousins and Bourgeois (2014) frame EC as the integration of evaluation into organizational culture, and Labin et al. (2012) propose culture as one of the outcomes of ECB. Other scholars likewise propose that regular, sustainable evaluation practice is the long-term, fundamental goal of ECB (Preskill & Boyle, 2008; Suarez-Balcazar & Taylor-Ritzler, 2014).
In addition to these components, scholars propose the capacity to use evaluation as part of EC; this capacity relates to the regular use of evaluation that the GAO (2003) identified as part of evaluation culture. For example, Bourgeois et al. (2008) and Cousins et al. (2008) define the capacity to use evaluation in terms of the use of evaluation findings and processes. In their dimensions of individual factors and EC outcomes, Taylor-Ritzler et al. (2013) focus both on individual awareness and motivation and on mainstreaming evaluation practices into work processes and using findings. Bourgeois and Cousins (2013) expand the capacity to use evaluation into three components: evaluation literacy, integration with organizational decision-making, and learning benefits. In describing their “objectives” dimension, Nielsen et al. (2011) advocate the routine use of evaluation practices and findings to inform decision-making and the policy cycle, which is similar to the “integration with organizational decision-making” mentioned above. Similarly, Milstein and Cotton (2000) propose learning from experience through process use and findings use, which is consistent with “learning benefits.”
ECB
As noted earlier, ECB is the process that results in EC. Scholars have proposed numerous definitions of the term, and a review of these definitions reveals common threads. ECB is regarded as a complex, contextual process aimed at integrating sound evaluation and its use into the life of organizations (Cousins et al., 2008; Labin et al., 2012; Preskill & Boyle, 2008; Stockdill et al., 2002). Because it is intentional, ECB is distinct from evaluation, although it shares features with some evaluation approaches, such as participatory, collaborative, and empowerment evaluation (Huffman et al., 2008).
To describe the ECB process, scholars emphasize a cycle of design, implementation, and evaluation (Labin et al., 2012; Preskill & Boyle, 2008). Several ECB frameworks focus on the antecedents, forces, or organizational factors that affect ECB activities and their effectiveness (Bourgeois et al., 2008; Cousins & Bourgeois, 2014; Labin, 2014). It is therefore essential to assess organizations both externally and internally (King, 2005; King & Volkov, 2005). For instance, Preskill and Torres (1999) proposed the Readiness for Organizational Learning and Evaluation (ROLE) instrument, which can gauge the organizational conditions that support ECB (Preskill & Boyle, 2008). Assessing readiness provides guidance for planning ECB approaches that fit the context (King, 2005; King & Volkov, 2005; Labin et al., 2012; Sobeck & Agius, 2007; Stevenson, Florin, Mills, & Andrade, 2002; Volkov & King, 2007).
The ECB process generally involves certain activities. In their synthesis of empirical studies, Labin et al. (2012) found that the most frequently adopted approaches are training, involvement in evaluation, and technical assistance/coaching/support. Training, categorized as a direct ECB activity (Cousins & Bourgeois, 2014), is usually offered by experts on a range of topics related to evaluation knowledge and skills (Huffman et al., 2008). Arnold (2006), for example, taking participants' previous training and job requirements into account, developed a training framework that included logic models for program planning and real projects carried out one-to-one, in small teams, and on a large-scale multisite basis. Besides creating common ground, the training gave participants techniques such as how to ask evaluation questions and analyze data. Several authors suggest that training should be customized to the organizational context (Bourgeois et al., 2008; King & Volkov, 2005; Preskill & Boyle, 2008) and combined with sharing, discussion, and debate to maximize its benefits (Cousins & Bourgeois, 2014).
Involving participants in the evaluation process, categorized as an indirect ECB activity (Cousins & Bourgeois, 2014), is frequently associated with gains in individual knowledge, skills, and the valuing of learning (Bourgeois et al., 2008; Cousins & Bourgeois, 2014; Labin et al., 2012) as well as with system change (Huffman et al., 2008). To illustrate, King and Volkov (2005) used experiential learning to involve administrators and teachers in one American district in defining their evaluation objectives, questions, methods, and uses. In their approach to ECB, Huffman et al. (2008) created a team of teachers and school administrators who collectively focused on evaluation, collected and analyzed data, developed an action plan, and monitored results. Through this immersion, school personnel not only learned how to do evaluation but also became more motivated to deepen their understanding of evaluation and their evaluation skills. Beyond individual learning, the involvement helped the school develop a new process to collect, store, analyze, and report data (Huffman et al., 2008).
Additionally, technical assistance/coaching/support, the third category identified by Labin et al. (2012), plays a critical role in developing and sustaining EC. One form is coaching and on-site, web-based, and/or telephone technical assistance with data collection, analysis, and reporting (Gilliam et al., 2003; Naccarella et al., 2007; Preskill & Boyle, 2008; Stevenson et al., 2002). Another is shared access to evaluation resources, including evaluation tools, instruments, manuals, and “best” evaluation practices (Bourgeois et al., 2008; Carman & Fredericks, 2010; Gilliam et al., 2003; King & Volkov, 2005; Naccarella et al., 2007). Atkinson, Wilson, and Avula (2005) also advocate databases as a source of objective data for organizations to analyze. Carman and Fredericks (2010) suggest that EC builders should identify the problems organizations face before proposing approaches. Other activities that may be essential to building organizational EC include setting consistent expectations, communicating, providing incentives, ensuring access to evaluation experts and information, and, as noted earlier, providing fiscal resources (Arnold, 2006; King & Volkov, 2005; Martin & Carey, 2012).
Evaluation plays an important role in tracking how effective the ECB design and implementation are, as well as the extent to which its approaches have attained intended and unintended outcomes (Preskill & Boyle, 2008). The focus of such evaluation is to assure that EC is increasing and to provide feedback for improvement (Preskill, 2014; Preskill & Boyle, 2008; Volkov & King, 2007).
Several scholars emphasize the importance of leaders’ commitment to ECB. Leaders have the power to provide resources for evaluation and to promote its use, and they can help shape policies that support ECB (Bourgeois et al., 2008; Cousins & Bourgeois, 2014; King & Volkov, 2005; Labin, 2014; Preskill, 2014). Moreover, the mutual commitment of different stakeholders is essential in the ECB process. For example, in their study of state-level school counseling programs, Martin and Carey (2012) noted that state leadership played a key role in communicating both the importance of evaluation and the expectation that evaluation be embedded in all programs for continuous improvement. Besides the state, the district, schools, and individuals all shared responsibility for ECB. Other examples include a school–university partnership (Haeffele, Hood, & Feldman, 2011) and collaboration between funders and programs (Labin et al., 2012), both of which reflect the collaborative nature of ECB.
As discussed earlier, EC studies share four common themes: evaluation structure, organizational resources, human resources, and evaluation culture. Research on ECB describes a cycle of design, implementation, and evaluation and highlights activities such as training, involvement in evaluation, and technical assistance/coaching/support. Helpful as they are, however, current EC frameworks are grounded in Western settings and so may not provide a sound basis for analyzing ECB in geographically diverse contexts. This study, set in East Asian elementary and junior high schools, addresses this gap in the extant literature.
Research Context and Method
As in many East Asian countries, central and/or local governments commission most school evaluations in Taiwan (Peng & Lee, 2009). Evaluation of elementary and junior high schools in Taiwan began in the 1970s and has expanded since the 1990s, with an increasing focus on accountability (Guo, 2000). Unlike school evaluation in the United States and some other Western countries, which focuses on student achievement, evaluation in Taiwan requires a comprehensive analysis of school systems, including leadership, curricula, instruction, counseling, professional development, physical resources and environment, community relationships, and accountability performance.
Taiwanese schools have faced difficulties in conducting and using self-evaluation (Guo, 2007; Pan, 2005; Wang, 2003), so responding to calls for developmental and self-motivated evaluation requires building schools' EC. School personnel are expected to evaluate themselves as part of school evaluation; however, administrators and teachers not only have limited evaluation skills but also lack the training opportunities and resources to conduct quality evaluation. In large part, teachers have not recognized the value of school self-evaluation, and their willingness to participate is typically low. Evaluation leading to school-based improvement is therefore limited (Pan, 2005; Sheu & Liu, 2008). The current practice of self-evaluation in Taiwanese schools is thus in great need of capacity building (Guo, 2007; Huang, Tsai, & Chang, 2007; Sheu & Liu, 2008).
This study used the Delphi technique to explore the essential dimensions of EC and feasible approaches to ECB in Taiwanese elementary and junior high schools. The Delphi technique seeks consensus among a panel of experts on a particular issue through a sequence of questionnaires. In each round, respondents review the group's opinions and their own ratings from the previous round, reassess their positions, and explain why they disagree with others (Delbecq, Van de Ven, & Gustafson, 1975; Wiersma & Jurs, 2009). The Delphi was chosen for three reasons. First, it is commonly used in a variety of settings, including education (Hung, Altschuld, & Lee, 2008), and is particularly appropriate for reaching consensus, through structured group communication, on a topic as new as EC and ECB in Taiwanese schools (Wiersma & Jurs, 2009). Second, the Delphi assures anonymity, so experts in different positions can freely express opinions and disagree with one another (Sori & Sprenkle, 2004). Third, it enables efficient communication among experts who, in this case, were spread across different areas of Taiwan.
Sample
The sample consisted of panelists who were knowledgeable about school evaluation in Taiwan. Local government officials, principals, and scholars are the major people involved in the design and implementation of Taiwanese elementary and junior high school evaluation. Thus, to ensure a diversity of opinions, panelists were sampled from these three groups: Taiwanese scholars in the field of school evaluation, elementary and junior high school principals, and local government officials.
Evaluation scholars
The researchers used a search engine of the National Central Library in Taiwan to identify the authors of Chinese books and academic journal articles on educational evaluation. The search used two criteria: (a) at least three evaluation publications (to exclude authors whose scholarly focus is not primarily in evaluation) and (b) experience in school evaluation practice. The initial search resulted in 16 scholars, 11 of whom were contacted successfully by individual phone calls or in person. Ten of them agreed to participate.
Principals
The researchers first consulted the participating scholars to identify school principals who were committed to self-evaluation. This process yielded two principals whose evaluation practices were documented in publications. Next, using a database of Excellent Principal Awards, the researchers identified an additional 42 elementary school principals and 20 junior high school principals who had received awards from 2007 to 2011, on the assumption that award-winning principals would likely have sound practices for school improvement and accountability. To increase the principals’ commitment to the study, the first author not only used personal networks but also asked a former education minister to recommend potential participants from the sampling list. Based on these two sources of recommendations, individual phone calls were made to 12 of these principals across different areas of Taiwan. Ten of them agreed to participate.
Officials
Officials were sampled based on three criteria: (a) holding a position in local government with a reputation for sound governance and systematic school evaluation as determined by the participating scholars, (b) having responsibilities directly related to supervising elementary and junior high schools, and (c) diversity across various cities and positions. To assure the commitment of the officials, who were extremely busy, the first author asked the former education minister to recommend potential participants based on the sampling criteria. Four officials, comprising one commissioner, one supervisor, and two section directors, were called, and they all agreed to participate in the study.
Besides oral invitations, follow-up letters were sent inviting people to join the panel; the letters described the outline and objective of the study, the time commitment, and the guarantee of confidentiality, and included a confirmation of acceptance. Overall, 24 agreed to participate, and recruitment ceased because the number of panelists met the Delphi criteria suggested by Wiersma and Jurs (2009). Ultimately, 23 completed the entire study: nine scholars, five elementary school principals, five junior high school principals, and four officials in bureaus of education. The nine participating scholars each had at least four academic publications on educational evaluation, and the principals and officials each had more than 10 years of involvement in school evaluation.
Data Collection and Analysis
The Delphi technique included three rounds of mail surveys. To develop the first survey, interviews were conducted before the Delphi began to gather experts' ideas about essential frameworks of EC and their perceptions of feasible approaches to ECB in Taiwanese schools. These included a group interview with four scholars and two principals as well as individual interviews with three scholars and one principal. All of the interviewees were invited to participate in the Delphi, and 8 of the 10 ultimately consented (see sample description above).
Because EC and ECB were new concepts in Taiwan, the first author provided, during the interviews, both a summary of the literature review and representative frameworks of EC and ECB. This brief review drew on Boyle et al. (1999), Milstein and Cotton (2000), the GAO (2003), Bourgeois (2008), Preskill and Boyle (2008), Nielsen et al. (2011), and Volkov and King (2007). Although only some studies included leadership as an element of EC, it was generally perceived as significant in school-based improvement. Thus, the initial EC framework was drafted as five dimensions with illustrative items: leadership plus the four dimensions consistent with the literature (evaluation structure, organizational resources, human resources, and evaluation culture). The approaches to ECB were drafted as partnerships among governments, schools, and professional organizations.
After ideas about EC and ECB were identified and categorized from the interview transcripts, the resulting major themes were used to develop the first-round survey. First, the draft EC framework was restructured into three dimensions, ordered as human resources, evaluation culture, and evaluation infrastructure. One aim of this revision was conciseness, so evaluation structure and organizational resources were combined into evaluation infrastructure. Another aim was to better differentiate EC from ECB. For example, leadership initially included leaders' attitude and support; the former was merged into evaluation culture, and the latter was removed from EC because it was, to some extent, reflected in the school-driven approach to ECB. Second, professional development and access to experts were regarded more as ECB than as EC and were therefore moved to the ECB approaches. Third, competencies for the different stages of evaluation were detailed in the human resources dimension. Fourth, the approaches to ECB were restructured into two dimensions, labeled government driven and school driven, which integrated the role of professional organizations with the efforts of governments and schools. Fifth, certification of self-evaluators, which was not evident in the ECB literature, was proposed and added as one approach under the government-driven category.
The first survey had four sections: the background of the study, rating scales for the EC dimensions, rating scales for the ECB approaches, and open-ended questions for comments. Panelists rated the fit of each item to Taiwanese elementary and junior high schools on a 5-point Likert-type scale ranging from 1 (very low) to 5 (very high). Following the Delphi format, the second and third surveys had the same structure as the first but also reported the group ratings and comments for each item from the previous round. Drawing on this feedback, panelists were invited to reassess their ratings and explain any opinions that differed from those of others. The three rounds were conducted between April and June 2012.
The Delphi method requires both quantitative and qualitative feedback to revise items for the next round. For the quantitative analysis, SPSS 19.0 (IBM Corp., 2010) was used to compute the central tendency (median [Mdn]) and dispersion (interquartile range [IQR]) of each item. The literature uses different standards for consensus; in evaluation research, Hung et al. (2008) suggest the Mdn together with the IQR as a good criterion. This study used two inclusion criteria: (a) an IQR below 1 (Wang, Wu, & Wu, 2008) and (b) at least 80% of the responses falling within the IQR around the Mdn (Hung et al., 2008). For the qualitative data from the open-ended questions, a content analysis identified the major themes in the panel's comments. The quantitative criteria, together with the qualitative feedback, were used to decide which items to retain, remove, modify, or add. Finally, to further verify the consensus among scholars, principals, and officials, Kruskal–Wallis H tests, a nonparametric test suited to the small number of panelists, were conducted to test for differences in ratings among these three groups.
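The two consensus criteria and the group comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the SPSS procedure the study actually ran: criterion (b) is read as "at least 80% of ratings lie within [Q1, Q3]", quartiles use the (n + 1)p interpolation rule, and the helper names (`consensus_summary`, `average_ranks`, `kruskal_wallis`) and the ratings are hypothetical. The p-value uses the chi-square approximation with k - 1 = 2 degrees of freedom, whose survival function is exp(-H/2).

```python
import math
import statistics
from collections import Counter


def consensus_summary(ratings, max_iqr=1.0, min_share=0.80):
    """Return Mdn, IQR, share of ratings in [Q1, Q3], and the consensus verdict.

    Assumed reading of the criteria: (a) IQR strictly below max_iqr and
    (b) at least min_share of ratings inside [Q1, Q3]. Quartiles use the
    (n + 1)p rule ("exclusive"), assumed to approximate SPSS's default.
    """
    q1, mdn, q3 = statistics.quantiles(ratings, n=4, method="exclusive")
    iqr = q3 - q1
    share = sum(q1 <= r <= q3 for r in ratings) / len(ratings)
    return {
        "Mdn": mdn,
        "IQR": iqr,
        "share_in_IQR": share,
        "consensus": iqr < max_iqr and share >= min_share,
    }


def average_ranks(values):
    """Map each distinct value to its mid-rank (tied values share a mean rank)."""
    ranks, next_rank = {}, 1
    for value, count in sorted(Counter(values).items()):
        ranks[value] = next_rank + (count - 1) / 2
        next_rank += count
    return ranks


def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H and its chi-square p-value (3 groups only)."""
    if len(groups) != 3:
        raise ValueError("this sketch computes the p-value for df = 2 only")
    pooled = [r for g in groups for r in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    # Sum over groups of (rank sum)^2 / group size.
    rank_term = sum(sum(ranks[r] for r in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Ties are pervasive on a 5-point scale, so apply the standard correction.
    correction = 1 - sum(t**3 - t for t in Counter(pooled).values()) / (n**3 - n)
    if correction == 0:
        raise ValueError("all ratings are identical; H is undefined")
    h /= correction
    # Chi-square with df = 2 has survival function exp(-x / 2).
    return h, math.exp(-h / 2)


# Hypothetical round-3 ratings for one item (NOT data from this study).
scholars = [4, 3, 5, 4, 3, 4, 5, 3, 4]
principals = [5, 5, 4, 5, 5, 4, 5, 5, 4, 5]
officials = [3, 4, 3, 4]

summary = consensus_summary(scholars + principals + officials)
print(summary["Mdn"], summary["IQR"], summary["consensus"])  # 4.0 1.0 False
h, p = kruskal_wallis(scholars, principals, officials)
print(f"H = {h:.2f}, p = {p:.3f}")  # H = 8.85, p = 0.012
```

In this made-up example the item fails both criteria (IQR = 1, and only about 78% of ratings fall inside the IQR), and the three groups differ. With so few panelists per group the chi-square approximation is rough; where SciPy is available, `scipy.stats.kruskal`, which also applies the tie correction, can serve as a cross-check.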
Results
The first-round Delphi survey was developed from the results of the initial interviews and revised in light of quantitative and qualitative feedback. After three survey rounds, Delphi participants identified the dimensions of EC, in order, as evaluation culture, evaluation infrastructure, and human resources. The approaches to ECB were grouped into two categories: school driven and government driven. The final profile, including the first- and second-round survey results, is discussed below (see Tables 1–3).
Table 1. Medians (Mdn) and Interquartile Ranges (IQR) From Rounds 1–3 for Items Dealing With the Concept of Evaluation Capacity.
Note. The items are based on the third survey. First-round cells are empty for items that were added in the second round.
Table 2. Medians (Mdn) and Interquartile Ranges (IQR) From Rounds 1–3 for Items Dealing With the Approaches to Evaluation Capacity Building.
Note. The items are based on Survey 3. ECB = evaluation capacity building.
Table 3. Kruskal–Wallis Test Results for the Third Round.
Regarding EC, the first survey contained three dimensions with 21 items to be rated. Eight items did not meet the quantitative criteria. After analysis of the quantitative and qualitative data, 5 items that were regarded as either unimportant or overlapping were removed: a self-evaluation schedule, experiences, advocacy, duties, and the capability to define the scope and criteria of self-evaluation. The three other items were modified to refer to software/databases, access to financial resources for conducting self-evaluation, and access to financial resources for addressing the needs identified by the self-evaluation. In addition, based on the qualitative data, 2 similar items addressing support for and commitment to self-evaluation were merged, and 3 items were added: a self-evaluation regulation, learning about and from self-evaluation, and a system for data access and sharing. The largest change after the first survey occurred in the human resources dimension. Four items on the competencies school personnel need during the evaluation process were grouped into 2 items on the skills/knowledge of (1) designing evaluations and (2) implementing evaluations, and 2 items on having a sufficient number of personnel for designing and implementing evaluations were added. The EC dimensions were reordered as evaluation culture, evaluation infrastructure, and human resources, placing evaluation culture first to stress its importance.
The second EC survey consisted of three dimensions with 18 items. Three items that had been added or modified after the first round did not meet the quantitative criteria. Based on the panelists’ feedback, “a self-evaluation regulation” was deleted, and the other two were rephrased to refer to organized/integrated human resources for designing and for implementing self-evaluation. In addition, 2 other items, on evaluation software and on a system for data access, sharing, and analysis, were merged because the panelists suggested making the framework more concise.
In the third EC survey, all 16 items, grouped into the three dimensions, met the inclusion criteria; every item had an Mdn of 4 or 5 and an IQR of 1 or below. Finally, Kruskal–Wallis H tests showed no statistically significant differences among the responses of the three groups of panelists (scholars, principals, and officials) in the final round (p > .05), indicating that the three stakeholder groups had reached consensus (see Table 3).
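For readers unfamiliar with the Kruskal–Wallis H test, the sketch below shows how it compares the three panel groups on a single item: all ratings are pooled and ranked (tied ratings receive their average rank), and H measures how far each group's rank sum departs from what identical distributions would produce. It is a from-scratch illustration, not the SPSS procedure the study used. It applies the standard tie correction and, because there are three groups (df = 2), converts H to a p value with the chi-square approximation, whose survival function for 2 degrees of freedom is exp(-H/2). The group sizes and ratings are hypothetical; the function names are illustrative.

```python
from math import exp

def average_ranks(values):
    """Rank values 1..N; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_three_groups(groups):
    """Tie-corrected Kruskal-Wallis H and chi-square p value for exactly 3 groups."""
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups (df = 2)")
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 * weighted / (n * (n + 1)) - 3 * (n + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    tie_counts = {}
    for x in pooled:
        tie_counts[x] = tie_counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    if correction == 0:  # every rating identical: no evidence of a difference
        return 0.0, 1.0
    h /= correction
    return h, exp(-h / 2)  # chi-square survival function with df = 2

# Hypothetical final-round ratings for one item, by panel group (8 + 8 + 7 = 23).
scholars = [5, 5, 4, 5, 4, 5, 4, 5]
principals = [4, 5, 5, 4, 5, 4, 5, 5]
officials = [5, 4, 4, 5, 5, 4, 5]
h, p = kruskal_wallis_three_groups([scholars, principals, officials])
print(f"H = {h:.3f}, p = {p:.3f}")  # p > .05 is read as no significant group difference
```

Writing the df = 2 case in closed form keeps the example dependency-free; for other numbers of groups, a general chi-square survival function (or a statistics library) would be needed.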
Regarding approaches to ECB, the first survey contained 19 items classified as government driven and school driven. Five items did not meet the quantitative criteria for inclusion. In light of the panelists’ comments, 1 item about self-evaluator credentials was removed. Three items that the panel found unclear, concerning advocacy for self-evaluation, implementation of an ECB plan, and conducting learning activities, were modified. Additionally, the item on planning according to an assessment of the factors that affect EC was merged with the item on planning based on the school’s level of EC. The ECB categories were reordered as school-driven and then government-driven approaches, stressing schools as the principal actors in ECB for self-evaluation.
In the second survey, there were 17 ECB approaches grouped into two dimensions. Four items did not have 80% of responses within the IQR. In response to the panelists’ comments, 3 items related to resources, evaluation, and financial resources were rephrased for clarity. One item was revised to encourage enthusiastic staff, rather than skilled staff, to join the self-evaluation team. In addition, because the panelists again suggested a concise ECB framework, the items on planning ECB and on implementing the plan were merged, as were the items on systems for consultation and for sharing.
In the third ECB survey, all 15 items met the quantitative criteria, and the panelists did not suggest any removal, modification, or addition. The items had Mdns of 4 or 5 and IQRs of 1 or below. Finally, Kruskal–Wallis H tests again showed no statistically significant differences among the responses of the three groups of panelists (scholars, principals, and officials) in the final round (p > .05), indicating that the three stakeholder groups had reached consensus.
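The item counts reported across the three rounds can be cross-checked with simple bookkeeping. The sketch below encodes each revision described in the Results as a change in the number of items (removals and additions as their counts, a merge of two items into one as -1, rephrasings as 0) and confirms that the EC survey moves from 21 to 18 to 16 items and the ECB survey from 19 to 17 to 15. The labels paraphrase the text, and treating each merge as combining exactly two items is an assumption.

```python
def run_rounds(start, rounds):
    """Apply each round's (description, change in item count) list; return counts per survey."""
    counts = [start]
    for revisions in rounds:
        counts.append(counts[-1] + sum(delta for _, delta in revisions))
    return counts

# EC survey: revisions after Round 1, then after Round 2.
ec_rounds = [
    [("remove 5 unimportant/overlapping items", -5),
     ("merge support and commitment items (2 -> 1)", -1),
     ("add regulation, learning, and data-access items", +3),
     ("group 4 competency items into 2", -2),
     ("add 2 personnel-sufficiency items", +2)],
    [("delete self-evaluation regulation", -1),
     ("merge software and data-system items (2 -> 1)", -1)],
]

# ECB survey: revisions after Round 1, then after Round 2 (rephrasings change nothing).
ecb_rounds = [
    [("remove self-evaluator credentials", -1),
     ("merge the two planning items (2 -> 1)", -1)],
    [("merge planning and implementation items (2 -> 1)", -1),
     ("combine consultation and sharing items (2 -> 1)", -1)],
]

print("EC items per survey:", run_rounds(21, ec_rounds))
print("ECB items per survey:", run_rounds(19, ecb_rounds))
```

Both printed sequences, [21, 18, 16] for EC and [19, 17, 15] for ECB, match the counts given in the text.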
Discussion
The overall development of the EC dimensions and ECB approaches across different stages of research is shown in Tables 4 and 5. The final version of EC that was identified as suitable for Taiwanese schools had three dimensions: evaluation culture, evaluation infrastructure, and human resources. The Taiwanese approaches to ECB were grouped into two categories: school driven and government driven.
Table 4. Development of the EC Dimensions Across Different Stages of Research.
Note. EC = evaluation capacity.
Table 5. Development of the ECB Approaches Across Different Stages of Research.
Note. ECB = evaluation capacity building.
EC
The first dimension of EC was identified as evaluation culture, which depicts a commitment to evaluation and routine evaluation practices that inform decision-making for improvement. In their conceptual framework of organizational capacity to do and use evaluation, Cousins et al. (2008) frame EC as the integration of evaluation with organizational culture, which reflects the cultural aspect of EC. McCoy, Rose, and Connolly (2013, p. 16) described evaluation culture as a “commitment to the role of evaluation in organizational decision-making” and an “integral and valued part of the organization’s activities and purpose.” Several scholars identify individual evaluation attitudes and sustainable evaluation practices as objectives or outcomes of ECB; these also relate to the cultural dimension of EC in this study (Labin et al., 2012; Mackay, 2002; Preskill & Boyle, 2008; Suarez-Balcazar & Taylor-Ritzler, 2014; Urban, Burgermaster, Archibald, & Byrne, 2015).
According to the panelists, Taiwanese schools were required to conduct self-evaluation before external evaluation, a situation similar to the compliance stage that Gibbs, Napp, Jolly, Westover, and Uhl (2002) observed. The schools were also expected to conduct self-evaluation as part of their school-based management. In this situation, evaluation culture is perceived as a major dimension that provides the motivation to conduct and use sound self-evaluation. However, the Taiwanese framework of EC did not address the detailed evaluation uses that Cousins et al. (2008) and Bourgeois and Cousins (2013) described. One possible reason is that self-evaluation findings are not used frequently in Taiwanese schools, so detailed descriptions of evaluation use may matter less in this setting.
The second dimension of EC was identified as evaluation infrastructure, which includes planning, an evaluation team, financial resources, a data system and software, and a system of meta-evaluation that supports evaluation and its use. According to King (2005), creating an evaluation advisory committee in a school district is essential for helping design, monitor, and reflect on evaluation activities. In the political context of Taiwan, the Delphi results suggested that a school should have a team, that is, a task force in charge of evaluation, instead of hiring full-time evaluators as Milstein and Cotton (2000) and Bourgeois and Cousins (2013) suggested.
Consistent with the extant literature, the evaluation infrastructure identified in this study covered hardware, software, and financial resources, but in a slightly different way. According to the panelists, because schools usually have limited technical and financial resources, the government should provide most of these resources, and such provision is categorized as a government-driven ECB approach. At the same time, schools should have basic technical resources and access to financial resources as part of their own EC. Moreover, the study showed that financial resources were essential not only for conducting self-evaluation but also for addressing the needs identified by the evaluation. The data also suggested building meta-evaluation into the self-evaluation infrastructure, an idea rarely mentioned in the existing EC literature.
The third dimension of EC was identified as human resources, which comprises a sufficient number of well-organized personnel who can design, implement, and use evaluation. Because full-time external evaluators are currently rarely used in Taiwanese schools, the panelists proposed that evaluation experience or qualifications were not essential to the schools, which is inconsistent with Nielsen et al. (2011). Moreover, the Delphi panel agreed that all school personnel need to understand the purpose and concept of evaluation, whatever their teaching or administrative tasks. However, the specific skills and knowledge required depend on each person’s role on the evaluation team or in other evaluation-related activities. The panelists proposed that the evaluation team be organized to support sound information sharing and thereby deepen involvement. In addition, schools, particularly small ones, should be able to integrate human resources from different sources, for example by collaborating with teachers or administrators in other schools. Similarly, Cousins and Bourgeois (2014) propose that partnering with external agencies is particularly beneficial for small organizations with limited resources. This study’s findings reflect both the extensive use of administrators and teachers as functional evaluators and the limited human resources available in Taiwan. Using internal school personnel as evaluation facilitators may be a solution for organizations that are unable to hire external evaluators (Bourgeois et al., 2008).
ECB
The panelists’ consensus suggests that school staff play the most important role in building capacity for self-evaluation and that governments should provide guidance and support for the process. The school- and government-driven categories of approaches appear to be feasible for Taiwanese schools, and together they make up the ECB cycle of design, implementation, and evaluation (King & Volkov, 2005; Labin et al., 2012). Moreover, the study supports the appropriateness of the approaches most frequently used in Western settings: training, involvement, and technical assistance/coaching/support (Labin, 2014; Labin et al., 2012). In particular, the panelists acknowledged the importance of encouraging participation in evaluation and forming collaborative communities to learn about evaluation, which supports the importance of involvement in evaluation (Cousins & Bourgeois, 2014; Huffman et al., 2008; King, 2005; King & Volkov, 2005). Furthermore, communication is regarded as an essential approach to ECB in Taiwanese schools, as it is in some of the Western literature (King, 2002; Martin & Carey, 2012).
It is worth noting that school leaders were perceived to play an essential role in school-driven ECB approaches. As Preskill (2014) and Bourgeois et al. (2015) indicate, leaders have the power to secure resources for evaluation and to promote its use. In this study, school leaders include not only principals and other school administrators but also, in the Taiwanese context, teacher leaders. Similarly, King and Volkov (2005) suggest that both supportive leadership and evaluation champions are essential for ECB, and Lawrenz, Thomas, Huffman, and Clarkson (2008) conclude that a combination of administrator- and teacher-led approaches would be the best type of school leadership. Thus, the study not only supports the importance of supportive leadership but also stresses shared leadership in the ECB process (King & Volkov, 2005; Lawrenz, Thomas, Huffman, & Clarkson, 2008).
In terms of government-driven approaches, this study confirms the important role of the government in guiding and supporting ECB (Martin & Carey, 2012). The first approach concerns authorizing self-evaluation as an embedded part of school operations. Bourgeois et al. (2015) propose that policy instruments could enhance the institutionalization of evaluation, which would increase EC. According to the panelists’ feedback, the policy includes an expectation of self-evaluation; however, the government should allow schools to adapt the evaluation to their individual contexts instead of following one common evaluation model. This finding is consistent with the principle that the ECB process should be tailored to organizational needs and strengths (King, 2005; King & Volkov, 2005; Labin et al., 2012; Preskill & Boyle, 2008; Stockdill et al., 2002). Indeed, policy makes up a major part of schools’ external environment, as King and Volkov (2005) propose. As Peng and Lee (2009) comment, East Asian schools in particular tend to operate according to official evaluation regulations. Thus, policy will likely play an essential role in building Taiwanese schools’ EC.
Beyond self-evaluation policy, this study suggests that the government should provide support to schools, including training, tools/methods/information, technical assistance, and financial resources. These approaches are necessary for Taiwanese schools with limited human and physical resources, and they also reinforce the expectation that schools engage in ECB. The approaches identified here correspond to those frequently adopted in the extant literature (Labin et al., 2012). Additionally, the study results included building a common database and using self-evaluation findings as government-driven ECB approaches. Atkinson et al. (2005) argue that a common database adds objective data for analyzing evaluation results. According to Martin and Carey (2012), governments can increase buy-in by using self-evaluation findings to assist schools.
ECB in the Taiwanese context reflects not only the necessity of multiple approaches but also the importance of partnership between schools and the government (Cousins & Bourgeois, 2014; Labin et al., 2012; Martin & Carey, 2012). Most of the ECB approaches proposed in Western studies apply equally well to Taiwanese school settings. Beyond confirming them, this Delphi study provides a comprehensive framework covering approaches used by both schools and governments, which is particularly important for countries with national school evaluation systems. Recently, collaboration and partnership in ECB have gained increasing attention (Labin et al., 2012). It is worth noting that, compared with the Western literature, the approaches identified for Taiwanese schools did not address some details of ECB, including what to consider in the ECB plan and implementation process, along with specific strategies and theories such as internships and appreciative inquiry (Labin, 2014; Labin et al., 2012; Preskill & Boyle, 2008; Volkov & King, 2007). The analysis suggests that many of these details may be essential, but discussing them goes beyond the scope of the current stage of ECB in Taiwanese schools.
A growing body of research has been conducted on ECB; however, little is known about evaluation capacity and how to build it outside Western settings. Using the Delphi technique, this study concluded that the major dimensions of EC in Taiwanese schools were evaluation culture, evaluation infrastructure, and human resources. The approaches to ECB were divided into school-driven and government-driven approaches. To a large degree, these findings support the essential dimensions of EC and the frequently used ECB approaches identified in existing Western research.
The empirical evidence suggests four lessons. First, the study was conducted in the context of self-evaluation in Taiwanese schools, which research has shown to be in need of EC and ECB. Concerning EC, this study characterized evaluation culture as a commitment to evaluation and the routine practice and use of evaluation. Consistent with other frameworks, this study identified evaluation culture as a major dimension of Taiwanese schools’ EC; this dimension is particularly important as Taiwanese schools move from a compliance-oriented to a self-motivated, developmental approach to evaluation.
Second, based on data from the interviews and the Delphi surveys, this study did not ultimately retain some components of EC proposed in Western frameworks, such as evaluators’ experience and qualifications, organizational linkages, and a detailed classification of evaluation use. These results may reflect differences in organizational structures and in the developmental stage of evaluation in Taiwanese schools. In particular, the early state of self-evaluation in Taiwanese schools may have shaped how panelists viewed EC, which may explain why they preferred a concise framework.
Third, some scholars propose that EC includes professional development, access to experts, models, methods, tools, and leadership support. This study argues that these belong to the process of ECB rather than to EC itself, because EC is the outcome of ECB. The documented goals of ECB in this study were evaluation culture, evaluation infrastructure, and human resources. In Taiwanese schools, professional development or training, access to evaluation expertise, and leadership support are not outcomes but approaches to building EC. Moreover, because Taiwanese schools lack the resources to maintain multiple models, methods, and tools, governments may play a meaningful role by providing them, which this study classifies as a government-driven approach.
Finally, this study supported multiple approaches to ECB and stressed the importance of collaboration between schools and government sectors, similar to the findings of Cousins and Bourgeois (2014), Labin et al. (2012), and Martin and Carey (2012). In a government-controlled school system, government policy and support are particularly essential for initiating and sustaining ECB efforts. Nevertheless, schools should be active agents in building their own EC.
Given the context of this study, certain limitations should be considered when interpreting the findings. First, the study focused on Taiwanese elementary and junior high schools. Although this was a diverse context in which to explore EC, and the findings may apply to other schools with a national evaluation system, they may not apply to other settings. Second, the Delphi panelists included scholars, principals, and officials with school evaluation expertise; however, no information was collected from teachers or other stakeholders, such as parents and community members. Our main consideration was that these stakeholders’ low level of involvement in school evaluation might have limited their understanding of EC. Third, the Delphi design produced a small sample that favored those with expertise in school evaluation, so their viewpoints may not represent the full population of principals and officials in Taiwan. We would argue, however, that principals and officials with little experience of EC or ECB were not suitable panel candidates. Finally, our study used the Delphi method to gain the consensus of 23 Taiwanese experts on their ideal notion of EC and on feasible ECB activities. The Delphi survey was developed using Western frameworks as a starting point; however, this study did not attempt to compare the extant frameworks of EC or ECB with our findings.
Conclusion
This study marks an initial step in identifying the unique characteristics of EC and ECB in one non-Western educational context. The results suggest at least three implications for practice. First, the essential dimensions of EC identified in this study may help schools and other educational institutions as they reflect on the status of EC in their organizations and on the goals of their ECB efforts for self-evaluation. Second, the multiple and collaborative approaches to ECB identified here may be useful as educational leaders work to customize self-evaluation to their context. Evaluators may also consider how to encourage different stakeholders to share a common commitment to, and responsibility for, both initiating and sustaining the ECB process. Finally, as EC builders design specific approaches, they should consider the contextual factors that frame their work, such as national evaluation policy, people’s levels of evaluation experience and professionalism, and organizational structures and resources that could affect the nature of EC.
Future research could verify or modify this non-Western framework. Regarding the various approaches to ECB, future studies might investigate their application, the difficulties encountered, and their actual effects in schools both in Taiwan and elsewhere. These results could also be developed into an assessment tool for self-diagnosis and for measuring ECB efforts in schools in countries with different forms of governmental evaluation policy. Additionally, future research may expand the panel to include more participants, particularly teachers and other evaluation stakeholders.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported under a Ministry of Science and Technology Grant in Taiwan (NSC 100-2410-H-003-115).
