Abstract

Evaluating Democracy Assistance draws on Krishna Kumar’s three decades of experience in the evaluation of development and democracy assistance programming, much of the time with the U.S. State Department and U.S. Agency for International Development. The goal of the book is to provide guidance on each step of the evaluation process, including defining democracy indicators, designing realistic monitoring systems, planning and managing evaluations, selecting methods of data collection, and communicating findings and recommendations. Kumar also includes a detailed discussion of the uses and limitations of experimental and quasi-experimental designs in evaluating democracy assistance. He has achieved this goal, providing a thorough and easily understood handbook on the evaluation of democracy programs. The handbook is richly illustrated with case studies, presented in sufficient detail to help readers understand the specific political, economic, and often emergency contexts within which each evaluation was designed and implemented. In fact, the careful discussion of each stage of the evaluation process also makes this a very useful introductory text for evaluators working in other sectors.
Kumar begins by pointing out the special challenges of evaluating democracy programs. First, the implementation models are often not clearly defined and are rarely articulated through a theory of change (TOC). Second, attribution is a particular challenge. For example, democracy interventions are only one of the many factors that affect voting rates or the fairness of elections, and it is rarely possible to use an evaluation design that can completely isolate these different factors. Third, the political environment frequently makes it difficult to collect the required data. While none of these issues is unique to democracy programs, their combination makes the collection of reliable and unbiased data particularly difficult. One of the valuable contributions of the book is to present a wide range of creative approaches that have been used, with varying degrees of success, to strengthen evaluation designs.
The chapter on democracy indicators provides a very valuable reference source, covering in detail the large number of indices that are available at the macro, meso, and micro levels. The thorough review of validity and reliability assessment for each indicator makes this an excellent resource both for practitioners in the democracy sector and for academics teaching about the construction of indices for measuring social, economic, and political development. One important methodological point that is also applicable in many other sectors concerns the use of rating scales. Kumar points out that many typical 5-point scales classify a high proportion of countries (or other units) with perfect negative or positive scores and are thereby unable to distinguish between the often very different characteristics of countries in these extreme categories. For example, the Freedom in the World Index gave a perfect score of 1 for political rights to countries as diverse as Barbados, Estonia, Malta, Suriname, Uruguay, and the United States. He also points out that many indices are not sufficiently calibrated to capture small incremental changes over time. This latter problem is illustrated with a series of charts for Albania that show the practical difficulties of using indices to monitor changes over time.
The chapter on monitoring provides a thorough and useful introduction to the different approaches that can be used. One minor question concerns the number of indicators on which data should be collected. Kumar makes the valid point that many agencies collect far too many monitoring indicators, thus increasing the cost and complexity of the monitoring system and often reducing the quality of the data. However, it should be pointed out that the pressure to reduce the number of indicators often means that the information needs of some stakeholders are not addressed.
The chapter on planning and managing evaluations is well presented and will be found useful by people working in almost any sector.
The chapter on experimental and quasi-experimental designs provides a useful overview of the strengths and limitations of these approaches for the evaluation of sectors, such as democracy development, for which indicators are difficult to define and measure. The chapter provides examples of the few situations in which it has been possible to use experimental designs. One example is the use of radio for civic education (the Let’s Talk civic education program in Sudan). A second example is a youth civics education program in Cambodia, and a third is a governance program to change citizens’ perceptions of democracy in Colombia. While all of these studies provided useful information, there were serious methodological challenges in each case. Kumar identifies the theoretical and practical limitations of both experimental and quasi-experimental designs. The limitations are as follows: it is difficult to use the designs for programs that cover the whole country; many expected outcomes are difficult to quantify; there should be no significant changes in the programs during implementation (a major problem for most programs); it is difficult to construct a control or comparison group; problems are posed by the “contamination effect”; the control group should not receive similar treatments; the program should be effectively implemented (another major challenge); there may be overlap with complementary or competing programs; baseline data are frequently not collected or are not of acceptable quality; the desire to have measurable results may limit the utility of the findings to policy makers; and, finally, the costs and complexity of the evaluations will be high.
I would take issue with Kumar’s statement, “There is no doubt that the experimental design is the most rigorous design in the arsenal of evaluators” (p. 92). It would have been helpful to explain that while there is widespread recognition that experimental designs provide the best statistical approach to addressing problems of selection bias, the evaluation community remains divided on the real-world applicability and utility of randomized controlled trials (RCTs). So there is an important distinction to be made between statistical rigor and broader issues of overall methodological rigor associated with different evaluation designs. 1
Kumar concludes that while there is very limited potential for applying experimental and quasi-experimental designs to evaluate complex, countrywide, long-term programs, there are often opportunities to use these designs in a more limited way, for example, to measure the impacts of short-term interventions or specific components, to compare the effectiveness of different ways to deliver services, and to test hypotheses that underlie different components of a democracy program (although not usually the whole program). It might also have been useful to refer to research agencies such as the Poverty Action Lab and the International Initiative for Impact Evaluation (3ie), both of which have funded RCTs in the fields of governance and democracy development. 2 However, many of these evaluations were only funded during the past 2–3 years, so the number of completed evaluations is still quite limited.
One quantitative evaluation design that would have been useful to mention is regression discontinuity (RD). This design addresses some of the ethical and policy concerns arising from not being able to target programs to the priority population groups, 3 and there are numerous instances where national stakeholders who object to RCTs on ethical or political grounds have accepted RD designs. The design uses a cutoff point along a “predictor variable” such as income, test scores, hours of vocational training that criminals have received in jail, and so on, to determine who will receive the experimental treatment (e.g., all units below the cutoff receive the treatment and all those above do not). An advantage of this design for evaluating democracy assistance (where precise quantitative indicators are often not available) is that the cutoff point can be defined in terms of a qualitative indicator, such as the average rating of a group of experts on an indicator of democratic development. The only requirement is that the ratings can be ranked on an ordinal scale. When strictly administered, RDs can provide unbiased estimates of project impact, but the challenge is to ensure strict adherence to the defined cutoff point for determining who does and does not receive the treatment.
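For readers unfamiliar with the logic of the RD design, the following minimal Python sketch may help. It is an illustration only and is not drawn from Kumar's book. It assumes a "sharp" design in which every unit whose average expert rating falls below the cutoff receives the treatment, and it estimates impact as the gap at the cutoff between two straight lines fitted on either side. The function names, the bandwidth parameter, and the example figures are all hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct scores on each side")
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b * x_bar, b


def rd_estimate(scores, outcomes, cutoff, bandwidth):
    """Sharp RD estimate (hypothetical helper, illustration only).

    Units with score < cutoff are assumed to be treated. A line is fitted
    on each side of the cutoff, using only units within `bandwidth` of it,
    and the estimate is the treated-side prediction at the cutoff minus
    the untreated-side prediction at the cutoff.
    """
    pairs = list(zip(scores, outcomes))
    below = [(s, y) for s, y in pairs if cutoff - bandwidth <= s < cutoff]
    above = [(s, y) for s, y in pairs if cutoff <= s <= cutoff + bandwidth]
    if len(below) < 2 or len(above) < 2:
        raise ValueError("too few units near the cutoff")
    a_lo, b_lo = fit_line([s for s, _ in below], [y for _, y in below])
    a_hi, b_hi = fit_line([s for s, _ in above], [y for _, y in above])
    return (a_lo + b_lo * cutoff) - (a_hi + b_hi * cutoff)


# Made-up, noise-free data: average expert ratings from 1.0 to 5.0.
# Units rated below 3.0 are treated, and treatment adds 5 to the outcome.
scores = [1.0 + 0.5 * i for i in range(9)]
outcomes = [10 + 2 * s + (5 if s < 3.0 else 0) for s in scores]

# Recovers the built-in effect of 5 up to floating-point rounding.
effect = rd_estimate(scores, outcomes, cutoff=3.0, bandwidth=2.0)
print(round(effect, 6))
```

With real data the outcomes would be noisy, the choice of bandwidth and functional form would matter, and any departure from strict assignment at the cutoff would undermine the estimate, which is the practical challenge noted above.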
Kumar argues that, in practice, most democracy evaluations rely mainly on nonexperimental designs. These include pretest–posttest comparisons without a comparison group as well as cross-sectional designs, but the most common and potentially the most powerful designs are case studies. He distinguishes between the use of case studies for exploratory versus explanatory purposes, and describes some of the main varieties of case study design. While the discussion presents a good overview of conventional case study designs, it would have been useful to discuss some of the newer case study approaches that focus on causal analysis. 4
Chapter 8 provides a useful overview of all of the major quantitative and qualitative data collection methods. One of the widely used methods that is described in some detail is focus groups. In my opinion, it would have been useful to point out that this is one of the most widely abused data collection methods in international development. While there are rigorous methods for the design and implementation of focus groups, 5 in international development focus groups are frequently (although certainly not always) used as a fast and economical way to collect data when operating on a tight budget and timeline. In many cases, little attention is paid to how respondents are selected. Often the local coordinator of the international evaluator’s schedule will be asked to contact someone in the town or area to be visited, requesting that a group of mothers with young children, farmers using the new technology, and so on, be invited to participate in a focus group. In many cases, the evaluator will have almost no information as to who the participants are or how they were selected. It is also not uncommon for the analysis to be presented in the form of a few quotations that are cherry-picked to support a particular point of view. It should be stressed that this comment is not intended to downplay the great value of focus groups when they are properly designed, implemented, and interpreted.
Chapter 9 presents a useful review of the wide range of methods available for communicating the findings of the evaluation. Kumar stresses that different methods of writing reports and communicating findings are required for different audiences and that the evaluator must often go beyond conventional written reports and PowerPoint presentations, to make use of verbal communication and take advantage of modern synchronous methods that permit simultaneous communication with audiences across different locations.
One final comment: While the book does refer to logic models, it would have been useful to include a discussion of TOCs, as democracy assistance is an area where TOCs can help articulate the evaluation framework, particularly for the many cases in which nonexperimental designs are used.
In conclusion, this is a book that will be found useful both by evaluators working in democracy development and related fields such as humanitarian and emergency assistance and by readers who are looking for an easily understandable introduction to the practice of program evaluation.
