Observations on the Nature of GAO's Research
Dr. Miller suggests that readily available data “eluded” us and questions our reliance on the words of state officials responsible for maintaining specific sources of data. GAO requires high standards of reliability for the data we use in our products, and Dr. Miller may have interpreted our decision not to use data that did not meet these standards as unnecessary obstacles to inference. Depending on the data, our reliability checks include investigating the source of the data, checking for out-of-range values and missing data, checking for consistency between related data fields, cross-checking data with alternative data sources when possible, and communicating with officials responsible for collecting and maintaining the data about their collection, editing and quality control procedures. We are sometimes forced to conclude that data are not sufficiently reliable to inform a particular policy decision, given the risks of providing inaccurate information to Congress, even when the same data might be sufficient for academic or theoretical investigations of the policy or program in question.
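The kinds of checks listed above can be sketched in a few lines of Python. This is an illustration only, not GAO's actual procedure: the records, the field names (`itemized`, `total_spent`), and the tolerance are hypothetical, and the sketch covers only the missing-data, out-of-range, and cross-field consistency checks.

```python
# Hypothetical campaign-finance records; field names and values are illustrative only.
records = [
    {"candidate": "A", "year": 2000, "itemized": [1200.0, 300.0], "total_spent": 1500.0},
    {"candidate": "B", "year": 1998, "itemized": [500.0], "total_spent": None},
    {"candidate": "C", "year": 2002, "itemized": [400.0, 100.0], "total_spent": -50.0},
    {"candidate": "D", "year": 2004, "itemized": [900.0], "total_spent": 1000.0},
]


def reliability_flags(record, tolerance=0.01):
    """Return a list of reliability problems found in one record."""
    flags = []
    total = record["total_spent"]
    # Missing data: no reported total, so the other checks cannot be run.
    if total is None:
        flags.append("missing total")
        return flags
    # Out-of-range value: spending cannot be negative.
    if total < 0:
        flags.append("out-of-range total")
    # Consistency between related fields: itemized entries should sum to the total.
    if abs(sum(record["itemized"]) - total) > tolerance:
        flags.append("total inconsistent with itemized entries")
    return flags


report = {r["candidate"]: reliability_flags(r) for r in records}
for candidate, flags in report.items():
    print(candidate, flags or "no problems flagged")
```

A record that passes such automated checks is not thereby reliable; the checks complement, rather than replace, cross-checking against alternative sources and discussions with the officials who maintain the data.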
Dr. Miller characterizes GAO's analysis as “suboptimal” and suggests that our research does not “meet benchmark standards” in light of current academic research. The supplement to our report, an e-supplement released concurrently on the Internet (GAO 2010b), describes in detail our approach, the alternatives we considered, and the implications of our choices. We emphasize that we chose our methods to address the issues that Congress identified, while also taking into account our time constraints and the reliability of the available data. Congress mandated that we update our previous report on public financing in Arizona and Maine (GAO 2003), and this fact constrained our choice of research questions and methods. We did not design our research to answer all of the many policy questions related to public campaign financing.
We developed our research methods only after a careful review of existing literature on campaign finance, including many of the articles Dr. Miller references. We chose not to use the specific methods that other scholars have used because of data limitations and problems with causal inferences. Instead, we decided to use a multi-state “intent-to-treat” analysis that we felt was appropriate for evaluating policy implementation at the state level. Our report uses techniques, such as fixed-effects regression analysis, that are at the forefront of advanced academic analysis. Methods different from those that other researchers would use are not necessarily “suboptimal,” and the available data are not always sufficiently reliable to answer every question that researchers find interesting. This is why we carefully and explicitly note the limitations of our analysis throughout our report.
Spending
Dr. Miller suggests that Arizona spending data prior to 2000 “eluded” us and recommends using data from the Wisconsin Campaign Finance Project (WCFP) to draw conclusions about spending for the 1996 to 1998 period. According to the WCFP website, WCFP data for Arizona derived information on campaign spending from the Arizona Secretary of State, including paper records for 1996 and Arizona's online database for 1998. Rather than use a secondary data source, we chose to use the original data from the state agency responsible for administering elections in our report. As part of our reliability assessment of these data, we contacted the Office of the Secretary of State to ask about its data collection, entry, editing, and maintenance procedures as well as other issues that might affect the integrity of their data. According to the network administrator responsible for maintaining the data, electronic data from 1996 and 1998 were incomplete, had been manually entered into an older computer system, and could not be transferred into the new system. Although the WCFP data for Arizona in 1996 and 1998 may be appropriate for some research questions, we did not assess whether the WCFP archive is more complete than the source data upon which it relies. Instead, as noted in the report, we limited our analysis to those years for which we could determine that the Arizona campaign spending data were reliable, which were 2000 and beyond. We regret that our report did not describe in detail the reliability problems of publicly available data sources we explored but did not use. Dr. Miller's analysis of WCFP data, which finds sharp increases in spending in Arizona but not Maine, is consistent with what might be found with incomplete data for Arizona in the years prior to 2000.
Dr. Miller also disagrees with our decision to include uncontested races in our analysis of campaign finance spending. We chose to analyze the potential impact of campaign finance laws on the spending of all candidates, not just those in contested races, because we agree with Dr. Miller's later statement that public financing of campaigns “appears to alter the behavior of all candidates, not just those that accept subsidies.” Additionally, from the perspective of state appropriators, the costs of publicly financed election campaigns will include all candidates, regardless of whether the election is contested. The research that Dr. Miller presents parallels several of the analyses already presented in Figures 12 through 18 of our report and Tables 39 and 40 of our e-supplement, including breakdowns of candidate spending in races with incumbents, challengers running against incumbents, and challengers running for open seats.
With respect to our analysis of independent expenditures in Arizona, Dr. Miller questions our decision to accept the word of state officials. He asserts that our “anecdotal” evidence is insufficient, despite the fact that it is based on information from the two state entities responsible for maintaining data on independent expenditures: the Arizona Office of the Secretary of State and the Arizona Citizens Clean Elections Commission (CCEC).
In the introduction to our report and our appendices, we describe how we were unable to draw conclusions about changes in independent expenditures in Arizona because of limitations with available electronic data from official sources. According to the official responsible for maintaining data in Arizona, the office unsuccessfully attempted to retain candidate names during a computer upgrade and therefore lost information on intended beneficiaries of independent expenditures from 2000 to 2006. Additionally, the system prior to 2008 did not distinguish independent expenditures made on behalf of specific candidates or ballot initiatives. Officials from the Arizona CCEC suggested that prior to 2008, candidates did not inform the CCEC of independent expenditures by opponents in a systematic way. For example, a participating candidate might bring a mailer or a recording of a robo-call as evidence to trigger matching funds, and the CCEC would use this information to try to find the campaign finance report and candidate name associated with the expenditure through the Secretary of State. Additionally, officials at the CCEC told us that prior to 2008 they could not track independent expenditures used in allocating matching funds to participating candidates.
Dr. Miller's statement that a “very high percentage of benefiting candidates were identified online between 2000 and 2006” is compatible with our finding that Arizona did not systematically collect data for all candidates prior to 2008. Although analyzing incomplete data on independent expenditures may be useful as a starting point for academic research, we were unwilling to risk making erroneous policy recommendations when the people who collected the data cautioned us about gaps and inconsistencies in the computerized data available for analysis.
Candidate participation
When discussing our analysis of candidate participation in public financing, Dr. Miller suggests that the fact that participating candidates in Maine and non-participating candidates in Arizona had more success in their electoral bids “says little about the relationship between public funding and electoral outcomes,” because it does not account for the “potential role of obvious factors such as partisan affiliation or incumbency status of candidates.” We agree that a multivariate analysis capable of ruling out confounding factors would provide more persuasive evidence on participation in the program and electoral success. Our analyses were designed to investigate changes over time in participation, as well as changes in the competitiveness of elections and the advantage of incumbency status. They were not intended to conclusively estimate the impact of public financing on the probability that individual candidates with specific characteristics won or lost. For that reason, the text of the report does not claim that participation causes success, and the tables on participation by success include a note that the data do not “provide evidence that program participation influences an individual candidate's likelihood of winning” (see, for example, GAO-10-390, p. 23). We agree that such an analysis would require reliable longitudinal demographic information on state legislative candidates, along with additional measures of race and district characteristics, but we do not believe that readily available sources of longitudinal data are sufficiently reliable for this purpose.
Dr. Miller suggests that to gauge the perceptions of all candidates, one must survey all candidates. We agree and laud the research he presents, and note that in our previous report on public financing of campaigns, we did survey all candidates in Maine and Arizona. However, in this report we did not intend to measure the perceptions of all candidates or generalize beyond the small number of candidates we interviewed. Rather, we sought to gather a range of example opinions about campaign financing from participating and non-participating candidates, winning and losing candidates, and candidates from different parties. We are sensitive to Dr. Miller's criticism that readers lacking statistical training could mistakenly assume that the results of our candidate interviews are generalizable. For that reason, the results we highlight are accompanied by the number of candidates making such a statement; we do not calculate percentages or imply that we spoke to more than a handful of candidates in each state.
Interest groups
Dr. Miller questions the validity of the method we used to assess public perceptions of interest group influence, because we did not test respondents' knowledge about the campaign finance laws in each state. He argues that our method expands our sample “beyond realistic dimensions” and that few citizens can accurately gauge the role of interest groups in their states. We agree that few citizens can accurately gauge the role of interest groups in their states; however, our survey did not ask them to offer such an assessment, largely because we did not seek to estimate the actual influence of interest groups in each state. Rather, we screened survey respondents to limit our sample to those individuals who said they were familiar with the financing law in each state, and allowed them to report on perceptions including whether they felt the law affected their personal confidence in government. The accuracy of citizen perceptions about the laws does not necessarily diminish the value of measuring them.
Competition
We agree that it would be valuable to further evaluate the effects of public funding on the composition of the candidate pool. However, some of the research Dr. Miller cites uses data that do not meet GAO's standards of reliability to support findings and policy recommendations to Congress. To collect data on candidate demographics, GAO would need to consult reliable, nonpartisan, and official sources, such as state governments, or undertake an independent survey of candidates. Information from newspapers, websites, political parties, and other secondary sources would require extensive corroboration and testing before we could determine whether they were sufficiently reliable for this purpose. We did not assess the reliability of all available data on candidate demographics, or seek to collect original data on these topics, because an analysis of candidate demographics was beyond the scope of our review.
We disagree with Dr. Miller that the “proper” definition of the treatment analyzed should have been whether a challenger accepted public funding. The program evaluation and causal inference literatures distinguish between policy treatments that are assigned and treatments that are taken by program participants (candidates in this case). Evaluations of how an available but optional program achieves a desired outcome among eligible participants are generally known as “intention-to-treat (ITT)” analyses, while evaluations of the effect of treatments among actual program participants are known as “treatment-on-the-treated (TOT)” analyses. Shadish, Cook, and Campbell (2002, 319) define related concepts of “effectiveness” and “efficacy” analyses. In effectiveness analyses, researchers evaluate the impact of a program as implemented in a particular time and place among eligible participants. Effectiveness analyses estimate the impact of a program as implemented, recognizing that “haphazard standardization and implementation are so characteristic of many social interventions that stringent standardization would not well represent treatment in practice” (Shadish, Cook, and Campbell, 319). By contrast, in efficacy analyses, researchers aim to validate a set of practices in ideal circumstances while ignoring implementation problems such as low participation rates that might affect the impact of the program in practice among program participants.
For campaign finance reform in Maine and Arizona, we can distinguish between the effect of making public funding available and the effect of one or more candidates' accepting public funding in a particular election. The former is an effectiveness or ITT analysis, while the latter is an efficacy or TOT analysis. These are separate treatments of interest, with unique theoretical effects on competition. Neither type of analysis is inherently “suboptimal” or incorrect, as Dr. Miller argues. Each analysis simply answers a different research question.
Policymakers, such as our congressional clients, are often more interested in an ITT analysis of policy changes that have already occurred. Once a set of practices has been implemented, policymakers typically want to know whether the particular implementation of the program achieved its intended outcomes. To this end, Shadish, Cook, and Campbell (2002, 320) note that “[t]he inference yielded by the intent-to-treat analysis is often of great policy interest because if a treatment is implemented widely as a matter of policy…imperfect treatment implementation will occur. So the intent-to-treat analysis gives an idea of the likely effects of the treatment-as-implemented in policy.” Implementation problems are particularly important for campaign finance reform, because policymakers are better able to control whether public funding is available than whether candidates accept it. To assess effectiveness, therefore, it is appropriate to conduct an ITT analysis that defines the policy treatment in Maine and Arizona as elections for which public funding was available, as we did in our report. This approach simply recognizes that one of the characteristics of a policy or program that must be assessed in evaluating its effectiveness is how broadly, or successfully, it was implemented.
Dr. Miller's definition of the treatment, along with his recommended analysis of variation over time in the use of public funding within Maine and Arizona, would ignore failures of program implementation. To see this, imagine that there was only one district out of several hundred districts in a state in which a challenger accepted public funding. In this case, the statistical method that Dr. Miller recommends would compare the difference between the change in competitiveness in this one district versus the average change in all other districts.1 Assume that the lower bound of the impact estimate is large and positive and that self-selection into the treatment is ignorable. Should we conclude that public funding “worked”? From an effectiveness perspective, the answer likely would be “no.” Although the one election with a publicly funded challenger was more competitive, the reform failed to persuade many candidates to participate and thus failed to deliver the program as intended. From an efficacy perspective, however, we might conclude that the program itself was effective in the one election where a candidate took the treatment. Each analysis answers a different research question, and neither approach is methodologically incorrect. We welcome the type of efficacy analysis that Dr. Miller proposes, since “it would make little sense to pursue a treatment that does not perform satisfactorily under optimal conditions,” such as full candidate participation (Shadish, Cook, and Campbell 2002, 319). An analysis that considers a candidate's decision to participate, in addition to the availability of public funding, would help build the evidence base on the particular institutions used in Maine and Arizona. Within-state research designs, similar to Malhotra (2008), are valuable for their ability to hold constant the many state-level variables that affect election outcomes. We simply disagree with Dr. Miller that our ITT analysis of the policy change as implemented was “suboptimal” in any absolute sense.
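The one-district thought experiment can be made concrete with a small Python sketch. All numbers are invented for illustration and come from neither the report nor Dr. Miller's analysis; the variable names and the 300-district state are assumptions. The sketch contrasts a comparison based on acceptance of public funding with one based on its availability.

```python
from statistics import mean

# Hypothetical changes in a competitiveness measure (post minus pre), by district.
# "State X" makes public funding available, but only one challenger accepts it.
accepting_district_change = 20.0
other_district_changes = [1.0] * 299       # State X districts with no publicly funded challenger
comparison_state_changes = [1.0] * 300     # districts in a state without public funding

# Acceptance-based contrast: the one accepting district versus
# the average change in all other districts of State X.
acceptance_contrast = accepting_district_change - mean(other_district_changes)

# Availability-based (intent-to-treat) contrast: every district in State X,
# whether or not funding was accepted, versus the comparison state.
state_x_changes = [accepting_district_change] + other_district_changes
availability_contrast = mean(state_x_changes) - mean(comparison_state_changes)

print(f"acceptance-based contrast:   {acceptance_contrast:.3f}")
print(f"availability-based contrast: {availability_contrast:.3f}")
```

With these made-up numbers, the acceptance-based contrast is 19 points while the availability-based contrast is well under one point. The two figures answer different questions: the first concerns the effect where funding was taken up, the second the effect of the program as implemented across the state.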
The definition of the policy treatment matters for the choice of research design. A district fixed-effects analysis within Maine or Arizona, as Dr. Miller recommends, would use variation in the acceptance of public funding over time to estimate the impact of the program. Since the availability of public funding varies over time but not across districts within Maine and Arizona, respectively, a district fixed effects analysis of effectiveness limited to those two states would not estimate a difference-in-difference but, instead, would estimate the change in mean competitiveness before and after the policy change. The analysis would become an interrupted time-series and lose the critical control for all unobserved, time-constant heterogeneity that fixed-effects analysis provides, and thus would threaten the internal validity of the analysis. To restore this control, we compared the change in competitiveness in Maine and Arizona to the change in several comparison states. The comparison states provided variation in the use of public funding over time and across governments, which is required for a fixed-effects analysis to estimate a difference-in-difference and control for all time-invariant heterogeneity. The intervention of interest was a state-level policy change that requires state-level variation and, therefore, the use of comparison states.
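The identification argument above can be illustrated with a minimal Python sketch, which is not the model estimated in the report. It assumes a balanced panel in which all treated states adopt at the same time; the state labels, outcome values, and function names are hypothetical. Under those assumptions, a regression with state and election-cycle fixed effects reproduces the simple difference-in-difference of group means.

```python
from statistics import mean

# Hypothetical outcome (e.g., percent of contested races) by state and election cycle.
panel = {
    "T1": [60.0, 62.0, 70.0, 72.0],   # treated states: funding available in cycles 2 and 3
    "T2": [50.0, 51.0, 58.0, 61.0],
    "C1": [55.0, 56.0, 58.0, 59.0],   # comparison states: funding never available
    "C2": [65.0, 66.0, 67.0, 70.0],
    "C3": [45.0, 47.0, 48.0, 50.0],
}
treated = {"T1", "T2"}
post = [False, False, True, True]


def did_from_means(panel, treated, post):
    """(Post minus pre change in treated states) minus (same change in comparison states)."""
    def change(states):
        pre = mean(y for s in states for y, p in zip(panel[s], post) if not p)
        aft = mean(y for s in states for y, p in zip(panel[s], post) if p)
        return aft - pre
    return change(treated) - change(set(panel) - treated)


def did_from_two_way_fixed_effects(panel, treated, post):
    """Coefficient on the availability indicator after removing state and cycle means."""
    states = sorted(panel)
    periods = range(len(post))
    y = {(s, t): panel[s][t] for s in states for t in periods}
    d = {(s, t): float(s in treated and post[t]) for s in states for t in periods}

    def demean(x):
        # Two-way within transformation for a balanced panel.
        grand = mean(x.values())
        by_state = {s: mean(x[s, t] for t in periods) for s in states}
        by_period = {t: mean(x[s, t] for s in states) for t in periods}
        return {(s, t): x[s, t] - by_state[s] - by_period[t] + grand
                for s in states for t in periods}

    y_w, d_w = demean(y), demean(d)
    return sum(d_w[k] * y_w[k] for k in y) / sum(v * v for v in d_w.values())


print(did_from_means(panel, treated, post))
print(did_from_two_way_fixed_effects(panel, treated, post))
```

If the panel is restricted to the treated states, the availability indicator no longer varies across units within a cycle. Its demeaned values are then all zero, and the estimator is undefined once cycle effects are included, which is why comparison states are needed.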
For a state-level research design, the most critical covariates are those that influence the timing of the adoption of public funding. This reduces to modeling the decisions of referendum voters to reform their campaign finance systems at a particular time. Table 20 of our report showed that the comparison states were similar to Maine and Arizona (the treatment states in our work) on several state-level variables that could plausibly affect the timing of adoption. Figures 8 through 11 showed that the trends in various measures of competition were generally similar before 2000 in the treatment and comparison states. In effect, we used a matched pair design and provided evidence of covariate balance, as Dr. Miller recommends, but we controlled for variables that could have influenced assignment to the availability of public funding at the state level, not the acceptance of public funding at the district level. Our report acknowledges that, like any quasi-experimental design, our analysis may be sensitive to the choice of comparison states, as Dr. Miller argues. To lessen this source of bias, we used several comparison states to account for uncontrolled variation over time in any one pair, even though the additional data required substantially more effort to compile and analyze.
The “observable characteristics of both the politician and the district” that Dr. Miller mentions are arguably less relevant controls for a state-level analysis. It is hard to imagine how the changing political experience, demographics, or partisanship of particular candidates or districts would have affected the aggregate voting decisions of the referendum voters and legislators. In fact, the state-mandated nature of the policy change is an important source of control. Since candidates or voters in a particular district could not choose whether public funding was available, district-level variables were unlikely to be correlated with treatment assignment. Aggregate trends in the state (and thus across all districts over time) might have plausibly affected the timing of the policy change, but our analysis controls for these trends with fixed effects for each election cycle (see GAO-10-391SP, p. 51). In any case, reliable longitudinal data on partisanship, demographics, and campaign experience for each year and district were neither available nor necessary for our use.
Campaign strategy and candidate behavior
The overall potential impact of public financing of congressional campaigns on candidate strategy is important but largely outside the scope of our study. Our interviews with limited numbers of candidates and interest groups in Maine and Arizona provided examples of changes the interviewees attributed to public financing, but we cannot generalize these examples to the population of candidates in those states. We are pleased to hear that our findings are consistent with additional ongoing survey research on candidate strategy.
Mass political behavior
Our 2003 and 2010 reports identified five goals of public financing programs, including increasing voter participation, based on our review of the history of Maine and Arizona public financing programs and discussions with key officials in each state. We were not asked to justify the goals of public financing in Maine and Arizona, such as increased voter turnout. We agree with Dr. Miller that an examination of statewide turnout statistics is unlikely to capture the causal relationships between public funding and voter turnout. Accordingly, our report states “changes in voter turnout cannot be attributed directly to public funding as there are a number of factors that affect voter turnout.” If consistent and reliable data exist, the analysis of ballot roll-off that Dr. Miller suggests might prove interesting but still could not be attributed directly to public financing.
Discussion
In summary, we agree with Dr. Miller that while our report is a wide-ranging evaluation of public financing in state legislative elections in Maine and Arizona, it is not (and was not intended to be) the definitive study of public financing programs. We disagree, however, with his assessment that these limitations are due to existing data sources eluding us or to suboptimal methodological choices. Instead, our report adheres to GAO's strict standards of data reliability, addresses limitations in current academic practice, and uses methods that are appropriate to respond to our congressional mandate. In his article, Dr. Miller highlights many additional and interesting avenues of research that would help inform the debate on public financing and political behavior as he defines them. As he notes, many of these avenues will require labor-intensive efforts to collect updated or enhanced data, such as information from local newspapers or party websites. We welcome scholarship that expands the data that are available for analysis, particularly the creation of reliable, longitudinal databases on candidates and state-level elections. Because GAO's role was to address specific issues of interest to Congress, we did not undertake such an endeavor. We believe that our analysis of competition is designed to facilitate causal inference, and that we openly and fairly disclose the limitations of our analysis. We appreciate that Dr. Miller has taken the time to review our study in such depth and welcome further efforts by academic and policy researchers to evaluate public campaign funding programs.
Footnotes
1. We recognize that Dr. Miller likely would not apply this method with only one candidate participating in the program, but the example helps illustrate the differences between his proposed method and the method we used in the report.
