Abstract
Background:
Maintaining the independence of contracted evaluations of government programs presents significant contracting challenges. The ideal outcome for an agency is often both the impression of an independent evaluation and a glowing report. In this, independent evaluation resembles the financial statement audit: firm management wants a public accounting firm to attest to the fairness of its financial accounts, yet also wants to be allowed to account for transactions as it sees fit. In both cases, the evaluation or audit is conducted on behalf of outsiders (the public or shareholders) but is overseen by a party with significant interests at stake in the outcome (the agency being evaluated or the executive management of the firm).
Method:
We review the contracting strategies developed to maintain independence in auditing. We examine evidence on the effectiveness of professionalism, reputation, liability and owner oversight in constraining behavior in auditing. We then establish parallels with contracting for evaluations and apply these insights to changes that might maintain and improve evaluator independence.
Conclusions and Recommendations:
By analogy with the auditing reforms of the Sarbanes–Oxley Act of 2002, we recommend exploring the use of a reformulated Technical Working Group to encourage prompter release of more evaluation results and to help insulate evaluators from inappropriate pressure to change their results or analytic approach.
Introduction
In both the private sector and the public sector, good decision making and accountability require good information about outcomes and their causes. In the private sector, a crucial information source is the annual financial statement; in the public sector, a similarly crucial source is the impact evaluation. In the case of the financial statement, the validity and fairness of management’s statements are primarily ensured by the financial statement audit. In the case of a public sector impact evaluation, an expert evaluator carries out the entire assessment.
There are many parallels between the two activities, both in their intent (e.g., information production) and in the structure of the relationships and interests involved. In particular, auditing and evaluation share the problem of a disjunction between who oversees the contract and who uses the results of the audit or evaluation. These parallels have also been noted by Klerman (2010) and Picciotto (2008).
Traditionally, corporate management has overseen the audit, though the results are used by the broader financial community: current shareholders, potential future shareholders, creditors, competitors, and government regulators. 1 Such management oversight of the audit is potentially problematic: Corporate management wants both the required, certified set of accounts and the ability to characterize results favorably. Despite their extensiveness, accounting rules give the auditors and management considerable discretion as to how to report financial results (O’Reilly et al. 1998). Experience and analyses of the incentive structure suggest that auditors may sometimes allow that discretion to serve the narrow needs of management over the needs of the intended audience of the financial statements (Fudenberg and Tirole 1995; Healy and Palepu 2001; Brown 2007, 2011a). Partially in response to these perceptions that auditors were not sufficiently constraining management’s self-serving accounting choices, the 2002 Sarbanes–Oxley Act (SOX) required moving oversight of the annual financial statement audit from management to an independent subcommittee of the Board of Directors.
Similar issues exist in evaluation. Federal impact evaluations are traditionally overseen by a government agency that is closely related to the agency whose program is being evaluated, while the results will be used by the broader policy community to set funding levels and to specify program strategies. Learning what truly works is a major goal of government-sponsored impact evaluations. However, impact evaluations also have implications for the job security and professional status of overseeing staff, and for the political goals of the administration that they serve. Such concerns about the findings of an evaluation are especially salient when the evaluation is required by some external organization (e.g., a Congressionally mandated study) or will be used by some external agency (e.g., the Office of Management and Budget (OMB), Congressional Budget Office (CBO), Government Accountability Office (GAO), and Congressional committees). For some parties, the ideal outcome will be both an independent evaluation and a glowing report, whether or not the glowing report is justified. This conflict of interest can cause the entity overseeing the evaluation contract to pressure the evaluators to produce results that advance that entity’s interests even when, in the professional opinion of the evaluator, the totality of the findings does not support such a conclusion.
As in the case of financial reporting, the agency’s and the evaluator’s decision making is complicated by the reality that evaluation results are rarely unambiguous. What to report is—as it is in accounting—subject to considerable discretion, so agency officials may perceive their own decisions as disinterested when, in fact, they are affected by self-interest. Analogously, evaluators may perceive that their own decisions are disinterested when, in fact, they have been affected by pressure from the client—explicit, implicit, or simply as projected from the evaluator onto the client—toward the desired finding. The limited available evidence suggests that, as in auditing, this disjunction between the interests of those providing formal oversight and those of the broader audience can influence what is reported publicly, and when, in a way that sometimes causes problems.
These conflicts of interest between who provides oversight and the broader audience for the product have been far more intensively studied in accounting than in evaluation. Similar questions have more recently been raised in other fields as well. 2 Managing this particular type of conflict of interest, however, has been a central concern of accountants and policy makers for most of the last century and was the impetus for major legislation, from the Securities Acts of 1933 and 1934 through SOX in 2002. 3 This article reviews the auditing literature and extracts insights for how to structure evaluations to minimize problems resulting from this disjunction. Our main recommendation follows directly from the audit experience: increase the alignment between those providing oversight and the broader audience for the evaluation by changing who provides at least some important aspects of the contract oversight.
The article makes its argument in six sections. The next section presents a review of the issues in contracting for independent impact evaluation (primarily of government programs). The third section provides a brief overview of the financial statement audit process and discusses its parallel contracting challenges. The fourth section presents a review of the avenues by which auditors and auditing regulation have sought to improve independence, the core of the article.
The last two sections return to the issues related to contracting for impact evaluations, which are the motivating issues for the readers of Evaluation Review. In the fifth section, we make the analogy from auditing to evaluation and recommend several changes to current practice. The article concludes with a brief review of the argument and the relation of the possible benefits to the likely costs of reform.
Finally, some brief notes about language. With respect to evaluation, we use the term agency to refer to the government agency that oversees the evaluation. That agency is often the “evaluation shop” of the agency (or cabinet department) that runs the program being evaluated. We use the term evaluator to refer to the corporate entity that holds the contract from the agency to perform the evaluation.
With respect to auditing, we use the term audit to refer to the mandatory external review of the annual statements of the financial condition of a publicly traded corporation. We use the term auditors to describe the people and firms who provide audits, and “accountants” to describe the employees of the audited company who help corporate executives prepare the financial statements. Both sets of people generally have earned the professional designation of Certified Public Accountant (CPA). 4 While audits are also required for government agencies and many nonprofits, we focus solely on corporate audits. Since the regulatory requirements and nature of the failures are somewhat different in each domain, covering the nuances across the fields is beyond the scope of the article. We focus on the corporate audit because it was the subject of SOX, whose reforms provide a starting point for our consideration of reforms in evaluation. However, the issues surrounding the challenges of maintaining auditor independence remain similar, regardless of the sector of the organization hiring the auditor.
We use the term management to refer to the people who run a for-profit company on a day-to-day basis. Management would usually include the Chief Executive Officer, the Chief Financial Officer, and other senior officers. We use the terms owner or ownership to refer to those who own stock in the company, most of whom have little direct contact with the company and can only exert control via proxy votes at board meetings or by buying and selling shares in the company.
Challenges to Contracting for Independent Evaluation
The OMB has emphasized the importance of “rigorous, independent program evaluation” in helping “the Administration to determine how to spend taxpayer dollars efficiently” (Orszag 2009). Similarly, when funding pilot programs, Congress often mandates an independent and impartial evaluation of the impact of the program.
The twin considerations of lack of in-house expertise and concerns about conflict of interest lead to the contracting out of most major impact evaluations. There are two challenges in such contracts. First, there are the standard contracting issues. A bidding process must be established leading to the selection of the “best” proposal. Then, that contract needs to be overseen to assure delivery of the rigorous evaluation specified in the best proposal.
Second, one of the motivations for contracting out major impact evaluations was to address the potential conflict of interest in an agency evaluating itself (Metcalf 2008). However, external contracting on its own may not be sufficient to address the problem (Reingold 2008). Inasmuch as the government employees overseeing the contract want to influence the findings in a particular direction, the tools needed to address normal contracting challenges provide multiple ways for the agency to induce the contractor to cooperate (Klerman 2010):
At the contract stage, the agency can give preference to contractors with a reputation for client service or cooperation—rather than independence.
During the term of the contract, standard contractual language prohibits the evaluator from releasing any results without the agency’s explicit permission.
The agency can refuse to accept a final report if it is unsatisfied with the analysis, what results are reported, or how those results are characterized.
Similarly, the agency can accept the final report but never publicly release it.
At the end of the contract, the agency can deem an evaluator uncooperative—for example, for having pushed back as to how to do the evaluation, what results to report, and/or how to characterize those results—and therefore deny the contractor a performance bonus.
After the contract, the agency can penalize uncooperative evaluators with negative past performance reports, thereby harming the evaluator’s ability to get future work.
Even though most evaluators are professionals and committed to scientific integrity, evaluation is a business. Future business for the firm and future employment for individual staff members provide a strong incentive to avoid triggering agency retaliation, whether that retaliation takes the form of a withheld payment or bonus on the current contract or a less than glowing performance review.
If evaluation were an exact, uniform service, designing remedies would be easier, since it would be straightforward to arbitrate disputes about whether the agency’s contracting leverage was used appropriately or not. However, two factors make designing remedies difficult. First, evaluation is an art. Reasonable and disinterested evaluators will differ about how to do the analysis, what results to report, and how to characterize those results. This ambiguity makes it harder to tell when an agency request is reasonable and when it is advancing some external agenda. It also makes it impossible to have a set of unambiguous and impartial standards that would dictate all choices made in the course of the evaluation.
Second, evaluators’ work quality is sometimes substandard. Sometimes they do a lousy job; sometimes they also attempt to advance interests—business, professional, political—beyond what is appropriate for their role or required by the results alone. It follows that conventional contractual oversight is needed; simply leaving the evaluator to do what it deems best is not a solution.
The prevalence and nature of any manipulation or suppression of evaluations are inherently difficult to measure. Contractual language and the desire for future business make contractors extremely reluctant to talk about attempts at such manipulation (and the success of such attempts). Though the extent of the problem is not fully known, from the instances of manipulation that have been made public, the existence of a problem is clear.
Metcalf (2008) describes in detail two publicly known examples of manipulation. In the first example, made public via a Freedom of Information Act (FOIA) request, U.S. Department of Agriculture (USDA) staff requested that the contractor delete one set of analyses—the analyses that the contractor believed were more appropriate, but that reflected poorly on the program under evaluation (the National School Lunch Program). The contractor refused. In response and without the evaluator’s permission, USDA staff simply deleted the analysis in question and released the report under the authorship of the contractor staff. Then, in reaction to contractor staff noting this change in a meeting with senior USDA officials, USDA submitted past performance reports for the contractor such that the contractor would be less likely to get future evaluation business.
In the second Metcalf (2008) example, the contractor delivered unfavorable results on the long-term impacts of a high-profile Department of Labor (DOL) program (Job Corps). DOL suppressed the results for more than 2 years. Suppressing results is a common form of manipulation. In this case, the results were only publicly released after DOL staff mistakenly posted them on a DOL website. Metcalf (2008) also discusses several other instances of attempted or successful manipulation or suppression of results.
The GAO examined the time from final report to public release for all evaluations submitted to the DOL’s Employment and Training Administration in 2008 (GAO 2010, 2011). They found that 20 of the 34 report releases were delayed between 2 and 5 years. Given that a common response in the field to unfavorable information seems to be delay in release or suppression of results (GAO 2011), the extent and length of these delays are troubling. Frumkin and Reingold (2004) and Reingold (2008) highlight the role political pressure and ideology can play in confusing the scientific inquiry process.
Closely related problems of government agencies burying scientific studies that run contrary to the agencies’ interests have been documented more widely, often by the press. For example, the Agency for Toxic Substances and Disease Registry (ATSDR) attempted to block the release of a report on environmental hazards in the Great Lakes states until it was released via a FOIA request (Kaplan 2008b). The Federal Emergency Management Agency (FEMA) suppressed a report on the risks of formaldehyde exposure from post-Katrina trailers until whistleblowers, the Sierra Club’s independent testing, and Congressional pressure brought the story to light (Kaplan 2008a). The formaldehyde event triggered a Congressional investigation into past problems with ATSDR, which consulted with FEMA on the issue; that investigation documented many instances of ATSDR suppressing or ignoring environmental health risk evidence (Majority Staff of the Subcommittee on Investigations and Oversight 2009). In the medical field, the Food and Drug Administration initially suppressed evidence that Vioxx significantly increased the risk of heart attacks (Graham 2004).
Even though there are more documented cases in health and safety research, it is unlikely that government agencies exert pressure only when they oversee the hard sciences and not when they oversee the social sciences. If anything, one might expect more pressure in the social sciences. Furthermore, as such incidents are regular subjects of private (off the record) discussion within evaluation firms and in the evaluation community more broadly, it seems clear that such incidents are far from uncommon.
The balance of this article argues that fundamentally this problem with contracting for independent evaluation arises from the disjunction between the true consumers of the evaluations—the Executive, Congress, the public—and the bureaucracy charged with managing the contract for the evaluation—often an agency closely aligned with the bureaucracy running the program being evaluated. In this, even though they are discussed using different terms, contracting for independent evaluation is like contracting for a financial statement audit.
Challenges to Contracting for Independent Audits
Companies, like government agencies, need to assess the effectiveness of their enterprise in reaching its goal—in this case, maximizing shareholder value—and to share that information with their owners, the shareholders. The primary vehicle for communicating with shareholders is the annual financial statement (along with the quarterly reports), which is mandated by the 1933 and 1934 Securities Acts for all publicly traded companies. As with an impact evaluation, the information conveyed in the financial statement is used both to understand the effectiveness of the company (i.e., its profitability) and to hold the management of the company accountable for their job performance. The accountability mechanisms tied to the financial statements are both direct—bonus payments and retention decisions are often tied to the content of the statements—and indirect—executive compensation is often largely in the form of company stock and stock options and therefore depends on the stock price, which reacts to the information in financial statements.
Management is responsible for preparing the financial statements, but because so much of their personal wealth and career prospects rests on the content of these statements, there is a huge temptation to misreport, especially if the company is doing poorly. Recognizing this, securities law requires that an external, independent auditor audit the financial statements. Auditors are required to conduct their audits according to Generally Accepted Auditing Standards (GAAS), 5 and, when warranted, to certify that the financial statements present the company’s financial position “fairly in conformity with Generally Accepted Accounting Principles” or GAAP (e.g., O’Reilly et al. 1998; Zeff 2007).
As with evaluation, however, simply moving the audit function outside of the company does not, on its own, guarantee that the auditor will conduct a tough and independent audit. Particularly since historically management was largely responsible for selecting the auditor, management was thought to use its contract oversight role as leverage to pressure the auditor to accede to their accounting choices:
At the selection stage, management can choose an auditor who is known to work with management to achieve their accounting goals.
Management can fire an auditor from the audit, and in some cases the auditor will not recover the fee, even if much of the work has already been done.
Management can fire the audit firm from its nonaudit service engagements (tax or management consulting).
Management can offer additional nonaudit business if the auditor acquiesces to desired accounting treatments.
Members of management can spread word through their social networks to hire—or avoid—an audit firm, depending on its cooperativeness.
Auditors are vulnerable to the financial pressure this leverage can put on their firm. They need to cover costs, earn sufficient profits to retain top-quality partners, 6 and retain market share to remain competitive. Particularly prior to the 1970s, auditors were also very vulnerable to social pressures: they were barred by professional ethics rules from advertising or engaging in competitive bidding and had to make aggressive use of personal connections and word of mouth to build their client base (Brown 2007). The 2002 SOX reforms (discussed in detail below) limit some forms of management influence, but considerable management influence remains.
Again, like evaluation methods, GAAP and GAAS do not completely specify how a transaction should be treated or an audit conducted (O’Reilly et al. 1998; Zeff 2007). More generally, in a complicated modern corporation, financial statements are the product of an accumulation of small decisions. GAAP and GAAS provide guidance for management’s accounting choices and auditors’ response, but that guidance leaves to management and the auditor considerable discretion. Armed with that discretion, what does the auditor do? Does the auditor acquiesce to the treatment most favorable to management? Or insist on the treatment she believes is most revealing to owners and other stakeholders of the true state of the corporation’s finances? In such situations, the psychology literature suggests that even professionals will sometimes lose perspective on appropriate behavior (Bazerman and Tenbrunsel 2011; Moore et al. 2006; Bazerman, Morgan, and Loewenstein 1997; Kunda 1990).
Two recent, well-publicized examples demonstrate how financial reports are manipulable in ways that significantly affect their informativeness while remaining legal (i.e., in nominal compliance with GAAP and GAAS) or at least in a gray area. The first example concerns Lehman Brothers, its auditor Ernst & Young, and the transaction known as Repo 105, which appears to have hidden the extent of the risk Lehman was bearing in 2007 and through September 2008, when its bankruptcy helped trigger the global financial crisis. 7 Regulators and financial markets monitor a bank’s ratio of debt to assets as a key proxy for the bank’s ability to absorb downturns in the market. The poor performance of Lehman’s assets (which included toxic assets such as mortgage-backed securities) was pushing its debt ratios up sharply. To hide the increase in debt ratios, Lehman engaged in a transaction it called Repo 105. Before each quarterly report, Lehman would “sell” some of its assets to an external entity (up to $50 billion worth), with an accompanying agreement to repurchase those assets, with interest, a few days after the reporting period was over. The bank then used the cash raised to pay off some of its other debt, lowering its debt ratio. Then, a few days after the reporting period ended, Lehman would go back to the market and reborrow the money in order to meet its repurchase agreements. This returned the debt ratio to its earlier (and higher) level.
The transaction has several notable aspects. First, such repo transactions are usually accounted for as loans, though there is an allowance in GAAP for a repo transaction to be accounted for as a sale—hence the room for discretion. Whether Lehman’s transaction met the technical requirements for it to be treated as a sale is unclear—at best, it was on the edge of legality, since Lehman could only find a U.K. law firm willing to attest to it being a sale, rather than a loan. Second, even assuming nominal compliance with GAAP, the clear intent of the transaction was to hide the magnitude of Lehman’s debt. Third, owners, creditors, and regulators would have wanted to know about the transaction. Finally, the auditor, Ernst & Young, was aware of the transaction (though apparently not its scale), understood its intent, and nevertheless allowed Lehman to treat the transaction as a sale. Ernst & Young could have, but did not, either require Lehman to treat the Repo 105 program as a loan or, at least, require it to disclose the unusual treatment and report the impact on the financial accounts of doing so. In summary, the auditor made a series of decisions that served the interests of management at the expense of the interests of the owners and other stakeholders.
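The effect of the sale-versus-loan choice on the reported debt ratio can be illustrated with a small numerical sketch. Every figure below is hypothetical and chosen only for round arithmetic (the 600 and 570 are not Lehman’s actual balances, and the 50 merely echoes the upper bound mentioned above). The class and function names are likewise illustrative assumptions, and the model ignores interest, collateral haircuts, and every other detail of the real transactions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BalanceSheet:
    """Stylized balance sheet in $ billions (hypothetical figures)."""
    assets: float
    debt: float

    @property
    def debt_ratio(self) -> float:
        return self.debt / self.assets


def repo_as_sale(bs: BalanceSheet, amount: float) -> BalanceSheet:
    """Sale treatment: the securities leave the books in exchange for cash,
    and the cash is then used to repay other debt. Assets and debt both
    fall by `amount`."""
    return BalanceSheet(bs.assets - amount, bs.debt - amount)


def repo_as_loan(bs: BalanceSheet, amount: float) -> BalanceSheet:
    """Loan treatment: the securities stay on the books, the cash received
    adds to assets, and the repo is recorded as new borrowing. Using the
    cash to repay other debt then reverses both increases."""
    assets = bs.assets + amount   # cash in, securities retained
    debt = bs.debt + amount       # repo recorded as a liability
    return BalanceSheet(assets - amount, debt - amount)


# Hypothetical quarter-end position before the transaction.
before = BalanceSheet(assets=600.0, debt=570.0)
as_sale = repo_as_sale(before, 50.0)
as_loan = repo_as_loan(before, 50.0)

for label, bs in [("before", before), ("as sale", as_sale), ("as loan", as_loan)]:
    print(f"{label:8s} assets={bs.assets:6.1f} debt={bs.debt:6.1f} "
          f"ratio={bs.debt_ratio:.4f}")
```

Because debt is smaller than assets, subtracting the same amount from both lowers the ratio, which is why the sale treatment flatters the quarter-end figure while the loan treatment leaves it unchanged.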
Our second example concerns earnings smoothing at General Electric (GE). Beginning in 1975, GE reported more than 100 consecutive quarters of steady earnings growth. Some of this pattern was due to true earnings growth, but some of the growth and most of its consistency was due to the aggressive use of flexibility in accounting rules (Birger 2000; Quinn 2008). Specifically, GE and its auditor used the flexibility and judgment that are required in accounting for aspects of GE Capital (its financial services arm) to decide when to recognize earnings. In practice, earnings were recognized in leaner periods and deferred in more flush periods, thereby significantly reducing the variance in earnings. Most methods of shifting earnings across time would constitute misreporting, but by manipulating the size of reserve accounts for adverse events (such as bad debts), GE could achieve the same effect legally.
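A stylized sketch can make the reserve mechanism concrete. It is not a reconstruction of GE’s accounting: the earnings series, the target path, the opening reserve, and the function name are all hypothetical assumptions chosen for illustration. The idea is simply that additions to a reserve reduce reported earnings in flush periods, and releases from the reserve raise them in lean periods.

```python
from statistics import pstdev


def smooth_with_reserves(underlying, target, opening_reserve=0.0):
    """Return (reported earnings, closing reserve) for a stylized firm.

    In each period the firm adds to a reserve when underlying earnings
    exceed the target and releases from it when they fall short. A release
    cannot exceed the balance currently held in the reserve.
    """
    reserve = opening_reserve
    reported = []
    for earned, goal in zip(underlying, target):
        change = earned - goal               # > 0 builds, < 0 releases
        if change < 0:
            change = -min(-change, reserve)  # cap the release at the balance
        reserve += change
        reported.append(earned - change)
    return reported, reserve


# Hypothetical figures: volatile underlying earnings, steady target path.
underlying = [10.0, 14.0, 9.0, 15.0, 11.0, 16.0]
target = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]

reported, closing_reserve = smooth_with_reserves(
    underlying, target, opening_reserve=1.0
)

print("underlying:", underlying, "std dev:", round(pstdev(underlying), 2))
print("reported:  ", reported, "std dev:", round(pstdev(reported), 2))
print("closing reserve:", closing_reserve)
```

In this example total reported earnings equal total underlying earnings over the six periods, so nothing is created or destroyed. Only the timing changes, and with it the apparent volatility.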
Again, note the details of the transaction. First, accounting rules allow considerable discretion, and the individual decisions were within allowable bounds. Second, in aggregate, the resulting set of financial statements created a misleading impression that GE’s earnings had little or no volatility. Despite the appearance of minimal risk that extended well into the 2000s, GE accessed $51 billion in loan guarantees from the Temporary Liquidity Guarantee Program that was initiated in response to the financial crisis (Malone 2009). It also had to raise emergency capital by selling $3 billion in preferred shares with a 10% dividend—very expensive terms for a company with such a long history of steady, predictable growth—to Berkshire Hathaway at the height of the crisis (Craig and Protess 2011). For most of GE’s run of improbably smooth earnings growth, no one has accused it of any illegal accounting. 8
Note that while neither example is an instance of clear-cut fraud on the part of corporate management, in both cases the managers exercised their scope for discretion within GAAP to their private benefit, at the cost of a fair representation of the condition of the company. While the extent of auditors’ responsibility has been vociferously debated for almost a century, since at least the 1969 Continental Vending court decision (United States v. Simon, 1969) auditors have been expected to certify that the financial statements are a “fair representation” of the economics of the company in a way that goes beyond technical compliance with GAAP (Zeff 2007). This implies an obligation on the auditor to force management to change accounting treatments that, while compliant with the rules of GAAP, are not, in the professional judgment of the auditor, a fair representation of the financial condition of the company, given the context of the decision.
Presumably the evaluation profession holds itself to an equivalent standard: It is not sufficient merely to conduct or report an analysis that applies an accepted method from the evaluation literature; the method must be appropriate to the question being researched and the available data, and the results must appropriately characterize the effectiveness of the program being evaluated (or at least the limitations of the methods must be explicitly acknowledged). However, this standard requires judgment combined with a close understanding of the context in which the analysis is taking place. The ambiguity of the situation, along with the potential for such judgment calls to have significant effects on the relevant outcomes, 9 offers an opportunity for corporate management or government agencies to exercise their contracting leverage to achieve their goals without needing the auditor or evaluator to commit clear acts of criminal fraud in the case of auditing, or scientific misconduct in the case of a program evaluation. Furthermore, it is difficult for outsiders, lacking the necessary context or the technical skills, to adjudicate the appropriateness of the decisions made.
Avenues to Auditor Independence
This then is the dual challenge: Why should we expect auditors to work in the interests of owners and other stakeholders, rather than in the interests of management that directly oversees their work? And, what institutional changes could we make to increase the influence of owners and other stakeholders (relative to the influence of management) on how auditors do their work? The received literature 10 includes four lines of argument as to why the private market might deliver high-quality, independent audits—that is, that serve the interests of owners and the broader financial community—even when the conflicted manager is responsible for contracting for the service: (i) commitment to professionalism; (ii) the market-disciplining effects of reputation; (iii) the deterrence of legal liability; and (iv) owner oversight of auditors. We now discuss, in turn, each of these lines of argument, considering the basic theories and their potential weaknesses.
Professionalism
The rhetoric of the accounting profession surrounding issues of independence has traditionally emphasized the notion of professionalism and character. Thus, for example, as a representative of the accounting profession, Gary Shamis testified before the Securities and Exchange Commission (SEC) in 2000 that rules limiting the ability of auditors to cross-sell consulting services were unnecessary because:
We are professionals that follow our code of ethics and practice by the highest moral values. We would never be influenced by our own personal financial well being versus our professional ethics. (Shamis 2000)
The AICPA’s standards of conduct make a similar appeal:
A distinguishing mark of a profession is acceptance of its responsibility to the public… In discharging their professional responsibilities, members may encounter conflicting pressures from among [their different stakeholders]. In resolving those conflicts, members should act with integrity, guided by the precept that when members fulfill their responsibility to the public, clients’ and employers’ interests are best served… Those who rely on certified public accountants expect them to discharge their responsibilities with integrity, objectivity, due professional care, and a genuine interest in serving the public. (AICPA 1988, ET §53.04)
Indeed, social psychology research helps explain how even well-intentioned groups have trouble enforcing norms and practices that run counter to their self-interest and have trouble even recognizing that there is a problem. Moore et al. (2006) apply a comprehensive review of that literature to the context of auditing. They demonstrate how such phenomena as motivated reasoning and strategic attitude shifting can undermine auditors’ intentions to serve the public interest.
We can see this play out in archival material from the internal communications of auditors. 11 There, much of the rhetoric around the definition of professionalism in accounting is not about independence but instead about client service—where the client is management, not owners, the public, or other stakeholders. In 1989, the head of Price Waterhouse expressed a representative sentiment: “Within the limits set by independence requirements, we have a professional responsibility to serve our clients’ needs. To use my favorite phrase, we have to be obsessed with client service” (O’Malley 1989). 12 However, such an emphasis on client service can induce selective perception and strategic attitude shifting when faced with potential evidence of a client manipulating its accounting (Moore et al. 2006).
Furthermore, in accounting, even the demands of professionalism as defined are conflicting in practice. For example, while the AICPA’s Standards of Conduct clearly imply that professionalism includes a duty to serve the public interest, professionalism also carries a requirement of client confidentiality, and confidentiality includes not sharing management–auditor communications with the board, shareholders, or the public, except in well-defined situations (O’Reilly et al. 1998). Even required communications between auditors and the board are generally carefully stage-managed and preapproved by management (Brown 2007).
This concept of confidentiality creates a socially undesirable incentive for management: Management is free to suggest an aggressive accounting treatment, since it is protected by the auditor’s professionalism from having such proposed aggressive treatment revealed to owners—who might punish management for attempting to hide an unfortunate reality from owners. Since there is little harm in attempting its preferred treatment, management is more likely to try, and will sometimes succeed, even under the scrutiny of the most independent auditor. This institutional detail can be shown to significantly reduce the value of an audit via the reduction of ex ante incentives to tell the truth (Brown 2011a, 2011b).
Reputation
Professionalism is not the only reason for auditors to “do the right thing.” A reputation for independence might lead to new business. Conversely, a loss of that reputation could lead to a catastrophic loss of current or future clients.
The argument is a straightforward extension of the argument for the need for an audit (Jensen and Meckling 1976; Watts and Zimmerman 1979, 1983; DeAngelo 1981). Suppose that there were two types of audit firms—independent ones and nonindependent ones. Now suppose that owners observe that management hired a known nonindependent auditor. Owners should take that as a signal that management has something to hide and fire management. In equilibrium, we should only observe managers hiring auditors with a reputation for independence.
Now consider a manager who tries to induce a reputable auditor to ignore problems in the financial statement or to go along with questionable accounting judgments. Beyond the obligations of professionalism, the auditor might weigh the benefits that management can offer—future auditing and consulting work, perhaps bribes—against loss of some or all audit work if the nonindependent behavior is detected and the auditor’s reputation is diminished. As long as the probability of detection is nontrivial, this line of argument should induce auditors to safeguard vigilantly their independence.
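The weighing described above can be written as a simple expected-value comparison. This is only a sketch: the symbols are illustrative notation assumed here, not taken from the cited literature, and the auditor is assumed to be risk neutral and to care only about these two monetary amounts.

```latex
% Illustrative notation (assumed, not from the cited sources):
%   B = value to the auditor of what management offers for going along
%   p = probability that the nonindependent behavior is detected
%   R = value of the audit work lost if the auditor's reputation is damaged
% Requires the amsmath package for \text.
\[
  \text{auditor remains independent}
  \iff p \, R \ge B
  \iff p \ge \frac{B}{R}
\]
```

Under these assumptions, if what management offers is worth one tenth of the audit work at risk, any detection probability of at least 10 percent would be enough to deter the auditor.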
The collapse in 2002 of one of the largest accounting firms, Arthur Andersen, as a consequence of its indictment for its role in the Enron fraud seems to support this theory well. As soon as it appeared to the market that Andersen’s audits were less credible than those of its competitors, Andersen’s client list melted away (Jensen 2006).
Yet, Enron was neither the first nor the last major accounting scandal to impugn the independence and/or the competence of a major accounting firm (Eisenberg and Macey 2004). Nevertheless, Enron was the only scandal to cause the collapse of a large accounting firm due to reputational concerns. The revelation of Lehman’s Repo 105 strategy and Ernst & Young’s approval of it has not cost Ernst & Young business or notably damaged its reputation; certainly, no one fired KPMG—or discounted the stocks of KPMG clients—because GE succeeded in smoothing its earnings. Similarly, in most major scandals, the audit firm involved has not suffered any major losses to either its client roster or its ability to charge fee premiums. This suggests that serving management at the expense of owners and the broader financial community does not hurt an audit firm’s reputation as much as is commonly thought. Indeed, a closer examination of the circumstances surrounding the collapse of Andersen suggests that the criminal indictment, not the reputational damage, was key to its demise.
It appears that in 2002 Congress did not judge reputation to be a sufficient constraint on auditor behavior. Historically, accounting firms sold both auditing services and consulting services to the same clients. If reputation were an effective constraint, this should not have been problematic. However, some observers expressed concern that the potential profits from consulting outweighed reputational concerns (Coffee 2006). To address that concern, the SOX legislation restricted companies’ ability to purchase some types of nonaudit services from their auditor. The independent Audit Committee of the board of directors must now also approve all allowed types of nonaudit services, strengthening oversight of the remaining relationships that might cause a problem.
Liability
The third major avenue by which auditors might be induced to maintain their independence in the face of attempts by management to coopt them is legal liability: the ability of owners to sue the auditor—and recover large damages—for auditing failures (e.g., an ex post discovery of a fraud or misstatement). Since evaluators currently do not face significant sources of liability, we only briefly touch on the hypothesis here.
For legal liability to prove an effective deterrent, there must be legal liability and the penalties that firms face must be sufficiently large relative to the benefits the auditor gets from compromising independence (Dye 1993). Neither condition appears to be consistently satisfied. Liability is only incurred through gross negligence or knowing collusion. However, we have already noted that auditing rules allow for considerable discretion. In the face of such discretion, it is difficult to assess legal liability (see, e.g., Judge Kaplan’s opinion to dismiss most charges against Ernst & Young with regard to their audit of Lehman Bros., In re: Lehman Brothers Securities and ERISA Litigation 2011). Thus, auditors can often satisfy management without resorting to behavior that would make them legally liable—even if a bad outcome occurs and the misleading accounting treatment is discovered.
Owner Oversight
If the role of the audit is to protect owners’ interests and if the problem is that management oversees the audit, then a direct response would be to increase owners’ role in overseeing the audit. Since shareholders are too dispersed to directly oversee the audit, their representatives in the form of the board of directors might take on an increased oversight role. Thus, boards could select the auditor and make the decision whether to renew the audit contract. Similarly, given the possibility of (and profits from) nonaudit work serving as a way for management to affect the auditor’s choices, boards might provide tight oversight of, or perhaps simply ban, other contracts between the auditor and the company.
In addition, boards might exert more oversight of the actual auditing decisions. They might insist that all major discretionary decisions in the audit be referred to them for review. In particular, boards might insist on the auditor discussing all changes made to the original financial statements prepared by management. Under this scenario, such changes would be discussed with owners even when the changes were resolved to the auditor’s satisfaction. Along the same lines, boards could pay bonuses to the auditor (as well as possibly disciplining or dismissing management) when the auditor discovered a major problem with the financial statements prepared by management (Feess and Nell 2002; Kornish and Levine 2004).
Prior to 2002, some of these board oversight activities were identified as best practices (e.g., Treadway Commission 1987) but not consistently used. In part, this may have been because boards of directors in many corporations are not themselves very independent of management (Bebchuk and Fried 2004). Directors are often chosen by management, have other relations with management (e.g., reciprocal directorships, other business relations, and friendship), and are compensated (e.g., with stock options) in ways that can make them similarly interested in misrepresenting the success of the company.
The board’s role in the audit changed with the passage of SOX in 2002. SOX implements many of these requirements (in fact, much of this list is drawn from the SOX reforms). Specifically, SOX requires that the auditor report directly to an independent Audit Committee. That Audit Committee is responsible for hiring and firing the auditor, as well as approving consulting relations (i.e., above and beyond the audit) between the firm and the auditor. Furthermore, SOX defines an “independent” Audit Committee as one that does not include any members of management and that meets some additional independence requirements.
Insights for Avenues to an Independent Evaluation
The previous section has reviewed four strategies discussed in the auditing literature for why auditors should serve the interests of owners and the broader financial community, rather than the interests of management that directly oversees the audit: professionalism, reputation, liability, and owner oversight. The previous section also discussed the SOX structural reforms to shift auditors toward serving owners and the broader investment community.
In this section, we apply those insights to evaluation. We have argued that, like auditing, evaluation has a disjunction between those who oversee the evaluation—the agency—and the broader audience for the evaluation—the broader policy community. The last of the auditing strategies, and the focus of the SOX reforms, focuses directly on increasing owner oversight. Our discussion begins with the equivalent of owner oversight: increasing the role of the broader evaluation community in oversight of impact evaluations. We then turn to the other strategies—reputation, professionalism, and liability—arguing that increasing the role of the broader evaluation community will strengthen the role of reputation and professionalism.
Contract Oversight
In the case of auditing, SOX reforms have strengthened the role of the board (representing owners) in dealings with the auditor. Here, we consider analogous changes with respect to evaluation. The place to start is to note that there is a clear need for contract oversight. Like any other contractual relationship, sometimes evaluators do not deliver or deliver substandard products. Similarly, evaluation is the sum of day-to-day decisions. Someone needs to oversee and guide those decisions on a day-to-day basis. Occasional meetings of an oversight board (as with an Audit Committee) do not replace the need for ongoing staff oversight and coordination with the evaluator. Klerman (2010) discusses these issues in detail and proposes some mechanisms to address them.
Granting the need for oversight and coordination with agency staff, this article’s discussion of auditing suggests that Klerman (2010) significantly underplays the potential role of stakeholder oversight in increasing independence while retaining rigorous contract oversight. Analogously with the SOX role for the Audit Committee, independent evaluations might receive oversight from a committee of stakeholders.
There already exists one mechanism to address these concerns: most evaluations currently have a technical working group (TWG) made up of interested and informed policy analysts (methodologists and substance experts drawn from academia, former officials in the agency, and other evaluators). The TWG reviews and comments on evaluation plans and draft deliverables.
However, in the face of agency attempts to steer an evaluation in a particular direction, the TWG’s current ability to ensure that the evaluation responds to the needs of the broader policy community is extremely limited. The TWG is selected by the agency. Then, the agency controls the flow of information—what and when—to the TWG. Furthermore, TWGs advise; they do not make decisions. Agencies can and do ignore a TWG’s guidance. In particular, final decisions about what to include in a report, how to characterize findings, and whether and when to release a report are usually made after the last TWG meeting. Finally, TWG members are usually required to sign nondisclosure agreements such that—both before and after release of reports—they are prohibited from discussing unreleased materials. Thus, TWG members are usually contractually prohibited from discussing the status of a report (e.g., why it has not yet been released) or the process leading up to what is reported (e.g., on what issues there was substantial discussion in the TWG, what information or analyses do not appear in the publicly released report).
Strengthening the TWG appears to be a promising, low-cost way to increase evaluation independence. While one could imagine a strong form of oversight such that the TWG was heavily involved (and thereby addressed more of the ways that an agency might influence an evaluator), in most cases, that seems unnecessary. Given ambiguity about the magnitude of the problem, shifting the bias modestly in the direction of broader stakeholder interests seems an appropriate first step. The crucial issue appears to be the flow of information to the TWG and from the TWG to the public. Thus, two changes might address the more egregious cases:
Remove contractual restrictions on the evaluator talking to a TWG member about issues related to the evaluation.
Authorize the TWG to resolve disagreements between the evaluator and the agency about the details of evaluation and whether material should be publicly released. Thus, the evaluator might raise an issue with the TWG and the TWG would make a binding decision. A small shift from current practice could require a supermajority of the TWG in order to overturn an agency decision. Alternatively, consideration of the professional competence of the evaluator and a bias toward public release might instead suggest requiring a supermajority of the TWG in order to overrule the evaluator or to deny public release. This would constitute a more aggressive shift from the status quo approach.
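The two supermajority variants described above differ only in which side prevails by default. A minimal sketch of that difference follows; the function names, the two-thirds threshold, and the assumption that every TWG member votes for one side or the other are all hypothetical choices made for illustration, not part of the proposal itself.

```python
from fractions import Fraction
from math import ceil


def votes_needed(members: int, threshold: Fraction = Fraction(2, 3)) -> int:
    """Smallest number of votes whose share meets or exceeds the threshold."""
    return ceil(members * threshold)


def resolve_dispute(votes_for_evaluator: int, members: int, default: str,
                    threshold: Fraction = Fraction(2, 3)) -> str:
    """Return the side that prevails: 'agency' or 'evaluator'.

    `default` is the side that wins unless a supermajority votes against it.
    Assumes no abstentions: every member backs one side (hypothetical).
    """
    if default not in ("agency", "evaluator"):
        raise ValueError("default must be 'agency' or 'evaluator'")
    if not 0 <= votes_for_evaluator <= members:
        raise ValueError("vote count out of range")

    votes_for_agency = members - votes_for_evaluator
    needed = votes_needed(members, threshold)

    if default == "agency":
        # Small shift from current practice: the agency's decision stands
        # unless a supermajority sides with the evaluator.
        return "evaluator" if votes_for_evaluator >= needed else "agency"
    # More aggressive shift: the evaluator's position (or public release)
    # stands unless a supermajority sides with the agency.
    return "agency" if votes_for_agency >= needed else "evaluator"
```

With a nine-member TWG split 5 to 4 in the evaluator's favor, `resolve_dispute(5, 9, default="agency")` leaves the agency's decision in place, while `resolve_dispute(5, 9, default="evaluator")` lets the evaluator prevail. The same vote therefore produces opposite outcomes depending on where the default is set.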
Clearly, the choice of who would serve on such a committee would be crucial. Nevertheless, it should be possible to identify stakeholders on all sides of an issue (e.g., relevant academics and policy analysts, representatives of CBO and OMB, perhaps, appointees from majority and minority leadership of relevant Congressional committees). This is especially true because with a supermajority decision rule or with nonpartisan experts, exact balance is not needed.
To see how this might work, consider the case of the Corporation for National and Community Service’s Youth Corps evaluation (Price et al. 2011). As discussed in the second section, agencies generally do not allow early dissemination of unfavorable results, preferring to sit on the study for as long as possible to ensure that the release is low profile and/or puts a positive spin on the findings. However, late in the review and release process of the Youth Corps study (ultimately released in June 2011), Reingold and Lenkowsky (2011) wrote an essay (published in the December 2011 issue of Public Administration Review; presumably written and reviewed many months earlier) on “The Future of National Service” which stated:
While the results of this evaluation are still being analyzed, the preliminary findings are not encouraging. Compared to a control group, participants in the Corps Network were less engaged in their communities or politics in the aftermath of completing the program. No behavioral changes were observed in labor market outcomes or many other indicators designed to capture changes in material well-being. Only modest positive changes were observed in participants’ goals for the future. While these findings are preliminary and may change upon further analysis, the effectiveness of the Corps Network appears to be limited, at best.
A reasonable expectation that results, or internal debates about what results to release and how to characterize them, might be released outside of the official report would have two positive outcomes. First, it would be a direct check on withholding of individual results or final reports. If an agency asked that a result be deleted from its final report, the TWG could ensure the deleted results were released in some other form. Similarly, if an agency refused to release a report under its own name, the TWG could release the results under its name.
Second, the knowledge that decisions might be overturned by a TWG would serve as a brake on agency interference with evaluator judgment. Agencies could continue to provide needed contract oversight. However, knowing that egregious interference would be identified, publicized, and overturned, agency personnel would probably self-censor some such interference, as they would know it would be less effective. This mechanism is analogous to the suggestion that auditors report to owners rejected proposals by management for favorable treatment of transactions.
Professionalism
Professionalism clearly has a role in evaluation. Evaluators interact routinely not only with other evaluators and their clients but also with the worlds of academia and public policy more broadly. They are also a multidisciplinary group with varying methodological loyalties. Evaluators publish in academic journals and therefore have an external check on their decisions, which can help maintain standards of rigor. There is a moderate amount of cycling of staff (especially senior staff) between evaluation firms and academia, government, and the policy community (e.g., nonprofits and advocacy groups), which provides a more constructive set of incentives than the cycling between auditor and management roles that occurs in accounting.
Furthermore, the American Evaluation Association has a set of Guiding Principles for Evaluators (http://www.eval.org/Publications/GuidingPrinciples.asp) similar to the AICPA’s Code of Conduct. These principles explicitly raise the issue considered here:
Freedom of information is essential in a democracy. Evaluators should allow all relevant stakeholders access to evaluative information in forms that respect people and honor promises of confidentiality. Evaluators should actively disseminate information to stakeholders as resources allow.
…
Evaluators should maintain a balance between client needs and other needs. Evaluators necessarily have a special relationship with the client who funds or requests the evaluation. By virtue of that relationship, evaluators must strive to meet legitimate client needs whenever it is feasible and appropriate to do so. However, that relationship can also place evaluators in difficult dilemmas when client interests conflict with other interests, or when client interests conflict with the obligation of evaluators for systematic inquiry, competence, integrity, and respect for people. In these cases, evaluators should explicitly identify and discuss the conflicts with the client and relevant stakeholders, resolve them when possible, determine whether continued work on the evaluation is advisable if the conflicts cannot be resolved, and make clear any significant limitations on the evaluation that might result if the conflict is not resolved.
However, these Principles conflict with current contracting practices. Standard contractual language for government impact evaluations prohibits “allow[ing] all relevant stakeholders access to evaluative information” and “discuss[ing] the conflicts with relevant stakeholders.” As a consequence, standard contracting agreements specifically preclude evaluators from applying this principle.
Therefore, the proposed changes to ownership oversight would probably increase the role of professionalism. The more likely it is that inappropriate professional choices (e.g., to serve the agency rather than the broader policy community) would be revealed, the larger the incentive to conform to professional expectations, rather than the narrow corporate interests of the evaluation firm (e.g., short-term profit). As with our discussion of auditors revealing to the Audit Committee denied auditing requests of management, a more involved TWG would probably lead to more revelation of deviations from professional standards. Finally, giving evaluators more ability to publish in academic forums would increase their affinity with academic standards of intellectual freedom and commitment to the truth and methodological rigor.
Increasing the Role of Reputation
In both auditing and evaluation, reputation for independence is a potentially valuable asset, but in both cases, its value is muted. The more that reforms increase the ability of evaluation firms to profit from their reputation, the more likely it is that evaluators will find ways to protect the independence of their employees and processes. Currently, however, the agency chooses the evaluator and agencies make it clear that “client service” is a major criterion for selection, not independence.
Increased stakeholder oversight would be likely to increase the value of a reputation for independence. If policy makers choose the evaluator and they seek independence, then evaluators would be more likely to compete on who would be more independent. Public revelation by a TWG of deviations from independence would also increase the ability of reputation to act as a constraint on behavior. For positive reinforcement, allowing evaluators more freedom to exercise their professional responsibilities by, for example, publishing results in academic, peer-reviewed journals would also facilitate development of the kind of reputation that is linked to independence and rigor.
Liability
The final mechanism to induce auditors to work for owners, rather than management, is legal liability. Liability does not appear to be operative in evaluation. Short of gross malfeasance or failure to deliver, evaluators do not appear to face any liability. Furthermore, currently the only party with standing to sue appears to be the government agency. Thus, ex post liability does not appear to be a major constraint on serving the agency rather than policy makers.
However, outsiders can “sue” for release of evaluation documents under FOIA. There has been some, but very limited, use of FOIA to force release of evaluation reports (the USDA case in Metcalf 2008, is one example of this; scientific reports buried by agencies have also been released via FOIA, as in Kaplan 2008b). Sometimes evaluation reports have been deemed internal policy documents and therefore not subject to FOIA. A formal definition of “independent evaluation” might explicitly state that evaluation materials are subject to FOIA (except for materials relating to human subject confidentiality or proprietary data considerations). In addition, if the evaluation process had clear expectations of report release and of the timing of such releases, this would facilitate external watchdog groups’ efforts to fulfill their mission. They would be able to track evaluation report due dates and regularly submit FOIA requests when those due dates pass.
Currently, in evaluations, the legal powers lie with the contracting agencies, which can use nondisclosure agreements to sue those who attempt to circumvent agency foot-dragging. These contract-enforcement powers have been documented to suppress research in various contract research situations, particularly in the private sector (Schulman et al. 2002; McGarity 2003). The inclusion of strong nondisclosure requirements in evaluation contracts does not appear to serve any purpose beyond providing the agency with additional leverage over the evaluator and the results of the evaluation.
Definition and Labeling
Requiring an independent evaluation and the strengthened TWG suggested here is only meaningful if that term is defined. There is not currently a formal definition of an independent evaluation. This is in contrast to auditing where the Securities Exchange Act of 1934 required an “independent audit” and provided a mechanism for defining the term. Furthermore, GAAS and the SEC provided (at least some) guidance on what constitutes independence, and SOX and its newly created Public Company Accounting Oversight Board (PCAOB) have considerably tightened the definition.
Currently, Congress sometimes requires an “independent evaluation.” Once the concept of independent evaluation receives more attention and formal definition, we would expect Congress to more frequently require an independent evaluation, since independence can add considerable value to the information provided in an evaluation. Similarly, Congress might require that all funds to certain agencies (e.g., National Institutes of Health grants, Institute for Educational Statistics contracts, and Centers for Disease Control studies) and designated “research and evaluation” funds for other agencies be used solely for independent evaluation.
Furthermore, once there is an official definition of an independent evaluation, agencies and programs looking for external confirmation of their effectiveness might request that evaluations of the effectiveness of their programs be conducted under the terms of an independent evaluation. They might do so because conducting an evaluation subject to the official definition of independent evaluation would enhance the credibility of the resulting evaluation.
The designation of independence could be given a more formal status in official government uses of studies. For example, an independent designation could be required for an evaluation’s inclusion in formal OMB, CBO, GAO, and National Academy of Sciences/Institute of Medicine (NAS/IOM) reviews of evidence. Other evaluations might be inadmissible or at least down-weighted. Refereed journals could also require that papers resulting from contracts state whether or not they were conducted subject to independent evaluation conditions (see Klerman 2010, for more discussion of suggestions of this form).
Toward a Definition of “Independent Evaluation”
We have argued that standards for independent evaluations should be crafted—as much as possible—so that evaluators would serve policy makers and the public rather than the agency being evaluated. Klerman (2010) discusses in detail several possible strategies for increasing evaluator independence. This article builds on those strategies by drawing insights from auditing and the SOX reforms to highlight the potential value in considering an expanded role for a TWG. Here, we collect some principles that might characterize an expansion in the role of the TWG:
Independent evaluations would have an oversight entity, like the current TWG, composed of substance and methods experts who together reflect the varied ultimate consumers of the evaluation.
While the agency would oversee the day-to-day details of contract management, there would be no restriction of communication between the evaluator and members of the TWG.
Issues of disagreement between the agency and the contractor would be discussed before the TWG.
There are several ways in which the last point could be implemented. One way would be for the TWG to arbitrate such disputes, determining what appears in the final report. A less intrusive strategy might be for the TWG to authorize parallel release by the evaluator of material not included in the official agency report. A third strategy would be for a TWG member to publicly discuss the agency–evaluator disagreement or to release the full report when—in the opinion of the TWG member—the public release was being inappropriately delayed by the agency. Klerman (2010) has a more thorough discussion of issues related to the public release of information that the agency wishes to suppress.
There are two instances where specific agencies, in the United States and the United Kingdom, are being prompted to move in the direction we are suggesting. Sissine (2012) touches on the possibility of removing the contract oversight from the agency under evaluation in the context of the Department of Energy. Picciotto (2008) offers specific recommendations on how to increase the independence of an in-house evaluator at the U.K.’s Department for International Development, which crucially involve increasing the role of its Independent Advisory Committee (roughly parallel to the TWG). Note as well that our principles could apply to an entity other than the TWG as long as its interests were reasonably well aligned with the ultimate stakeholders of the evaluation. Reingold (2008) reviews the alternative entities that could administer an evaluation contract other than the agency itself or the TWG.
Discussion
This article has reviewed the literature and experience on contracting for independent audits and made an analogy to contracting for independent evaluations. In a direct parallel with the SOX diagnosis of problems in auditing, we have argued that the fundamental challenge in contracting for independent evaluation is the disjunction between those who oversee the evaluation—usually closely related to the agency whose program is being evaluated—and those who need the results of the evaluation to make funding and policy design choices—the general public and its agents: the agency’s Inspector General, Program Analysis and Evaluation unit (or equivalent), OMB, Congress, and the policy community. While implicit, that disjunction is not explicit in the previous literature (e.g., Klerman 2010; Klerman, Baron, and Rolston 2010). This article has focused on the disjunction, considered its implications, and proposed structural reforms to lessen the negative effects of the disjunction.
Specifically, we propose structural reforms analogous to those that SOX uses to increase auditor independence: define independent evaluations and then shift some decision making power from the agency to a TWG or a similar entity. Furthermore, we argued that such an increased role for a TWG is likely to increase the effectiveness of two of the other mechanisms—professionalism and reputation.
Our suggestions are relatively low-cost measures that seem likely to support and nurture the culture of independence at program evaluation firms and centers. This is worth doing even if the incidence of pressure is relatively rare. As scientific, rigorous evaluation is increasingly demanded by government, which appears to be the prevailing trend, the stakes associated with evaluations are only going to grow. Setting in place in advance protections for independence will help establish a firm foundation for the evaluation industry’s continued relevance and value to the public discourse.
As in auditing, we do not believe that these changes are a complete or perfect solution. Even with a more powerful TWG, we need not worry that agencies will be powerless to exercise legitimate control of their contracts: there are still significant and costly hurdles to the evaluator raising an issue concerning direction from the agency. Evaluators may be reluctant to appeal to the TWG, just as they are now with other rights under contract. Thus, on net, the reforms proposed here would only modestly shift the balance from serving the interests of the agency toward serving the interests of the broader public, as represented by the broader public policy community. Nevertheless, a shift in that direction would be positive, helping to avoid or ameliorate at least the most egregious cases of interference.
Acknowledgment
We would like to thank Susannah Rose, Lisa Cosgrove, Mike Jones, Garry Grey, and Sheila Kaplan for advice and comments on previous drafts. We also thank the participants of the APPAM International Conference on Improving the Quality of Public Services and the Abt Journal Authors Support Group for their comments. Klerman’s work on this paper was supported by internal Abt research funds. This paper does not necessarily represent the position of Abt Associates or its clients.
Declaration of Conflicting Interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: No conflicts of interest beyond the fact that Jacob Alex Klerman is a contract researcher (currently with Abt Associates, previously with the RAND Corporation) and Abigail B. Brown was formerly a contract researcher at the RAND Corporation.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
