Abstract
Benford’s Law asserts that the leading digit 1 appears more frequently than 9 in natural data. It has been widely used in forensic accounting and auditing to detect potential fraud, but its application to nonprofit data is limited. As the first academic study that applies Benford’s Law to U.S. nonprofit data (Form 990), we assess its usefulness in prioritizing suspicious filings for further investigation. We find close conformity with Benford’s Law for the whole sample, but at the individual organizational level, 34% of the organizations do not conform. Deviations from Benford’s Law are smaller for organizations that are more professional, that report positive fundraising and administration expenses, and that face stronger funder oversight. We suggest improved statistical methods and experiment with a new measure of the extent of deviation from Benford’s Law that has promise as a more discriminating screening metric.
Introduction
There have been demands for increasing transparency and accountability of the nonprofit sector and calls to improve nonprofit financial reporting since the Sarbanes–Oxley legislation in the United States (Calabrese, 2011). Form 990 is the primary information disclosure mechanism of U.S. nonprofit organizations, providing extensive financial and other information. Each year, most U.S. nonprofits with gross receipts exceeding US$50,000 must file Form 990 (or 990-EZ) with the Internal Revenue Service (IRS) and make it publicly available. 1
The IRS (2017, p. 1) recognizes: “some members of the public rely on Form 990 . . . as their primary or sole source of information about a particular organization. How the public perceives an organization in such cases can be determined by information presented on its return.”
Specifically, donors and funders may review Forms 990 before making contributions, charitable rating services (e.g., Charity Navigator, the Better Business Bureau Wise Giving Alliance, Charity Watch) use 990 data to evaluate nonprofit financial accountability, and GuideStar publicly posts the Forms 990 of all registered nonprofits. Finally, panels of 990 data are a goldmine for scholars who advance knowledge on nonprofit behavior.
Therefore, the accuracy and reliability of 990 data are important. Yet, there are many problems. In particular, Form 990 requires nonprofits to divide their total expenditures into three functional categories: program, fundraising, and administrative. Many studies reveal the misreporting of functional expenses. Donors, regulators, charity watchdogs, the media, and the public have paid the most attention to the program ratio (i.e., the ratio of program to total expenses) as an important metric for nonprofit performance (see Garven, Hofmann, & McSwain, 2016 for a review). Therefore, nonprofits have an incentive to improve their program ratio by reallocating (thereby underreporting) some fundraising and/or administrative expenses to program expenses (thereby overreporting). Studies confirm that the patterns of functional expense reporting are consistent with stakeholder-impression management, as the reported program ratios are excessive, whereas the reported ratios of fundraising (or administrative) to total expenditures are too low (Jones & Roberts, 2006; Keating, Parsons, & Roberts, 2008; Krishnan & Yetman, 2011; Krishnan, Yetman, & Yetman, 2006; Trussel, 2003; Wing, Gordon, Hager, Pollak, & Rooney, 2006). Other studies find that nonprofits may also manipulate data to reduce their unrelated-business income tax obligations (Omer & Yetman, 2003, 2007). Some support their conclusions by comparing 990 data with audited financial statements (Burks, 2015; Froelich, Knoepfle, & Pollak, 2000; Keating & Frumkin, 2003; Krishnan et al., 2006). Others compare annual reports filed with state regulatory agencies against 990 data (Keating et al., 2008; Krishnan & Yetman, 2011). None use the approach taken in this article: comparing the distribution of reported numbers with that predicted by mathematical theory, namely Benford’s Law.
Benford’s Law asserts that the distribution of leading digits of natural data is non-uniform, that it is much more likely that the first digit will be a 1 (30% of the time) than a 9 (less than 5%). This law has been widely used to prioritize suspicious data for fraud investigations in the for-profit sector (e.g., Amiram, Bozanic, & Rouen, 2015; Archambault & Archambault, 2011). In this article, we apply Benford’s Law to a panel of 990 data from 501(c)(3) public charities and assess its usefulness as a screening tool in the nonprofit setting. Good screening would potentially enhance the effectiveness of limited investigative resources, deter fraud, and lead to more reliable nonprofit reporting and a more accountable nonprofit sector.
Specifically, we analyze compliance with Benford’s Law across the whole sample and by individual organizations. We develop hypotheses around three broad institutional characteristics that the literature has shown to be correlated with Form 990 reporting quality, namely, (a) professionalization, (b) institutional pressure, and (c) funder oversight. Our statistical tests provide support for all three sets of hypotheses, and the regression analyses find significant results for all sets of hypotheses except for institutional pressure. These results indicate the usefulness of Benford’s Law in picking up meaningful patterns of misreporting. In addition, we acknowledge that Benford’s Law test statistics are prone to generate false positives. It is important to recognize that Benford analysis should not be employed to determine misreporting, but rather to flag cases of potential misreporting for further investigation (Tam Cho & Gaines, 2007). Finally, we call attention to a neglected statistical test, the Freedman–Watson U² test, which is more appropriate for Benford analysis than the traditional tests used in previous work, and we employ a newly developed descriptive measure, excess mean absolute deviation (EXMAD), which is superior to the traditional mean absolute deviation (MAD; Barney & Schulzke, 2016).
The next section provides background on Benford’s Law and its limited applications to nonprofit financial data. Then, we develop our hypotheses, describe data and methodology, and present results. We conclude by discussing the implications of Benford’s Law for practice.
Literature Review
Benford’s Law
Examining library copies of logarithm tables, the American astronomer Simon Newcomb observed that the first pages dealing with low digits were far more worn out than the last pages. Based on this observation, he formulated a hypothesis that the 10 digits do not occur with equal frequency, with more numbers beginning with the lowest digits (1 or 2) than with the highest digits (8 or 9). Specifically, “the law of probability of the occurrence of numbers is such that all mantissae of their logarithms are equally likely” (Newcomb, 1881). Given this, natural numbers beginning with digit 1 should occur about 30% of the time, those beginning with digit 2 about 18% of the time, and so on, with the share decreasing down to digit 9, which should occur only 4.6% of the time. This phenomenon was later referred to as Benford’s Law. Newcomb did not provide any theoretical explanation, and his discovery did not attract much attention until 1938, when Frank Benford noted the same pattern again. Benford confirmed this hypothesis for 20 different types of naturally generated numbers (20,229 observations in total), ranging from atomic weights to street addresses of American Men of Science (Benford, 1938).
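The percentages quoted above follow from the usual statement of Benford’s Law, P(d) = log10(1 + 1/d) for first digits d = 1, ..., 9. The short sketch below simply evaluates that expression; the function name is illustrative and not taken from the article.

```python
import math


def benford_first_digit_probs():
    """Return {d: P(first digit = d)} for d = 1..9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


probs = benford_first_digit_probs()
for digit, p in probs.items():
    # Prints each digit with its expected share, e.g. about 30.1% for digit 1.
    print(f"{digit}: {100 * p:.1f}%")
```

The nine probabilities sum to one because the terms telescope to log10(10).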
For a long time, Benford’s Law was deemed a mysterious law of nature. Although the mathematical proofs for Benford’s Law are complicated, the natural relationship between growth rates and the log of first digits provides some intuition: the growth rate of a variable, which is a percentage change over time, is mathematically equivalent to the change in the log of the variable. If the total revenue of an organization is US$1,000,000, it must grow 100% before it reaches US$2,000,000 (leading digit of 2). It then needs to grow only 50% for the leading digit to become 3, and only 12.5% to raise a leading digit of 8 to 9. At a steady growth rate, revenue therefore spends more time with a leading digit of 1 than with any other digit, which makes 1 the most common leading digit (Nigrini, 1999).
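The growth-rate intuition can be checked with a few lines of arithmetic. This is a minimal sketch, assuming a quantity that grows at a constant rate; the function names are illustrative only.

```python
import math


def growth_to_next_digit(d):
    """Percentage growth needed to move the leading digit from d to d + 1."""
    return 100 * ((d + 1) / d - 1)


def share_of_time_at_digit(d):
    """Share of a tenfold increase spent with leading digit d,
    assuming a constant growth rate (equal steps in the logarithm)."""
    return math.log10((d + 1) / d)


for d in range(1, 9):
    print(d, growth_to_next_digit(d), round(share_of_time_at_digit(d), 3))
```

Under the constant-growth assumption, the share of time spent at each leading digit coincides with the Benford probability for that digit.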
Various proofs detail different factors leading to the Benford distribution. Boyle (1994) showed asymptotic convergence when random variables are arithmetically combined, and Hill (1995) showed convergence when data from diverse statistical distributions are combined. Because most accounting data are a combination of transactions, Benford analysis is appropriate (Durtschi, Hillison, & Pacini, 2004).
However, certain types of data do not conform to Benford’s Law (Durtschi et al., 2004; Schräpler, 2011). First, a binding maximum or minimum value for the data causes nonconformance. Second, assigned numbers, such as zip codes or check numbers, follow a uniform, rather than a Benford, distribution. Third, numbers influenced by human thought, such as ATM withdrawal amounts or price setting at US$9.99, are not Benford distributed. Fourth, data consisting of single transactions do not obey the law, either.
In addition, Benford analysis cannot detect fraud that involves the omission of transactions. It will also fail to detect fraud when made-up numbers conform to Benford’s Law. Nonetheless, there are only a few studies on this topic and they largely suggest that made-up numbers do not comply with Benford’s Law (Burns, 2009; Gauvrit, Houillon & Delahaye, 2017). 2
Applications to Nonprofit Financial Data
Nigrini (2012) noted an increasing number of academic publications on Benford’s Law, from 50 in 1975 to about 750 in 2012. However, there have been surprisingly few applications of Benford analysis to nonprofit financial data. To our knowledge, this article is the first academic study to apply Benford analysis to U.S. nonprofit data. We have located only two articles (Van Caneghem, 2016: Belgian nonprofit organizations; Dang, Burger, & Owens, 2019: Ugandan non-governmental organizations [NGOs]) and one working paper (Dang & Owens, 2019: UK charities) that applied Benford analysis to nonprofit financial data in other countries.
Dang et al. (2019) used Benford’s Law (first-digit analysis) to examine irregularities in financial reports from a representative sample of Ugandan NGOs. They found that 75% of the NGOs conformed to Benford’s Law and showed that the more-conforming NGOs had higher community satisfaction ratings. They also found that the NGOs that were more regularly required to submit reports to funders were more likely to show deviations from Benford’s Law, suggesting that the reporting burden may contribute to misreporting.
Van Caneghem (2016) analyzed financial reporting of Belgian nonprofits, applying a variant of Benford analysis that considers the frequency distribution of the second digit. He found that deviations were higher for small organizations and those heavily reliant on grants and donations. He found that 0 and 5 appeared as the second digit much more often than expected by Benford’s Law and that all other digits were observed less frequently than expected. He attributes this to roundoff error. This finding may, however, serve as an argument against opting for second digit analysis because roundoff error is a less important type of reporting inaccuracy—both due to the smaller size of the error and because it has a weaker analytical link to intentional misreporting. Van Caneghem (p. 2705 and fn. 19) noted that roundoff error is not “quantitatively material” by usual accounting standards, although he argued it is “qualitatively material” because it may exploit the psychological tendency to focus on the first digit.
Dang and Owens (2019) applied Benford (first-digit) analysis to financial data in a large sample of UK public charities and found that 25% of the sample potentially misreport their financial information. They also found that organizations with a higher program expense ratio were more likely to conform to Benford’s Law only when their overhead ratio was sufficiently high (i.e., spending at least 15% of total income on governance activities), thus challenging the common practice of using program and overhead ratios as indicators for nonprofit accountability.
This article is distinct from the aforementioned studies and makes three contributions to the literature. First, this is the first academic study to apply Benford analysis to financial data of U.S. 501(c)(3) public charities. U.S. nonprofits are embedded in a different set of cultural and legal institutions, employing different accounting standards, so comparison across countries is of interest. In addition, the economic impact of U.S. nonprofits is sufficient to warrant a separate investigation. Second, instead of advocating the use of Benford’s Law when examining nonprofit financial data (e.g., Dang et al., 2019; Dang & Owens, 2019), we check the face validity of Benford analysis in the nonprofit setting. There are very few applications of Benford analysis to nonprofit organizations, which have different incentives to report data accurately compared with the much more studied for-profits. Specifically, we test if the patterns of organizational conformance to Benford’s Law follow those predicted by prior research that finds misreporting of Form 990 in the United States and then compare our results with Benford analysis of nonprofit financial reporting in other countries. Third, as a step to alleviate potential false positives and improve the usefulness of Benford analysis, we apply more advanced inferential and descriptive statistics that were not used in the aforementioned papers, namely, the Freedman–Watson U² test and the EXMAD statistic. Both may be useful when conducting Benford analysis in other contexts.
Theory and Hypotheses
The reliability of nonprofit financial reports is important, yet prior research provides evidence of misreporting. Some suggest that nonprofit financial misreporting could be unintentional, due to constraints in organizational resources, such as accounting expertise, management experience, and governance mechanisms (Keating et al., 2008; Yetman & Yetman, 2013). Others imply that misreporting could be a result of intentional managerial manipulation and that nonprofit managers may understate their administrative and/or fundraising expenses to improve their efficiency ratios (Jones & Roberts, 2006; Krishnan et al., 2006).
As a validity check for Benford’s Law, we consider whether the patterns of conformance to Benford’s Law align with prior research findings describing the misreporting of financial data in the U.S. nonprofit sector. 3 For this purpose, we propose eight hypotheses under three broader categories: professionalization, institutional pressure, and funder oversight.
Professionalization
The literature shows that misreporting can result from inadequate knowledge and skills in accounting and management. We consider three proxies for professionalization: the use of accrual accounting, external accountants, and paid officers/directors (vs. volunteers).
Accrual accounting
Form 990 requires organizations to report their accounting method: accrual, cash, or others (e.g., modified cash accounting). Although not required by the IRS, accrual accounting is recommended in the Generally Accepted Accounting Principles (GAAP) because it provides a more accurate picture of a firm’s overall financial health (Keating & Frumkin, 2003). Keating et al. (2008) suggested that the use of GAAP indicates organizations’ accounting sophistication, and they found that organizations with less accounting sophistication are more likely to misreport the costs of telemarketing campaigns on Form 990. Following previous literature, our first one-sided hypothesis is as follows:
Hypothesis 1 (H1): Organizations that use accrual accounting deviate less from Benford’s Law than organizations that use other accounting methods.
External accountants
Some studies find that organizations using external professional accountants are more likely to properly report their financials on Form 990, compared with those with no external professional accountants (Keating et al., 2008; Krishnan et al., 2006). Other studies suggest that the use of outside accountants might increase reporting errors due to their limited knowledge about the organization (Froelich & Knoepfle, 1996). Because the evidence is mixed, the second hypothesis, in two-sided form, is as follows:
Hypothesis 2 (H2): Deviations from Benford’s Law differ between organizations that use external accountants and those that do not.
Paid officers/directors
Some research suggests that professional managers have greater ability to produce high-quality financial reports (Tinkelman, 1999). Other research finds that nonprofit managers may misreport expenses to increase their compensation (e.g., Baber, Daniel, & Roberts, 2002; Krishnan et al., 2006). Principal–agent theory, which describes difficulties when a principal works with others (agents) who do not necessarily share the principal’s interest, provides an analytical explanation of why managers (as agents of the nonprofit board) may misreport. Because we do not know which effect is bigger, we test the two-sided hypothesis as follows:
Hypothesis 3 (H3): Deviations from Benford’s Law differ between organizations with paid officers/directors and those without.
Institutional Pressure
Rightly or wrongly, donors, charity watchdogs, regulators, and media all pressure nonprofits to have a low overhead ratio (i.e., fundraising and administration expenses divided by total expenses) and consequently a high program ratio (i.e., program expenses divided by total expenses). Under this pressure, many nonprofits are found to enter a starvation cycle, underinvesting in their organizational infrastructure and particularly cutting staff wages and professional fees (Gregory & Howard, 2009; Lecy & Searing, 2015). Because there are limits to cutting actual overhead expenses, some nonprofits appear to misallocate part or all of their fundraising and administrative expenses as program expenses, or report net donations (gross donations minus fundraising expenses) and zero fundraising expenses (Garven et al., 2016). The Urban Institute and Center on Philanthropy at Indiana University (2004) analyzed 220,000 Forms 990 (1999-2004) and reported that 37% of the nonprofit organizations with at least US$50,000 in contributions reported no fundraising costs and 25% of those receiving between US$1 million and US$5 million in contributions did the same. In addition, 13% of the nonprofits reported zero administrative costs. Similarly, in a sample of 73,107 Forms 990 reporting at least US$10,000 in total expenses (1998-2006), Yetman and Yetman (2013) found that 36% of the organizations receiving at least US$10,000 in donations reported no fundraising expenses and 3% reported no administrative expenses. Following prior research, our one-sided hypotheses are as follows:
Hypothesis 4 (H4): Organizations that report zero fundraising expenses but positive donations deviate more from Benford’s Law than other organizations.
Hypothesis 5 (H5): Organizations that report zero administrative expenses deviate more from Benford’s Law than other organizations.
Funder Oversight
There are many principal–agent relationships among nonprofits and their stakeholders, with the nonprofit sometimes acting as the principal and sometimes as the agent (Steinberg, 2010). The manipulation of nonprofit financial reports by managers as agents of the board as principal was discussed in H3. Here, we have a principal–agent problem between the organization as the agent and outside stakeholders as principals. Donors and funders contribute to nonprofits to provide services (or values) and personal benefits (e.g., warm glow). They seek accurate information about the use of their contributions and may also use this information to decide whether (or how much) to donate in future. The nonprofits should use donations according to donor intent, but may fail to do so and manipulate financial data for continued donations/grant opportunities. Monitoring and oversight by funders reduce this potential manipulation (Hansmann, 1996).
Government agencies, as principals, use various monitoring strategies and oversight requirements to ensure that their agents provide accurate financial feedback regarding the use of their grant and contract receipts, using financial audits, quarterly fiscal reports, and other techniques (Van Slyke, 2007). Therefore, we formulate our one-sided hypothesis as:
Hypothesis 6 (H6): Organizations that receive government grants deviate less from Benford’s Law than organizations that do not.
Nonprofits often receive “indirect public support” through federated fundraising agencies (e.g., United Way). Federated campaigns frequently impose audit and other financial accountability requirements on participating organizations to ensure the reputability of the combined campaign. For example, the umbrella organization of United Way requires that local United Ways assure that all local member organizations “undergo annual financial audits conducted by independent certified public accountants whose examination complies with generally accepted auditing standards. In addition, United Ways have developed comprehensive requirements for completion of audited financial statements to ensure consistency and transparency system-wide” (United Way, n.d.). Our one-sided hypothesis is as follows:
Hypothesis 7 (H7): Organizations that receive indirect public support deviate less from Benford’s Law than organizations that do not.
Finally, some organizations have temporarily or permanently restricted assets. These time or purpose restrictions are imposed by major institutional or individual donors, who have the motivation and power to exert more oversight. Prior research provides evidence that organizations with donor-restricted assets are associated with less misreporting (e.g., Keating et al., 2008; Yetman & Yetman, 2013). Therefore, we propose the one-sided hypothesis as follows:
Hypothesis 8 (H8): Organizations with temporarily or permanently restricted assets deviate less from Benford’s Law than organizations without such assets.
Data
Our data come from Form 990 filed by U.S. public charities, and sampling occurs at two levels. First, we draw a random sample of organizations from the National Center on Charitable Statistics (NCCS)-GuideStar National Nonprofit Research Database (“digitized data”). This database includes nearly all 501(c)(3) entities that were required to file IRS Form 990 or 990-EZ between 1998 and 2003. Second, we use a sample of financial records from each organization. The generalizability of our findings comes from the first sampling process, but the validity of Benford analysis depends on the second sampling process. The digitized data include almost every financial variable reported on Form 990 (NCCS, n.d.), but are not available after 2003. Subsequent datasets (e.g., the IRS Statistics of Income [SOI] files and NCCS Core files) do not include as many financial variables, so that Benford sample sizes may be inadequate. 4 Overall, Chang, Tuckman, and Chikoto-Schultz (2018) conclude (p. 25): “To date, the digitized 990 data from 1998-2003 remains the most nuanced or comprehensive dataset available.”
At the first sampling stage, we exclude organizations that file different forms—private foundations filing Form 990-PF and organizations filing Form 990-EZ. We also exclude those with unknown National Taxonomy of Exempt Entities (NTEE) codes. We then draw a stratified random sample of organizations by NTEE code in 2003 for our analyses, which includes 85,386 organization-year observations. 5 This sample is broadly representative of the population of digitized data in 6-year average total assets (μ = US$9,764,160 and σ = US$81,500,000 vs. US$9,418,343 and US$624,000,000 for the population), organizational age in 2004 (μ = 24 and σ =16 vs. 21 and 16) and, through stratification, subsector composition. The smaller sample variance in assets may be due to the omission of outliers from our sample.
For the second stage, we start with all numbers reported on Form 990’s financial statements, including Revenue and Expenses, Functional Expenses, and Balance Sheets. Following Nigrini (2011), we exclude reported values of zero (no first digit), negative numbers (usually analyzed separately), and numbers smaller than 10 (immaterial for investigative purposes). We also exclude totals and subtotals generated from other included data because they cannot be independently manipulated, numbers brought over from other sections because they should not be double-counted, and, conservatively, management and fundraising expense items (likely to be manipulated downward, the opposite of program expense items which would potentially be revised upward). The full list of variables is in Online Appendix 4. Finally, we exclude organizations that report fewer than 100 records. 6
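The value-level exclusions described above (zeros, negative numbers, numbers smaller than 10, and organizations with fewer than 100 records) can be sketched as follows. The function names and the constant are hypothetical, and the variable-level exclusions (totals, subtotals, carried-over numbers, management and fundraising items) are assumed to have been applied before this step.

```python
MIN_RECORDS = 100  # organizations with fewer eligible records are dropped


def first_digit(value):
    """Return the leading digit of an eligible amount, or None if excluded.

    Zeros, negative numbers, and numbers smaller than 10 are excluded.
    """
    if value is None or value < 10:
        return None
    return int(str(int(value))[0])


def digit_counts(values):
    """Tally leading digits 1..9 over the eligible values."""
    counts = [0] * 9
    for value in values:
        digit = first_digit(value)
        if digit is not None:
            counts[digit - 1] += 1
    return counts


def has_enough_records(values):
    """True when an organization has at least MIN_RECORDS eligible numbers."""
    return sum(digit_counts(values)) >= MIN_RECORDS
```

For example, `digit_counts([0, -25, 7, 10, 1999, 250000, 30.5])` keeps only the last four values and counts two 1s, one 2, and one 3.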
Sample size varies across our analyses. When we test Benford compliance for the whole sample, we include all eligible numbers reported by all organizations for all available years (N = 2,096,305). For digital analysis at the organizational level, samples include all eligible numbers pooled across years per organization (N ranges from 100 to 468 for the 11,060 included organizations). For our hypothesis tests and regressions, data were missing on key variables in a small number of cases, reducing the total number of organizations (sample size) to 10,889.
Method
Our analysis consists of digital analysis and hypothesis tests.
Digital Analysis
We compare the observed first-digit frequencies of the sample Form 990 data with those predicted by Benford’s Law. We test the joint null hypothesis:
$$H_0: p_E(d) = \log_{10}\left(1 + \frac{1}{d}\right) \quad \text{for all } d = 1, 2, \ldots, 9,$$
against the two-sided alternative, where $p_E(d)$ is the empirical probability (observed frequency) of the first digit equaling d. If this null hypothesis is rejected, then the organization warrants further investigation. 7
We employ three inferential tests: the Freedman–Watson U² (U²), Pearson chi-square (χ²), and modified Kolmogorov–Smirnov (KS; Joenssen, 2015) statistics. The χ² and KS tests are traditional in Benford analysis, but they rest on the false assumption that the null distribution has a linear support. Under that assumption, replacing a leading digit of 9 with a leading digit of 1 is treated as a larger deviation from Benford’s Law than replacing a leading digit of 1 with 2. However, first digits are circular: they grow from 1 to 2 to 3 . . . to 9 to 1 and around again. The U² test is specifically designed for distributions with a circular support. Monte Carlo experiments find that the U² test is more powerful than the χ² and KS tests in most cases (Lesperance, Reed, Stephens, Tsao, & Wilton, 2016). Therefore, we report the U² test as the preferred test, and the χ² and KS tests for robustness and comparability. 8
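To make the contrast between a linear and a circular support concrete, the sketch below computes the Pearson chi-square statistic (with its closed-form p-value for 8 degrees of freedom) and a discrete Watson-type U² statistic. The U² shown is one Cramér–von Mises-type form with wrap-around weights; treating it as equivalent to the Freedman–Watson statistic used in this article is an assumption, and the details may differ. No p-value is given for U², since that requires asymptotic theory or simulation.

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def chi_square(counts, probs=BENFORD):
    """Pearson chi-square statistic for observed first-digit counts."""
    n = sum(counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))


def chi_square_pvalue_df8(stat):
    """Upper-tail p-value for 8 degrees of freedom (closed form for even df)."""
    half = stat / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(4))


def watson_u2(counts, probs=BENFORD):
    """Discrete Watson-type U^2 (assumed form; see the note above)."""
    n = sum(counts)
    k = len(counts)
    # Z_j: cumulative observed-minus-expected counts up to digit j.
    z, running = [], 0.0
    for o, p in zip(counts, probs):
        running += o - n * p
        z.append(running)
    # t_j: average of neighbouring cell probabilities, wrapping from 9 back to 1.
    t = [(probs[j] + probs[(j + 1) % k]) / 2 for j in range(k)]
    z_bar = sum(tj * zj for tj, zj in zip(t, z))
    return sum(tj * (zj - z_bar) ** 2 for tj, zj in zip(t, z)) / n
```

With these weights the U² value does not change when the digits are relabeled by rotating the starting point around the circle (rotating counts and probabilities together), which is the sense in which the statistic respects a circular support.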
Inferential tests tend to over-reject the null hypothesis (the “false-positive problem”) because they test for exact conformance to a distribution that holds only asymptotically. In finite samples, data are only approximately Benford distributed. To deal with this, Nigrini (2011) and others recommend the MAD as an alternative in large samples. MAD is the sum of the absolute differences between the actual and expected proportions of each leading digit, divided by the total number of leading digits (i.e., 9 for first-digit analysis):
$$\mathrm{MAD} = \frac{1}{K} \sum_{i=1}^{K} \left| AP_i - EP_i \right|,$$
where $AP_i$ is the actual proportion and $EP_i$ is the expected proportion for leading digit i, and K is the total number of leading digits.
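The MAD described above (the sum of absolute differences between actual and expected proportions, divided by K = 9) can be computed directly from first-digit counts. A minimal sketch, with illustrative names:

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def mad(counts, expected=BENFORD):
    """Mean absolute deviation between actual and expected digit proportions."""
    n = sum(counts)
    k = len(expected)
    return sum(abs(o / n - ep) for o, ep in zip(counts, expected)) / k
```

As a boundary case, a dataset in which every number starts with 1 gives a MAD of 2(1 - log10 2)/9, or roughly 0.155, while counts exactly proportional to the Benford probabilities give a MAD of zero.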
MAD grows larger as the average difference between the actual and expected distributions grows. Simulations by Amiram et al. (2015) show that MAD correlates well with errors introduced into financial statements. Based on analyses of multiple large real-world datasets (with N between about 3,000 and about 160 million), Drake and Nigrini (2000) developed and Nigrini (2012) updated an interpretive scale which, for first-digit analysis, labels MAD statistics less than 0.006 as indicating “close conformity” to the Benford distribution, MAD between 0.006 and 0.012 “acceptable conformity,” between 0.012 and 0.015 “marginally acceptable conformity,” and MAD greater than 0.015 “nonconformity.”
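The interpretive scale for first-digit MAD can be written as a simple lookup. The thresholds are those quoted above; how a value exactly on a boundary is labeled is an assumption of this sketch, and the function name is illustrative.

```python
def nigrini_first_digit_label(mad_value):
    """Map a first-digit MAD onto the interpretive scale quoted in the text.

    Boundary values are assigned to the higher-deviation band (an assumption).
    """
    if mad_value < 0.006:
        return "close conformity"
    if mad_value < 0.012:
        return "acceptable conformity"
    if mad_value < 0.015:
        return "marginally acceptable conformity"
    return "nonconformity"


print(nigrini_first_digit_label(0.0008))  # whole-sample MAD reported in this article
print(nigrini_first_digit_label(0.029))   # average organizational MAD reported in this article
```

Applied to the two MAD values reported in the Results, the fixed scale labels the whole sample as close conformity and the average organization as nonconformity, which is consistent with the concern that fixed thresholds ignore sample size.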
However, under the null hypothesis, MAD decreases as sample size N increases. Thresholds like Nigrini’s are independent of sample size and do not account for this. Therefore, Barney and Schulzke (2016) proposed a new descriptive measure, EXMAD, as the difference between MAD and its expected value under the null, conditional on N:
$$\mathrm{EXMAD} = \mathrm{MAD} - E(\mathrm{MAD} \mid N).$$
This correction is particularly important when we test organizational compliance because our sample size ranges from 100, with E(MAD|100) = 0.02352, to 468, with E(MAD|468) = 0.0105. 9 Thus, MAD is more likely to report false positives for smaller N than larger N. The correction is less important for larger samples, because E(MAD|N) changes less with N, so the false-positive problem is nearly independent of N when sample sizes are uniformly large.
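The dependence of E(MAD|N) on N can be illustrated with a large-sample approximation. The sketch below assumes each digit proportion is approximately normal around its Benford value, so that its expected absolute deviation is sqrt(2p(1 - p)/(πN)); this is an assumption of the sketch and not necessarily the exact expression used by Barney and Schulzke (2016). All names are illustrative.

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def expected_mad(n, probs=BENFORD):
    """Approximate E(MAD | N) under the null (normal approximation, assumed)."""
    return sum(math.sqrt(2 * p * (1 - p) / (math.pi * n)) for p in probs) / len(probs)


def exmad(mad_value, n):
    """Excess MAD: observed MAD minus its approximate expected value given N."""
    return mad_value - expected_mad(n)


for n in (100, 468, 10_000):
    print(n, round(expected_mad(n), 5))
```

Under this approximation the value for N = 100 should land close to the 0.02352 reported above, and the expected MAD shrinks in proportion to one over the square root of N, so the same observed MAD yields a larger EXMAD in a larger sample.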
The small number of records on financial statements (for both for-profits and nonprofits) necessitates the pooling of data across several years. Prior studies pool organizational-level data across time (e.g., Dang & Owens, 2019; Nigrini, 2011) or across organizations at a point in time (e.g., Amiram et al., 2015). When each dataset complies with Benford’s Law, the pooled data also comply; when each dataset fails to comply, the pooled data will generally not comply. In between, if data comply in 5 out of 6 years in our sample, we may fail to achieve statistical significance for the pooled data. In sum, although we may miss some organizations with intermittent problems, pooling across years will enable us to identify organizations with persistent data problems.
Hypothesis Testing
We test our hypotheses at the organizational level using independent sample t-tests to see if the average organizational MAD and EXMAD differ across the subsamples defined by each hypothesis. We verify these bivariate tests using ordinary least squares (OLS) regressions with robust standard errors, where organizational MAD (following, e.g., Amiram et al., 2015) or EXMAD are the dependent variables. Specifically, we estimate the effects of professionalization (use of accrual accounting, accounting fees, officer/director compensation), institutional pressure (zero fundraising expense but positive donations, zero administrative expense), and funder oversight (receipt of government grants, indirect public support, and temporarily or permanently restricted net assets) on the magnitude of deviation from the law. We also include organizational size, age, subsector, number of organizational records, and number of organizational 990s as control variables. Table 1 describes variable construction in more detail.
Table 1. Variable Definitions.
Note. NTEE = National Taxonomy of Exempt Entities.
Results
Digital Analysis
Whole sample conformance
When we combine data from all organization-year records (N = 2,096,305), we find very close conformance to the Benford distribution (Figure 1). The average MAD statistic is 0.0008, indicating close conformity to Benford’s Law according to Nigrini’s (2012) threshold (Table 2). However, these tiny deviations are sufficient to reject the null hypothesis that the frequencies of the first digits follow the Benford distribution at p < .001 using all three statistical tests. With such a large sample size, the false-positive problem created by incomplete convergence is unlikely to be the cause of rejection, but the very low MAD suggests that on average, the deviations are not material in forensic accounting.
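Why tiny deviations are rejected in a very large sample can be seen from the Pearson statistic itself: for fixed proportions it equals N times a constant. The sketch below uses a hypothetical deviation pattern (0.1 percentage point shifted from digit 9 to digit 1), not the article’s data; only the sample size 2,096,305 is taken from the text.

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def chi_square_from_proportions(actual, n, expected=BENFORD):
    """Pearson chi-square written in terms of proportions and sample size."""
    return n * sum((a - e) ** 2 / e for a, e in zip(actual, expected))


# Hypothetical deviation: 0.001 moved from digit 9 to digit 1.
actual = BENFORD.copy()
actual[0] += 0.001
actual[8] -= 0.001

mad_value = sum(abs(a - e) for a, e in zip(actual, BENFORD)) / 9
small = chi_square_from_proportions(actual, 1_000)
large = chi_square_from_proportions(actual, 2_096_305)  # whole-sample N in the article

print(mad_value, small, large)
```

In this illustration the MAD stays far below the 0.006 threshold at any N, yet the chi-square statistic at the whole-sample N exceeds the p = .001 critical value for 8 degrees of freedom (about 26.1), while at N = 1,000 it is far below it.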

Figure 1. Whole sample compliance with Benford’s Law.
Table 2. Benford Analysis for the Whole Sample and Organizations.
Note. Two-sample t-tests indicate that both average MAD and average excess MAD for conforming organizations are significantly smaller than those for nonconforming organizations by each test (p < .001). MAD = mean absolute deviation; KS = Kolmogorov–Smirnov.
Organizational conformance
To test each organization’s conformance to Benford’s Law, we conduct first-digit tests on all eligible numbers reported by each organization across all available years. We obtain the three test statistics, MAD, and EXMAD for the 11,060 organizations with at least 100 observations. Organizations are labeled as conforming when we cannot reject the null hypothesis at p ≤ .01. According to the preferred U² test, about 66% of the organizations report data that conform to Benford’s Law, and results are similar using the other two tests, although the tests do not always agree on which organizations conform. 10 As expected, the average organizational MAD is larger than that of the whole sample (M = 0.029, SD = 0.010). The average organizational EXMAD is 0.011 (SD = 0.009). As surmised, conforming organizations have significantly smaller average MAD and EXMAD values than nonconforming organizations. Table 2 reports the test p-values, MAD, and EXMAD statistics for the whole sample and organizational analysis.
Hypothesis Testing
Table 3, panel 1 reports summary statistics for the sample used in all hypothesis tests, which contains fewer organizations (10,889) because of missing data on covariates. Table 3, panel 2 presents bivariate hypothesis tests for differences in average MAD and EXMAD, respectively, and Table 4 presents multivariate tests.
Table 3. Summary Statistics for the Hypothesis Test Sample.
Panel 1: Continuous Variables.
Note. N = 10,889 organizations. MAD = mean absolute deviation.
Panel 2: Organizational MAD and EXMAD by Indicator Variables.
Note. MAD = mean absolute deviation; EXMAD = excess mean absolute deviation.
***p < .01. ****p < .001.
Table 4. OLS Regressions With Robust Standard Errors on Organizational MAD and EXMAD (N = 10,889).
Note. Both regressions controlled for 27 NTEE subsectors. Significance levels account for the direction of each hypothesis: we test the one-sided hypothesis that the coefficient is positive for reported zero fundraising and management costs and the one-sided hypothesis that the coefficient is negative for grants, indirect costs, and donor-restricted funds. All other tests are two-sided. t statistics are given in parentheses. MAD = mean absolute deviation; EXMAD = excess mean absolute deviation; NTEE = National Taxonomy of Exempt Entities.
*p < .10. **p < .05. ***p < .01. ****p < .001.
Professionalization
We hypothesize that organizations employing accrual accounting are more conformant than others, a result strongly supported in bivariate testing. MAD and EXMAD are significantly lower for the accrual accounting group (p < .0001). However, this indicator was statistically insignificant in both regressions. Organizations that pay accounting fees are significantly more conformant by all tests, suggesting that the positive effect of professional training outweighs any negative effect due to outside accountants’ lack of familiarity with the organization. Organizations with paid directors/officers are significantly more conformant by all tests, suggesting that the positive effect of director professionalization outweighs any agency problems.
Institutional pressure
Organizations that report spending nothing on fundraising while receiving donations in any year are significantly less conformant in our bivariate tests, but this effect is small and loses statistical significance in the regressions. Dang and Owens (2019) also fail to find significance for a similar variable in their regressions using British data. In our data, organizations that report zero management expenses also are less conformant in bivariate tests, but there is no statistical significance in the regressions.
Funder oversight
All our funder oversight hypotheses are well supported by the results. Organizations that receive government grants and those receiving indirect support are significantly more conformant by all tests than those that do not receive these funds. Dang and Owens (2019) found the same result for receipt of government grants by British nonprofits, although they did not test indirect support. Organizations that report receiving temporarily or permanently restricted assets are significantly more conformant than others by all tests, but the level of statistical significance falls to borderline (p < .10) in the regression explaining EXMAD. Dang and Owens (2019) also obtained borderline significance when explaining MAD (they did not test EXMAD).
Summary
All our signed hypotheses are confirmed by bivariate analyses at high levels of statistical significance. Regression results also provide overall support for most of our hypotheses, although we do not find statistical significance for the effects of accrual accounting and the two institutional pressure indicators. These results support the conclusion that high values for MAD and EXMAD raise valid suspicions of misreporting by U.S. nonprofits. As for the other covariates, older and larger organizations and those reporting data in more years have a statistically significant larger deviation from the Benford distribution. Dang and Owens (2019) do not only test organizational age but also find that larger British organizations have larger deviations. Van Caneghem’s (2016) analysis of Belgian nonprofits finds the opposite, using inferential statistics for the Benford distribution of the second digit. Using data on Ugandan nonprofits, Dang et al. (2019) found a U-shaped relationship between organizational age and MAD, with young and old organizations less compliant than the middle-aged. While having more financial records for Benford analysis is significantly and negatively associated with the degree of nonconformity in our data, having more annual reports demonstrates a significantly positive association (the same as Dang & Owens found for British nonprofits). MAD and EXMAD are both negative functions of the number of financial records, but the effect size drops by an order of magnitude for the latter. This suggests that EXMAD is doing its job of reducing the dependence of MAD on the number of records.
Discussion and Conclusion
This article provides the first Benford analysis of U.S. nonprofit financial data. We find support for using Benford analysis as a screening procedure to prioritize further investigation into nonprofit data integrity. Unlike prior research that advocates the use of Benford’s Law (e.g., Dang et al., 2019; Dang & Owens, 2019), we suggest that more methodological development is needed before its widespread application to nonprofit financial data.
Like those analyzing nonprofit data in other countries, we find that inferential statistical tests identify too many suspicious candidates to serve as useful screening devices by themselves. This “false-positive” problem has several sources. One possibility is that the wrong tests are being used. Traditional statistical tests measure departures from Benford’s Law by distance along a number line, rather than around a circle, neglecting the fact that leading digit 9 is equally distant from 8 and 1. We find that measuring deviations along a line rather than around a circle explains at best a small portion of the large number of false positives: the percentage of organizations in our sample showing statistically significant nonconformance is 34% using the more valid U2 test, lower than the 40% obtained using χ2, but higher than the 28% using KS.
More importantly, the null hypothesis is that leading digits are exactly Benford-distributed, but the theory suggests that leading digits are only asymptotically Benford-distributed. Therefore, powerful statistical tests should reject the null in finite samples regardless of data integrity. So, like others, we turn to descriptive measures of the size of deviations from the Benford distribution, suspecting data quality when these measures exceed established thresholds. However, the false-positive problem appears to remain: too many organizations in our sample have MAD values exceeding Nigrini’s (2012) thresholds for conformity. We think this is largely because Nigrini’s thresholds were developed for large datasets with more than 3,000 financial records, but our organizational samples contain between 100 and 468 records. In large datasets, correcting for N is less important because the expected value of MAD is smaller and changes much more slowly with the increasing number of records, but a correction is important for small datasets like ours. The expected value of MAD ranges from 0.02352 for the organization with the fewest records (100) to 0.0105 for the organization with the most records in our sample (468). In this range, an MAD threshold that varies with the number of records is superior because, under the null hypothesis, the expected value of MAD is positive and inversely related to the number of financial records in the Benford set.
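The dependence of expected MAD on the number of records can be sketched with a normal approximation. Under the null, the observed share of each digit has standard deviation sqrt(p(1 − p)/N), and the expected absolute value of a mean-zero normal variable is sqrt(2/π) times its standard deviation. This approximation is an assumption made here for illustration and is not necessarily the formula used by Barney and Schulzke (2016). At N = 100 it returns roughly 0.0235, in line with the value quoted above, but it need not reproduce the reported figures at other values of N.

```python
import math

# Benford probabilities for leading digits 1..9
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def expected_mad(n):
    """Approximate E[MAD] for n records drawn from an exact Benford law.

    Uses E|X| = sqrt(2/pi) * sd for each digit's share, averaged over digits.
    """
    scale = math.sqrt(2 / (math.pi * n))
    return scale * sum(math.sqrt(p * (1 - p)) for p in BENFORD) / len(BENFORD)


# Smallest and largest organizational samples in the text, plus a large
# dataset of the size for which fixed thresholds were developed.
for n in (100, 468, 3000):
    print(n, round(expected_mad(n), 4))
```

Under this approximation the expected MAD falls in proportion to 1/sqrt(N), so it changes quickly between 100 and 468 records and much more slowly beyond a few thousand.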
Thus, we turn to the excess of MAD over its expected value (EXMAD) as our main indicator (Barney & Schulzke, 2016). A positive EXMAD means data depart from the Benford distribution, whereas a positive MAD smaller than the expected value of MAD does not. EXMAD has face validity for our data because factors predicted to affect the size and direction of Benford deviations are highly significant in the regression. Ultimately, we think that EXMAD will provide the best tool for Benford analysis. However, Barney and Schulzke (2016) did not develop appropriate descriptive thresholds to signal EXMAD conformity with Benford’s Law, and this task is beyond our scope as well. Resolution of the false-positive problem will require more research using EXMAD, case studies of known fraud, and perhaps simulation analysis.
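One way to approach the simulation analysis mentioned above is a small Monte Carlo experiment: draw many samples of a given size from an exact Benford distribution, compute MAD for each, and examine how EXMAD behaves when the null is true. The sketch below is illustrative only. The number of replications, the seed, and the use of the simulated mean as the expected MAD are assumptions, and the 95th percentile it prints is not a proposed threshold.

```python
import math
import random

DIGITS = list(range(1, 10))
BENFORD = [math.log10(1 + 1 / d) for d in DIGITS]


def mad_of_sample(digits):
    """MAD of a list of first digits relative to the Benford distribution."""
    n = len(digits)
    return sum(abs(digits.count(d) / n - p)
               for d, p in zip(DIGITS, BENFORD)) / len(DIGITS)


def simulate_null_mad(n_records, reps=2000, seed=12345):
    """MAD values for `reps` samples of size `n_records` drawn from Benford."""
    rng = random.Random(seed)
    return [mad_of_sample(rng.choices(DIGITS, weights=BENFORD, k=n_records))
            for _ in range(reps)]


null_mads = simulate_null_mad(100)
expected = sum(null_mads) / len(null_mads)  # simulated E[MAD] at N = 100

# EXMAD = observed MAD minus its expected value under the null
null_exmads = sorted(m - expected for m in null_mads)
p95 = null_exmads[int(0.95 * len(null_exmads))]

print(round(expected, 4), round(p95, 4))
```

Repeating the exercise for several sample sizes would show how a null-based EXMAD cutoff varies with the number of records, which is one input a threshold-setting study might use.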
It may be that different descriptive thresholds are needed for nonprofits and for-profits. Nonprofit accounting standards differ from those used by for-profits, and the economic incentives to manipulate data are also different. In particular, instructions for allocating expenses among program, fundraising, and management categories are complicated and ambiguous, so befuddled managers may use human judgment and approximation in reporting these items. These made-up numbers may provide a realistic reflection of functional expenses, but they are not expected to conform to Benford’s Law.
In conclusion, our results provide some support for using Benford analysis on nonprofit financial data. However, we also find that first-digit analysis flags a large number of deviating organizations in our sample, indicating a potential false-positive problem. Thus, Benford analysis needs more refinement to function as an effective screening tool for egregious misreporting. We recommend the use of EXMAD and suggest future research to develop appropriate descriptive thresholds for determining when organizations conform with Benford’s Law. More research is also needed to discover the best combinations of Benford analysis and other forensic accounting tools to make screening useful to regulators, donors, the academic community, and other stakeholders. In the age of big data and machine learning, digitized access to the full range of Form 990 variables provides a relatively easy way to fast-track progress toward understanding how we can use forensic tools such as Benford’s Law more effectively to advance transparency and accountability in the nonprofit sector.
Supplemental Material
Supplemental material, Benford_Online_Appendix for Abiding by the Law? Using Benford’s Law to Examine the Accuracy of Nonprofit Financial Reports by Heng Qu, Richard Steinberg and Ronelle Burger in Nonprofit and Voluntary Sector Quarterly
Footnotes
Authors’ Note
We would like to thank Canh Dang, Trudy Owens, Daniel Tinkelman, Enrique Pinzon, Eugene Steurle, Paul Arnsberger, attendees of conference and seminar presentations, and the anonymous referees for their comments that helped improve this paper.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.