The unfolding mystery of the numbers: First and second digits based comparative tests and its application to Ghana’s elections

Abstract

This study focuses on the use of digit-based tests for anomaly detection in presidential elections in Ghana. Even though Ghana has conducted several successful elections to elect presidents, the outcomes of the elections have been challenged in court on allegations of vote rigging and fraud. It has been established in the literature that for an election to be anomaly free, the distributions of voter turnout, the winner’s share and the total valid votes cast in the election should be uni-modal. Therefore, we assess the applicability of both first- and second-digit tests to aid in the detection of possible anomalies in the 2016 and 2020 presidential election results data in Ghana. The Benford frequency distribution and Spearman rank correlation coefficient tests were used for the analysis of data obtained from the Electoral Commission of Ghana. The results show that the observed first-digit distributions of valid vote counts for both the New Patriotic Party (NPP) and the National Democratic Congress (NDC), and of the total valid votes cast (TVVC), in 2016 and 2020 are consistent with the distributional pattern of first digits postulated by Benford’s Law. However, the second-digit distributions of the valid vote counts for NPP and of the total valid votes cast in both the 2016 and 2020 elections do not satisfy the distributional pattern of second digits under Benford’s Law. In view of this, we recommend using the first-two-digits test to check for consistency of possible election anomalies between the first and second digits, since it conveys more information.

Keywords

Benford’s law Spearman rank correlation coefficient first digits second digits valid votes cast Ghana

1. Introduction

In many naturally occurring collections of numbers (such as annual earnings and election vote counts), the observed frequencies of leading digits are not uniform, contrary to what one may first assume. Instead, smaller digits are more likely than larger digits to appear as the first digit. Formann (2010) shows that over 30 per cent of the entries in datasets begin with a 1, 18 per cent with a 2, and so on down to the least frequent leading digit, 9, which occurs only about 5 per cent of the time. These are the frequencies anticipated by Benford’s law. Therefore, given empirical frequencies of the digits in a set of numbers, Benford’s law can be used for digit-based analysis to screen numerical datasets for anomalies such as false or fraudulent data. Digit-based studies use goodness-of-fit tests such as the Pearson Chi-square test, among others, with a null hypothesis of compliance with Benford’s law (da Fonseca, 2016). Also, for comparing the hypothetical digit distribution constructed by Benford with the observed distribution of the digits, both Deckert et al. (2011) and Mebane Jr (2013) suggest using Benford’s expected values and the mean of the distribution of the digits. However, the mean value is a weaker approach to establishing a possible election anomaly claim. Bërdufi (2014) therefore suggested the use of a non-parametric method for the analysis of digit distributions as a more robust technique for establishing the correlation between Benford’s distribution of digits and the observed digit distribution.

For countries practising a democratic system of governance, free and fair elections are one of the tools used to sustain their democracies. A statistical approach employed by many researchers around the globe for detecting possible fraud or anomalies in elections is termed “election forensics”. Examples of election forensics techniques are the digit-based tests derived from the popular Benford’s Law (Breunig and Goerres, 2011). Levin et al. (2009) have established that a uni-modal distribution of voter turnout alone (suggesting a possible absence of fraud or anomaly) is not enough to give strong evidence of an anomaly-free election; the distributions of the winner’s share and of the total valid votes cast should also be uni-modal to support the conclusion of a possibly anomaly-free election.

Electoral irregularities and breaches are not new to Ghana’s electoral politics. They have characterized almost every election in the Fourth Republic of Ghana. For example, the election results in 2012 and 2020 were challenged in the Supreme Court (SC) of Ghana by the NPP and NDC respectively. The path so far taken by the political parties, that is, resorting to the courts to settle such electoral disputes, is in tune with democratic practice. Rulings of the SC have contributed immensely towards calming the nerves of people and reducing the political tensions in the country during these periods. However, these phenomena continue to threaten the country’s quest for an enduring democracy. This is particularly so because, on such occasions, the political tempo rises, causing tension, insecurity, and demonstrations that sometimes turn violent. One would have thought that, given the over three decades of existence of the current democracy and the continuous desire of Ghanaians to maintain the status quo, such irregularities would have been a thing of the past, but this appears not to be the case as the anomalies continue to linger on (Aggrey-Darkoh & Asah-Asante, 2017; Jeffries, 1998; Nugent, 1995; Clarke, 1992).

In addition, experiences from Africa and Latin America indicate that such problems, if not handled properly, can lead to democratic reversals, where the inability of democracies to deliver the needed dividends has led to the return of authoritarian rule (Ayee, 1998; Boafo-Arthur, 2008; Aggrey-Darkoh & Asah-Asante, 2017). Even though Ghana has had a stable democracy since the early 2000s, the presidential elections have been characterized by allegations of vote rigging by losing political parties and candidates. These parties and candidates have contested the results in the country’s high and supreme courts. Therefore, the importance of scientifically assessing anomalies in these elections cannot be overemphasized. In this regard, we apply digit-based tests, i.e., the first- and second-digit tests, to Ghana’s presidential election results in 2016 and 2020. Specifically, the aim of this study is to determine whether the observed frequency distributions of the first and second digits of the valid votes cast for Ghana’s two most prominent political parties (NPP and NDC) and of the total number of valid votes cast in the 2016 and 2020 presidential elections deviate from the first- and second-digit distributional patterns postulated by Benford’s Law.

Digits-based tests as an election forensics technique have been employed in the detection of fraud or anomalies in elections in several countries: Nigeria (Tunmibi & Olatokun, 2021), Egypt (Ketchley, 2021), Afghanistan (Weidmann & Callen, 2013), Russia (Bader & Van Ham, 2015; Kalinin & Mebane, 2017), United States of America (Mebane et al., 2008), Albania (Bërdufi, 2014) and Iran (Mebane Jr, 2010; Roukema, 2014). In the case of Ketchley (2021), Benford’s second and last digits were used to analyze the 2018 presidential elections held in Egypt. The observed mean value of the second digits of the 13,087 polling stations deviated significantly from Benford’s expected mean value of 4.186, raising suspicion of a possible election anomaly. Similarly, the distribution of the last digits to the observed voters’ turnout demonstrated an absolute deviation from Benford’s anticipated uniform distribution, indicative of a data integrity problem.

Also, to complement Benford’s law, a local bootstrap method was adopted by Roukema (2014) to analyze possible anomalies in the 2009 Iranian presidential election using first-digit frequencies of the vote counts. The proposed local bootstrap method was first calibrated on the restricted set of first rounds of five presidential elections (2002–2006) before being applied to the 2009 election. An excess of vote counts beginning with the digit 7, together with other anomalies (one significant at 1% and two at 0.1%), was observed. It was further disclosed that unless the five elections (2002–2006) favouring candidate A are taken into account, the Iranian pre-election opinion polls significantly contradict the official results. Thus, candidates R and K suffer an abrupt (70% $\pm$ 15%), dramatically decreasing performance if the five elections represent normalised data.

Aside from election anomaly detection, Benford’s law has been applied in many fields, including the determination of annual growth rates (Nye & Moul, 2007), forecasts of financial statements (Varian, 1972), anomaly detection in balance account data (Nigrini & Mittermaier, 1997; Nigrini, 1993, 1999, 2012; Asllani & Naco, 2014), income tax (Nigrini, 1996; Berger & Hill, 2015), stock market indices (Zhao & Wu, 2010), stock market prices (Pietronero et al., 2001), frequency of prime numbers (Luque & Lacasa, 2009), survey data (Judge & Schechter, 2009; Kaiser, 2019), image forensics (Crisan et al., 2021) and fraudulent scientific data (Diekmann, 2007). This study seeks to show how the digit distributions of valid votes cast for NPP and NDC and of the total valid votes cast (TVVC) can help detect possible anomalies in elections.

Therefore, in this research, we test the following hypotheses:

(i)
$H_{01}$ : No statistically significant relationship exists between each party (NPP, or NDC, or TVVC) observed first digits distribution of election results and Benford’s distribution of first digits.
(ii)
$H_{02}$ : No statistically significant relationship exists between each party (NPP, or NDC, or TVVC) observed second digits distribution of election results and Benford’s distribution of second digits.

The remainder of the paper is organized as follows. Section 2 discusses the data and methods used for the study. Section 3 presents the results and key findings, while Section 4 presents the results of hypotheses testing. In Section 5, concluding remarks, recommendations and areas of future studies are provided.

2. Data and methods

The study used secondary data comprising the certified 2016 and 2020 presidential election results for the 275 constituencies in Ghana. The data were obtained from the Electoral Commission (EC) of Ghana and are available at https://ec.gov.gh/. A purposive non-probability sampling procedure was employed in selecting the 2016 and 2020 presidential elections conducted in Ghana. The two major political parties in Ghana (NPP and NDC) were included in the study because of their vote share since the first election in 1992, with at least five digits of total valid votes cast. A function was created to extract a vector of the first and second digits of the valid vote counts (for both NPP and NDC), together with the observed and Benford (expected) proportions, using the certified results of the 2016 and 2020 presidential elections respectively. Likewise, a vector of the first and second digits of TVVC was extracted with the corresponding observed proportions. Lesperance et al. (2016) assert that many researchers, in an attempt to demonstrate conformance with Benford’s Law, use simple statistical techniques such as visual plots and individual confidence intervals for digit probabilities, among others. However, these methods may be inaccurate and inefficient, and may lack the statistical power to detect departures from Benford’s Law. Furthermore, da Fonseca (2016) states that the classical significance tests with fixed dimensions have more power in large samples and hence are able to detect divergence from Benford’s law. However, this may result in the rejection of Benford’s law in samples where the deviations from it are without practical importance, so that non-erroneous or anomaly-free samples are likely to be classified as erroneous or fraudulent. As a result, in this paper, we use the Benford frequency distribution and Spearman’s rank correlation coefficient (SRCC) tests as the main statistical tools for the analysis of the data obtained from the Electoral Commission of Ghana, in order to overcome the limitation of the classical hypothesis tests discussed above.
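The digit-extraction step described above might be sketched as follows. This is only an illustration, not the authors' function: the names `leading_digits` and `digit_proportions` are hypothetical, and the vote counts in `sample_votes` are made up for the example rather than taken from the EC data.

```python
from collections import Counter


def leading_digits(count):
    """Return (first, second) significant digits of a positive integer count.

    The second digit is None when the count has only one digit.
    """
    value = int(count)
    if value <= 0:
        raise ValueError("vote counts must be positive integers")
    text = str(value)
    first = int(text[0])
    second = int(text[1]) if len(text) > 1 else None
    return first, second


def digit_proportions(counts, position):
    """Observed proportion of each digit at `position` (1 = first, 2 = second)."""
    digits = [leading_digits(c)[position - 1] for c in counts]
    digits = [d for d in digits if d is not None]  # drop one-digit counts
    if not digits:
        raise ValueError("no counts have a digit at this position")
    tally = Counter(digits)
    support = range(1, 10) if position == 1 else range(0, 10)
    return {d: tally.get(d, 0) / len(digits) for d in support}


# Hypothetical constituency vote counts, for illustration only (not EC data).
sample_votes = [12450, 8731, 23109, 1975, 30412, 15008]
first_props = digit_proportions(sample_votes, 1)
second_props = digit_proportions(sample_votes, 2)
print(first_props)
print(second_props)
```

Working on the decimal string of each count avoids floating-point logarithms, which is a convenient choice when the inputs are integer vote totals.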

2.1 Hypothesis testing

The Benford first-digit mean is given by,

$\displaystyle BL_{1}=\sum_{j=1}^{9}g_{j}Pr(G_{j}=g_{j})=3.441$ (1)

where $g_{j}\in(1,2,\ldots,9)$ and $Pr(G_{j}=g_{j})$ are Benford’s first digit proposed proportions presented in Table 1.

Similarly, the Benford second-digit mean is given by,

$\displaystyle BL_{2}=\sum_{k=0}^{9}g_{k}Pr(G_{k}=g_{k})=4.186$ (2)

where $g_{k}\in(0,1,\ldots,9)$ and $Pr(G_{k}=g_{k})$ are Benford’s second digit proposed proportions presented in Table 3.

The non-parametric SRCC test is adopted for the study because it does not rely on the normality assumption. Also, the digits under consideration (observed first and second digits of the votes and expected first and second digits of Benford’s distribution) are ordered categories.
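As a rough illustration of the SRCC computation, the sketch below ranks the observed counts and Benford's proportions (tied values share their average rank) and takes the Pearson correlation of the ranks. The function names are hypothetical and this is not necessarily the software routine the study used; the inputs are the 2016 first-digit counts and Benford proportions listed in Table 1. Under these assumptions the two coefficients should come out close to the 0.895 and 0.667 reported for 2016 in Table 5.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from smallest (rank 1) upward; ties get their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Benford first-digit proportions and 2016 observed counts, as in Table 1.
benford_first = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]
npp_2016_first = [90, 70, 41, 21, 12, 11, 11, 13, 6]
ndc_2016_first = [123, 69, 20, 9, 15, 7, 8, 10, 14]

rho_npp = spearman_rho(npp_2016_first, benford_first)
rho_ndc = spearman_rho(ndc_2016_first, benford_first)
print(round(rho_npp, 3), round(rho_ndc, 3))
```

Note that the correlation is computed over only nine (or, for second digits, ten) digit categories, which is the sample size shown in Table 5.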

2.2 Benford’s law for a collection of digits

Digit analysis detects fraudulent or erroneous data based on empirical regularities regarding significant digits. In digit analysis, a baseline frequency distribution representing normal behaviour is modelled, and then particular datasets are tested to see if they deviate significantly from it (Bolton & Hand, 2002). Specifically, the application of Benford’s Law to detect false or erroneous data is just a more complex form of digit analysis. Varian (1972) was the first researcher to propose the application of Benford’s Law to digit analysis. Newcomb (1881) and Benford (1938) each, after recognizing the numerical phenomenon, derived the empirical probabilities of occurrence of the first and second digits. Benford (1938) later extended this result by providing the empirical probabilities of occurrence of any digit.

The marginal probability mass function (PMF) of the first digit $(G_{1})$ is given by:

$\displaystyle Pr(G_{1}=g_{1})=\log\left(\frac{g_{1}+1}{g_{1}}\right),\ \ \ g_{1}\in(1,2,\ldots,9).$ (3)

Likewise, the marginal probability mass function (PMF) of the second-digit $(G_{2})$ is given by:

$\displaystyle Pr(G_{2}=g_{2})=\sum_{g_{1}=1}^{9}\log\left(1+\frac{1}{10g_{1}+g_{2}}\right),\ \ \ g_{2}\in(0,1,\ldots,9).$ (4)

Thus, it can be observed that the marginal probability mass function (PMF) of the $n^{\text{th}}$ most significant digit $(G_{n})$ is given by:

$\displaystyle Pr(G_{n}=g_{n})=\sum_{g_{1}=1}^{9}\sum_{g_{2}=0}^{9}\cdots\sum_{g_{n-1}=0}^{9}\log\left(1+\frac{1}{\sum_{j=1}^{n}10^{n-j}g_{j}}\right),\ \ \ g_{n}\in(0,1,\ldots,9).$ (5)

In general, for the first $m$ significant digits, $g_{1},g_{2},\ldots,g_{m}$ ,

$\displaystyle Pr(G_{1}=g_{1},G_{2}=g_{2},\ldots,G_{m}=g_{m})=\log\left[1+\left(\sum_{j=1}^{m}g_{j}\times 10^{m-j}\right)^{-1}\right].$ (6)
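Equations (3) to (6) can be checked numerically with a short sketch such as the one below. It assumes the logarithm is taken to base 10, as is standard for Benford's law; the function names `joint_prob` and `marginal_prob` are hypothetical. The unrounded means come out at about 3.440 and 4.187, so the values 3.441 and 4.186 in Eqs (1) and (2) appear to be computed from proportions rounded to three decimals.

```python
from itertools import product
from math import log10


def joint_prob(digits):
    """Eq. (6): probability that the first m significant digits equal `digits`."""
    if not digits or not 1 <= digits[0] <= 9:
        raise ValueError("the first significant digit must be 1..9")
    if any(not 0 <= d <= 9 for d in digits[1:]):
        raise ValueError("later digits must be 0..9")
    value = int("".join(str(d) for d in digits))  # e.g. (1, 2) -> 12
    return log10(1 + 1 / value)  # base-10 logarithm assumed


def marginal_prob(n, g):
    """Eq. (5): probability that the n-th significant digit equals g."""
    if n == 1:
        return joint_prob((g,))
    total = 0.0
    # Sum the joint probability over every choice of the preceding digits.
    for g1 in range(1, 10):
        for middle in product(range(10), repeat=n - 2):
            total += joint_prob((g1, *middle, g))
    return total


first_pmf = {g: marginal_prob(1, g) for g in range(1, 10)}   # Eq. (3)
second_pmf = {g: marginal_prob(2, g) for g in range(0, 10)}  # Eq. (4)

mean_first = sum(g * p for g, p in first_pmf.items())
mean_second = sum(g * p for g, p in second_pmf.items())
print(round(mean_first, 3), round(mean_second, 3))
```

The same `joint_prob` function with two digits gives the expected proportions for a first-two-digits test over the values 10 to 99.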

We note that Eqs (1) and (2) are used to compute Benford’s first- and second-digit mean values respectively, and the same formulas were used to calculate all the other mean values.

For example, in Table 1, the first digit mean value of NPP was calculated as follows:

$\displaystyle BL_{1\textit{NPP}}=\sum_{j=1}^{9}g_{j}Pr\left(G_{j}=g_{j}\right)=(1\times 0.327)+(2\times 0.255)+\ldots+(9\times 0.022)=2.902.$ (7)

and the first digit mean value of TVVC was calculated as:

$\displaystyle BL_{1\textit{TVVC}}=\sum_{j=1}^{9}g_{j}Pr\left(G_{j}=g_{j}\right)=(1\times 0.102)+(2\times 0.262)+\ldots+(9\times 0.007)=3.402.$ (8)

Likewise, in Table 3, the second digit mean value of NPP was calculated as:

$\displaystyle BL_{2\textit{NPP}}=\sum_{k=0}^{9}g_{k}Pr\left(G_{k}=g_{k}\right)=(0\times 0.113)+(1\times 0.127)+\ldots+(9\times 0.062)=4.171.$ (9)

and all the other mean values in Tables 1–4 are obtained similarly.
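The worked means in Eqs (7) to (9) can be reproduced with a few lines of code. The sketch below is illustrative only: `digit_mean` is a hypothetical helper, and it works from the observed counts in Tables 1 and 3 rather than from the rounded proportions, so a result may differ from the tabulated mean in the third decimal place.

```python
def digit_mean(counts_by_digit):
    """Mean digit value, weighting each digit by its observed count."""
    total = sum(counts_by_digit.values())
    return sum(d * c for d, c in counts_by_digit.items()) / total


# Observed 2016 counts taken from Table 1 (first digits) and Table 3 (second digits).
npp_2016_first = dict(zip(range(1, 10), [90, 70, 41, 21, 12, 11, 11, 13, 6]))
tvvc_2016_first = dict(zip(range(1, 10), [28, 72, 69, 38, 32, 17, 11, 6, 2]))
npp_2016_second = dict(zip(range(0, 10), [31, 35, 34, 27, 18, 26, 32, 24, 31, 17]))

print(round(digit_mean(npp_2016_first), 3))   # compare with Eq. (7)
print(round(digit_mean(tvvc_2016_first), 3))  # compare with Eq. (8)
print(round(digit_mean(npp_2016_second), 3))  # compare with Eq. (9)
```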

3. Results and discussions

The empirical findings from the study are presented in this section.

3.1 Distribution of first digits in 2016 and 2020 elections

First, we apply Benford’s first digit distributional pattern in (3) to the valid votes cast for NPP and NDC in the 2016 presidential election in Ghana, and the results are shown in Table 1. The distribution of the observed frequency of each digit is shown in the table, together with the corresponding Benford proportions. It can be observed that the mean of the NPP’s 2016 first-digit distribution, 2.902, is closer to the expected value of 3.441 under Benford’s law than the NDC’s observed mean of 2.675. Also, according to Table 1, several observed proportions of the total valid votes cast in the 2016 presidential election differ from Benford’s proportions. For the TVVC in 2016, the observed proportion of digit six is the one closest to Benford’s posited proportion. The observed TVVC first-digit mean of 3.402 in 2016 is close to the Benford mean of 3.441.

Table 1
Distribution of frequency of first digits for NPP, NDC and TVVC in 2016

	First digits	NPP		NDC		TVVC		Benford’s proportions
		Observed counts	Observed prop.	Observed counts	Observed prop.	Observed counts	Observed prop.
	1	90	0.327	123	0.447	28	0.102	0.301
	2	70	0.255	69	0.251	72	0.262	0.176
	3	41	0.149	20	0.073	69	0.251	0.125
	4	21	0.076	9	0.033	38	0.138	0.097
	5	12	0.044	15	0.055	32	0.116	0.079
	6	11	0.040	7	0.025	17	0.062	0.067
	7	11	0.040	8	0.029	11	0.040	0.058
	8	13	0.047	10	0.036	6	0.022	0.051
	9	6	0.022	14	0.051	2	0.007	0.046
Total		275		275		275
Mean			2.902		2.675		3.402	3.441

In addition, Table 2 presents the probability distribution of the first digits of valid votes cast for NPP and NDC in Ghana’s 2020 presidential election. The frequency distribution of each digit’s observed counts is also shown in the table, along with the corresponding Benford predicted proportions. Similar to the 2016 election results, the observed mean of the first digits of valid votes cast for NPP, 2.979, is closer to Benford’s projected mean of 3.441 than that of the NDC (2.555).

Table 2
Distribution of frequency of first digits for NPP, NDC and TVVC in 2020

	First digits	NPP		NDC		TVVC		Benford’s proportions
		Observed counts	Observed prop.	Observed counts	Observed prop.	Observed counts	Observed prop.
	1	84	0.305	107	0.389	23	0.084	0.301
	2	68	0.247	78	0.284	46	0.167	0.176
	3	44	0.160	36	0.131	63	0.229	0.125
	4	23	0.084	13	0.047	55	0.200	0.097
	5	18	0.065	9	0.033	32	0.116	0.079
	6	11	0.040	9	0.033	24	0.087	0.067
	7	7	0.025	11	0.040	14	0.051	0.058
	8	9	0.033	6	0.022	9	0.033	0.051
	9	11	0.040	6	0.022	9	0.033	0.046
Total		275		275		275
Mean			2.979		2.555		3.925	3.441

Furthermore, several observed first-digit proportions of the TVVC differ from Benford’s proportions. However, the observed proportions of first digits 2, 7 and 9 are closer to Benford’s postulated proportions for the total valid votes cast. Therefore, for the first-digit distribution, NDC has mean values that are farther away from Benford’s expected value than those of NPP in both the 2016 and 2020 elections. High discrepancies in mean values (Benford and observed digits) may indicate a possible anomaly in the valid votes count of NDC in both the 2016 and 2020 elections.

3.2 Distribution of second digits in the 2016 and 2020 presidential elections

In this subsection, we apply Benford’s expected second-digit distribution to the valid vote cast for NPP and NDC in 2016 and 2020 presidential elections. The probability distribution of the second digits of valid votes cast for NPP and NDC in the 2016 presidential election is shown in Table 3. The observed frequency distribution of each digit is also shown in the table jointly with the corresponding Benford’s projected proportions. In comparison to the NDC observed mean (3.872) in 2016, NPP’s observed mean (4.171) is closer to Benford’s expected value (4.186). The probability distribution of second digits 1, 4, and 6 of the TVVC in 2016 have proportions that are closer to Benford’s hypothesised values.

Table 3
Distribution of frequency of second digits for NPP, NDC and TVVC in 2016

	Second digits	NPP		NDC		TVVC		Benford’s proportions
		Observed counts	Observed prop.	Observed counts	Observed prop.	Observed counts	Observed prop.
	0	31	0.113	31	0.113	17	0.062	0.120
	1	35	0.127	32	0.116	36	0.131	0.114
	2	34	0.124	39	0.142	21	0.076	0.109
	3	27	0.098	33	0.120	23	0.084	0.104
	4	18	0.065	33	0.120	25	0.091	0.100
	5	26	0.095	26	0.095	24	0.087	0.097
	6	32	0.116	25	0.091	28	0.102	0.093
	7	24	0.087	21	0.076	36	0.131	0.090
	8	31	0.113	20	0.073	35	0.127	0.088
	9	17	0.062	15	0.055	30	0.109	0.085
Total		275		275		275
Mean			4.171		3.872		4.860	4.186

Also, the probability distribution of the second digits of valid votes cast for NPP and NDC in the 2020 presidential election is shown in Table 4. The NDC’s second-digit mean, 4.191, is much closer to Benford’s second-digit mean, 4.186, than the NPP’s, 4.579.

Table 4
Distribution of frequency of second digits for NPP, NDC and TVVC in 2020

	Second digits	NPP		NDC		TVVC		Benford’s proportions
		Observed counts	Observed prop.	Observed counts	Observed prop.	Observed counts	Observed prop.
	0	21	0.076	37	0.135	20	0.073	0.120
	1	30	0.109	29	0.105	31	0.113	0.114
	2	35	0.127	27	0.098	29	0.105	0.109
	3	23	0.084	30	0.109	29	0.105	0.104
	4	22	0.080	33	0.120	25	0.091	0.100
	5	30	0.109	16	0.058	25	0.091	0.097
	6	32	0.116	28	0.102	24	0.087	0.093
	7	28	0.102	26	0.095	34	0.124	0.090
	8	23	0.084	26	0.095	31	0.113	0.088
	9	31	0.113	23	0.084	27	0.098	0.085
Total		275		275		275
Mean			4.579		4.191		4.633	4.186

Therefore, in general, for the distribution of second digits, NDC has a mean that is farther away from Benford’s expected value than that of the NPP in the 2016 election. A high discrepancy in mean (Benford and observed digits) is indicative of a possible anomaly in the valid votes count of NDC in the 2016 election. In contrast, for the distribution of second digits, NPP has a mean that is farther away from Benford’s expected value than that of the NDC in 2020. Thus, there is a possible anomaly in the valid votes count of NPP in the 2020 election. However, several observed proportions of the second-digit distribution of the total valid votes cast in the 2020 presidential election conform to Benford’s proportions. Specifically, the observed proportions of second digits 0, 8, and 9 of the total valid votes cast are closer to Benford’s hypothesized proportions. Moreover, in the case of the mean values, the TVVC second-digit mean for the 2020 election (4.633) is closer to Benford’s expected value (4.186) than that for the 2016 election (4.860).

Figure 1.

First and Second digits distribution between Benford, NPP’s votes and NDC’s votes in 2016 and 2020.

Based on the distribution of second digits of both NPP and NDC, Fig. 1a indicates that most observed proportions do not conform to Benford’s specified proportions. In the 2016 presidential elections, the distribution of the digits 3 and 7 are closer to Benford’s second-digit distribution for NPP, while the distribution of the digits 1, 5, and 6 are closer to Benford’s second-digit distribution for NDC.

Also, Fig. 1b shows that most of the observed proportions for the first digits of both NPP and NDC in the 2020 presidential election are much closer to Benford’s posited proportions for the first-digit distribution. For NPP, the proportions of digits 2 and 3 are farther away from Benford’s first-digit probability distribution than those of the other digits. On the other hand, for NDC, the proportions of digits 1, 2, 4, and 5 are farther away from Benford’s first-digit distribution of the valid votes in the 2020 election.

In addition, based on the distribution of second digits of both NPP and NDC, Fig. 1c indicates that most observed proportions do not conform to Benford’s specified proportions. In the 2016 presidential election, the distribution of the digits 3 and 7 do not differ much from the Benford’s second-digit distribution for NPP. Similarly, the distribution of the digits 1, 5, and 6 are closer to Benford’s second-digit distribution for NDC.

Moreover, from Fig. 1d, most of the observed proportions for the second digits distribution of both NPP and NDC deviate from Benford’s expected proportions for the distribution of second digits. The nearest to Benford’s second digits distribution for the valid votes cast for both NPP and NDC in the 2020 presidential elections is the distribution of digits 1, 7, and 8.

4. Tests of hypotheses

We conduct a formal hypothesis test for the null hypotheses stated in Section 1. The results of the hypotheses tests are presented in this section.

Table 5
Spearman rank correlation coefficient test for the digits distribution of 2016 and 2020 votes

Digits	Criteria	Correlation coefficient	$P$ -value	Sample size
First	NPP/Benford (2016)	0.895	0.001	9
	NDC/Benford (2016)	0.667	0.006	9
	NPP/Benford (2020)	0.912	0.001	9
	NDC/Benford (2020)	0.941	0.000	9
First	TVVC/Benford (2016)	0.833	0.008	9
	TVVC/Benford (2020)	0.695	0.038	9
Second	NPP/Benford (2016)	0.565	0.089	10
	NDC/Benford (2016)	0.802	0.005	10
	NPP/Benford (2020)	$-$ 0.230	0.520	10
	NDC/Benford (2020)	0.717	0.020	10
Second	TVVC/Benford (2016)	$-$ 0.541	0.106	10
	TVVC/Benford (2020)	$-$ 0.244	0.496	10

(i)

$H_{01}$ : No statistically significant relationship exists between each party (NPP, or NDC, or TVVC) observed first digits distribution of election results and Benford’s distribution of first digits.

The SRCC test results for the NPP and NDC first-digit distributions in the 2016 and 2020 presidential elections are shown in Table 5. In both the 2016 and 2020 presidential elections, there is a statistically significant relationship between Benford’s proposed distribution of first digits and the distribution of first digits of valid votes cast for the NPP ( $p$ -values of $<$ 0.05). The relationship between Benford’s proposed first-digit distribution and the distribution of first digits of valid votes cast for the NDC in the 2016 and 2020 presidential elections is also significant, with respective $p$ -values of 0.006 and 0.000 ( $p$ -values $<$ 0.05). Thus, the distribution of first digits projected by Benford is consistent with that of the first digits of valid votes cast for NPP and NDC in 2016 and 2020. Furthermore, a statistically significant correlation exists between the observed first-digit distributions of total valid votes cast in the 2016 and 2020 presidential elections and the Benford distribution of first digits, with respective $p$ -values of 0.008 and 0.038 ( $p$ -values $<$ 0.05) as shown in Table 5. Thus, we conclude that the distribution of the observed first digits of the total valid votes cast in both the 2016 and 2020 presidential elections conformed to Benford’s hypothesized distribution.

(ii)

$H_{02}$ : No statistically significant relationship exists between each party (NPP, or NDC, or TVVC) observed second digits distribution of election results and Benford’s distribution of second digits.

Also, as demonstrated in Table 5, there is no statistically significant relationship between the second digits of valid votes cast for NPP in the 2016 and 2020 presidential elections and the distribution of second digits specified by Benford, with $p$ -values of 0.089 and 0.520, respectively ( $p$ -values $>$ 0.05). Hence, we fail to reject the null hypothesis of no relationship. Consequently, the NPP’s second-digit distribution did not follow the projected Benford distribution in the 2016 and 2020 presidential elections. In contrast, there is a statistically significant relationship between Benford’s second-digit distribution and the observed second-digit distribution of the valid votes cast for NDC in the presidential elections of 2016 and 2020, with respective $p$ -values of 0.005 and 0.020 ( $p$ -values $<$ 0.05). As a result, we reject the null hypothesis. This shows that NDC’s valid votes are consistent with Benford’s predictions of second-digit distributions in 2016 and 2020. For the total valid votes cast, Table 5 shows that there is no statistically significant association between Benford’s distribution of second digits and the observed distributions of second digits in the presidential elections of 2016 and 2020, with $p$ -values of 0.106 and 0.496, respectively. As a result, we fail to reject the null hypothesis. This demonstrates that the observed second-digit distribution of total valid votes cast in the 2016 and 2020 presidential elections did not follow Benford’s projections for the distribution of the second digits.

5. Conclusion

In this study, we assessed the applicability of digit-based tests for anomaly detection in presidential elections in Ghana. In the previous section, we observed that the first-digit distribution of valid votes cast for both parties, NPP and NDC, in Ghana’s presidential elections in 2016 and 2020 conforms to Benford’s expected distribution for first digits. Also, the distribution of the first digits of total valid votes cast conforms to Benford’s law. Thus, the findings for the first digits suggest possibly anomaly-free presidential elections in 2016 and 2020. In addition, the findings of the first-digit distribution for the 2020 presidential election agree with the results of Nortey et al. (2022), where the authors employ a Bayesian methodology in validating the credibility of the 2020 presidential election in Ghana. In contrast, the distribution of the second digits of valid votes cast for the NPP in the 2016 and 2020 presidential elections deviates from Benford’s hypothesized distribution, whilst the NDC’s second-digit distribution conforms to the Benford expected distribution. Also, the distribution of the second digits of the total valid votes cast in 2016 and 2020 deviates from the Benford distribution for second digits. These findings are in line with previous research into election anomalies such as Breunig and Goerres (2011), Ketchley (2021) and Tunmibi and Olatokun (2021). In the case of the mean, discrepancies between Benford’s mean and the mean of the observed digit distribution give rise to suspicion of possible anomalies. For the distribution of the first digits, high discrepancies in mean values (Benford and observed digits) raise suspicion of a possible anomaly in the valid votes count of NDC (2016, 2020) and TVVC (2020). Also, for the second-digit distribution, high discrepancies between the Benford and observed means raise suspicion of possible fraud in the valid vote counts of NDC (2016), NPP (2020) and TVVC (2016, 2020). Therefore, to further assess the plausibility of these conclusions, we resorted to the Spearman rank correlation coefficient test. The hypothesis test results indicate that the distributions of the first digits of valid votes cast for NPP and NDC and of the total valid votes in the 2016 and 2020 presidential elections comply with the expected distribution under Benford’s law. Thus, the observed and expected first-digit distributions for NPP, NDC and the total valid votes are significantly correlated, implying possibly anomaly-free elections. The findings of the first-digit distribution seem to conflict with the results of the second-digit distribution for the valid votes cast for NPP and the total valid votes cast in 2016 and 2020. Therefore, we recommend that, to check for consistency in election anomaly detection between the first- and second-digit tests, the first-two-digits test be used, since it captures more information. In future research, digit analysis using Benford’s Law within a Bayesian framework will be explored.

Acknowledgments

The authors thank the Electoral Commission of Ghana for providing access to the election dataset used for the research.

Conflict of interest

The authors declare that there are no conflicts of interest.

Data availability

The data used to support the findings of this study are available from the corresponding author upon request.

References

1. Aggrey-Darkoh, & Asah-Asante (2017). Supreme Court ruling on 2012 electoral dispute in Ghana: A recipe for chaos or deepening of Ghana’s democracy? GHANA’S, page 125.

2. Asllani, & Naco (2014). Using Benford’s law for fraud detection in accounting practices. Journal of Social Science Studies, 2(1), 129–143.

3. Ayee, J.R. (1998). The 1996 General elections and democratic consolidation in Ghana. Department of Political Science, University of Ghana.

4. Bader, & Van Ham (2015). What explains regional variation in election fraud? Evidence from Russia: A research note. Post-Soviet Affairs, 31(6), 514–528.

5. Benford (1938). The law of anomalous numbers. Proceedings of the American Philosophical Society, pages 551–572.

6. Bërdufi (2014). Statistical detection of vote count fraud (2009 Albanian parliamentary election and Benford’s law). Mediterranean Journal of Social Sciences, 5(2), 755.

7. Berger, & Hill, T.P. (2015). An introduction to Benford’s law. Princeton University Press.

8. Boafo-Arthur (2008). Democracy and stability in West Africa: the Ghanaian experience. Nordiska Afrikainstitutet; Department of Peace and Conflict Research …

9. Bolton, R.J., & Hand, D.J. (2002). Statistical fraud detection: A review. Statistical Science, 17(3), 235–255.

10. Breunig, & Goerres (2011). Searching for electoral irregularities in an established democracy: Applying Benford’s law tests to Bundestag elections in unified Germany. Electoral Studies, 30(3), 534–545.

11. Clarke (1992). The Presidential Election in Ghana, 3 November, 1992: The Report of the Commonwealth Observer Group. London: Commonwealth Secretariat.

12. Conover, W.J. (1999). Practical nonparametric statistics, volume 350. John Wiley & Sons.

13. Crisan, Irimia, Gota, Miclea, Puscasiu, Stan, & Valean (2021). Analyzing Benford’s law’s powerful applications in image forensics. Applied Sciences, 11(23), 11482.

14. da Fonseca, P.M.T. (2016). Digit Analysis Using Benford’s Law: A Bayesian Approach. PhD thesis, Universidade de Lisboa (Portugal).

15. Deckert, Myagkov, & Ordeshook, P.C. (2011). Benford’s law and the detection of election fraud. Political Analysis, 19(3), 245–268.

16. Diekmann (2007). Not the first digit! Using Benford’s law to detect fraudulent scientific data. Journal of Applied Statistics, 34(3), 321–329.

17. Formann, A.K. (2010). The Newcomb-Benford law in its relation to some common distributions. PloS One, 5(5), e10541.

18. Jeffries (1998). The Ghanaian elections of 1996: Towards the consolidation of democracy? African Affairs, 97(387), 189–208.

19. Judge, & Schechter (2009). Detecting problems in survey data using Benford’s law. Journal of Human Resources, 44(1), 1–24.

20. Kaiser (2019). Benford’s law as an indicator of survey reliability – can we trust our data? Journal of Economic Surveys, 33(5), 1602–1618.

21. Kalinin, & Mebane, W.R. (2017). Worst election ever in Russia? Available at SSRN 2959824.

22. Ketchley (2021). Fraud in the 2018 Egyptian presidential election? Mediterranean Politics, 26(1), 117–129.

23. Lesperance, Reed, Stephens, Tsao, & Wilton (2016). Assessing conformance with Benford’s law: Goodness-of-fit tests and simultaneous confidence intervals. PloS One, 11(3), e0151235.

24. Levin, Cohn, Ordeshook, P.C., & Alvarez, R.M. (2009). Detecting voter fraud in an electronic voting context: An analysis of the unlimited reelection vote in Venezuela. In EVT/WOTE.

25. Luque, & Lacasa (2009). The first-digit frequencies of prime numbers and Riemann zeta zeros. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 465, 2197–2216.

26. Mebane, W.R., Alvarez, R.M., Hall, T.E., & Hyde, S.D. (2008). Election forensics: The second-digit Benford’s law test and recent American presidential elections. Election fraud: Detecting and deterring electoral manipulation, pages 162–181.

27. Mebane Jr, W.R. (2010). Fraud in the 2009 presidential election in Iran? Chance, 23(1), 6–15.

28. Mebane Jr, W.R. (2013). Election forensics: The meanings of precinct vote counts’ second digits. In Summer Meeting of the Political Methodology Society, University of Virginia.

29. Newcomb (1881). Note on the frequency of use of the different digits in natural numbers. American Journal of Mathematics, 4(1), 39–40.

30. Nigrini, M.J. (1993). The detection of income tax evasion through an analysis of digital distributions. University of Cincinnati.

31. Nigrini, M.J. (1996). A taxpayer compliance application of Benford’s law. The Journal of the American Taxation Association, 18(1), 72.

32. Nigrini, M.J. (1999). Adding value with digital analysis. Internal Auditor, 56(1), 21–24.

33. Nigrini, M.J. (2012). Benford’s Law: Applications for forensic accounting, auditing, and fraud detection, volume 586. John Wiley & Sons.

34. Nigrini, M.J., & Mittermaier, L.J. (1997). The use of Benford’s law as an aid in analytical procedures. Auditing, 16(2), 52.

35. Nortey, E.N.N., Agyemang, E.F., Minkah, & Asah-Asante (2022). Bayesian estimation of presidential elections in Ghana: A validation approach. African Journal of Applied Statistics, 9(1), 1297–1317.

36. Nugent (1995). Big Men, Small Boys, and Politics in Ghana: Power, Ideology, and the Burden of History, 1982–1994. Asempa Publishers, Christian Council of Ghana.

37. Nye, & Moul (2007). The political economy of numbers: On the application of Benford’s law to international macroeconomic statistics. The BE Journal of Macroeconomics, 7(1).

38. Pietronero, Tosatti, & Vespignani (2001). Explaining the uneven distribution of numbers in nature: The laws of Benford and Zipf. Physica A: Statistical Mechanics and its Applications, 293(1-2), 297–304.

39. Roukema, B.F. (2014). A first-digit anomaly in the 2009 Iranian presidential election. Journal of Applied Statistics, 41(1), 164–199.

40. Tunmibi, & Olatokun (2021). Application of digits based test to analyse presidential election data in Nigeria. Commonwealth & Comparative Politics, 59(1), 1–24.

41. Varian, H.R. (1972). Benford’s law. American Statistician, 26(3), 65.

42. Weidmann, N.B., & Callen (2013). Violence and election fraud: Evidence from Afghanistan. British Journal of Political Science, 43(1), 53–75.

43. Zhao, & Wu (2010). Does Chinese stock indices agree with Benford’s law? 2010 International Conference on Management and Service Science, pp. 1–3.

