A probabilistic epidemiological model

Abstract

We construct a model for the progress of the 2020 coronavirus epidemic in the United States of America, using probabilistic methods rather than the traditional compartmental model. We employ the generalized beta family of distributions, including those supported on bounded intervals and those supported on semi-infinite intervals. We compare the best-fit distributions for daily new cases and daily new deaths in America to the corresponding distributions for the United Kingdom, Spain, and Italy. We explore how such a model might be justified theoretically in comparison to the apparently more natural compartmental model. We compare forecasts based on these models to observations, and find the forecasts useful in predicting total pandemic deaths.

Keywords

Epidemic model, probabilistic, generalized beta, compartmental, coronavirus

1. Introduction

The onset of the coronavirus pandemic in early 2020 led to global anxiety. Everyone wanted to know what to expect. This was particularly true of governments, who had to plan public health policies, and other policy responses. The two primary time series of interest to policy makers are the daily number of new cases, and the daily number of new deaths. Related time series also played a prominent role in national discussions, such as the total number of hospitalizations, and usage of other resources.

The traditional approach to the time series of cases and deaths has been the compartmental epidemiological model, particularly the SIR model (Kermack and McKendrick). Here S $=$ susceptible population, I $=$ infected population, and R $=$ recovered or removed population (removed either through recovery or through death). S precedes I, and I precedes R. At each stage of the process, there is a certain probability of advancing to the next stage.

The SIR model is difficult to use, being defined by a system of three first-order, non-linear ordinary differential equations. The system does not have a general closed-form solution, so solutions are approximated using numerical methods. The parameters of the model are based on educated guesses, such as the ever-changing value of the basic reproduction number, $R_{0}$ .
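To illustrate what such a numerical approximation involves, the following minimal sketch integrates the standard SIR equations, $dS/dt=-\beta SI/P$ , $dI/dt=\beta SI/P-\gamma I$ , $dR/dt=\gamma I$ , by the forward Euler method. The function name `simulate_sir`, the step size, and the values of the transmission rate `beta` and recovery rate `gamma` are illustrative assumptions, not estimates from any data set.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Forward-Euler integration of the SIR equations; returns one (S, I, R) tuple per day."""
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    steps_per_day = round(1.0 / dt)
    for day in range(days):
        for step in range(steps_per_day):
            new_infections = beta * s * i / population * dt  # S -> I flow
            new_removals = gamma * i * dt                    # I -> R flow
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
        history.append((s, i, r))
    return history


# Hypothetical parameter values: beta / gamma = 3 plays the role of R_0.
history = simulate_sir(population=1_000_000, initial_infected=10,
                       beta=0.3, gamma=0.1, days=200)
```

Even in this simple form, the output depends entirely on the assumed values of `beta` and `gamma`, which is the practical difficulty noted above.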

Models produced early in the pandemic predicted catastrophe, with millions of deaths forecast in the United States by the model produced by the MRC Centre for Global Infectious Disease Analysis at the Imperial College (Imperial College model). Following its rejection by public health officials (David Adam), those officials used the model produced by IHME, the Institute for Health Metrics and Evaluation at the University of Washington (IHME model). Eventually, this predictive model was abandoned as well (William Wan et al), with the public health establishment deciding to rely primarily on actual observations.

An observer cannot help but notice the resemblance between graphs of daily new cases and daily new deaths, on the one hand, and probability distributions on the other. This suggests the possibility of fitting probability distributions to the data. We will explain why this task is impossible, what similar task should be done instead, and the challenge in justifying such a model theoretically.

2. Incidence distributions

Despite their superficial resemblance, the graph of the daily number of new cases or new deaths is not a true probability distribution, because the sum of the values of the daily numbers of new cases or new deaths is not one.

We choose to generalize the concept of probability distributions. Rather than insist that the measure of the whole space (popularly understood as “the area under the curve”) equal one, we relax this condition to require that the measure of the whole space be positive but finite. We denote the measure of the whole space by $N$ . This number is not the size of the entire population, but rather the ultimate number of cases or deaths at the conclusion of the epidemic.

During the pandemic, $N$ is not known; it may be estimated from existing data. For this reason, we cannot shoehorn our approach into a traditional probability distribution by dividing by $N$ , since $N$ is not yet known.

An incidence distribution is $N$ times a probability distribution, where $0<N<\infty$ . The concepts of support, measures of central tendency, variability, and moments are inherited from the corresponding probability distribution. However, $N$ is a new type of parameter for an incidence distribution. Probability distributions result in the special case $N=1$ .

Given a series of daily new cases or daily new deaths, and a finite-dimensional family of probability distributions, we seek the best-fit incidence distribution that describes the data. This requires the identification of parameters from the family of probability distributions, in conjunction with the identification of $N$ . This enhanced set of parameters is identified simultaneously.
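As a minimal sketch of this fitting problem, the code below evaluates an incidence curve $N$ times a density and the RMSE between that curve and a series of daily counts. The names `incidence`, `rmse`, and `toy_pdf` are hypothetical, the exponential density is only a stand-in for a member of the chosen family, and treating the density value on day $t$ as the expected daily count is a simplifying assumption.

```python
import math


def incidence(t, N, pdf):
    """Incidence distribution: N times a probability density evaluated at time t."""
    return N * pdf(t)


def rmse(observed, N, pdf):
    """Root mean square error; observed[k] is the count recorded on day k + 1."""
    residuals = [y - incidence(day, N, pdf)
                 for day, y in enumerate(observed, start=1)]
    return math.sqrt(sum(e * e for e in residuals) / len(residuals))


def toy_pdf(t, rate=0.1):
    """Placeholder exponential density on (0, infinity), used only for illustration."""
    return rate * math.exp(-rate * t) if t > 0 else 0.0


# Synthetic data generated from the toy curve itself, then shifted by 2 per day.
exact = [incidence(day, 1000.0, toy_pdf) for day in range(1, 31)]
shifted = [y + 2.0 for y in exact]
```

An optimizer would vary $N$ and the density's own parameters together so as to minimize the value returned by `rmse`.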

The next task is to identify a measure of goodness of fit, and an appropriate family of probability distributions. Some of the most common measures are the root mean square error (RMSE), $R^{2}$ , the $F$ -statistic, and the $p$ -value of the model. These measures of goodness of fit are related. Indeed, the model that minimizes RMSE simultaneously maximizes $R^{2}$ , maximizes $F$ , and minimizes $p$ . Our optimization will focus on the goal of minimizing RMSE. For the reader’s convenience, however, we will report all four measures of goodness of fit.
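The link among these four measures can be made explicit. As a sketch, assume $n$ observations $y_{i}$ with fitted values $\hat{y}_{i}$ and mean $\bar{y}$ , and a model with $k$ fitted parameters. The symbols $\textit{SSE}$ , $\textit{SST}$ , and $k$ are introduced here for illustration, and the divisor $n$ in RMSE is an assumption (a divisor of $n-k$ leads to the same conclusion):

$\displaystyle \textit{SSE}=\sum_{i=1}^{n}{(y_{i}-\hat{y}_{i})}^{2},\quad\textit{SST}=\sum_{i=1}^{n}{(y_{i}-\bar{y})}^{2},\quad\textit{RMSE}=\sqrt{\textit{SSE}/n},\quad R^{2}=1-\frac{\textit{SSE}}{\textit{SST}},\quad F=\frac{R^{2}/(k-1)}{(1-R^{2})/(n-k)}$

For a fixed data set and a fixed number of parameters, $\textit{SST}$ , $n$ , and $k$ do not vary across candidate models. Hence RMSE is increasing in $\textit{SSE}$ , $R^{2}$ is decreasing in $\textit{SSE}$ , $F$ is increasing in $R^{2}$ , and the $p$ -value is decreasing in $F$ .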

Distribution fitting has been attempted before, in accordance with Farr’s law of epidemiology. It is stated on page 97 of Laws of Epidemics (Farr): “It appears probable, however, that the small-pox increases at an accelerated and then a retarded rate; that it declines first at a slightly accelerated, then a rapidly accelerated, and lastly at a retarded rate, until the disease attains the minimum intensity, and remains stationary.”

Attempts have been made to fit normal and logistic distributions to epidemic data. Such attempts are thoroughly inappropriate, as these two distribution families are symmetric with light tails, whereas epidemic data tend to be quite asymmetric, with heavy tails. We propose a better alternative.

3. Generalized beta distributions

The generalized beta distributions (McDonald & Xu, 1995) form a very broad family of probability distributions with strong closure properties, and include numerous familiar probability distributions as either special cases or limiting cases.

• Generalized beta distributions are supported on $(0,d)$ , where $0<d\leqslant\infty$ , and thus are capable of bounded or semi-infinite support. This is appropriate for epidemic studies, where the infections or fatalities commence on a given date ( $X=0$ ), and it is not known in advance when or if the infections or fatalities will end.
• Every Pearson distribution with bounded or semi-infinite support of the form $(0,d)$ , where $0<d\leqslant\infty$ , is either a generalized beta distribution or a limit of generalized beta distributions. This includes Pearson distributions of types II (symmetric beta), I (beta), III (chi-squared, exponential, gamma), VI (beta prime, Fisher-Snedecor), and V (inverse chi-squared, inverse gamma).
• The family of generalized beta distributions is closed under positive scaling and under power transformations. That is, if $X$ is a generalized beta random variable, so is $kX^{r}$ , where $k>0$ and $r\neq 0$ .

The generalized beta distribution family appears to be the smallest family of probability distributions with the above three properties. It has five parameters, as seen in the probability density function:

$\displaystyle f(x;a,b,c,p,q)=\frac{|a|x^{ap-1}{(1-(1-c){({x}/{b})}^{a})}^{q-1}}{b^{ap}B(p,q){(1+c{({x}/{b})}^{a})}^{p+q}}$

We note the following.

1. The parameters satisfy $b>0$ , $0\leqslant c\leqslant 1$ , $p>0$ , and $q>0$ . $a$ is the non-zero exponent in a power transformation; $b$ is a scaling factor; $c$ is used to interpolate between generalized beta of the first kind (GB1 when $c=0$ ) and generalized beta of the second kind (GB2 when $c=1$ ).
2. $B(p,q)$ is the beta function, $\Gamma(p)\Gamma(q)/\Gamma(p+q)$ , where $\Gamma$ is the gamma function.
3. The independent variable satisfies $0<x<{b}/{{(1-c)}^{1/a}}$ (clearly, $a\neq 0$ ).
4. The support is bounded when $c<1$ , and is semi-infinite when $c=1$ .
5. The Pearson type I (beta) distribution occurs for $a=1$ , $c=0$ .
6. The Pearson type VI (beta prime) distribution occurs for $a=1$ , $c=1$ .
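The density and the two special cases noted above can be checked with a short sketch. The function name `gb_pdf` is hypothetical, and the code is a direct transcription of the formula using the gamma function, which is adequate only for moderate parameter values; it is not the implementation used for the fits reported here.

```python
import math


def gb_pdf(x, a, b, c, p, q):
    """Generalized beta density f(x; a, b, c, p, q), transcribed directly from the formula."""
    if x <= 0:
        return 0.0
    u = (x / b) ** a
    inner = 1.0 - (1.0 - c) * u
    if inner <= 0:
        return 0.0  # x lies outside the support
    beta_pq = math.gamma(p) * math.gamma(q) / math.gamma(p + q)
    numerator = abs(a) * x ** (a * p - 1) * inner ** (q - 1)
    denominator = b ** (a * p) * beta_pq * (1.0 + c * u) ** (p + q)
    return numerator / denominator


# a = 1, c = 0 reduces to the beta density; Beta(2, 2) at x = 0.5 is 6 * 0.5 * 0.5.
beta_value = gb_pdf(0.5, a=1.0, b=1.0, c=0.0, p=2.0, q=2.0)

# a = 1, c = 1 reduces to the beta prime density x^(p-1) (1+x)^-(p+q) / B(p, q).
beta_prime_value = gb_pdf(1.0, a=1.0, b=1.0, c=1.0, p=2.0, q=3.0)
```

With $p=q=2$ the first value agrees with the beta density $6x(1-x)$ at $x=1/2$ , and with $p=2$ , $q=3$ the second agrees with $12x{(1+x)}^{-5}$ at $x=1$ .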

4. The search process

Together with $N$ , there are now a total of six parameters to be estimated, in $(N,a,b,c,p,q)$ space. Solutions in Excel Solver over this six-dimensional parameter space did not appear to be stable, for two reasons. First and most importantly, the parameter $c$ belongs to a closed interval, which leads to potential problems close to the boundary of the interval. In particular, the best observed fit tended to be for the case $c=1$ , on the boundary of the parameter space. Secondly, five parameters are bounded ( $N>0$ , $b>0$ , $0\leqslant c\leqslant 1$ , $p>0$ , and $q>0$ ).

We converted the search for the five bounded parameters into searches over unbounded intervals, replacing these five by $\ln(N)$ , $\ln(b)$ , $\operatorname{logit}(c)=\ln({c}/{(1-c)})$ , $\ln(p)$ , and $\ln(q)$ . Also, we chose to narrow the parameter search by searching within the five-dimensional $(N,a,b,p,q)$ space three times for each data set: once for $c=0$ , once for $c={1}/{2}$ , and once for $c=1$ . The resulting optimal solutions were tested successfully for stability.
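The reparametrization can be sketched as a pair of inverse maps. The function names `to_unbounded` and `from_unbounded` are hypothetical, and the logit is defined only for $0<c<1$ , which is consistent with treating $c=0$ and $c=1$ as separate fixed cases.

```python
import math


def to_unbounded(N, b, c, p, q):
    """Map the bounded parameters to the real line (requires 0 < c < 1)."""
    return (math.log(N), math.log(b), math.log(c / (1.0 - c)),
            math.log(p), math.log(q))


def from_unbounded(ln_N, ln_b, logit_c, ln_p, ln_q):
    """Inverse map: any real inputs give N, b, p, q > 0 and 0 < c < 1."""
    c = 1.0 / (1.0 + math.exp(-logit_c))
    return (math.exp(ln_N), math.exp(ln_b), c, math.exp(ln_p), math.exp(ln_q))


# Round trip on illustrative values.
original = (184991.0, 31.22, 0.5, 1.01, 0.42)
recovered = from_unbounded(*to_unbounded(*original))
```

An optimizer working on the transformed values cannot step outside the admissible region, since every real input maps back to a valid parameter.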

Another difficulty arose in computing the probability density function; the first two factors in the denominator are either extraordinarily large or extraordinarily small, mostly canceling out each other. To compute the product of the first two factors in the denominator, we computed the logarithms of the individual factors, added them, and exponentiated the sum. That is:

$\displaystyle b^{ap}B(p,q)=\exp(\ln(b^{ap}B(p,q)))=\exp(\ln(b^{ap})+\ln(B(p,q)))=\exp(ap\ln(b)+\ln(\Gamma(p))+\ln(\Gamma(q))-\ln(\Gamma(p+q)))$

We used Excel’s function GAMMALN.PRECISE in the last computation.
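The same computation can be sketched outside Excel. Here Python's `math.lgamma` plays the role of GAMMALN.PRECISE; the function name `log_norm_const` is hypothetical, and the large parameter values are the rounded UK cases estimates for $c=0$ , used only to show the scale of the numbers involved.

```python
import math


def log_norm_const(a, b, p, q):
    """ln(b^(ap) * B(p, q)), computed without forming either factor."""
    return (a * p * math.log(b)
            + math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q))


# Small case, where the direct product is easy to check: 2^2 * B(2, 3) = 4 / 12.
small_value = math.exp(log_norm_const(1.0, 2.0, 2.0, 3.0))

# Large case: the logarithm stays finite even though Gamma(q) itself overflows.
uk_log_const = log_norm_const(0.1920, 231114692700398.0, 109.12, 28819.04)
try:
    math.gamma(28819.04)
    naive_overflows = False
except OverflowError:
    naive_overflows = True
```

Working with the logarithm throughout, and exponentiating only the final sum, avoids the overflow that the direct product would produce.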

5. Overview of data

Data were gathered from the Worldometers website (Worldometers). This process was somewhat problematic, because published data were subject to change as the various country health agencies updated their reports. Often, small changes would be published going back several weeks. Most changes were small in magnitude, however, and did not substantially affect the model parameters. Consequently, we chose merely to add new data, rather than continually revisit the previously recorded data for changes. Our primary interest is in demonstrating the feasibility of this model-fitting method.

We display, in two graphs, the weekly number of new cases of coronavirus per one million population, and the weekly number of new deaths from coronavirus per one million population, in the countries under study. In each graph, for each country, the data series starts on the first day when there is at least one case or one death per two million population. Figure 1 depicts weekly cases per one million population; Fig. 2 depicts weekly deaths per one million population.

Figure 1.

Weekly cases of coronavirus per one million population.

Figure 2.

Weekly deaths from coronavirus per one million population.

In addition to the United States, United Kingdom, Spain, and Italy, we also initially followed data from Brazil, France, Belgium, and Germany. These eight countries were chosen for study in early June, 2020 as having the highest impact of the pandemic upon the country and upon the world, by having the largest values (at that time) for the metric $\sqrt{{CD}/{P}}$ , where $C$ is the total number of cases, $D$ is the total number of deaths, and $P$ is the total population of the country. As of October, 2020, Peru, Argentina, India, Mexico, and Colombia also have high values for this metric.

A further time series was added, representing the geometric mean of the series for the northern hemisphere countries, to smooth out irregularities observed in only one country. Data are current to July 9, 2020.

The behavior of the western European countries is somewhat similar, enough to consider this group of densely-populated neighboring countries as a single cluster. In all the northern hemisphere countries in the graphs, the shapes of the deaths plots are very similar, with the exception of amplitude; the number of weeks to the peak is about four to five. The shapes of the cases plots are similar among the western European countries, again except for amplitude; the number of weeks to the peak is about four to six. Recall that time on both graphs is counted from the first day that cases or deaths respectively occur once for every two million population.

In the United States, deaths decayed steadily, but the number of cases underwent a second increase. This does not necessarily signify a “second wave”, but rather the effect of the large size and multiple climates in the United States, which drove the northern United States indoors in the cold winter, and the southern United States indoors in the hot summer. Thus, each region of the United States experienced its “first wave” during the period under study. It could be argued plausibly that such a curve could be understood more easily as a sum of two distinct functions, one representing a cluster of cases in the northeastern United States, and a second one representing the rest of the country.

Brazil, in the southern hemisphere, only reached its peak infection rate during its winter, which was summer in the northern hemisphere. Because of the delayed onset of the disease there, it will not be studied further herein.

6. Results

For each country, we used 7-day centered moving averages to smooth the daily number of new cases and the daily number of new deaths; on day 3, we used a 5-day centered moving average; on day 2, we used a 3-day centered moving average; and on day 1, we used the observation itself. There are 120 days of smoothed data for cases, and 105 days of smoothed data for deaths (this required following the raw data for 123 days and for 108 days respectively).
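The smoothing rule can be sketched as follows. The function name `smooth_daily` is hypothetical; the window grows from 1 to 3 to 5 to 7 days over the first four days, and the final three raw days yield no smoothed value because their centered 7-day window is incomplete.

```python
def smooth_daily(raw, half_width=3):
    """Centered moving average with a shrinking window at the start of the series."""
    smoothed = []
    for i in range(len(raw) - half_width):
        h = min(i, half_width)           # 0, 1, 2 on days 1-3, then 3
        window = raw[i - h:i + h + 1]    # 1, 3, 5, then 7 observations
        smoothed.append(sum(window) / len(window))
    return smoothed


# 123 raw days give 120 smoothed days; 108 raw days give 105.
cases_length = len(smooth_daily(list(range(123))))
deaths_length = len(smooth_daily(list(range(108))))
```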

Next, we used Excel Solver to fit the best (least RMSE) generalized beta distribution to the smoothed data for each country, for the cases series and for the deaths series, three times: once for $c=0$ (GB1), once for $c={1}/{2}$ , and once for $c=1$ (GB2). Thus for each country, there are three models for cases and three models for deaths. For the ease of the reader, we identify the best model with an asterisk.

Table 1
Results for the United States of America

USA
	Cases			Deaths
	$c=0$	$c={1}/{2}$	$c=1^{*}$	$c=0$	$c={1}/{2}$	$c=1^{*}$
$n$	3187802	3187802	3187802	131775	131775	131775
$N$	115505409	148665275	330859357	146478	147515	184991
$a$	0.0898	0.0891	0.1443	0.2015	0.2582	2.3055
$b$	13843401.21	10593.59	12.17	20324.02	1038.96	31.22
$p$	33.01	41.81	36.07	30.84	24.75	1.01
$q$	38.14	24.97	12.59	74.13	42.76	0.42
RMSE	7944.97	7930.36	7870.84	69.18	68.30	52.14
$R^{2}$	0.5179	0.5197	0.5248	0.9866	0.9870	0.9924
adjusted $R^{2}$	0.5012	0.5030	0.5083	0.9861	0.9864	0.9921
$F$	30.89	31.11	31.75	1843.06	1891.49	3263.05
$p$ value	1.85E-17	1.50E-17	8.21E-18	1.07E-92	2.98E-93	5.68E-105
${N}/{P}$	34.9107%	44.9331%	100.0000%	0.0443%	0.0446%	0.0559%
$\textit{RMSE}/{N}$	0.0069%	0.0053%	0.0024%	0.0472%	0.0463%	0.0282%

Table 2

Results for the United Kingdom

UK
	Cases			Deaths
	$c=0^{*}$	$c={1}/{2}$	$c=1$	$c=0$	$c={1}/{2}$	$c=1^{*}$
$n$	313030	313030	313030	43614	43614	43614
$N$	326858	325890	331653	44619	44759	53050
$a$	0.1920	0.2340	0.3271	0.1716	0.1653	3.1902
$b$	231114692700398	13839.87	14.74	127573.85	1565.72	26.90
$p$	109.12	73.12	91.50	69.91	92.24	1.01
$q$	28819.04	230.11	59.59	210.13	123.64	0.39
RMSE	267.86	267.89	272.27	34.33	32.92	22.68
$R^{2}$	0.9769	0.9769	0.9762	0.9851	0.9863	0.9935
adjusted $R^{2}$	0.9761	0.9761	0.9754	0.9845	0.9857	0.9932
$F$	1218.35	1218.14	1178.30	1649.29	1795.46	3809.89
$p$ value	4.12E-93	4.16E-93	2.69E-92	2.55E-90	3.89E-92	2.59E-108
${N}/{P}$	0.4814%	0.4800%	0.4885%	0.0657%	0.0659%	0.0781%
$\textit{RMSE}/{N}$	0.0820%	0.0822%	0.0821%	0.0769%	0.0735%	0.0428%

For each model, we report the cumulative number of cases or deaths observed to date $(n)$ , the parameters of the model $(N,a,b,p,q)$ , and the measures of goodness of fit $(\textit{RMSE},R^{2},\text{adjusted }R^{2},F,p\text{-value})$ , where the $F$ -statistic is computed with 4 numerator degrees of freedom and either 115 or 100 denominator degrees of freedom, depending on whether we are modeling cases or deaths. We also report two other important fractions: $({N}/{P},\textit{RMSE}/{N})$ . The first of these fractions gives the ultimate prevalence of cases or deaths (respectively) in the population; the second fraction expresses the RMSE as a proportion of the total number of cases or deaths (respectively).
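The adjusted $R^{2}$ and $F$ entries can be recovered from $R^{2}$ and the degrees of freedom. The sketch below assumes the usual formulas with 5 fitted parameters $(N,a,b,p,q)$ and 120 smoothed daily observations for cases; the function names are hypothetical, and agreement with the tables is limited by rounding of the reported $R^{2}$ .

```python
def adjusted_r2(r2, n_obs, n_params):
    """Adjusted R^2 with n_obs observations and n_params fitted parameters."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_params)


def f_statistic(r2, n_obs, n_params):
    """F with (n_params - 1) numerator and (n_obs - n_params) denominator d.f."""
    return (r2 / (n_params - 1)) / ((1.0 - r2) / (n_obs - n_params))


# USA cases, c = 1: reported R^2 = 0.5248, adjusted R^2 = 0.5083, F = 31.75.
usa_cases_adj = adjusted_r2(0.5248, n_obs=120, n_params=5)
usa_cases_f = f_statistic(0.5248, n_obs=120, n_params=5)
```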

We note that the exceptionally small levels computed for the $p$ -values most likely reflect the choice to fit the model to smoothed daily data. In all cases, the computed $p$ -values were significant even by the strict standards of (Switkay), which are designed to reject false positives. Another legitimate approach would be to fit the models to weekly totals, in which case there would be 17 observations for weekly new cases, and 15 observations for weekly new deaths.

We further note that optimal models only occurred for $c=0$ and $c=1$ , never for $c={1}/{2}$ ; only in one case (UK cases) was $c=0$ optimal, and then, only barely. For seven of the eight models, $c=1$ , the GB2 model, was the best choice.

Results for the United States of America are given in Table 1; for the United Kingdom in Table 2; for Spain in Table 3; for Italy in Table 4.

Table 3

Results for Spain

Spain
	Cases			Deaths
	$c=0$	$c={1}/{2}$	$c=1^{*}$	$c=0$	$c={1}/{2}$	$c=1^{*}$
$n$	297931	297931	297931	28273	28273	28273
$N$	297931	297931	297931	28389	28449	28745
$a$	0.1876	0.1574	0.9354	0.1803	0.1729	0.6418
$b$	188069151990438	8054.80	9.25	79151.28	1104.22	26.15
$p$	101.06	137.52	18.35	72.24	95.11	15.05
$q$	24223.39	251.66	5.18	229.23	131.49	14.23
RMSE	474.31	473.93	448.01	30.02	29.64	29.40
$R^{2}$	0.9583	0.9583	0.9628	0.9877	0.9880	0.9882
adjusted $R^{2}$	0.9568	0.9569	0.9615	0.9872	0.9875	0.9877
$F$	660.41	661.51	743.69	2010.42	2061.90	2096.08
$p$ value	2.61E-78	2.38E-78	3.72E-81	1.47E-94	4.21E-95	1.87E-95
${N}/{P}$	0.6372%	0.6372%	0.6372%	0.0607%	0.0608%	0.0615%
$\textit{RMSE}/{N}$	0.1592%	0.1591%	0.1504%	0.1057%	0.1042%	0.1023%

Table 4

Results for Italy

Italy
	Cases			Deaths
	$c=0$	$c={1}/{2}$	$c=1^{*}$	$c=0$	$c={1}/{2}$	$c=1^{*}$
$n$	239176	239176	239176	34503	34503	34503
$N$	239176	239176	242033	35020	34947	35191
$a$	0.1877	0.1663	0.6284	0.1493	0.3963	0.5902
$b$	62360679813147	9994.96	12.27	13063.35	1473.66	248.27
$p$	127.13	156.34	35.19	70.88	17.31	10.36
$q$	24192.79	308.60	16.12	99.96	65.93	31.70
RMSE	207.42	207.60	198.09	29.87	29.86	29.58
$R^{2}$	0.9861	0.9861	0.9873	0.9846	0.9846	0.9849
adjusted $R^{2}$	0.9856	0.9856	0.9869	0.9840	0.9840	0.9843
$F$	2038.34	2034.81	2237.72	1596.77	1597.89	1629.75
$p$ value	9.99E-106	1.10E-105	5.02E-108	1.25E-89	1.21E-89	4.59E-90
${N}/{P}$	0.3956%	0.3956%	0.4003%	0.0579%	0.0578%	0.0582%
$\textit{RMSE}/{N}$	0.0867%	0.0868%	0.0818%	0.0853%	0.0855%	0.0840%

We provide graphs of the best models for the United States, together with 95% prediction limits for new observations. Figure 3 depicts cases in the United States; Fig. 4 depicts deaths in the United States. The prediction of cases in the United States is the only example where the best-fit model has $R^{2}<0.95$ . It should be noted that most generalized beta distributions are unimodal, so the poor fit is to be expected in this case. In contrast, the models of cases and deaths for the UK, Spain, and Italy during the period under study all have the good fit found in the USA deaths model.

Figure 3.

USA cases, GB2 model, with 95% prediction interval.

Figure 4.

USA deaths, GB2 model, with 95% prediction interval.

We provide Fig. 5 for comparison. This demonstrates what happens to the model once a second wave begins. In this graphic, daily deaths in the United States are extended to September 30, 2020, and centered moving averages to September 27, 2020. $R^{2}$ falls to 0.6055, and RMSE increases to 306.14. For the record, $N$ , the total number of deaths for the United States, is predicted to be 346020 based on data through the end of September. However, this is based on a unimodal model that does not account for the multi-modal reality of the observations. The data are more likely to be better fit by a sum of incidence distributions, as a sort of mixture model.

Figure 5.

USA deaths, GB2 model, with 95% prediction interval, to September 27, 2020.

7. Probabilistic models compared with compartmental models

The use of a probabilistic model in forecasting the path of an epidemic requires discussion. At first glance, the compartmental model seems more natural. It reflects the dynamic nature of the reality of the epidemic. Individuals progress from susceptible to infected to recovered (or dead) with certain probabilities, as if in a Markov process. Unlike a Markov process, however, the probabilities associated with changing states themselves change over the course of time. This only adds to the difficulty of estimating the parameters of the model.

The use of a probabilistic model is recommended primarily by the appearance of the graph of the number of new daily cases or deaths. That resemblance is problematic, however. In the epidemic, the horizontal axis of the graph represents time. For a probability distribution or incidence distribution, the horizontal axis of the graph represents a random variable. To justify the use of a probabilistic model entails a fatalistic view of the epidemic. In this view, there is a total number of people who were destined to become cases or deaths; the random variable identifies the time at which their destinies as cases or deaths materialize.

This paradigm shift is an apparently strange perspective, inviting further commentary. One potential explanation is that the graph represents a sort of struggle for the population to reach herd immunity at or about $P(1-{1}/{R_{0}})$ , short of which the population will continue to suffer new cases and new deaths.

The ultimate test from the scientific point of view is how well each type of model explains the observed data. Another concern is the stability of these models as new data become available. In other words, we want to know how well models constructed from the first $w$ weeks of data forecast observations from week $w+1$ , as well as how much the parameters of the models change after an additional week of observations.

Table 5
Predicted and observed deaths per million population

	GB2 predicted ultimate deaths per million: July 9, 2020	Observed deaths per million: October 26, 2020	Percent error
USA	559	697	$+$ 19.8%
UK	781	662	$-$ 18.0%
Spain	615	749	$+$ 17.9%
Italy	582	620	$+$ 6.1%

Table 6

Relative error of prediction, total deaths: one-month forecast, GB2 model

	Percent error: July to August	Percent error: August to September	Percent error: September to October
USA	$+$ 12.0%	$+$ 7.35%	$+$ 6.84%
UK	$-$ 6.85%	$+$ 5.27%	$+$ 4.33%
Spain	$-$ 1.14%	$+$ 7.39%	$+$ 18.22%
Italy	$+$ 1.29%	$+$ 2.23%	$+$ 9.08%

8. Summary and recommendations

It has been our goal in this paper to present an alternative approach to modeling pandemic outcomes like new cases and new deaths, by fitting generalized beta distributions to smoothed or aggregated data. Most of the time, generalized beta distributions of the second kind had the best fit to the data, with $R^{2}>0.95$ in most cases we studied. This strong fit is explained, in part, by the fact that the model-fitting process took place after local peaks of new cases and new deaths, respectively. Those peaks occurred, in general, within about 40 days after new cases or new deaths reached one case in two million population. Consequently, any forecast made prior to the peak is likely to be suspect, unless the behavior of the pathogen is known from previous seasons.

We note that the approximately 40-day duration between a threshold in cases or deaths and a local maximum in that variable did not hold throughout the world. Much lengthier delays between threshold and peak took place in the southern hemisphere, due to its seasonal opposition in comparison to the northern hemisphere. However, there was a lengthy delay in the northern hemisphere as well, for example in India, where the gap for new cases was more than five months from threshold to peak, and the reason for this discrepancy is worth exploring. In particular, it would be interesting to know if this discrepancy is related to the widespread use of hydroxychloroquine in India.

It is clear in October, 2020 that the asymptotically declining behavior of the functions graphed above does not accurately anticipate successive waves of new positive tests. It remains to be seen whether those are all symptomatic cases, or whether an aggressive testing regime, running 40 cycles in the PCR test, has uncovered inactive traces of virus from earlier infections. In either case, however, these multi-modal time series are better described as a sum of multiple generalized beta incidence distributions, rather than fitting one such function to the whole series. The modes of these components might very well turn out to be fit themselves by a generalized beta incidence distribution, if the waves increase in magnitude, and then decrease in magnitude.
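A sum of incidence distributions can be sketched as follows. For brevity, each wave uses a gamma density with shape 2 (a Pearson type III density) shifted to its own onset day, rather than a full generalized beta density; the function names, wave sizes, onsets, and rates are all hypothetical.

```python
import math


def erlang2_pdf(t, onset, rate):
    """Gamma density with shape 2, shifted to start at `onset` (illustrative stand-in)."""
    s = t - onset
    return rate * rate * s * math.exp(-rate * s) if s > 0 else 0.0


def summed_incidence(t, waves):
    """Sum of incidence distributions; each wave is a tuple (N, onset, rate)."""
    return sum(N * erlang2_pdf(t, onset, rate) for N, onset, rate in waves)


# Two hypothetical waves: 1000 cases from day 0 and 500 cases from day 100.
waves = [(1000.0, 0.0, 0.1), (500.0, 100.0, 0.1)]

# Riemann sum over 600 days: the total mass approximates the sum of the wave sizes.
step = 0.5
total_mass = sum(summed_incidence(k * step, waves) * step for k in range(1200))
```

The total measure of the sum is the sum of the individual values of $N$ , so each wave contributes its own ultimate count.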

Models of this sort are incapable of detecting seasonality. If there is reason to suspect that a disease might become a seasonal phenomenon, like influenza, other information would be required to produce a reliable forecast.

All the reservations above lead to the question: of what use are such models? These models are useful for smoothing of observations and for short-term forecasts, probably no farther out than one month in the future when forecasting new cases.

However, it is remarkable that forecasts of ultimate total deaths by the end of the pandemic, made on the basis of data through July 9, 2020, turn out to be in the vicinity of total deaths as reported by October 26, 2020. Thus a pure generalized beta incidence distribution may well capture an intrinsic feature of the total deaths caused by a pandemic. Table 5 compares the forecast number of deaths per million population from the tables above, employing the GB2 model ( $c=1$ ), to the observed number of deaths per million population. Even after a gap of nearly four months, in each of the four countries, the relative error of prediction was less than 20%.
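The percent errors in Table 5 can be reproduced from the predicted and observed values. The convention used below, error relative to the observed value, is inferred from the tabulated figures rather than stated in the text, and the function name `percent_error` is hypothetical.

```python
def percent_error(predicted, observed):
    """Signed error of the forecast as a percentage of the observed value."""
    return 100.0 * (observed - predicted) / observed


# (GB2 predicted, observed) deaths per million, from Table 5.
table5 = {"USA": (559, 697), "UK": (781, 662), "Spain": (615, 749), "Italy": (582, 620)}
errors = {country: round(percent_error(predicted, observed), 1)
          for country, (predicted, observed) in table5.items()}
```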

Table 6 displays even more clearly the usefulness of these models for short-term forecasts, giving the percent error in deaths based on the GB2 model from one month to the next. The median absolute value of the relative error of prediction in this table is less than 7%.
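The median quoted above can be checked directly from the twelve entries of Table 6. The variable names in this sketch are hypothetical.

```python
import statistics

# Percent errors from Table 6, one row per country, three monthly forecasts each.
table6 = {
    "USA": [12.0, 7.35, 6.84],
    "UK": [-6.85, 5.27, 4.33],
    "Spain": [-1.14, 7.39, 18.22],
    "Italy": [1.29, 2.23, 9.08],
}

abs_errors = [abs(e) for row in table6.values() for e in row]
median_abs_error = statistics.median(abs_errors)
```

With an even number of entries, `statistics.median` averages the two middle values, here 6.84 and 6.85.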

Acknowledgments

The author wishes to express thanks for the tremendous patience of his wife, Yan, as the author studied data set after data set, many of which she gathered. The author also thanks an anonymous reviewer who made suggestions that improved the clarity of the exposition.

References

Adam (2020). UK has enough intensive care units for coronavirus, expert predicts. New Scientist. 25 March 2020. https://www.newscientist.com/article/2238578-uk-has-enough-intensive-care-units-for-coronavirus-expert-predicts/. Accessed November 30, 2020.

Farr (1840). Causes of death in England and Wales. Second Annual Report of the Registrar General of Births, Deaths and Marriages in England. 2, 69-98.

Kermack, W. O., & McKendrick, A. G. (1927). A Contribution to the Mathematical Theory of Epidemics. Proceedings of the Royal Society A. 115(772), 700-721.

IHME model. https://covid19.healthdata.org/united-states-of-america. Accessed June 4, 2020.

Imperial College model. https://www.imperial.ac.uk/mrc-global-infectious-disease-analysis/covid-19/covid-19-reports/. Accessed June 4, 2020.

McDonald, J. B., & Xu, Y. X. (1995). A generalization of the beta distribution with applications. Journal of Econometrics, 66(1-2), 133-152.

Switkay, H. M. (2020). The Significance of Statistical Significance. Proceedings 2020 Joint Statistical Meetings, 614-641.

Wan, Dawsey, Parker, & Achenbach (2020). Experts and Trump’s advisers doubt White House’s 240,000 coronavirus deaths estimate. Washington Post. April 2, 2020.

Worldometers coronavirus data. https://www.worldometers.info/coronavirus/. Accessed continually from March through July, 2020, and October, 2020.