Using the Weibull distribution to model COVID-19 epidemic data

Abstract

COVID-19 is a severe acute respiratory syndrome caused by the new coronavirus. The COVID-19 outbreak, declared a Public Health Emergency of International Concern by the WHO, has killed more than 2 million people worldwide. Since there are no specific drugs available and vaccination campaigns are in their initial phase, or have not even begun in some countries, the main way to fight the outbreak worldwide is still based on non-pharmacological strategies, such as the use of protective equipment, social isolation and mass testing. Modeling of disease epidemics has gained pivotal importance in guiding health authorities when deciding on and applying those strategies. Here, we present the use of the Weibull distribution to model predictions of the COVID-19 outbreak based on daily new cases and deaths data, by non-linear regression using Metropolis-Markov Chain Monte Carlo simulations. It was possible to predict the evolution of daily new cases and deaths of COVID-19 in many countries, as well as the overall number of cases and deaths in the future. Model predictions of the COVID-19 pandemic may be important for evaluating the mitigation procedures of governments and health authorities, since they allow one to extract parameters that may help to guide those decisions and measures, slowing down the spread of the disease.

Keywords

COVID-19, Weibull distribution, modelling, model, death toll

1. Introduction

Since the World Health Organization declared the COVID-19 outbreak a Public Health Emergency of International Concern on January 30${}^{\text{th}}$, 2020, more than 93 million cases and 2 million deaths have been registered worldwide. Many efforts are being made to discover and develop therapeutic strategies against COVID-19. Despite global initiatives to search for treatments and vaccines, the main tools for slowing down the spread of the disease throughout communities are still social isolation, personal hygiene and mass testing.

Before the development of efficient vaccines, many non-pharmacological strategies were proposed to fight COVID-19. Most of them are based on slowing the spread of the virus by self-care measures, such as the use of personal protective equipment, mass testing and restriction of social contact through patient quarantine, social isolation and lockdown. Despite much discussion, social isolation and mass lockdown measures have been described as successful strategies for slowing the spread of the virus (Anderson et al., 2020; Lau et al., 2020; Mitjà et al., 2020; Saez et al., 2020; Sjödin et al., 2020; Wilder-Smith & Freedman, 2020). Mass testing has been shown to be one of the most effective strategies, since it allows the contact network of each infected person to be traced precisely and isolation and quarantine measures to be applied. The success of some countries, such as South Korea, in slowing down the spread of the virus has been attributed to mass testing and selective quarantine (Choi, 2020).

Comparison of the evolution of epidemic curves among countries may be of pivotal importance to predict the effect of the mitigation measures taken. Based on data analysis, it is possible to model the evolution of the disease and to predict the number of infected, recovered and deceased people over days and weeks. Such predictions may be extremely helpful for decision making by health authorities. Many papers have presented predictions of the epidemic evolution by different methods (Ciufolini & Paolozzi, 2020; Gupta et al., 2020; Kim et al., 2020; Li et al., 2020). In this work, we applied the Weibull distribution to a selected set of data, depicting the number of daily new cases and deaths, from countries that present distinct epidemic patterns. The Weibull distribution is one of the most commonly used parametric lifetime models (Lawless, 2003), mostly for its parsimony, its ability to satisfactorily model data commonly encountered in survival analysis and its availability in statistical software packages (Khan, 2018; Lawless, 2003). We believe that data on daily new cases and deaths of COVID-19, as well as of other epidemic outbreaks, may be modeled by the Weibull distribution, resulting in valuable information to support mitigation measures taken by governments and health authorities worldwide.

2. Materials and methods

Data on the daily number of confirmed new cases and deaths, for every country, were extracted from the Our World in Data project (Roser et al., 2020) as comma-separated values (CSV) files and processed with R (R Core Team, 2013) using RStudio 1.2.5042 (RStudio Team, 2020) for Linux. Data were subset to select country names, record dates and the number of daily new cases and deaths for each country, from the beginning of the pandemic (December 31${}^{\text{st}}$, 2019) up to January 7${}^{\text{th}}$, 2021. To allow proper statistics, only countries in which more than 3,000 deaths had been registered by the day of data collection were used.

Data on daily new cases and daily new deaths were fitted to a 4-parameter Weibull distribution using Markov Chain Monte Carlo (MCMC) simulation. A modified 4-parameter Weibull equation was used to fit the data to the prediction model, described as:

$\displaystyle f(t)=\begin{cases}0,&t\leqslant\gamma\\ \displaystyle\frac{\alpha\beta}{\eta}{\left(\frac{t-\gamma}{\eta}\right)}^{\beta-1}e^{-{\left(\frac{t-\gamma}{\eta}\right)}^{\beta}},&\text{otherwise}\end{cases}$ (1)

where $t$ is the time; $f(t)$ is the number of new cases or new deaths as a function of $t$; $\alpha$ is the area under the curve (the total number of cases or deaths); $\gamma$ is the location parameter; and $\beta$ and $\eta$ are the Weibull shape and scale parameters, respectively.
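As an illustration of Eq. (1), the curve can be evaluated directly. The snippet below is a minimal sketch in Python, used here only for illustration (the calculations in this paper were done in R); the function name `weibull4` and all parameter values are hypothetical and chosen only to show that the area under the curve approaches $\alpha$.

```python
import math


def weibull4(t, alpha, beta, eta, gamma):
    """Eq. (1): expected number of daily new cases or deaths at time t (days)."""
    if t <= gamma:
        return 0.0
    z = (t - gamma) / eta
    return alpha * (beta / eta) * z ** (beta - 1.0) * math.exp(-(z ** beta))


# Hypothetical parameter values, for illustration only.
days = range(0, 400)
curve = [weibull4(t, alpha=30000.0, beta=2.5, eta=60.0, gamma=50.0) for t in days]

# Summing the daily values approximates the area under the curve (alpha).
total = sum(curve)
```

With a one-day step, the sum of the daily values is a simple Riemann approximation of the integral, which is why `total` lands close to the chosen `alpha`.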

Some cases in which the calculations required a bimodal Weibull distribution are presented below. In such cases, a bimodal Weibull distribution, adapted from Eq. (1), was used:

$\displaystyle f(t)=\begin{cases}0,&t\leqslant\gamma\\ \displaystyle\alpha\left[\frac{\beta}{\eta}{\left(\frac{t-\gamma}{\eta}\right)}^{\beta-1}e^{-{\left(\frac{t-\gamma}{\eta}\right)}^{\beta}}+\frac{\beta^{\prime}}{\eta^{\prime}}{\left(\frac{t-\gamma^{\prime}}{\eta^{\prime}}\right)}^{\beta^{\prime}-1}e^{-{\left(\frac{t-\gamma^{\prime}}{\eta^{\prime}}\right)}^{\beta^{\prime}}}\right],&\text{otherwise}\end{cases}$ (2)

where $t$, $f(t)$, $\alpha$, $\beta$, $\gamma$ and $\eta$ have the same meanings as in Eq. (1); $\beta^{\prime}$, $\gamma^{\prime}$ and $\eta^{\prime}$ are the shape, location and scale parameters of the second mode of the Weibull distribution, respectively.
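Eq. (2) can be sketched in the same way. The Python snippet below is illustrative only and makes one assumption that is not stated in the text: the second bracketed term is taken as zero for $t\leqslant\gamma^{\prime}$, since a negative base raised to a non-integer power is undefined in real arithmetic. All names and values are hypothetical.

```python
import math


def weibull_term(t, beta, eta, gamma):
    """One bracketed term of Eq. (2); assumed zero at or before its own location."""
    if t <= gamma:
        return 0.0
    z = (t - gamma) / eta
    return (beta / eta) * z ** (beta - 1.0) * math.exp(-(z ** beta))


def weibull4_bimodal(t, alpha, beta, eta, gamma, beta2, eta2, gamma2):
    """Eq. (2): sum of two Weibull terms sharing the same factor alpha."""
    if t <= gamma:
        return 0.0
    first = weibull_term(t, beta, eta, gamma)
    second = weibull_term(t, beta2, eta2, gamma2)
    return alpha * (first + second)


# Hypothetical parameters: a first wave starting at day 30, a second at day 200.
params = dict(alpha=1000.0, beta=2.5, eta=40.0, gamma=30.0,
              beta2=3.0, eta2=50.0, gamma2=200.0)
curve = [weibull4_bimodal(t, **params) for t in range(0, 700)]
area = sum(curve)
```

Under this reading, each bracketed term integrates to one, so the summed curve has an area close to $2\alpha$ rather than $\alpha$. How the weibull4 package normalizes the two modes is not stated in the text, so this sketch should not be taken as its implementation.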

2.1 Markov Chain Monte Carlo Simulations

Markov Chain Monte Carlo (MCMC) simulations were performed using the random walk Metropolis algorithm (Metropolis et al., 1953) within a five-dimensional space to accommodate the $\beta$, $\eta$, $\gamma$ and $\alpha$ parameters, as well as the standard deviation (SD). When bimodal distributions were used, the calculations were performed in an eight-dimensional space ($\alpha$, $\beta$, $\eta$, $\gamma$, $\beta^{\prime}$, $\eta^{\prime}$, $\gamma^{\prime}$ and SD). Prior distributions used in MCMC were normal for $\eta$ and uniform for $\beta$, $\gamma$, $\alpha$ and SD. The log-likelihoods were determined over 10,000 iterations, with a 5,000-iteration burn-in period. Parameters were sampled from a normal proposal distribution centered on the value of the parameter in the previous iteration. The standard deviations of the proposal distributions were set to 1.5% of the given parameters’ values, since this value was found to give an acceptance ratio around 0.23 among MCMC iterations. An acceptance ratio of 0.23 has been previously demonstrated to maximize the efficiency of the Metropolis-MCMC algorithm (Roberts et al., 1997).
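The procedure described above can be sketched as a random walk Metropolis sampler. The Python code below is a minimal illustration, not the weibull4 implementation. Several details are assumptions not given in the text: Gaussian errors with standard deviation SD, flat priors restricted to positive values, a normal prior on $\eta$ centered on its starting value, and proposal step sizes fixed at 1.5% of the starting values (which keeps the proposal symmetric). The data are synthetic.

```python
import math
import random


def weibull4(t, alpha, beta, eta, gamma):
    """Eq. (1): expected daily count at day t."""
    if t <= gamma:
        return 0.0
    z = (t - gamma) / eta
    return alpha * (beta / eta) * z ** (beta - 1.0) * math.exp(-(z ** beta))


def log_posterior(theta, days, counts, eta_mean, eta_sd):
    alpha, beta, eta, gamma, sd = theta
    # Flat priors restricted to a sensible region (assumed bounds).
    if alpha <= 0 or beta <= 0 or eta <= 0 or gamma < 0 or sd <= 0:
        return -math.inf
    log_prior = -0.5 * ((eta - eta_mean) / eta_sd) ** 2  # normal prior on eta
    sse = sum((y - weibull4(t, alpha, beta, eta, gamma)) ** 2
              for t, y in zip(days, counts))
    log_lik = -len(days) * math.log(sd) - 0.5 * sse / sd ** 2  # Gaussian errors
    return log_prior + log_lik


def metropolis(theta0, days, counts, n_iter=10000, burn_in=5000,
               step_frac=0.015, seed=1):
    rng = random.Random(seed)
    steps = [step_frac * abs(v) for v in theta0]   # fixed, symmetric proposal
    eta_mean, eta_sd = theta0[2], 0.5 * theta0[2]  # hypothetical prior settings
    current = list(theta0)
    current_lp = log_posterior(current, days, counts, eta_mean, eta_sd)
    chain, accepted = [], 0
    for _ in range(n_iter):
        proposal = [rng.gauss(v, s) for v, s in zip(current, steps)]
        proposal_lp = log_posterior(proposal, days, counts, eta_mean, eta_sd)
        # Accept with probability min(1, posterior ratio), on the log scale.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        chain.append(list(current))
    return chain[burn_in:], accepted / n_iter


# Synthetic data: a known curve plus Gaussian noise (values may dip below zero).
noise = random.Random(0)
true_params = (30000.0, 2.5, 40.0, 20.0)  # alpha, beta, eta, gamma
days = list(range(150))
counts = [weibull4(t, *true_params) + noise.gauss(0.0, 80.0) for t in days]

start = [28000.0, 2.5, 45.0, 18.0, 100.0]  # alpha0, beta0, eta0, gamma0, SD0
posterior, rate = metropolis(start, days, counts)
alpha_mean = sum(sample[0] for sample in posterior) / len(posterior)
```

The acceptance rate returned by this sketch depends on the noise level of the synthetic data, so it should not be expected to reproduce the value of 0.23 reported for the real epidemic curves.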

2.2 Starting parameters for Metropolis-MCMC

Selection of the starting parameters for the Metropolis-MCMC procedures is key to the efficiency of the simulation. These initial values should not be too far from a typical set of parameters (where the posterior density is high), because the Metropolis-MCMC algorithm would need too many iterations to reach convergence if the initial values lie far in the tail of the posterior distribution (Korner-Nievergelt et al., 2015). Since the 4-parameter Weibull distribution was specifically used to model COVID-19 epidemic curves, we used simple rules, based on the analysis of the graphical role of the Weibull parameters (Eq. (1)), as described below:

i.
Location parameter ( $\gamma$ ): shifts the beginning of the distribution to higher $t$ values. In our data, it represents the time lapse before the first cases/deaths arise, or before the onset of the exponential growth of daily new cases and deaths. The data sets used in this work register COVID-19 daily new cases and deaths since December 31${}^{\text{st}}$, 2019. However, most countries registered their first cases and deaths only at later dates. Thus, there is a run of zeroes (or near zeroes) registered before the first case/death appears. The starting $\gamma$ value for the Metropolis-MCMC procedures ( $\gamma_{0}$ ) was empirically calculated as the first day on which more than 5% of the actual maximum number of daily new cases or deaths was registered, that is, the length of the vector ${t}_{i}$ from $t=0$ to the time value at which 5% of the maximum number of cases/deaths was registered, given by:

$\displaystyle\gamma_{0}=\|\overrightarrow{t_{i}}\|^{f(t_{i})=0.05\max(f(t))}_{0}$ (3)

where $\gamma_{0}$ is the location parameter at iteration 0 and $f(t_{i})$ is the number of daily new cases or deaths as a function of $t$ .
ii.
Scale parameter ( $\eta$ ): the starting value for the Weibull scale parameter ( $\eta_{0}$ ) was set as the mean of the vector $t$ from its ${i}^{\text{th}}$ element, equal to $\gamma_{0}$, to ${t}_{n}$, described as:

$\displaystyle{\eta}_{0}=\frac{1}{(n-{\gamma}_{0})}\int^{i=n}_{t_{i}={\gamma}_{0}}{\overrightarrow{t_{i}}}$ (4)
iii.
The starting value for the area parameter ( $\alpha_{0}$ ) was set to the sum of the number of cases or deaths, and that for the shape parameter ( $\beta_{0}$ ) was set to 2.5 for all countries, since the typical Weibull shape parameter that fits most of the COVID-19 epidemic data ranges from 1 to 5 (data not shown).
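The three rules above can be sketched as follows. This Python snippet is illustrative only; the function name is hypothetical, and $\eta_{0}$ is computed as the plain mean of the day values from $\gamma_{0}$ to the last day, following the verbal description rather than the exact normalization of Eq. (4).

```python
def starting_values(days, counts):
    """Return (alpha0, beta0, eta0, gamma0) from a daily series."""
    peak = max(counts)
    # Rule i: first day whose count exceeds 5% of the observed maximum.
    gamma0 = next(t for t, y in zip(days, counts) if y > 0.05 * peak)
    # Rule ii: mean of the day values from gamma0 to the last day.
    tail = [t for t in days if t >= gamma0]
    eta0 = sum(tail) / len(tail)
    # Rule iii: total count for the area, fixed 2.5 for the shape.
    alpha0 = float(sum(counts))
    beta0 = 2.5
    return alpha0, beta0, eta0, gamma0


# Small made-up series, for illustration only.
days = list(range(10))
counts = [0, 0, 1, 5, 40, 100, 60, 20, 5, 1]
alpha0, beta0, eta0, gamma0 = starting_values(days, counts)
```

In this toy series the maximum is 100, so the 5% threshold is 5; day 3 (count 5) does not exceed it, and day 4 is taken as $\gamma_{0}$.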

2.3 Weibull4 R package

Fitting procedures described in this paper were implemented in an R language package named “weibull4”, designed to fit epidemiological data, in particular for COVID-19, using the 4-parameter Weibull equation (Eq. (1)) by the Metropolis-MCMC algorithm. The weibull4 package is available for download and installation from the Comprehensive R Archive Network (CRAN) repository (R Core Team, 2013).

2.4 Supplementary material

Figure S1, described in the text, is available as Supplementary Material. The R script called “Moreau_weibull_ 2021”, with the code for every calculation and plot in this paper, is available on the Code Ocean server (codeocean.com).

3. Results

Data analysis can be of outstanding importance during infectious disease outbreaks, especially when fast decision making is crucial to slow down the spread of the disease. Modeling the course of the COVID-19 pandemic in highly affected countries is a life-saving demand (Eberhardt et al., 2020; Verma et al., 2020), since it can support and guide decision makers to act quickly and block the spread of the disease. Data on daily new cases and daily new deaths were extracted from the Our World in Data project on Coronavirus (Roser et al., 2020). Data from countries that faced the COVID-19 pandemic earlier, such as Italy, France and Spain, initially formed a well-defined single peak. Such a pattern allowed us to evaluate statistical models to properly fit the data. The Weibull distribution was chosen for this goal because of its potential in modeling lifetime events (Lawless, 2003). Such analysis allows us to forecast epidemiological outcomes, such as the death toll and the future number of daily new cases and deaths in the studied countries.

Figure 1.

Panel A: Profile of the first wave of daily new cases (open circles) and deaths (closed circles) of COVID-19 in Italy. Data were fitted with a unimodal 4-parameter Weibull distribution (lines). Arrows show the $\gamma$ parameter, corresponding to the beginning of the exponential growth of new cases or deaths, and the mode of the distribution, corresponding to the average day on which the largest number of cases or deaths is observed. Calculated Weibull shape, scale, location and area parameters are shown in Panels B, C (upper lines), C (lower lines) and D, respectively, as functions of the Metropolis-MCMC iterations. Lines converging to the same average values in each panel correspond to the same parameter calculation, with distinct starting values.

Figure 1 shows plots of the initial single peaks of daily new cases and deaths of COVID-19 registered in Italy, as well as the curve fits calculated for them. As seen, the Weibull distribution can properly fit the evolution of the epidemic peak and can be used to model and forecast the number of daily new cases and deaths. Panel A of Fig. 1 also illustrates the positions of the location parameter ( $\gamma$ ) and the mode of the distribution ( $\text{Mof}(t)$ ), which may also be called ${t}_{\max}$: the time (in days) at which the maximum number of cases or deaths was (or will be) registered, that is, the turning point of the curve, given by:

$\displaystyle\text{Mof}(t)=\begin{cases}\gamma,&\beta\leqslant 1\\ \displaystyle\gamma+\eta{\left(\frac{\beta-1}{\beta}\right)}^{{1}/{\beta}},&\beta>1\end{cases}$ (5)

where $\text{Mof}(t)$ is the mode of the distribution of daily new cases or deaths ( $f(t)$ ). Additionally, Fig. 1 shows the posterior values of the Weibull parameters along the Metropolis-MCMC iterations (Panels B, C and D). Convergence is reached before the end of the burn-in period, and the posterior distributions converge to the same average values even when distinct starting values are chosen. This was observed for every parameter (converging lines in Panels B, C and D), suggesting that the Weibull-directed Metropolis-MCMC performed here is a suitable procedure to fit the daily new cases and deaths data of COVID-19.
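Eq. (5) can be checked numerically against the curve of Eq. (1). The Python sketch below is illustrative only; function names and parameter values are hypothetical. It compares the closed-form mode with the maximum found on a fine time grid.

```python
import math


def weibull4(t, alpha, beta, eta, gamma):
    """Eq. (1): expected daily count at time t."""
    if t <= gamma:
        return 0.0
    z = (t - gamma) / eta
    return alpha * (beta / eta) * z ** (beta - 1.0) * math.exp(-(z ** beta))


def weibull_mode(beta, eta, gamma):
    """Eq. (5): time at which the fitted curve peaks."""
    if beta <= 1.0:
        return gamma
    return gamma + eta * ((beta - 1.0) / beta) ** (1.0 / beta)


# Hypothetical parameters, for illustration only.
beta, eta, gamma = 2.5, 40.0, 20.0
mode = weibull_mode(beta, eta, gamma)

# Brute-force check: locate the maximum of Eq. (1) on a 0.1-day grid.
grid = [i / 10.0 for i in range(1500)]
grid_peak = max(grid, key=lambda t: weibull4(t, 1.0, beta, eta, gamma))
```

The mode does not depend on $\alpha$, which only rescales the height of the curve, so any positive value can be used in the grid check.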

Figure 2.

Data on daily new cases and deaths of COVID-19 from nine selected countries used as examples of customized analysis. Upper panels show countries in which two unimodal Weibull distributions were used, with the split date on August 1${}^{\text{st}}$ (Belgium, Canada and Germany), i.e., Mo1 $=$ 1 and Mo2 $=$ 1. Middle panels display countries in which a single bimodal Weibull distribution was used for data fitting (Bolivia, Brazil and Russia), i.e., without splitting the data and with Mo1 $=$ 2. Lower panels show countries in which two Weibull distributions were used, one unimodal and one bimodal, with the split date on September 1${}^{\text{st}}$: Serbia and United States, with a bimodal distribution up to the split date and a unimodal distribution from the split date forward (Mo1 $=$ 2, Mo2 $=$ 1); and United Kingdom, with a unimodal distribution up to the split date and a bimodal distribution from the split date forward (Mo1 $=$ 1, Mo2 $=$ 2).

Recently, most countries have entered a second wave of COVID-19 infections. Due to this second wave, such countries began to present a multimodal pattern of daily new cases and deaths. Additionally, some countries have shown more complex patterns, with more than two mixed waves. Such puzzling patterns make it harder, or even impracticable, to perform proper statistical analysis and predictions. Even so, such complex data on COVID-19 daily new cases and deaths can be analyzed if multimodal distributions are used. This is possible by splitting the data by date and performing the analysis with two Weibull distributions, one before and one after the split date. Optional arguments were included in the weibull4 R package (see Materials and Methods) to allow users to choose the date at which to split the data in two parts, as well as to set the unimodal or bimodal Weibull distribution to be used before and after the split date. The functionality of these arguments is described in the package documentation files (not shown).

Examples of the data analysis performed with the Weibull distribution on the countries’ COVID-19 daily new cases and deaths data are shown in Fig. 2, in three distinct ways: i. Upper panels display countries in which data were analyzed with two separate unimodal distributions (Belgium, Canada and Germany), with the data split on August 1${}^{\text{st}}$; ii. Middle panels display the analysis using a single bimodal distribution for the whole data set (Bolivia, Brazil and Russia), without a split date; iii. Lower panels show the data analysis using one unimodal distribution and one bimodal distribution, with the data split on September 1${}^{\text{st}}$ (Serbia, United Kingdom and United States). Non-linear regressions performed with single or double, unimodal or bimodal Weibull distributions appear to fit the COVID-19 daily new cases and deaths data properly. The split date, as well as the number of modes of the Weibull distribution to be used, can be selected for each country’s data. Such settings may be chosen to reach a better fit quality for the daily new cases and deaths data. The split dates used in the calculations for Fig. 2 were manually set to the deepest valley between two COVID-19 infection waves, in both daily new cases and deaths data.
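The data splitting described above can be sketched as follows. This Python snippet is illustrative only and does not reproduce the arguments of the weibull4 package; the function name is hypothetical, and the series is a placeholder. It keeps the split date in the first part, so that the second part starts on the following day.

```python
from datetime import date, timedelta


def split_by_date(dates, counts, split_date):
    """Return two lists of (date, count); the split date stays in the first."""
    first = [(d, y) for d, y in zip(dates, counts) if d <= split_date]
    second = [(d, y) for d, y in zip(dates, counts) if d > split_date]
    return first, second


# Daily dates covering the period used in this work.
start = date(2019, 12, 31)
dates = [start + timedelta(days=i) for i in range(374)]  # through 2021-01-07
counts = [0] * len(dates)  # placeholder series, for illustration only

first, second = split_by_date(dates, counts, date(2020, 9, 1))
```

Each part can then be fitted separately with either Eq. (1) or Eq. (2), according to the number of modes chosen before (Mo1) and after (Mo2) the split date.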

The suitability of the model can be further evaluated by the residuals and by the Determination Coefficient ( ${R}^{2}$ ) of the regressions. Figure 3 shows the distribution of the fit residuals of the daily new cases (Panel A) and deaths (Panel B) of COVID-19 in all studied countries. Lines in Panels A and B represent normal distributions with the mean and standard deviation (SD) of each respective panel’s data. For both cases and deaths, the residuals are narrowly distributed around zero when compared to a normal distribution with the same mean and SD (lines). Panels C and D display the correlation plots between the actual numbers of daily new cases and deaths and the Weibull distribution estimates for all studied countries. As shown, the points are well distributed around the straight line with slope $=$ 1 and intercept $=$ 0, although there seems to be a large number of outliers (Fig. 3, Panels C and D). Additionally, Table 1 shows the ${R}^{2}$ value of each country’s data fit. As displayed, most of the fits (44 of the 53 countries) resulted in ${R}^{2}$ values greater than 0.6.
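The two diagnostics mentioned above, residuals and ${R}^{2}$, can be computed as in the Python sketch below. It is illustrative only; the text does not state which definition of ${R}^{2}$ was used, so the usual form $1-SS_{\text{res}}/SS_{\text{tot}}$ is assumed, and the numbers are made up.

```python
def residuals(observed, fitted):
    """Observed minus fitted values, one per day."""
    return [y - f for y, f in zip(observed, fitted)]


def r_squared(observed, fitted):
    """Assumed definition: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum(r ** 2 for r in residuals(observed, fitted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


# Made-up values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(observed, fitted)
```

Under this definition, a fit no better than the mean of the observed data gives ${R}^{2}=0$, and a perfect fit gives ${R}^{2}=1$.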

Figure 3.

Distributions of the residuals of the fitted data for daily new cases (Panel A) and deaths (Panel B) of COVID-19, calculated with customized analysis for each country. Split dates, as well as the number of modes in each Weibull distribution used, were as described in Table 1. Normal distributions, calculated with the same mean and SD as each respective residuals distribution, are shown as lines. Panels C and D show the plots of daily new cases and deaths for all countries versus the estimated fits in custom mode, as described in Table 1. Straight lines were drawn with slope $=$ 1 and intercept $=$ 0, merely to guide the eye.

Table 1

Estimated overall death tolls calculated from customized analysis, updated to January 7${}^{\text{th}}$, 2021, for every country. Overall death tolls represent the integrated area under the regression curves shown in Fig. S1 ( $\alpha$ parameter of Eqs (1) or (2)). Errors were calculated from the standard deviations of the last 5,000 iterations of the Metropolis-MCMC simulation (see Materials and Methods). Determination Coefficients ( ${R}^{2}$ ) for the data fitting, as well as the split date and the number of modes of the Weibull distribution used to fit the data before (Mo1) and after (Mo2) the split date, are also shown for each country (1 corresponds to the unimodal Weibull distribution, Eq. (1), and 2 to the bimodal one, Eq. (2)).

Country	Death toll		${R}^{2}$	Split date	Mo1	Mo2	Country	Death toll		${R}^{2}$	Split date	Mo1	Mo2
Argentina	68,512 $\pm$	1,505	0.8452	–	2	–	Italy	118,980 $\pm$	8,500	0.9409	Jul/01	1	2
Austria	8,952 $\pm$	630	0.8576	Jun/01	1	1	Japan	8,304 $\pm$	1,132	0.8586	Jun/01	1	2
Bangladesh	11,902 $\pm$	476	0.7970	–	2	–	Jordan	8,311 $\pm$	283	0.9510	–	1	–
Belgium	24,180 $\pm$	1,228	0.8536	Jul/01	1	2	Mexico	202,567 $\pm$	7,053	0.6970	–	2	–
Bolivia	15,752 $\pm$	298	0.8185	–	2	–	Moldova	5,584 $\pm$	785	0.7228	Jul/01	2	2
Bosnia and Herzegovina	4,785 $\pm$	296	0.7477	Jun/01	2	2	Morocco	13,735 $\pm$	956	0.9134	Jun/05	1	1
Brazil	358,586 $\pm$	15,536	0.6423	–	2	–	Netherlands	17,559 $\pm$	1,134	0.8089	Jul/01	1	2
Bulgaria	10,291 $\pm$	558	0.8125	Oct/01	1	1	Pakistan	20,151 $\pm$	2,266	0.6464	Sep/01	1	1
Canada	34,380 $\pm$	3,541	0.8252	Jul/01	1	1	Panama	13,142 $\pm$	1,444	0.8265	May/15	1	2
Chile	20,155 $\pm$	1,202	0.4935	Aug/01	1	2	Peru	51,300 $\pm$	3,463	0.5195	–	2	–
China	4,058 $\pm$	478	0.7175	May/01	1	2	Philippines	12,175 $\pm$	1,055	0.4125	May/20	2	1
Colombia	73,053 $\pm$	2,281	0.8549	–	2	–	Poland	34,976 $\pm$	5,243	0.7739	Jul/01	2	2
Croatia	7,812 $\pm$	357	0.9710	Aug/05	2	2	Portugal	10,622 $\pm$	593	0.9513	Aug/10	2	2
Czechia	24,690 $\pm$	1,693	0.9267	Jul/15	2	2	Romania	18,194 $\pm$	1,023	0.9192	Jun/01	1	2
Ecuador	20,980 $\pm$	764	0.0168	–	2	–	Russia	83,294 $\pm$	2,799	0.9478	–	2	–
Egypt	23,975 $\pm$	2,644	0.8569	Oct/05	2	1	Saudi Arabia	11,323 $\pm$	515	0.8250	–	1	–
France	86,646 $\pm$	5,763	0.6716	Jul/01	1	2	Serbia	5,205 $\pm$	372	0.9686	Sep/15	2	1
Germany	213,074 $\pm$	26,463	0.8010	Jul/01	1	1	South Africa	118,902 $\pm$	26,752	0.7654	Oct/01	1	1
Greece	6,906 $\pm$	207	0.9597	Jun/01	1	2	Spain	108,474 $\pm$	13,022	0.4896	Jun/15	1	2
Guatemala	6,178 $\pm$	180	0.5920	–	2	–	Sweden	15,763 $\pm$	1,193	0.4741	Sep/01	2	1
Honduras	5,547 $\pm$	208	0.3671	–	2	–	Switzerland	13,565 $\pm$	1,271	0.7349	Jul/01	1	1
Hungary	15,525 $\pm$	855	0.9617	Jul/01	1	1	Tunisia	6,234 $\pm$	626	0.5676	Jun/01	1	1
India	266,570 $\pm$	8,443	0.8933	–	1	–	Turkey	137,267 $\pm$	31,191	0.9289	Aug/01	2	1
Indonesia	53,125 $\pm$	2,335	0.8647	–	2	–	Ukraine	26,986 $\pm$	3,135	0.8644	Jun/01	1	2
Iran	58,670 $\pm$	1,270	0.9638	May/01	1	2	United Kingdom	123,536 $\pm$	4,698	0.8347	Jul/01	1	2
Iraq	30,346 $\pm$	473	0.9101	–	1	–	United States	1,209,893 $\pm$	193,330	0.7741	Sep/01	2	1
Israel	4,968 $\pm$	353	0.8405	Aug/15	2	2

3.1 Parameter extraction from Weibull Metropolis-MCMC

Modeling of natural processes is of prime importance for forecasting tendencies of similar phenomena in the future, as well as for extracting information from the model that allows one to better understand them. The estimated death toll is one of the parameters that can be extracted from the calculations used to model the COVID-19 data here. Modeling the COVID-19 curves of daily new cases and deaths allows us to predict the number of daily new cases and deaths in the future, as well as the overall COVID-19 death toll in a given country. Table 1 shows the total expected COVID-19 death tolls for all the studied countries. Data fitting was performed, for each country, using the split dates and the number of modes for the first (Mo1) and the second (Mo2) distributions displayed in Table 1 (a value of 1 represents a unimodal Weibull distribution and a value of 2 a bimodal distribution). The modeling of the COVID-19 data can be customized for each country’s data, by setting the split date and the number of modes of the distributions used, and can be reevaluated day by day as new data emerge. Such an analytical model may be a valuable tool to evaluate and guide the response of health authorities and governments to COVID-19 and to other epidemics in the future.
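The extraction of a death toll and its error from the simulation can be sketched as below. This Python snippet is illustrative only; it assumes that the reported value is the mean of the sampled $\alpha$ over the last 5,000 iterations and that the error is the corresponding sample standard deviation, as suggested by the caption of Table 1. The chain shown is made up.

```python
import statistics


def summarize_alpha(alpha_chain, keep_last=5000):
    """Mean and sample SD of alpha over the last `keep_last` iterations."""
    tail = alpha_chain[-keep_last:]
    return statistics.mean(tail), statistics.stdev(tail)


# Made-up chain: 5,000 discarded iterations, then values alternating 10 and 12.
alpha_chain = [0.0] * 5000 + [10.0, 12.0] * 2500
toll, error = summarize_alpha(alpha_chain)
```

Discarding the first half of the chain keeps the summary from being biased by the iterations spent moving away from the starting values.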

4. Discussion

COVID-19 is a global health emergency that is going to change the way in which people, institutions and governments manage and execute their lives and duties. The fact that specific drugs or vaccines for COVID-19 have only recently been developed raises the importance of behavioral strategies, such as social isolation, lock-down (Anderson et al., 2020; Lau et al., 2020; Mitjà et al., 2020; Saez et al., 2020; Sjödin et al., 2020; Wilder-Smith & Freedman, 2020) and mass testing (Choi et al., 2020; Peto, 2020; Salath et al., 2020), to keep fighting the pandemic. In this scenario, modeling and forecasting the course of the pandemic play an important role by providing information for evaluating the measures taken by governments and health authorities. Parameters extracted from modeling and forecasts may be used to determine better strategies to mitigate the impact of infectious diseases on the population (Verma et al., 2020). With this in mind, we proposed the use of the Weibull distribution to model data on daily new cases and deaths of the COVID-19 pandemic from selected countries. In our previous work, the Weibull distribution was used to model forecasts of COVID-19 data in Brazil (Moreau, 2020). To our knowledge, that was the first time such an approach was used to this end, and the present work is the first report of the use of the Weibull distribution to model COVID-19 data in a systematic worldwide analysis.

The Weibull distribution has been shown to fit a single peak of COVID-19 daily new cases and deaths well. Figure 1 displays the daily new cases and deaths data from the first wave of COVID-19 infections in Italy. Italy was chosen because it was one of the countries that displayed a well-defined single peak of new cases and deaths, probably because of the strict lockdowns and wide mass testing measures taken in response to the first wave of infections. This pattern allows us to use the Italian data to evaluate the application of the 4-parameter Weibull distribution to fit COVID-19 epidemic data. Similar results were obtained when data from the first peak of daily new cases and deaths were analyzed in other countries that displayed a clear initial single wave of infections, such as Belgium, Canada, China, France, Germany, Netherlands, Portugal, Spain, Switzerland and United Kingdom (data not shown).

Figure 2 shows examples of five distinct customized ways to model the daily new cases and deaths of COVID-19, depending on the pattern of each country’s epidemic curve. It was possible to model the data by performing non-linear curve fitting with a single unimodal Weibull distribution (Eq. (1)), with a single bimodal distribution (Eq. (2)) or with two Weibull distributions. Two distributions can be applied to the modeling calculation by splitting the data at a given date. In practice, the split date may be set to a day in the deepest valley between the end of one epidemic wave and the beginning of the next. Figure 2 presents examples of the customized analysis in which the fitting settings were chosen to reach better fit results. Data from Belgium, Canada and Germany (upper panels) were modeled with two unimodal Weibull distributions and a split point at August 1${}^{\text{st}}$; data from Bolivia, Brazil and Russia (middle panels) were analyzed with a single bimodal Weibull distribution (Eq. (2)); data from Serbia and United States (lower panels) were analyzed with two Weibull distributions, one bimodal up to September 1${}^{\text{st}}$ (split date) and one unimodal from September 2${}^{\text{nd}}$ onward; and, finally, data from United Kingdom (lower panels) were analyzed with one unimodal distribution up to September 1${}^{\text{st}}$ (split date) and one bimodal distribution from September 2${}^{\text{nd}}$ onward. The split date, as well as the number of modes of the Weibull distribution to be used before (Mo1) and after (Mo2) the split date, may be chosen to reach a better quality of data fitting. Table 1 shows the split date and the number of modes of the distributions used (Mo1 and Mo2), as well as the Determination Coefficient ( ${R}^{2}$ ) of each country’s data fit. As also seen, most fitting procedures shown here reached a good fit quality, with ${R}^{2}$ values over 0.6, supporting the 4-parameter Weibull distribution as a suitable model for COVID-19 data fitting.

Analysis of the results presented did not allow us to determine correlations between the goodness of fit and any key parameter related to the governments’ response measures to the COVID-19 pandemic, such as the number of deaths per million or the Oxford COVID-19 Government Response Tracker (Hale et al., 2020) (data not shown), though the goodness of fit might be associated with flawed data collection and processing. Well-defined peaks of daily new cases and deaths, with clear ascending and descending phases, may tend to conform better to the unimodal Weibull distribution, as shown in Fig. 1A. We speculate that the fuzzy patterns of daily new cases and deaths presented by some countries might be associated with misguided strategies taken by those countries to fight the COVID-19 pandemic, which would make the number of daily new transmissions vary strongly, due to undesired spreading of the virus throughout the community. Such mismanagement might lead to what is called “multiple waves” of the disease. For cases in which multiple waves are present, alternative ways to perform the Weibull analysis were presented here (Fig. 2 and Table 1).

Figure S1 (Supplementary Material) shows the customized analysis, performed with a single (no split date) or two Weibull distributions, for every country that registered more than 3,000 deaths up to January 7${}^{\text{th}}$, 2021. A multiple-wave pattern can be observed for most, if not all, of the countries (Fig. S1). The split dates, Mo1 and Mo2 used to fit the data in Fig. S1 were as described in Table 1. Although the data analysis presented here was able to deliver feasible models of the COVID-19 pandemic data, it should be interpreted with caution, due to the possible existence of corrupted data or to the lack of confidence in the epidemic data collected by some countries’ authorities. Moreover, although the overall death tolls displayed in Table 1, extracted from the area under the curve ( $\alpha$ in Eq. (2)), reflect good estimates of the real number of deaths at the end of the pandemic peaks, they may be biased by the oscillations present in the epidemic data from some countries and can reach much greater values if new waves of infection emerge.

From an overall point of view, the 4-parameter Weibull distribution proved to be a suitable modeling distribution for the COVID-19 pandemic when applied to daily new cases and deaths data. Figure 3 summarizes the residuals analysis of the non-linear regression of the daily new cases and deaths data from every country used in this work. Residuals from both daily new cases and deaths form narrow distributions around zero. Lines in Panels A and B represent normal distributions with the mean and SD of each respective panel’s data. It is worth noting that the residuals distributions are narrower than normal distributions with the same mean and SD values. This observation denotes the presence of highly dispersed outliers in the residuals. Panels C and D display those outliers more evidently. Despite the presence of the outliers, data from both new cases and deaths display sharp residuals distributions for the Weibull distribution fit, which reinforces the use of this method for modeling, forecasting and extracting parameters from daily new cases and deaths data of COVID-19.

The non-linear regressions used here were performed with a Metropolis-MCMC algorithm implemented in an R script and distributed as an R package called “weibull4”. This package may be useful not just for COVID-19, but for any epidemic data that display a similar spreading pattern. The weibull4 R package can be used for non-linear regression of daily new cases and deaths data using both unimodal and bimodal 4-parameter Weibull distributions (Eqs (1) and (2)), with the location parameter ( $\gamma$ ), which accommodates the time lapse before the first cases or deaths arise, and the area parameter ( $\alpha$ ), which represents the overall number of registered cases or deaths. The weibull4 package is available at the R CRAN repository (R Core Team, 2013) at https://cran.r-project.org/.
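The roles of the location and area parameters, and the idea of a Metropolis random-walk regression, can be sketched in Python as follows. This is an illustration under stated assumptions, not the weibull4 package or its API: the parameterization (shape `k`, scale `lam`, location `gamma`, area `alpha`) is a common form of the shifted, scaled Weibull density and is assumed to correspond to Eqs (1) and (2); the Gaussian error model with known `sigma`, the flat priors, the step sizes and the synthetic data are all hypothetical.

```python
import numpy as np

def weibull4(t, alpha, k, lam, gamma):
    """Weibull density shifted by gamma and scaled so that its area equals alpha."""
    t = np.asarray(t, dtype=float)
    z = np.clip((t - gamma) / lam, 0.0, None)
    with np.errstate(divide="ignore", invalid="ignore"):
        y = alpha * (k / lam) * z ** (k - 1.0) * np.exp(-z ** k)
    return np.where(t > gamma, y, 0.0)  # nothing is registered before gamma

def unimodal(t, p):
    """Single mode; p = (alpha, k, lam, gamma)."""
    return weibull4(t, *p)

def bimodal(t, p):
    """Sum of two modes; p holds two consecutive parameter sets (8 values)."""
    return weibull4(t, *p[:4]) + weibull4(t, *p[4:])

def sse(p, t, y, model):
    """Sum of squared residuals; infinite if alpha, k or lam is not positive."""
    p = np.asarray(p, dtype=float)
    if np.any(p.reshape(-1, 4)[:, :3] <= 0):
        return np.inf
    return float(np.sum((y - model(t, p)) ** 2))

def metropolis_fit(t, y, model, start, step, sigma, n_iter=5000, seed=0):
    """Random-walk Metropolis with a Gaussian likelihood and flat priors (assumed)."""
    rng = np.random.default_rng(seed)
    current = np.asarray(start, dtype=float)
    cur_sse = sse(current, t, y, model)
    best, best_sse = current.copy(), cur_sse
    for _ in range(n_iter):
        proposal = current + rng.normal(0.0, step)
        prop_sse = sse(proposal, t, y, model)
        # Accept with probability min(1, exp(-(prop_sse - cur_sse) / (2 sigma^2))).
        if np.log(rng.random()) < -(prop_sse - cur_sse) / (2.0 * sigma ** 2):
            current, cur_sse = proposal, prop_sse
            if cur_sse < best_sse:
                best, best_sse = current.copy(), cur_sse
    return best, best_sse

# Synthetic daily counts generated from known (hypothetical) parameters plus noise.
t = np.arange(200.0)
truth = np.array([10000.0, 2.5, 60.0, 20.0])
y = unimodal(t, truth) + np.random.default_rng(1).normal(0.0, 5.0, t.size)

start = np.array([8000.0, 2.0, 50.0, 10.0])
step = np.array([50.0, 0.02, 0.5, 0.5])
best, best_sse = metropolis_fit(t, y, unimodal, start, step, sigma=5.0)
print("best parameters (alpha, k, lam, gamma):", best)
print("SSE at start:", sse(start, t, y, unimodal), "SSE at best:", best_sse)
```

Because the density integrates to one before scaling, the fitted `alpha` can be read directly as the projected overall number of cases or deaths for that mode, and for the bimodal curve the projected total is the sum of the two area parameters.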

Predictions of the evolution of the COVID-19 epidemic based on daily new cases and deaths are especially useful because they can be revised day by day, giving governments and health authorities the opportunity to adjust their measures as new data arise. Additionally, the 4-parameter Weibull distribution, as well as the weibull4 R package, may be suitable for analyzing epidemic data from other diseases or, eventually, from future pandemics, since there seems to be a consensus in the scientific community that we are at imminent risk of them (Osterholm, 2005). We believe that such predictions would be useful for decision makers in defining strategies to fight epidemic and pandemic outbreaks, now and in the future.

Acknowledgments

The author would like to thank Dr. Gilson Carvalho for valuable discussions and Dr. Juliana Cortines for the critical reading of the manuscript.

Supplementary data

The supplementary files are available to download from https://dx-doi-org.web.bisu.edu.cn/10.3233/MAS-210510.

Supplement Fig. 1.

Plots of daily new cases (open circles) and deaths (closed circles) for COVID-19 in every country used in this work, calculated in customized mode. Split dates and the number of modes in the Weibull distribution used to fit the data before and after the split date are shown in Table 1. Data were fitted (lines) using the weibull4 R package (see Material and Methods). Charts are ordered from the highest to the lowest determination coefficient ( ${R}^{2}$ ) of the data fit.

References

Anderson, R. M., Heesterbeek, Klinkenberg, & Hollingsworth, T. D. (2020). How will country-based mitigation measures influence the course of the COVID-19 epidemic? Lancet, 395, 931-934.

Choi, J. Y. (2020). COVID-19 in South Korea. Postgrad Med J, 96, 399-402.

Choi, Han, Lee, Kim, S. I., & Kim, I. B. (2020). Innovative screening tests for COVID-19 in South Korea. Clin Exp Emerg Med, 1-5.

Ciufolini, & Paolozzi (2020). Mathematical prediction of the time evolution of the COVID-19 pandemic in Italy by a Gauss error function and Monte Carlo simulations. Eur Phys J Plus, 135, 355.

Eberhardt, J. N., Breuckmann, N. P., & Eberhardt, C. S. (2020). Multi-Stage Group Testing Improves Efficiency of Large-Scale COVID-19 Screening. J Clin Virol, S1386-6532(20)30124-4.

Gupta, Raghuwanshi, G. S., & Chanda (2020). Effect of weather on COVID-19 spread in the US: A prediction model for India in 2020. Sci Total Environ, 728, 138860.

Hale, Angrist, Cameron-Blake, Hallas, Kira, Majumdar, Petherick, Phillips, Tatlow, & Webster (2020). Variation in government responses to COVID-19. BSG Work. Pap. Ser. Blavatnik Sch. Gov. Univ. Oxford: Version 8.0.

Khan, S. A. (2018). Exponentiated Weibull regression for time-to-event data. Lifetime Data Anal, 24, 328-354.

Kim, Seo, Y. B., & Jung (2020). Prediction of COVID-19 transmission dynamics using a mathematical model considering behavior changes. Epidemiol Health, 42, e2020026.

Korner-Nievergelt, Roth, von Felten, Guélat, Almasi, & Korner-Nievergelt (2015). Markov Chain Monte Carlo Simulation. In: Bayesian Data Anal. Ecol. Using Linear Model. with R, BUGS, STAN. Elsevier, 197-212.

Lau, Khosrawipour, Kocbach, Mikolajczyk, Schubert, Bania, & Khosrawipour (2020). The positive impact of lockdown in Wuhan on containing the COVID-19 outbreak in China. J Travel Med, 27.

Lawless, J. F. (2003). Basic Concepts and Models 1.1. In: Stat Model Methods Lifetime Data, Second Ed, 1-47.

Yang, Dang, Meng, Huang, Meng, Wang, Chen, Zhang, Peng, & Shao (2020). Propagation analysis and prediction of the COVID-19. Infect Dis Model, 5, 282-292.

Metropolis, Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller (1953). Equation of State Calculations by Fast Computing Machines. J Chem Phys, 21, 1087-1092.

Mitjà, Arenas, À., Rodó, Tobias, Brew, & Benlloch, J. M. (2020). Experts’ request to the Spanish Government: Move Spain towards complete lockdown. Lancet, 395, 1193-1194.

Moreau, V. H. (2020). Forecast predictions for the COVID-19 pandemic in Brazil by statistical modeling using the Weibull distribution for daily new cases and deaths. Brazilian J Microbiol, 51, 1109-1115.

Osterholm, M. T. (2005). Preparing for the next pandemic. N Engl J Med, 352, 1839-1842.

Peto (2020). Covid-19 mass testing facilities could end the epidemic rapidly. BMJ, m1163.

R Core Team. (2013). R: A language and environment for statistical computing. Vienna, Austria.

Roberts, G. O., Gelman, & Gilks, W. R. (1997). Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann Appl Probab, 7, 110-120.

Roser, Ritchie, Ortiz-Ospina, & Hasel (2020). Coronavirus Pandemic (COVID-19).

RStudio Team. (2020). RStudio: Integrated Development Environment for R. Boston, MA.

Saez, Tobias, Varga, & Barceló, M. A. (2020). Effectiveness of the measures to flatten the epidemic curve of COVID-19. The case of Spain. Sci Total Environ, 727, 138761.

Salathé, Althaus, C. L., Neher, Stringhini, Hodcroft, Fellay, Zwahlen, Senti, Battegay, Wilder-Smith, Eckerle, Egger, & Low (2020). COVID-19 epidemic in Switzerland: on the importance of testing, contact tracing and isolation. Swiss Med Wkly.

Sjödin, Wilder-Smith, Osman, Farooq, & Rocklöv (2020). Only strict quarantine measures can curb the coronavirus disease (COVID-19) outbreak in Italy, 2020. Eurosurveillance, 25, 1-6.

Verma, Vishwakarma, R. K., Verma, Nath, D. C., & Khan, H. T. A. (2020). Time-to-Death approach in revealing Chronicity and Severity of COVID-19 across the World. Ed. Kannan Navaneetham. PLoS One, 15, e0233074.

Wilder-Smith, & Freedman, D. O. (2020). Isolation, quarantine, social distancing and community containment: pivotal role for old-style public health measures in the novel coronavirus (2019-nCoV) outbreak. J Travel Med, 27, 1-4.