Reconstructing and Forecasting the COVID-19 Epidemic in the US Using a 5-Parameter Logistic Growth Model

Background: Many studies have modeled and predicted the COVID-19 epidemic in the US using data that start from the first reported cases. However, because testing services to detect the infected were in short supply, this approach is subject to error from under-detection in the early period of the epidemic. We attempted a new approach to overcome this limitation and to provide data supporting public policy decisions against the life-threatening COVID-19 epidemic.
Methods: Data documented by the CDC were used, including daily new and cumulative cases of confirmed COVID-19 in the US from January 22 to April 6, 2020. A 5-parameter logistic growth model was used to reconstruct the epidemic. Instead of using all data from the whole study period, we fitted data in a 2-week window from March 21 to April 4 (approximately one incubation period), during which massive testing services were in place. With the parameters obtained from the modeling, we reconstructed and predicted the epidemic and evaluated the under-detection.
Results: The data fit the model satisfactorily. The estimated daily growth rate was 16.8% (95% CI: 15.95%, 17.76%) overall, corresponding to a doubling time of about 4 days. Based on the modeling result, the tipping point for new cases to decline will be April 7, 2020, with 32,860 new cases. By the end of the epidemic, a total of 792,548 (95% CI: 789,162-795,934) will be infected. Based on the model, a total of 12,029 cases were not detected between the first case on January 22 and April 4.
Conclusions: Study findings support the use of a 5-parameter logistic growth model fitted to reliable data from a specified window period during which governmental interventions are appropriately implemented. In addition to informing decision-making, this model adds a tool for capturing the underlying COVID-19 epidemic caused by a novel pathogen.


Introduction
COVID-19 is an infection caused by a novel pathogen named SARS-CoV-2. The COVID-19 pandemic is a typical example of a global health issue: it spread across the world in less than five months. Since the first case was reported in the US on January 22, many studies have used different models to reconstruct the epidemic and to forecast future trends, from simple growth models to classic susceptible-infectious-recovered models. Since little information is available for COVID-19 during the early period of the epidemic, there is a lack of data to construct complex and classic epidemiological models, leaving the population-based ecological growth model as a preferable option.
Historically, various population-based models have been available in the literature to model population dynamics in demography and disease epidemics in public health and medicine. The first is the 1-parameter exponential growth model. In this model, population growth has no upper limit and is determined by a single parameter, the growth rate. To reflect the upper limit of population growth, the 2-parameter logistic growth model was developed. In this model, population growth is exponential in the beginning, but the growth rate becomes smaller and smaller as the population size approaches a maximum carrying capacity, as described in detail in Richards (1959), McIntosh (1985), Renshaw (1991), Kingsland (1995), and Vandermeer (2010).
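For reference, the two simplest models described above are commonly written in differential form as follows. The symbols $N$ (population size), $r$ (growth rate), and $K$ (carrying capacity) are notation assumed here for illustration and are not taken from the cited sources.

```latex
% 1-parameter exponential growth: the only parameter is the growth rate r
\frac{dN}{dt} = rN

% 2-parameter logistic growth: growth rate r and carrying capacity K
\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right)
```

When $N$ is small relative to $K$, the factor $(1 - N/K)$ is close to 1 and the logistic model behaves like the exponential model; as $N$ approaches $K$, the growth slows toward zero.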
To obtain additional characteristics key to understanding population growth, the 2-parameter logistic growth model has been extended to 3-parameter, 4-parameter, and 5-parameter logistic growth models. These models have been widely used in other fields of research, including demography and analytical chemistry (Gottschalk and Dunn, 2005; Motulsky and Brown, 2006). Despite their many advantages, no study has employed the method to investigate the COVID-19 epidemic in the United States or in other countries. One purpose of this study is to assess the utility of the 5-parameter growth model.
Unlike typical population growth, only a small number of COVID-19 cases will be detected in the early period of an epidemic. Detected cases could approach the true size of the epidemic if the time of its outbreak were known. Likewise, once extensive testing services are implemented, detection becomes more accurate. Reported data indicate that the incubation period of COVID-19 is about 14 days (Chen & Yu, 2020), and COVID-19 testing services in the US started in mid-March and were sustained thereafter following CDC guidance. This provides a 14-day window with the highest detection rate, one that will not be affected by removal of the infected from the growth curve, which is ideal for model building. In principle, a model built with such data would be much closer to the truth than one built with data from the whole study period. We tested this approach in analyzing the COVID-19 epidemic in the US.

Materials and Methods
Data: Data for this study were daily cumulative cases from January 22 to April 6, 2020. These real-time data were compiled from the US CDC website as available at the time this study was conducted (https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/cases-in-us.html, accessed on April 7, 2020).

Model:
We modeled the data using the 5-parameter logistic growth model as below:

$$N(t) = N_0 + \frac{N_{max} - N_0}{\left[1 + e^{-r(t - t_{mid})}\right]^{\alpha}} \qquad (1)$$

where 1) $N(t)$ is the cumulative cases of COVID-19 over time, $t$ ($t$ = 1/22, 1/23, …, 4/6, 2020); 2) $N_0$ is the minimum number of cases at the beginning of the epidemic on January 22, 2020, when the first case was reported in the US; 3) $N_{max}$ is the maximum number of cases when the epidemic ends, that is, the model-predicted total number of Americans who would be infected with COVID-19; 4) $r$ is the daily exponential growth rate; 5) $t_{mid}$ is the estimated tipping point when the daily new cases start to level off, with daily new cases increasing on the left side and decreasing on the right side; and 6) $\alpha$ is an asymmetric parameter quantifying the skewness of the distribution of daily new cases. $\alpha = 1$ indicates a symmetric distribution centered at $t_{mid}$; $\alpha > 1$ indicates faster increases in new cases before $t_{mid}$ and slower decreases after $t_{mid}$; and the pattern is reversed if $\alpha < 1$.
With Model 1 defined above, daily new cases $n(t)$ can be obtained by taking the first derivative of the model:

$$n(t) = \frac{\alpha\, r\, (N_{max} - N_0)\, e^{-r(t - t_{mid})}}{\left[1 + e^{-r(t - t_{mid})}\right]^{\alpha + 1}} + \varepsilon(t) \qquad (2)$$

where the error term $\varepsilon(t)$ is assumed to be normally distributed with mean 0 and standard deviation $\sigma$.
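The relationship between the cumulative curve and the daily new cases can be sketched in code. The example below is a minimal illustration in Python (not the authors' R code) and assumes the model takes the common Richards-type form N(t) = N_min + (N_max - N_min) / (1 + exp(-r (t - t_mid)))^alpha. The function names, argument names, and parameter values are hypothetical and chosen only for illustration; they are not the fitted estimates.

```python
import math


def cumulative_cases(t, n_min, n_max, r, t_mid, alpha):
    """Cumulative cases N(t) under the assumed 5-parameter logistic form."""
    return n_min + (n_max - n_min) / (1.0 + math.exp(-r * (t - t_mid))) ** alpha


def daily_new_cases(t, n_min, n_max, r, t_mid, alpha):
    """Analytic first derivative dN/dt of cumulative_cases with respect to t."""
    z = math.exp(-r * (t - t_mid))
    return alpha * r * (n_max - n_min) * z / (1.0 + z) ** (alpha + 1.0)


# Hypothetical parameter values for illustration only (assumption, not from the paper's Table 1).
params = dict(n_min=30.0, n_max=800_000.0, r=0.17, t_mid=76.0, alpha=1.0)

# Day numbers are arbitrary here; t_mid=76 marks the assumed tipping point.
for day in (60, 76, 90):
    print(day, round(cumulative_cases(day, **params)), round(daily_new_cases(day, **params)))
```

With alpha = 1 the curve of daily new cases is symmetric around t_mid; for other values of alpha the peak of the derivative shifts away from t_mid, which is how the fifth parameter captures skewness in this assumed form.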

Implementation of modeling analysis
Data analysis was conducted using the software R. Daily data for a window period from March 21 to April 4 were fitted with a 5-parameter logistic growth model as shown in Model 2.
Modeling analysis was implemented using a nonlinear optimization algorithm to minimize the sum of squared errors between the observed and model-estimated data. The optimization was carried out by calling the R function "optim". Estimates were thus obtained for the five parameters (the minimum and maximum numbers of cases, the growth rate r, the tipping point, and the asymmetric parameter), with the significance level set at p < 0.05 (two-sided).
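The quantity being minimized can be sketched as follows. This is a minimal Python illustration of a sum-of-squared-errors objective, not the authors' R code; it assumes the Richards-type form of the 5-parameter logistic curve, and the function names, the synthetic "observations", and all parameter values are hypothetical.

```python
import math


def logistic5(t, n_min, n_max, r, t_mid, alpha):
    """Assumed 5-parameter logistic curve for cumulative cases."""
    return n_min + (n_max - n_min) / (1.0 + math.exp(-r * (t - t_mid))) ** alpha


def sum_squared_errors(params, days, observed):
    """Objective to minimize: squared gaps between observed and modeled counts."""
    return sum((y - logistic5(t, *params)) ** 2 for t, y in zip(days, observed))


# Synthetic observations generated from hypothetical parameters (not CDC data).
true_params = (30.0, 800_000.0, 0.17, 76.0, 1.0)
days = list(range(60, 75))  # 15 consecutive days; the day numbering is arbitrary
observed = [logistic5(t, *true_params) for t in days]

# A parameter set with a different growth rate fits the same data worse.
perturbed = (30.0, 800_000.0, 0.20, 76.0, 1.0)
print(sum_squared_errors(true_params, days, observed))
print(sum_squared_errors(perturbed, days, observed))
```

A general-purpose minimizer, such as the R function "optim" named in the text, would search over the five parameters to make this value as small as possible.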
With the five estimated model parameters, model-based cumulative and new cases were estimated day by day up to April 6 and predicted beyond April 6 using Models 1 and 2, respectively. Under-detected cases in a specific period were computed as the differences between the reported and model-predicted cases.
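The under-detection calculation can be illustrated with a short Python sketch. It assumes the gap is taken as model-estimated minus reported daily new cases, so that positive values indicate under-detection; this sign convention, the function name, and all numbers below are hypothetical and used only for illustration.

```python
def under_detected_total(model_new, reported_new):
    """Sum of daily gaps (model-estimated minus reported new cases) over a period."""
    return sum(m - r for m, r in zip(model_new, reported_new))


# Hypothetical daily counts for a four-day period (not CDC data or model output).
model_new = [40.0, 90.0, 180.0, 350.0]
reported_new = [5.0, 30.0, 120.0, 330.0]

total_gap = under_detected_total(model_new, reported_new)
print(total_gap)
```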

Results
The cumulative daily cases from March 21 to April 4 fit Model 2 satisfactorily, and the model fit converged well. The estimated parameters, their standard errors, and 95% CIs are summarized in Table 1. All model parameters were statistically significant at the p < 0.001 level except the minimum number of cases. The lack of significance for this parameter appears reasonable given its small scale relative to the other parameters and the practical difficulty of determining the number of cases at the beginning of the epidemic, when the first few COVID-19 cases were detected and reported.
[Figure: cumulative cases estimated using the model, contrasted with the observed data.]
Overall, it will take approximately one month for daily new cases to fall from more than 20,000 per day around April 7 to hundreds per day around early May. Correspondingly, the cumulative cases will continue to increase rapidly after the tipping point until early May, as illustrated in Figure 2.

Discussions and Conclusions
In this study, we reported our work to model, reconstruct, and forecast the COVID-19 epidemic with a 5-parameter logistic growth model, a method widely used in demography, biology, and other hard sciences. We are the first to use it in analyzing the epidemic of COVID-19 in the US. In addition, we innovatively used data from the period with more complete detection of new cases to fit the model, and then used the fitted model to reconstruct the cases before and after the study period and to forecast the future beyond the study period.
From a global health perspective, controlling the COVID-19 epidemic in the US is an essential part of fighting the pandemic across the globe. This study provides data much needed for public health decision-making to end the epidemic in the United States. In addition, this study demonstrates the utility and efficiency of the 5-parameter logistic growth model in understanding the epidemic of a new infection during its early period, when the need for information is extremely high but not much data is available. Our modeling method provides a tool to overcome this challenge.
Based on findings from our modeling analysis, new cases are unlikely to keep increasing after the tipping point. However, by the end of the epidemic, an estimated 800,000 Americans will be infected. This number is lower than estimates by others, which can be as high as 2.2 million in Ferguson et al. (2020). Our estimation will be tested along with the ongoing development of the epidemic in the United States.
The daily exponential growth rate of COVID-19 was 16.85% for the US population. This is very close to 17.12%, the rate estimated for COVID-19 in China (Chen and Yu, 2020). In the early period of an epidemic, this growth rate can be obtained with limited data, and it is more informative than other parameters such as R0 (the reproduction number) for guiding anti-epidemic actions. The growth rate provides a dynamic measure of instantaneous change, and the doubling time derived from it is practically useful for guiding and evaluating anti-epidemic measures. In contrast, R0 is a static measure that does not include timing; it may be of great value for research but can hardly be determined accurately at the early stage of a new epidemic with rather limited data.
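The link between a daily growth rate and a doubling time can be worked through with a short calculation. The Python sketch below is illustrative only; the function names are hypothetical, and the two formulas reflect two common conventions (a continuous rate in the exponent versus a day-over-day percentage increase), since the text does not state which one is intended.

```python
import math


def doubling_time_continuous(r):
    """Doubling time in days when cases grow as exp(r * t)."""
    return math.log(2.0) / r


def doubling_time_discrete(r):
    """Doubling time in days when cases grow by a factor (1 + r) each day."""
    return math.log(2.0) / math.log(1.0 + r)


r_us = 0.1685  # daily growth rate reported in the text for the US
print(round(doubling_time_continuous(r_us), 2))
print(round(doubling_time_discrete(r_us), 2))
```

Under either convention the reported rate implies that cases double in a little over four days during the exponential phase.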

There are limitations to this study. First, the selection of data from a window is more subjective than objective. Caution is needed when the same method is used in different countries or regions where different anti-epidemic strategies are implemented in different ways. Second, additional work is needed to improve confidence in the estimate of the minimum number of cases at the beginning of an epidemic. Improving this estimate is a challenge given the large range of scales among the measures in the model. For example, the minimum and maximum numbers of cases in our analysis range from about 30 to about 800,000. Furthermore, the reported cases at the beginning of the epidemic are highly unreliable and thus will lead to an unreliable estimate of this parameter.
Despite the limitations, findings from this study provide timely data much needed for public health decision-making to end the epidemic. We will continue to update our model as more data become available with the evolution of the COVID-19 epidemic in the United States.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and material
The data that support the findings of this study are available from https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/cases-in-us.html. Data accessed at 4:30pm, April 7, 2020.