Modeling the trend of coronavirus disease 2019 and restoration of operational capability of metropolitan medical service in China: a machine learning and mathematical model-based analysis
Global Health Research and Policy volume 5, Article number: 20 (2020)
To contain the outbreak of coronavirus disease 2019 (COVID-19) in China, the government adopted many unprecedented intervention measures. However, these measures may interfere with normal medical service. We sought to model the trend of COVID-19 and estimate the restoration of operational capability of metropolitan medical service in China.
Real-time data of COVID-19 and population mobility data were extracted from open sources. SEIR (Susceptible, Exposed, Infectious, Recovered) and neural network models (NNs) were built to model disease trends in Wuhan, Beijing, Shanghai and Guangzhou. Combined with public transportation data, an Autoregressive Integrated Moving Average (ARIMA) model was used to estimate the accumulated demands for nonlocal hospitalization during the epidemic period in Beijing, Shanghai and Guangzhou.
The number of infected people and deaths would increase by 45% and 567% respectively, if the government had only implemented traffic control in Wuhan without dispatching additional medical professionals. The epidemic in Wuhan (measured by cumulative confirmed cases) was predicted to reach its turning point at the end of March and to end in late April, 2020. The outbreak in Beijing, Shanghai and Guangzhou was predicted to end at the end of March, and medical service could be fully back to normal in the middle of April. During the epidemic, the number of nonlocal inpatient hospitalizations decreased by 69.86%, 57.41% and 66.85% in Beijing, Shanghai and Guangzhou respectively. After the end of the epidemic, medical centers located in these metropolises may face 58,799 (95% CI 48,926–67,232) additional hospitalization needs in the first month.
The COVID-19 epidemic in China has been effectively contained and medical service across the country is expected to return to normal in April. However, the huge unmet medical needs for other diseases could result in massive migration of patients and their families, bringing tremendous challenges for medical service in major metropolises and for disease control with respect to potential asymptomatic virus carriers.
The outbreak of coronavirus disease 2019 (COVID-19) has been presenting a major threat to public health. The first COVID-19 case was reported on Dec 8, 2019. To curb the spread of the virus, Chinese health authorities have taken the strictest massive anti-epidemic actions since Jan 2020, including mass isolation, social distancing and community containment [2, 3]. Moreover, the government has implemented traffic restrictions across the whole country with a massive reduction in public transportation capacity. As the epidemic situation remains fraught in China, key epidemiological questions, such as the effectiveness of the implemented strategies for disease control, remain to be fully investigated.
The government has been investing increasing medical resources in the treatment of patients with COVID-19. On February 5, three cabin hospitals and two other makeshift hospitals successively started to treat infected patients. By March 9, 346 medical teams and 42,600 medical professionals had been dispatched from other provinces across the country to combat the epidemic in Hubei province. It is reported that 7512 designated hospitals and related fever clinics have been mobilized nationwide. However, the nationwide mobilization of medical resources could severely disturb local routine medical service. According to the 2018 National Report on the Services, Quality and Safety in Medical Care System, about 2 million patients travel across regions each year to seek medical care in China, and 43% of all cross-regional cases are concentrated in Beijing, Shanghai and Guangzhou (842 thousand cases in total). These three metropolises play a pivotal role in the healthcare system in China, providing high-quality medical service for patients across the country. Notably, thousands of medical professionals from medical centers in these metropolises have been dispatched to Wuhan and other cities in Hubei province to fight COVID-19. As the full resumption of normal healthcare services in the metropolises marks the complete restoration of the healthcare system in China from the COVID-19 epidemic, estimating the number of affected patients and predicting the restoration of routine medical service are urgently needed to facilitate preparedness of the healthcare system.
In this study, we provided an estimation of the epidemic trend of COVID-19 in Wuhan and representative metropolises in China and forecast the time point when the routine medical service would recover from the epidemic. Furthermore, we utilized data on population migration to construct an improved mathematical model to measure the impact of traffic restrictions on the migrant patients, providing estimation of operational pressure for metropolitan medical service after the end of the epidemic.
In this study, data from two sources were used for statistical analysis. The Tencent News website provided the time series data of COVID-19 by location, including the number of confirmed cases, deaths, recovered cases and newly diagnosed cases. Baidu migration is an open-source big data project visualizing population migration. Leveraging its location-based services system and the Baidu Tianyan system, we obtained the daily migration scale index (MSI) of Beijing, Shanghai and Guangzhou in January and February of both 2019 and 2020. The data involved in this study are publicly available, provided by the media or common data platforms. No approval from an ethics committee was needed, as no privacy issue exists.
Model construction and data analysis
Construction of a modified SEIR model
To account for the self-healing of exposed people observed in this epidemic, we added a rehabilitation coefficient (β) to the classic SEIR model (Susceptible, Exposed, Infectious and Removed model). To verify the effectiveness of the implemented interventions for epidemic control, we added a quarantine measures variable (on day 46) and a cabin hospital variable (on day 59). In addition, a new House quarantine module (H) was added to demonstrate the effectiveness of the new initiatives, as shown in Fig. 1 (see details in the Supplemental Materials Part 1). To estimate the epidemic situation of large medical centers and predict the time needed to restore daily medical service, we used data from Shanghai, Beijing and Guangzhou to simulate the development of the epidemic, assuming that only human-to-human transmission exists, that no specific medicine is found at this stage and that no other major health events happen [10,11,12]. Special consideration was given to Wuhan, as it is the epicenter of the epidemic: we constructed a specific model to estimate the impact of community containment and the construction of makeshift hospitals on the epidemic situation there. The date on which the first confirmed case was reported by the local government was used as the first day for modeling the local epidemic, specifically Dec 8, 2019 in Wuhan, and Jan 13, Jan 12 and Jan 21, 2020 in Beijing, Shanghai and Guangzhou, respectively [13,14,15]. Furthermore, we adopted a strategy different from that of previous cumulant-based modeling methods. We counted the daily number of current cases, excluding dead and recovered cases, to ensure the independence of the current status of each patient, so as to guarantee the accuracy and interpretability of the SEIR model.
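As a rough illustration of the compartmental structure described above, the following Python sketch steps a discrete-time SEIR model with an extra home-quarantine compartment (H), a self-healing flow out of the exposed compartment, and two time-dependent switches on day 46 and day 59. Every rate value, the population size, the function name `simulate_seir_h` and the exact flow equations are assumptions chosen for illustration; the authors' actual equations are given in their Supplemental Materials Part 1 and may well differ.

```python
def simulate_seir_h(days=120, population=1_000_000.0, initial_infectious=1.0,
                    contact_rate=0.5,      # assumed transmission rate per day
                    incubation_rate=0.2,   # assumed E -> I rate per day
                    removal_rate=1 / 14,   # assumed I -> R rate per day
                    self_heal_rate=0.01,   # assumed E -> R "rehabilitation" rate
                    quarantine_day=46, hospital_day=59):
    """Daily-step SEIR model with a home-quarantine compartment H.

    Returns a list of (S, E, I, H, R) tuples, one per day, starting at day 0.
    """
    s = population - initial_infectious
    e, i, h, r = 0.0, initial_infectious, 0.0, 0.0
    history = [(s, e, i, h, r)]
    for day in range(1, days + 1):
        quarantined = day >= quarantine_day
        hospitals_open = day >= hospital_day
        # Assumption: quarantine cuts effective contacts by 80% and moves
        # 10% of the remaining susceptibles into home quarantine each day.
        transmission = contact_rate * (0.2 if quarantined else 1.0)
        to_home = 0.10 * s if quarantined else 0.0
        # Assumption: makeshift hospitals speed up removal by 50%.
        removal = removal_rate * (1.5 if hospitals_open else 1.0)

        new_exposed = transmission * s * i / population
        new_infectious = incubation_rate * e
        self_healed = self_heal_rate * e
        removed = removal * i

        s -= new_exposed + to_home
        e += new_exposed - new_infectious - self_healed
        i += new_infectious - removed
        h += to_home
        r += removed + self_healed
        history.append((s, e, i, h, r))
    return history


if __name__ == "__main__":
    with_measures = simulate_seir_h()
    no_measures = simulate_seir_h(quarantine_day=10**9, hospital_day=10**9)
    print("peak infectious with measures:", round(max(x[2] for x in with_measures)))
    print("peak infectious without measures:", round(max(x[2] for x in no_measures)))
```

Running the two scenarios side by side mirrors the kind of counterfactual comparison reported in the results, although the numbers produced by this toy parameterization carry no epidemiological meaning.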
Construction of time series prediction model based on neural network
The purpose of the neural networks (NNs) is to supplement the SEIR model and overcome its limitations. Building on the inflection point predicted by the SEIR model, we used the neural network to further refine the fit at specific time points to achieve accurate prediction. To obtain the optimal neural network model, we chose four commonly used network structures to predict the cumulative number of confirmed cases nationwide, as shown in Enclosure. According to R2 (R-squared, the coefficient of determination) and the loss function, the model with the best performance was selected to further predict other epidemic changes, including the numbers of suspected, cured and dead cases, where R2 is defined as $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$, $y_i$ is the observed value, $\bar{y}$ is the mean of the observed values and $\hat{y}_i$ is the predictive value of our model.
The predicted results were visualized using TensorFlow tools for further analysis (see the Supplemental Materials Part 2).
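To make the model-selection criterion concrete, the sketch below computes the coefficient of determination in its standard form, one minus the ratio of the residual sum of squares to the total sum of squares, and picks the candidate with the highest score. The function names, the candidate labels and the toy numbers are hypothetical; the labels are borrowed from the abbreviation list, and it is an assumption that they correspond to the four structures the authors compared.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot


def select_best_model(observed, candidate_predictions):
    """Return the name of the candidate whose predictions score the highest R2."""
    return max(candidate_predictions,
               key=lambda name: r_squared(observed, candidate_predictions[name]))


if __name__ == "__main__":
    # Toy cumulative case counts and made-up predictions, for illustration only.
    observed = [100.0, 180.0, 300.0, 450.0]
    candidates = {
        "MLP": [110.0, 175.0, 310.0, 440.0],
        "LSTM": [90.0, 200.0, 260.0, 500.0],
    }
    print(select_best_model(observed, candidates))
```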
Construction of inference model for inpatient hospitalization based on ARIMA
To quantify the impact of the outbreak on migrant patients during the epidemic, we used the Baidu migration big data platform to extract migration data for 1 month before and after the Spring Festival in Beijing, Shanghai and Guangzhou in 2019 and 2020. Using the Baidu Migration Index as a measure of the number of incoming migrants and aligning the two years by Chinese lunar calendar date, we constructed a daily curve of the reduction in the number of migrants in 2020 compared to 2019.
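The year-over-year comparison described above can be sketched as a per-day relative reduction, assuming the two index series have already been aligned by lunar calendar day. The function name and the sample index values are hypothetical and are not taken from the Baidu data.

```python
def relative_reduction(index_2019, index_2020):
    """Daily relative reduction of the 2020 index versus 2019.

    Both lists must be aligned so that position k refers to the same
    lunar-calendar day in each year. A value of 0.6 means a 60% drop.
    """
    if len(index_2019) != len(index_2020):
        raise ValueError("series must be aligned and of equal length")
    reductions = []
    for base, current in zip(index_2019, index_2020):
        if base <= 0:
            raise ValueError("baseline index must be positive")
        reductions.append((base - current) / base)
    return reductions


if __name__ == "__main__":
    # Made-up migration index values for four aligned days.
    msi_2019 = [10.0, 8.0, 12.0, 9.0]
    msi_2020 = [5.0, 2.0, 3.0, 4.5]
    daily = relative_reduction(msi_2019, msi_2020)
    print([round(x, 2) for x in daily])
    print("mean reduction:", round(sum(daily) / len(daily), 3))
```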
An Autoregressive Integrated Moving Average (ARIMA) model was used to predict the date when the amount of migration would return to a normal status. Considering that February 10th was the presumed date of returning to work issued by the Chinese government, and that February 8th and 9th fall within the peak period of return trips, we only used data after February 10th for the prediction. The ARIMA model contains three hyper-parameters: the auto-regressive order (p), the degree of differencing (d, the Integrated part) and the moving-average order (q). Based on the estimated declining proportion of the metropolitan immigration population and the number of cross-regional cases in Beijing, Shanghai and Guangzhou in 2019, we estimated the declining proportion of patients whose demands for cross-regional medical service were suppressed during the outbreak. The features of the autocorrelation function and partial autocorrelation function were used to tune the model parameters, and R2 was used as the final evaluation standard (see the Supplemental Materials Part 3).
Discrete variables, including the daily number of confirmed cases, deaths and recovered cases, were collected for machine learning-based modeling. The recovery rate and mortality were calculated to fit the SEIR model. The estimated number of patients with unmet medical services was reported as the median, and statistical uncertainty was presented using the 95% confidence interval (CI). Python 3.5 (Python Software Foundation) and Statistical Product and Service Solutions (SPSS 21.0, Armonk, New York, USA) were used for data analysis.
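The text does not state which ARIMA orders were selected, so the following is only a minimal sketch of the idea using the simplest non-trivial case, ARIMA(1,1,0) with drift: difference the series once, fit a first-order autoregression to the differences by least squares, forecast the differences and add them back up. The function names and the toy series are hypothetical, and a full analysis would normally rely on a statistical library with maximum-likelihood fitting and ACF/PACF diagnostics rather than this hand-rolled version.

```python
def fit_arima_110(series):
    """Fit ARIMA(1,1,0) with drift by least squares on first differences.

    Model on differences d_t = y_t - y_{t-1}:  d_t = c + phi * d_{t-1} + noise.
    Returns (c, phi).
    """
    if len(series) < 4:
        raise ValueError("need at least 4 observations")
    diffs = [b - a for a, b in zip(series, series[1:])]
    x, y = diffs[:-1], diffs[1:]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        phi = 0.0  # constant differences: pure drift, no autoregressive part
    else:
        phi = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    c = mean_y - phi * mean_x
    return c, phi


def forecast_arima_110(series, steps):
    """Forecast `steps` future levels by iterating the fitted difference model."""
    c, phi = fit_arima_110(series)
    level = series[-1]
    diff = series[-1] - series[-2]
    forecast = []
    for _ in range(steps):
        diff = c + phi * diff   # predicted next difference
        level = level + diff    # integrate back to the original scale
        forecast.append(level)
    return forecast


def days_to_recover(forecast, baseline):
    """1-based index of the first forecast day at or above baseline, else None."""
    for day, value in enumerate(forecast, start=1):
        if value >= baseline:
            return day
    return None


if __name__ == "__main__":
    # Made-up post-holiday migration index, for illustration only.
    observed = [2.0, 2.4, 2.9, 3.3, 3.8, 4.2]
    future = forecast_arima_110(observed, steps=10)
    print([round(v, 2) for v in future])
    print("days until index reaches 6.0:", days_to_recover(future, 6.0))
```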
According to our model-based analysis, the estimated number of infected patients nationwide would reach a peak of 80,000 during the epidemic (Fig. 2). The short-term trend described by the neural network was consistent with the modified SEIR model. Therefore, the modified SEIR model was further used to predict long-term trends of the epidemic. The modified SEIR model simulated the consequences of several measures, including quarantine and the increase in medical professionals and beds. Wuhan was locked down 46 days after the outbreak of the epidemic, and in the meantime other cities strictly restricted the movement of the population and initiated home-based quarantine. As a result, the daily growth of exposed people was significantly controlled and gradually dropped to 15,000 people, which fundamentally minimized the epidemic risk of COVID-19.
According to our analysis, with the timely mobilization of medical professionals and the expansion of bed capacity, the number of infected cases decreased significantly while the number of recovered cases increased significantly. The recovery rate increased by more than 10% and the death rate decreased by 2.5%. The peak of the epidemic occurred earlier than expected. The modeling results show that if the government had simply conducted city-wide quarantine in Wuhan without additional beds and reinforcing medical professionals, the total number of infected cases might have reached more than 80,000 and the number of deaths might have been around 20,000, increases of 45% and 567% respectively. The peak of the epidemic in Wuhan might have been postponed beyond April (Figs. 3 and 4).
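The percentage increases quoted above can be read backwards to recover the approximate baseline they imply: if a scenario value is p% higher than the baseline, the baseline equals the scenario value divided by (1 + p/100). The short sketch below applies that arithmetic to the rounded figures in the text. Because the text says "more than 80,000" and "around 20,000", the results are only rough orders of magnitude, and the function name is hypothetical.

```python
def implied_baseline(scenario_value, percent_increase):
    """Baseline that a scenario exceeds by `percent_increase` percent."""
    return scenario_value / (1 + percent_increase / 100)


if __name__ == "__main__":
    # Rounded counterfactual figures quoted in the text.
    print("implied infected baseline:", round(implied_baseline(80_000, 45)))
    print("implied deaths baseline:", round(implied_baseline(20_000, 567)))
```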
The implementation of traffic restrictions led to better containment of COVID-19 infection in other parts of China. The recovery rate in Beijing, Shanghai and Guangzhou rose steadily to more than 45%, with a death rate lower than 1%. According to the estimation, there will be 480 confirmed cases in Beijing, 360 in Shanghai, and 380 in Guangzhou. Notably, the death rates in Beijing, Shanghai and Guangzhou are significantly lower than that of Wuhan (1% vs. 4%), while the recovery rates are comparatively higher (45% vs. 16%), suggesting that the dispatch of medical professionals has had limited negative impact on local epidemic control. This finding suggests that the growth of the epidemic has been slowing down and is gradually reaching its peak. Based on the results of the MLP model, the epidemic in Wuhan may reach its inflection point at the end of this March and come to an end in April. The outbreak in Shanghai (Fig. 5), Beijing (Fig. 6) and Guangzhou (Fig. 7) will end about 1 month earlier than expected. Given that all medical professionals engaged in epidemic treatment should be quarantined for 2 weeks, the national healthcare system (except Wuhan) would return to normal in mid-April.
The migrant population of Beijing, Shanghai and Guangzhou in the first quarter of 2019 and 2020 is shown in Fig. 8. The medical centers in the three metropolises may face more than 58,799 (95% CI 48,926–67,232) additional hospitalizations in total in the first month after the epidemic. The estimated number of patients with unmet medical service (hospitalization) during the epidemic in Beijing, Shanghai and Guangzhou is shown in Table 1.
The ongoing outbreak of COVID-19 is a major public health emergency, posing great challenges to the public healthcare system [18, 19]. As Chinese authorities have taken massive anti-epidemic actions, the spread of the virus has been effectively curbed. However, patients’ demand for routine medical care has been substantially suppressed during the epidemic, especially for patients with noninfectious chronic diseases. In this study, we constructed a mathematical model to simulate the epidemic situation in Wuhan and other parts of China (represented by Beijing, Shanghai and Guangzhou). Compared to Wu’s study, we conducted multi-model analysis and prediction, with special consideration given to the massive anti-epidemic actions taken by Chinese authorities, such as community containment, large-scale mobilization of medical professionals and rapid expansion of bed capacity. In line with Chen’s study, which points to the effectiveness of the massive interventions taken by China to control the COVID-19 epidemic, we estimated that the end of COVID-19 would occur about 1 month earlier, in late April in Hubei and late March in the rest of China (except Hubei). The recovery of national medical service was estimated to occur in mid-April. However, the rapid increase in migrant patients soon after the end of the epidemic would put significant pressure on medical service in Beijing, Shanghai and Guangzhou.
The distribution of medical resources in China is relatively imbalanced. For instance, 66% of all the established National Clinical Research Centers are located in Beijing, Shanghai and Guangzhou, and significant numbers of patients from across the country are admitted to these institutions each year. The restoration of operational capability of metropolitan medical centers is determined by several factors, including epidemic control in the critically infected region (represented by Hubei province), regional epidemic control, and the recovery of public transportation and population flows nationwide. In the present study, we used network data to model the trend of epidemic containment in Wuhan and the three metropolises, and conducted a model-based analysis to estimate the challenges after the restoration of operational capability.
The SEIR model is a classic epidemiological model widely used for modeling infectious diseases with an incubation period. In this study, we adapted and modified the classic SEIR model based on the characteristics of COVID-19, such as the viral transmission capability, its spatial distribution and its route of transmission. We introduced the House quarantine module (H), the rehabilitation coefficient (β), the quarantine measures variable and the cabin hospital variable. Moreover, real-time network data were leveraged in this modified SEIR model. The traditional SEIR model is limited by inaccuracy in predicting the inflection point of the epidemic. We used neural networks to address this issue. Compared to traditional time series prediction algorithms, neural networks process distributed parallel information by coordinating the interconnections among massive numbers of internal nodes. Hence, they are superior in large-scale parallel processing, distributed storage, elastic topology, high redundancy and high robustness, making them more suitable for high-speed non-linear operations. As such, we combined the neural networks with the SEIR model, constructed models based on official data to compare the prediction accuracy of four commonly used neural networks, and adopted the optimal model to predict the epidemiological trend of COVID-19.
As the epidemic expands, the number of confirmed cases continues to rise. However, according to the government report, the total number of newly confirmed cases in mainland China outside Hubei province has declined for 16 consecutive days, indicating that the epidemic outside Hubei is in a slow-growth phase, while the number of infected cases in Hubei is still increasing steadily, which is consistent with our model-based prediction.
Amid the severe epidemic, patients’ demands for routine healthcare in other specialties are typically suppressed, especially for cross-regional cases. Admittedly, telemedicine among large medical centers may partially relieve the pressure on patients’ demand for medical service, whereas the treatment for major diseases, especially invasive surgical treatment, is less available for patients. These treatments, in most cases, cannot be provided by local community medical institutions.
The implementation of traffic restrictions is one of the main causes hindering large medical centers from returning to the normal order of medical service. As public transportation is being shut down, a large number of patients are having difficulty accessing large medical centers, leading to a waste of medical resources among these medical centers. To visualize the impact of this issue, we used the Baidu Migration Platform to obtain definitive data on the population migration index for Beijing, Shanghai and Guangzhou. Based on the ARIMA model, we estimated that the number of cross-regional patients in major medical centers in those three large cities would be reduced by approximately 59 thousand during the first quarter of 2020. The predictions based on this model might be reliable because of its advantage in forecasting time series when compared to general mathematical models such as regression analysis and linear functions. According to our analysis, an extra 59 thousand migrant patients would head to the major medical centers in Beijing, Shanghai and Guangzhou, posing a great challenge to the metropolitan healthcare system. Thus, Chinese health authorities should make adequate preparation for triaging patients and their accompanying families. In addition, sufficient attention must be paid to infected patients with an excessively long incubation period and to asymptomatic 2019-nCoV carriers. Since 70% of patients in tertiary hospitals in Beijing are cross-regional cases, a massive number of migrant patients would raise challenges for local epidemic control during the early stage after the containment of COVID-19. The relevant authorities should develop strategies for health inspection and quarantine. Our results also underscore the importance of a hierarchical healthcare system, which relies on regional medical centers to triage patients.
Our study had several limitations. Firstly, the SEIR model-based analysis was conducted on open source data, so it could not fully capture the actual number of infected cases, and imported confirmed cases from overseas were not included in our model-based analysis. Secondly, the predictive model was constructed based on the natural distribution of people, so it cannot be applied to special population distributions such as welfare institutes. Thirdly, this model is unable to accurately predict the epidemiological trend of COVID-19 in the event of viral mutation or the development of specific anti-virus therapy. Fourthly, the increase in medical professionals and bed capacity followed a non-uniform growth pattern, which cannot be simulated by our models. Lastly, psychological factors may bias our predictive models, as patients’ intention to seek medical care would be reduced under the shadow of the epidemic.
In conclusion, our study highlights the significant challenges presented to the healthcare system in China under a public health emergency. As the resolute massive anti-epidemic actions are implemented, the end of the outbreak would be expected in late March in mainland China outside Hubei province and the routine medical service would recover in mid-April. However, patients’ demand for routine medical care would expand rapidly within a month after the end of outbreak highlighting the need of coordination among regional medical centers. These findings could inform policy makers and public health officials to devise preparedness plans to address the unmet medical needs of other diseases under the COVID-19 epidemic.
Availability of data and materials
All the primary data and materials involved in this paper are from the published articles and web links, and they are all available online.
Abbreviations
COVID-19: Coronavirus disease 2019
SEIR: Susceptible, exposed, infectious and recovered
NNs: Neural network models
ARIMA: Autoregressive integrated moving average
MSI: Migration scale index
H: House quarantine module
MLP: Multilayer perceptron model
ICU: Intensive care unit
RNN: Recurrent neural network
LSTM: Long short-term memory
GRU: Gate recurrent unit
Notification from Wuhan Municipal Health Commission about cases of viral pneumonia of unknown cause. http://wjw.wuhan.gov.cn/front/web/showDetail/2020011109036. Accessed 26 Feb 2020.
Chen J. Pathogenicity and transmissibility of 2019-nCoV-A quick overview and comparison with other emerging viruses. Microbes Infect. 2020;22:69–71.
Li Q, Guan X, Wu P, et al. Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia. N Engl J Med. 2020. https://doi.org/10.1056/NEJMoa2001316.
Institute of medical information, Chinese Academy of Medical Sciences. The dataset of nationwide designated hospitals and related fever clinics for novel coronavirus pneumonia (in Chinese). Data warehouse of National Population Health Science Data Center PHDA, 2020.CSTR:A0006.11.A0003.202001.000583.
NHC.CHINA. 2018 National Report on the services, Quality and Safety in Medical Care System; 2018.
32,572 medical workers across the country have been sent to Wuhan to combat the virus. http://www.chinanews.com/sh/2020/02-21/9099829.shtml. Accessed 27 Feb 2020.
Real-time reporting of cases of coronavirus disease 2019. https://news.qq.com/zt2020/page/feiyan.htm?ADTAG=area. Accessed 26 Feb 2020.
Baidu migration. http://qianxi.baidu.com. Accessed 26 Feb 2020.
Zha WT, Pang FR, Zhou N, et al. Research about the optimal strategies for prevention and control of varicella outbreak in a school in a central city of China: based on an SEIR dynamic model. Epidemiol Infect. 2020;148:e56.
Chan JF, Yuan S, Kok KH, et al. A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster. Lancet. 2020;395(10223):514–23.
MRC Centre for Global Infectious Disease Analysis. https://www.imperial.ac.uk/mrc-global-infectious-disease-analysis/news%2D%2Dwuhan-coronavirus/. Accessed 20 Feb 2020.
Chen N, Zhou M, Dong X, et al. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. Lancet. 2020;395(10223):507–13.
3 new cases of novel coronavirus pneumonia were identified in Beijing. http://wjw.beijing.gov.cn/xwzx_20031/xwfb/202002/t20200210_1627164.html. Accessed 26 Feb 2020.
All-out efforts were made in Shanghai to contain novel coronavirus pneumonia, stated by director of Shanghai Municipal commission of Health. http://wsjkw.sh.gov.cn/xwfb/20200123/f2629d7ebc4646f09ca30b551357f909.html. Accessed 26 Feb 2020.
2 cases of novel coronavirus pneumonia were identified in Guangzhou. http://wjw.gz.gov.cn/ztzl/xxfyyqfk/yqtb/content/post_5643152.html. Accessed 26 Feb 2020.
Arav-Boger R, Boger YS, Foster CB, Boger Z. The use of artificial neural networks in prediction of congenital CMV outcome from sequence data. Bioinformatics Biol Insights. 2008;2:281–9.
Benvenuto D, Giovanetti M, Vassallo L, Angeletti S, Ciccozzi M. Application of the ARIMA model on the COVID-2019 epidemic dataset. Data in brief. 2020;29:105340.
Zhu N, Zhang D, Wang W, et al. A novel coronavirus from patients with pneumonia in China, 2019. N Engl J Med. 2020;382(8):727–33.
Tang B, Wang X, Li Q, et al. Estimation of the transmission risk of the 2019-nCoV and its implication for public health interventions. J Clin Med. 2020. https://doi.org/10.3390/jcm9020462.
Wu JT, Leung K, Leung GM. Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study. Lancet. 2020. https://doi.org/10.1016/s0140-6736(20)30260-9.
Chen X, Yu B. First two months of the 2019 coronavirus disease (COVID-19) epidemic in China: real-time surveillance and evaluation with a second derivative model. Glob Health Res Policy. 2020;5:7.
Construction layout of national clinical medicine research center. https://www.sohu.com/a/327803067_610510. Accessed 27 Feb 2020.
Tillett HE. Infectious Diseases of Humans; Dynamics and Control. Epidemiol Infect. 1992;108(1).
Parks RW, Long DL, Levine DS, et al. Parallel distributed processing and neural networks: origins, methodology and cognitive functions. Int J Neurosci. 1991;60:195–214.
No new coronavirus cases were identified in 16 consecutive days mainland China outside Hubei province. http://news.cctv.com/2020/02/20/ARTI4Rx7S8pBdP1b6CBVGbEB200220.shtml. Accessed 27 Feb 2020.
Gao WJ, Li LM. Advances on presymptomatic or asymptomatic carrier transmission of COVID-19. Zhonghua Liuxingbingxue Zazhi. 2020;41:485–8.
The study is supported by National Science Fund for Distinguished Young Scholars (81525002), Program for Shanghai Outstanding Medical Academic Leader (2019) and National Ten-Thousand Talents Program (2017).
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Cite this article
Liu, Z., Huang, S., Lu, W. et al. Modeling the trend of coronavirus disease 2019 and restoration of operational capability of metropolitan medical service in China: a machine learning and mathematical model-based analysis. Glob Health Res Policy 5, 20 (2020). https://doi.org/10.1186/s41256-020-00145-4