A PIECEWISE GROWTH MODEL FOR MODELING THE ACCUMULATED NUMBER OF COVID-19 CASES IN THE CITY OF CAMPO GRANDE

In December of 2019, a new coronavirus was discovered in the city of Wuhan, China. The World Health Organization officially named this coronavirus as COVID-19. Since its discovery, the virus has spread rapidly around the world and is currently one of the main health problems, causing an enormous social and economic burden. Due to this, there is a great interest in mathematical models capable of projecting the evolution of the disease in countries, states and/or cities. This interest is mainly due to the fact that the projections may help the government agents in making decisions in relation to the prevention of the disease. By using this argument, the health department of the city (HDC) of Campo Grande asked the UFMS for the development of a mathematical study to project the evolution of the disease in the city. In this paper, we describe a modeling procedure used to fit a piecewise growth model for the accumulated number of cases recorded in the city. From the fitted model, we estimate the date in which the pandemic peak is reached and project the number of patients who will need treatment in intensive care units. Weekly, was sent to HDC a technical report describing the main results.


Introduction
On February 26th of 2020, the first case of COVID-19 was confirmed in Brazil. A man 61-years-old resident in the city of São Paulo, SP, Brazil. And on March 14th of 2020, the first case was confirmed in the city of Campo Grande, MS, Brazil. From the first case, the health department of the city (HDC) of Campo Grande asked the Federal University of Mato Grosso do Sul (UFMS) for the development of a mathematical model to monitor and project the evolution of the disease in the city. The HDC argued that the modeling results would serve to support some decisions in relation to disease prevention, such as, proposing the intensification of social isolation, the purchase of hospital equipment, an increase in the number of intensive care units in public hospitals, among others.
From the request of the HDC, the authors of this paper began a study with the aim of fitting a growth model for the accumulated number of the COVID-19 cases recorded in the city of Campo Grande. The modeling was started by calculating the main descriptive statistics, followed by the fit of the most known growth model, the exponential model. However, as described by several epidemiological studies the number of people becoming ill in an epidemic grows until a limit (MEYER, 1999); and not indefinitely as described by the exponential model. Due to this, it is usual to consider that the growth process of the number of cases of a disease is given by a model in which the graphic is a sigmoidal curve (S-shaped curve).
Thus, besides the exponential model, we also consider the fit of the Logistic and Gompertz growth models. Both models are characterized by an S-shape curve defined by two distinct phases. The first phase is characterized by growth at an increasing rate (positive slope) and the second phase is characterized by growth at a decreasing rate (negative slope). The point in which the curve changes the slope (positive to negative) is called the inflection point. In the context of an epidemiological study, this point indicates when the pandemic peak is reached. The choice for these two growth models is based on studies described in the literature that indicate that both models are excellent for use in quantitative longitudinal data, see for example Budimulyati et al., (2012) and its references.
However, in the course of the study, we noticed that the fit of a single growth model would not be suitable, since the recorded values indicated a change in the pandemic's growth behavior. Due to this, we adopted the fit of a piecewise growth model. In order to fit a piecewise model, we separated the dataset into four subdatasets. This separation was done in the course of the analysis using the model with the lowest mean square error as a criterion for choosing the separation point.
For each one of the sub-datasets, we fit the three growth models; where the estimates for the parameters of the models were obtained using the non-linear least square method, as decribed by Vieira and Hoffman (1977) and Hsieh (2017).
To choose the best model for each sub-dataset, we consider as a criterion the mean square error (MSE) and the model selection criteria Akaike Information criterion (AKAIKE, 1974;BOZDOGAN, 1987), denoted by AIC, and the Bayesian Information criterion (SCHWARZ, 1978), denoted by BIC. This procedure allowed us to get a fitted model with the smallest mean square error among the tested models. The resulting piecewise growth model has the following configuration: Gompertz, Gompertz, Exponential and Gompertz model. Based on the fitted model, we get the coordinates of the inflection point and consequently the estimated date for the peak of the pandemic. In addition, using a moving-sums procedure, we project the number of patients who will need treatment in the clinical units and in intensive care units. These pieces of information were sent weekly to the HDC in a technical report format. Until 11/08/20 were sent twenty one technical reports to HDC. However, in this paper, we focus on describing the modeling procedure adopted to obtain the piecewise model. That is, we focus on the four periods in which happened the change in the pandemic's growth behavior. If it is of interest to the reader, the technical reports sent to HDC can be obtained upon request by email to the authors.
The remainder of the paper is organized as follows. In section 2, we present the data and the three growth models considered. Section 3, describes the modeling procedure adopted to get a piecewise growth model. Section 4, presents the projections for the number of patients who will need care in clinical or intensive care units. Section 5 concludes the paper with the final remarks. Additional details are provided in the supplementary material, denoted by the prefix "SM" when referred to in this paper.

Dataset and growth models
Let X t be the number of recorded cases of COVID-19 in the city of Campo Grande on the t-th day, for t = 0, . . . , T = 239, where t = 0 represents the day that the first case was recorded (03/14/20) and T is the last day considered in the study (11/08/20). This dataset is publicly available on the website www.sesau.gov.br. In this period of 240 days, 37, 429 cases of COVID-19 were recorded and 696 people died due to this disease. Besides, of the 37, 429 recorded cases, 36, 017 are considered recovered and 601 people are in home isolation.
Our first analysis consists of visualizing the recorded values by using some plots and by calculating the main descriptive measures. Figure 1 shows the number of confirmed cases at the day t, for t = 0, . . . , T . The highest number of registered cases at a day was 1, 434 cases on 09/11/20. Figure 2 shows the barplot of the frequency distribution of the number of cases confirmed by month. Out of the total cases, 0.80% (299 cases) were registered in the months of March (38 cases), April (90 cases) and May (171 cases), 5.19% (1,942 cases) in June, 22.12% (8,285 cases) in July, 29.21% (10,940 cases) in August, 24.78% (5,347 cases) in September, 14.61% (4,515 cases) in the October and 3.28% (1, 230 cases) in the first eight days of November. Table 1 shows the descriptive statistics for the number of cases recorded per day. The median value is 109.50 confirmed cases, with an average of 155.95 recorded cases by day and a standard deviation (S.D.) of 179.95 cases.    Figure 3 shows the graphic of the moving-average of seven days for the number of recorded cases. In the last 21 days, the moving average value ramained inside the range of ±10% of the average of the moving-average values of this period (dotted lines). Due to this, we consider the confirmed number of cases was stable in this period. The moving-average value on the 139th day (11/08/20) was of 154.86, meaning that in the period from 11/02/20 to 11/08/20 was confirmed, in average, 154.86 case per day. Compared to the moving-average value of seven days ago (157.57 on 11/01/20), there was a reduction of 1.72%. Compared to the movingaverage value of fourteen days ago (161.29 on 10/25/20), there was a reduction of 3.99%.  Figure 14 in Appendix 1 of the SM shows the graphic of the moving-average for the number of death. In the last four days, there was a reduction in the movingaverage value. On 11/08/20 the moving-average value was of 2 death. Compared to the value of seven days ago (2.14 on 11/01/20), there was a reduction of 7%. Compared to fourteen days ago (3.86 on 10/25/20) the reduction was of 58.38%. Figure 4 show the graphic of the accumulated number of cases in the original scale and in the log-scale. Note the fast growth of the accumulated number of cases after the 89th day (06/11/20). Our interest is to model the accumulated number of cases using a nonlinear growth model. To facilitate the modeling procedure, we opt to model the accumulated number of cases in the log-scale.

Growth models
In order to model the data described in the earlier section, consider N t be the accumulated number of COVID-19 cases at time t and f (t|θ) a nonlinear function indexed by parameter θ (scalar or vector), for t > 0. Consider that N t = f (t|θ) and Y t = log(N t ) = g(t|θ), where g(t|θ) = log(f (t|θ)), for t > 0.

Exponential growth model
One of the most knows nonlinear models to describe the growth of an epidemiology disease is the exponential model. Its Equation is given by for θ = (α 1 , α 2 ), where α 1 is the number of cases in the initial time, t = 0, and α 2 is the growth rate, for t ≥ 0. For more details on the exponential model, please see Abramowitz (1965), Thomson (2005) and their references.
Taking the logarithmic transformation on both sides of the Equation (1), one gets the following linearized form of the model, called log-exponential model, for t ≥ 0. Figure 15 in Appendix 2 of the SM shows the graphics of the exponential and log-exponential models for an initial value α 1 = 2 and growth rate α 2 = {0.10, 0.20, 0.30}. Increasing the value of α 2 more inclined is the curve, meaning a fast growth in the number of disease cases. In addition, the curve of this model grows indefinitely regardless of the population size. This can be viewed as a practical problem since the number of accumulated cases is restricted, for instance, to the population size.

Logistic growth model
Consider now the Logistic growth model (BLUMBERG, 1968). Its Equation is given by where θ = (α 1 , α 2 , α 3 ) are the model parameters, for t ≥ 0. In the opposite of the exponential model, this model has an S-shape curve and consequently a growth limit. The parameter α 1 is the upper asymptote. In the context of the COVID-19 the value of α 1 is an estimate for the maximum number of cases. The parameter α 2 is related to the coordinates (t m , N tm ) of the inflection point, where t m = log(α2) α3 and N tm = α1 2 . The parameter α 3 is the intrinsic growth rate at the inflection point.
Taking the logarithmic transformation on both sides of the Equation (3), we get the log-logistic model, (4) Figure 16 in Appendix 2 of the SM shows the graphics of the logistic and log-logistic models for a upper asymptote α 1 = 10, 000, α 2 = 5, 000 and α 3 = {0.15, 0.25, 0.55}. Similar to the exponential model, by increasing the value of the parameter α 3 more inclined is the curve. In that Figure, the symbols • represents the inflection point.

Gompertz growth model
As the third growth model, consider the Gompertz model (GOMPERTZ, 1825;WINSOR, 1932). Its Equation is given by where θ = (α 1 , α 2 , α 3 ) are the model parameters, for t > 0. The interpretation of the parameters is similar to the described for the logistic model. The graphic of the Gompertz model is also a curve with an S-shape. The main difference in relation to the Logistic model is that the curve of the Gompertz model is not symmetric in relation to the inflection point. While the ordinate of the inflection point of the Logistic model is α1 2 , for the Gompertz model this ordinate is approximately 37% of the α 1 value.
Taking the logarithmic transformation on both sides of the Equation (5), we get the log-Gompetz model, for t ≥ 0. Figure 17 in Appendix 2 of the SM shows the graphics of the Gompertz and log-Gompertz models for a upper asymptote α 1 = 10, 000, α 2 = 8 and α 3 = {0.10, 0.20, 0.40}. Similar to the exponential and logistic models, by increasing the value of the parameter α 3 more inclined is the curve. In that Figure, the symbols • represents the inflection point.

Piecewise model
Consider now the fitting of a growth model for the accumulated number of COVID-19 cases recorded in the city of Campo Grande. For this, assume that the log-transformed measures, Y t , are generated according to the following model where g(t|θ) is given by one of the Equations in (2), (4) or (6) and ε t is a random error assumed from a normal distribution with mean 0, variance σ 2 and Cov(ε t , ε t ′ ) = 0, for t, t ′ = 1, . . . , n and t ̸ = t ′ . For more details on the normality assumption, please see (VIEIRA and HOFFMANN, 1977;HSIEH, 2017;ZHAO, 2019).
In order to get the parameter estimates, we adopt the nonlinear least square method. For this, we use the software R (R CORE TEAM, 2020) and the command nls of the package nlstools (FLORENT et al., 2015;PINHEIRO et al., 2020). Besides, we compare the models using as a criterion the mean square error (MSE) and the model selection criteria AIC and BIC, which are calculated according to the following expressions whereŶ t is the estimated value by the model, l(θ|y) = log(L(θ|y), in which, L(θ|y) is the maximum value of the likelihood function for the model and k is the number of estimated parameters in the model. The best model is the one that has the smallest MSE, AIC and BIC values.
Defined the dataset, we fit the three growth models described in Section 2. Table 2 shows the MSE, AIC and BIC values for the three fitted growth models.
The smallest values are highlighted in bold. As one can note, the log-Gompertz model is the best model.  Table 3 shows the estimates and the standard errors for parameters of the log-Gompertz model. The fitted model is given bŷ  However, as can be viewed in Figure 6, the recorded number of cases on the ten days after the date 04/12/20 (from 04/13/20 to 04/22/20) has led to an accumulated number of cases (log-transformed) that did not follow the projections done by the fitted model. Due to this, we insert these ten accumulated values into the dataset D 1 and update the model.   (77 cases). This result made us to conjecture that a new period with a different growth rate from the first 30 days could be beginning.
In order to empirically verify our conjecture, we registered the accumulated values in the next 10 days after the date 04/22/20 (from 04/23/20 to 05/02/20) and plot these values in the same cartesian plan of the updated model, as shown by Figure 7(b). As one can note, our conjecture is very plausible. Due to this, hereafter we adopt the fit of a piecewise growth model. The information on the beginning of a new period of the pandemic's growth, with a more aggressive growth rate, was described in a technical report and repassed to HSC on 05/02/20.

Model fitting 2
Let now the recorded number of cases in the first 60 days since the first case (from 03/14/20 to 05/12/20) in an accumulated way and log-transformed. Consider the separation of the this dataset into two sub-datasets as follows: D 1 = {y 0 , . . . , y 28 } and D 2 = {y 29 , . . . , y 59 }. The choice o the 28th day to separate the two sub-datasets was based on the criterion of smallest MSE among ten fitted models. Please, see Appendix 3 of the SM for more details.
Defined the two sub-datasets, we fit the three growth model for each one of the sub-datasets. Table 4 shows the MSE, AIC and BIC values for the three fitted growth models for D 1 and D 2 . The smallest values are highlighted in bold. For both datasets, the Log-Gompertz is the best model.  Table 5 shows the estimates and the standard errors for parameters of the fitted Log-Gompertz model for D 1 and D 2 . The fitted model is given by the following piecewise model 3.9738 − 3.4417 exp{−0.1365t} , for 0 ≤ t < 29; 5.2288 − 1.3354 exp{−0.0686t} , for t ≥ 29; .  However, as can be viewed in Figure 8(b), the registered values in the 20 days after 05/12/20 indicate the beginning of a third growth period. Again we informe the HSC on the beginning of a third growth period of the pandemic with a more aggressive growth rate. Then, we consider the following model fitting 3.
The fitted model for D 1 remains as describe in Equation (7). For the sub-dataset D 2 , we fit the three growth models and again the log-Gompertz model presents the smaller MSE, AIC and BIC values. For the sub-dataset D 3 was not possible to fit the log-logistic and the log-Gompertz models due to the non-convergence of the estimation algorithm. Thus, a log-exponential model was fitted to the sub-dataset D 3 . Table 6 shows the estimates and the standard errors for parameters of the fitted log-Gompertz model for D 2 and log-exponential model to D 3 . .
The MSE of this fitted piecewise model is 0.0093. Figure 9(a) shows the graphic of the recorded values and the fitted model for a period of 139 days (from 03/14/20 to 07/31/20). Figure 9(b) shows the graphic of the fitted model and the accumulated values registered in the period from 06/11/20 to 07/31/20. As one can note, the registered values in this period do not follow the projection of the fitted model. Following the same procedure, we inform the HSC the beginning of the fourth growth period of the pandemic with a more aggressive growth rate. Then, we consider the following model fitting 4.  The fitted models for D 1 , D 2 and D 3 remains as describe in Equation (8). We fit the three growth models for sub-dataset D 4 . The Log-Gompertz model presents the smaller MSE, AIC and BIC values. Thus, the piecewise model has the following configuration: Log-Gompertz for D 1 , Log-Gompertz for D 2 , Log-exponential for D 3 and Log-Gompertz for D 4 .
After the date 08/15/31, we update the model more five times, on 09/15/20, 09/30/20, 10/15/20, 10/31/20 and on 11/08/20. Figure 12 shows the curves of Figure 11 and the curve of the fitted model on the 239th day (11/08/20). As one can note, there was a significant flattening between the fitted model on 07/31/20 and the fitted model on 08/15/20. However, models fitted after 08/15/20 presented smaller flattening. Due to this, in order to maintain a good visualization of graphs, we opt to plot only the curves of these three fitted models. This last updated model has the following piecewise Equationŝ 3.9738 − 3.4417 exp{−0.1365t} , for 0 ≤ t < 29; 5.2842 − 1.3905 exp{−0.0637t} , for 29 ≤ t < 61; 5.2647 + 0.0350t , for 61 ≤ t < 89; 10.7074 − 4.6035 exp{−0.0222t} , for t ≥ 89; . (9) The peak is estimated to the 158th day (08/19/20). The projection for the maximum number of cases is 44, 685. Appendix 4 present a discussion on the residuals analysis of the fitted model. The information on the flattening of the curve was described in a technical report and repassed to HSC on 08/12/20.

Projections
In addition to estimates for the peak of the pandemic, the fitted model allows to project the number of cases that will be registered in next days. For example, Table 7 shows the projections for the seven days after the date 11/08/20. This Table also shows the registered values, the error (absolute) and the percent error (absolute) of the projections in relation to the real values. The biggest percentage error was 1.4507%. Another interest is to project the number of individuals who will need care in clinical care units and in intensive care units. In order to get these projections, we consider the projected values by the fitted model and a moving-sum procedure, in which, the number of clinical units occupied on a day d 1 is considered available after 5 days. For the intensive care units, a unit is considered available after 14 days. In addition, we consider that 10% of the projected values by the fitted model will needs care in clinical units and 3% in intensive units. Both percentual values were fixed according to the percentage observed in the data. In addition, the projections show that the peak for the number of patients who will need care in clinical and intensive care units has already happened. The peak for the number of patients who would need care in clinical units was estimated for the 169th day (08/30/20). And the peak for the number of patients who would need care in intensive care units was estimated for the 166th day (08/27/20). Since the health public system of Campo Grande city has available 341 clinical units and 157 intensive care units, these results indicated no evidence for the collapse of the public health system. These pieces of information were sent to HDC on 09/11/20.

Final Remarks
In this paper, we describe a study on the evolution of the COVID-19 pandemic in the city of Campo Grande, MS, Brazil. The aim of this study was to find out a non-linear growth model for the accumulated number of COVID-19 cases. However, in the course of the analysis, the data have presented temporal heterogeneity and due to this, we adopt the fitting of a piecewise growth model. The main advantage of this procedure is that it allowed us to obtain a model with the smallest MSE among the tested models.
The modeling procedure was developed by separating the accumulated number of cases registered in a period of 240 days into four sub-datasets. Then, we fitted the three non-linear growth models for each sub-dataset: Exponential, Logistic and Gompertz. In order to select the best model for each sub-dataset, we consider as criteria the MSE, the AIC and BIC. The fitted model has the configuration: Gompertz, Gompertz, Exponential and Gompertz. The resulting piecewise model project the peak of the pandemic for the 158th day (08/19/20). In addition, the projections do not show evidence for the collapse of the public health system of the city.
From a practical viewpoint, these results are very important since the prediction for the peak of the pandemic and projections that indicates the possibility or not for the collapse of the public health system may help government-agents to make a decision in order to try to reduce the transmission of the disease and to avoid the collapse of the health public system. Besides, the results may be used by public agents to show the population the importance to follow the recommendations of specialists from the health field and to maintain social isolation whenever possible.
In the course of these 240 days, were sent to HSC 21 technical reports describing the main results and the projections for the peak and possibility of the collapse of the clinical and intensive care units. In addition, the results of this study were used by many media to inform the population of Campo Grande on the evolution of the pandemic and to emphasize the importance of the collaboration of the population to reduce the transmission.
Although the article does not present any innovative mathematical and/or statistical result, the results showed to be very important for government agents of the city of Campo Grande, especially to justify actions to combat the proliferation of the disease. In addition, due to the need for a quick response, the use of consolidated models and well described in the literature proved to be the best alternative for the moment; since the development and validation of new models, would require a more long period of research. The computational codes used for fitting the models are in the R language and can be obtained by e-mail to the authors.  Figure 14 shows the moving-average of seven days for the number of death. In the last four days, there was a reduction in the moving-average value. On 11/08/20 the moving-average value was 2 death. Compared to the value of seven days ago (2.14 on 11/01/20), there was a reduction of 7%. Compared to fourteen days ago (3.86 on 10/25/20) the reduction was 58.38%.

Appendix 2: Growth models
In this appendix, we present the graphics of the growth models described in Section 2 of the paper. Figure 15 shows the graphics of the exponential and log-exponential models. Increasing the value of the α 2 more inclined is the curve. As a characteristic, the graphic of this model shows unlimited growth.   Figure 16 shows the graphics of the Logistic and log-Logistic models. As a characteristic, the graphic of this model has an S-shape. Similar to the exponential model, by increasing the value of the parameter α 3 more inclined is the curve. In that Figure Figure 17 shows the graphics of the Gompertz and log-Gompertz models. As a characteristic, the graphic of this model has an S-shape. Increasing the value of the parameter α 3 more inclined is the curve. In that Figure

Appendix 3: Sub-datasets
In order to get a point d that best separates the sub-datasets D 1 and D 2 , we adopt the following procedure. Let G = {25, . . . , 34} be a grid from 25 to 34 with increments of size 1. Then, we define the following 10 scenery: D 1 = {y 0 , . . . , y d } and D 2 {y d+1 , . . . , y 59 }, for d ∈ G.
For each one of the 10 scenery, we fit the three growth models to sub-datasets D 1 and D 2 . We select the best model by using as a criterion the AIC and BIC. For all 10 scenery considered, the fitted model has the following configuration: Gompertz for D 1 and Gompertz for D 2 . Table 8 shows the MSE values from ten fitted models. The point d = 28 has lead to a fitted model with the smallest MSE value. Due to this, we set up d = 28 as the separation point for D 1 and D 2 . To separate sub-datasets D 2 and D 3 , we follow the same procedure described above for d ∈ G = {55, . . . , 64}. As can be seen, the point d = 60 has lead to a fitted model with the smallest MSE value. As shown by Figure 9(b) there is a clear separation on the day 88. Due to this, we set d = 88 for separating D 3 and D 4 .

Appendix 4: Residuals Analysis
After fitting the model, it is necessary to verify that the assumptions made are satisfied. In order to verify the normality assumption, we adopt the graphical method based on the qq-plot. This method consists of the plot the percentiles of the sample against the percentiles expected by the adjustment of a normal distribution. Figure 18(a) shows the qq-plot graphic for the residuals of the fitted model (see Equation 9). We also apply the Shapiro normality test, getting a p-value equal to 0.0959. That is, for a significance level {0.01, 0.05} the normality assumption is not rejected. Figure 18(b) shows the graphic of the residuals against order of the data. As one can note, there is no reason to doubt the independence and homoscedasticity assumptions.