GENERALIZED GROWTH CURVE MODEL FOR COVID-19 IN BRAZILIAN STATES

The present paper uses the Chapman-Richard generalized growth model to relate the number of people infected by COVID-19 to the number of days elapsed. The objective of this work is to estimate the instant at which the number of infected people stops growing, using the dataset of the accumulated number of infected people. For this purpose, we conducted a comparative study of the performance of three Richard models in eight Brazilian States. In the methodological context, the Gauss-Newton procedure was used to estimate the parameters. In addition, model selection criteria were used to select the model that best fits the dataset. The methodology allowed consistent estimates of the number of people infected by COVID-19 as a function of time and, consequently, it was possible to conclude that the projections provided by the growth curves point to a scenario of general acceleration of contamination. Moreover, the models predict that the epidemic is close to reaching its peak in Amazonas, Ceará, Maranhão, Pernambuco, and São Paulo States.


Introduction
In December 2019, an outbreak of pneumonia caused by a new coronavirus occurred in Wuhan, Hubei province, and spread rapidly around the world. The pathogen has since been identified as a zoonotic coronavirus, similar to the SARS and MERS coronaviruses, and the disease was named COVID-19 (LIU et al., 2020). Due to the high epidemic spread of the virus in a short period of time, on January 30th, 2020, the World Health Organization (WHO) declared the outbreak of SARS-CoV-2 a public health emergency of international concern (ZHENG et al., 2020).
In general, COVID-19 is an acute disease with symptoms of severe infection that is possibly curable, but it can also be deadly, with a 2% case fatality rate (XU et al., 2020). In Brazil, the mortality rate from the virus is reported at 4.2%.
In this sense, this epidemiological study proposes a new approach to model the evolution of the number of people infected with COVID-19, and the maximum number of people infected, through the Chapman-Richard non-linear model, considering eight Brazilian States. The outcome variable is the cumulative number of infected people up to 88 days after the appearance of the first case of the disease.
When we consider accumulated processes, the objective is to find sigmoidal functions that best fit the curves. It is therefore appropriate to consider a function P(t) with a sigmoid form, ideally originating at P(0), with a point of inflection occurring early, in the first third of the stage, before approaching a maximum value, and with an asymptote that represents the end of the growth process. For more severe pandemics, this change in curvature can occur closer to the peak point.
Due to the heterogeneity of the Brazilian population, we decided to build the models by State. However, it is known that there is a high level of underreporting of cases in Brazil, due to the unavailability of tests, which can cause problems in the estimation of the parameters and in the predictions of these models.
In this paper, the Gauss-Newton estimation procedure was used to obtain the estimates and inference for the parameters of the Chapman-Richard growth model, in which the value of the form parameter m was fixed at 0.5, 1.0, 1.5 and 2.0. In addition, model selection criteria were used to select the model that best fits the dataset.
It is important to note that, in these models, it was not possible to incorporate some realistic factors, such as the isolation rate and disease incubation, among others. The statistical analysis was developed with data available at https://covid.saude.gov.br/ until May 23rd, 2020. All analyses were performed using the R programming environment (R CORE TEAM, 2009), whose free version can be found at www.r-project.org/.
This paper is organized as follows. Section 2 defines the estimation methods and the criteria for selecting models. Applications to real data on the accumulated number of cases in Brazilian States are illustrated in Section 3. Finally, Section 4 draws some conclusions on the relevant results achieved.

Estimation methods
To model the growth curve of the number of people infected by COVID-19, we opted for the Chapman-Richard non-linear model in its original and modified forms. The model is a well-known generalization of the logistic and Gompertz models. Its main advantage is that it generally provides a more realistic description of various phenomena; however, it is little explored because it presents several difficulties in the non-linear estimation procedure (AMARAL, 2009).
The Chapman-Richard nonlinear model is described by equation (1), where P(t) represents the average number of infected individuals; t is the number of days; a is the parameter that represents the asymptotic value of the model, i.e. the stabilization point of the growth in cases, free of seasonal variations; c is a constant without biological interpretation that is related to the initial cases of COVID-19; k is the growth rate of the variable of interest, which determines the efficiency of the disease's growth; and m is the parameter that defines the shape of the model's curve and, consequently, the point at which it starts to grow less efficiently. For COVID-19, the parameters k and c are determined by performing several fittings on the growth-rate trends of the infection capacity of viruses that mainly affect the respiratory system (SONNINO, 2020). Figure 1 shows the influence on the growth of the curve of different values of the parameters a and k, respectively. To construct these curves, the asymptote takes the values a = 350, 370, 390, 410, 430 and 450 and the growth rate takes the values k = 0.030, 0.035, 0.040, 0.045 and 0.050.
It is worth mentioning that the Von Bertalanffy and Brody models are special cases of the Chapman-Richard model for the shape parameter values m = 3 and m = 1, respectively. Several authors have reported difficulties in fitting the Richards function (BATES & WATTS, 1988; BROWN et al., 1976). SARMENTO et al. (2006) reported that, in applications using this model, convergence of the iterative process was not reached in approximately 50% of the fit attempts. This difficulty was attributed to the need to estimate one more parameter with this model, and mainly to the negative correlation between c and m. In this context, we fit this model by fixing the parameter m at 0.5, 1.0, 1.5 and 2.0; for the other parameters, adequate initial values were found, with which the iterative procedure converged.
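As an illustration of how a and k shape the curve, the sketch below evaluates a Chapman-Richard curve for the values of a and k quoted for Figure 1. It assumes the parameterisation P(t) = a(1 − c·exp(−kt))^m, chosen because it reduces to the Brody form at m = 1 and the Von Bertalanffy form at m = 3 as stated above; the paper's own equation may be written differently. The function names and the values c = 0.99 and m = 2.0 are hypothetical choices made only for illustration.

```python
import math

def chapman_richard(t, a, c, k, m):
    """Assumed form P(t) = a * (1 - c*exp(-k*t))**m (not confirmed by the source)."""
    return a * (1.0 - c * math.exp(-k * t)) ** m

# Asymptote and growth-rate values quoted in the text for Figure 1.
A_VALUES = [350, 370, 390, 410, 430, 450]
K_VALUES = [0.030, 0.035, 0.040, 0.045, 0.050]

def curve(a, k, c=0.99, m=2.0, days=range(0, 301, 50)):
    # c and m are hypothetical illustration values, not estimates from the paper.
    return [chapman_richard(t, a, c, k, m) for t in days]

# Larger k reaches the same asymptote sooner; larger a raises the plateau.
for k in K_VALUES:
    print("k =", k, [round(p, 1) for p in curve(390, k)])
for a in A_VALUES:
    print("a =", a, [round(p, 1) for p in curve(a, 0.040)])
```

Under this assumed form, changing a rescales the whole curve vertically, while changing k compresses or stretches it along the time axis.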
In many practical applications, a linearization procedure is adopted by applying the logarithmic transformation to the model, Y(t) = ln(P(t)). In this way, the general model can be written as in equation (2), where β is a vector of parameters, β = (β_0, β_1, β_2, β_3)^T = (ln(a), b, c, m)^T, and e_t is an independent random error, normally distributed N(0, τ^(-1)), with variance σ² > 0 and τ = 1/σ² (FEKEDULEGN & MAC SIÚRTÁIN, 1999). The transformation of a non-linear model into a model that is linear in the parameters facilitates the fitting process; however, it implies unrealistic assumptions such as normality and homoscedasticity of the errors. For the general model described in (1), taking the natural logarithm leads to the model given in equation (3). The use of transformations to linearize non-linear models becomes more critical when the estimates obtained from the linearized models refer to the transformed parameters and not to the original parameters of the model. In these cases, information about the standard errors of the original parameters is lost, which makes inference about the original parameters very difficult and, in general, makes it impossible to calculate confidence intervals and test hypotheses about them. For a broader treatment of this model, see MYERS (1990).

Gauss-Newton method
The Gauss-Newton method, also known as the linearization method, is a particular case of the weighted least squares method. It uses a Taylor series expansion to approximate the nonlinear regression model with linear terms and then applies ordinary least squares to estimate the parameters. Iterating these steps generally leads to a solution of the nonlinear regression problem.
The starting point of the Gauss-Newton method is to find initial values for the parameters β_0, β_1, ..., β_p, denoted by β^(0). These initial values can be obtained from previous studies in the literature (AMARAL, 2009). With the initial values of the parameters, the expected function h(t, β) is approximated by the linear terms of its Taylor series expansion around the initial values β^(0), which gives the expansion in equation (4) for the i-th case. Note that the derivatives of h(t_i, β) are evaluated at β_j = β_j^(0), and β_j − β_j^(0) represents the difference between the true regression parameters and their initial estimates. Thus, these differences represent a correction that must be made to the initial regression coefficients.
Passing h(t_i; β^(0)) to the left side and denoting γ_j = β_j − β_j^(0), equation (4) can be rewritten as equation (5), where y is a vector with y_i = h(t_i; β). Therefore, the parameters β can be estimated by ordinary least squares: an initial value of γ = (γ_1, ..., γ_p)^T is obtained from equation (5), providing a first estimate for the parameter vector β = (β_1, ..., β_p)^T. Formally, the vector of estimates at the s-th iteration, defined by β^(s) = (β_1^(s), ..., β_p^(s))^T, is given by β^(s) = β^(s−1) + γ^(s−1). At this point, one verifies whether the corrected regression coefficients represent an improvement in the appropriate direction. Any inference about the parameter estimates from the Gauss-Newton algorithm is based on the asymptotic covariance matrix of the regression, given by σ²(W^T W)^(-1), where σ² is estimated by the residual mean square. We denote by SS_R^(s) the sum of the squares of the residuals at iteration s, which is updated until the last iteration.
If the Gauss-Newton algorithm is moving in the right direction, SS_R^(s) must be less than SS_R^(s−1). In this case, the process is repeated until the desired convergence is verified. It is worth mentioning that the choice of initial estimates in the Gauss-Newton method is very important, as a bad choice can result in a very large number of iterations and, often, in failure to converge.
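For reference, the steps described in this subsection correspond to the standard Gauss-Newton relations sketched below. This is a reconstruction in conventional notation, not a copy of the paper's numbered equations: here y_i denotes the observed response, W is the matrix of derivatives w_ij evaluated at the current estimate, and the residual vector r and the divisor n − p for the residual mean square are assumptions introduced for this sketch.

```latex
\begin{align*}
h(t_i;\beta) &\approx h(t_i;\beta^{(0)})
  + \sum_{j=1}^{p} w_{ij}\,(\beta_j-\beta_j^{(0)}),
  \qquad w_{ij} = \left.\frac{\partial h(t_i;\beta)}{\partial \beta_j}\right|_{\beta=\beta^{(0)}} \\
y_i - h(t_i;\beta^{(0)}) &\approx \sum_{j=1}^{p} w_{ij}\,\gamma_j,
  \qquad \gamma_j = \beta_j-\beta_j^{(0)} \\
\gamma^{(s-1)} &= \left(W^{T}W\right)^{-1}W^{T}r^{(s-1)},
  \qquad r_i^{(s-1)} = y_i - h(t_i;\beta^{(s-1)}) \\
\beta^{(s)} &= \beta^{(s-1)} + \gamma^{(s-1)} \\
SS_R^{(s)} &= \sum_{i=1}^{n}\left[y_i - h(t_i;\beta^{(s)})\right]^{2},
  \qquad \hat{\sigma}^{2} = \frac{SS_R^{(s)}}{n-p}
\end{align*}
```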
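The iteration described above can be sketched as follows. This is a minimal illustration, not the authors' R implementation: it assumes the curve P(t) = a(1 − c·exp(−kt))^m with m fixed at 2.0, approximates the derivatives by central differences instead of using closed-form expressions, and fits synthetic noise-free data generated from hypothetical parameter values. All function names are introduced here for illustration.

```python
import math

def model(t, beta, m=2.0):
    # Assumed curve a*(1 - c*exp(-k*t))**m with the shape parameter m held fixed.
    a, c, k = beta
    return a * (1.0 - c * math.exp(-k * t)) ** m

def jacobian(f, t_values, beta, rel_step=1e-6):
    """Central-difference approximation of W, with w_ij = d f(t_i) / d beta_j."""
    W = []
    for t in t_values:
        row = []
        for j, b in enumerate(beta):
            h = rel_step * max(abs(b), 1e-8)
            up = list(beta); up[j] = b + h
            dn = list(beta); dn[j] = b - h
            row.append((f(t, up) - f(t, dn)) / (2.0 * h))
        W.append(row)
    return W

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for cc in range(col, n + 1):
                M[r][cc] -= factor * M[col][cc]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def sum_sq(f, t_values, y, beta):
    return sum((yi - f(t, beta)) ** 2 for t, yi in zip(t_values, y))

def gauss_newton(f, t_values, y, beta0, tol=1e-10, max_iter=50):
    beta = list(beta0)
    p = len(beta)
    for _ in range(max_iter):
        W = jacobian(f, t_values, beta)
        r = [yi - f(t, beta) for t, yi in zip(t_values, y)]
        # Normal equations (W^T W) gamma = W^T r give the correction gamma.
        WtW = [[sum(row[i] * row[j] for row in W) for j in range(p)] for i in range(p)]
        Wtr = [sum(row[i] * ri for row, ri in zip(W, r)) for i in range(p)]
        gamma = solve(WtW, Wtr)
        beta = [b + g for b, g in zip(beta, gamma)]
        if max(abs(g) / max(abs(b), 1e-12) for g, b in zip(gamma, beta)) < tol:
            break
    return beta

# Synthetic, noise-free data from hypothetical parameters (a, c, k).
t_obs = list(range(89))
true_beta = [400.0, 0.98, 0.05]
y_obs = [model(t, true_beta) for t in t_obs]
fit = gauss_newton(model, t_obs, y_obs, [390.0, 0.975, 0.051])
print([round(b, 4) for b in fit])
```

The starting values are deliberately close to the generating ones; as noted in the text, a poor starting point can prevent a plain Gauss-Newton iteration from converging.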

The use of fractional increments
A drawback of the Gauss-Newton procedure is that, in some practical problems, the increment γ can be very small, causing very slow convergence. In more severe situations, the algorithm may move in the wrong direction and fail to converge. To overcome these difficulties, MYERS (1990) presents a Gauss-Newton algorithm with fractional increments, whose final step is to continue on to the next iteration using β^(s).
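One common way to realise fractional increments is step halving: the full increment γ is tried first and, if the residual sum of squares does not decrease, the increment is halved until it does. The sketch below shows that idea on a toy one-parameter problem. Whether it matches the algorithm of MYERS (1990) step by step is an assumption, and the function name `fractional_step` and the halving limit are hypothetical.

```python
def fractional_step(ss, beta, gamma, max_halvings=20):
    """Return (new_beta, fraction) where new_beta = beta + fraction * gamma.

    The fraction starts at 1 and is halved until the residual sum of squares
    `ss` is smaller than at the current estimate. If no fraction helps, the
    current estimate is returned unchanged with fraction 0.
    """
    ss_old = ss(beta)
    fraction = 1.0
    for _ in range(max_halvings):
        candidate = [b + fraction * g for b, g in zip(beta, gamma)]
        if ss(candidate) < ss_old:
            return candidate, fraction
        fraction /= 2.0
    return list(beta), 0.0

# Toy example: the full step overshoots the minimum at beta = 1.
toy_ss = lambda b: (b[0] - 1.0) ** 2
new_beta, used = fractional_step(toy_ss, [0.0], [3.0])
print(new_beta, used)
```

In a full fit, `ss` would be the residual sum of squares of the growth model and `gamma` the Gauss-Newton increment of the current iteration.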

Practical aspects
To implement the Gauss-Newton algorithm, we consider the model and the Taylor series expansion given in equations (3) and (4) to calculate the elements w_ij (i = 1, 2, ..., n and j = 1, 2, 3, 4) of the matrix W. Table 1 presents the elements w_ij of the matrix W for the Chapman-Richards model.
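To make the role of the elements w_ij concrete, the sketch below writes out the partial derivatives of a log-transformed curve with respect to its four parameters and checks them against finite differences. It assumes the form Y(t) = ln(a) + m·ln(1 − c·exp(−kt)) with the parameter ordering (ln a, c, k, m); both the form and the ordering are assumptions, so the expressions may differ from those listed in Table 1.

```python
import math

def log_model(t, beta):
    # Assumed log-scale curve: ln(a) + m * ln(1 - c*exp(-k*t)).
    ln_a, c, k, m = beta
    return ln_a + m * math.log(1.0 - c * math.exp(-k * t))

def w_row(t, beta):
    """Analytic derivatives of log_model with respect to (ln a, c, k, m)."""
    ln_a, c, k, m = beta
    e = math.exp(-k * t)
    u = 1.0 - c * e
    return [1.0, -m * e / u, m * c * t * e / u, math.log(u)]

def w_row_numeric(t, beta, h=1e-6):
    """Central-difference check of the same derivatives."""
    row = []
    for j in range(len(beta)):
        up = list(beta); up[j] += h
        dn = list(beta); dn[j] -= h
        row.append((log_model(t, up) - log_model(t, dn)) / (2.0 * h))
    return row

# Hypothetical parameter values used only to exercise the formulas.
beta_demo = [math.log(400.0), 0.9, 0.05, 2.0]
for t in (1, 10, 50):
    print(t, [round(w, 5) for w in w_row(t, beta_demo)])
```

Stacking one such row per observation day gives an n × 4 matrix W of the kind used in the Gauss-Newton update.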

Model selection
The choice of criteria to select the growth model that best fits the dataset is an important step in the proposed methodology. Such criteria become necessary when several models are fitted to the same dataset, because the model adopted must be the one that best predicts the dependent variable according to the biological reality studied (AMARAL & PADOVANI, 2020). To illustrate the phenomenon studied, we adopt two statistical indicators, the Akaike information criterion (AIC) (AKAIKE, 1974) and the Schwarz Bayesian criterion (SBC) (SCHWARZ, 1978), which, under the hypothesis of normality and independence of the residuals, are given respectively by AIC = n[ln(2π SS_R^(s)) + 1] + 2(p + 1) and the corresponding SBC expression, where n is the sample size and p is the number of free parameters.
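The two criteria can be computed from the residual sum of squares as in the sketch below. It uses the usual Gaussian-likelihood form with SS_R/n inside the logarithm and the penalty (p + 1)·ln(n) for the SBC; both are assumptions about the intended formulas. For models fitted to the same n observations, dividing by n only shifts every score by the same constant, so the ranking of the models is unaffected. The residual sums of squares in the example are hypothetical.

```python
import math

def aic(ss_r, n, p):
    """Akaike criterion for a Gaussian model with p free mean parameters."""
    return n * (math.log(2.0 * math.pi * ss_r / n) + 1.0) + 2.0 * (p + 1)

def sbc(ss_r, n, p):
    """Schwarz criterion: same fit term, penalty (p + 1) * ln(n)."""
    return n * (math.log(2.0 * math.pi * ss_r / n) + 1.0) + (p + 1) * math.log(n)

# Hypothetical residual sums of squares for the four fixed values of m.
n_obs, n_par = 88, 3
ss_by_m = {0.5: 950.0, 1.0: 610.0, 1.5: 420.0, 2.0: 380.0}
scores = {m: (aic(ss, n_obs, n_par), sbc(ss, n_obs, n_par)) for m, ss in ss_by_m.items()}
best_m = min(scores, key=lambda m: scores[m][0])
print(best_m, scores[best_m])
```

Because every candidate here has the same number of free parameters, both criteria select the model with the smallest residual sum of squares.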

Real data analysis
From the data available at https://covid.saude.gov.br until May 23rd, 2020, we present the results of the model fitting for the accumulated number of COVID-19 cases in Amazonas, Bahia, Ceará, Maranhão, Pará, Pernambuco, Rio de Janeiro, and São Paulo States. The purpose of this section is to determine the parameter estimates and compare the efficiency of the Chapman-Richard model in estimating the number of contagions considering different values of the parameter m. In addition, we provide long-term predictions, such as the estimated peak date for each State, and predict when the pandemic will reach contagion stability, in order to contribute to the conduct of intervention policies relevant to the control of the epidemic.
To describe the growth of the pandemic in the eight Brazilian States, Table 2 gives an overview of the number of infections with the new coronavirus until May 23rd, 2020. It shows that the first case of the new coronavirus in Brazil occurred in São Paulo State on February 26th and that, since then, the number of cases has been expanding overwhelmingly across the country. According to the survey, most of the cases are concentrated in São Paulo State, with 80,558 infected, followed by Ceará State, which, despite presenting its first case of the disease in the twelfth epidemiological week, is already the State with the second-highest number of infections (35,122), and Rio de Janeiro State, which registers 34,533 cases. Bahia State, although the disease started to spread there in the tenth epidemiological week, is among the States with the lowest incidence of infection. The description of the data thus corroborates the choice made in the methodology to use the accumulated trajectory of the pandemic in the non-linear growth model to predict the data. To fit the model, we used the population of each State, presented in Table 2, to calculate the daily accumulated number of infected people P(t) per hundred thousand inhabitants.
Continuing the analysis, the estimates and 95% confidence intervals of the parameters of the Chapman-Richard model for the eight Brazilian States are presented in Table 3. In fitting the models, we fixed the shape parameter m at 0.5, 1.0, 1.5 and 2.0; for the other parameters, we provided initial values with which convergence was reached.
For Bahia State, it is worth mentioning that, in the results presented in Table 3, the amplitude of the confidence interval for parameter a (the asymptote of the curve of infected people) is large, and classical hypothesis tests would not find this parameter statistically significant. A more detailed analysis of the data for this State showed that between May 18th and May 22nd the cumulative number of cases varied by more than 50%, rising from 8,581 to 13,000 cases. This variation in a short period caused a high variance for the estimator of the parameter a. The model fitted to the data until May 18th presents a significant estimate for this parameter (â = 305.297 and s.e. = 47.681); however, we chose to consider the model fitted to the data until May 23rd because it reflects the current situation.
The selection criteria for the fitted models are presented in Table 4. The results indicate that the AIC and SBC criteria favored the fitted Chapman-Richard model with the form parameter m = 2.0 for all States except Amazonas, for which the selected model had m = 1.0. Accordingly, the analyses were performed with the selected models. Next, we interpret the fitted model for the data of each State.
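The normalisation per hundred thousand inhabitants, and the reverse conversion used later to express an asymptote as a total number of cases, amount to the arithmetic sketched below. The function names are hypothetical, and the population and case counts in the example are round illustrative numbers, not values from Table 2.

```python
def per_100k(cases, population):
    """Accumulated cases expressed per hundred thousand inhabitants."""
    return cases / population * 100_000

def total_cases(rate_per_100k, population):
    """Convert a rate per hundred thousand back to an absolute count."""
    return rate_per_100k * population / 100_000

# Hypothetical State with one million inhabitants and 2,500 accumulated cases.
rate = per_100k(2_500, 1_000_000)
print(rate, total_cases(rate, 1_000_000))
```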

Results for Amazonas State
In Figure 2, we present the number of new daily cases and the number of accumulated cases. In graph 2a, the red line represents the expected number of new daily cases obtained with the derivative of the fitted model. In graph 2b, the red line represents the expected number of accumulated cases. The quality of the fit was assessed with the root mean square error (RMSE = 6.081) and the coefficient of determination r² = 0.998. We note in this Figure that Amazonas State is close to the peak of the pandemic, and that the cumulative number of cases has already exceeded the first inflection point of the curve for new daily cases, indicating a decrease in the growth of the number of new cases. This behavior should continue until the peak is reached. The fitted model for this State resulted in an estimated growth rate of k = 0.099, which is high, indicating fast growth. Besides, the estimated asymptote was â = 2,200, giving an idea of the expected total number of cases per hundred thousand inhabitants until the end of the pandemic (or 91,181 total cases).
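The two fit-quality measures quoted for each State can be computed as in the sketch below. It divides the squared error by n when computing the RMSE and uses the usual 1 − SS_res/SS_tot definition of the coefficient of determination; the paper may use a different divisor, so both are assumptions. The observed and fitted values are toy numbers.

```python
import math

def rmse(observed, fitted):
    """Square root of the mean squared difference between data and fit."""
    n = len(observed)
    return math.sqrt(sum((o - f) ** 2 for o, f in zip(observed, fitted)) / n)

def r_squared(observed, fitted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

obs = [1.0, 2.0, 3.0, 4.0]   # toy accumulated cases per 100,000
fit_vals = [1.0, 2.0, 3.0, 5.0]   # toy fitted values
print(rmse(obs, fit_vals), r_squared(obs, fit_vals))
```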
In addition, Figure 2 also shows the prediction for the next 150 days (from May 23rd) for the number of new daily cases and the cumulative number of cases (red line exceeding the observed data).

Results for Bahia State
When comparing the graphs of Figure 3 with those of the other States, we see that the progression in the number of cases is slower. Growth is currently rising in the capital; however, in the interior of Bahia, growth is more controlled, which reduces the rate of increase in the number of cases in the entire State. Note that, in Table 3, the value of the parameter k, which determines the growth efficiency of the disease, is intermediate (k = 0.039). This scenario presumably results from public policies imposed by the government to contain the dissemination, which is reflected in the curves and in the distance from the peak of contamination.
Besides, Figure 3 also shows the prediction for the next 150 days (from May 23rd) for the number of new daily cases and the cumulative number of cases (red line exceeding the observed data). From the model fitted to the data from Bahia, we have â = 3,258, which corresponds to the expected total number of cases per hundred thousand inhabitants until the end of the pandemic (or 484,564 total cases).

Results for Ceará State
The number of new daily cases and the number of accumulated cases in Ceará State are shown in the graphs in Figure 4. In graph 4a, the red line represents the expected number of new daily cases obtained with the derivative of the fitted model. In graph 4b, the red line represents the expected number of accumulated cases. The quality of the fit was assessed with the root mean square error (RMSE = 4.238) and the coefficient of determination r² = 0.998.
The epidemic curve in Ceará State continues to rise. Note that the cumulative number of cases has exceeded the first inflection point of the daily new cases curve. Its accelerated growth is evident in graph 4b and in the value of the parameter k that determines the growth rate of the disease, which is above 0.05 (Table 3). Figure 4 also presents the prediction for the next 150 days (from May 23rd) for the number of new daily cases and the cumulative number of cases (red line exceeding the observed data). The model fitted to the data of Ceará State gives â = 1,617, which corresponds to the expected total number of cases per hundred thousand inhabitants until the end of the pandemic (or 147,666 total cases).

Results for Maranhão State
In Figure 5, we present the number of new daily cases and the number of accumulated cases in Maranhão State. In graph 5a, the red line represents the expected number of new daily cases obtained with the derivative of the fitted model. In graph 5b, the red line represents the expected number of accumulated cases. The quality of the fit was assessed with the root mean square error (RMSE = 2.045) and the coefficient of determination r² = 0.999. Relative to the beginning of the epidemic, when growth in this State was accelerated, the projection for Maranhão State predicts a decrease in the growth rate of the number of accumulated cases and, consequently, of the number of new daily cases (Figure 5). In this State, the peak is predicted for the second half of May. This delay was possibly due to the express measures restricting the movement of people across the State.
Similarly, the prediction for the next 150 days (from May 23rd) for the number of new daily cases and the cumulative number of cases (red line exceeding the observed data) is also provided in the graphs of Figure 5. Based on the model fitted to the data of Maranhão State, we have â = 1,329, which corresponds to the expected total number of cases per hundred thousand inhabitants until the end of the pandemic (or 55,082 total cases).

Results for Pará State
In Figure 6, we present the number of new daily cases and the number of accumulated cases in Pará State. In graph 6a, the red line represents the expected number of new daily cases obtained with the derivative of the fitted model. In graph 6b, the red line represents the expected number of accumulated cases. The quality of the fit was assessed with the root mean square error (RMSE = 1.903) and the coefficient of determination r² = 0.999.
The graphs in this Figure show that Pará State is far from the epidemic peak. Temporary preventive measures, with the lockdown implemented in ten cities in the State whose average number of infected people was 50% higher than the State average (AGÊNCIA BRASIL, 2020), probably slowed the growth of the State's epidemic curve in recent days. However, the model shows that the growth rate k was still high (k = 0.060) at the date of data collection. Besides, Figure 6 also shows the prediction for the next 150 days (from May 23rd) for the number of new daily cases and the cumulative number of cases (red line exceeding the observed data). From the model fitted to the data of Pará State, we have â = 2,226, which corresponds to the expected total number of cases per hundred thousand inhabitants until the end of the pandemic (or 191,500 total cases).

Results for Pernambuco State
The number of new daily cases and the number of accumulated cases in Pernambuco State are shown in the graphs in Figure 7. In graph 7a, the red line represents the expected number of new daily cases obtained with the derivative of the fitted model. In graph 7b, the red line represents the expected number of accumulated cases. The quality of the fit was assessed with the root mean square error (RMSE = 3.911) and the coefficient of determination r² = 0.997.
For this State, we noticed a high acceleration in the growth of the number of cases; the peak is projected to occur soon and to be followed shortly by a significant reduction in the number of new cases. This is confirmed by the estimate of the asymptotic parameter a, which is approximately â = 735, corresponding to the expected total number of cases per hundred thousand inhabitants until the end of the pandemic (or 70,244 total cases).

Results for Rio de Janeiro State
From the analysis of the graphs presented in Figure 8, an expressive acceleration in the number of contagions is expected. Note that, in Table 3, the asymptotic parameter a of the growth curve was estimated at approximately 2,761, which corresponds to the expected total number of cases per hundred thousand inhabitants until the end of the pandemic (or 476,685 total cases). As the contagion rate is considered to be intermediate (k = 0.036), the time to the end of the pandemic is projected to be long, as shown in Figure 8.
Similarly, Figure 6 also shows the prediction for the next 150 days (from May 23rd) for the number of new daily cases and the cumulative number of cases (red line exceeding the observed data).

Results for São Paulo State
São Paulo State is considered the epicenter of the pandemic in Brazil, where most of the cases are concentrated, leading to the highest infection rate. The number of new daily cases and the number of accumulated cases in São Paulo State are shown in the graphs in Figure 9. In graph 9a, the red line represents the expected number of new daily cases obtained with the derivative of the fitted model. In graph 9b, the red line represents the expected number of accumulated cases. The quality of the fit was assessed with the root mean square error (RMSE = 2.065) and the coefficient of determination r² = 0.998.
The estimate of the growth rate k of the number of cases presented in Table 3 is higher than 0.05, which explains the accelerated growth of the total number of cases. Figure 9 also presents the prediction for the next 150 days (from May 23rd) for the number of new daily cases and the cumulative number of cases (red line exceeding the observed data). The model fitted to the data of São Paulo State gives â = 455, which corresponds to the expected total number of cases per hundred thousand inhabitants until the end of the pandemic (or 208,932 total cases).

Peak and end-of-pandemic prediction
The peak of the pandemic is predicted by calculating the maximum point of the prediction curve for new daily cases (red line in the left-hand graph of Figures 2-9). The end of the pandemic can be predicted by calculating the time at which the cumulative number of cases reaches the expected total number of cases (given by the estimate of parameter a). An alternative is to calculate the time taken to reach 97% or 99% of the expected total number of cases.
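Both quantities can be read off a fitted cumulative curve numerically, as in the sketch below. The peak is taken as the day with the largest one-day increase, and the end as the first day on which the curve reaches a chosen fraction of the asymptote; a fraction such as 97% or 99% is convenient because a sigmoid only approaches its asymptote and never reaches it in finite time. The curve form P(t) = a(1 − c·exp(−kt))^m and the parameter values are hypothetical, and the helper names are introduced here for illustration.

```python
import math

def cumulative(t, a=400.0, c=0.98, k=0.05, m=2.0):
    """Hypothetical fitted curve; the form and the values are assumptions."""
    return a * (1.0 - c * math.exp(-k * t)) ** m

def peak_day(curve, horizon):
    """Day with the largest one-day increase of the cumulative curve."""
    increments = [curve(t + 1) - curve(t) for t in range(horizon)]
    return max(range(horizon), key=lambda t: increments[t])

def day_reaching(curve, target, horizon):
    """First day on which the cumulative curve is at least `target`, else None."""
    for t in range(horizon + 1):
        if curve(t) >= target:
            return t
    return None

peak = peak_day(cumulative, 300)
end_97 = day_reaching(cumulative, 0.97 * 400.0, 300)
end_99 = day_reaching(cumulative, 0.99 * 400.0, 300)
print(peak, end_97, end_99)
```

Because the helpers take the curve as a function, the same code applies to any fitted growth model, whatever its parameterisation.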
In Table 5, we present the predicted date (Date), the number of days (N Days), and the accumulated number of infected cases (N Infected) with the respective standard error (s.e.) for two moments of interest: the peak day and the end date of the pandemic. The standard error associated with the cumulative number of cases was obtained considering the variance of the residual of each model (given by equation 8).
We can see in Table 5 that São Paulo State will reach the peak around 94 days after the notification of the first case (that is, on May 29th, 2020), while the accumulated number of cases is predicted to continue growing more slowly if the growth rate (parameter k) remains at the estimated value. This rate may decrease if social isolation control strategies are adopted. Other States worth mentioning are Amazonas and Pernambuco. For these States, the peak is projected to occur 78 days after the start of the infection. In addition, by the end of the pandemic, these States will have 91,336 and 68,654 cases, respectively. The peaks for Ceará and Maranhão States are forecast for the month of June, while those for Bahia and Rio de Janeiro States are forecast for July.
For Bahia and Rio de Janeiro States, the fitted models were strongly influenced by the high values of the observations on the last days considered in the study. Therefore, an increase in the number of observations is expected to provide better predictions in the future.

Conclusions and final considerations
This paper presented a study of the state of the pandemic affecting the Federative Republic of Brazil. The Gauss-Newton method of non-linear estimation was used to establish models of growth dynamics considering the current situation of eight States in Brazil.
The statistical methodology used to estimate the parameters of the Chapman-Richard model of pandemic growth was complemented with the construction of confidence intervals to describe the full dynamics of the curve of officially infected people. It also provides elements to predict the peak day and the end of the pandemic, giving indications for effective health, social and economic actions.
In this line of study, we found that the projections provided by the growth curves point to a scenario of general acceleration of contamination, with nuances of heterogeneity in the speed of growth in the States considered. It must be considered that this reality is already configured, as São Paulo has the largest State population and the largest number of infected people. Bahia State has the lowest density of infected people (1.754/100,000 inhabitants). For all the States in this study, the peak day is projected to occur soon. In this sense, it is worth highlighting that the growth models obtained are fundamental for a better understanding of this phenomenon and provide effective and consistent support for decision-making and interventions in public health and for planning socio-economic activities for the return to normality.
As for the structures of the analyzed models, the results show that the model with the shape parameter m = 2 was the most efficient in describing the evolution of the pandemic in seven States, as it provided statistically significant estimates and confidence intervals, and its parameters were adequate for the biological interpretation of the data. It is worth noting that the methodology used in the present study is sensitive to intervention strategies and to the control of disease proliferation.
In summary, the models predict that the epidemic is close to reaching its peak in Amazonas, Ceará, Maranhão, Pernambuco and São Paulo States. The forecasts presented were based on a social isolation level of around 45%. If efforts to contain the disease continue according to the different scenarios presented, the heterogeneous behavior shown by the curves in the States considered should remain, and government actions should therefore be carried out on a regional basis.
Finally, as a proposal for future work, other growth models can be adopted and the Bayesian approach can be used to make comparisons with the present study.