Bayesian analysis of the growth of pears in the cultivar Shinseiki using non-linear models

The pear (genus Pyrus) is the third most consumed fruit in Brazil; however, domestic production is low and most demand is met through imports. Brazil has favorable conditions for pear cultivation, and agriculture is one of its main economic sectors, generating a large number of jobs and substantial income for the country. Studies of this fruit can therefore provide information and encourage its production. Non-linear models make it possible to identify growth patterns in traits such as fruit length and diameter, which can help determine the ideal harvest point. The goal of this article was to compare the Logistic and Gompertz non-linear models in describing the growth curves in diameter and length of the Asian pear tree using the Bayesian approach. The results indicated the Logistic model as the best for describing these two variables and provided asymptotic means of 70.0607 mm and 79.706 mm for length and diameter, respectively.


Introduction
The genus Pyrus comprises more than 20 species, and the pear is the third most consumed fruit in Brazil, after apple and peach. Among them, four species with edible fruits stand out in commercial cultivation: Japanese pear (P. pyrifolia Nakai), European pear (P. communis L.) and Chinese pear (P. bretschneideri Rehd. and P. ussuriensis Maxim.) (Jiang et al., 2009; Lombardi et al., 2000).
China is the world's largest pear producer, accounting for 67.3% of world production. The pear is a temperate-climate fruit, so the south of Brazil has favorable soil and climate conditions for its cultivation, growth and productivity, and concentrates about 98% of national production. Although the pear is widely consumed in Brazil, domestic production meets only a small portion of demand, with the remainder being imported mainly from countries such as Argentina and Chile. The cultivated regions of Brazil have environmental conditions similar to those of these countries, in addition to adequate technological and storage infrastructure, which shows their potential to increase productivity (FAO, 2017; Lopes et al., 2011; Pasa et al., 2015; Mello, 2013; IBGE, 2019; Rufato et al., 2021).
In addition to having favorable production conditions, Brazil has agriculture as one of its main economic sectors, responsible for generating a large number of jobs and substantial income for the country. Fruit growing is prominent in this sector and places the country third in the world in the production of dried fruits; reliance on imports may therefore indicate that the sector is less developed than it could be (Kist, 2018). Among the reasons for the low production of pear trees in Brazil are the lack of adequate rootstocks, excessive vegetative growth, management techniques and fertilization (Brunetto et al., 2015b; Brunetto et al., 2015a; Machado et al., 2015; Carra et al., 2021; Rufato et al., 2021).
Other factors, such as the time of harvest, should also be considered in studies intended to provide information and encourage production. The harvest point is often identified subjectively, that is, without a consensual standard, which can cause post-harvest damage if size and quality standards for commercialization are not met (Ribeiro et al., 2017; Cavalini et al., 2006). Some chemical, physical and physiological methods can be used to establish the ideal point. Among these, growth patterns such as fruit length and diameter can help determine the appropriate harvest point, considering whether the fruits can complete their maturation under refrigeration or are harvested ripe and ready for immediate consumption, as is the case with the Asian pear (Nakasu et al., 2007; Fioravanço & Antoniolli, 2016).
The study of growth patterns can be done through non-linear models. The growth of a plant or an animal is an inherently non-linear biological process: it is usually faster in the initial phase, then slows down and tends to stability in the adult phase. Its graphical representation gives rise to a sigmoid curve, which suggests the term growth curve. Non-linear regression models are therefore generally adequate to describe growth curves, as they present parameters with biological interpretation, such as asymptotic length or diameter and growth velocity, which allow a greater understanding of the growth process (Prado et al., 2013).
The literature suggests numerous non-linear regression models to describe growth curves: Logistic, Gompertz, Brody, Von Bertalanffy and Weibull, among others. The non-linear Gompertz and Logistic models were fitted with satisfactory results to describe the growth of green dwarf coconut fruits (Prado et al., 2013), pear fruits (Ribeiro et al., 2017), coffee fruit (Fernandes et al., 2014), Palmer mango fruit (Dias et al., 2014) and pequi fruit (Ribeiro et al., 2018). The other models mentioned above do not receive as much attention in the context of fruit growth, as some of their characteristics may not be attractive for this purpose (Santos et al., 2019; Fernandes et al., 2019).
The parameters of these models can be estimated by several methods, classical or Bayesian. In the parametric Bayesian methodology, the information from the sample data, described by the likelihood function, is combined with the researcher's prior knowledge regarding the parameters, which is described by the prior distribution, thus generating a joint posterior distribution of the parameters using Bayes' theorem. From the posterior distribution, it is possible to obtain statistical summaries of interest for the parameters, such as mean, mode and median, as well as intervals that contain the true value of the parameter with a given probability (Guedes et al., 2005; Paulino et al., 2003).
The aim of this paper is to compare the fitted non-linear Logistic and Gompertz models in the description of the diameter and length growth curves of the Asian pear tree (Shinseiki cultivar) through the Bayesian approach.

Data set
The experiment was carried out in a commercial orchard in the city of Canguçu, in the state of Rio Grande do Sul, Brazil, from September 15th, 1997 to February 3rd, 1998. Ten plants, with five flowers per plant, were marked on branches in the external and median regions of the pear trees. At 14-day intervals, 20 fruits were collected from the 10 marked plants (two per plant), and their diameter (mm) and length (mm) were measured using a digital caliper. In addition to these variables, other physicochemical growth measurements were also taken in the laboratory: fresh matter weight, dry matter weight, pulp firmness, total soluble solids and total titratable acidity. For the variables diameter and length, considered in this study, 11 evaluations were performed in total.

Non-linear models
The nonlinear regression model, according to Seber & Wild (2003), can be defined as follows:

$$y_i = f(x_i, \boldsymbol{\beta}) + \epsilon_i,$$

where $y_i$ is the response variable; $f(x_i, \boldsymbol{\beta})$ is a nonlinear function of known form that depends on the explanatory variable $x_i$ and the vector of unknown parameters $\boldsymbol{\beta}$; and $\epsilon_i$ is the experimental error, assumed independent and identically distributed according to a normal distribution with mean 0 and variance $\sigma^2$, that is, $\epsilon_i \overset{iid}{\sim} N(0, \sigma^2)$.
To describe the growth of diameter and length, the nonlinear Logistic and Gompertz models were used. In both models, $y_i$ is the length or diameter of the fruit in mm; $x_i$ is the time in days; $\beta_1 > 0$ represents the horizontal asymptote, i.e. the point at which growth in length or diameter stabilizes; $\beta_2 > 0$ is a location parameter with no direct biological interpretation, related to the inflection point of the curve; $\beta_3 \in \mathbb{R}$ is the rate at which the length or diameter reaches its asymptotic value; and $\epsilon_i \overset{iid}{\sim} N(0, \sigma^2)$.
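As an illustration of how the three parameters shape the curves, the sketch below assumes one common parametrization that is consistent with the descriptions above: $\beta_1 / (1 + \beta_2 e^{-\beta_3 x})$ for the Logistic model and $\beta_1 \exp(-\beta_2 e^{-\beta_3 x})$ for the Gompertz model. This choice is an assumption, since other parametrizations exist, and the parameter values are hypothetical.

```python
import math


def logistic(x, b1, b2, b3):
    """Assumed Logistic mean function: b1 / (1 + b2 * exp(-b3 * x))."""
    return b1 / (1.0 + b2 * math.exp(-b3 * x))


def gompertz(x, b1, b2, b3):
    """Assumed Gompertz mean function: b1 * exp(-b2 * exp(-b3 * x))."""
    return b1 * math.exp(-b2 * math.exp(-b3 * x))


def inflection_time(b2, b3):
    """Time of the inflection point, ln(b2) / b3, under both assumed forms."""
    return math.log(b2) / b3


# Hypothetical parameter values, chosen only for illustration.
b1, b2, b3 = 70.0, 8.0, 0.05
t_star = inflection_time(b2, b3)

print(round(t_star, 2))
print(round(logistic(t_star, b1, b2, b3), 2))  # half of the asymptote b1
print(round(gompertz(t_star, b1, b2, b3), 2))  # asymptote b1 divided by e
```

Under these assumed forms, $\beta_2$ fixes where the inflection occurs only in combination with $\beta_3$, which agrees with the remark that it has no direct biological interpretation of its own.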

Bayesian inference
The uncertainty regarding the unknown quantity θ is modeled by the prior distribution f(θ). This distribution represents the knowledge about the possible values of the unknown parameters under study before the data are observed; it should include all plausible values for them and should reflect the knowledge that the researcher has about the parameters under study (Gelman et al., 1995). The likelihood function L(θ) can be seen as the representation of what the data have to tell us about the parameter θ, updating the researcher's knowledge.
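For the regression model with independent normal errors described in the previous section, the likelihood can be written explicitly. The expression below is a sketch that assumes $n$ observations and collects the parameters as $\theta = (\beta_1, \beta_2, \beta_3, \sigma^2)$; this notation is introduced here for illustration.

```latex
L(\theta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\left\{ -\frac{\left[ y_i - f(x_i, \boldsymbol{\beta}) \right]^2}{2\sigma^2} \right\},
\qquad \theta = (\beta_1, \beta_2, \beta_3, \sigma^2).
```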
The inference about the parameters is performed through the posterior distribution of the parameters, $\pi(\theta \mid \text{Data})$, which is obtained using Bayes' theorem, given the likelihood and a prior distribution, as follows:

$$\pi(\theta \mid \text{Data}) = \frac{L(\theta)\,\pi(\theta)}{\pi(\text{Data})},$$

where $\pi(\theta \mid \text{Data})$ is the joint posterior distribution of the vector of parameters given the data; $L(\theta)$ is the likelihood function; $\pi(\theta)$ is the joint prior distribution for the parameter vector θ; and $\pi(\text{Data})$ is the marginal distribution of y. As $1/\pi(\text{Data})$ does not depend on θ, it acts as a normalizing constant of $\pi(\theta \mid \text{Data})$, and we have:

$$\pi(\theta \mid \text{Data}) \propto L(\theta)\,\pi(\theta).$$

Samples of the joint posterior distribution can be obtained using Markov chains that have the posterior distribution as their limiting distribution, and which can be generated using, for example, the Gibbs sampler (Casella & George, 1992) or the Metropolis-Hastings algorithm (Hastings, 1970). From these samples, a summary can be obtained for each parameter of interest, including the mean of the posterior distribution, which represents the expected value of θ given the data, or the mode, which represents the most likely value.
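To make the sampling idea concrete, the sketch below runs a random-walk Metropolis sampler, a special case of Metropolis-Hastings, on a deliberately simple problem: the mean of normal data with known variance and a normal prior. It is not the sampler used in this study; the data, prior and tuning values are hypothetical and were chosen because the posterior mean is known in closed form and can be checked.

```python
import math
import random


def log_posterior(mu, data, sigma=1.0, prior_sd=10.0):
    """Unnormalized log posterior: normal likelihood (known sigma) plus normal prior."""
    log_lik = sum(-0.5 * ((y - mu) / sigma) ** 2 for y in data)
    log_prior = -0.5 * (mu / prior_sd) ** 2
    return log_lik + log_prior


def metropolis(data, n_iter=20000, step=0.5, start=0.0, seed=1):
    """Random-walk Metropolis sampler for a single parameter mu."""
    rng = random.Random(seed)
    current = start
    current_lp = log_posterior(current, data)
    draws = []
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio); the normalizing
        # constant pi(Data) cancels in this ratio.
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        draws.append(current)
    return draws


# Synthetic data, generated here only for illustration.
data_rng = random.Random(0)
data = [data_rng.gauss(5.0, 1.0) for _ in range(25)]

draws = metropolis(data)[2000:]  # drop the first 2,000 draws as burn-in
posterior_mean = sum(draws) / len(draws)

# Closed-form posterior mean for this conjugate setup (sigma = 1, prior_sd = 10).
exact_mean = sum(data) / (len(data) + 1.0 / 10.0 ** 2)
print(round(posterior_mean, 2), round(exact_mean, 2))
```

Only the unnormalized posterior is evaluated, which is why the proportional form of Bayes' theorem is sufficient for MCMC.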
The joint posterior distribution is analytically intractable; therefore, we used the Hamiltonian Monte Carlo algorithm from the rstan package in R software (R Core Team, 2023). Samples of the posterior distribution of the model parameters were obtained from 3 chains of 11,000 iterations each, where the initial 1,000 values were discarded (burn-in) so that the chains are not influenced by the starting values. In addition, we kept every 10th iteration (thinning) in order to obtain an approximately independent subsample. The R code is described in Appendix 1.
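The number of posterior draws retained under these settings follows from simple arithmetic. The sketch below assumes that the 11,000 iterations per chain include the 1,000 burn-in iterations; the function name is hypothetical.

```python
def retained_draws(n_chains, n_iter, burn_in, thin):
    """Return (draws kept per chain, draws kept in total) after burn-in and thinning."""
    per_chain = (n_iter - burn_in) // thin
    return per_chain, per_chain * n_chains


per_chain, total = retained_draws(n_chains=3, n_iter=11000, burn_in=1000, thin=10)
print(per_chain, total)  # 1000 3000
```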

Convergence and model checking
For the convergence analysis of the chains, we use the $\hat{R}$ convergence metric and diagnostic plots of the MCMC sampler for each parameter. Lack of divergences and $\hat{R}$ close to 1 indicate no problem for each individual posterior fit (recommended cut-off is 1.05); see, for instance, Vehtari et al. (2021). To verify the adequacy of the model, residual analysis was performed using plots of standardized residuals vs. predicted values and normal probability plots, the standardized residuals being given by:

$$r_i = \frac{y_i - \hat{\mu}_i}{\sqrt{\widehat{\mathrm{var}}(\hat{\mu}_i)}},$$

where $\hat{\mu}_i$ are the predicted values and $\sqrt{\widehat{\mathrm{var}}(\hat{\mu}_i)}$ is the estimated standard deviation. We compared the predictive accuracy of the models using the expected log pointwise predictive density (elpd). In general, cross-validation is used to approximate the elpd.
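The idea behind the convergence metric can be sketched with the classic between-chain versus within-chain variance comparison. The version below is the basic potential scale reduction factor, not the rank-normalized split variant discussed by Vehtari et al. (2021), and the chains are simulated here purely for illustration.

```python
import random


def rhat(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    m = len(chains)
    n = len(chains[0])
    chain_means = [sum(c) / n for c in chains]
    grand_mean = sum(chain_means) / m
    # Between-chain variance B and mean within-chain variance W.
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    w = sum(
        sum((x - cm) ** 2 for x in c) / (n - 1)
        for c, cm in zip(chains, chain_means)
    ) / m
    var_plus = (n - 1) / n * w + b / n
    return (var_plus / w) ** 0.5


rng = random.Random(42)
# Three chains drawn from the same distribution (well mixed).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(3)]
# Three chains where one is centered elsewhere (not converged).
stuck = [[rng.gauss(shift, 1.0) for _ in range(1000)] for shift in (0.0, 0.0, 3.0)]

print(rhat(mixed) < 1.05)  # True
print(rhat(stuck) > 1.05)  # True
```

When all chains explore the same distribution, the between-chain term is negligible and the ratio stays near 1; a chain stuck in a different region inflates it well above the cut-off.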
Here we use the leave-one-out cross-validation (LOO) method, calculated using Pareto-smoothed importance sampling (PSIS), a procedure for regularizing importance weights (PSIS-LOO) that is more robust in finite samples with weak priors or influential observations and has low computational cost. Conceptually, $\mathrm{elpd}_{\mathrm{loo}}$ is obtained by leaving one observation out, evaluating its predictive density under the model fitted to the remaining data, and repeating the process for all observations in the set; PSIS approximates this quantity from a single fit, without re-estimating the model for each observation (see Vehtari et al., 2017). This quantity is obtained using the loo function of the loo package in R software.
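The importance-sampling idea behind this estimate can be sketched from a matrix of pointwise log-likelihood values. The code below implements plain importance-sampling LOO only; the Pareto smoothing of the weights performed by PSIS is omitted, and the log-likelihood values are hypothetical.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))


def elpd_loo_is(log_lik):
    """Plain importance-sampling LOO estimate of elpd (no Pareto smoothing).

    log_lik[s][i] is the log-likelihood of observation i under posterior draw s.
    """
    n_obs = len(log_lik[0])
    pointwise = []
    for i in range(n_obs):
        neg = [-draw[i] for draw in log_lik]
        # elpd_loo_i = -log( mean over draws of 1 / p(y_i | theta_s) )
        pointwise.append(-log_mean_exp(neg))
    return sum(pointwise), pointwise


# Hypothetical log-likelihood values: 3 posterior draws, 2 observations.
log_lik = [[-1.0, -2.0], [-1.2, -2.5], [-0.9, -1.8]]
elpd, pointwise = elpd_loo_is(log_lik)
print(round(elpd, 3))
```

Larger values indicate better expected predictive accuracy, which is how the two growth models are ranked.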

Results and discussion
In this section, we present the results of the Bayesian analysis. Tables 1 and 2 show some summaries for each parameter: means, standard errors (SE), standard deviations (SD), 95% credibility intervals (CI) and $\hat{R}$ for the Logistic and Gompertz models for length and diameter. The values of $\mathrm{elpd}_{\mathrm{loo}}$ and the metric $\hat{R}$ can be seen in the tables. For both variables, the Logistic model obtained the larger $\mathrm{elpd}_{\mathrm{loo}}$ values, indicating that it may be the most adequate for these data. The values of $\hat{R}$ are very close to 1, indicating good convergence of the chains, which can also be verified in Figures 1 and 3 for length and Figures 2 and 4 for diameter. The figures show that all chains reached stationary behavior, indicating good convergence. Additional chain plots can be seen in Appendix 2.
The adequacy of the Logistic models can be verified through the standardized residuals (Figures 6 and 7). The plots of the residuals versus the observation index, Figures 6a and 7a, reveal that the residuals behave randomly within the interval [-3, 3], and Figures 6b and 7b indicate that the residuals approximately follow a standard normal distribution. Therefore, the Logistic non-linear model can be considered suitable for the current data sets.

Conclusions
This article compared the Logistic and Gompertz non-linear models in the description of the diameter and length growth curves of the Asian pear tree (Shinseiki cultivar) through the Bayesian approach. Both models achieved good chain convergence for the two variables under study; however, the $\mathrm{elpd}_{\mathrm{loo}}$ criterion indicated the Logistic model as the best. The suitability of this model was also confirmed by residual analysis. Finally, the asymptotic mean length of the fruit was estimated at 70.0607 mm and the asymptotic mean diameter at 79.706 mm.

Appendix 2. Chain plots
Here, we present the chain density curves of the parameters of the Logistic and Gompertz nonlinear models for the length and diameter of the pears.

Figure 1. Chain plots of the parameters of the Logistic non-linear model for the length of the pears.

Figure 2. Chain plots of the parameters of the Logistic non-linear model for the diameter of the pears.

Figure 3. Chain plots of the parameters of the Gompertz non-linear model for the length of the pears.

Figure 4. Chain plots of the parameters of the Gompertz non-linear model for the diameter of the pears.

Figure 5 shows the fitted Logistic and Gompertz non-linear models for diameter and length for the pear dataset. Both variables show sigmoid behavior. The length and the diameter of the fruit reach growth stabilization at around 125 days. According to the values of $\mathrm{elpd}_{\mathrm{loo}}$, we can recommend the Logistic non-linear model. From this model, we can

Figure 6. Plots of standardized residuals from the Logistic model for pear length: (a) index plot and (b) normal probability plot.

Figure 8. Chain density curves of the parameters of the Logistic non-linear model for the length of the pears.

Figure 9. Chain density curves of the parameters of the Logistic non-linear model for the diameter of the pears.

Figure 10. Chain density curves of the parameters of the Gompertz non-linear model for the length of the pears.

Figure 11.Chain density curves of the parameters of the Gompertz non-linear model for the diameter of the pears.

Table 1. Means, standard errors (SE), standard deviations (SD), 95% credibility intervals (CI) and $\hat{R}$ for each parameter, and LOOIC, for the Logistic and Gompertz models for the length of the pears.

Table 2. Means, standard errors (SE), standard deviations (SD), 95% credibility intervals (CI) and $\hat{R}$ for each parameter, and LOOIC, for the Logistic and Gompertz models for the diameter of the pears.