Using Bayesian growth models to predict grape yield

Background and aims: Seasonal differences in vine yield need to be managed to ensure appropriate fruit composition at harvest. Differences in yield are the result of changes in vine management (e.g., the number of nodes retained after harvest) and weather conditions (in particular, temperature) at key vine development stages. Early yield prediction enables growers to manage vines to achieve target yields and prepare the required infrastructure for the harvest. Methods and results: Bunch mass data was collected during the 2016/17, 2017/18 and 2018/19 seasons from a commercial vineyard on the Wairau Plains, Marlborough, New Zealand (41o2’23”S; 173o51’15”E). A Bayesian growth model, assuming a double sigmoidal curve, was used to predict the yield at the end of each season. The accuracy of the prediction was investigated using the Monte-Carlo simulation for yield prediction at various growth stages assuming different prior information. Conclusion: The results show that the model is sensitive to prior assumption and that having a non-informative prior may be more beneficial than having an informative prior based on one unusual year.


INTRODUCTION
Seasonal differences in vine yield need to be managed to ensure appropriate fruit composition at harvest. Excessive yields can slow fruit development and result in infrastructure problems (e.g., inadequate tank space) at harvest. Weather conditions at flowering can cause significant differences in the percentage of flowers setting berries, hence affecting bunch mass (Trought, 2005). The ability to predict harvest bunch mass shortly after flowering is an essential tool for predicting vineyard grape yield. Grape berries exhibit a double sigmoid growth curve (Coombe, 1976). The initial phase (Phase I) is predominantly the result of cell division and expansion, which is followed by a slowing down in growth (fresh mass) phase (Phase II), in which the seed matures. This slow growth ends at veraison, with the onset of another period of berry growth, cell enlargement and metabolic changes (e.g., sugar accumulation and reduction in acidity (Phase III) (Tello and Forneck, 2018).
Sigmoid curves are often used to model various aspects of biological growth (Archontoulis and Miguez, 2015). The double sigmoidal curve, which combines two individual sigmoidal curves, is often used to model crop growth. Salisbury and Ross (2000) (cited by Parra-Coronado et al. (2016)) stated that fruits, such as raspberries, grapes, blackberries, olives and stone fruit, exhibit double sigmoidal growth behaviour. Galassi et al. (2000) produced curves for peach diameter and mass for cultivars in Ferrara and Rome in Italy. Fernandes et al.. (2017) compared various specifications of single and double logistic and Gompertz curves to determine which produced the highest adjusted-R 2 values, concluding that the double sigmoidal curve (rather than the single sigmoids) was the optimal one. Parra-Coronado et al. (2016) specified a mathematical model for pineapple mass which takes into account altitude, as well as Growing Degree Days.
While Tarara et al. (2009) used a single sigmoid curve to monitor the response of grape mass per shoot and fruit to vegetative mass per shoot with respect to Growing Degree Days over five growing seasons, Coombe (1976) provided an intuitive rationale behind the use of the double sigmoidal curve for modelling the development of fleshy fruits, including grapes. The proposed curve incorporates periods of rapid growth intersected by a period of minimal growth and reflects the various stages of berry ripening well (Coombe and McCarthy, 2000), as illustrated in Figure 1. Phenologically, three stages are involved (Coombe, 1992). The first phase involves fruit set, when the grape inflorescences turn into grape berries; a period of rapid grape berry growth can be observed as cell division occurs (Parker, 2012). The second stage involves a lag phase, during which the berry growth slows and seed fresh mass reaches a maximum. The end of this stage (veraison) is followed by ripening (Trought, 2005), as sugars and water accumulate inside the grape berries, coinciding with a second period of rapid growth until, ultimately, the final yield is reached. The model described above can be implemented in various ways. Here, the use of a Bayesian analytic framework is proposed. Bayesian inference is becoming more popular as it is flexible in terms of model formulation, allows for easy incorporation of expert opinion via prior distributions, and, being based on Bayes' formula, can be easily updated continuously. The latter aspect is especially useful, enabling incorporation of additional information into the model to produce predictions, which not only reflect the initial conditions, but also management decisions and natural factors experienced during the growing season. The Bayesian framework is also good at handling missing data. This means that predictions can be Rory Ellis et al. made without requiring all of the information sources to be present at any particular time point. Should particular kinds of information not be available, the Bayesian model is simply able to integrate these missing components, updating itself instead using the information which is provided (Gelman et al., 2014, Chapter 18).
When placing modelling within a dynamic context it is imperative to consider the Value of Information (VoI) aspect. Intuitively, the more data that is available, and the later into the season it has been collected, the more accurate the predictions will be for the end of the growing season. The choice between making a decision early on based on the information available or waiting for better information while losing some opportunities is the subject of economic theory, including the real options theory (Mun, 2002). While placing the motivation of updating the Bayesian model with data within strict economic theoretical context is beyond the scope of this paper, we examine how adding further time points to the data will affect the accuracy of the estimation.
We hereafter illustrate how the Bayesian framework is applied in order to fit a double sigmoid growth model to the grape bunch mass data collected from a commercial vineyard in New Zealand Marlborough region (41 o 2'23"S; 173 o 51'15"E) over three growing seasons. While doing so, the sensitivity of the model to prior information is examined, using both the observed and simulated data, alongside the Value of Information (VoI) aspect of the dynamic modelling. We conclude by summarising our observations and listing the most promising directions for further work in this area.

MATERIALS AND METHODS
Data on grape bunch masses for the 2016/17, 2017/18, and 2018/19 seasons were collected from a commercial vineyard on the Wairau Plains (41 o 2'23"S; 173 o 51'15"E). For the 2016/17 and 2017/18 growing seasons, fifteen replicate plots were established in a single vineyard row. Each plot consisted of four vines planted 1.8 m apart. A shoot with two bunches was randomly selected from each plot on approximately a weekly basis, starting at flowering and continuing until shortly before the commercial harvest. In the 2018/19 the number of plots was increased to 30 and sampling continued as before. Bunches were individually bagged and taken to the laboratory for weighing. There were a total of one, five, and 63 missing data points for the seasons 2016/17, 2017/18, and 2018/19 respectively.

Bayesian Inference
Before describing the details of the model, we would like to quickly review the basics of Bayesian Inference. Consider the data y and the likelihood g(y) and the prior distribution for the parameter of interest g(θ), which describes our understanding of the probable parameter values before the data are observed. The prior distribution incorporates the prior information available to the researcher and may be based on the general understanding of the phenomenon or on prior experience and expert opinion. When no information is available, the prior distributions tend to be very wide and are often called vague. Because the choice of prior may be deemed somewhat subjective, a sensitivity analysis is often performed to determine to what extent it affects the modelling result. The posterior distribution for the parameter may be obtained from the Bayes Theorem as follows: Because the above derivation can rarely be done analytically, computationally intensive Markov Chain Monte Carlo (MCMC) methods are used to produce samples from the posterior distributions, for which summary statistics (such as posterior mean and posterior 95 % credible intervals) can then be obtained.
If the data y can be divided into two batches, y 1 and y 2 , the equation above can be rewritten as The posterior distribution of θ after observing the first batch y 1 thus becomes the prior distribution for the experiment involving y 2 , leading to sequential updating.
A posterior predictive distribution for a new (future) observation given the data y can then be obtained as Again, numeric methods are customarily used to obtain a sample from such a distribution, which can then be summarised via sample statistics such as mean and quantiles. We now turn to the double-sigmoidal growth model.

Double sigmoidal growth model
Let Yi describe the bunch mass observed at time xi , i = 1...n, where n is the total number of observations. In order to guarantee a nonnegative response and control for heteroscedasticity apparent from Figures 1a, 1b, and 1c, it is common to assume that the logarithm of mass has a Gaussian distribution: where τ is the precision or inverse variance, and the mean expected log-mass μ i can be modelled via the double logistic curve as follows: where α 0 and α 1 describe the first and second asymptotes respectively, β0 and β1 are the inflection points, and γ 0 and γ 1 are the slope parameters. Figure 1 illustrates the role of these parameters further. In order to analyse the model within the Bayesian framework, we need to provide prior distributions for these parameters. It should be noted that we expect both slopes γ0 and γ1 to be positive. We also do not expect the second asymptote to be less than the first; i.e., α 1 ≥ α 0 . Note that when α 1 = α 0 , the second term in Equation 2 disappears and the double sigmoidal curve collapses to a single sigmoidal curve. In order to reflect these restrictions, the following priors are assumed: Where TN(μ, τ, L) denote a normal distribution with mean μ and precision τ left-truncated at L. In order to have a priori independent parameters, the model can be reparametrized in terms of Δα = α 1 -α 0 and Δβ = β 1 -β 0 , and the priors in Eq.
(4) and Eq. (6) recast as follows: The analysis consisted of two parts. In the first part, for each growing season, the model was fitted to the grape bunch data available after the first, second, and up to the final (fifteenth) observation time, assuming independent residuals over time, due to the destructive measuring procedure. This was done in order to investigate the trade-off between an earlier prediction and prediction accuracy, as well as to look at the growth curves fitted after one season.
In the second part, four different sets of priors were taken into account when analysing the bunch mass data for the 2018/19 growing season in order to examine the effects of incorporating historical data into the modelling process. The weakly informative priors were based on the general expectations of the shape of the grape growth curve and had high variance, and thus small precision. The more informative prior distributions were obtained by using the means and variances of the posterior sample's Rory Ellis et al.  (2015)). In each case, 10 5 iterations were run after a 2 × 10 4 burn-in and the convergence was visually assessed.

Simulation studies
In order to examine the effect of prior assumptions on yield prediction, as well as to investigate the value of information in the context of yield prediction, 100 data sets were simulated based on the parameters drawn from the posterior distribution, produced after fitting the model to the 2018/19 season's bunch mass data. The simulated data consisted of the mass of 15 grape bunches being measured at 15 time points throughout the growing season. The model described in Equations (1)-(11) was then fitted with the four different priors described in ! The posterior estimated mean growth curves for the grape bunches in the three seasons examined are shown in Figure 2. While all the models show a reasonable fit, there are a couple of aspects worth noting. A clear phase II and double sigmoid curve is apparent in the 2017/18 season between days 25 and 50 (15th January to 8th February 2018) and to a lesser extent in 2018/19. This was not observed in 2016/17, for which the resulting growth appears to be a simple sigmoid curve. Some of these differences may be explained by the seasonal differences in average Rainfall during fruit ripening resulted in the onset of Botrytis cinerea leading to a sharp deterioration of bunch mass in 2017/18 ( Figure 2b). The data for the last observational time point were therefore not included in the model, because the sharp decrease in bunch mass would have affected the harvest mass estimates produced from the Bayesian model for that particular year; here, the standard double sigmoidal growth curve struggles to accommodate the sharp rise in the middle of the season.
The posterior estimates for the average bunch mass were also obtained after adding the data as it became available consecutively to the model along with the various sets of priors. The resulting mean absolute errors are shown in Figure 3. The value of information is evident here, along with the importance of having informed priors, particularly at the starting of the growing season. This is seen where the prediction errors for the model with vague priors are quite poor and erratic, although they do improve markedly once five or more data points are added (23 rd January 2019). Figure 4 shows the posterior mean curves and associated 95 % credible bounds, which were estimated after fitting the model to the entire 2018/19 bunch mass data with the vague, the 2017, the 2018, and the 2017 + 2018 priors respectively. The results show that there is some apparent sensitivity to the choice of priors at different points in the growing season. For example, for the results from the incorporation of two years of priors, the estimation bounds are somewhat larger than those from the other three sets of priors. However, this switches to having lower estimations for the bunch masses at the end of the growing season. Figure 5 compares the posterior estimates for the final bunch masses of the 2018/19 bunch mass data, using each of the three informed sets of priors (2017, 2018, and 2017 + 2018). When estimates are made using two years of data, they become more consistent earlier in the growing season, after about half of the data has been included in the Bayesian model. However, the means of these estimates appear to be lower than Rory Ellis et al.  the actual final yield, as well as the estimates made from the other two models (2017, 2018).

Simulation studies
For each of the 100 simulated datasets, the posterior mean bunch masses at harvest (273 rd day) and the associated 95 % credible interval were evaluated a priori and after each of the 14 observation time points. The results are shown in Figures 6, 7, 8 and 9. The informative priors naturally have narrower credible intervals to begin with. As the season progresses and more observations are collected, the intervals become narrower and the predictions for the average FIGURE 5. The predicted final bunch masses for the 2018/19 growing season, compared over the 3 sets of informed priors. The 5 % and 95 % estimation bounds are included for each set of priors.
The black horizontal line represents the observed average final yield for the 2018/19 growing season.

FIGURE 6.
Posterior mean masses at harvest (273rd day) and the associated 95 % CIs for the 100 simulated datasets modelled using the vague prior a priori and after the 1st (18th December), 6th (31st January) and 15th (31st March) observation time points respectively.

FIGURE 7.
Posterior mean masses at harvest (273rd day) and the associated 95 % CIs for the 100 simulated datasets modelled using the 2017 prior a priori and after the 1st, 6th and 15th observation time points respectively.
The red line indicates equivalence. bunch mass at harvest become individually dependent on the dataset and closer to the black line, indicating a perfect guess. It is worth noting that the prior based on the 2017/18 growing seasons, which was very different from the 2016/17 one, is overly conservative and is not swayed by the data, resulting in relatively poor predictions, even at the end of the season. The model based on priors informed from both of these years of data suffers even more from this, as the increased precision on the asymptote parameters (α 0 and Δα) keeps the means of the posterior estimates more fixed and with tighter bounds. The model with priors informed using two seasons of bunch mass data tended to overestimate the bunch masses with less data, and underestimate with all the data.
The resulting mean absolute error, averaged over the 100 datasets, is shown in Figure 10. Here, the vague prior only starts to perform just as well as the other priors once approximately 14 of the 15 days of data have been included, which is not suitable given the desire to produce a model which can produce reliable estimates early in the growing season. However, the results shown in Figures 10 and 11 suggest a high consistency throughout the growing season when using the other three priors.

DISCUSSION
Bayesian methods are capable of systematically incorporating prior knowledge. This feature is especially relevant to viticulture and to grape growth modelling, since there is substantial, often vineyard-specific, expert knowledge available. The ability of the Bayesian framework to seamlessly update model estimates as new data comes in is especially useful given the dynamic nature of the phenomenon being modelled. Starting out with a yield estimate based on historical data, and perhaps a general weather forecast for the coming season, and Rory Ellis et al.  revising that estimate as new information becomes available is a goal well-worth achieving.
The model examined in this study considers the bunch mass to be a function of time only. This is clearly an oversimplification, since plant growth in general, and grape growth in particular, is known to be affected by temperature, usually expressed as growing degree days (Coombe, 1986). Wang and Engel (1998) introduced a function describing the relationship between the daily temperature and the daily growth date and applied it to wheat growth, whereas, for example, Parker et al. (2011) andParker (2012) extended these ideas to model phenological stages of grape development. Therefore, our model may be improved by replacing days with the growth rate expressed as a function of growing degree days. Climate variables other than temperature, such as the amount of solar radiation, may also influence grape growth and development (Dokoozlian and Kliewer, 1996;Bergqvist et al., 2001;Fernandes de Oliveira and Nieddu, 2016). Additional variations may arise due to characteristics of the land, such as soil and topography Bramley et al., 2011), as well as management practices. When available, this information can be incorporated into the framework by hierarchically modelling the growth curves parameters. Thus, for example, the inflection points β 0 and β 1 can be construed as functions of climate and spatial covariates.
As a way of emphasising this, we took into consideration climate information for the region in which the data was collected. Meteorological data was sourced from the National Institute of Water and Atmospheric Research at the Marlborough Research station 1.1 km south west of the trial site (49 o 21'51'S; 173 o 47' 56"E). Figure 12 demonstrates the temperature anomalies for the three growing seasons relative to the last 10 years. The accumulated temperature experienced in the 2016/17 growing season was lower than in the other seasons. Above-average temperatures were noted in the 2017/18 season over the flowering period (as indicated by the slope of the deviation from the long-term mean), while the other two seasons were close to the long-term average over flowering. Further heatwaves were experienced in both the 2017/18 and 2018/19 seasons at certain times in the period between flowering and véraison (Salinger et al., 2019;Salinger et al., 2020). The dates of flowering and véraison were estimated using the Grapevine Flowering Véraison (GFV) (Parker et al., 2011). This appeared to have more of a direct impact on the yield for the 2017/18 seasons, shown by the increase in the average bunch mass measurements. Figure 13 shows the accumulated rainfall for the three growing seasons. It can be seen that the 2017/18 growing season experienced higher amounts of rainfall during the course of the growing season. The 2018/19 season is typified by extended periods of no rainfall in the second half of the season. This may explain the very limited amount of growth  in the bunch masses seen after the period of véraison in this particular case.
It can be seen in Figure 2a that the 6-parameter double sigmoidal curve was not a good fit for the grape growth observed in the 2016/17 season. Archontoulis and Miguez (2015) have reviewed a wide variety of nonlinear regression models used in agricultural research and they have made comparisons between various specifications of sigmoid functions. Of particular interest to us are the Richards model (Richards,1959), the Gompertz model (Gompertz, 1825) and the Weibull curve (Weibull, 1951), all of which we intend to compare with our current implementation of the double sigmoidal model. It is also worth noting another more recent model developed by Yin et al. (2003), which involves incorporating the maximum growth rate of the fruit to help calculate fruit mass.
The current model assumed conditionally independent priors and was estimated via the standard Metropolis-Hasting sampler (Gelman et al., 2014). Taking joint prior specifications into account will increase the flexibility of the model, allowing a wider range of expert opinion formulations to be adapted, and may also improve the efficiency of the algorithm.
Finally, applying the resulting range of parametrisations to a wider spatio-temporal set of data will improve our understanding of the modelling framework, and ultimately produce a better tool for early grape yield prediction.

CONCLUSION
In this study, we illustrated the use of a Bayesian framework for fitting a standard double sigmoidal growth curve to the Sauvignon blanc grape bunch mass data collected over the 2016/17 2017/18 and 2018/19 growing seasons in Marlborough, New Zealand. We also performed a simulation study to investigate both the sensitivity of the model to prior assumptions and the value of information. The latter refers to the role of additional consecutive observations throughout the season in improving the accuracy of the estimation of the grape bunch mass at harvest.
The results from this analysis show that the model is sensitive to prior assumptions made for the parameters of the double sigmoidal model. From an early yield prediction perspective, the incorporation of non-informative (vague) priors to the model resulted in poor results, only becoming similar to the results seen from models using more informed priors once half of the growing season data was incorporated into the Bayesian model. This led to the conclusion that some information about the parameterisation of the double sigmoidal model is influential in producing useful results.