Grapevine bunch weight estimation using image-based features: comparing the predictive performance of number of visible berries and bunch area

Recent advances in machine vision technologies have provided a multitude of automatic tools for recognition and quantitative estimation of grapevine bunch features in 2D images. However, converting them into bunch weight (BuW) is still a big challenge. This paper aims to compare the explanatory power of the number of visible berries (#vBe) and the bunch area (BuA) in 2D images, in order to predict BuW. A set of 300 bunches from four grapevine cultivars were picked at harvest and imaged using a digital RGB camera. Then each bunch was manually assessed for several morphological attributes and, from each image, the #vBe was visually assessed while BuA was segmented using manual labelling combined with an image processing software. Single and multiple regression analysis between BuW and the image-based variables were performed and the obtained regression models were subsequently validated with two independent datasets. The high goodness of fit obtained for all the linear regression models indicates that either one of the imagebased variables can be used as an accurate proxy of actual bunch weight and that a general model is also suitable. The comparison of the explanatory power of the two image-based attributes for predicting bunch weight showed that the models based on the predictor #vBe had a slightly lower coefficient of determination (R2) than the models based on BuA. The combination of the two image-based explanatory variables in a multiple regression model produced predictor models with similar or noticeably higher R2 than those obtained for single-predictor models. However, adding a second variable produced a higher and more generalised gain in accuracy for the simple regression models based on the predictor #vBe than for the models based on BuA. Our results recommend the use of the models based on the two image-based variables, as they were generally more accurate and robust than the single variable models. When the gains in accuracy produced by adding a second image-based feature are small, the option of using only a single predictor can be chosen; in such a case, our results indicate that BuA would be a more accurate and less cultivardependent option than the #vBe.


Grapevine bunch weight estimation using image-based features: comparing the predictive performance of number of visible berries and bunch area
Carlos M. Lopes* 1 and Jorge Cadima 2 1 LEAF -Linking Landscape, Environment, Agriculture and Food -research centre, Associated Laboratory TERRA, Instituto Superior de Agronomia, Universidade de Lisboa, Tapada da Ajuda, 1349-017 Lisboa 2 Centro de Estatística e Aplicações da Universidade de Lisboa (CEAUL); Instituto Superior de Agronomia, Universidade de Lisboa. Recent advances in machine vision technologies have provided a multitude of automatic tools for recognition and quantitative estimation of grapevine bunch features in 2D images. However, converting them into bunch weight (BuW) is still a big challenge. This paper aims to compare the explanatory power of the number of visible berries (#vBe) and the bunch area (BuA) in 2D images, in order to predict BuW. A set of 300 bunches from four grapevine cultivars were picked at harvest and imaged using a digital RGB camera. Then each bunch was manually assessed for several morphological attributes and, from each image, the #vBe was visually assessed while BuA was segmented using manual labelling combined with an image processing software. Single and multiple regression analysis between BuW and the image-based variables were performed and the obtained regression models were subsequently validated with two independent datasets. The high goodness of fit obtained for all the linear regression models indicates that either one of the imagebased variables can be used as an accurate proxy of actual bunch weight and that a general model is also suitable. The comparison of the explanatory power of the two image-based attributes for predicting bunch weight showed that the models based on the predictor #vBe had a slightly lower coefficient of determination (R 2 ) than the models based on BuA. The combination of the two image-based explanatory variables in a multiple regression model produced predictor models with similar or noticeably higher R 2 than those obtained for single-predictor models. However, adding a second variable produced a higher and more generalised gain in accuracy for the simple regression models based on the predictor #vBe than for the models based on BuA. Our results recommend the use of the models based on the two image-based variables, as they were generally more accurate and robust than the single variable models. When the gains in accuracy produced by adding a second image-based feature are small, the option of using only a single predictor can be chosen; in such a case, our results indicate that BuA would be a more accurate and less cultivardependent option than the #vBe. k e y w o r d s berry number, bunch attributes, model validation, regression model, Vitis vinifera L., yield components, yield estimation.

INTRODUCTION
The benefits that an accurate vineyard yield estimation can bring to the entire grape and wine production chain are well-known (Clingeleffer et al., 2001;Whitty et al., 2017). Among others, the following applications of vineyard yield estimation should be highlighted: planning cluster thinning needs (in order to prevent excessive production and consequent poor wine quality); planning and organisation of the harvest (labour, equipment, others); planning cellar needs (scheduling grape intake, allocating tank space, purchasing tanks, barrels, oenological products, bottles and others); planning purchases and/or grape sales; establishment of grape prices and management of wine stocks; management of grape and wine market and programming investments and development of marketing strategies. This multiplicity of potential applications and benefits means that yield estimation and forecasting play an essential role in viticulture and the wine industry.
Of the several methods that can be used to estimate vineyard yield, those based on the estimation of yield components (e.g., number of berries or bunches and bunch weight) are the most widely used (Whitty et al., 2017). These methods can be used at different phenological stages, from bud break to harvest. One of the best examples is the method based on the counting of bunches in canopy sample segments, combined with bunch weight evaluation obtained by destructive sampling at onset of veraison (lag phase) (Clingeleffer et al., 2001;Martin et al., 2003;de la Fuente et al., 2015). Based on historical data, bunch weight at harvest can then be obtained using a berry growth factor and yield at harvest estimated by multiplying the estimated number of bunches by the estimated bunch weight at harvest (Clingeleffer et al., 2001). However, besides being destructive, this practice is manual, and thus very labour-intensive, and it can prove inaccurate, as it extrapolates for the entire vineyard the assessment performed on a small percentage of bunches.
Recent advances in machine vision and machine learning technologies have provided a multitude of automated tools for recognition and quantitative estimation of several yield components, mainly bunch and berry traits in 2D images (for a review see Seng et al., 2018). The images are taken using RGB digital cameras deployed on terrestrial vehicles, and they are then processed using machine vision technologies in order to differentiate the yield components (e.g., shoots, flowers, bunches and berries) from other canopy components.
After an accurate segmentation, multiple traits and metrics can automatically be extracted from the yield components images, which can then be used as proxies for vineyard yield estimation.
Total berry number is also considered a very good vineyard yield predictor as it includes variations in the number of bunches and in the number of berries per bunch (Clingeleffer et al., 2001;Whitty et al., 2017). The detection and counting of visible grape berries in 2D images (#vBe) is nowadays feasible by automated non-invasive image analysis and machine learning techniques (e.g., Nuske et al., 2014;Diago et al., 2015;Rose et al., 2016;Aquino et al., 2017;Aquino et al., 2018;Pérez-Zavala et al., 2018;Zabawa et al., 2020); for example, when using neural network semantic segmentation, Zabawa et al. (2020) achieved between 85.6 % and 94.0 % accuracy (depending on the training system) for non-invasive berry detection. However, as only a fraction of the berries is visible in an image, estimating actual total berry number per bunch (#Be) would require hidden berries to be estimated using modelling techniques to extract 3D information from 2D images (Schöler and Steinhage, 2015;Rist et al., 2018;Xin et al., 2020;among others) or, alternatively, the use of statistical models based on relationships obtained between #vBe and #Be (Liu et al., 2020). Once #Be has been estimated, yield can be estimated using an equation previously obtained from a linear relationship between #Be and actual yield (Nuske et al., 2014), or by multiplying #Be by an estimation of berry mass, which can be obtained using automated 2D and 3D image post-processing techniques Ivorra et al., 2015;Rose et al., 2016;Rist et al., 2018;among others).
Regarding the bunches, several studies have demonstrated that the 2D image bunch area (BuA) -expressed in pixels -is well correlated with bunch weight (Dunn and Martin, 2004;Diago et al., 2012;Font et al., 2015, Hacking et al., 2019Victorino et al., 2020); this allows regression models to be obtained that show a good fit between estimated and actual yield in conditions of no bunch occlusions by the canopy and/or other bunches. For example, using RGB images and a supervised classifier based on the Mahalanobis distance to assess the yield of fieldgrown grapevines, Diago et al. (2012) obtained a determination coefficient of 0.76 for the relationship between the actual yield in the images and the corresponding number of pixels.
Bunch volume is also a feature that has been much explored in recent studies using several machine vision and modelling techniques to extract 3D information from 2D images (Nuske et al., 2014;Font et al., 2015;Herrero-Huerta et al., 2015;Ivorra et al., 2015;Tello et al., 2016;Rist et al., 2018). However, the automatic extraction of the volumetric information is a much more complex computational task than the extraction of the bi-dimensional projection of the bunch.
Despite the aforementioned advances in machine vision, 3D modelling and other methods for the automated estimation of grapevine yield components, to obtain a general model to convert image-visible bunch attributes into bunch mass is still challenging. This is mainly due to the naturally large variation that can be found in bunch architecture and compactness among cvs. .
The aim of this paper is to assess and compare the explanatory power of the two most used image-based bunch attributes (#vBe and BuA) in order to predict grapevine bunch weight at harvest. 2D image-based bunch attributes were extracted in laboratory conditions by visual photointerpretation (#vBe) and manual segmentation (BuA) from 300 images of four grapevine cultivars. These variables were then used to find accurate predictive models for bunch weight, which can also be useful in assisting the algorithms used for vineyard yield estimation based on fully automated image analysis.

Site and plant material
The bunches used in this study for the training datasets were picked during the 2015 vintage in two vineyard plots at Lisbon University's Instituto Superior de Agronomia experimental vineyard, located in Lisbon, Portugal, within the Lisbon Winegrowing Region (38º 42' 27.5'' N, 9º 10' 56.3'' W and 62 m above sea level). The climate is of Mediterranean type with Atlantic influence. The 2015 growing season was a dryer season (136 mm from March to end of September). The soil is a clay loam with 1.6 % organic matter and a pH of 7.8. On the white vineyard plot (9 year-old vines), the cultivars 'Viosinho' (VI) and 'Alvarinho' (AL) were used; they were cultivated side by side and grafted into 1103 Paulsen rootstock with a 1.0 m within row spacing and 2.5 m spacing between north-south oriented rows.
On the red vineyard plot (16 year-old vines), the cultivars 'Syrah' (SY) and 'Touriga Nacional' (TN) were used; they were cultivated side by side, grafted onto 140 Ru rootstock and spaced 1.2 and 2.5 m within and between north-south oriented rows, respectively. All the vines were trained on a vertical shoot positioning system with two pairs of movable wires, and they were spur-pruned on a Royat cordon system (unilateral for the white cv. plots and bilateral for the red plots). All vines were uniformly pruned by keeping 12-14 and 16-18 nodes per vine on the white and red cvs. respectively. As opposed to the red vineyard plot (rainfed), the white plot was drip-irrigated with undervine irrigation lines consisted of pressure compensating 2.5 L/h emitters at 1.0 m spacing (one per vine positioned between two adjacent vines). Irrigation started at the beginning of June (at fruit set) and stopped one week before harvest (around mid August). Watering was carried out twice a week, thus replacing 50 % of crop evapotranspiration. For the remaining vineyard practices, similar standard cultural practices were applied in both vineyard plots, with canopy management practices comprising desuckering, shoot positioning and summer trimming to about 1.2 m above the cordon.
With the aim of encompassing the maximum bunch weight variability, for each cv., all the bunches of five representative vines were picked individually at commercial maturity, then labelled and transported to the laboratory for detailed assessments (e.g., Figure 1). Total number of bunches per cv. ranged from 75 (AL) to 98 (VI).

Laboratory Assessments
In the laboratory, each bunch was positioned in a random position in front of a white background and imaged with a corresponding tag, using a digital RGB camera located at a constant distance from the bunches. Then actual bunch weight (BuW) and volume (BuV) were assessed using a table scale (KERN FCB v1.4) and the water displacement method respectively. The bunches were then destemmed by hand and the following attributes were assessed per bunch: number of berries (#Be), total berry weight (BeW) and rachis main axis length (RaL). Average weight per berry (avBeW) and a bunch compactness (BuC) index were derived (Table 1) from the obtained data.
The images were manually labelled using ImageJ® software and the total area of grapes was computed in pixels and converted into cm 2 of bunch area using the tag dimensions (8.0 x 1.8 cm coloured rectangle) to calculate the conversion ratio. Furthermore, the number of visible berries in each image (including partially hidden ones) was visually assessed and manually labelled and counted.
For model validation, two independent bunch datasets were used; one from the white cv. 'Viosinho' collected in the 2014 season (VI14; n = 100) and the other from the red cv. 'Cabernet-Sauvignon' collected in the 2015 season (CS15; n = 109). Both validation datasets were collected in the same vineyard plots and subjected to similar cultural practices to those described for the training datasets.

Data analysis
As total number of bunches varied among all cvs., in order to obtain a dataset with an equal number of bunches, the lowest number of bunches (75, cv. 'Alvarinho') was used and a randomised selection applied to the other three datasets. A database was built by adding some calculated variables derived from measured variables. Table 1 shows the final set of variables divided into two groups.
A correlation analysis was carried out to assess the relationships between all pairs of variables. In order to obtain regression models to estimate BuW, single and multiple regression analyses between BuW (response variable) and the image-based variables (predictor variables) were taken into account. As there was a slight curvilinearity in the relationships and the constant variance assumption required for linear regressions was violated, a logarithmic transformation of both variables was applied (Keene, 1995). The regression models were fitted to the pooled dataset of the four cvs. and subsequently validated with two independent datasets. Simple linear regressions with either log-number of visible berries or log-bunch area were fitted, as was a multiple linear regression with both these predictors. Standard t-tests (p < 0.05) were carried out for coefficients significantly different from zero in the multiple linear regressions of log-transformed variables to assess whether adding the second predictor was significant in each case. This is equivalent to carrying out a partial F test comparing the R 2 values in the two-predictor model and in each simple linear regression.
Partial F tests were carried out to compare the pooled models with ANCOVA-type models using cv. as an additional factor (which produces the same fitted equations as models using only the data for each cv.). However, as the goal was to produce a robust model for general use with any cv., rather than a cultivar-specific model, only the pooled models were validated.
Despite the fact that log-transformations of the variables were used to fit the models, goodness of fit was assessed by comparing the values predicted by the pooled models (P i ) to the observed The following deviance measures were used (Schaeffer, 1980;Loague and Green, 1991), with , 1991), with and um value 1 indicating cal software (R Core denoting the mean values of predicted and observed bunch weights: • mean absolute error: following deviance measures were used (Schaeffer, 1980;Loague and Green, 1991) (1) • normalised mean absolute error: following deviance measures were used (Schaeffer, 1980;Loague and Green, 1991), with and ting the mean values of predicted and observed bunch weights: ( elling efficiency is a dimensionless indicator, with values close to the maximum value 1 indicating quality fits. All the statistical analysis were performed using the R statistical software (R Core , 2020). (2) • root mean square error: The following deviance measures were used (Schaeffer, 1980;Loague and Green, 1991) Modelling efficiency is a dimensionless indicator, with values close to the maximum value 1 indicating good quality fits. All the statistical analysis were performed using the R statistical software (R Core Team, 2020).
• percent bias: following deviance measures were used (Schaeffer, 1980;Loague and Green, 1991), with and ting the mean values of predicted and observed bunch weights: elling efficiency is a dimensionless indicator, with values close to the maximum value 1 indicating quality fits. All the statistical analysis were performed using the R statistical software (R Core , 2020). (4) • modelling efficiency: following deviance measures were used (Schaeffer, 1980;Loague and Green, 1991), with and oting the mean values of predicted and observed bunch weights: delling efficiency is a dimensionless indicator, with values close to the maximum value 1 indicating d quality fits. All the statistical analysis were performed using the R statistical software (R Core m, 2020).

(5)
Modelling efficiency is a dimensionless indicator, with values close to the maximum value 1 indicating good quality fits. All the statistical analysis were performed using the R statistical software (R Core Team, 2020).

Variability and correlations of bunch attributes
The boxplots with descriptive statistics of the actual and image-based bunch attributes for the four cvs. show considerable variability in all the assessed bunch attributes in all training datasets ( Figure 2). Regarding BuW, the VI dataset had the highest median value, while TN showed the lowest (Figure 2A). #Be was very similar among cultivars with medians ranging from 79 for TN to 94 for VI ( Figure 2B).
The avBeW variable was found to have much higher variability among cvs., with the highest median value found for VI, closely followed by SY, and the lowest for TN ( Figure 2C). The highest BuV median was also found for the cvs. VI and SY and the lowest for TN ( Figure 2D). The highest RaL median was found for cv. SY, closely followed by VI and TN, and the lowest for AL ( Figure 2E). The highest median for number of berries per rachis length -an objective measure of bunch compactness (Hed et al., 2009;Tello and Ibáñez, 2014) -was found for cv. AL, while the other three cvs. ( Figure 2F). Of the actual bunch attributes, BuW, #Be and BuV showed the highest variability, with the SY and AL datasets having the highest coefficient of variation (CV) (from 42.5 % to 46.6 %), and VI the lowest (from 35.1 to 37.8 %). The avBeW had the lowest variability, with the highest CV being found for TN (16.8 %) and the lowest for AL (8.5 %). CV was intermediate for the variables RaL and BuC.
Regarding the image-based attributes, the relative differences among cvs. for the variables #vBe and BuA mirrored those described for #Be and BuW respectively ( Figures 2G and 2H). The cv. SY showed the highest median for the derived variable percentage of visible berries (%vBe), closely followed by TN, while VI had the lowest median and AL an intermediate value ( Figure 2I).
Regarding the CV, #vBe and BuA had the highest variability, with the SY dataset having the highest CV (~ 38 %) and the other three cultivars showing a CV of similar magnitude (ranging from 30.1 to 32.9 % ).
The %vBe variable had the lowest variability, with SY having the highest CV (19.0 %) and VI the lowest (12.3 %).  In terms of the correlations between the variables extracted from the 2D images (BuA and #vBe) and BuW, very high, positive and significant (p < 0.001) r values were found. The correlation between the derived variable %vBe and BuW showed a lower and negative r value (Table 2).
Besides the correlations with BuW, it is worth highlighting some other relationships, mainly those between the variables extracted from 2D images. There were very strong, positive and significant (p < 0.001) correlations between the variable BuA and the variables #Be, BuV, RaL and #vBe. Very high, positive and significant correlations were also found between the variable #vBe and the variables #Be, BuV and RaL. Furthermore, significant (p < 0.01) but negative correlations were found between %vBe and all the other assessed variables (Table 2A).
The correlation coefficients between the logtransformed variables (Table 2B) were in general of higher magnitude than between the original pairs of corresponding variables, indicating stronger linearity in the relationships within the log-transformed data; this is a result that will be subsequently used.
In general, the correlation analyses performed separately per cv. (n = 75) yielded results that mirror those reported in Table 2 for the pooled data. However some differences among cvs. were observed in terms of the magnitude and significance of the correlation coefficients linked to BuW ( Figure 3A). Regarding actual bunch attributes, the r value was much lower (0.17) and non-significant when correlating avBeW with BuW in the SY dataset compared to the other three cv. datasets, in which higher and significant r (p < 0.001) was obtained for the same variables. The opposite result was obtained for RaL in correlation with BuW: the SY dataset showed the highest r value and VI the lowest. Regarding the correlation of image-based variables with BuW, the r value was very high and significant (p < 0.001) for all four cv. datasets, with the highest r (0.95) for BuA in cv. AL and the lowest (0.82) for #vBe in cv. TN ( Figure 3A).

Testing the use of visible berries on the image as a proxy of actual number of berries
The average values regarding the proportion of visible berries on the 2D images were similar for VI and AL (49.0 and 50.5 % respectively), but they were lower than those obtained in the TN and SY datasets (55.8 and 57.5 % respectively).
When creating the scatterplot fitted with linear regressions of the #Be (response variable) over #vBe (predictor variable), a linear relationship between both variables appeared acceptable for all datasets, despite there being a slight curvature. However, the constant variance assumption underlying the regression analysis was violated. The logarithmic transformation of both variables stabilised the variance and improved the linearity in the four datasets ( Figure 3B). Table 3 shows the fitted regression models of the log-transformed data, with very high and significant coefficients of determination (R 2 ) in all cases. Figure 4A shows a scatterplot of the variables BuW and #vBe for all training datasets, showing an increase in the scatterpoint dispersion as  the values of both variables increase, as well as a slightly curvilinear trend. The violation of the constant variance assumption was noticed for all datasets, and therefore the logarithmic transformations of both variables were adopted to stabilise variances and improve linearity ( Figure 4B). The fitted regression models ( Table  4) show that the highest R 2 was obtained for the AL model and the lowest for the TN and pooled models. Standard t-tests on the ANCOVA model, which allows each cultivar to have its own linear regression between ln(BuW) and ln(#vBe), indicate that the differences between the cultivarspecific slope (exponent in Table 4) for cv. AL and the remaining cvs. are slightly significant (0.02 < p < 0.10), but they are not significant for the other cvs. Figure 5A shows the scatter plot of BuW and BuA for all training datasets. Visually, there is an increase in the scatterpoint dispersion as the value of both variables increases and also a slight curvilinear trend. Indeed, the constant variance assumption of the linear regression model relating BuW (response variable) and BuA (predictor variable) appears unrealistic, as was also observed with #vBe relationships, in particular for the VI and SY datasets. In order to improve linearity and stabilise variances, the logarithmic transformation of both variables was used in all cases, thus ensuring comparable models for all four datasets and for the pooled data ( Figure 5B).   Table 5 shows the R 2 values for the linear regressions of ln(BuW) on ln(BuA), as well as the fitted power relationships that result from converting back to the original (non-logarithmic) scales for both variables. In all four datasets, R 2 was very high and significant. The ANCOVA model allowing each cv. to have its own relationship indicates that the slopes (exponents in Table 5) for the cvs. AL and VI are significantly different (p = 0.0025), with less or non significant differences among other pairs of cvs.

Bunch area in the 2D image
The comparison of the R 2 values between the two single-predictor regressions (Table 4 vs Table 5) shows that, for three cvs., the best individual predictor of ln(BuW) is ln(BuA), the exception being VI.

Combining image-based explanatory variables
Based on the high performance of the two variables extracted from the 2D images (BuA and #vBe) a multiple linear regression model was used to estimate ln(BuW), with both log-transformed variables as predictors. As shown in Table 6, in general the R 2 values of the two-predictor regressions were significantly higher than those for the best single-predictor regressions, except in the case of the predictor ln(BuA) for SY and TN, which showed an R 2 of similar magnitude. While these improvements in the R 2 coefficient were significant, they were often modest and more pronounced for the models based on the predictor #vBe (gains ranging from 0.02 for VI to 0.10 for TN) than for the models based on BuA, (gains of

Model Validation
In order to test the potential of the above fitted pooled models by applying different conditions and cultivars than those used during the fitting process, a validation was performed using two independent datasets (VI14 and CS15). Figure 6 shows the scatter plots of the observed vs fitted values (on the original scale) of #Be ( Figure 6A) and BuW ( Figures 6B, 6C and 6D) using the fitted models for the pooled data (Equations 10, 15, 20 and 25 respectively). It can be observed that overall there is a good agreement between the observed and fitted values for the two datasets, as well as some variability that increases with an increase in magnitude of the observed values.
Regarding the model that estimates #Be using #vBe as a predictor ( Figure 6A), the scatter points are very close to the 1/1 line, indicating a very good overall fit for both datasets. However, a closer look reveals an underestimation of #Be in the VI14 dataset and an overestimation in CS15 (less pronounced The scatterplot related to the model that estimates BuW based on the predictor #vBe ( Figure  6B, Equation 15) clearly indicates a different kind of model bias for each validation dataset, with an underestimation of BuW in VI14 and an overestimation in CS15. In the other two scatterplots involving the estimation of BuW ( Figures 6C and 6D), the dispersion of points in relation to the 1/1 line, seems to show that the model based on both predictors produces the best fit for both cvs.
The linear regression between observed (dependent variable) and estimated (independent variable) values obtained for each of the four pooled models (Equations 10, 15, 20 and 25) also shows very high and significant R 2 and an intercept and slope very close to 0 and 1 respectively (data not shown).
The results shown in the scatterplots with the fitted values from the models for BuW prediction are corroborated by the statistical validation measures, which generally indicate a fairly good fit (Table 7). When comparing the validation performance of the two single predictor models, it can be observed that, for the combined validation dataset (both cultivars), the model based on #vBe (Equation 15) has a higher MAE and NMAE, a similar RMSE and EF and a smaller Pbias, than the model based on BuA. When comparing the same single predictor models with each validation dataset, it can be observed that for VI14 the model based on the predictor #vBe showed smaller errors (MAE, NMAE and RMSE), an almost similar Pbias and a higher EF than the model based on the predictor BuA. For the CS15 dataset a better performance of the model based on the predictor BuA was observed in all the statistical measures of validation. It also becomes apparent that the much smaller Pbias of the combined validation dataset using predictor #vBe masks strong biases of opposite signs for each cultivar. Regarding the validation performance of the model based on the two explanatory variables (Equation 25), smaller TABLE 6. Fitted power law models resulting from the multiple linear regression of the logarithm of bunch weight (BuW, g) over two predictors: the logarithm of bunch area (BuA; cm 2 ) and the logarithm of the number of visible berries on the image (#vBe). Models are shown for each of the four cvs. (n = 75) and for the pooled data (n = 300). Values of the coefficient of determination (R 2 ) are relative to the linear regression in the log-scale. Values of the adjusted R 2 differ by, at most, 0.01. The exponent of predictor BuA is in all cases significantly different from zero (p < 0.001). The p-values in the standard t-tests for the exponents of #vBe are shown in the right column and are not significantly different from zero for 'Syrah' and 'Touriga Nacional'.   errors (MAE, NMAE and RMSE) and higher EF were observed in the combined validation dataset compared to the two single predictor models. When individual cultivars were considered, in the CS15 dataset the single-predictor BuA model performed slightly better than the model with both predictors, but there were clear advantages on all indicators in the VI14 dataset (Table 7).

Variability and correlations between bunch attributes
Bunch weight was one of the bunch attributes with the largest variability in all the four training datasets (Figure 2), which is due to the natural variability of bunch morphology among genotypes (Tello and Ibáñez, 2014 ;Tello and Ibáñez, 2018) combined with the inter and intravine variability within each cv. in a given vineyard (Rey-Caramés et al., 2015).
In general, the averaged assessed bunch attributes compare well to the descriptions found in cultivar catalogues (IVV, 2011), where VI, AL and TN are described as having short bunches (code OIV nº 202; OIV, 2007) with a medium compactness (code OIV nº 204), while SY bunches are described as being medium-sized with a loose to medium compactness. For bunch size, the average rachis length obtained in the training samples corroborate this general description, but a larger BuC was recorded for AL compared to the other cvs.; these are differences that may be due to the methods used to describe BuC. Indeed, while the OIV code for bunch compactness is based on a visual assessment of the way that berries are arranged in the bunch and to the amount of free space they leave (OIV, 2007), BuC was quantified as the number of berries per cm of main rachis, as proposed by Hed et al. (2009). This is considered a more objective criterion, as it avoids the subjectivity linked to the OIV visual assessment (Tello and Ibáñez, 2014).
The high and significant positive correlation coefficients between BuW and most of the assessed bunch attributes highlight their potential for use as proxies of actual BuW. Among the actual bunch attributes, BuV was the trait that correlated the best with BuW and was therefore the bunch feature with the highest explanatory power for BuW variability, as has also been reported by other authors for other grapevine cvs., sites and seasons (Nuske et al., 2014;Font et al., 2015;Hacking et al., 2019). This suggests that if machine vision and modelling techniques are able to accurately extract 3D information from 2D images, BuV could play a key role in automatic vineyard yield estimation. However, BuV is not easy to measure by automated image analysis, particularly in field conditions. So far, the novel automated methods for BuV estimation, besides being computationally complex, are mainly based on the 3D modelling of an entire bunch (Nuske et al., 2014;Font et al., 2015;Herrero-Huerta et al., 2015;Ivorra et al., 2015;Tello et al., 2016;Rist et al., 2018, among others), which is not very common in field conditions. Indeed, in most cases of field-grown grapevines, only a fraction of the bunches can be detected by image sensors. Besides the difficulties related to the natural irregularity of bunch architecture, 3D bunch modelling produces an apparent 3D volume corresponding to the morphological volume which, in general, overestimates the actual bunch volume, as it does not take into account the empty space between berries (Tello and Ibáñez, 2018).
The bunch compactness index used in this workone of the most widely used quantitative indices describing bunch density (Hed et al., 2009;Sabbatini and Howell, 2010;Pallioti et al., 2012;Lopes et al., 2020) -also had high and significant positive correlation coefficients with BuW, indicating that it too is a bunch feature that can play an important role in vineyard yield estimation. For example, it can be used as an auxiliary variable to improve the accuracy of the models used to convert the #vBe on the 2D images into actual number of berries. This is supported by our results, namely by the negative correlation observed between BuC and %vBe (Table 2), which indicates that the higher the BuC, the lower the fraction of visible berries. However, like BuV, BuC is a 3D feature which is not as easy to estimate via image analysis as are 2D features. Nevertheless, recent advances in imaging methods and 3D modelling techniques Ivorra et al., 2015, Tello et al., 2016Chen et al., 2018) makes them very promising tools for automating the assessment of BuC.

Number of visible berries
Actual number of berries, #Be, is a very important yield component for explaining seasonal yield variability at vineyard level (Clingeleffer et al., 2001;Whitty et al., 2017).
In our investigation, among the bunch assessed attributes, #Be showed the second best correlation with BuW (Table 2), confirming the importance of this yield component as a proxy of BuW. Therefore, an accurate automated estimation of this bunch trait will be an important contribution to vineyard yield estimation methods. Despite several image analysis methods having recently been developed for automating the detection and counting of visible berries in 2D images (e.g., Aquino et al., 2017;Zabawa et al., 2020), the estimation of #Be using image analysis is still a difficult and complex task, mostly due to berries being partially hidden by other berries. To solve this problem, several modelling techniques have been used to extract 3D information from 2D images (Sholer and Steinhage, 2015;Rist et al., 2018;Pérez-Zavala et al., 2018;Xin et al., 2020;Liu et al., 2020). However, these methods have the drawback of needing complex devices and being computationally demanding.
In the present study, a simpler approach was tested to estimate #Be, involving the use of statistical models based on relationships between #vBe and #Be. Our results showed that a very high percentage of #Be variability can be explained by the models that are based on the log-transformation of #vBe (Table 3), indicating a high goodness of fit for all single cv. datasets. The similar high goodness of fit showed by the model based on the pooled data of the four training datasets, seems to indicate that this model has the potential to become a general model for other cvs. Furthermore, when tested against the combined validation dataset, this pooled model showed very low error and bias and a very high modelling efficiency, indicating a very good fit (Mayer and Butler, 1993). However, it should be underlined that this high model performance is not likely to be achieved when using automated methods for berry detection in field conditions. While in this study the predictor variable (#vBe) was obtained manually in laboratory conditions and by the visual assessment of each 2D image of an entire bunch -and hence with almost no error -when using automated methods to detect and count visible grape berries in field 2D images, higher errors can be expected. Indeed, with manual detection it is possible to count all visible berries, even those that are very small or only partially visible; however, automated methods are much more prone to errors. These errors can result from several sources, like the conditions in which the images are acquired (view angle and illumination), the visible bunch fraction imaged, the algorithms used for berry counting and detection, and berry features, such as colour, diameter, circularity, density and homogeneity of distribution within the bunch Aquino et al., 2017;Buayai et al., 2021;Pérez-Zavala et al., 2018).
Considering the strong correlation between #Be and BuW (Table 2), accurate estimates of #Be, once obtained, can be used to predict BuW from a previously obtained linear relationship between the two variables (Nuske et al., 2014).
Regarding the direct use of #vBe to estimate BuW, the very high and significant determination coefficients obtained in all four single cv. regression models (Table 4; Equation 11 to 14), shows that the logarithm of #vBe explains a very high percentage of bunch log-weight variability. These results are in line with other reports (Nuske et al., 2014;Diago et al., 2015;Aquino et al., 2017;Aquino et al., 2018), confirming that the #vBe is a good predictor of bunch mass. The high R 2 values also obtained by the model based on the pooled data (Equation 15) indicate an almost similar goodness of fit as those for each single cv. models, suggesting that the former model has high potential for use with other cvs. However, as well as the significant differences detected in the slopes of the regression lines for individual cv. training datasets, the validation test also showed the pooled model to have a different performance for the two independent datasets (Table 7). This is likely due to the variability of bunch attributes, either among the cvs. (e.g., BuC = 9.7 in VI14 vs 7.3 in CS15) or among seasons for the same cv. (e.g., VI: BuC = 9.7 in 2014 vs 8.4 in 2015). Therefore, it seems that, despite the high goodness of fit and the good validation performance of the predicted model based on the pooled #vBe, its use as a generalised model across different cultivars and seasons could bring some uncertainty.

Bunch area
Among the two image-based variables, BuA had the strongest correlations with BuW, both for each cv. dataset (except for VI) and for the pooled data ( Figure 3A and Table 2). The very high and significant R 2 of all the single cv. regression models (Table 5) indicates that ln(BuA) accounts for a very high proportion of bunch logweight variability, confirming that this variable is a robust image-based explanatory variable of BuW. The good performance of BuA as yield predictor has also been observed in several studies at different sites and with other cvs., in vineyards where the canopy was manipulated to promote bunch exposure to the cameras (Dunn and Martin, 2004;Diago et al., 2012;Font et al., 2015;Hacking et al., 2019). The model based on the pooled data of the four training datasets (Equation 20) showed a similar goodness of fit to the single cv. regression models, indicating that it might work well across different cvs. Nevertheless, as would be expected of any generalist models, improvements could be made by recalibrating the models for individual cvs. The validation with two additional datasets performed better on all the statistical measures of validation for CS15 compared to VI14 (Table 7).

Combining the two image-based variables
As mentioned above, both single-predictor models for BuW evidenced strong goodness of fit (Tables  4 and 5). However, the higher R 2 of almost all the models based on the predictor BuA indicate that, with the exception of cv. VI, BuA is a better predictor than #vBe. When the two imagebased explanatory variables are combined in a multiple regression model for BuW estimation, R 2 values cannot be lower than those obtained in the submodels based on single predictors (Table 6). However, the increase in R 2 was not noticeable in all cases. While for models with the single predictor #vBe, adding the second variable increased the value of R 2 for all single cv. models, starting with the simple linear regressions based on the predictor BuA, only slightly increased R 2 (to two decimal places) in the case of the VI, AL and pooled models. This indicates that there was only a slight gain in accuracy when combining the two explanatory variables, and only with the single cv. models for VI and AL at the expense of a less parsimonious model. For SY and TN, there was no significant gain in using the two predictors over the single predictor BuA. Such contrasting results were also observed in the validation of the three BuW pooled predictive models, in which the model based on the two explanatory variables (Equation 25) performed better than each of the two single predictor models (Equations 15 and 20) in the VI14 dataset, but not in CS15. In CS15 the model based on BuA showed the best statistical measures of validation, while the model based on #vBe produced the least accurate predictions. These contrasting results underline the difficulties in choosing a single explanatory variable for all cultivars and seasons; nevertheless, when combining the two validation datasets, the multiple regression pooled model gave better validation results (except for Pbias) than the single variable pooled models. This improvement in model accuracy and greater robustness seem to indicate that the two image-based variables, when used together, can better integrate the variation in the bunch features related to number of berries and berry weight, the two yield components that determine bunch weight. However, it should be underlined that the gains in accuracy of the multiple regression model might not compensate the additional costs and work load that are required to measure the second bunch feature; i.e., either BuA or #vBe. Furthermore, in this study the models were obtained using manual detection and segmentation, thus an increase in errors is to be expected when using fully automated methods of measurement. It is likely that this increase will be higher for the multiple regression model than for the single variable models, as each segmentation process produces its own errors.

CONCLUSIONS
To obtain accurate predictive models for automated vineyard yield estimation, this paper compared the explanatory power of the two most used imagebased bunch attributes (number of visible berries and bunch area in 2D images) for predicting bunch weight. The models were fitted with data from four grapevine cultivars. The high goodness of fit of all the simple linear regression models indicates that either one of the image-based variables can be used as an accurate proxy of actual bunch weight. The goodness of fit showed by the models based on the pooled data indicates that a general model is also suitable. However, its application across different cultivars and seasons should be approached with caution, as indicated by some observed differences in the statistical measures of validation among the two independent datasets. Furthermore, the significant differences found when comparing the pooled models using cultivar as an additional factor indicate that cultivarspecific models can improve predictions.
The comparison of the explanatory power of the two image-based bunch attributes for predicting bunch weight showed that the models based on the predictor #vBe had a slightly lower R 2 than the models based on BuA, whether for each single cultivar (except for Viosinho) or for the pooled dataset. The combination of the two image-based explanatory variables in a multiple regression model produced predicted models with similar or noticeably higher R 2 than that obtained for single-predictor models. However, adding a second variable to the regression produced a higher and more generalised gain in accuracy for the simple regression models based on the predictor #vBe than for the models based on BuA.
Despite the aforementioned differences and some contrasting results obtained in the validation tests of the three BuW pooled predictive models, based on our results it is possible to recommend using models based on the two image-based variables because they were generally more accurate and robust than the single variable models. When the gains in accuracy produced by adding a second image-based feature are small, a single predictor should be used. In such a case, our results indicate that BuA would be a more accurate and less cultivar dependent option than the #vBe. Further work is needed in order to test these models in field conditions at earlier phenological phases and with datasets from other cultivars and seasons.